The mystery deepens... Please send another diagnostic report (or just your QRecall.log file). That should include the logs for the experiments you just performed. If you ever have a question about whether an action was successful or not, the definitive answer is in the log. Open the log window and find the action. If it says "successful," the action completed without any significant problems.
|
|
|
Ralph, the error I'm seeing in the log for the negative hash map is error 13 (Permission denied). You might want to check the permissions and ownership of the archive package and its files to make sure the MacBook Pro user has sufficient rights. Also, did you previously set the "Ignore ownership" option on that volume when it was mounted over the network? If so, check that setting again, as reformatting the drive can cause the OS to forget it. I'm glad to hear it's not your drive.
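If you'd like to hunt for the offending files from the MacBook Pro, here's a rough Python sketch (not part of QRecall) that walks an archive package and lists anything the current user can't both read and write. The function name and the example path are made up for illustration. Keep in mind that os.access reports what the local OS believes; on a network volume the server can still refuse an operation, so treat an empty result as a hint, not proof.

```python
import os

def find_inaccessible(package_path):
    """List items in an archive package that the current user can't both
    read and write. Opening one of these typically fails with errno 13
    (EACCES, "Permission denied")."""
    problems = set()
    # Directories need search (execute) permission as well as read/write.
    dir_mode = os.R_OK | os.W_OK | os.X_OK
    file_mode = os.R_OK | os.W_OK

    if not os.access(package_path, dir_mode):
        problems.add(package_path)

    def record_error(err):
        # os.walk calls this when it can't list a directory (e.g. EACCES).
        problems.add(err.filename or package_path)

    for root, dirs, files in os.walk(package_path, onerror=record_error):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.access(path, dir_mode):
                problems.add(path)
        for name in files:
            path = os.path.join(root, name)
            if not os.access(path, file_mode):
                problems.add(path)
    return sorted(problems)


# Hypothetical path; point it at your own archive package:
# print(find_inaccessible("/Volumes/Backup/MyArchive"))
```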
|
|
|
Thanks for sending the diagnostic report. The QRecall app is definitely crashing, and pretty consistently, although it's difficult to tell why. It's happening after the compact action has finished and the QRecall app is trying to reopen the updated archive in the document window. So the crash has nothing to do with the action that took place (the compact); it's something to do with redrawing the window. My big question is this: after it crashes, are there any problems opening the archive again?
|
|
|
Aubrey Grey wrote: I am just evaluating QRecall. I started a capture of a 500GB disk. First I tried clicking on Capture in the Archive Window after creating a new Archive and creating a new action. I discovered that the Action I created was ignored and it was just doing a complete capture of the disk, ignoring the action.
Aubrey, you can create QRecall actions to do things (like capturing files), or you can capture items immediately and directly in the QRecall archive window. The two are independent of one another.
I also noticed after about 4 hours it was stuck about 2/3 through showing a speed of 6MB per minute. Reading this forum I thought it was probably being held by the Time Machine bug so I decided to unclick the item about using the Time Machine info for skipping.
It probably just had a lot of work to do. A QRecall archive is essentially a huge database. As the database grows (as it will when you add 500GB of data to it), it occasionally has to take a pause and restructure the index files it uses to find things. This usually only happens a handful of times during the life of an archive, but will definitely happen at least once when adding 500GB of new data. This causes the capture action to "pause" while the database is being restructured. Even if you cancel the capture action, QRecall may still require a lot of time to finish this restructuring before it can finish with the archive.
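The "pause" is the same phenomenon you see with any growable hash index. As a generic illustration (this is not QRecall's actual index format), here's a toy Python index that doubles its table and rehashes every entry once it gets too full. Rebuilds are rare, because the table doubles each time, but each one has to touch everything already stored, so the bigger the archive, the longer the pause. The class name and load limit are arbitrary choices for the example.

```python
class GrowingIndex:
    """Toy hash index that rebuilds itself when it gets too full.
    Illustrative only: this is NOT QRecall's index format."""

    def __init__(self, buckets=8, max_load=0.75):
        self._buckets = [[] for _ in range(buckets)]
        self._count = 0
        self._max_load = max_load
        self.rebuilds = 0  # how many times every entry had to be rehashed

    @staticmethod
    def _slot(key, buckets):
        return hash(key) % len(buckets)

    def insert(self, key, value):
        bucket = self._buckets[self._slot(key, self._buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))
        self._count += 1
        if self._count / len(self._buckets) > self._max_load:
            self._rebuild()

    def _rebuild(self):
        # The expensive "pause": every existing entry is rehashed into a
        # table twice the size, so the work grows with the index.
        new = [[] for _ in range(len(self._buckets) * 2)]
        for bucket in self._buckets:
            for k, v in bucket:
                new[self._slot(k, new)].append((k, v))
        self._buckets = new
        self.rebuilds += 1

    def get(self, key):
        for k, v in self._buckets[self._slot(key, self._buckets)]:
            if k == key:
                return v
        raise KeyError(key)
```

Inserting 100,000 keys into this toy triggers only 15 rebuilds, but the last of them rehashes 98,305 entries in one go.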
Then I tried to stop the capture. I found no way to kill it! The Suspend icon is grayed out and even quitting QRecall does not stop it (as advertised).
If you started a capture directly in the browser window, a modal sheet in the archive window should display its progress, and that sheet has a stop button. If the capture was started by an action, the activity monitor window (Window > Activity Monitor) will show its progress, and it also has a stop button. Every QRecall archive action runs in its own process (named QRecallHelper). Action processes also respond to the standard SIGTERM signal, so a kill <PID> command is the same as mashing the stop button in the UI. Note that all of these methods send a request to stop the action. The action will try to honor that request, gracefully stopping the action and safely closing the archive. This can still take some time, anywhere from a few seconds to a minute or more.
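For the command-line inclined, here's a minimal Python sketch of the kill <PID> approach. It assumes pgrep is available (it ships with macOS) to find processes named QRecallHelper; the function names are made up. It only sends the stop request. The helper still needs time to close the archive cleanly, so resist the urge to follow up with kill -9: SIGKILL can't be caught, so it skips that graceful shutdown entirely.

```python
import os
import signal
import subprocess

def helper_pids(name="QRecallHelper"):
    """Return the PIDs of running processes named exactly `name`, via pgrep."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    # pgrep exits with status 1 when nothing matches; stdout is then empty.
    return [int(pid) for pid in result.stdout.split()]

def request_stop(pid):
    """Ask a process to stop gracefully: the same SIGTERM that `kill <PID>` sends."""
    os.kill(pid, signal.SIGTERM)

# Usage (asks every running action to stop, so use with care):
# for pid in helper_pids():
#     request_stop(pid)
```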
I tried creating a new Archive and switching to that, but when I tried a new capture it insisted on using the old one.
I'm not sure what was going on there. I'll need more details.
I finally deleted the archive tree but the file system would not release the space
A file's space is not reclaimed by the UNIX filesystem until all processes that are accessing that file have closed it. This is true even if you've deleted the directory record for that file. So until the original capture action was done, the first archive would continue to occupy disk space.
and I rebooted. Then I could start over.
That will do it.
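You can watch this UNIX behavior in action with a few lines of Python (a self-contained demonstration, nothing QRecall-specific; the function name is made up). The file's directory entry is deleted while the file is still open, yet its data stays readable until the last descriptor is closed.

```python
import os
import tempfile

def read_after_unlink(payload):
    """Write `payload` to a temporary file, delete the file while it is still
    open, then read the data back through the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # the directory entry is gone...
        links = os.fstat(fd).st_nlink     # ...so the link count drops to 0
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))  # ...but the data is still there
        return data, links
    finally:
        os.close(fd)  # only now can the filesystem reclaim the space
```

On macOS, lsof +L1 lists open files whose link count has dropped to zero, which is a handy way to find the process still holding on to deleted space.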
BTW, a full TM backup only took 3 hours and a full Carbon Copy Cloner backup roughly the same, so being only 2/3 done in 4 hours was not really what I expected.
Time Machine, CCC, and the like copy files. That's all they do. QRecall is doing block-level data de-duplication on every file you capture. This is orders of magnitude more complicated, and vastly more computationally intensive. QRecall is also generating data integrity checks for every directory, file, and block of data so it can later determine if any of your captured files have been compromised. Time Machine and CCC don't do any of that.
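To give a feel for the extra work, here's a toy Python sketch of block-level de-duplication. It uses fixed 4KB blocks and SHA-256 digests purely for illustration; QRecall's actual block sizes, hashing, and storage format aren't described here. Every block of every file has to be hashed and looked up, whereas a plain copy just moves the bytes. The verify method shows how the stored digests can double as integrity checks.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed block size chosen for illustration only

class DedupStore:
    """Stores each distinct block once; a file becomes a list of block digests."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def add_file(self, data: bytes) -> list:
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only never-before-seen blocks consume storage.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read_file(self, recipe: list) -> bytes:
        return b"".join(self.blocks[d] for d in recipe)

    def verify(self) -> list:
        """Return digests whose stored bytes no longer match (damaged blocks)."""
        return [d for d, b in self.blocks.items()
                if hashlib.sha256(b).hexdigest() != d]
```

Capturing the same data twice adds no new blocks; only blocks that have never been seen before cost storage.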
|
|
|
The Send Report function sends the following:
An anonymized system profile (model of computer, OS version, amount of memory installed, and so on), but no identifiable information (no serial numbers, MAC addresses, and so on).
The last 20MB of QRecall log entries.
Your ~/Library/Preferences/com.qrecall.* files, which have your current QRecall preferences and settings.
Your ~/Library/Preferences/QRecall/Actions/* files, which define what QRecall actions you've created.
All CrashReporter and DiagnosticReports files that begin with QRecall*, which should capture the crash reports of any crashed QRecall-related processes.
This information is compressed and uploaded, along with your comments, to the QRecall diagnostic report server. You're welcome to examine the script that prepares this information. It's located in the QRecall.app/Contents/Resources/buildreport.sh script file.
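If you want to preview roughly what a couple of the items on that list amount to on your own Mac, here's a Python sketch that gathers the matching settings files and reads the tail of a log. It is not a translation of buildreport.sh (read that script for what's actually sent); the function names are made up, and the log file's location is left for you to supply.

```python
import glob
import os

TAIL_BYTES = 20 * 1024 * 1024  # "the last 20MB" of the log

def tail_bytes(path, limit=TAIL_BYTES):
    """Return at most the last `limit` bytes of a file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - limit))
        return f.read()

def settings_files(home):
    """Preference and action files matching the two patterns in the list above."""
    prefs = os.path.join(home, "Library", "Preferences")
    found = glob.glob(os.path.join(prefs, "com.qrecall.*"))
    found += glob.glob(os.path.join(prefs, "QRecall", "Actions", "*"))
    return sorted(found)

# Usage: settings_files(os.path.expanduser("~"))
```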
|
|
|
Aubrey, the "on-line" documentation is in the application. Launch the QRecall program and choose Help > QRecall Help. There's a Quick Start section if you just want to get started, and a Guide section with complete documentation. There's also a Capture Assistant (Help > Capture Assistant). It will ask you a few questions and then use those answers to set up a basic backup solution.
|
|
|
Ralph, it's hard to tell what's going on, but here are a few thoughts. The two failed captures failed in exactly the same way: the checksum of the record at file offset 992,333,119,488 was inconsistent with its data. The position was the same and the (bad) data was the same both times. This would indicate a permanent media failure: the data stored on that portion of the drive is incorrect, and remains incorrect after being repeatedly re-read. The other type of error is a transient error, where the data is stored on the media correctly but gets randomly scrambled on its way to QRecall. That kind of error isn't repeatable.
When you ran the repair, you got a massive number of data corruption errors. Since you ran disk diagnostics on the volume, we can assume these are not the result of cross-linked files. I conclude that either the drive is experiencing rampant media failures, or the archive data was scrambled while it was being duplicated. (The latter theory could be explained by transient errors, but you'd have to get a lot of them.) I would bet on the former, since you successfully captured data to the new archive immediately after the copy. It seems highly unlikely that the copy would leave every record belonging to the first computer undamaged, yet munge thousands of records belonging to the second.
Regular disk diagnostics (e.g. Disk Utility) will only tell you whether the volume's directory structure is correct. Very few utilities will perform a surface test. A surface test writes a pattern of data to every sector on the drive, and then reads it back to make sure it's still correct. These tests can take hours, if not days, to complete. It's immaterial whether the data on your drive is encrypted or not; it only matters that it's written and read correctly. A surface test doesn't need to know anything about the data it writes; it only looks for discrepancies when that data is read back.
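To make the write-then-read-back idea concrete, here's a crude, file-level Python sketch (the function name and bit pattern are arbitrary choices, not any real tool's). It fills a file with a known pattern, forces it to disk with fsync, then reads it back and reports the offsets of any blocks that don't match. It is much weaker than a real surface test: it goes through the filesystem rather than touching every sector, and the read-back may be served from the OS cache instead of the drive itself.

```python
import os

def write_read_check(path, size_bytes, block=1024 * 1024):
    """Fill a file with a known pattern, then read it back and compare.
    Returns the byte offsets of blocks that didn't match."""
    pattern = bytes([0x55, 0xAA]) * (block // 2)  # alternating 01010101/10101010 bits
    with open(path, "wb") as f:
        written = 0
        while written < size_bytes:
            chunk = pattern[: min(block, size_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to the device, not just the cache
    bad = []
    with open(path, "rb") as f:
        offset = 0
        while offset < size_bytes:
            expected = pattern[: min(block, size_bytes - offset)]
            if f.read(len(expected)) != expected:
                bad.append(offset)
            offset += len(expected)
    return bad
```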
Having said that, you can easily perform a surface test yourself, since your archive is so large and would cover a significant portion of the volume. Erase the new drive and start over. Copy archive #2 to it again. After writing it, verify the archive or use the command-line cmp tool to perform a byte-by-byte comparison of the original and the copy. (Make sure you've placed your QRecall actions on hold so the original doesn't get modified during the copy.) If this is successful, move on to trying to use the new archive again and note at what point (using it on the laptop, for example) the problems reappear. If the comparison test fails, you've narrowed down the problem to either the drive or the busses (USB) you're using to transfer the data. Which is to say, you haven't really narrowed it down at all. The next step is to use a different interface. If the drive supports both FireWire and USB, switch to FireWire, or eSATA, or whatever you've got. It's a little geeky, but I keep a spare external drive enclosure around just for testing drives in an enclosure with an interface I trust. If, after performing the copy and compare again using a different interface, you get the same kind of data corruption, my money would be on a bad drive. If switching interfaces cures the problem, then that's where you need to look next. It could be a motherboard issue with the computer or (more likely) the interface controller in the drive's enclosure.
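If you'd rather script the comparison, the sketch below reports what cmp would: the offset of the first differing byte. Because an archive is a package (a folder) and cmp compares one file at a time, the hypothetical compare_packages helper walks the original package and checks each file against its counterpart in the copy. Both function names are made up for this example.

```python
import os

def first_difference(path_a, path_b, chunk_size=1024 * 1024):
    """Byte-by-byte comparison, like `cmp`: return the offset of the first
    differing byte, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Pinpoint the exact byte within this chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One file ended early: the difference starts at the shorter length.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:  # both files ended together
                return None
            offset += len(chunk_a)

def compare_packages(original, copy):
    """Compare every file in two package folders. Returns {relative path: offset}
    for files that differ, using -1 for files missing from the copy."""
    problems = {}
    for root, _dirs, files in os.walk(original):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), original)
            other = os.path.join(copy, rel)
            if not os.path.exists(other):
                problems[rel] = -1
                continue
            diff = first_difference(os.path.join(root, name), other)
            if diff is not None:
                problems[rel] = diff
    return problems
```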
|
|
|
Ming-Li, I looked at the log file you sent, and I can't see a crash. Please send a diagnostic report (in the QRecall app, choose Help > Send Report). This will automatically include any crash logs recorded by the system for QRecall and its related processes. The capture action itself ran just fine, according to the log file you posted. The compact action started at 13:28:28 and finished successfully at 13:29:12. It's possible that the QRecall application crashed during this time. But the QRecall app is just a browser; the actions that change your archive are always run in a separate process, which is why your archive verified successfully and you encountered no problems with subsequent captures. In fact, the QRecall application never modifies archives directly, so a crash or forced quit of this app can never corrupt your archive. So the good news is your data is safe. I'll look into the crash further once I receive the diagnostic report.
|
|
|
|
|
|
Yes, there are internal identifiers, but you shouldn't have to worry about them (too much). Each archive is associated with a unique ID, which is used to monitor its status. When you duplicate your archive, you'll have two archives with the same ID. At least, for a while... The next time you do something with the duplicate archive (verify, capture, whatever), just make sure the original archive is mounted at the same time. QRecall will see that you have two archives with identical IDs and will automatically assign your second archive a new ID. If, for some reason, you *never* have both archives mounted at the same time, that could confuse QRecall. To fix that situation, simply Control-click (or right-click) the duplicate archive in the Finder, choose Show Package Contents, and trash its status.plist file. QRecall will create a new one, with a new ID, and all will be right with the world.
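To see why both archives need to be mounted at the same time, here's a hypothetical model of the ID check in Python. This is not QRecall's code; the dictionary layout and function name are invented for illustration. The point is that a duplicate ID can only be detected among the archives that are visible at the same moment.

```python
import uuid

def reassign_duplicate_ids(archives):
    """Hypothetical model: `archives` is a list of dicts with an "id" key, in
    the order they were found. The first archive seen with a given ID keeps it;
    any later archive with the same ID gets a freshly generated one."""
    seen = set()
    for archive in archives:
        if archive["id"] in seen:
            archive["id"] = str(uuid.uuid4())
        seen.add(archive["id"])
    return archives
```

Called with both archives, the copy gets a fresh ID. Called with only the copy, there's nothing to compare against, so nothing changes, which is the situation that trashing status.plist works around.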
|
|
|
For anyone wondering, Mark and I went back and forth, over the course of several days, trying to figure this out. Mark was really helpful in getting to the root of this problem, and I thank him for his persistence. It turns out, Mark had relocated his home folder to a different volume and that volume was set to ignore ownership and permissions. Apparently, new security policies in OS X 10.10 don't allow the launchd service (the part of OS X that keeps background processes running, among other things) to start a background agent process if the path to that agent executable passes through a symbolic link that's on a volume that doesn't enforce ownership and permissions. I know, right? Enabling ownership for the home folder volume fixed the problem, and is probably a good idea anyway. I'll add some code so that future versions of QRecall will warn unsuspecting users about this situation.
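If you suspect the same trap, this small Python sketch lists every symbolic link along a path, such as the path to your LaunchAgents folder after the home folder has been relocated. The function name is illustrative. It doesn't check the other half of the condition; whether a volume ignores ownership is shown by the "Ignore ownership on this volume" checkbox in the Finder's Get Info window for that volume.

```python
import os

def symlinks_on_path(path):
    """Return each component of `path` that is a symbolic link, checked from
    the root down (e.g. a relocated home folder reached through a link)."""
    links = []
    current = os.sep if os.path.isabs(path) else ""
    for part in os.path.normpath(path).split(os.sep):
        if not part:
            continue  # skip the empty piece before the leading separator
        current = os.path.join(current, part)
        if os.path.islink(current):
            links.append(current)
    return links

# Usage: symlinks_on_path(os.path.expanduser("~/Library/LaunchAgents"))
```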
|
|
|
Ming-Li, thanks for taking the time to detail the differences you found between QRecall and Time Machine. I've seen similar discrepancies here, which reinforces my suspicion that Time Machine has its own internal set of exclusion rules, beyond those flagged by the system's "do not backup" list. I can also appreciate Time Machine's aversion to backing up log files and databases. Without QRecall's block-level de-duplication, these would quickly consume all of the available backup storage. Luckily, we don't have that problem.
|
|
|
Mark, thanks for posting your problem. (I also got your message via email, but I'll respond here.) First, try sending a diagnostic report from the desktop computer you're having problems with (in the QRecall app, choose Help > Send Report...). Also open up your Console app. In the system.log, filter the output with "qrecall", copy anything that pops up, and send those to me too, either via email or in the comments of your diagnostic report. I suspect, however, that your problem is permissions. The issues you describe ("scheduler service could not be contacted" and "Show in menu bar" not working) all depend on background processes. QRecall installs several, but the two key ones are the QRecallScheduler and the QRecallMonitor. These processes are installed and registered with OS X's launchd service in your ~/Library/LaunchAgents/ and /Library/LaunchDaemons/ folders. For security reasons, launchd is extremely picky about the location, ownership, and permissions of these folders. If they are anything except what launchd expects, it will refuse to launch anything installed there. (launchd will complain about this in the system.log, which is why I had you look there.) My first suggestion would be to run the "Repair Disk Permissions" function in the Disk Utility app, launch QRecall again (to reinstall), and then restart your system. If your QRecall monitor appears in the menu bar, I suspect your troubles are behind you. If that doesn't fix it, write/post again and we'll get more specific in the investigation.
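For a quick look at the two folders mentioned above, here's a Python sketch that flags an unexpected owner or group/world write permission. The exact rules launchd enforces aren't spelled out here, so these two checks are an assumption about the most common culprits, not a complete list; the function name is made up.

```python
import os
import stat

def launch_folder_problems(path, expected_uid):
    """Flag ownership/permission settings launchd is likely to reject.
    Assumed rules: the expected user owns the folder, and neither the group
    nor others can write to it."""
    problems = []
    st = os.stat(path)
    if st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected {expected_uid}")
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("writable by group or others")
    return problems

# Usage:
# launch_folder_problems(os.path.expanduser("~/Library/LaunchAgents"), os.getuid())
# launch_folder_problems("/Library/LaunchDaemons", 0)  # system folder owned by root
```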
|
|
|
Ming-Li, I'm surprised too, but I'm beginning to suspect that Time Machine has some exclusion rules hard-wired, and those aren't reflected in the set of folders it claims to exclude. Sadly, your version of QRecall also has a bug that fails to exclude the /System/Library/Caches folder when you select the "Exclude caches" option. (This is fixed in a later version, which also correctly excludes any networked cache folders.) The manual exclusion you added will work just fine for now. Thanks for bringing this to my attention.
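For the curious, here's a Python sketch of the kind of rule an "Exclude caches" option has to apply. The folder list is a hypothetical example, not QRecall's actual list, and the function name is made up. The main design point is to match whole path components, so excluding /Library/Caches doesn't accidentally exclude a folder like /Library/CachesOld.

```python
import os

# Hypothetical example list; NOT QRecall's actual set of cache folders.
CACHE_FOLDERS = ["/Library/Caches", "/System/Library/Caches", "~/Library/Caches"]

def is_excluded(path, excluded=CACHE_FOLDERS):
    """True if `path` is one of the excluded folders or lies inside one.
    Compares whole path components rather than raw string prefixes."""
    path = os.path.normpath(os.path.expanduser(path))
    for folder in excluded:
        folder = os.path.normpath(os.path.expanduser(folder))
        if path == folder or path.startswith(folder + os.sep):
            return True
    return False
```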
|
|
|
Norbert Karls wrote: May I follow up on this asking if there has been any progress already?
I'm pleased to write that QRecall development is (once again, thankfully, finally) in high gear. I'm putting the final touches on the command line tool, which does a lot more than I originally planned. I have some new scheduling logic/features that I want to get working before I release it as a beta, along with some massive UI code clean-up to do. There are also a bunch of random problems, glitches, and crashes that I'm still hunting down. Consequences of all that new code. *sigh* And I have more testing to do. QRecall needs to be really stable before I spring the next version on my loyal users. This version will NOT be backwards compatible with earlier versions. Once you capture new data using 2.0, you can't open that archive with an earlier version. So early adopters need to know they'll be flying without a safety net...