QRecall

john g wrote:Should 3TB take so long? At this point, it looks like I'll need about 100 hours to backup the entire 3TB.

It's possible. 3TB is a lot of data to de-duplicate.

The computations needed to de-duplicate data increase exponentially as the corpus of data grows. Especially once your archive is past a TB or two, it requires a massive number of small data reads to check each new block against what's already in the archive. Even though you've got a very fast network configuration, this is still going to have a higher transaction latency than a direct bus connection (SATA, eSATA, Thunderbolt, and so on).

First tip would be to turn off shifted quanta detection. Shifted quanta detection performs numerous lookups into the archive database for every new block, instead of simply checking it once to see if it's a duplicate. Especially for an initial backup, shifted quanta detection won't save you much. (You're free to turn it back on once your initial capture is finished.)
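
To make the difference in lookup cost concrete, here is a minimal Python sketch. It is not QRecall's implementation: the 64-byte block size, the SHA-256 digest, the in-memory set standing in for the archive database, and both function names are assumptions chosen for illustration. The aligned version checks each block once; the shifted version probes the index at every byte offset until it finds a known block.

```python
import hashlib

BLOCK = 64  # assumed block size, kept tiny so the sketch runs quickly


def digest(chunk):
    return hashlib.sha256(chunk).digest()


def capture_aligned(data, index):
    """Check each aligned block once; return (lookups, new_blocks)."""
    lookups = new_blocks = 0
    for off in range(0, len(data), BLOCK):
        key = digest(data[off:off + BLOCK])
        lookups += 1
        if key not in index:
            index.add(key)  # block not seen before: store it
            new_blocks += 1
    return lookups, new_blocks


def count_shifted_lookups(data, index):
    """Probe every byte offset for a known block; return (lookups, matches)."""
    lookups = matches = 0
    off = 0
    while off + BLOCK <= len(data):
        lookups += 1
        if digest(data[off:off + BLOCK]) in index:
            matches += 1
            off += BLOCK  # known block found: jump past it
        else:
            off += 1      # no match: slide the window by one byte
    return lookups, matches
```

On an initial capture almost nothing matches, so the shifted scan performs roughly one probe per byte instead of one per block and finds very little. It pays off only later, when data that is already in the archive reappears at a different offset.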

Be patient. It's a lot of data to de-dup, and it's just going to take time, memory, and bandwidth. You might consider scheduling your backup with an action and adding a condition so the capture stops if it's taking longer than, say, 10 hours to finish. Every day it will do another 10 hours, picking up where it left off yesterday. Eventually, it should catch up. At that point you might want to merge all of those incomplete layers into a single baseline layer.
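
As a rough back-of-the-envelope check of that schedule, the arithmetic can be sketched as below. It assumes decimal terabytes and a constant capture rate, which is optimistic given that de-duplication slows down as the archive grows; the variable names are only illustrative.

```python
import math

total_bytes = 3 * 10**12   # 3 TB, decimal units (assumption)
estimated_hours = 100      # the estimate quoted in the question
hours_per_run = 10         # the suggested daily time limit

# Average throughput implied by the estimate, in MB per second.
rate_mb_per_s = total_bytes / (estimated_hours * 3600) / 10**6

# Number of daily runs needed if each run stops after hours_per_run.
runs_needed = math.ceil(estimated_hours / hours_per_run)

print(f"{rate_mb_per_s:.1f} MB/s, {runs_needed} daily runs")
```

So at the rate implied by the estimate, the initial capture would catch up after about ten daily runs.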

If you're desperate to reduce the de-duplication overhead, you might also consider splitting up your archive. For example, you might capture all of your virtual machine images to one archive while capturing all of your multi-media files to a second archive. Unless your virtual machine images contain copies of your multi-media content, it's unlikely that they would have much in common.

Your post is also timely, inasmuch as I've been writing code all week to add a new feature to QRecall. In QRecall 2.0, you'll be able to schedule a capture that copies just the raw data to the archive, without performing any de-duplication. This should be nearly as fast as a simple file copy. The de-duplication work is then deferred until the next compact action is run. This should make short captures to large (>2TB) archives much more efficient.

Jon Lindemann wrote:1. Does one need to "Schedule" the action for Rolling Merges or is specifying the periods of merging layers sufficient?

No action does anything until it's run, either manually or by scheduling it to run periodically. The time periods in the rolling merge action simply determine the granularity of the merge when it is run.

So the answer is "yes." You need to schedule your rolling merge action to run periodically. I suggest once a day, if you make a lot of small captures during the day, or once a week otherwise.

2. You mentioned weekly COMPACT and VERIFY actions. Should one VERIFY before COMPACTing or does COMPACT also verify?

It doesn't really matter which runs first.

All actions verify the data they use, but only the data necessary to perform their work.

Only the verify action checks the integrity of the entire archive. And that's why it's a good idea to run a verify action from time to time, just to make sure nothing has been corrupted. How often you verify your archive is up to you.

Jon Lindemann wrote:So when I enter a permanent key in QRecall Preferences : Identity Key, I have to create a new archive and then combine the two archives? As in putting another 680GB archive on the same external hard drive and then combining them?

Jon,

You don't need to create a new archive. In fact, don't create a new archive. Enter your permanent identity key and simply go back to capturing your three volumes.

QRecall will perform a "full" recapture of all three volumes. But since that data has already been captured once, it won't add any appreciable amount of data to the archive. This is the magic of data de-duplication at work.

After recapturing your three volumes, you'll have an archive with two owners, each of which will have three volumes. You now have the choice of joining these owners, essentially migrating the captures performed with the trial key to your new permanent key, or you can simply delete the volumes belonging to the trial key. It's your choice.

Both deleting items and joining volumes and owners are documented in the help (in the QRecall application, Help > QRecall Help > Guide > Advanced > Delete items / Combine Items).

I took a brief look at your log files and I'm mystified.

According to your log, the QRecall application intentionally uninstalled itself almost every time you quit the application. (This also explains why it was taking several seconds to quit.)

When you quit QRecall, you have two choices. Command+Q quits the application normally. Command+Shift+Option+Q quits the application and uninstalls all active components, which includes removing the preauthorized helper.

I can't think of any reason why OS X should think that you're holding down the Option and Shift keys while quitting. Out of curiosity, when you click on the QRecall menu, does it say "Quit" or "Quit and Uninstall"?

Schmye,

It should stick. You only need to preauthorize QRecall once to use administrative privileges, and you shouldn't have to authorize it again until you reinstall it or upgrade it.

My guess is that something isn't getting installed correctly. Every time the QRecall application starts, it checks to see that all of its components are installed correctly. If it finds that the QRecallHelper executable is not preauthorized, it will try to install it again.

Start by sending a diagnostic report (in the QRecall app choose Help > Send Report). This will include the log of your previous installation attempts and what QRecall finds when it starts up again.

Ming-Li Wang wrote:Diag. report has been sent as requested.

Received. I'll take a look at it soon.

Speaking of log, I found several peculiar entries that may (or may not) have something to do with the mystery. It's an error message that says "Unable to connect with helper". There are several occurrences in the log, including a recent one at 10:35:21 local time (it's 11:05 right now).

If the actions are running and finishing correctly, you can ignore those messages.

QRecall uses Mach ports to communicate with running processes. Prior to OS X 10.8, Mach ports were rock solid. You could open a port and leave it open for days without any problem. Now they seem to spontaneously close themselves, and I've never learned why. Earlier versions of QRecall would treat a port communications error very seriously because it almost always meant one of the processes had crashed, which is pretty serious. But now this just happens for reasons unknown, so it's much less indicative of a problem.

The next version of QRecall is more tolerant of the kernel's behavior. (It will still log the closed port, but it now doesn't complain unless it is also unable to reestablish a new connection.)

There was one action scheduled around that time, at 10:35 sharp (a "Capture" to a different archive), and there's no log entry of the action (before or after the error), but I'm pretty sure no change was done to the associated source folder, so there's nothing to capture.

There could be some log window subtlety here. Every log message (and every hierarchical group of log messages) has a severity value associated with it. Failures are a big deal, warnings not so much, and so on, down to minutia and debug messages. The slider at the top of the log window filters out less significant messages so you aren't bombarded by trivial details if all you want to know is if there was an error.

When a capture action starts, it logs a regular "Capture started" message. If the capture action finds no changes (meaning nothing was added to the archive), it changes the severity of the message to "minutia." If the slider isn't all the way to the right, you won't see "minutia" messages. This is so, if you've created a capture that runs every 15 minutes, your log window won't be filled with "Captured nothing" 96 times a day. You'll only see the capture actions that actually captured something, or had problems.
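
The filtering described above can be sketched in a few lines of Python. The severity names and their ordering, the message text, and the function names are hypothetical; this illustrates the idea, not QRecall's log format.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical level names, ordered from least to most significant.
    DEBUG = 0
    MINUTIA = 1
    MESSAGE = 2
    WARNING = 3
    FAILURE = 4


def capture_entry(items_added):
    """Log entry for a finished capture, demoted to MINUTIA if nothing was added."""
    if items_added == 0:
        return (Severity.MINUTIA, "Captured nothing")
    return (Severity.MESSAGE, f"Captured {items_added} items")


def visible(entries, slider):
    """Keep only the entries at or above the slider's severity threshold."""
    return [entry for entry in entries if entry[0] >= slider]


RUNS_PER_DAY = 24 * 60 // 15  # a capture every 15 minutes
```

With the slider at the regular message level, a day of 96 captures in which only one found changes shows a single entry. Moving the slider down to "minutia" reveals all 96.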

The mystery deepens...

Please send another diagnostic report (or just your QRecall.log file). That should include the logs for the experiments you just performed.

If you ever have a question about whether an action was successful or not, the definitive answer is in the log. Open the log window and find the action. If it says "successful," the action completed without any significant problems.

Ralph,

The error I'm seeing in the log for the negative hash map is 13 (Permission denied). You might want to check the permissions and ownership of the archive package and files to make sure the MacBook Pro user has sufficient rights. Also, did you previously set the "Ignore ownership" option on that volume when mounted over the network? If so, you might want to check that as reformatting the drive can cause the OS to forget that setting.
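
If you'd like to check this from a script rather than the Finder, here is a small Python sketch that reports who owns a file and whether the current user can read and write it. The function name is illustrative, and you would pass it whichever file inside the archive package the log complains about.

```python
import errno
import os

# errno 13 is EACCES, which the OS describes as "Permission denied".
PERMISSION_DENIED = errno.EACCES


def describe_access(path):
    """Summarize ownership and the current user's access to path."""
    info = os.stat(path)
    return {
        "owner_uid": info.st_uid,
        "mode": oct(info.st_mode & 0o777),      # permission bits only
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```

Run it as the same user that runs the capture. A file that reports "writable": False for that user is a likely source of the error.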

I'm glad to hear it's not your drive.

Thanks for sending the diagnostic report.

The QRecall app is definitely crashing, and pretty consistently, although it's difficult to tell why. It's happening after the compact action has finished and the QRecall app is trying to reopen the updated archive in the document window.

So the crash has nothing to do with the action that took place (the compact). It's just something to do with redrawing the window.

My big question is this: After it crashes, are there any problems opening the archive again?

Aubrey Grey wrote:I am just evaluating QRecall. I started a capture of a 500GB disk. First I tried clicking on Capture in the Archive Window after creating a new Archive and creating a new action. I discovered that the Action I created was ignored and it was just doing a complete capture of the disk ignoring the action.

Aubrey,

You can create QRecall actions to do things (like capturing files), or you can capture items immediately and directly in the QRecall archive window. The two are independent of one another.

I also noticed after about 4 hours it was stuck about 2/3 through showing a speed of 6MB per minute. Reading this forum I thought it was probably being held by the Time Machine bug so I decided to unclick the item about using the Time Machine info for skipping.

It probably just had a lot of work to do. A QRecall archive is essentially a huge database. As the database grows (as it will when you add 500GB of data to it), it occasionally has to take a pause and restructure the index files it uses to find things. This usually only happens a handful of times during the life of an archive, but will definitely happen at least once when adding 500GB of new data.

This causes the capture action to "pause" while the database is being restructured. Even if you cancel the capture action, QRecall may still require a lot of time to finish this restructuring before it can finish with the archive.

Then I tried to stop the capture. I found no way to kill it! The Suspend icon is grayed out and even quitting QRecall does not stop it (as advertised).

If you started a capture directly in the browser window, a modal sheet in the archive window should display its progress and that sheet has a stop button. If the capture was started by an action, the activity monitor window (Window > Activity Monitor) will show its progress, and it also has a stop button.

Every QRecall archive action runs in its own process (named QRecallHelper). Action processes also respond to the standard SIGTERM signal, so a kill <PID> command is the same as mashing the stop button in the UI. Note that all of these methods send a request to stop the action. The action will try to honor that request, gracefully stopping the action, and safely closing the archive. This can still take some time, anywhere from a few seconds to a minute or more.
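
The "request to stop" pattern looks roughly like this in Python. This is a generic sketch of graceful SIGTERM handling, not QRecallHelper's code; the function names and the idea of an action as a list of steps are assumptions made for the example.

```python
import os
import signal
import threading

stop_requested = threading.Event()


def on_sigterm(signum, frame):
    # Only record the request; the main loop decides when it is safe to stop.
    stop_requested.set()


def run_action(steps):
    """Run callables in order until finished or a stop has been requested.

    Returns the number of steps that completed.
    """
    signal.signal(signal.SIGTERM, on_sigterm)
    completed = 0
    for step in steps:
        if stop_requested.is_set():
            break          # honor the request between steps, never mid-step
        step()
        completed += 1
    # A real action would flush and safely close its archive here.
    return completed
```

Because the signal only sets a flag, the step in progress always finishes before the loop exits. That is why a stop request can take a while to be honored.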

I tried creating a new Archive and switching to that but when I tried a new capture it insisted in using the old one.

I'm not sure what was going on there. I'll need more details.

I finally deleted the archive tree but the file system would not release the space

A file's space is not reclaimed by the UNIX filesystem until all processes that are accessing that file have closed it. This is true even if you've deleted the directory record for that file. So until the original capture action was done, the first archive would continue to occupy disk space.
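
You can see this behavior with a short Python sketch on any POSIX system (macOS, Linux). The function name is illustrative. It deletes a temporary file while still holding it open, then reads the contents back through the open handle.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, unlink it while open, and read it back."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.unlink(path)                       # remove the directory entry
        name_still_exists = os.path.exists(path)
        handle.seek(0)
        data = handle.read()                  # the data is still on disk
    # Closing the last open handle is what lets the filesystem free the space.
    return name_still_exists, data
```

The name disappears immediately, but the data stays readable (and keeps occupying space) until the handle is closed.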

and I rebooted. Then I could start over.

That will do it.

BTW, a full TM backup only took 3 hours and a full Carbon Copy Cloner backup roughly the same, so being only 2/3 done in 4 hours was not really what I expected.

Time Machine, CCC, and the like copy files. That's all they do. QRecall is doing block-level data de-duplication on every file you capture. This is orders of magnitude more complicated, and vastly more computationally intensive. QRecall is also generating data integrity checks for every directory, file, and block of data so it can later determine if any of your captured files have been compromised. Time Machine and CCC don't do any of that.

The Send Report function sends the following:

An anonymized system profile (model of computer, OS version, amount of memory installed, and so on), but no identifiable information (no serial numbers, MAC addresses, and so on).

The last 20MB of QRecall log entries.

Your ~/Library/Preferences/com.qrecall.* files, which have your current QRecall preferences and settings.

Your ~/Library/Preferences/QRecall/Actions/* files, which define what QRecall actions you've created.

All CrashReporter and DiagnosticReports files that begin with QRecall*, which should capture the core dumps of any crashed QRecall-related processes.

This information is compressed and uploaded, along with your comments, to the QRecall diagnostic report server.

You're welcome to examine the script that prepares this information. It's located in the QRecall.app/Contents/Resources/buildreport.sh script file.

Aubrey,

The "on-line" documentation is in the application. Launch the QRecall program and choose Help > QRecall Help.

There's a Quick Start section if you just want to get started, and a Guide section with complete documentation.

There's also a Capture Assistant (Help > Capture Assistant). It will ask you a few questions and then use those answers to set up a basic backup solution.

Ralph,

It's hard to tell what's going on, but here are a few thoughts.

Both failed captures failed in exactly the same way: the checksum of the record at file offset 992,333,119,488 was inconsistent with the data. The position was the same and the (bad) data was the same both times. This would indicate a permanent media failure. The data stored on that portion of the drive is incorrect, and remains incorrect, after being repeatedly re-read.

Another type of error is a transient error, where the data is stored on the media correctly but gets randomly scrambled on its way to QRecall. This kind of error isn't repeatable.
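
The distinction between the two kinds of error can be expressed as a small Python sketch. It assumes a CRC-32 checksum and a caller-supplied function that re-reads the same record from the drive; both are stand-ins for illustration, not QRecall's record format. Note that the operating system may satisfy re-reads from its cache, so a real test has to force the data to come from the media again.

```python
import zlib


def classify_read_error(read_record, expected_crc, attempts=3):
    """Re-read a record that failed its checksum and classify the failure.

    read_record is a hypothetical callable returning the record's bytes.
    """
    reads = [read_record() for _ in range(attempts)]
    if any(zlib.crc32(data) == expected_crc for data in reads):
        return "transient"      # a later read came back correct
    if all(data == reads[0] for data in reads):
        return "permanent"      # same bad bytes every time
    return "inconclusive"       # bad, but different on each read
```

Identical bad data on every read points at what is stored on the media. Data that is sometimes right points at the path between the media and the application.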

When you ran the repair, you got a massive number of data corruption errors. Since you ran disk diagnostics on the volume, we can assume these are not the result of cross-linked files.

I conclude that either the drive is experiencing rampant media failures, or the archive data was scrambled while it was being duplicated. (This latter theory could be explained by transient errors, but you'd have to get a lot of them.) I would bet on the former, since you successfully captured data to the new archive immediately after the copy. It seems highly unlikely that the copy wouldn't result in any damaged records that belonged to the first computer, but munged thousands of records belonging to the second.

Regular disk diagnostics (e.g., Disk Utility) will only tell you if the volume directory structure is correct. There are extremely few utilities that will perform a surface test. A surface test writes a pattern of data to every sector on the drive, and then reads it back to make sure it's still correct. These tests can take hours, if not days, to complete.

It's immaterial whether the data on your drive is encrypted or not. It only matters that it's written and read correctly. I don't need to know anything about the data to write it and look for any discrepancies when it's read back.

Having said that, you can easily perform a surface test yourself, since your archive is so large and would cover a significant portion of the volume. Erase the new drive and start over. Copy archive #2 to it again. After writing it, verify the archive or use the command-line cmp tool to perform a byte-by-byte comparison of the original and the copy. (Make sure you've placed your QRecall actions on hold so the original doesn't get modified during the copy.)
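
If you'd rather script the comparison, the following Python sketch does the same kind of byte-by-byte check as cmp on one pair of files and reports the offset of the first difference. The function name and the 1 MB chunk size are arbitrary choices. An archive is a package, so you would run it on each file inside the original and the copy.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is shorter.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended without a difference
            offset += len(chunk_a)
```

Knowing the offset is useful if the test fails more than once, because a failure at the same position each time suggests the media rather than the connection.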

If this is successful, move on to trying to use the new archive again and note at what point (using it on the laptop, for example) the problems reappear.

If the comparison test fails, you've narrowed down the problem to either the drive or the buses (USB) you're using to transfer the data. Which is to say, you haven't really narrowed it down at all.

The next step is to use a different interface. If the drive supports both FireWire and USB, switch to FireWire, or eSATA, or whatever you've got. It's a little geeky, but I keep a spare external drive enclosure around just for testing drives in an enclosure with an interface I trust.

If, after performing the copy and compare again using a different interface, you get the same kind of data corruption, my money would be on a bad drive. If switching interfaces cures the problem, then that's where you need to look next. It could be a motherboard issue with the computer or (more likely) the interface controller in the drive's enclosure.

Ming-Li,

I looked at the log file you sent, and I can't see a crash. Please send a diagnostic report (in the QRecall app, choose Help > Send Report). This will automatically include any crash logs recorded by the system for QRecall and its related processes.

The capture action itself ran just fine, according to the log file you posted. The compact action started at 13:28:28 and finished successfully at 13:29:12.

It's possible that the QRecall application crashed during this time. But the QRecall app is just a browser; the actions that change your archive are always run in a separate process, which is why your archive verified successfully and you encountered no problems with subsequent captures. In fact, the QRecall application never modifies archives directly, so a crash or forced quit of this app can never corrupt your archive.

So the good news is your data is safe.

I'll look into the crash further, once I receive the diagnostic report.