Jon Lindemann wrote:I know this is a cold "Beta" thread, but do you have any new recommendations regarding archiving large virtual disk images in Parallels?
Well, there's a new beta on the horizon, so this seems like a good time to warm up some of these threads. As you mentioned, shifted quanta detection isn't effective with files like disk images, for all of the reasons previously given. So the only issue is how much time/effort will it waste.
The Virtual machine is run only 1-2 times/month; the associated ".hdd" file is currently 45 GB.
Shifted quanta detection can add somewhere between a little and a lot of overhead to the capture process, depending on a wide variety of factors: the aggressiveness of the detection, the size of the archive, the amount of RAM available, the speed of your I/O, and so on. As a rule, it can slow your capture speed anywhere from 20% to 10,000%. But in your case, it probably won't matter in the grand scheme of things.

Let's say you move your disk image to your new "Docs" partition and begin capturing it to your Docs archive, and that archive uses shifted quanta detection. By your own admission, you only modify these virtual machine images a couple of times a month. So 93% of the time, QRecall won't be capturing these files at all because they haven't changed. And in the 7% of captures that do recapture them, most of the data in these disk images is already duplicate (unshifted). Even at its most aggressive setting, QRecall always looks for duplicate, un-shifted data first. Since 99% of the data in your disk image file doesn't change or move, 99% of the data will be immediately recaptured as duplicate data with no shifted quanta analysis.

In the end, you'll only be taking a performance hit on 1% of the data in 7% of your captures. Even if shifted quanta detection made those captures 10 times slower, that would add only a fraction of a percent to your total capture time. I doubt you'll ever notice. Well, you will notice once: the first time you move these disk image files to your new partition and capture them. That's going to be your worst-performing capture. But after that, it should be smooth sailing.
Parallels apparently has an option to split the ".hdd" into 2 GB files: would that avoid archiving the entire 45 GB ".hdd" file every time it's changed?
This won't make any difference. That option exists so you can store Parallels virtual machine files on filesystems that don't support huge files. It would only improve your QRecall performance if just one of the .hdd segments changed, and it's virtually (no pun intended) impossible for that to happen. It would be like booting your OS X system and expecting only data on the second half of your hard drive to be modified. What will actually happen is that all of the .hdd segments get modified and QRecall will have to recapture them all, ultimately the exact same amount of data.
|
|
|
Matthias Kallweit wrote:The log says: Cannot open file, permissions error (on file open), Permissions 0x0004
QRecall needs the ownership and permissions necessary to access the files in the archive package (like any document). This gets complicated when the archive is on a shared volume. I suggest turning on the "Ignore ownership and permissions" option for any volume that's going to be used for QRecall archives, especially one being accessed by more than one user. If this option isn't available on the volume, make sure all of the users accessing the QRecall archive log into the networked volume using the same account. If you're still stuck, send a diagnostic report (QRecall application > Help > Send Report).
|
|
|
I'm glad to hear you're trying out QRecall. The recommended method for setting up a recovery drive is outlined in QRecall Help > Quick Start > Backup Strategies > Bootable Recovery Drive. In a nutshell: install a clean copy of OS X using the OS X installer, create an account, delete all extraneous applications and utilities, and then install QRecall on the recovery volume. (Note that you don't need an identity key for this installation.) Reboot to your normal system and configure QRecall to perform your regular backups to an archive on the recovery volume. If you run into trouble with your startup volume, reboot from the recovery volume and use QRecall to restore or recall whatever is required.

If you already have a QRecall archive on an external volume, but haven't set up a bootable recovery volume, you have two choices for restoring your startup volume. The first is to install OS X on the external volume (without erasing the volume), download QRecall from this website, and then restore your startup volume from the archive. Alternatively, you can wipe and reinstall OS X over your existing installation (using network Recovery mode) and, once you have successfully logged in again, re-download QRecall and recover your files from the archive. Before doing the latter, I would definitely verify the archive and make sure it has recently captured all of your files.
|
|
|
David, First, my condolences. Problems like this can be very frustrating.

QRecall can, of course, restore the files in your sister-in-law's home folder. If your brother has any questions about the validity or content of the archive, I would encourage him to open the archive and carefully browse it. Make sure all of the expected files are there, that they've been recently captured, and that no recent layers are marked as incomplete or damaged. For good measure, run a verify on the archive. You can then perform a restore on your sister-in-law's home folder (/Users/sisnlaw) or any sub-portion (/Users/sisnlaw/Documents, etc.). The only, possibly unexpected, side effect is that anything that was excluded from the capture will be deleted by the restore.

But as far as solving the problem goes, I'm not entirely sure this will help. If the account won't log in because of something in that user's home folder, restoring the files won't make any difference. Unless, of course, this is something that happened very recently and you simply want to rewind the account back to the point in time when it was working. In that case, I might also suggest using QRecall as a research tool. By shading the older layers just before the issue arose, the QRecall browser will show just those files that were captured (and thus changed) around the time the problem started. That might give you some clues as to what, and where, the problem is.

If you create a new account and recall files from the original account you will, as you mention, encounter permission problems. The captured files will belong to a different user, and simply dropping them into the new account will leave you with files and folders you can't access from that account. In this situation a little Terminal sleight of hand can be used to fix things up. Let's say the UID of your sister-in-law's account is 502, and the new account you created has a UID of 503.
The following Terminal command will change the ownership of all files and folders belonging to 502 to 503:

sudo find /Path/To/Recalled/Items -user 502 -print0 | xargs -0 sudo chown 503

(To find out the UIDs of the accounts, use the ls -ln /Users command.) Let me know if you have any other questions.
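If you'd like to rehearse that find/xargs pattern before running it with sudo, here's a harmless variant you can try first. It matches files by the current user's UID and "chowns" them to that same UID, which changes nothing; the scratch directory and file names are made up for illustration.

```shell
# Safe dry run of the find-by-UID / xargs / chown pattern (no sudo needed).
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"
uid=$(id -u)
# Same shape as the real command: find by numeric UID, pipe NUL-separated
# names to xargs, then chown. Chown to our own UID is a harmless no-op.
find "$tmp" -type f -user "$uid" -print0 | xargs -0 chown "$uid"
matched=$(find "$tmp" -type f -user "$uid" | wc -l)
echo "files owned by UID $uid: $matched"
rm -r "$tmp"
```

Once you're comfortable that the find expression matches exactly the files you expect, substitute the real path, the old UID, and the new UID.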
|
|
|
Paulino Michelazzo wrote:I think that your approach could be fine, but I would like to be sure about how much data I need to back up in each cycle.
This is where trial identity keys are useful. Get a trial key, install QRecall, and just try it out. Start with your music or just a subset of your photos/videos and see how much QRecall uses and how much it adds each day. (Tip: the log details will record exactly how many files it captured, how much duplicate data was discovered, and how much the archive grew.) Here are some things to consider:
QRecall performs block level data de-duplication. If editing a photo only changes "a few bytes," QRecall will find the one block of data that was changed and will only capture that single block.
QRecall can also perform shifted quanta detection to locate duplicate blocks that have moved to a different offset in the file. If editing a photo or movie only modifies or appends metadata, but doesn't move any other data around, then block level de-duplication is fine. But if you're using a program like Photoshop that rewrites the entire file, a lot of the data gets rewritten with different offsets. For example, open an image, add a filter layer, and save the file. The original image data is still there, but gets written at a different offset. Turning on QRecall's shifted quanta detection can find that relocated duplicate data. (Note: shifted quanta detection is much more CPU and I/O intensive, but sometimes it's worth it.)
QRecall lets you choose what you capture, when, and to where. You can set up different archives for your photos, music, and videos (since it's unlikely that there is any duplicate data shared between them). This will also let you experiment with different shifted quanta detection levels, compression levels, and so on, to find the settings that work best for each category of data.
Dividing up your backup into multiple archives also means that using rsync to mirror them to a remote server will be much more manageable and will require less free disk space.
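To make the block-level idea above concrete, here is a toy sketch (not QRecall's actual algorithm) that splits a file into fixed-size blocks and hashes each one to count duplicates. The block size and file contents are invented for the demonstration.

```shell
tmp=$(mktemp -d)
# Build a test file of eight 4 KiB blocks, several of which repeat.
for b in A B A C A B D A; do
    printf '%4096s' "$b"
done > "$tmp/file.dat"
# Split into fixed-size blocks and hash each block.
split -b 4096 "$tmp/file.dat" "$tmp/block."
total=$(ls "$tmp"/block.* | wc -l)
unique=$(md5sum "$tmp"/block.* | awk '{print $1}' | sort -u | wc -l)
# A de-duplicating store only needs to keep the unique blocks.
echo "total blocks: $total, unique blocks: $unique"
rm -r "$tmp"
```

Note the limitation this illustrates: fixed-offset splitting only finds duplicates that stay at the same block boundaries. Finding a duplicate block that has slid to a different offset is what shifted quanta detection addresses, at extra CPU and I/O cost.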
|
|
|
Paulino,
Essentially, you'll use QRecall to make incremental backups to a local drive and then periodically mirror your archive (via rsync) to your server so you have a backup of your backup.
Here's what I do:
Schedule QRecall to make regular captures to an archive on an external volume.
Add conditions to the actions so you're sure that no actions are modifying the archive during a regular period each day (say, between 4:00 and 7:00 AM).
Once or twice a week, run an rsync that mirrors the active archive to its remote copy. I start this at 4:00 AM so that the archive isn't changing while the rsync occurs.
QRecall will capture and de-duplicate your files into the local archive. The rsync will then transfer just those changes to the remote archive to keep it up-to-date.
This is a sub-optimal solution for a couple of reasons. First, it requires you to maintain two copies of your backup archive, one local and one remote. (This can also be an advantage, if that's what you want.) The other problem is that rsync makes a local copy of the archive being updated, so the remote copy of the archive has to be on a volume with enough free space to duplicate it every time it's synced.
I've been running this solution on my server for about 5 years. Actually, I do it in reverse, keeping a copy of my server's backup on my local computer, but the principle is the same. As long as the archive is a reasonable size (say 200-300GB), it's very manageable.
|
|
|
Paulino, The current version of QRecall can only capture to an archive document on a mounted filesystem. If you can mount your server's backup volume as a remote volume on your client system, QRecall can work that way. QRecall understands remote volumes and external drives and will automatically attempt to mount the archive volume when an action starts. Another possible solution (which is the one I use) is to capture to a local external volume and then periodically rsync that archive with a duplicate on your server(s). There are long-term plans for QRecall to provide true client/server backup services, but that's a ways off; there are so many new features to implement first.
|
|
|
Aubrey Grey wrote:I tried to read the value by using: defaults read com.qrecall.client QRAuditFileSystemHistoryDays, but it reports it does not exist.
If this value is not set in the preferences, QRecall uses the default value of 6.9 days.
Related: is the 6.9 days since last backup or since last full backup?
Neither. It's the amount of time that QRecall has been trusting the file system change events to tell it which directories have changed.

Here's an example. Let's say that you've set up QRecall to capture your Documents folder once a day. On Monday you capture the entire Documents folder for the first time. On Tuesday, QRecall will query the file system change history to determine which folders changed between Monday and Tuesday. QRecall has now been depending on the accuracy of the change history for 1 full day. On Wednesday, it's now been depending on the change history for 2 days, and so on. When the capture starts the following Monday, QRecall will have been depending on the accuracy of the file system change history for 7 days. That exceeds the default limit. It ignores the change history and exhaustively recaptures the entire Documents folder again. On Tuesday, it will only be using the past 24 hours of change history to locate changes, and the whole thing starts over.

So it's not the time from your last capture. It's the amount of time since the last capture that did not rely on the file system change history.
I read about QRecall using as much memory as it can... I read that with systems over (was it 10GB) it uses 8G of memory. Is that parameter adjustable to reduce the interference with other work, especially if it is going to take all day to do its backup?
Yes. QRecall definitely assumes VM is turned on, and if you turn it off you'll want to "dial back" QRecall's use of memory. The setting you're interested in is QRPhysicalMemoryMB, documented on the advanced settings page. Set it to a value (in megabytes) less than the amount of physical RAM you have, say something between 3072 and 4096. QRecall will try to keep its index and lookup memory footprint within that limit. Caution: lowering this value might reduce QRecall's ability to cache index data, and thus slow the capture process. But that's a reasonable trade-off if it means you can still use your computer.
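Assuming this setting lives in the same preferences domain as the key mentioned in the earlier question (com.qrecall.client), writing it from Terminal would presumably look like this; confirm the exact domain and key against the advanced settings page before relying on it.

```shell
# Assumption: QRPhysicalMemoryMB is read from the com.qrecall.client
# preferences domain. 4096 MB caps QRecall's index/lookup memory at 4 GB.
defaults write com.qrecall.client QRPhysicalMemoryMB -int 4096
```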
|
|
|
john g wrote:Should 3TB take so long? At this point, it looks like I'll need about 100 hours to backup the entire 3TB.
It's possible. 3TB is a lot of data to de-duplicate. The computations needed to de-duplicate data increase exponentially as the corpus of data grows. Especially once your archive is past a TB or two, it requires a massive number of small data reads to check each new block against what's already in the archive. Even though you've got a very fast network configuration, it will still have a higher transaction latency than a direct bus connection (SATA, eSATA, Thunderbolt, and so on).

First tip: turn off shifted quanta detection. Shifted quanta detection performs numerous lookups into the archive database for every new block, instead of simply checking once to see whether the block is a duplicate. Especially for an initial backup, shifted quanta detection won't save you much. (You're free to turn it back on once your initial capture is finished.)

Be patient. It's a lot of data to de-dup, and it's just going to take time, memory, and bandwidth. You might consider scheduling your backup with an action and adding a condition so the capture stops if it's taking longer than, say, 10 hours to finish. Every day it will do another 10 hours, picking up where it left off yesterday. Eventually, it should catch up. At that point you might want to merge all of those incomplete layers into a single baseline layer.

If you're desperate to reduce the de-duplication overhead, you might also consider splitting up your archive. For example, you might capture all of your virtual machine images to one archive while capturing all of your multi-media files to a second archive. Unless your virtual machine images contain copies of your multi-media content, it's unlikely that they would have much in common.

Your post is also timely, inasmuch as I've been writing code all week to add a new feature to QRecall. In QRecall 2.0, you'll be able to schedule a capture that copies just the raw data to the archive, without performing any de-duplication. This should be nearly as fast as a simple file copy. The de-duplication work is then deferred until the next compact action is run. This should make short captures to large (>2TB) archives much more efficient.
|
|
|
Jon Lindemann wrote:1. Does one need to "Schedule" the action for Rolling Merges or is specifying the periods of merging layers sufficient?
No action does anything until it's run, either manually or by scheduling it to run periodically. The time periods in the rolling merge action simply determine the granularity of the merge when it is run. So the answer is "yes." You need to schedule your rolling merge action to run periodically. I suggest once a day, if you make a lot of small captures during the day, or once a week otherwise.
2. You mentioned weekly COMPACT and VERIFY actions. Should one VERIFY before COMPACTing or does COMPACT also verify?
It doesn't really matter which runs first. All actions verify the data they use, but only the data necessary to perform their work. Only the verify action checks the integrity of the entire archive. And that's why it's a good idea to run a verify action from time to time, just to make sure nothing has been corrupted. How often you verify your archive is up to you.
|
|
|
Jon Lindemann wrote:So when I enter a permanent key in QRecall Preferences : Identity Key, I have to create a new archive and then combine the two archives? As in putting another 680GB archive on the same external hard drive and then combining them?
Jon, You don't need to create a new archive. In fact, don't create a new archive. Enter your permanent identity key and simply go back to capturing your three volumes. QRecall will perform a "full" recapture of all three volumes. But since that data has already been captured once, it won't add any appreciable amount of data to the archive—this is the magic of data de-duplication at work. After recapturing your three volumes, you'll have an archive with two owners, each of which will have three volumes. You now have the choice of joining these owners, essentially migrating the captures performed with the trial key to your new permanent key, or you can simply delete the volumes belonging to the trial key. It's your choice. Both deleting items and joining volumes and owners are described in the help (in the QRecall application, Help > QRecall Help > Guide > Advanced > Delete Items / Combine Items).
|
|
|
I took a brief look at your log files and I'm mystified. According to your log, the QRecall application intentionally uninstalled itself almost every time you quit the application. (This also explains why it was taking several seconds to quit.) When you quit QRecall, you have two choices. Command+Q quits the application normally. Command+Shift+Option+Q quits the application and uninstalls all active components, which includes removing the preauthorized helper. I can't think of any reason why OS X should think that you're holding down the Option and Shift keys while quitting. Out of curiosity, when you click on the QRecall menu, does it say "Quit" or "Quit and Uninstall"?
|
|
|
Schmye, It should stick. You only need to preauthorize QRecall once to use administrative privileges, and you shouldn't have to authorize it again until you reinstall or upgrade it. My guess is that something isn't getting installed correctly. Every time the QRecall application starts, it checks to see that all of its components are installed correctly. If it finds that the QRecallHelper executable is not preauthorized, it will try to install it again. Start by sending a diagnostic report (in the QRecall app, choose Help > Send Report). This will include the log of your previous installation attempts and what QRecall finds when it starts up again.
|
|
|
Ming-Li Wang wrote:Diag. report has been sent as requested.
Received. I'll take a look at it soon.
Speaking of log, I found several peculiar entries that may (or may not) have something to do with the mystery. It's an error message that says "Unable to connect with helper". There are several occurrences in the log, including a recent one at 10:35:21 local time (it's 11:05 right now).
If the actions are running and finishing correctly, you can ignore those messages. QRecall uses Mach ports to communicate with running processes. Prior to OS X 10.8, Mach ports were rock solid. You could open a port and leave it open for days without any problem. Now they seem to spontaneously close themselves, and I've never learned why. Earlier versions of QRecall would treat a port communications error very seriously because it almost always meant one of the processes had crashed, which is pretty serious. But now this just happens for reasons unknown, so it's much less indicative of a problem. The next version of QRecall is more tolerant of the kernel's behavior. (It will still log the closed port, but it won't complain unless it is also unable to reestablish a new connection.)
There was one action scheduled around that time, at 10:35 sharp (a "Capture" to a different archive), and there's no log entry of the action (before or after the error), but I'm pretty sure no change was done to the associated source folder, so there's nothing to capture.
There could be some log window subtlety here. Every log message (and every hierarchical group of log messages) has a severity value associated with it. Failures are a big deal, warnings not so much, and so on, down to minutia and debug messages. The slider at the top of the log window filters out less significant messages so you aren't bombarded by trivial details if all you want to know is whether there was an error. When a capture action starts, it logs a regular "Capture started" message. If the capture action finds no changes (meaning nothing was added to the archive), it changes the severity of the message to "minutia." If the slider isn't all the way to the right, you won't see "minutia" messages. This way, if you've created a capture that runs every 15 minutes, your log window won't be filled with 96 "Captured nothing" messages a day; you'll only see the capture actions that actually captured something, or had problems.
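The filtering behavior can be pictured with a toy example; the numeric severity scale and log format here are invented, not QRecall's actual log format. The slider simply raises the minimum severity a message needs in order to be displayed.

```shell
tmp=$(mktemp -d)
# Invented severity scale: 4=failure, 3=warning, 2=info, 1=minutia.
cat > "$tmp/log.txt" <<'EOF'
2 Capture started
1 Captured nothing
4 Verify failed
1 Captured nothing
EOF
# With the "slider" at info level, minutia entries are hidden.
threshold=2
visible=$(awk -v min="$threshold" '$1 >= min' "$tmp/log.txt" | wc -l)
echo "visible entries: $visible"
rm -r "$tmp"
```

Dragging the slider all the way right corresponds to setting the threshold to 1, which reveals the hidden "Captured nothing" entries.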
|
|
|