Gary K. Griffey wrote: Greetings James...
1) A new QRecall archive is created at site "A" that includes one or more of these virtual disks. Even with the best compression and highest shifted quanta options...this archive could easily reach 100 GB in size.
Aside: Shifted quanta detection rarely helps with virtual machine files (which are essentially disk images), because disk images are organized into blocks, so data can only "shift" to another block boundary. Shifted quanta detection looks for shifts of data at the byte level. I'm not saying that cranking up shifted quanta detection won't make any difference, but it will add a lot of overhead for very little gain. Now, back to our regularly scheduled program...
2) This archive is then copied to an external drive...that is physically relocated to site "B".
Now, the problem statement. When the archive at site "A" is subsequently updated with a recapture operation of the virtual disks...I need a way to "refresh" site B's copy of the archive...preferably via a network connection....just the delta data would be transmitted, of course...then the archive at site "B" would somehow be "patched", for lack of a better term, and thus be a mirror of site "A"'s archive.
I have used many diff/patch utilities in the past to mimic this functionality...but they were all geared toward single binary files...not a package file/database, as QRecall uses.
A package is just a collection of files. Synchronize all of the files in the archive's package, and you've sync'd the archive.
Gary, I do this using rsync. I have a couple of archives that I maintain off-site mirrors of, by running rsync once a day/week to mirror the changes made to one archive onto the other. Since QRecall adds only the blocks of data that changed, and rsync transmits only the blocks of data that have changed, the two are almost a perfect match. The end result is that rsync will transmit pretty much just the new data captured by QRecall and not much else.
To do this over a network requires (a) one system with rsync and a second system running an rsync server or ssh, (b) a fairly fast network connection, (c) a generous period of time in which neither system is updating its archive, and (d) more free disk space than the size of the archive.
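To make that concrete, here's a rough sketch of the kind of command involved; the archive name, user name, host, and paths are placeholders, not my actual setup:

    rsync -a --delete /Volumes/Backups/Projects.quanta backup@offsite.example.com:/Volumes/Mirror/

The -a flag preserves the package's structure and file attributes, --delete removes files on the mirror that no longer exist in the source package (merge and compact actions delete files inside the package), and rsync's delta-transfer algorithm takes care of sending only the changed portions of each file over the ssh connection.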
I schedule rsync (via cron) to run at 3AM every morning. It uploads an archive of my important projects (30GB) to my server and then downloads the running backup of my server (175GB) to a local drive. This process takes a little over an hour each day and typically ends up transferring about 1GB-1.5GB of data.
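For what it's worth, the scheduling side is just an ordinary crontab entry pointing at a small wrapper script. Everything below (the time aside) is a made-up illustration rather than my exact setup:

    # user crontab: run the mirror script at 3AM every morning
    0 3 * * * /path/to/mirror-archives.sh

    #!/bin/sh
    # mirror-archives.sh -- hypothetical wrapper holding the two transfers
    # upload the local projects archive to the server...
    rsync -a --delete /Volumes/Backups/Projects.quanta backup@server.example.com:/Backups/
    # ...then pull the server's running backup down to a local drive
    rsync -a --delete backup@server.example.com:/Backups/Server.quanta /Volumes/Mirror/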
One of the drawbacks to this scheme lies in how rsync synchronizes files. rsync first makes a copy of a file, patches the copy with the changes, and finally replaces the original with the updated version. For small files this isn't any big deal, but for the main repository.data file (which is 99% of your archive), this means the remote system will first duplicate the entire (100GB) data file. This requires a lot of time, I/O, and disk space, but it's really the only downside to this method.
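If you want to guard against running out of room, a couple of lines at the top of the wrapper script can refuse to sync when the mirror volume can't hold that temporary second copy. The 110GB threshold, the volume path, and the use of macOS's df -g are just assumptions for the sketch:

    # abort if the mirror volume has less than 110 GB free (df -g reports 1 GB blocks)
    FREE_GB=$(df -g /Volumes/Mirror | awk 'NR==2 {print $4}')
    if [ "$FREE_GB" -lt 110 ]; then
        echo "mirror volume too full to sync safely" >&2
        exit 1
    fi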
My tip for making this work efficiently is to minimize other changes to the active archive. Schedule your merge actions so they run only occasionally (weekly at most), and compact the archive only rarely. Merging creates lots of small changes throughout the archive, and compacting makes massive changes. The next rsync will be compelled to mirror those changes, which will require a lot of data to be transmitted.
I keep giving this problem a lot of thought, as there are more than a few individuals who want to do this. I have a "cascading" archive feature on the to-do list, which would accomplish exactly what rsync does (although a little more efficiently). But I still don't like the design restrictions, so I keep putting it off.