QRecall Community Forum

working with large files
Forum Index » Beta Version
john hampson


Joined: Apr 13, 2007
Messages: 22
Offline
I use VMs quite extensively, and individual files can easily become several GB in size.

How does QRecall cope with these files?

If I open up a VM and make a small modification, how much will change in the QRecall archive? (I haven't yet tried QRecall on my VM partition.)

BTW, I've found your previous responses clear and to the point, thanks!
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
I'm not sure exactly what a VM file is, but if VM files are like large multi-media projects or disk images, then they are exactly the kind of incremental backup problem that QRecall was designed to deal with.

When the file is recaptured, QRecall will read each block (quantum) of data and compare it with those that have already been captured. Only new data blocks are added to the archive.

In the situation of a disk image, take a 600MB disk image file. If you mount that image and add a single 1MB file, only a little over 1MB worth of data has changed. When QRecall recaptures the file, it will discover 599MB of duplicate data and add only 1MB of new data to the archive.
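
The block comparison described above can be sketched roughly as follows. This is only an illustration, not QRecall's implementation: the 4096-byte block size, the SHA-256 hashing, the dictionary used as an "archive", and the `capture` function are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; QRecall's real quanta size is not stated in this thread


def capture(data: bytes, archive: dict) -> int:
    """Store any blocks not already in the archive; return the number of new bytes added."""
    added = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        key = hashlib.sha256(block).digest()
        if key not in archive:
            archive[key] = block
            added += len(block)
    return added


# A toy "disk image" made of 600 distinct blocks.
image = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(600))
archive = {}
first = capture(image, archive)

# Overwrite one block in place, the way a disk image changes, then recapture.
modified = bytearray(image)
modified[10 * BLOCK_SIZE:11 * BLOCK_SIZE] = b"\xff" * BLOCK_SIZE
second = capture(bytes(modified), archive)

print(first, second)
```

The first capture stores every block; the recapture stores only the one block that was overwritten, even though the whole file still had to be read.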

There's a similar situation with large multimedia files like Photoshop or audio files. Open up a 300MB Photoshop file and add an adjustment layer. The original 300MB of data is still in the file, although it might have shifted to a slightly new location. That's where QRecall's shifted-quanta analysis gets to work. It still finds the 300MB of existing data and only adds the new data to the archive. Ditto for changing the MP3 tags on a large audio file. The audio hasn't changed, only the tiny bit of tag data.
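
To see why shifted data needs extra work, here is a small sketch. It is a naive illustration only: the 64-byte block size, the function names, and the byte-by-byte search are assumptions for the example, and the thread does not describe how QRecall actually performs its shifted-quanta analysis.

```python
import hashlib
import random

BLOCK = 64  # hypothetical tiny block size, chosen to keep the demo fast


def digest(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def aligned_matches(data: bytes, known: set) -> int:
    """Count known blocks found at block-aligned offsets only."""
    return sum(digest(data[i:i + BLOCK]) in known
               for i in range(0, len(data) - BLOCK + 1, BLOCK))


def shifted_matches(data: bytes, known: set) -> int:
    """Count known blocks found at any byte offset (naive search)."""
    count, i = 0, 0
    while i + BLOCK <= len(data):
        if digest(data[i:i + BLOCK]) in known:
            count += 1
            i += BLOCK   # skip past the matched block
        else:
            i += 1       # slide one byte and try again
    return count


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(BLOCK * 50))
known = {digest(original[i:i + BLOCK]) for i in range(0, len(original), BLOCK)}

edited = b"NEW" + original  # 3 inserted bytes shift everything that follows

aligned = aligned_matches(edited, known)
shifted = shifted_matches(edited, known)
print(aligned, shifted)
```

With three bytes inserted at the front, none of the aligned blocks match any more, while a search at shifted offsets recovers all 50 original blocks. The cost is that every mismatch forces another comparison one byte further along, which hints at why this kind of analysis is expensive.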

Now this still requires QRecall to re-analyze all of that data, which will be time-consuming. So this is definitely something that you'll want to schedule to run at night or when you're not using your system.

(And I'm glad you like the answers.)

- QRecall Development -
john hampson


Joined: Apr 13, 2007
Messages: 22
Offline
VM = Virtual Machine.

For work I sometimes need to run WinXP, so I use Parallels. This gives me large virtual disks of maybe 4-5GB in size. I might open one up and make a change in a document of around 10K.

From what you say, this should be a very efficient way of backing up these VMs. I will test it tonight.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Here's a tip:
If you are recapturing large disk images, you'll get much better performance if you turn off (lowest setting) shifted-quanta analysis in the archive's settings. This will greatly improve recapture performance.

Disk images are an array of immovable blocks. Data doesn't "shift" in a disk image the way it can in a file. So searching for shifted data wastes a great deal of resources.

- QRecall Development -
Jon Lindemann


Joined: Feb 12, 2015
Messages: 9
Offline
James Bucanek wrote: Here's a tip:
If you are recapturing large disk images, you'll get much better performance if you turn off (lowest setting) shifted-quanta analysis in the archive's settings. This will greatly improve recapture performance.

Disk images are an array of immovable blocks. Data doesn't "shift" in a disk image the way it can in a file. So searching for shifted data wastes a great deal of resources.

James,

I know this is a cold "Beta" thread, but do you have any new recommendations regarding archiving large virtual disk images in Parallels?

My virtual machine (.pvm) and its associated virtual hard disk (.hdd) currently reside on my BootApps partition on a PCIe SSD which is backed up by cloning. I am considering moving the .pvm to a "Master" or "Docs" partition on a RAID to free up space on my boot partition. The "Master" and "Docs" partitions (about 350 GB each) are archived by QRecall. Archives are performed daily only @ 0300h or perhaps manually if substantial new files are added to my Mac. The Virtual machine is run only 1-2 times/month; the associated ".hdd" file is currently 45 GB.

In the "Help" file on "Shifted Quanta Detection" you indicate that "Many files (log files, disk images, virtual machine files, and so on) do not benefit at all from shifted quanta detection."

My question therefore relates to how QRecall will handle the relatively infrequent changes in my "Virtual" hard drive (.hdd). If I enable "Shifted Quanta Detection" in the archive preferences, will that add hours to the nightly backups? Is it of any benefit?

Parallels apparently has an option to split the ".hdd" into 2 GB files: would that avoid archiving the entire 45 GB ".hdd" file every time it's changed? (The archive is on a 4 TB, USB-3 volume).

Thanks,

Jon
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Jon Lindemann wrote: I know this is a cold "Beta" thread, but do you have any new recommendations regarding archiving large virtual disk images in Parallels?

Well, there's a new beta on the horizon, so this seems like a good time to warm up some of these threads.

As you mentioned, shifted quanta detection isn't effective with files like disk images, for all of the reasons previously given. So the only issue is how much time and effort it will waste.

Jon Lindemann wrote: The Virtual machine is run only 1-2 times/month; the associated ".hdd" file is currently 45 GB.

Shifted quanta detection can add somewhere between a little and a lot of overhead to the capture process, depending on a wide variety of factors including the aggressiveness of the detection, the size of the archive, the amount of RAM available, the speed of your I/O, and so on. As a rule, it can slow your capture speeds anywhere from 20% to 10,000%.

But in your case, it probably won't matter in the grand scheme of things. Let's say you move your disk image to your new "Docs" partition and begin capturing it to your Docs archive, and that archive uses shifted quanta detection. By your own admission, you only modify these virtual machine images a couple of times a month. So 93% of the time, QRecall won't be capturing these files because they haven't changed.

Of the 7% of the time QRecall does recapture it, most of the data in these disk images is already duplicate (unshifted). Even with shifted quanta detection set to its most aggressive setting, QRecall always looks for duplicate unshifted data first. Since 99% of the data in your disk image file doesn't change or move, 99% of the data will be immediately recaptured as duplicate data with no shifted quanta analysis.

In the end, you'll only be taking a performance hit on 1% of the data in 7% of your captures. Even if shifted quanta detection made that data 10 times slower to capture, it would touch only about 0.07% of the data you capture (1% of 7%), adding well under 1% to your total capture time. I doubt you'll ever notice.
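
As a rough check on that estimate, the arithmetic can be written out as a short sketch. It assumes, for simplicity, that every nightly capture handles the same amount of data, and it takes the 7%, 1%, and 10x values from the post; the variable names are purely illustrative.

```python
capture_fraction = 0.07   # share of nightly captures in which the disk image has changed
changed_fraction = 0.01   # share of the image data that is new and needs shifted analysis
slowdown = 10             # hypothetical: affected data is 10 times slower to capture

# Share of all captured data that ever reaches shifted quanta analysis.
affected = capture_fraction * changed_fraction

# Extra time relative to the baseline: the affected data costs (slowdown - 1) more.
extra_time = affected * (slowdown - 1)

print(f"affected data: {affected:.2%}")
print(f"added capture time: {extra_time:.2%}")
```

Under these assumptions the slowdown applies to a small fraction of one percent of the data, and the total capture time grows by less than one percent.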

Well, you will notice once: the first time you capture these disk image files after moving them to your new partition. That's going to be your worst-performing capture. But after that, it should be smooth sailing.

Jon Lindemann wrote: Parallels apparently has an option to split the ".hdd" into 2 GB files: would that avoid archiving the entire 45 GB ".hdd" file every time it's changed?

This won't make any difference. That option is just so you can store Parallels virtual machine files on a filesystem that doesn't support huge files. It would only improve your QRecall performance if only one of the .hdd segments changed, and it's virtually (no pun intended) impossible for that to happen. It would be like booting your OS X system and expecting only data on the second half of your hard drive to be modified. What will happen is that all of the .hdd segments get modified, so QRecall will have to recapture them all and will ultimately add the exact same amount of data.
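
A toy model of the segment argument, under stated assumptions: the write pattern below (100 small writes spread evenly across the virtual disk) is invented for illustration and is not measured Parallels behavior. Only the 45 GB image size and 2 GB segment size come from the thread.

```python
GB = 1024 ** 3
IMAGE_SIZE = 45 * GB     # size of the .hdd file mentioned in the thread
SEGMENT_SIZE = 2 * GB    # split size offered by Parallels

# Hypothetical session: 100 writes of 4 KB each, spread evenly across the disk.
writes = [(k * IMAGE_SIZE // 100, 4096) for k in range(100)]

total_segments = -(-IMAGE_SIZE // SEGMENT_SIZE)  # ceiling division
segments_touched = {offset // SEGMENT_SIZE for offset, _ in writes}
bytes_changed = sum(length for _, length in writes)

print(f"{len(segments_touched)} of {total_segments} segments modified")
print(f"{bytes_changed} bytes of new data either way")
```

When writes land all over the disk, every segment is modified and has to be reread, while the amount of genuinely new data is the same whether the image is one file or many.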

- QRecall Development -
Jon Lindemann


Joined: Feb 12, 2015
Messages: 9
Offline
James,

Thanks for your prompt and cogent reply.

Jon
 