Errors Backing up Virtual Machine Package...
Gary K. Griffey



Joined: 21-Mar-09 10:25
Messages: 156
Offline

Greetings James,

Possibly, you could provide me with some troubleshooting guidance. I am continuing to have issues successfully capturing a virtual machine disk file with QRecall beta 1.2v8.

The capture is being performed on a MacBook Pro...the laptop is always freshly rebooted just prior to running the captures...thus, the virtual machine is not running...in fact, nothing is running except the capture...the capture is targeting the entire laptop's system volume...which includes this 32 GB virtual machine disk.

The initial capture seems to run fine...however, on subsequent captures...usually the 4th or 5th incremental...an error inevitably pops up...the error is always identified by a Verify operation...the capture completes normally...and the subsequent Repair operation always identifies the exact same file as being Damaged...namely the VMware Fusion virtual disk file (.VMDK).

I know that you mentioned in our prior discussion that "stuff happens" and that files do get corrupted. These errors, however, only seem to happen on this particular capture process...and it is always the virtual machine disk file that is damaged...which seems a bit odd to me.

Are there any troubleshooting options that I have? To be honest...the main reason I run QRecall on this laptop is to efficiently back up this virtual machine image on a weekly basis...so...I am really looking to try and resolve this issue. Could this be some type of capacity / threshold being reached by the 4th or 5th incremental? I know you said that archives up to 6 TB could be created with beta 1.2v8...this archive is only around 43 GB...the funny thing is...the capture always appears to finish just fine...but the Verify catches the error. I would think that if the disk drive in the laptop was getting "flaky" the capture might flag read errors, etc. I have experienced the errors writing with the target archive on a Firewire drive and the same errors with the archive on an AFP network share...so I doubt the target media is to blame.

Any help would be appreciated.

Thanks,

GKG

James Bucanek



Joined: 14-Feb-07 10:05
Messages: 1473
Offline

Gary,

It's really hard to tell what's going on without more information. Upload a diagnostic report (Help > Send Report...) and I'll see if there are any clues in the log.

One plausible explanation would be that the drive you're using to store the archive has at least one sector that's causing intermittent data loss, and the drive's controller is failing either to detect this or to spare the sector. The scenario would go something like this: A sector fails, resulting in a bad data record. The verify detects this. The repair erases the record and marks it as empty. The next capture/compact sees an empty record and writes a new record to the same location. The sector fails, corrupting the new record. The next verify detects this and...

One clue would be to see if the data errors all occur at, or near, the same file position. This would indicate a failure associated with a particular region of the hard drive's surface. One simple workaround would be to copy the archive (since it's only 40GB), rename the old one, and update all of your actions to use the new one. Now see if the problem reoccurs.
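
One way to run the "same file position" check is sketched below. It assumes you have already noted the byte offset of each damaged record by hand; the sample offsets, the 64 MiB window, and the `cluster_offsets` name are all hypothetical, and nothing here reads a QRecall archive or log.

```python
def cluster_offsets(offsets, window=64 * 1024 * 1024):
    """Group byte offsets; neighbours no more than `window` bytes apart share a cluster."""
    clusters = []
    for offset in sorted(offsets):
        if clusters and offset - clusters[-1][-1] <= window:
            clusters[-1].append(offset)
        else:
            clusters.append([offset])
    return clusters


# Hypothetical offsets (in bytes) of damaged records from several failed verifies.
sample = [12_884_901_888, 12_900_000_000, 30_000_000_000, 12_890_000_000]
for group in cluster_offsets(sample):
    print(len(group), "error(s) between", group[0], "and", group[-1])
```

A single tight cluster across several failures would point at one region of the drive; widely scattered offsets would argue against that explanation.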

The fact that the error occurs in the same file isn't all that surprising. Even if the data corruption was completely random, VM files tend to be huge. I wouldn't be surprised if most of the data in your archive belongs to that one VM file. Throwing a dart, it would be hard not to hit it.
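
The dart analogy can be put as rough arithmetic. This sketch assumes the corruption lands uniformly at random in the archive and that the VM's share of the archive matches its share on disk; the 32 GB and 43 GB figures come from this thread, while deduplication, compression, and multiple layers are ignored, so treat the result as an order-of-magnitude estimate only.

```python
def hit_probability(file_bytes, archive_bytes):
    """Chance that one uniformly random damaged byte belongs to the given file."""
    return min(1.0, file_bytes / archive_bytes)


GB = 1000 ** 3
p = hit_probability(32 * GB, 43 * GB)
print(f"{p:.0%}")  # prints 74%
```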

- QRecall Development -
Gary K. Griffey




James,

I have basically ruled out your supposition of a bad target disk drive/controller...as I have attempted this exact backup scenario to 4 different disk drives...both internal and external...using 3 different computers. Each of the 4 tests began with a new QRecall archive....the initial backup always Verified OK...but a subsequent incremental always fails...usually the third or fourth...but this varies. I always Verify the archive prior to running the next incremental.

One experiment I tried this morning...I deleted the 6th "damaged" layer in the archive that was created yesterday...compacted and verified the now 5 layer archive....and it verified successfully. Then, I re-ran the exact same incremental capture again...but this time I used the advanced setting "defaults write com.qrecall.client QRAuditFileSystemHistoryDays -float 0.0"...and this time the 6th layer was not only captured but also verified successfully. This is by no means proof...I realize...as the issue is intermittent...but I have "bumped my head" on this setting before using QRecall...and I am wondering if it is somehow playing a role here.

Certainly, during a capture...you display the virtual machine package file in the status window...this appeared in both captures....the first bad one and the second good one...but there is no display of the individual package components that are being captured...my assumption was that if a package was flagged for capture...then all components were captured...is it possible that a component of the package was not captured in my first incremental yesterday but was captured in the second incremental because of the "QRAuditFileSystemHistoryDays" setting? And could this omission be causing the bad data?

Thanks,

Gary
James Bucanek




Gary K. Griffey wrote: I have basically ruled-out your supposition of a bad target disk drive/controller...

Excellent! Then this problem is actually interesting.

To get started, you'll need to upload a diagnostic report (Help > Send Report...) from the computer you're using to capture the VM file.

Could you also provide me with information about the volume that contains your archive? Specifically, its size, format (HFS Extended, journaled, ...?), and how you connect to it (Firewire, USB, network, ...).

One experiment I tried this morning...

The fact that one capture failed and the next one succeeded when QRAuditFileSystemHistoryDays was set to 0 may, or may not, be relevant. QRAuditFileSystemHistoryDays only changes what folders QRecall scans for changes. It doesn't change any of the logic that QRecall uses to determine which items to capture, or how they are captured.

It may be relevant because this could be a QRecall bug, one related to the size of the file(s) being captured. Changing QRAuditFileSystemHistoryDays might have changed the folders that QRecall examined, which in turn could have changed which files it captured and how much data it added to the archive. I'd be more interested in a case where a capture fails, you delete the layer, and then run the same capture (with the exact same settings) again.

(As an aside, I would recommend that you leave QRAuditFileSystemHistoryDays set to something relatively low; certainly lower than the normal interval at which you recapture your laptop, since moving your laptop drive from one system to another invalidates the file system events information.)

To further diagnose this problem, I'd be most interested in getting a snapshot of your archive's structure when it was OK and again after the capture corrupts it. To do that, make a dump of the archive after a successful verify:

- Open the archive in QRecall
- Choose Archive > Dump (not File > Dump...)
- In the dump options sheet, choose the following:
- - In the Data section, check all options EXCEPT Data Packages
- - In the Layers section, check all options
- - Make sure all options in the Fill map, Hash Table, Package Index, and Names Index sections are off
- Click Dump and save the file to a convenient location

The Dump command is a diagnostic tool that only exists in the beta version. It will lock up the QRecall application (you'll get the spinning Technicolor pizza of death cursor) until it finishes. It should take about the same amount of time as a verify, so be patient. When it's done, compress the dump file in the Finder and send it to me. If it's too big to e-mail, contact me and I'll provide you with alternative upload methods.

And, I'm going to ask again (just because I can't stop myself) that you use Disk Utility to verify the structure of the volume that contains the archive. I ask this simply because I've spent days in the past trying to diagnose problems that turned out to be a corrupted volume.

Continue taking normal captures. When the verify following the capture fails, create another dump (same options) and send me that along with another diagnostic report (Help > Send Report...). It would be awesome if you could capture the dump of the archive immediately before the failure, but I'll understand if you don't have the time to generate a dump file after every successful capture.

I'm also very interested in the size and structure of the files in your VM, but that information will be in the dump file. This should give me enough information to create a simulation of your situation here.

- QRecall Development -
Gary K. Griffey




James,

Thanks for all the info and direction. Here is what my simple test plan will include:

Prior to executing a normal incremental Capture of the laptop which will always include the large VM image:

1) Run verify on existing archive.
2) Run Disk Utility on the target disk drive...just to make sure everything is normal.
3) Create Archive dump using the options that you have detailed.

Then...

4) Execute incremental Capture process with QRAuditFileSystemHistoryDays set to a somewhat reduced value.
5) Execute Verify action on newly created archive.
6) If Verify action fails...create another archive dump.

I will also make every effort to keep other variables to a minimum...like using the exact same target drive...namely an internal SATA drive on my Mac Pro...and keeping the connection method the same...a FireWire cable.

I will let you know my results after I have performed a few Captures..

Thanks, again...

Gary
Gary K. Griffey




James,

An interesting observation was noted this morning...I would like to solicit your comments.

After running the virtual machine on my laptop for an hour or so...I shut it down...then attempted to perform a QRecall incremental capture. As I have seen many times before...the virtual machine package file was not included in the capture...even though its Date Modified was well after the previous capture.

Normally, I simply modify the QRAuditFileSystemHistoryDays setting to force the virtual machine to be included...this morning, however, I decided to enable the QRLogCaptureDecisions switch instead. This appears to have solved the issue...in other words...unlike setting the QRAuditFileSystemHistoryDays to say, 0.0...which forces the "deep traversal"...simply enabling QRLogCaptureDecisions correctly included the virtual machine package in the capture...but did not perform the "deep traversal".



Any thoughts as to why?

Gary
Gary K. Griffey




Ok...this only seems to work for file changes made before a system restart.

So...if you run the virtual machine....shut it down...then reboot your Mac...then, run a QRecall Capture...it misses the virtual machine...until the virtual machine is started and stopped again...

So...it would appear that file system changes, at least for certain files, are not "surviving" through a system restart...that sure seems odd to me...

James Bucanek




Gary K. Griffey wrote: Any thoughts as to why?

Pure coincidence. The most innocuous things can cause file system events to record a change for a directory. Merely opening the folder that contains your VM file in the Finder could be enough to trigger a rescan of that item.

So...it would appear that file system changes, at least for certain files, are not "surviving" through a system restart...

They should. File system event records are stored in an invisible folder at the root level of each volume, and happily live on between system restarts.
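
On macOS that invisible folder is named `.fseventsd`. A minimal sketch for checking that it exists on a given volume follows; the `fsevents_store` helper is hypothetical, reading the folder's contents normally requires root, and the files inside are not in a human-readable format.

```python
import os


def fsevents_store(volume_root):
    """Return the path of the volume's .fseventsd folder, or None if it is absent."""
    path = os.path.join(volume_root, ".fseventsd")
    return path if os.path.isdir(path) else None


print(fsevents_store("/"))  # the boot volume; use /Volumes/<name> for other volumes
```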

But as I remember, you're still shutting down your laptop and connecting to another system using target disk mode to perform the QRecall capture. As soon as you mount a volume on another system, the entire file system events history for that volume becomes invalid. And it gets scrambled again when you move it back. So under these circumstances, all bets are off.

- QRecall Development -
Gary K. Griffey




Understood thanks...

I will continue testing...just to clarify...I am no longer restarting the laptop and using target disk mode for Capture per your advice...I am executing QRecall right from the laptop itself...so...I am hoping the file system events work as expected.

Thanks,

GKG
Gary K. Griffey




James,

I am getting this error on one of my archives. The archive verifies OK...but when I try to Compress it...I get the error.

"cannot open negative hash map"
"A network or disk error was encountered"

I checked the disk with disk utility and it shows clean...

Thanks,

GKG
James Bucanek




Gary K. Griffey wrote: I will continue testing...just to clarify...I am no longer restarting the laptop and using target disk mode for Capture per your advice...I am executing QRecall right from the laptop itself...so...I am hoping the file system events work as expected.

So do I! I'd be really disappointed if file system events stopped working.

Gary K. Griffey wrote: I am getting this error on one of my archives.

That's an odd error to get from a local drive. Send a diagnostic report and I'll look into the exact nature of the error and what it might mean.

- QRecall Development -
Gary K. Griffey




Ok...report sent...
James Bucanek




Gary,

You're getting a generic "permission denied" error on one of the archive's internal files:


Error 13 is EACCES (Permission denied). It means the user that's running the compact action doesn't have permission to write to the internal negative hash map file inside the archive's package.
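
The mapping from the number to the name can be confirmed with Python's standard `errno` module. This is only a small sketch: `describe_errno` is a hypothetical helper, and the message text comes from the operating system rather than from QRecall.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(describe_errno(13))
```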

This doesn't make a lot of sense, assuming that QRecall can access all of the other files in the package. Have you been updating this archive using other user accounts or on systems that might have created files that belong to different users?

Anyway, there are two ways of fixing this. If the problem is just the negative map index file, then just delete it. The negative map is one of those index files that will automatically rebuild itself. In the Finder, right-click on the archive icon and choose Show Package Contents. Inside the archive package folder, find and trash the negative.index file.

The other solution is to fix the permissions problem. Again, open the package contents and then explore the ownership and permissions for each file. Ideally, they should all be owned by the user account used to update the archive, or at least have appropriate permissions so that all users that need to update the archive have read and write permissions for those files.
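
The ownership check can also be sketched in Python instead of inspecting every file by hand. This assumes the archive package is an ordinary folder on disk (as Show Package Contents implies); `audit_package` is a hypothetical helper that only reads metadata and changes nothing.

```python
import os
import stat


def audit_package(package_path, expected_uid=None):
    """Return (path, uid, mode, problems) for entries that look suspicious."""
    if expected_uid is None:
        expected_uid = os.getuid()  # assume the current user updates the archive
    findings = []
    for root, dirs, files in os.walk(package_path):
        for name in dirs + files:
            path = os.path.join(root, name)
            info = os.lstat(path)
            problems = []
            if info.st_uid != expected_uid:
                problems.append("owned by uid %d" % info.st_uid)
            if not info.st_mode & stat.S_IWUSR:
                problems.append("not writable by owner")
            if problems:
                mode = stat.filemode(info.st_mode)
                findings.append((path, info.st_uid, mode, problems))
    return findings


# Example call (the path is a placeholder, not a real archive):
# for path, uid, mode, problems in audit_package("/path/to/archive/package"):
#     print(mode, uid, path, "; ".join(problems))
```

An empty result means every item belongs to the expected user and is owner-writable; it does not cover group permissions or ACLs, which would need a separate check if several accounts update the archive.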

- QRecall Development -
Gary K. Griffey




Ok...I have deleted the negative index file...and the Compact is now running...

Thanks,

GKG
Gary K. Griffey




James,

After some further testing...I am still experiencing issues capturing my virtual machine package file. I don't believe the issue is with QRecall itself...but, rather the file system events. Here is my very simple process that evidently will not work.

1) QRecall incremental backup taken of laptop drive yesterday AM...the backup included the large VM package file as it should have...the backup was executed from the laptop itself to a target network share on my Mac Pro desktop. No issues.
2) I used the virtual machine all day yesterday and this morning.
3) I shut down the VM...and rebooted the laptop.
4) I repeated the exact same QRecall action....i.e., running the incremental backup from the laptop to the same network share...a small number of files were re-captured...but not the VM...which again, is my main goal.

The laptop is running with no special QRAuditFileSystemHistoryDays value...so it is using the 6.9 default.

So...it would appear that the 10.6 File System events did not advise QRecall to back up the VM package...even though it most certainly had been updated. From your previous notes...I do not see any reason the file system events on the laptop's drive should have been deleted or otherwise compromised. What am I doing wrong? You mentioned that these file system events are located in a hidden file on the drive itself...obviously, you are using an API or equivalent to read these values...is there any way that I could actually peer into this hidden file to see if the VM package file is being flagged as changed?

I just wonder what other user/system related files are being skipped in my incremental backups...makes you question the value of any backup system using these events..like Time Machine, etc.

Thanks,

GKG
 