Message |
|
Gary K. Griffey wrote:I then cycled the VM...rebooted....and ran a Capture. The Capture used the "deep traversal"...even though QRAuditFileSystemHistoryDays was set to the default value....I guess because .fsevenetsd was recreated?
That's correct. When QRecall requests historic file system event information for a date earlier than what's in the database, it receives a message that basically says "there's no history for that period, you're on your own," so QRecall has to do it the hard way.
I then, however, cycled the VM again...restarted the computer...and ran another capture...this one missed the VM...so it still does not work correctly. I am attaching the .fseventsd-suspect folder as a single zip file.
Thanks. I'll also ask that you send another diagnostic report (Help > Send Report) so I can report the actual events it did receive and what was expected.
I hope this provides the info you need to supply Apple with a bug report.
I'll report this to Apple, and I thank you for your patience and perseverance in isolating this problem. I would also suggest that you bring this to the attention of VMWare, or I'll be happy to do so on your behalf. I strongly suspect that this has something to do with VMWare. As I've mentioned, software that hosts a virtual operating system do not interact with OS X the way normal software does, which may result in side-effects like the one you're seeing.
I guess for now...I will have to run all my captures using QRAuditFileSystemHistoryDays = 0.0...just to be sure everything is backed-up.
Have you tried some of the other workarounds (shutdown script, separate capture)? I'd be really surprised if this was a fundamental flaw in file system events. I've been working intimately with the FSEvents API since Leopard and this is the first time I've ever seen it miss or forget a change event (at least outside the reasons that are known to trip it up).
|
|
|
Gary K. Griffey wrote:Ok...followed your list
Almost. I also wanted you to perform a capture, open your VM, close it, and immediately capture the drive again without restarting; and then do the same again with a restart in between stopping the VM and capturing. (And, of course, send a diagnostic report when you're done.) You stated that when running FSEventer you saw change event being logged for your VM image package. I wanted to see if you could catch OS X reporting a change event for something that happened very recently, but then not report something that should have been saved between restarts. By the way, what is the path to your VM fileāthe one that should be in the events history?
So...it appears that OSX and FSEvenetsd are at fault here...your thoughts?
It would appear so, which is disappointing. I can fix QRecall, I can't fix OS X...
It seems that if this was a common issue there would be a lot of "long faces" out there come restore time.
This is the first instance that I've ever seen where file system events didn't record a change event.
I just don't see how Finder could "know" about the directory update but evidently not FSEventsd...
Like I said, there are two sources of file system change events: The live event stream that is broadcast to procsses as events occur and the historic events recorded on the volume. The Finder subscribes to the live stream. QRecall, Time Machine, and so on request historic change events. (Actually, that's not entirely true; Both Time Machine and QRecall request historic events, and then switch to the live stream during the backup so they can catch additional changes as they happen.) But the point is, just because file system events detects and broadcast a change event doesn't (apparently) guarantee that it's also getting saved and replayed in the future.
Also, any suggestions or thoughts as to why FSEventsd is not working?
I'd like to stick with the theory that the fseventsd process is receiving change events, but failing to record them permanently for some reason. So after running the experiment that I mentioned above, try this experiment: In the Terminal, issue this commands: $ sudo mv /.fseventsd /fseventsd-suspect (you'll be prompted for your password) $ sudo chown -R $UID /fseventsd-suspect Restart your computer. Now perform the "capture, run VM, restart, capture" sequence again and see if it works any differently. If that works, then there may have been something weird with the records stored in your .fseventsd folder. Please archive the fseventsd-suspect folder you'll find at the root of your hard drive and e-mail that to me. I'd like to include it my bug report to Apple.
could the VMWare Fusion application somehow be suppressing this from FSEventsd?
There's no official API for doing this. An application can flush (erase) all of the history on a volume and there's a means for disabling file system events on a volume, but there's no API for selectively deleting or suppressing change events for an individual folder. Apple does provide an API for excluding individual items from a Time Machine backup, but that shouldn't interfere with file system events. If it is, then that's a bug. But as I mentioned very early on, it might have something to do with the nature of the VMWare system. Virtual machines interface with the operating system at a very low level. It's entirely possible that VMWare is making changes to the file system in such a way that the operating system isn't seeing it, or doesn't believe that it's significant enough to record for posterity. It's also possible that VMWare doesn't like Time Machine trying to copy its VM image and has figured out how to "hack" the file system events service into not recording specific changes. Out of curiosity, does the VMWare system provide any means of automatically running a script, ideally when the VM shutsdown? It might be possible to work around this problem by adding a script that touches the VM image whenever you shut down. Another thing you might try is create a second capture action the captures just the VM Image package. Assuming the package doesn't have subfolders, QRecall always captures the items that are in its capture set. It only consults the file system events to determine if subfolders need to be captured. In essence, this would ignore file system events for the one VM image folder.
|
|
|
Gary (and anyone else who wants to play), Download QRecall 1.2(a)9. Install it. From the Terminal, you'll need to set both the QRLogCaptureDecisions and QRLogFSEvents (undocumented) settings to true:
defaults write com.qrecall.client QRLogCaptureDecisions -boolean true
defaults write com.qrecall.client QRLogFSEvents -boolean true
With both of these settings on, QRecall will log (a) all of the historic file system event paths reported by the operating system, (b) the final tree of changed directories (this is QRecall's internal picture of what has changed on the volume based on the FS events it received), and (c) which folders in the capture were skipped because file system events ensured QRecall that nothing within those folders changed. Once that's all done, I suggest that you:
Perform a capture of the volume containing your VM image.
Start and stop your VM.
Perform the same capture again.
Start and stop your VM.
Restart your computer.
Perform the same capture again.
Send a diagnostic report (Help > Send Report...) That should pretty definitively determine whether QRecall is either not getting the correct change events or if it's misinterpreting them.
|
|
|
Gary K. Griffey wrote:I wonder what the time delay is between fseventsd receiving an event and actually performing the write to the hidden file...
File system events are coalesced approximately every 30 seconds. The coalesced version is what gets saved as history. When these actually get written to the drive will largely depend on buffering, etc., but I'm sure the process flushes them from time-to-time. You can always fire up iosnoop if you really want to know...
I can also certainly try your suggestion of simply deleting the fseventsd hidden file on the laptop's volume and allowing it to be re-created from scratch...I don't think that would hurt anything...
Ideally, the the most definitive course of action would be to leave it as it is until you can demonstrate the problem. If the problem is that file system events are not being saved correctly, and you can prove that, you can then delete the .fseventsd folder and see if that fixes the problem. If you "fix" the problem before you isolate it, you'll be left wondering what was really wrong.
If you do have time to work-up a special version that would log exactly what QRecall "sees" in that hidden file it would be great...I would just love to get to the bottom of this...
I'll get to that later today.
|
|
|
Gary K. Griffey wrote:but isn't Spotlight "fed" by the same FSEvent daemon as QRecall? And, therefore, shouldn't the results be the same, i.e., if it shows up in a Smart Folder "Last Modified" mask shouldn't it be captured with QRecall?
Yes and no. As I mentioned earlier, FSEventer, the Finder, Spotlight, and others subscribe to the "live" feed of file system change events. You change something; a message is immediately sent to those processes. QRecall, Time Machines, and similar application request historic file system change events. These are collected by a background process (fseventsd) and periodically written to the hidden folder .fseventsd at the root level of each volume. At some later date, a process can request the history of change events from some arbitrary point in time, and they receive a "replay" of those events. You already mentioned that you after restarting, QRecall didn't capture your VM package. The possibilities at this point seem to be narrowed down to (a) your historic FSEvent event records aren't being saved properly on your volume or (b) QRecall is misinterpreting those events. At this point, I think the most expedient diagnostic approach would be to prepare a special version of QRecall that logs exactly what it gets from the FSEvents API. That way, you can try restarting and running QRecall and see what folders FSEvents has recorded. Fixing this might be as simple as deleting the hidden .fseventsd folder and letting the operating system rebuild it, but I want to rule out QRecall as the culprit first.
|
|
|
Gary K. Griffey wrote:I just wonder if QRecall is actually looking "inside" the VM package when it does its timestamp compare on an incremental..
Absolutely. A package is just a directory (folder) that acts as a single item in the Finder. The modification date of a directory does not change when the contents of a file it contains changes. So the mod date of the directory is not used as a criteria for deciding if the files within that directory have changed. QRecall will record any changes to the modification date of the directory, and then proceed to examine every file inside the directory for changes? unless, file system events ensures QRecall that nothing in that folder has changed, in which case QRecall will skip it. You need to determine if file system events are being recorded for your VM package. Start FSEventer, start up your VM, shut it down, and then go see if a change event for your VM package was recorded. If not, then that's your problem.
|
|
|
|
|
|
Gary K. Griffey wrote:So...it would appear that the 10.6 File System events did not advise QRecall to backup the Vm package..even though it most certainly had been updated. From your previous notes...I do not see any reason the file system events on the laptop's drive should have been deleted or otherwise compromised. What am I doing wrong?
You're not doing anything wrong. If QRecall is not scanning the directory containing your VM image, then it's because file system events is not recording changes for it. I have a couple of thoughts... First, check your console for anything related to FSEvent or fseventd. In the past, people have reported errors being logged by the fseventd daemon. This may indicated that file system events are not being recorded property. How do you run your Windows system? Do you reboot your computer into Windows (a la Bootcamp) and the restart again in OS X, or do you run something like Parallels so that OS X and Windows are running concurrently? It occurred to me that in the former situation, OS X isn't actually running, so changes to VM files (even if they're recorded in the file system) won't be recorded by the file system events service. If you're running the later configuration, you can see if file system events records changes for your VM package using a free utility like FSEventer (insert Google search here). Your VM might be going behind OS X's back (so to speak) and reading and writing blocks at the driver level. This would also circumvent the file system events service. A tool like FSEventer still won't show you exactly what QRecall sees because it only displays file system events as they occur, whereas QRecall requests historic events (i.e. please Mr. OS, send me all change events from point X in the past). So where FSeventer logs events as they occur, QRecall is interested in the permanent record of events stored on the drive.
You mentioned that these file system events are located in a hidden file on the drive itself...
That's they way it's supposed to work. But if FSEventer shows change events for your VM folder, you reboot, and then QRecall doesn't see them, then the FSEvents service might not be recording them property.
obviously, you are using an API or equivalent to read these values...it there any way that I could actually peer into this hidden file to see if the VM package file is being flagged as changed?
I'm not aware of any freeware tool for requesting historic FSEvent data. I wrote one of my own for testing. If you can't figure this out, I'll see if I can dig it up and polish it off.
I just wonder what other user/system related files are being skipped in my incremental backups...makes you question the value of any backup system using these events..like Time Machine, etc.
That's a good question. QRecall, Time Machine, Apple Events, and most other disk backup and synchronization software all rely on FSEvents. If it's not reporting changes consistently, then none of these solutions will function property.
|
|
|
Glenn Henshaw wrote:The backup target (/Volumes/MBP/) is a drive mounted on an Airport Express.
When you attach it directly, first run Disk Utility and verify/repair the volume. If there's any problems found, repair the QRecall archive. Also, restart the Airport Express before reattaching the drive.
|
|
|
Glenn Henshaw wrote:I recently upgraded the hard drive in my laptop. The backups are now failing with an odd error. The Mac is up to date (10.6.4) using QRecall Version 1.2.0(8) beta (1.2.0.8). The upgraded HD was cloned from the old one. I opened the archive manually and tried to re-index it and the following logs were the result. Thoughts?
I can't tell you much beyond the fact that QRecall was trying to write a file and the operating system returning an I/O error (ioErr, -36):
2010-09-08 22:00:40.075 -0400 Details File: /Volumes/MBP/thraxisp.quanta/hash.index
2010-09-08 22:00:40.075 -0400 #debug# API: FSWriteFork
2010-09-08 22:00:40.075 -0400 #debug# OSErr: -36
What would cause your brand new drive to start returning I/O errors is a bit puzzling. Although I'm not sure which drive is new, your internal startup drive or the external drive a /Volumes/MBP? If it's the internal drive that was upgraded, that this could cause your external drive to start returning I/O errors is equally bizarre. You also have a mysterious -50 (invalid parameter) error recorded in the log, which also doesn't make much sense, although I've seen both of these errors caused by corrupted volume structures.
|
|
|
Gary, You're getting a generic "permission denied" error on one of the archive's internal files:
2010-09-07 15:01:08.477 -0500 Details cannot open negative hash map
2010-09-07 15:01:08.477 -0500 #debug# IO exception
2010-09-07 15:01:08.477 -0500 #debug# API: open
2010-09-07 15:01:08.478 -0500 #debug# OSErr: 13 Error 13 is EACCES (Permission denied). It means the user that's running the compact action doesn't have permission to write to the internal negative hash map file inside the archive's package. This doesn't make a lot of sense, assuming that QRecall can access all of the other files in the package. Have you been updating this archive using other user accounts or on systems that might have created files that belong to different users? Anyway, there are two ways of fixing this. If the problem is just the negative map index file, then just delete it. The negative map is one of those index files that will automatically rebuild itself. In the Finder, right+click on the archive icon and choose Show Package Contents. Inside the archive package folder, find and trash the negative.index file. The other solution is to fix the permissions problem. Again, open the package contents and then explore the ownership and permissions for each file. Ideally, they should all be owned by the user account used to update the archive, or at least have appropriate permissions so that all users that need to update the archive have read and write permissions for those files.
|
|
|
Gary K. Griffey wrote:I will continue testing...just to clarify...I am no longer restarting the laptop and using target disk mode for Capture per your advice...I am executing QRecall right from the laptop itself...so...I am hoping the file system events work as expected.
So do I! I'd be really disappointed if file system events stopped working.
Gary K. Griffey wrote:I am getting this error on one of my archives.
That's an odd error to get from a local drive. Send a diagnostic report and I'll look into the exact nature of the error and what it might mean.
|
|
|
Gary K. Griffey wrote:Any thoughts as to why?
Pure coincidence. The most innocuous things can cause file system events to record a change for a directory. Merely opening the folder that contains your VM file in the Finder could be enough to trigger a rescan of that item.
So...it would appear that file system changes, at least for certain files, are not "surviving" through a system restart...
They should. File system event records are stored in an invisible folder at to root level of each volume, and happy live on between system restarts. But as I remember, you're still shutting down your laptop and connecting to another system using target disk mode to perform the QRecall capture. As soon as you mount a volume on another system, the entire file system events history for that volume becomes invalid. And it gets scrambled again when you move it back. So under these circumstances, all bets are off.
|
|
|
Gary K. Griffey wrote:I have basically ruled-out your supposition of a bad target disk drive/controller...
Excellent! Then this problem is actually interesting. To get started, you'll need to upload a diagnostic report (Help > Send Report...) from the computer you're using to capture the VM file. Could you also provide me with information about the volume that contains your archive? Specifically, it's size, format (HFS Extended, journaled, ...?), and how you connect to it (Firewire, USB, network, ...).
One experiment I tried this morning...
The fact that one capture failed and the next one succeeded when QRAuditFileSystemHistoryDays was set to 0 may, or may not, be relevant. QRAuditFileSystemHistoryDays only changes what folders QRecall scans for changes. It doesn't change any of the logic that QRecall uses to determine which items to capture, or how they are captured. It may be relevant because this could be a QRecall bug, one related to the size of the file(s) being captured. Changing QRAuditFileSystemHistoryDays might have changed the folders that QRecall examined, which in turn could have changed which files it captured and how much data it added to the archive. I'd be more interested in a capture that failed, you delete the layer, and then run the same capture (with the exact same settings) again. (Beside the issue, I would recommend that you leave QRAuditFileSystemHistoryDays set to something relatively low; certainly lower than the normal interval in which you recapture your laptop, since moving your laptop drive from one system to another other invalidates the file system events information.) To further diagnose this problem, I'd be most interested in getting a snapshot of your archive's structure when it was OK and again after the capture corrupts it. To do that, make a dump of the archive after a successful verify: - Open the archive in QRecall - Choose Archive > Dump (not File > Dump...) - In the dump options sheet, choose the following: - - In the Data section, check all options EXCEPT Data Packages - - In the Layers section, check all options - - Make sure all options in the Fill map, Hash Table, Package Index, and Names Index sections are off - Click Dump and save the file to a convenient location The Dump command is a diagnostic tool that only exists in the beta version. It will lock up the QRecall application (you'll get the spinning Technicolor pizza of death cursor) until it finishes. It should take about the same amount of time as a verify, so be patient. When it's done, compress the dump file in the Finder and send it to me. If it's too big to e-mail, contact me and I'll provide you with alternative upload methods. And, I'm going to ask again (just because I can't stop myself) that you use Disk Utility to verify the structure of the volume that contains the archive. I ask this simply because I've spent days in the past trying to diagnose problems that turned out to be a corrupted volume. Continue taking normal captures. When the verify following the capture fails, create another dump (same options) and send me that along with another diagnostic report (Help > Send Report...). It would be awesome if you could capture the dump of the archive immediately before the failure, but I'll understand if you don't have the time to generate a dump file after every successful capture. I'm also very interested in the size and structure of the files in your VM, but that information will be in the dump file. This should give me enough information to create a simulation of your situation here.
|
|
|
Gary, It's really hard to tell what's going on without more information. Upload a diagnostic report (Help > Send Report...) and I'll see if there are any clues in the log. One plausible explanation would be that the drive you're using to store the archive has at least one sector that's causing intermittent data loss and the drive's controller is either failing to detect this or spare the sector. The scenario would go something like this: A sector fails, resulting in a bad data record. The verify detects this. The repair erases the record and marks it as empty. The next capture/compact sees an empty record and writes a new record to the same location. The sector fails, corrupting the new record. The next verify detects this and... One clue would be to see if the data errors all occur at, or near, the same file position. This would indicate a failure associated with a particular region of the hard drive's surface. One simple workaround would be to copy the archive (since it's only 40GB), rename the old one, and update all of your actions to use the new one. Now see if the problem reoccurs. The fact that the error occurs in the same file isn't all that surprising. Even if the data corruption was completely random, VM files tend to be huge. I wouldn't be surprised if most of the data in your archive belongs to that one VM file. Throwing a dart, it would be hard not to hit it.
|
|
|
|
|