
General » Exhaustive File System Scan

Author: Adrian Chapman
1 decade ago
James

Does QRecall still perform a periodic exhaustive file system scan in Lion and if so how frequently?

Adrian

Author: James Bucanek
1 decade ago
 
Adrian Chapman wrote:Does QRecall still perform a periodic exhaustive file system scan in Lion and if so how frequently?

QRecall has a "trusted" duration for filesystem change events. When using FSEvents to determine past changes, it trusts them to be accurate for a period of time. After that, it ignores FSEvents and performs an exhaustive traversal of the filesystem directories. It records the date of this exhaustive traversal in the archive, and begins trusting FSEvents again until the "trusted" duration expires once more.

The default duration is 6.9 days, which should force QRecall to perform a deep scan of the volume once a week.
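As a rough illustration of the trust window described above (not QRecall's actual code), the decision boils down to comparing the age of the last exhaustive traversal against the trusted duration. The function name and the use of epoch seconds are assumptions made for this sketch.

```shell
# needs_deep_scan LAST_DEEP_SCAN_EPOCH NOW_EPOCH TRUST_DAYS
# Hypothetical helper: exits 0 (true) when the trusted window has expired,
# i.e. when the last exhaustive traversal is older than TRUST_DAYS.
needs_deep_scan() {
    awk -v last="$1" -v now="$2" -v days="$3" \
        'BEGIN { exit !((now - last) > days * 86400) }'
}

# Example: last deep scan exactly 7 days ago, trust limit 6.9 days.
last=0
now=$((7 * 86400))
if needs_deep_scan "$last" "$now" 6.9; then
    echo "deep scan"        # printed here: 7 days exceeds 6.9
else
    echo "trust FSEvents"
fi
```

This also suggests why the default would be 6.9 rather than 7: a capture that runs on an exact weekly schedule lands just past the limit instead of just short of it.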

You can change this duration by setting the QRAuditFileSystemHistoryDays setting to a different value:

# perform a deep scan every 3 days

defaults write com.qrecall.client QRAuditFileSystemHistoryDays -float 2.9
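A minimal sketch for checking the current value and undoing the override. The domain and key are the ones used in this thread, and `defaults read` / `defaults delete` are standard macOS commands. That deleting the key returns QRecall to its built-in default is an assumption, not something confirmed here.

```shell
# Domain and key as given in the thread.
domain="com.qrecall.client"
key="QRAuditFileSystemHistoryDays"

if command -v defaults >/dev/null 2>&1; then
    # Show the current value; `defaults read` fails if the key was never set.
    defaults read "$domain" "$key" 2>/dev/null \
        || echo "$key is not set (assumed: built-in default applies)"
    # Uncomment to remove the override (assumed to restore the default):
    # defaults delete "$domain" "$key"
else
    echo "defaults(1) not found; this only works on macOS"
fi
```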

Author: Adrian Chapman
1 decade ago
Thanks James. What prompted my question was that in the advanced settings page of the Cookbook & FAQ forum it says "Leopard only" for this parameter and I just wondered if anything had changed with later releases of OS X.

Out of curiosity I tried changing QRAuditFileSystemHistoryDays to zero so it would perform an exhaustive scan on every capture but I didn't notice any change in the capture speed which seems a bit odd if QRecall has to examine every file. Also how confident are you that a seven day interval is reasonable?

Can you tell me where in the archive the date is stored as I have had a look at the various files and I can't see anything obvious.

Thanks

Adrian

Author: James Bucanek
1 decade ago
 
Adrian Chapman wrote:Thanks James. What prompted my question was that in the advanced settings page of the Cookbook & FAQ forum it says "Leopard only" for this parameter and I just wondered if anything had changed with later releases of OS X.

Thank you for noting that. I've fixed the note. This is really a "Leopard and later" feature, because the FSEvents API was added in Leopard.

Out of curiosity I tried changing QRAuditFileSystemHistoryDays to zero so it would perform an exhaustive scan on every capture but I didn't notice any change in the capture speed which seems a bit odd if QRecall has to examine every file.

When QRecall examines a folder to capture, it compares the file metadata (name, size, last modified date, extended attributes, etc., etc.) with what's in the archive. If nothing is different, it assumes that nothing in the file has changed. This is actually very fast for a single folder. Things bog down when QRecall has to reexamine thousands and thousands of folders, most of which contain no changes. So the amount of speed increase that you get from using filesystem events is proportional to the depth and complexity of the folder structure, not the size or content of the files in question.
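To make the per-folder cost concrete, here is a much simplified sketch of a metadata-only check. It looks only at modification times against a reference file, whereas QRecall compares more metadata (name, size, extended attributes, and so on) against the archive; the function name and the stamp-file approach are assumptions for illustration.

```shell
# changed_since STAMP_FILE DIR
# Hypothetical helper: lists files directly inside DIR whose modification
# time is newer than STAMP_FILE's. No file contents are read.
changed_since() {
    find "$2" -maxdepth 1 -type f -newer "$1"
}

# Demo with explicit timestamps so the result does not depend on timing.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/stamp"      # stands in for "time of last capture"
touch -t 201901010000 "$demo/old.txt"    # older than the stamp
touch -t 202101010000 "$demo/new.txt"    # newer than the stamp
changed_since "$demo/stamp" "$demo"      # lists only new.txt
rm -rf "$demo"
```

One call like this is cheap. An exhaustive scan has to repeat it for every folder on the volume, which is where the time goes.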

Also how confident are you that a seven day interval is reasonable?

It seemed reasonable to me. There is no time period by which filesystem events are, or are not, accurate. It's a very robust service and I have never seen it report false information, or had the information it reports become corrupted over time. The problem is that it contains a very small number of fundamental flaws, for which there is no workaround. These flaws can cause it to fail to report changes in a folder that contains changes.

The question of "reasonable" is really one of tolerance. How long are you willing to wait before you are sure QRecall has captured all of the changes on a volume, vs. how much extra overhead are you willing to put up with while QRecall exhaustively rescans the entire volume?

Can you tell me where in the archive the date is stored as I have had a look at the various files and I can't see anything obvious.

I could, but then I'd have to shoot you.

History markers are part of the bookmark structure in a session package. A session package essentially identifies a layer in the archive. The "date" is actually a filesystem history event ID—a 64-bit number that unambiguously identifies a point in the historical event stream. If you're running the beta, you can use the Dump command to dump the session packages in the repository data file. Otherwise, these data structures are opaque.

Author: James Bucanek
1 decade ago
If you're not following the other thread, we've found a bug in 1.2.0(56) where QRecall will fail to scan any folders once the QRAuditFileSystemHistoryDays time has expired.

This is fixed in 1.2.0(59), which had already been released. Please update and try your captures again.

Author: Adrian Chapman
1 decade ago
James

I have a feeling that QRecall isn't performing an exhaustive scan as often as it should. What am I looking for in the logs which will confirm it is doing this approximately once a week with the default setting of 6.9 days? Also is it possible to force an exhaustive scan?

Author: James Bucanek
1 decade ago
 
Adrian Chapman wrote:I have a feeling that QRecall isn't performing an exhaustive scan as often as it should.

I hate that feeling.

What am I looking for in the logs which will confirm it is doing this approximately once a week with the default setting of 6.9 days?

The message "Filesystem change history expired" will appear in the log when the archive has been relying on change history information for longer than the trust limit setting. This will replace the normal "Locating changes since..." message.

The log entry is classified as "minutia," so you'll have to set your log window detail to its highest setting to find it. If you want to find them all quickly, run the detail slider to max and enter "history" in the search field.

Also is it possible to force an exhaustive scan?

If the last layer(s) containing the same items being captured now was incomplete (or damaged), QRecall forces the current capture to ignore the change history and perform an exhaustive scan. In this situation the message "Filesystem change history ignored" appears in the log.

To force a scan, start a capture, wait for a few items to be added, and then stop it. Start the same capture again and it will perform an exhaustive scan of the directory. You can then merge the two layers, if you like neatness as much as I do.

(Don't cancel the capture before it's had a chance to add at least one new item to the archive. If the capture doesn't capture anything, no new layer is created, which means there won't be an incomplete layer to trigger the deep scan.)

Author: Adrian Chapman
1 decade ago
Thanks.

QRecall is definitely not performing regular exhaustive scans; at best it is once per month, and in May it didn't do it at all.

I have checked the defaults with the "defaults read" command in terminal and I get:

QRAuditFileSystemHistoryDays = "6.9";

I should add that I have had cause to perform repairs a few times so would this reset the count down for FileSystemHistoryDays?

Adrian

Author: James Bucanek
1 decade ago
 
Adrian Chapman wrote:QRecall is definitely not performing regular exhaustive scans; at best it is once per month, and in May it didn't do it at all.

That sounds unusual. It would be helpful if you sent a diagnostic report to review.

I should add that I have had cause to perform repairs a few times so would this reset the count down for FileSystemHistoryDays?

That could be a factor. Some repair conditions cause the history information in a layer to be discarded (because it can't be trusted now). This will cause a deep scan of the captured items, but will require a little sleuthing to find. It results in a capture action that doesn't log anything about the change history. So you're looking for a capture action that doesn't say either "Filesystem change history..." or "Locating changes since...".

Author: Adrian Chapman
1 decade ago
Diagnostic report sent.

Author: James Bucanek
1 decade ago
Adrian,

I've analyzed your log files and it would appear that QRecall is performing exhaustive scans of your directory structure on a pretty regular basis (see attached file).

You have a fairly complicated capture schedule, and that's causing a lot of captures to simply ignore the file system events history and scan the item directories in their entirety.

For QRecall to use filesystem event history, it must first capture an entire directory structure. This creates a history "bookmark" in the layer. The next capture examines that bookmark and uses it to query the filesystem events manager for changes that have occurred since, and then it only has to review the folders that are known to have changed.

This is greatly complicated, however, when captures overlap (capture a volume, then later capture one or more subfolders), or if items are being ignored (using the new "ignore changes" feature). This requires QRecall to determine the largest region of overlap, and it must also replay the history from the oldest captured items in the folders that are being ignored.

All of these conditions require QRecall to request change information for a much larger period of time from the filesystem history manager. If the manager doesn't have that much history, there's no history "bookmark" in the layers that intersect this capture, changes in specific folders have been ignored too long, the geometry of the overlapping items is too complex, any intersecting layers were damaged/repaired, or the volume doesn't support FSEvents, then QRecall simply ignores the filesystem change history and performs a deep scan. It doesn't log any messages about gathering, or ignoring, changes because it's not using change history at all.

I prepared a summary of your captures since April and attached it to this post. The captures marked with "- no FSEvents" are captures that are not using filesystem changes at all (for any of the above reasons), which means they performed an exhaustive scan of the directory structure. The "- deep scan" captures were using change history, but it expired (QRAuditFileSystemHistoryDays) and a deep scan was performed anyway. All of the unmarked captures used filesystem change history to perform a quick scan.

The report was created by first discarding all captures that reported a warning or error (for simplicity), then finding all captures that reported to be using filesystem change history, those that explicitly ignore it (trust expired), and then those that don't log any messages regarding filesystem change history at all. The latter are marked "- no FSEvents" and would have performed a deep scan.
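The classification described above can be sketched roughly as follows. This is not the script used to build the attached report. It assumes the log lines for a single capture action are available as plain text on standard input, which is an assumption about how the log is exported. Only the three quoted message prefixes come from this thread; the function name and output labels are made up for illustration.

```shell
# classify_capture
# Hypothetical helper: reads one capture action's log text on stdin and
# prints how that capture located its changes.
classify_capture() {
    log_text=$(cat)
    case "$log_text" in
        *"Filesystem change history expired"*) echo "deep scan" ;;                   # trust limit reached
        *"Filesystem change history ignored"*) echo "deep scan (history ignored)" ;; # prior layer incomplete
        *"Locating changes since"*)            echo "quick scan" ;;                  # FSEvents used
        *)                                     echo "no FSEvents" ;;                 # no history message at all
    esac
}

# The sample log text below is invented for the demo.
printf 'Capture started\nLocating changes since last capture\n' | classify_capture
printf 'Capture started\nCaptured 12 items\n' | classify_capture
```

The first call prints "quick scan" and the second prints "no FSEvents", matching the rule of thumb that anything other than "Locating changes since..." means a full traversal.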

Filename Capture Scan Summary.txt
Description Summary of capture scanning
Filesize 63 Kbytes


Author: Adrian Chapman
1 decade ago
James

Thanks for yet another in depth explanation of what is going on. I suppose my first question is do I have anything to worry about in terms of my schedules and exclusions? The only thing I ignore changes in is my Virtual machines folder which I archive once per day as the last activity. As I don't use my VMs much would it be better to not bother ignoring them?

As you can probably see from the logs, my System is on its own drive, in fact now an SSD, and the users are on a Softraid RAID 1 array. The system volume and Users directory are saved to separate archives, which I thought to be a good way of doing things, i.e. keeping archives specific to volumes.


Adrian

Author: James Bucanek
1 decade ago
 
Adrian Chapman wrote:Thanks for yet another in depth explanation of what is going on.

You're welcome. I should have summarized, though. The quick and dirty tip to remember is that if the capture action logs "Locating changes since ..." then it's going to use filesystem change events to speed up its search for items to capture. If it logs anything else, or nothing at all, it doesn't.

I suppose my first question is do I have anything to worry about in terms of my schedules and exclusions? The only thing I ignore changes in is my Virtual machines folder which I archive once per day as the last activity. As I don't use my VMs much would it be better to not bother ignoring them?

In principle, your capture actions are fine: you capture your user files during the day (ignoring the VM folder) and capture your VM folder once a day.

Except, you're not. You're actually capturing a subfolder in the VM folder once a day. So the top-level VM folder is never captured (while not ignoring changes), which is what's causing your ignored history information to accumulate indefinitely. At the very least, I would suggest modifying your once-a-day capture to capture the entire Virtual Machines folder/package.

When using the Ignore Changes In feature of the capture filter, I find this technique the easiest way to manage it:

- Create the action you want to capture your top-level folders/volumes, and set the Ignore Changes In to those sub-items you're not interested in tracking during the day. Set whatever items you want to exclude, etc.

- Schedule that action to run at regular intervals (once per hour, or whatever). Save it.

- Duplicate the previous item, remove all of the Ignore Changes In items, and change the schedule to run once a day (or so). Save it.

Now you're sure that the second action will capture (and exclude) exactly the same items as the more frequently run one, but will NOT ignore the changes in any subitems. This gives the archive a chance to get up-to-date with all of the items once a day.

As you can probably see from the logs, my System is on its own drive, in fact now an SSD, and the users are on a Softraid RAID 1 array. The system volume and Users directory are saved to separate archives, which I thought to be a good way of doing things, i.e. keeping archives specific to volumes.

I think that's just dandy. Whether you keep one archive per volume, or capture multiple volumes to a single archive, is entirely up to you and how you want to manage your actions and backup space. QRecall is designed to be flexible and tries not to dictate any source/archive relationships.

Author: Adrian Chapman
1 decade ago
James

Thanks again, I now see the flaw in my methodology. Why are these things so damned obvious in hindsight?

Author: James Bucanek
1 decade ago
 
Adrian Chapman wrote:Why are these things so damned obvious in hindsight?

Because, I fear, it's just complicated.

I've added a to-do list item to log a message when an item has been ignored for too long. Maybe that will help point people in the right direction.



