|
I agree that QRecall needs to be a little more sophisticated about how it calculates the expected capture and verify intervals. Right now it's a weighted running average, but that obviously breaks down when archives are not captured at uniform intervals, or are not being captured at all. I don't have a solution for an individual archive, but you can modify QRecall's archive status window behavior globally. The warnings that a capture or verify has not occurred recently are based on a simple formula, the constants of which you can tweak using advanced settings. I'll describe how the capture formula works and how to adjust it. The verify logic is the same; it just uses a different set of constants.
(1) Every time an archive is (successfully) captured, the interval since the last capture is calculated. This updates the AverageInterval property for that archive.
(2) The AverageInterval is then compared with the global QRStatusMinimunCaptureInterval, and the larger of the two values is chosen as the NominalInterval. The QRStatusMinimunCaptureInterval prevents rapid (i.e. hourly) captures from creating a ridiculously small nominal interval.
(3) The NominalInterval is multiplied by QRStatusCaptureWarningIntervals. If the time difference between now and the last time the archive was captured is greater than that product, the status window shows a "warning" (yellow) indicator for that archive.
(4) The NominalInterval is multiplied by QRStatusCaptureTruantIntervals. If the time difference between now and the last time the archive was captured is greater than that product, the status window shows a "problem" (red) indicator for that archive.
QRStatusMinimunCaptureInterval is expressed in seconds and defaults to 21600 (6 hours)
QRStatusCaptureWarningIntervals defaults to 3 (shows a warning when 3x the nominal interval has elapsed without a capture)
QRStatusCaptureTruantIntervals defaults to 7 (there's a problem when 7x the nominal interval has elapsed without a capture)

For an archive that you capture once a day and rotate once a week, you could try setting the warning multiplier to 10 and the problem multiplier to 15, like this:
defaults write com.qrecall.monitor QRStatusCaptureWarningIntervals -float 10.0
defaults write com.qrecall.monitor QRStatusCaptureTruantIntervals -float 15.0
If you capture more often than that, consider setting the QRStatusMinimunCaptureInterval too. I don't have a solution for an archive that is never captured, except to set the intervals to excessive values (like 1000) so the warnings simply never occur. The verify logic works the same, except that it uses these settings:
QRStatusMinimunVerifyInterval is expressed in seconds and defaults to 86400 (1 day)
QRStatusVerifyWarningIntervals defaults to 8 (shows a warning when 8x the nominal interval has elapsed without a verify)
QRStatusVerifyTruantIntervals defaults to 16 (there's a problem when 16x the nominal interval has elapsed without a verify)
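Putting the steps above together, here's a small shell sketch of the status formula (illustrative only; the variable names and sample numbers are mine, not QRecall's actual code):

```shell
# Illustrative sketch of the status-indicator formula (not QRecall's code).
# Sample scenario: one capture a day, last capture four days ago.
avg_interval=86400            # AverageInterval, in seconds
min_interval=21600            # QRStatusMinimunCaptureInterval default (6 hours)
warn_mult=3                   # QRStatusCaptureWarningIntervals default
truant_mult=7                 # QRStatusCaptureTruantIntervals default
since_last=$(( 4 * 86400 ))   # seconds since the last capture

# NominalInterval is the larger of the average and the minimum
nominal=$avg_interval
if [ "$min_interval" -gt "$nominal" ]; then nominal=$min_interval; fi

if   [ "$since_last" -gt $(( nominal * truant_mult )) ]; then status="problem"
elif [ "$since_last" -gt $(( nominal * warn_mult )) ]; then status="warning"
else status="ok"
fi
echo "$status"   # four days exceeds 3x the daily interval, so: warning
```

With these sample numbers, four days without a capture passes the 3x warning threshold but not the 7x truant threshold, so the archive would show the yellow indicator.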
|
|
|
Adam Horne wrote:Now I'm getting an error from the Qrecall log:
I'd be interested to know where in the log this message occurs. Rather than playing "20 questions" I'll ask that you send a diagnostic report (Help > Send Report...) and we'll work from there.
|
|
|
Adam Horne wrote:Is this possible on a RAID volume?
It should be. HFS+ encryption is performed at the filesystem plug-in level (that controls how files are organized on a device), while RAID works at the device driver level (that does the work of reading and writing data blocks to hardware). In Disk Utility, select a volume on your RAID and go to the Erase tab. If one of the choices is Mac OS Extended (Journaled, Encrypted), then you can create an encrypted volume on your RAID.
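You can make the same check from the command line; `diskutil listFilesystems` lists the filesystem personalities available for formatting, and the encrypted HFS+ variants only appear on OS X versions that support them (this is read-only and doesn't modify any volume):

```shell
# Check (non-destructively) whether encrypted HFS+ formats are available.
# diskutil is macOS-only, so guard for other systems.
if command -v diskutil >/dev/null 2>&1; then
  diskutil listFilesystems | grep -i "encrypted" || echo "no encrypted formats listed"
else
  echo "diskutil not available on this system"
fi
```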
|
|
|
Andrew Reid wrote:I will test out the software I will try the archive on a NFS volume with and without the sparsebundle and report back.
I look forward to hearing what you discover.
The sparse bundle has appeal as it can be encrypted, and it allows for network efficient offsite backup of the archive . I did read the note about caution in using encrypted sparsebundles as archive volumes, can you expand a bit on that ?
Sparse bundles add another layer of complexity to the filesystem that I prefer to avoid. In theory, it shouldn't make any difference. In practice, layering filesystem technologies like this can cause strange behaviour that's difficult to track down. In my latest post about sparse bundles, a user was getting strange archive verify errors. I was encouraging them to abandon the sparse bundle and use the volume directly, reducing the complexity. As it turned out, the volume/drive itself might have been the cause of the problem. They switched to a different drive and were able to write and verify the archive. I haven't heard back from them, but it probably wasn't the sparse bundle that was the problem.
|
|
|
Andrew Reid wrote:I would like like to use QRecall to backups to an archive stored on a NAS unit.
In general, there shouldn't be any problem. There are numerous QRecall users who back up to various NAS devices.
For various (compelling) reasons I would prefer to have the archive housed in a sparsebundle based volume which is mounted "locally" over NFS to to computer being backed up.
Just be aware that you'll be giving up some performance. NAS adds volume access overhead, and a sparse bundle adds more on top of that. Also be aware that disk-full situations could be a problem. Some NAS and sparse bundle implementations "lie" about how much free space is available on the volume. QRecall uses the volume status information to determine when an archive has filled up the volume, so it can stop the capture and gracefully close the archive. If the volume lies, the archive could overfill the volume, damaging the archive and making it difficult to repair.
So to QRecall the archive is just a local volume. I can easily script the mounting of the NFS/sparsebundle combo in cron or with a pre-run script hook if QRecall support those.
Depending on your NAS implementation, QRecall may do this for you already. If the volume mounts as a networked volume, or appears (at the disk driver level) to be a physically connected external drive, QRecall will attempt to mount the volume/drive before an action begins. If QRecall mounted the volume, it will then attempt to automatically unmount it when all of the actions have run. Assuming the drive is compatible with OS X's network volume or disk arbitration framework, this work has already been done for you.
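If your NAS isn't recognized and you do need to script the mount yourself, a pre-run script might look something like this (the server, export, and bundle paths are hypothetical examples; `mount -t nfs` and `hdiutil attach` are the standard OS X tools for each step). The sketch runs in dry-run mode, printing the commands instead of executing them:

```shell
#!/bin/sh
# Hypothetical pre-run mount script. Paths and server names are examples
# only. DRY_RUN=1 prints the commands instead of executing them; set it
# to 0 once you've substituted your own server and paths.
DRY_RUN=1
NFS_SERVER="nas.local:/export/backups"          # assumption: your NFS export
NFS_MOUNT="/Volumes/nas"
BUNDLE="$NFS_MOUNT/backups/archive.sparsebundle"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

run mkdir -p "$NFS_MOUNT"                       # mount point for the share
run mount -t nfs "$NFS_SERVER" "$NFS_MOUNT"     # mount the NFS share
run hdiutil attach "$BUNDLE"                    # attach the sparse bundle
```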
a) Have I explained this well enough, does it make sense ?
It all seemed perfectly reasonable to me.
b) Does the archive volume even have to support any Apple-centric (HFS) features. What features does the volume need to support.
QRecall makes as few assumptions about the filesystem that the archive is located on as it can, so it should work on any filesystem that OS X supports. It is, however, still being accessed via the Core Services framework in OS X. Apple has deprecated this framework, and QRecall is currently being rewritten to use only the low-level BSD filesystem APIs. This should provide even better compatibility with foreign filesystems in the future, but for now QRecall has problems with some filesystems that aren't compatible with the Core Services APIs. For example, QRecall can't currently store an archive on a ZFS volume from TenOne because of quirks in the Core Services framework.
|
|
|
Adrian Chapman wrote:Why are these things so damned obvious in hindsight
Because, I fear, it's just complicated. I've added a to-do list item to log a message when an item has been ignored for too long. Maybe that will help point people in the right direction.
|
|
|
Adrian Chapman wrote:Thanks for yet another in depth explanation of what is going on.
You're welcome. I should have summarized, though. The quick and dirty tip to remember is that if the capture action logs "Locating changes since ..." then it's going to use filesystem change events to speed up its search for items to capture. If it logs anything else, or nothing at all, it doesn't.
I suppose my first question is do I have anything to worry about in terms of my schedules and exclusions? The only thing I ignore changes in is my Virtual machines folder which I archive once per day as the last activity. As I don't use my VMs much would it be better to not bother ignoring them?
In principle, your capture actions are fine: you capture your user files during the day (ignoring the VM folder) and capture your VM folder once a day. Except, you're not. You're actually capturing a subfolder in the VM folder once a day. So the top-level VM folder is never captured (while not ignoring changes), which is what's causing your ignored history information to accumulate indefinitely. At the very least, I would suggest modifying your once-a-day capture to capture the entire Virtual Machines folder/package. When using the Ignore Changes In feature of the capture filter, I find this technique the easiest way to manage it:
- Create the action you want to capture your top-level folders/volumes, and set the Ignore Changes In to those sub-items you're not interested in tracking during the day. Set whatever items you want to exclude, etc.
- Schedule that action to run at regular intervals (once per hour, or whatever). Save it.
- Duplicate the previous action, remove all of the Ignore Changes In items, and change the schedule to run once a day (or so). Save it.
Now you're sure that the second action will capture (and exclude) exactly the same items as the more frequently run one, but will NOT ignore the changes in any subitems. This gives the archive a chance to get up-to-date with all of the items once a day.
As you can probably see from the logs, my System is on it's own drive, in fact now an SSD, and the users are on a Softraid RAID 1 array. The system volume and Users directory are saved to separate archives, which I thought to be a good way of doing things, i.e. keeping archives specific to volumes.
I think that's just dandy. Whether you keep one archive per volume, or capture multiple volumes to a single archive, is entirely up to you and how you want to manage your actions and backup space. QRecall is designed to be flexible and tries not to dictate any source/archive relationships.
|
|
|
Adrian, I've analyzed your log files and it would appear that QRecall is performing exhaustive scans of your directory structure on a pretty regular basis (see attached file). You have a fairly complicated capture schedule, and that's causing a lot of captures to simply ignore the filesystem events history and scan the item directories in their entirety.
For QRecall to use filesystem event history, it must first capture an entire directory structure. This creates a history "bookmark" in the layer. The next capture examines that bookmark and uses it to query the filesystem events manager for changes that have occurred since, and then it only has to review the folders that are known to have changed.
This is greatly complicated, however, when captures overlap (capture a volume, then later capture one or more subfolders), or if items are being ignored (using the new "ignore changes" feature). This requires QRecall to determine the largest region of overlap, and it must also replay the history from the oldest captured items in the folders that are being ignored. All of these conditions require QRecall to request change information for a much larger period of time from the filesystem history manager.
If the manager doesn't have that much history, if there's no history "bookmark" in the layers that intersect this capture, if changes in specific folders have been ignored too long, if the geometry of the overlapping items is too complex, if any intersecting layers were damaged/repaired, or if the volume doesn't support FSEvents, then QRecall simply ignores the filesystem change history and performs a deep scan. It doesn't log any messages about gathering, or ignoring, changes because it's not using change history at all. I prepared a summary of your captures since April and attached it to this post.
The captures marked with "- no FSEvents" are captures that are not using filesystem changes at all (for any of the above reasons), which means they performed an exhaustive scan of the directory structure. The "- deep scan" captures were using change history, but it expired (QRAuditFileSystemHistoryDays) and a deep scan was performed anyway. All of the unmarked captures used filesystem change history to perform a quick scan. The report was created by first discarding all captures that reported a warning or error (for simplicity), then finding all captures that reported to be using filesystem change history, those that explicitly ignore it (trust expired), and then those that don't log any messages regarding filesystem change history at all. The latter are marked "- no FSEvents" and would have performed a deep scan.
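The conditions above boil down to a single all-or-nothing decision. Here's a hypothetical sketch; the flag names are illustrative stand-ins for QRecall's internal state, not actual settings:

```shell
# Hypothetical sketch of the quick-scan vs. deep-scan decision described
# above. Every condition must hold for the quick scan to be used.
has_bookmark=1            # a prior complete capture left a history bookmark
history_available=1       # the FSEvents manager still has history that far back
ignored_too_long=0        # no "ignore changes" items have gone uncaptured too long
layers_repaired=0         # no intersecting layers were damaged/repaired
volume_has_fsevents=1     # the volume supports FSEvents

if [ "$has_bookmark" -eq 1 ] && [ "$history_available" -eq 1 ] && \
   [ "$ignored_too_long" -eq 0 ] && [ "$layers_repaired" -eq 0 ] && \
   [ "$volume_has_fsevents" -eq 1 ]; then
  scan="quick scan using filesystem change history"
else
  scan="deep scan - no FSEvents"
fi
echo "$scan"
```

Flip any one flag and the whole capture falls back to an exhaustive scan, which is why a complicated capture schedule tips over so easily.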
|
|
|
Adrian Chapman wrote:QRecall is definitely not performing regular exhaustive scans, at best it is once per month and in May it didn't do it at all.
That sounds unusual. It would be helpful if you sent a diagnostic report to review.
I should add that I have had cause to perform repairs a few times so would this reset the count down for FileSystemHistoryDays?
That could be a factor. Some repair conditions cause the history information in a layer to be discarded (because it can't be trusted now). This will cause a deep scan of the captured items, but will require a little sleuthing to find. It results in a capture action that doesn't log anything about the change history. So you're looking for a capture action that doesn't say either "Filesystem change history..." or "Locating changes since...".
|
|
|
Adrian Chapman wrote:I have a feeling that QRecall isn't performing an exhaustive scan as often as it should.
I hate that feeling.
What am I looking for in the logs which will confirm it is doing this approximately once a week with the default setting of 6.9 days?
The message "Filesystem change history expired" will appear in the log when the archive has been relying on change history information for longer than the trust limit setting. This will replace the normal "Locating changes since..." message. The log entry is classified as "minutia," so you'll have to set your log window detail to its highest setting to find it. If you want to find them all quickly, run the detail slider to max and enter "history" in the search field.
Also is it possible to force an exhaustive scan?
If the last layer(s) containing the same items being captured now was incomplete (or damaged), QRecall forces the current capture to ignore the change history and perform an exhaustive scan. In this situation the message "Filesystem change history ignored" appears in the log. To force a scan, start a capture, wait for a few items to be added, and then stop it. Start the same capture again and it will perform an exhaustive scan of the directory. You can then merge the two layers, if you like neatness as much as I do. (Don't cancel the capture before it's had a chance to add at least one new item to the archive. If the capture doesn't capture anything, no new layer is created, which means there won't be an incomplete layer to trigger a deep scan.)
|
|
|
Chris Petit wrote:I was wondering if two computers with two different identity keys can capture to an archive at the same time.
Not at the same time. Only one action can update an archive at a time.
If not, what is the best method for allowing them to take turns backing up automatically?
Set both computers to backup to the same archive. QRecall will arbitrate and make them take turns.
|
|
|
CaB Studios wrote:We're just south of Birmingham in the UK. You just had to rub it in and mention 'sunny'! Wettest summer ever!
Thankfully we're all feeling very proud of the best Olympics ever and Andy Murray finally winning a major! Woohoo!
I still have synchronized swimming saved on my DVR. Trying to make it last a little longer...
I'm thinking of rotating 2 x 4TB FireWire 800 drives to perform the archives so I can take them offsite. Does this sound sensible in terms of best practice?
There are action schedule features designed specifically for that. There are basically two approaches:
The first is to create capture actions that run on a regular schedule for archive A. Add a schedule condition to ignore the action if the archive isn't available. Duplicate all of those actions, then set the target of the duplicate actions to archive B. Now you have two concurrent sets of actions that will run automatically, but only those that apply to the currently connected archive will run. You can then rotate your drives on whatever schedule you like.
The second alternative is a little more immediate. Create event-based schedules that run the capture action when the archive is mounted. Then your capture A action will start immediately when you plug in backup drive A. Eject A, put it in a safe place, plug in backup drive B, and the B capture starts immediately. Event actions can optionally be set to repeat if you leave the drive plugged in.
|
|
|
CaB Studios wrote:I'll have to wait for this mammoth 700GB to backup first. It's done 260GB so far and has been running for 3hrs 55mins.
Yes, a 700GB capture is going to take awhile. If QRecall ever gets in the way of doing something, you can always stop the capture and reschedule it to run later. It will leave an incomplete layer in the archive, but the next capture will still take advantage of the work that's been done and pick up where the last capture left off. (You can merge the two layers if you don't like incomplete layers in your archive.)
Are you guys in the US I take it?
Sunny Arizona!
|
|
|
Then maybe I've misunderstood the problem. If you scheduled an action to run, and it didn't (nothing in the log when set to the highest detail setting and the filters set to show all activity), then please send a diagnostic report (Help > Send Report...). It may contain some clue as to why the actions didn't run.
|
|
|
CaB Studios wrote:
Move the detail slider in the log window to the right. My guess is the missing actions will appear in the log.
No one was logged in at 2am.....
QRecall doesn't normally run actions when the user is logged out. If you want it to, set the Start and run actions while logged out option in the Scheduler preferences. This installs the scheduler process as a system daemon that runs even while you're logged out. You can actually schedule an action to run when you log out (or in), if that's useful. There are also power management options that will wake/start your computer before an action runs, or shutdown/restart/sleep your system after it's finished.
|
|
|