QRecall

Wayne Fruhwald wrote:At times I need to search for a file that is no longer on my hard drive (I may have deleted it days, weeks or months ago). Is there a way to search for that file across all layers?

Yes, but it requires waiting for the next version of QRecall. I have an "x-ray" view on the drawing board that let deleted items from earlier layers show through later layers.

In the mean time, you can set the search options to "Only show matching files", enter you search string, then start moving the layer shade backwards (Command-up arrow). The lost file should appear pretty quickly.

Wayne Fruhwald wrote:I would guess that you currently have a layer index that has an entry for each file in that layer.

Programmers always find it amusing with other people guess how their code works.

Assuming that to be true, it would seem that "Delete Item" for just one layer would only entail removing the appropriate file entry from that layer rather than removing it from all layers.

I must be missing something here. Would you explain why removing a file entry from one layer would be "difficult to implement."?

OK, if you want to get technical, here's the problem. Each layer is essentially a directory structure that represents a complete picture of the captured items at the time they were captured. As QRecall examines the items being captured, it tries to find duplicate data and directory information already in the archive and creates links to it (not really links, but references since the archive is really a database not a directory structure). If a block of data is the same, it links to it. If a file hasn't changed, the layer simply refers to the file record in an older layer. If a directory hasn't change, it stores a reference to that entire directory. So each layer inherits everything that hasn't changed from previous layers.

Deleting an arbitrary file record from a layer would corrupt the data structure if any later layers referred to that record. Likewise, deleting a file from an arbitrary layer changes the "snapshot" of it's enclosing folder. This means that subsequent references to that enclosing folder in later layers are now invalid and might have to be rectified. This in turn would alter any reference to that folder, and its enclosing folder, and so on, and so on.

This problem is completely avoided by deleting the same item from every layer. This "vertical" deletion guarantees that there can't be any dangling references between layers, because all references to the item are deleted from every layer. The only housekeeping that must be done is to remove any direct references between the item and its enclosing folder in each layer, to and account for the size change in those folders. This is pretty straight forward and can be done very quickly.

So now you know.

Wayne Fruhwald wrote:There are times I would like to selectively delete an item from just one or more layers in the archive.

That's a tough one. I'll put it on the wish list, but that would be somewhat difficult to implement.

For example, I may have a large VM and to save space, I would like to delete older versions of the VM while preserving newer ones.

My knee jerk response is to say "then merge the layer with a newer one." That's what merging does. But trying to merge some items in a layer but not others creates anomalies in the archive's structure.

Wayne Fruhwald wrote:Hi James.

Hello, Wayne.

I would like to have the ability to have columns for the information contained in the "Captured n items, n KB (n% duplicate)" section of the log file.

For example, I would like to add columns for: Captured, Written, Duplicate, Files, Folders and Rate.

Unfortunately, those statistics are per-capture, not per-layer. When you merge two layers, there's no reasonable way to recalculate them.

What is sounds like you're interested in is a kind of "capture log" for an archive. I'll put that on the wish list.

It would also be nice to have a comments field to describe the reason for the capture (e.g. "Before applying Mac OS X 10.5.8 update"). This would also be a column field.

A generalized annotation feature is planned for an upcoming version.

Thanks for the suggestions and the feedback.

James

Charles,

I did find some inconsistencies in your log records. For example, at one point you changed from running the scheduler only when you're logged in to always. In the log, I can see that the "agent" scheduler is shutdown and uninstalled. Then the "daemon" scheduler is installed, but it never starts. I think this is why QRecall spontaniously asked if you wanted to install it. That makes sense, but why it didn't start is a mystery.

Anyway, I did find a recent occurrence where your scheduler was configured to run when you're logged out and did, indeed, continue to run through several log-out/log-in cycles. So that's good.

But some things still seem confused. I'm going to suggest that you (almost) completely uninstall QRecall and set it up again. This will require two restarts, so do it when it's convenient:

1. In QRecall > Preferences > Authorization, uncheck the option "Start and run actions while logged out."
2. Hold down the Option and Shift keys and choose the QRecall > Quit and Uninstall command.
3. Trash any files beginning with com.qrecall from the /Library/LaunchDaemons, /Library/LaunchAgents, and ~/Library/LaunchAgents folders. (Do not attempt to empty the trash yet.)
4. Trash the /Library/Application Support/QRecall and the ~/Library/Application Support/QRecall folders (if they exist).
5. Restart your computer. (Now you can empty your trash.)
6. Launch QRecall.
7. In Preferences, pre-authorize QRecall to use Administrative privileges.
8. Check the option "Start and run actions while logged out."
9. Restart your computer.

After all of that, log out and back in at least twice (there's no rush) and then send me another diagnostics report.

Charles Watts-Jones wrote:I tried this (deleting the preference to run when not logged) and all went well. Activity Monitor confirmed that the Scheduler was running and QRecall ran a back-up as soon as the external drive mounted. QRecall then surprised me by asking if I wanted it to install the Scheduler (which was running at the time).

This seems completely wrong. If the scheduler is configured to run only while you're logged in, then QRecall should never ask to install it. It only does this when the scheduler is set to run when you're logged out and you haven't pre-authorized QRecall to use administrative privileges. In this one circumstance, QRecall needs your administrative privileges in order to install the scheduler—the problem being is that is shouldn't be asking in the first place.

Something strange is going on. It almost sounds like QRecall's preferences don't agree with the installation. What I need to do is sit down with the log records you've sent me and see if I can determine where the processes is going wrong. I may also send you some simple shell commands to execute, to verify that we don't have a permissions or file ownership problem, and that the scheduler is getting installed correctly.

Charles,

I'm not sure what's wrong, but your system is not behaving correctly and I don't know why.

When you install QRecall to run actions while you're logged out, it installs the QRecallScheduler as a system-wide user agent in /Library/LaunchAgents/com.qrecall.scheduler.plist (OS X 10.5 only). The scheduler is configured to run in the "background" so it should get launched whenever you log in or start up, and should then run forever.

But that's not what's happening. Something is killing your scheduler when you log out:

2009-08-08 15:15:07.096 +0200 #debug# user logout

2009-08-08 15:15:11.517 +0200 #debug# received SIGTERM <-- something killed the scheduler process

2009-08-08 15:15:12.045 +0200 #debug# stopping Scheduler-1.1.2-Jul 21 2009

This shouldn't be happening. I tested this on three other 10.5.8 systems here and none of them stop the schedule when the user logs out.

Because the schedule is configured to run as a Background daemon, the system doesn't automatically restart it again when you log back in (as it does the QRecallMonitor).

At this point I'm a little baffled. The OS should not be killing user agents that are configured to run in the background (at least not until you shutdown).

My only guess is that something else is sending kill signals to your user processes when you log out. Once the scheduler is stopped, it's never started again—until you launch the QRecall application, it notices that the scheduler is dead, and starts it manually.

My only suggestion is that you track down what might be killing the scheduler or change the option to run the scheduler only while you're logged in. This would install the scheduler as a user agent, just like the QRecallMonitor, which seems to be working.

Charles,

Thanks for sending the diagnostic report. The problem is that your scheduler isn't, in fact, running. Or, more precisely, it's not running when it's supposed to, which is all the time.

Your log shows that your scheduler is getting stopped when you log out, but not restarted when you log in. When you launch QRecall, it sees that the scheduler isn't running and starts it. That's what's causing your actions to start.

Looking deeper into the log, it appears that QRecall is trying to install the schedule as a daemon (so that it runs while you're logged out), but your configuration assumes that runs only while you're logged in. So I suspect QRecall's configuration and/or installation are confused:

Try this:

Open your QRecall preferences, go to the Authorization tab, change the option that says "Start and run actions while logged out", wait a few seconds, and change it again. This will cause QRecall to uninstall and reinstall the scheduler (twice). I suspect that you want this option off as Mac OS X doesn't mount volumes while you're logged out, so trying to run actions while logged out doesn't do any good.

Now, log out and back in (or restart) and send me another diagnostic report. I'll be able to tell if the scheduler gets restarted when you log back in.

Thanks,

James

Charles,

It might be a event bug, the scheduler wasn't starting correctly, or the action just isn't showing up in the activity window.

The easiest way to begin diagnosing this would be to send a diagnostic report (Help > Send Report...). When you send the report, please note the approximate date/time that the the volume was mounted and the action didn't run.

Christian Roth wrote:I thought the compact action was atomic and once force-cancelled would either leave the archive in inconsistent state (requiring a re-index or repair) or have not achieved anything.

Stopping a compact won't reduce its overall size, but in the interim space in the archive is being consolidated by packing data blocks together. Think of sliding beads on a string; you can't cut the string and make it shorter until you move all the beads to one end, but you can move some beads, stop, and come back later to move more beads, until you're done.

Christian Roth wrote:I once did a compact action on that large archive and it took (I think) about 30 hours.

See my note about stopping and restarting compact actions. Compacting should never get in the way of being employed.

Christian Roth wrote:When I combine this with the possible reduction in size of the archive, I am now willing to give that strategy a try.

If you're going to go for it, I'll make a couple of points:

Turning on compression only compresses new data until you perform a compact.

An automatic compact won't recompress anything until the archive has at least 4% of empty space, but you can force that by starting a compact manually from the QRecall application.

Recompressing a terabyte of data over a slow network connection might take weeks. Fortunately, compacts are incremental and you stop them and restart them later. I'd suggest setting up a compact action the runs every night and set it to stop after running for 4 or 5 hours. Also set the QRCaptureFreeSpaceSweep option, at least temporarily. Eventually the compaction will work its way through your entire archive and it will be completely compressed.

In setting up the above, you might also consider leaving the capture level compression off and setting the compact compression high. Compression will only be performed during the compact action, improving the performance of new-data captures.

Turning off compression later will never uncompress any of the compressed data in the archive. So you can't undo this if it doesn't work out.

OK, one more suggestion.

This is a bit radical, but you might consider it. I do something similar with my system here (for completely different reasons, but that's another story).

Set up two archives on your NAS: A compressed archive for long term backups, and an uncompressed one for short term backups.

Capture to the short term archive every day, or even multiple times a day. Set up a rolling merge to keep only the last 14 days worth of layers.

Capture to the long term archive once a week.

If you're keeping archive data for a long period of time, the sheer bulk of your existing archive is probably the biggest drain on your performance. By keeping a short term archive that never grows beyond 100GB and a dozen layers (or so), you'll see significant improvements in incremental captures.

Just an idea...

Christian,

Having said that about the performance of compression, you might consider changing your QRCaptureFreeSpaceSweep setting. This will cause the archive to grow more quickly when capturing, but does improve the performance of the capture. You'll want to schedule an occasional compact action to recover the free space now being ignored by the captures.

Christian,

I can imagine that there might be a situation where the access to the archive volume was so slow (maybe 802.11b?) that compression would improve performance, but the difference between processing power and I/O speed would have be to extreme.

In practice, adding compression invariably reduces performance. I did a few tests this morning just to confirm (since I haven't benchmarked compression in awhile). I set up a capture from a brand-new eight-core Xeon Mac Pro networked to a MacMini (PPC) with an external Firewire drive. The transfer rate between the Mac Pro and the Mini tops out at about 10 MB/sec. Not terribly slow, but certainly not fast.

I captured a reasonable number (9,000) of fairly compressible files (mostly word processing documents and uncompressed TIFF images used for publication). The difference between using an uncompressed and a highly compressed archive were:

Uncompressed archive: 1.7GB
Compressed archive: 687MB (about a 60% reduction in size)

Initial capture: 04:35 (uncompressed), 06:52 (compressed) - 50% slower
Recapture (100% duplicate): 03:32 (uncompressed), 02:10 (compressed) - 80% faster

So in the case where all the data was being recaptured, a really fast computer was able to take advantage of the slightly smaller data to outperform the uncompressed archive. But in almost all other situations, the amount of work required to compress and uncompress the data far outweighs the slight advantage of a smaller data transfer. If you average these figures (assume that most captures are a combination of new and duplicate data), the performance is about the same. And if you work with data the resists compression (video, audio, most images), then there will be no advantage at all.

In conclusion, you're not likely to see any increase in performance by increasing your compression level, and you could see significant decreases. The bottom line is that compression saves space, not time.