QRecall

Gary K. Griffey wrote:The error did occur again, however, this time, the verify task did not remain in "running" status.

Thanks, Gary. This definitely gets a little closer to the problem. I suspect, however, that the code to debug the problem is interfering with reproducing it.

Here's a new version to try. Again, drop this in and send a diagnostic report after the problem happens again.

QRecall 1.2.0a76

Thank you for your patience.

Odd indeed.

Try this: Open a terminal window and use the ls -la@ command to list the items in that directory and their extended attributes.

ls -la@ <path to folder>

If the .dmg file lists a com.apple.FinderInfo attribute, try to delete it:

xattr -d com.apple.FinderInfo <path to file>

Now try the capture again and see what happens.

Gary,

Thanks for all of the useful info.

It appears that the QRecall action process (in this case the verify) and the scheduler are struggling with a race condition.

When an action is finished, it sends a "stopped" message to the scheduler. When the scheduler receives this message it knows the action is finished and can schedule the next one to start.

What's happening on your mini is this: the action finishes, sends a "stopped" message, and terminates. But the scheduler doesn't get that message right away. In fact, it doesn't appear to get it for a couple of minutes, long after the process has terminated. What happens after that is a confused mess of messages and communication errors, some of which get processed and some of which don't, leaving the scheduler thinking the action is still running, which it clearly is not.
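To make the failure mode concrete, here is a toy model of it. This is only a sketch and not QRecall's code: the ToyScheduler class, its method names, and the "archive" key are all made up for illustration. The one thing it takes from the description above is that the "stopped" message is the only way the scheduler learns an action has finished.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical scheduler that trusts only the messages it receives."""
    busy: dict = field(default_factory=dict)     # archive -> action using it
    waiting: list = field(default_factory=list)  # (archive, action) held back

    def request_start(self, archive: str, action: str) -> str:
        # An action may start only if no other action holds the archive.
        if archive in self.busy:
            self.waiting.append((archive, action))
            return "waiting"
        self.busy[archive] = action
        return "running"

    def on_stopped(self, archive: str, action: str) -> list:
        # The "stopped" message is the only thing that frees the archive.
        if self.busy.get(archive) == action:
            del self.busy[archive]
        # Start whatever was held for this archive, in arrival order.
        started, still_waiting = [], []
        for held_archive, held_action in self.waiting:
            if held_archive == archive and archive not in self.busy:
                self.busy[archive] = held_action
                started.append(held_action)
            else:
                still_waiting.append((held_archive, held_action))
        self.waiting = still_waiting
        return started


def simulate(stop_message_delivered: bool) -> dict:
    scheduler = ToyScheduler()
    scheduler.request_start("archive", "verify")
    # The verify process finishes and exits here. Whether the scheduler
    # finds out depends entirely on the message arriving.
    if stop_message_delivered:
        scheduler.on_stopped("archive", "verify")
    status = scheduler.request_start("archive", "capture")
    return {"capture": status, "believed_running": dict(scheduler.busy)}
```

With simulate(True) the capture starts right away. With simulate(False) the capture is left waiting and the model still believes the verify is running, even though the process is long gone, which matches the symptom in the activity window.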

In an attempt to untangle what's going on, I've built a special alpha version of QRecall that logs a lot more information about the state of actions, the "stopped" message handling, and communication link errors. Please install it, wait for this to happen again, and send me another diagnostic report.

QRecall 1.2.0a73

For future reference, there's also another way around this problem. An action in the activity window that is being held by the scheduler for other actions to finish can be started anyway by right-clicking (or click and hold) on the action's stop/menu button and choosing the Ignore Hold and Run command. This causes the action to ignore the scheduler's suggestion to wait and starts execution immediately.

Gary K. Griffey wrote:First, the scheduled Action that is active when the "Lost Connection with Helper" occurs does indeed complete successfully. However, the Actions window continues to show it as "Running"...and subsequent scheduled Actions that use the same archive just say "Waiting" in the Monitor window...and they never run....

Good to know. That tells me that, not only did the monitor process lose its connection with the running action, but that something has happened to the scheduler too.

My fix is to reboot the mac.

That will certainly do it.

I do believe that I tried to just kill the Scheduler process once...and it did seem to change the Actions window from the "Running" status...but I was not sure if that would cause other issues...so...I am simply rebooting.

As a general rule, all QRecall processes will respond gracefully to a Quit request (not a Force Quit, just a regular Quit). When quit, the scheduler and monitor processes should automatically restart.

One other odd symptom is that you only see the "Lost Connection" verbiage in the log..whereas normally, I would see it in the Monitor window. I'm not sure if that helps you or not.

It might. The "lost connection" appears in the activity window when the monitor process loses its connection. Other processes (like the scheduler) don't have a UI, so they just log the problem.

I will run the sample.sh script the next time this occurs...and send a diagnostic report.

At this point that's what I'm most interested in seeing—the state of the scheduler after this happens.

James Bucanek wrote:Broken communications pipes is a known problem in Lion, but I've rarely seen it in earlier versions of OS X.

Hah, wouldn't you know that not an hour after I posted this my OS X Server (running 10.6) got a "Lost communications" error.

But it was just that—a loss of communications between the two processes. The log says the capture completed successfully, and the actions window shows the action is no longer running. So not quite the same situation.

Gary K. Griffey wrote:The real problem here is that when this connection to the helper is lost...the scheduled Action that was executing remains in "Running" status in the Action window...and therefore, subsequent scheduled Actions never run because they are "waiting" for the archive.

That really doesn't make much sense to me.

If the monitor process loses communications with the helper process it gets logged as a "Lost communications with Helper" error. This is often because the helper process crashed, but it can also mean that the communications pipe between the two processes is broken and the helper process is just fine. Broken communications pipes are a known problem in Lion, but I've rarely seen them in earlier versions of OS X.

The "running" status in the actions window comes from the scheduler. If the scheduler thinks the process is still running, then either the scheduler is stuck or the action really is still running (which implies that the communications pipe between the scheduler and the action is still valid). So either the action has stopped and the scheduler is confused, or the action is still running and the monitor is confused. It seems unlikely that both of those would be true, which is why I'm confused.

Thus, the machine has to be manually checked every day to see if the condition has occurred and further actions are waiting.

When this happens, what do you do?

Also, I'd very much like to get a sample of your QRecall processes and a diagnostic report. The next time this happens, please do the following:

(This assumes that you've upgraded to 1.2.0b69 or later.)

1. Open the Terminal application.
2. Enter the command below and press Return:

/Applications/QRecall.app/Contents/Resources/sample.sh

3. Enter your administrator's password.
4. When the sample.sh script is finished, open the QRecall application.
5. Choose Help > Send Report...

Gary K. Griffey wrote:The volume being captured in this case is an SMB share on a Windows server machine

Try copying the file to a local Mac volume, delete the file on the server, and then copy it back.

Gary K. Griffey wrote:errno: 93

Error 93 is ENOATTR (Attribute not found). It means that QRecall got a list of all of the extended attributes for the file, but when it went to read them one of them wasn't there.

This is either a transient problem, which won't repeat and can be ignored, or something else is odd with the volume structure. See if it happens again. If it does, try repairing the volume and capture again. If that doesn't resolve it, send a diagnostic report.
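For anyone curious what "listed but not there" looks like in code, here is a minimal sketch. It is not how QRecall reads attributes. The list_names and read_value callables are hypothetical stand-ins for whatever the platform provides, and the attribute name com.example.note is invented for the demonstration. The value 93 for ENOATTR is the one from the log above.

```python
ENOATTR = 93  # errno reported in the log above ("Attribute not found")


def read_all_attributes(path, list_names, read_value):
    """Read every extended attribute of `path`, tolerating vanished ones.

    `list_names(path)` and `read_value(path, name)` are hypothetical
    callables. Returns (attributes, missing_names).
    """
    attributes = {}
    missing = []
    for name in list_names(path):
        try:
            attributes[name] = read_value(path, name)
        except OSError as err:
            if err.errno != ENOATTR:
                raise  # anything else is a real failure
            # The name was listed a moment ago but cannot be read now.
            missing.append(name)
    return attributes, missing


# Fake volume for demonstration: one attribute is listed but unreadable.
def fake_list(path):
    return ["com.apple.FinderInfo", "com.example.note"]


def fake_read(path, name):
    if name == "com.apple.FinderInfo":
        raise OSError(ENOATTR, "Attribute not found")
    return b"hello"


attributes, missing = read_all_attributes("disk.dmg", fake_list, fake_read)
```

Here the file's data and its readable attribute still come through, and the unreadable one is merely noted, which is roughly why this kind of problem is reported as a caution rather than a failure.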

It is only a "Caution" so I would imagine it could be ignored?

In general, failing to capture extended attributes isn't a serious problem because they typically don't contain information critical to the use of the file.

QRecall 1.2.0b70 is ready. Upgrade and see if these problems are resolved.

Gary K. Griffey wrote:This seems to be related to beta 69

It is.

It's a side effect of the change to how Finder metadata is captured. The change results in all items that have Finder metadata stored in a com.apple.FinderInfo extended attribute being recaptured in 1.2.0b69.

But this change broke the capture logging. The code that makes the capture decision and the code that tries to figure out what changed and log it are different blocks of code. The finder metadata change in b69 causes the files to be recaptured, but then the code that logs the difference can't find any difference (because, logically, there isn't any). So it logs "unknown".
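As a rough illustration of how two separate blocks of code can disagree like this, consider the sketch below. It is purely hypothetical: the ItemRecord fields (in particular finder_info_source) and both functions are invented, and they say nothing about how QRecall actually stores or compares items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRecord:
    size: int
    modified: float
    finder_info: bytes
    finder_info_source: str  # hypothetical: where the metadata was read from


def needs_recapture(old: ItemRecord, new: ItemRecord) -> bool:
    # Decision code: compares every field, including how the Finder
    # metadata was obtained.
    return old != new


def describe_change(old: ItemRecord, new: ItemRecord) -> str:
    # Logging code: written separately, and only checks the fields a
    # user would recognize.
    reasons = []
    if old.size != new.size:
        reasons.append("size")
    if old.modified != new.modified:
        reasons.append("modified date")
    if old.finder_info != new.finder_info:
        reasons.append("Finder info")
    return ", ".join(reasons) if reasons else "unknown"


before = ItemRecord(10, 1.0, b"\x00", "catalog")
after = ItemRecord(10, 1.0, b"\x00", "xattr")
```

For this pair needs_recapture(before, after) is True, yet describe_change(before, after) has nothing to report and falls back to "unknown".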

Straightening this all out has revealed a far more subtle bug that would cause some items to be unnecessarily recaptured. I'm testing the bug fixes now and a new version should be out in a few hours.

Gary K. Griffey wrote:I know I asked about this before...but I just wanted to follow-up. Is there anything in the works that would allow me to run a QRecall action from an Automator or Apple script?

I currently have some kind of AppleScript/Automator support on the to-do list, but it's still on the back burner.

Much more immediate is work that's being done to create a QRecall command-line tool. This already exists, in a primitive form, and is currently being used for development, testing, and diagnostics. The goal is to polish it up and turn it into a full-featured command-line tool that could be invoked via shell scripts or Automator.

Steven Arnold wrote:I'd like to remove all copies of a specific file from all layers of my backups. How do I do this?

Open the archive and select the item(s) you want to delete.
Choose Archive > Delete Item...

David Ramsey wrote:I think I've got it: created one action to handle the backups and another to handle the rolling merges.

Correct.

It's also recommended that you schedule compact and verify actions to run periodically (approximately once a week). The compact action performs housekeeping, consolidation of unused space, and optimization. A verify action confirms the data integrity of the entire archive and will alert you of any data corruption that might have occurred.

David Ramsey wrote:Apparently there is a QRecall manual. If so, where can I get it?

In the QRecall application, choose Help > QRecall Help. The QRecall help was recently rewritten in its entirety, so feedback is welcome.

So far I'm impressed with QRecall running on Lion.

We're very glad to hear that.

Based on what I've seen so far with my QRecall trial, the advantages Time Machine has are:

1. Somewhat simpler U.I. (Of course it does less, too, but the QRecall interface can still be a little daunting, especially when you open a folder with a lot of files and see all those "timeline lines". It's kind of complex visually.)

I'm compelled to note that you can turn timelines on and off as desired (View > Show/Hide Timelines).

2. Its deep integration with OS X means you can do things like recover individual Mail messages. And you have the option of restoring directly from a Time Machine backup when you install OS X.

Absolutely. Apple can do things with Time Machine that third party developers can't touch. Apple has even changed how the HFS filesystem works, just to make Time Machine's job easier.

Apple's overriding goal with Time Machine, however, is simplicity. Which is great. We think everyone should be making backups, and anything that makes that easier is welcome. QRecall's goals are data integrity and efficiency, and we think there's room in the universe for both approaches.

3. Time Machine consolidates backups (i.e. hourly backups for the last 24 hours, daily backups for the last week...) and automatically deletes older backups as space is needed.

...

For #3, the backup consolidation doesn't seem necessary with QRecall since the incremental "quantized" backup means the individual backups are typically much smaller. But what happens when the backup disk fills up? Do I delete older backups manually somehow?

QRecall can/will do exactly the same thing (it's called "rolling" your incremental backups). The difference between QRecall and Time Machine is that in QRecall you can specify exactly the scope, frequency, and granularity of the roll. And like Time Machine, you can also make these actions conditional on the amount of free disk space available (or not). Again, you get to decide the strategy that makes sense for you.
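To give a feel for what "scope and granularity" mean here, the sketch below picks which points in time would survive a roll under a tiered policy like the hourly/daily one described above. It is a generic illustration and an assumption on my part, not QRecall's rolling merge: the layers_to_keep function and the tier format are invented, and it only selects timestamps without merging any data.

```python
from datetime import datetime, timedelta


def layers_to_keep(layer_times, now, tiers):
    """Pick which layers survive a roll.

    `tiers` is a list of (max_age, granularity) pairs, youngest first.
    Within each tier only the newest layer in every granularity-sized
    slot is kept; layers older than the last tier are not kept.
    """
    newest_in_slot = {}
    for stamp in sorted(layer_times):
        age = now - stamp
        for index, (max_age, granularity) in enumerate(tiers):
            if age <= max_age:
                slot = (index, age // granularity)
                newest_in_slot[slot] = stamp  # later stamps overwrite earlier
                break
    return sorted(newest_in_slot.values())


now = datetime(2012, 3, 10, 12, 0)
tiers = [
    (timedelta(hours=24), timedelta(hours=1)),  # hourly for the last day
    (timedelta(days=7), timedelta(days=1)),     # daily for the last week
]
layers = [
    now - timedelta(minutes=10),
    now - timedelta(minutes=40),
    now - timedelta(hours=3),
    now - timedelta(days=2, hours=1),
    now - timedelta(days=2, hours=5),
    now - timedelta(days=30),
]
kept = layers_to_keep(layers, now, tiers)
```

In this example the two layers from the last hour collapse to the newer one, the two from two days ago collapse to the newer one, and the month-old layer falls outside every tier. Changing the tiers list is the equivalent of adjusting the scope and granularity of the roll.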

I'd suggest using the Capture Assistant (Help > Capture Assistant) to create a backup strategy. The assistant will create actions that "roll" your incremental backups on a regular schedule. You can then open those actions and see how they are set up and adjust them as you see fit.

To read about automating QRecall in general, see the Guide > Automation > Actions section in the QRecall Help. To learn more about rolling your incremental backups, see the Guide > Automation > Actions > Rolling Merge section.

Gary K. Griffey wrote:Report sent..

Thanks for the report.

Looking at the problem, it appears the helper process is crashing when it tries to load one of the application preferences (which is very strange). This might turn out to be a problem with the Mountain Lion beta, but I'll see if I can find a workaround.