QRecall

Ralph,

Thanks for letting us know. We've been having a horrible time with OS X Server and our secure SSL certificates.

Try it now and see if things are better.

Jan,

I'm guessing you're still using QRecall 1.2.x. That version doesn't work so well on El Capitan (OS X 10.11).

You need to hop on the QRecall 2.0 beta bandwagon.

QRecall 2.0 will want to make some small adjustments to your archive's settings and actions, so if at all possible have all of your QRecall archives mounted and reachable when you launch QRecall 2.0 the first time. (Don't worry if this isn't practical; QRecall will keep trying each time you launch it.)

Jack,

I'm really sorry to hear you've had so much trouble. The sad fact is that QRecall 1.2 is not fully compatible with Yosemite, and exhibits serious bugs on El Capitan.

QRecall 2.0 addresses all of these issues. While technically still in beta (hammering out the last few bugs), it's pretty stable and much more compatible with 10.10 and 10.11. Here's what I'd suggest:

Download and install the latest QRecall 2.0 beta.

Open the archive with 2.0 and restore your Yosemite volume. QRecall 2.0 can transparently read an archive created with 1.2.

Because QRecall 1.2 doesn't completely capture the Yosemite system files correctly, reinstall Yosemite using your Install OS X Yosemite app, over the volume you just restored. This will replace all of your system files with a fresh copy.

That should get you back on your feet. QRecall 2.0 should see a final release candidate in the next few weeks.

No apology necessary. It's a common problem, and I'm still trying to design an interface that makes this situation more obvious.

Thanks for the feedback.

Tim,

I definitely know what your problem is. You're still running QRecall 1.2.3. QRecall 1.2 is not fully compatible with Yosemite (OS X 10.10) and exhibits a number of serious problems in El Capitan (OS X 10.11).

QRecall 2.0 is still in beta, but is stable and very close to a final release. You can jump on the beta bandwagon now if you like (http://www.qrecall.com/download/) or wait until 2.0 is officially released, at which time it will show up as an automatic update for 1.2.

Just be aware that QRecall 1.2 is incapable of correctly capturing an OS X 10.11 system or its applications, although it should do an adequate job of capturing all of your user documents. Everything you have captured with 1.2 can be restored by QRecall 2.0.

Tim,

Sorry to hear you're having trouble. First of all, did you try restoring your actions from your QRecall archive? They're just documents in the ~/Library/Preferences/QRecall/Actions folder. If they just got accidentally removed, that would at least restore them.

I'll look into the problem with the assistant. In the meantime, please send a diagnostic report (Help > Send Report) so I can check other details, like what version of QRecall you're using and so on.

My bad.

The log entries are properly sorted; the problem is that the dates are wrong. What's happened is that the timestamps in the last week of December have the wrong year, 2016-12-27 instead of 2015-12-27, which puts them far into the future.

The date formatter used to create the log entries was erroneously using the "YYYY-MM-dd..." format, when it should have been using "yyyy-MM-dd..." format. The YYYY specifier is the "year designation used in ISO year-week calendar as defined by ISO 8601", whatever the hell that is. Anyway, it's not the calendar year.
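To illustrate the difference, here is a minimal Python sketch, not QRecall's actual code. It assumes the formatter applied US-style week rules (weeks start on Sunday, and the week containing January 1 is week 1 of the new year), which would explain why 2015-12-27 came out as 2016-12-27. The names `week_based_year` and `format_stamp` are made up for this example.

```python
from datetime import date, timedelta

def week_based_year(d: date) -> int:
    """Year that "owns" the week containing d.

    Assumed convention: weeks start on Sunday and the week that
    contains January 1 is week 1 of the new year.
    """
    jan1_next = date(d.year + 1, 1, 1)
    # weekday(): Monday=0 ... Sunday=6, so this counts days back to Sunday.
    days_since_sunday = (jan1_next.weekday() + 1) % 7
    first_week_start = jan1_next - timedelta(days=days_since_sunday)
    return d.year + 1 if d >= first_week_start else d.year

def format_stamp(d: date, use_week_year: bool) -> str:
    """Mimic "YYYY-MM-dd" (week-based year) vs. "yyyy-MM-dd" (calendar year)."""
    year = week_based_year(d) if use_week_year else d.year
    return f"{year:04d}-{d.month:02d}-{d.day:02d}"

print(format_stamp(date(2015, 12, 27), use_week_year=True))   # 2016-12-27
print(format_stamp(date(2015, 12, 27), use_week_year=False))  # 2015-12-27
```

Only the handful of days that share a week with January 1 are affected, which is why a bug like this can go unnoticed for most of the year.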

I've fixed it, but you'll have to wait until your log records get rotated out before they disappear. If you find it super annoying, just trash all of the log files in ~/Library/Logs/QRecall and restart. If you want to fix this without losing your log history, you'll have to get tricky. You could use an editor to replace all instances of "2016-12-" with "2015-12-", for example.
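If you'd rather script the search-and-replace than do it in an editor, something like the following Python sketch could work. It assumes the log files are plain text and that none of them contain genuine December 2016 entries; the function names are invented for this example, and it keeps a `.bak` copy of every file it changes.

```python
import shutil
from pathlib import Path

WRONG = b"2016-12-"   # year written by the buggy formatter
RIGHT = b"2015-12-"   # year the entries should have had

def fix_years(data: bytes) -> bytes:
    """Replace every wrongly dated prefix; everything else is untouched."""
    return data.replace(WRONG, RIGHT)

def fix_log_file(path: Path) -> bool:
    """Rewrite one log file in place. Returns True if it was changed."""
    original = path.read_bytes()
    fixed = fix_years(original)
    if fixed == original:
        return False
    # Keep the untouched original next to the rewritten file.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(fixed)
    return True
```

You would call `fix_log_file` on each file in ~/Library/Logs/QRecall. Try it on a copy of the folder first.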

Well, that was four days I really wish I could have back.

Regardless, I now have a solution to the keychain access problem for privileged actions (specifically, capture). It turns out there are some tricky issues with running code outside the user's login session that prevent it from accessing the user's security information, as they should.

The problem was, this is one of those situations where an outside process really did need to access your user's private information and the OS X security framework was, naturally, not inclined to make this easy.

The fix for this problem will appear in the next release.

Bruce Giles wrote: So, everything appears to be back to normal, but is there anything you can do in QRecall to keep from generating so many error messages so quickly? Or is that the OS itself that was generating those?

This is 100% OS X. QRecall installs a number of "services" that are managed via launchd. Each service is defined by a configuration file (the .plist files in ~/Library/LaunchAgents for example). This is entirely set-and-forget; QRecall writes these configuration files and then expects launchd to manage the processes.

I'm actually surprised you had this behavior. In most circumstances, launchd will throttle a misbehaving service to keep it from chewing up all of the available resources. So if a service doesn't start, or starts but immediately terminates, launchd will ignore it for a while before trying again. But 300 times a second is pretty much the opposite of "ignoring."

I'm going to file a bug report on this one, because it's clearly a simple misconfiguration that leads to serious problems.

For future reference, if you had just launched QRecall once it would/should have fixed the problem. When you launch the QRecall app, it tries to establish a connection with the scheduler (that launchd is supposed to keep running). If it can't, it then checks to see if the scheduler is properly installed. It should have discovered that the scheduler .plist file was misconfigured and rewritten it. That might not have immediately solved the problem (because launchd tends to cache the .plist info), but the next restart should have stopped it.
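As a rough illustration of that check-and-rewrite step, here is a Python sketch. It is not QRecall's code: the label, program path, and throttle value are placeholders, and the only launchd-specific parts are the standard `Label`, `ProgramArguments`, `KeepAlive`, and `ThrottleInterval` keys.

```python
import plistlib
from pathlib import Path

def expected_agent(label: str, program: str) -> dict:
    """Build the configuration we expect launchd to be given."""
    return {
        "Label": label,
        "ProgramArguments": [program],
        "KeepAlive": True,          # ask launchd to keep the service running
        "ThrottleInterval": 10,     # placeholder: seconds between relaunches
    }

def ensure_agent_plist(path: Path, expected: dict) -> bool:
    """Rewrite the .plist if it is missing, unreadable, or different.

    Returns True if the file had to be (re)written.
    """
    try:
        with path.open("rb") as f:
            current = plistlib.load(f)
    except Exception:
        # Any read or parse failure counts as "misconfigured" here.
        current = None
    if current == expected:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        plistlib.dump(expected, f)
    return True
```

As noted above, rewriting the file is only half the job: launchd may keep using its cached copy until the service is reloaded or the system restarts.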

Had I not noticed the problem when I did, I can imagine that I would have run out of disk space fairly soon, and I hear that OS X gets really hard to work with when it runs out of disk space.

The word you're looking for is "nightmare." HTH

Gary,

Thanks for the info. I found the problem, which was an off-by-one error. When you set the keep count to 1, it was actually getting set to 0 which freaked out the scheduler.

I've fixed it, but more importantly if the log rotation schedule is out of bounds the scheduler now simply logs it, picks a valid value, and keeps running. I have no idea what possessed me to make that a fatal error.
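In spirit, the new behavior looks something like this Python sketch. The bounds and names are invented for illustration and are not the values QRecall uses.

```python
import logging

log = logging.getLogger("scheduler")

# Hypothetical limits for how many rotated logs to keep.
MIN_KEEP, MAX_KEEP = 1, 100

def valid_keep_count(requested: int) -> int:
    """Return a usable keep count instead of treating a bad one as fatal."""
    clamped = max(MIN_KEEP, min(MAX_KEEP, requested))
    if clamped != requested:
        # Log the problem, pick a valid value, and keep running.
        log.warning("log keep count %d out of bounds; using %d",
                    requested, clamped)
    return clamped
```

With this approach a stray 0 (the off-by-one case) simply becomes 1 and leaves a note in the log.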

Ralph Strauch wrote: I don't think the problem was a bad drive, since it seemed to affect both drives equally during the period it was happening.

I hope I didn't imply that there was definitively something wrong with your drive. I meant to emphasize that the failures are all "drive related," meaning that they indicate a drive that has spontaneously gone offline, unmounted, or simply stopped responding. If the drive is being accessed via a network or file server, there are lots of moving parts between your system and the actual hard drive that can cause these symptoms.

I did seem to be having network problems, which definitely contributed. Some of those involved scheduled backups run automatically at night when both machines were otherwise asleep ...

And I think this is the most likely explanation. A dropped network, a file server that goes to sleep while it's still being used, or a computer that's running code while it's asleep (power nap issue) would all cause the kinds of errors you see in the log.

I am still concerned about the seven failures cited above, which all appear to occur as the archive attempted to close immediately after writing the backup to the archive and the backup statistics to the log. The timing of these failures, occurring at a specific well-defined point in a long ongoing process, makes it hard to attribute them to a source external to that process, like a mechanical or network failure.

Except for the other evidence.

In between the failures you highlighted are identical failures that occur before the archive begins to close, and a slew of successful captures without any problems at all. I think the high number of failures occurring while closing the archive is simply because that's when the greatest amount of archive activity begins. Most (incremental) captures spend most of their time reading the local hard drive looking for changes. It's when the archive is about to be closed that things get busy, and if there's a problem reading from or writing to the archive, that's when it's statistically most likely to happen.

I'm curious -- what app are you pasting the log entries from, to get that formatting and the line numbers?

The forum has various tags you can use to format text, like [b]bold[/b], [i]italic[/i], and [u]underline[/u]. You can also surround a line or block of text with [code] and [/code] tags, and it will be formatted like the log listing in my earlier post. Just select the block of text and click the "Code" button above the post entry field.

My bad.

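I was trying to give you a "safer" command to use, but failed shell expansion rules 101: the shell expands the * wildcard under your own account before sudo ever runs, and your account normally can't read the contents of /.fseventsd, so rm never sees the actual file names.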

Here's the command:

sudo rm -rf /.fseventsd

Please don't mistype this one; one errant space and you could wipe your entire drive.

Alexandre Takacs wrote: (that's me canceling after 7h)

You have a lot more patience than I do.

The "Locating changes" phase is playing back the filesystem event log to determine what folders contain changes. QRecall requests a playback of changes and then waits to receive them. This should take, at most, a minute, and usually only takes a few seconds.

I suspect something went wrong: OS X either never started sending the events or started and never finished, and QRecall was left waiting forever.

This might be a transient problem. First, try restarting your system and try again.

If it happens again, you might consider resetting your filesystem history on Macintosh HD. There's an official API for doing this programmatically, but I don't know of any utilities you can download that will do that for you.

The "unsupported" method is to simply wipe the filesystem history for your volume and restart your system. Open up a Terminal window and execute the following command:

sudo rm /.fseventsd/*

The command will prompt you for your admin password. After it finishes, immediately restart your system.

Be very careful about entering this command exactly as shown. The sudo rm command can do some serious damage if incorrectly used.

One of those should fix your problem. If it doesn't, I suspect a deeper issue with OS X.

Ralph,

Here's a random sampling of a few capture issues I found in the logs you sent me:

2015-11-08 11:12:38.071 -0800 ------- Capture to 3rd backup.quanta
2015-11-08 11:38:12.467 -0800 Failure Problem closing archive
2015-11-08 11:38:12.468 -0800 Details ErrDescription: Operation timed out

2015-11-10 07:54:44.131 -0800 ------- Capture to 3rd backup.quanta
2015-11-10 08:46:52.575 -0800 Failure Could not capture file
2015-11-10 08:46:52.575 -0800 Details archive I/O error
2015-11-10 08:46:52.575 -0800 Details Cause: <IO> cannot read hash page(s) { ErrDescription='Operation timed out', POSIXErr=60, Position=5731762176, API=pread, Path='/Volumes/BUD3/3rd backup.quanta/hash.index', Length=8192 }

2015-11-11 03:54:28.850 -0800 ------- Capture to 3rd backup.quanta
2015-11-11 03:56:06.984 -0800 Failure Problem closing archive
2015-11-11 04:55:24.154 -0800 Details ErrDescription: Operation timed out

2015-11-11 06:24:16.879 -0800 ------- Capture to 3rd backup.quanta
2015-11-11 07:25:35.814 -0800 Failure Could not capture file
2015-11-11 07:25:35.814 -0800 Details failed to write envelope header
2015-11-11 07:25:35.814 -0800 Details ErrDescription: Device not configured

2015-11-19 03:37:19.406 -0800 ------- Capture to 2nd backup.quanta
2015-11-19 04:39:06.269 -0800 Details cannot read envelope content length
2015-11-19 04:39:06.269 -0800 Details ErrDescription: Device not configured

2015-11-30 08:56:33.900 -0800 ------- Capture to 2nd backup.quanta
2015-11-30 09:11:35.848 -0800 Details problem closing file
2015-11-30 09:11:35.848 -0800 Details ErrDescription: Operation timed out

2015-12-06 03:00:01.518 -0800 ------- Capture to 3rd backup.quanta
2015-12-06 03:27:38.240 -0800 Details problem closing file
2015-12-06 03:27:38.240 -0800 Details ErrDescription: Operation timed out

2015-12-10 10:35:35.671 -0800 ------- Capture to 3rd backup.quanta
2015-12-10 10:44:21.337 -0800 Details cannot read envelope content length
2015-12-10 10:44:21.337 -0800 Details ErrDescription: Device not configured

There are a couple of things to note. First, all of the errors are either "Operation timed out" or "Device not configured." These are not file content errors, media errors, file structure errors, or volume structure errors. These errors indicate that the storage device has gone offline or, in the case of a remote connection, that the connection to the server has been lost.
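If you want to run the same tally over your own logs, a small Python sketch like this would do it. It assumes log lines shaped like the excerpts quoted in this post (an `ErrDescription` followed by either `: text` or `='text'`); the function name is made up for the example.

```python
import re
from collections import Counter

# Matches both "ErrDescription: Operation timed out" and
# "ErrDescription='Operation timed out', POSIXErr=60, ..."
ERR = re.compile(r"ErrDescription(?::\s*|=')([^',}\n]+)")

def count_error_descriptions(lines):
    """Count how often each ErrDescription appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = ERR.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts

sample = [
    "2015-11-08 11:38:12.468 -0800 Details ErrDescription: Operation timed out",
    "2015-11-11 07:25:35.814 -0800 Details ErrDescription: Device not configured",
    "2015-11-10 08:46:52.575 -0800 Details Cause: <IO> cannot read hash page(s) "
    "{ ErrDescription='Operation timed out', POSIXErr=60 }",
]
print(count_error_descriptions(sample))
```

If only timeout and device-offline descriptions show up, that points at the connection to the drive rather than at the archive's contents.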

The other thing that points to this not being an issue with this particular archive or its volume structures is that these events are the exception. There are scores of successful captures interleaved between these failures. If the archive was corrupted, the media unreliable, or the volume directory structure was damaged, these other captures would have run into the same problems, but they didn't.

I still suspect that the drive containing the archive is either spontaneously going off-line or unmounting, or the network connection to the device or server is timing out, disconnecting, going to sleep, shutting down, going off-line, etc.

Jeff,

This is turning out to be quite the mystery.

It looks like QRecall is doing everything right, but it's still not working. I've posted the problem in the Apple developers forum in hopes of discovering a solution.

Here's what's happening. When you add a password to the keychain, it's stored there along with a list of applications that are allowed to freely access it. This list includes the QRecall application and the QRecallHelper process.

When the QRecall application asks for the password, it gets it. That's why you can open the archive in a browser window.

Most other actions, like capture, are performed by the QRecallHelper process. When it asks for the password, the keychain says no such record exists. That's why your capture (and most everything else you try) won't run.

If you open the keychain record for your archive encryption password, you'll see that QRecall and QRecallHelper are both listed as trusted apps, but for some reason it's not working. (I suspect one of the recent security updates, but it's too soon to tell.)

For now, I suggest removing the password and storing your encryption key in plain text. Not quite as secure, but your archive data will still be encrypted.