QRecall

Chris Caouette wrote:My three macs have been running QRecall happily for several months now but I started wondering when we might see an update with any new features etc.

Chris,

Thanks for posting. My original plan was to add some of the many planned features to QRecall this spring and summer.

Unfortunately (for QRecall) I was sidetracked by two book deals. I'm currently writing Learning Objective-C for Java Developers to be published by Apress, and Professional Xcode 3 for Wiley Publishing. These two projects are currently consuming 98% of my available time, forcing QRecall development to the back burner for the moment.

I do have a dot release planned to fix a number of minor bugs and provide some performance improvements, pending the resolution of two very hard-to-reproduce bugs.

Once the books are behind me, I'm planning on full-time QRecall development—and I've got about a billion new features I want to implement.

In the meantime, please stay tuned. Let me know if you're having any problems, and don't hesitate to send suggestions or open a discussion on new features.

David Kaminsky wrote:If I start the backup again will the incomplete files (if QRecall doesn't finish the file it was copying when cancelled) be recopied?

Yes. QRecall always captures every file and folder that differs from what's in the archive. So the next capture will capture everything the incomplete layer missed, along with anything else that might have changed since the incomplete layer was captured.

Do I need to merge the last 2 layers?

You can, if you like. Merging any earlier layer with a complete layer leaves a complete layer. I prefer to merge incomplete layers with complete ones to avoid any "confusion" in the future. But if you choose to keep the incomplete layer for some reason, it won't affect how QRecall behaves.

I assume the archive files are treated as just files and can be copied themselves and put back in place if needed.

The archive itself is a package (a directory of files that acts as a single file in the Finder). It can be copied, moved, renamed, synchronized, or backed up like any other document.

Thanks for your assistance.

My pleasure.

Steven,

Steven Arnold wrote:Here's a theory: maybe iTunes changes something about one of the files being backed up during the backup process. There are two cases of potential importance, I think. One is that the file is changed after QRecall backs it up. This might cause QRecall to look at the file and die because it sees it's different.

QRecall does not compare the data it captures with the original files in order to verify its validity. That would not be a definitive test, for exactly the reason you mentioned. Instead, QRecall reads the original file and adds additional checksum and data consistency information to the archive. This allows QRecall to determine if the file data in the archive is OK, even after the original has changed or been deleted.
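
As a rough illustration of the idea (not QRecall's actual archive format or checksum algorithm, which aren't described here), the Python sketch below stores a SHA-256 digest alongside each captured block so the stored copy can be validated later without the original file. The names capture_block and block_is_valid are hypothetical.

```python
import hashlib

def capture_block(data: bytes) -> dict:
    """Store the data together with a digest computed at capture time."""
    return {"data": data, "digest": hashlib.sha256(data).hexdigest()}

def block_is_valid(record: dict) -> bool:
    """Re-hash the stored data and compare it with the stored digest."""
    return hashlib.sha256(record["data"]).hexdigest() == record["digest"]

record = capture_block(b"original file contents")
print(block_is_valid(record))  # True

# Simulate damage inside the archive; the original file is never consulted.
record["data"] = b"original file c0ntents"
print(block_is_valid(record))  # False
```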

The second, probably more serious case, is where the actual file itself changes partway through the process of QRecall backing it up.

Another very valid concern, but that's not your problem. The error you got was an I/O error trying to finalize the archive. This happens long after all of the files have been captured, so it can't have anything to do with the source data files.

QRecall tries to deal with mutating data files in three ways:
- QRecall reads files in large chunks, very quickly. For most files (<12MB) this gets a "snapshot" of the file as quickly as possible, so any future modifications don't influence what's captured.
- If the file being captured is replaced, QRecall will stop capturing the file and start over. This gets logged as minutia.
- If the file is extended or truncated during the capture, QRecall will record that as a warning in the log. You'll want to recapture the file.
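
The last two checks can be sketched in Python as follows. This is only an illustration under stated assumptions, not how QRecall is implemented: it treats a changed inode or device number as "replaced" and a changed size as "extended or truncated", and the function name and 12 MB chunk size are hypothetical choices that merely echo the figure above.

```python
import os
import tempfile

def read_with_change_check(path: str, chunk_size: int = 12 * 1024 * 1024):
    """Read a file and report whether it changed while being read.

    Returns (data, status), where status is "ok", "replaced", or "resized".
    """
    before = os.stat(path)
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    after = os.stat(path)  # raises FileNotFoundError if the file was deleted

    if (before.st_ino, before.st_dev) != (after.st_ino, after.st_dev):
        status = "replaced"   # a different file now lives at this path
    elif before.st_size != after.st_size:
        status = "resized"    # extended or truncated during the read
    else:
        status = "ok"
    return b"".join(chunks), status

# Usage: an untouched file reads back with status "ok".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some data")
data, status = read_with_change_check(tmp.name)
print(status)
os.remove(tmp.name)
```

A caller would restart the read on "replaced" and log a warning on "resized", mirroring the behavior described in the list.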

These are, admittedly, stop-gap measures that try to make as accurate a capture as possible. They are all part of the larger issue of trying to copy files that are open and being modified, for which there is no complete solution. This is just a consequence of the way the operating system works and there are no backup solutions available that can completely eliminate the possibility of capturing a partially modified file.

If you have groups of files that are being modified regularly, and you're running OS X 10.5 or later, you might consider adding a "pick-up" capture that immediately follows your regular capture. Starting with Leopard, QRecall will very quickly recapture any files that were modified after the first capture started.

Steven,

Thanks for sending a diagnostic report.

I've looked at your log files, and you're encountering a consistent problem. Your archive is occasionally encountering an I/O error when QRecall tries to close the archive. This happened on 4-30, and again on 5-14:

2009-04-30 07:19:38.672 -0400 Failure Problem closing archive

2009-04-30 07:19:38.672 -0400 Details could not set file size

2009-04-30 07:19:38.672 -0400 #debug# IO exception

2009-04-30 07:19:38.672 -0400 Details Path: /Volumes/encrypted_backups 2/music_and_video.quanta/hash.index

2009-04-30 07:19:38.672 -0400 #debug# OSErr: -36

2009-04-30 07:19:38.672 -0400 Details Pos: 3221225528

2009-05-14 07:19:49.005 -0400 Failure Problem closing archive

2009-05-14 07:19:49.005 -0400 Details could not set file size

2009-05-14 07:19:49.005 -0400 #debug# IO exception

2009-05-14 07:19:49.005 -0400 Details Path: /Volumes/encrypted_backups 2/music_and_video.quanta/hash.index

2009-05-14 07:19:49.005 -0400 #debug# OSErr: -36

2009-05-14 07:19:49.005 -0400 Details Pos: 3221225528

Once an error like this occurs, the archive is considered suspect until it can be reindexed or repaired. Until that happens, QRecall will refuse to use it.

The problem (i.e. "curse") of QRecall is that it's absolutely fastidious about data integrity and it checks its work afterwards. The "invalid header length" errors you encounter afterwards are QRecall's way of saying that the file size that it expected to find doesn't agree with the actual file size. This makes sense, since both errors occurred when trying to set the file size before closing the archive.

QRecall attempts to reindex the archive, but this inevitably fails, and it then has to rebuild the archive. I am not sure what the difference is, but both processes take about equally long -- and a very long time.

All the important data in an archive is stored in a single file. Most of the remaining files in an archive are "index" files that provide rapid access to archive data. If the data file is undamaged, a reindex reconstructs the index files by scanning the entire data file. A reindex assumes the single data file is completely valid; any problems or inconsistencies will cause it to fail.

A repair is very similar to a reindex, except that it makes no assumptions about the information in the data file. It too scans the entire data file and rebuilds the indexes, but automatically "fixes" any problems that it finds in the data file.

So as long as you don't have any problems with your master data file, a reindex will fix your archive. However, there's no way of knowing that in advance. If reindexing your archive takes a very long time, just repair it instead. If the data file is OK, repair and reindex are essentially identical. If the data file isn't OK, the repair will fix it.
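
The difference can be sketched with a toy model in Python. This is an illustration only: the "data file" is a plain list of records with a CRC-32 each, and "fixing" simply means dropping damaged records. All names are hypothetical, and none of this reflects QRecall's real file format.

```python
import zlib

def make_record(key: str, payload: bytes) -> dict:
    """Build a record that carries its own consistency check."""
    return {"key": key, "payload": payload, "crc": zlib.crc32(payload)}

def record_ok(record: dict) -> bool:
    return zlib.crc32(record["payload"]) == record["crc"]

def reindex(data_file: list) -> dict:
    """Rebuild the index, assuming every record is valid; fail otherwise."""
    index = {}
    for position, record in enumerate(data_file):
        if not record_ok(record):
            raise ValueError(f"bad record at position {position}")
        index[record["key"]] = position
    return index

def repair(data_file: list) -> tuple:
    """Scan the same data, but drop damaged records instead of failing."""
    good = [record for record in data_file if record_ok(record)]
    index = {record["key"]: position for position, record in enumerate(good)}
    return good, index

data_file = [make_record("a", b"one"), make_record("b", b"two")]
print(reindex(data_file))          # both records are valid, so this succeeds

data_file[1]["payload"] = b"tw0"   # simulate damage in the data file
try:
    reindex(data_file)
except ValueError as error:
    print("reindex failed:", error)
print(repair(data_file)[1])        # repair keeps only the undamaged record
```

Both functions make one full pass over the data, which is why the two operations take about the same time when nothing is wrong.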

Now, back to the real question of why this is happening. Alas, I don't know. An I/O error (-36) is the generic "something went horribly wrong and I can't perform this file operation" error. It really doesn't say what the problem is or what could be done about it.

Two I/O errors performing the same operation at the same point in processing are unlikely to be a coincidence. "Real" I/O errors are typically random events caused by hardware glitches or by accidentally pulling out a FireWire cable.
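
For anyone who wants to check their own logs for this kind of repetition, here is a minimal Python sketch. It assumes log lines shaped like the excerpt above (timestamp, time zone offset, then a message) and collects the Path, OSErr, and Pos details recorded under each timestamp. The function name is hypothetical and the format is inferred from the excerpt, not from any published specification.

```python
import re
from collections import defaultdict

STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) (.*)$")

def failure_signatures(lines):
    """Map each timestamp to the Path/OSErr/Pos details logged with it."""
    details = defaultdict(dict)
    for line in lines:
        match = STAMP.match(line.strip())
        if not match:
            continue  # blank or unrecognized line
        stamp, message = match.groups()
        for key in ("Path", "OSErr", "Pos"):
            found = re.search(rf"{key}: (.+)$", message)
            if found:
                details[stamp][key] = found.group(1)
    return dict(details)

sample = [
    "2009-04-30 07:19:38.672 -0400 Failure Problem closing archive",
    "2009-04-30 07:19:38.672 -0400 Details Path: /Volumes/encrypted_backups 2/music_and_video.quanta/hash.index",
    "2009-04-30 07:19:38.672 -0400 #debug# OSErr: -36",
    "2009-04-30 07:19:38.672 -0400 Details Pos: 3221225528",
    "2009-05-14 07:19:49.005 -0400 Failure Problem closing archive",
    "2009-05-14 07:19:49.005 -0400 Details Path: /Volumes/encrypted_backups 2/music_and_video.quanta/hash.index",
    "2009-05-14 07:19:49.005 -0400 #debug# OSErr: -36",
    "2009-05-14 07:19:49.005 -0400 Details Pos: 3221225528",
]
first, second = failure_signatures(sample).values()
print(first == second)  # True: same file, same error, same position
```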

I've never seen this problem in testing, so I suspect it's something specific to your environment. You say that the archive is on an encrypted volume? Are your other archives also encrypted? Are they as large? Have you tried, or could you try, moving it to a non-encrypted volume to see if you have the same problem?

Steven,

Thanks for posting. I'm glad that you've found QRecall useful, despite the occasional problem.

I have two questions. First, what exactly is the problem that's causing your archive to require repair? The answer should be in your log files, which I'd like to look at. You can send a diagnostic report or just attach/e-mail your recent log files (~/Library/Logs/QRecall). The failures that would require a repair are typically data corruption problems.

Which leads me to my second question. Is this archive on the same volume and/or physical drive as your other archives?

Ralph Strauch wrote:I tried to stop the backup by clicking on the x in Qrecallmonitor, with no response. ... While it would have been nicer if the interrupted backup had responded to my attempts to quit from Qrecall Monitor, I found this behavior overall acceptable and a big improvement over the way Qrecall used to handle such interruptions the last time I had one a year or so ago.

Ralph,

Thanks for the feedback. Technically, there's not much that QRecall can do in a situation like this. Manually killing the QRecallHelper process, as you did, is exactly the course of action I would have suggested.

Unplugging a hard drive may cause the thread that's trying to write to the volume to seize. The other threads in the process weren't affected, which is why the action appeared to continue running and responded to requests to quit. However, it can't actually quit until the stuck thread resumes, which I suspect it never would.

I agree that the auto-recovery feature added to the latest version is a great improvement over having to repair the archive. I know it's saved me many a time.

Christian Roth wrote:I think the Instruments (an app included in Apple Dev Tools) file I linked to in my last post should contain already one, if you want to have a look at it right away.

I downloaded your Instruments file and took a look. Sure enough, nearly 100% of the first thread's time is spent in NSHashInsertIfAbsent, which is the function that's called to add one entry to the name index hash table.

I'll just wait for the new release, since the archive in question is my actual working archive which I do not want to lose/corrupt.

I'll keep you posted. Hopefully, I'll have some time soon to squash that lingering bug and get a maintenance release out.

Christian,

Standard disclaimer: There's a lot that happens when an archive is being closed. How long will depend on how many new quanta were added to the archive, how many unused data blocks (aka "holes") are in the archive, the speed of your data connection with the archive, and so on. It could take 10 seconds or 10 minutes.

However, I sheepishly admit that you're probably encountering a serious problem with the names index that's present in the current release.

As the names index grows in size (and it sounds like yours is larger than most), the amount of time spent reading, updating, and writing the names table becomes enormous. I've seen QRecall sit and spin for 40 minutes just sorting the names index.

For the most part I've fixed this already. The new names index routines are anywhere from 10 to 60 times faster than the current release version. Unfortunately, there's another bug in the names index that I really need to get fixed before I release a new version, and I just haven't had the time to do that yet.

If you want to verify this, you can use the 'sample' tool to sample the QRecallHelper process when it gets stuck closing the archive. You'll have to run it as root (i.e. sudo sample ...) if QRecall is pre-authorized to use administrative privileges. Send me the sample file and I can tell if it's the same problem.

If this is causing you a great deal of grief, you're welcome to try—at your own risk—the unreleased QRecall 1.1.1 that I've given to a few users experiencing other problems, also fixed in this version.

Hansen Teidee wrote:the capture-process keeps on breaking up. dont kwow by what its caused.

Hannes, thanks for posting your log files.

What I see are a bunch of I/O and data integrity errors. This usually indicates one of three problems:

1) The volume the archive is on has a corrupted directory structure.
2) The hard drive is losing data.
3) You have RAM or data transfer problems.

The first problem is easy to check and fix. Use Disk Utility to repair the volume. If problems are found, and fixed, then repair your QRecall archive and go back to taking backups. That's probably all it was.

The second and third problems are more difficult to isolate. I'm not aware of any utilities that will fill an entire drive with test data and then verify it. I've considered creating one myself. This is what you really need to detect problem 2.
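
For what it's worth, the basic write-then-verify idea can be sketched in a few lines of Python. This is a minimal sketch, not a finished drive tester: it writes one file of a size you choose rather than filling the drive, the function names are hypothetical, and the operating system may answer the read from its cache instead of the disk, which a real tool would have to work around.

```python
import hashlib
import os

BLOCK = 1024 * 1024  # write and read in 1 MiB pieces

def write_test_data(path: str, size_bytes: int) -> str:
    """Fill a file with random bytes; return the SHA-256 of what was written."""
    digest = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(BLOCK, remaining))
            f.write(chunk)
            digest.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return digest.hexdigest()

def read_back_digest(path: str) -> str:
    """Re-read the file and return the SHA-256 of what the drive returned."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(BLOCK), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling write_test_data on a path on the suspect volume and comparing its result with read_back_digest for the same path shows whether the data survived the round trip; a mismatch means something between RAM and the platters altered it.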

There are a number of utilities that will perform RAM tests. TechTool Deluxe, which is included with most AppleCare plans, should be sufficient. The Integrity utility by Intech will do a good job of detecting data transfer or firmware problems between your computer and drive.

Gary,

Thanks for sending me your files. They didn't include any crash logs, but after reviewing your log files I don't think you have any.

The only instance of a "lost connection" error was recorded on April 4. (If you want to look further back, send me some older log files.) Most notable is this log entry:

2009-04-04 05:16:21.946 -0500 #debug# VerifyCommand listeners probably dead, terminating process

It appears that the listener process (the Activity Monitor window) was simply running too slowly to keep up with the action's updates. The inter-process equivalent of the spinning beach-ball cursor.

What happens when an action is done is that it sends all of its listeners a message that it has finished. It then waits around for about a minute for those listeners to acknowledge that it's done and close their communications connection. If the listeners don't, the action process assumes that they died and just terminates. That's what is recorded in your log. But if it turns out the listeners weren't really dead, they may attempt to communicate with the action process again, which fails because that action no longer exists. This results in a "lost communication with helper" error.
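
The wait-for-acknowledgement step can be sketched in Python with one threading.Event per listener. This is a simplified model of the behavior described above, not QRecall's inter-process code. The function name and return strings are hypothetical, and the demo uses a short timeout in place of the roughly one minute mentioned.

```python
import threading
import time

def finish_action(listener_acks, timeout_seconds: float) -> str:
    """Wait for every listener to acknowledge completion, up to one shared deadline."""
    deadline = time.monotonic() + timeout_seconds
    for ack in listener_acks:
        remaining = max(0.0, deadline - time.monotonic())
        if not ack.wait(remaining):
            # This listener never answered: assume it died and give up.
            return "listeners probably dead, terminating"
    return "all listeners acknowledged"

responsive = threading.Event()
responsive.set()            # this listener acknowledged right away
slow = threading.Event()    # this one is too busy to answer in time

print(finish_action([responsive], timeout_seconds=0.1))
print(finish_action([responsive, slow], timeout_seconds=0.1))
```

If the slow listener wakes up after the action has given up, it finds nobody on the other end, which is the situation reported as a lost connection.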

Since I don't see any other errors or problems in your log, I'm confident that this was a harmless, and sporadic, communications failure between processes and not a failure of the verify action that was running at the time.

Gary K. Griffey wrote:I will send the requested info...

Gary (and anyone else this might affect),

I haven't received a diagnostic report from you. There's a known bug (fixed in the currently unreleased version) that can cause the report upload to fail if the log files are too big.

If you, or anyone, sends a crash report and you do not immediately receive an automated e-mail acknowledgement from the server, then please report your problem manually.

At a minimum, please include a copy of your latest QRecall.log files (in ~/Library/Logs/QRecall) and any crash report files. These can be found in either /Library/Logs/CrashReporter/ or ~/Library/Logs/CrashReporter/ and will all have names beginning with QRecall.

Post a description of your problem and those files, or e-mail them to support@qrecall.com.
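
If gathering those files by hand is a chore, a short Python sketch like the one below can list them. The directories come from the paragraph above; the "QRecall.log*" pattern for the log file names is an assumption, and the function name is hypothetical.

```python
import glob
import os

def qrecall_support_files(home: str = "~") -> list:
    """List QRecall log files and crash reports in the locations named above."""
    home = os.path.expanduser(home)
    patterns = [
        # Assumed naming for current and older log files.
        os.path.join(home, "Library", "Logs", "QRecall", "QRecall.log*"),
        # Crash reports, system-wide and per-user, whose names begin with QRecall.
        "/Library/Logs/CrashReporter/QRecall*",
        os.path.join(home, "Library", "Logs", "CrashReporter", "QRecall*"),
    ]
    files = []
    for pattern in patterns:
        files.extend(sorted(glob.glob(pattern)))
    return files

for path in qrecall_support_files():
    print(path)
```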

Thanks,

James

Gary K. Griffey wrote:I am running the 1.1.0.42 release of QRecall. I am now experiencing the same issue as the original poster..."Lost Contact with the Helper"...

The "lost connection" message is a generic one that just means that the monitoring application(s) lost touch with the process that's actually performing the action. This usually happens because the action process crashed.

A process can crash for lots of reasons; it doesn't mean it has anything to do with the issue fixed in 1.0.0.55. The best thing to do is to send in a diagnostic report (Help > Send Report...). This will include any crash logs for your QRecallHelper process, which might provide some clues as to what the problem is.

Steven M. Alper wrote:Is there any problem in continuing to use an archive that has had a simple Repair operation run on it, but that now contains an "Unknown - Damaged -" layer?

There's no problem. A "Damaged" layer means that QRecall detected data loss when reassembling the layer. For example, if a data block was bad there may be a missing file. The log should contain the details.

When the repair is finished the layer and archive are still usable. The "Damaged" warning is displayed as a reminder that the layer should not be treated as a complete record of all the changes originally captured.

If the other changes in that layer aren't important to you, you can delete the layer. If the damaged layer is the very last layer, select it and delete it. If it's not the last layer, merge it with the next layer in the archive. All changes unique to the damaged layer will be discarded.

To absolutely guarantee that no data is missing and all layers represent a true and complete record of changes, perform a new capture and then merge everything from the damaged layer to the latest capture.

There's a slightly simpler approach, but you would lose the ability to browse individual files in the encrypted disk image.

Keep the single, encrypted, sparse disk image on your primary partition that contains your sensitive files. Then simply capture the entire volume to an external archive. The archive would contain all of the non-encrypted files and the single encrypted disk image file.

You would not be able to browse changes to your encrypted files; you would have to recall the entire encrypted volume in order to recover one or more items. But it would simplify the arrangement.

Mostly Harmless wrote:There are many things that speak for QRecall. The lack of a more simple 1:1 copy sort of spoils it for me. An archive or versioned backup is more or less an afterthought for me while I consider simple 1:1 copies to be important.

QRecall provides a reliable, secure, and verifiable backup solution that balances the practical needs of file management and performance with the security of multi-generational backups (and QRecall can keep more generations than any other solution available today).

If you prefer another technique, I completely understand.