QRecall

Pierpaolo Remelli

Hi, I discovered QRecall while reading a book by Joe Kissel, and I can say it is exactly what I was looking for: a Time Machine on steroids. Simple and effective, a pure Mac-style app.

I only have a few questions about your app:

1. I want to use it to back up my DevonThink indexed folders. Each top-level folder (corresponding to a DevonThink database) contains some 50,000-100,000 files in 10,000-15,000 subfolders. I back them up by creating a separate QRecall archive for each of those folders. Are these dimensions likely to cause any problems for QRecall? Is there a limit (in number of files or MB/GB) that a single QRecall archive must respect?

2. Partial file updates (sub-file updates), deduplication, compression, etc. are all good features of a "modern" backup system (as stressed by Joe Kissel). However, I'm a bit concerned about the reliability issues these "structures" may introduce compared to the old "make a Finder copy of the entire changed file" strategy. For example, if a few blocks of the hard disk get corrupted (say the ones storing archive indexes, not quanta chunks), is there any chance that the corruption could destroy the integrity of the whole archive rather than just a few files? Do you have any good news that can reassure us (paranoid people)?

3. When working with very large archives, visual tools partially lose their usefulness, and one must rely on text logs that can then be filtered and manipulated with an app like TextWrangler.
For example, Highlights (all items captured in a layer) is a nice feature, but would it be possible to export the highlights to a text file instead of simply revealing them graphically on screen? There could be a command to create a highlights file for a given layer, a set of layers, or all the layers of the archive (one separate .txt file per layer).

4. Most of my files are archived files from closed projects, i.e. they are not supposed to change over time (or change only once or twice in several years). Do you have any suggestions on how to keep these files' integrity under control (detecting corruption as soon as possible so I can restore the correct file version to disk using QRecall)?
I was thinking of text files again: e.g. create a file listing all the files that have never changed since the first capture (checked using MD5 checksums; I know they're OK), then create a file with all items changed once since the first capture and verify whether the change was made by the user or is a corruption, then create a file with items changed twice, etc. Applying this procedure to items changed up to 3-4 times should cover all critical long-term storage files.
A primitive strategy, I know; can you think of something better?
Of course, after a positive check, it could be possible to merge two "snapshots" of a file to reduce the effort of the next check (e.g. files changed once could be merged and then move to the "never changed" category, which needs no further checking). Is it possible to do this merge on a per-file basis instead of for whole layers?
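The checksum-manifest idea above can be scripted outside QRecall. Here is a minimal Python sketch, assuming a tab-separated "digest, path" text file as the manifest; the function names and file format are hypothetical and not part of QRecall. It builds an MD5 manifest for a folder tree, writes it as sorted text (easy to diff or filter in TextWrangler), and lists files whose digests differ between two manifests.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of(full)
    return manifest


def write_manifest(manifest, out_path):
    """Write 'digest<TAB>path' lines sorted by path (hypothetical format)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for rel in sorted(manifest):
            out.write(f"{manifest[rel]}\t{rel}\n")


def read_manifest(in_path):
    """Parse a manifest written by write_manifest back into a dict."""
    manifest = {}
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("\t", 1)
            manifest[rel] = digest
    return manifest


def changed_files(old, new):
    """Paths present in both manifests whose digests differ."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
```

Keep the manifest file outside the folder being checked, otherwise it will be included in the next scan.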

Best regards,
Pierpaolo

James Bucanek

pirem71 wrote: Hi, I discovered QRecall while reading a book by Joe Kissel, and I can say it is exactly what I was looking for: a Time Machine on steroids. Simple and effective, a pure Mac-style app.

I'm glad you found us.

1. I want to use it to back up my DevonThink indexed folders. Each top-level folder (corresponding to a DevonThink database) contains some 50,000-100,000 files in 10,000-15,000 subfolders. I back them up by creating a separate QRecall archive for each of those folders. Are these dimensions likely to cause any problems for QRecall? Is there a limit (in number of files or MB/GB) that a single QRecall archive must respect?

A QRecall archive (as of version 1.2.0) can be up to 6 TB in size and can easily store upwards of several billion file and data records. (It's hard to translate QRecall's data records into a number of "files", because there's no one-to-one correspondence.) Because of de-duplication and compression, those 6 TB of archive data can represent tens of terabytes of actual file data.
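As a rough illustration of why de-duplicated, compressed archive data can be much smaller than the file data it represents, here is a toy content-addressed store in Python. It assumes fixed 4 KB blocks, SHA-256 digests, and zlib compression; QRecall's own chunking ("quanta") and compression are not described in this thread and may work differently.

```python
import hashlib
import zlib


class DedupStore:
    """Toy content-addressed store: each unique block is kept once, compressed."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 hex digest -> compressed block bytes

    def add_file(self, data):
        """Store data and return the list of block digests that rebuilds it."""
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # only previously unseen content costs space
                self.blocks[digest] = zlib.compress(block)
            refs.append(digest)
        return refs

    def read_file(self, refs):
        """Reassemble a file from its block digests."""
        return b"".join(zlib.decompress(self.blocks[d]) for d in refs)

    def stored_bytes(self):
        """Total compressed bytes actually held by the store."""
        return sum(len(b) for b in self.blocks.values())
```

Adding the same 1 MiB file twice leaves `stored_bytes()` unchanged after the first copy, because every block of the second copy is already present.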

2. Partial file updates (sub-file updates), deduplication, compression, etc. are all good features of a "modern" backup system (as stressed by Joe Kissel). However, I'm a bit concerned about the reliability issues these "structures" may introduce compared to the old "make a Finder copy of the entire changed file" strategy. For example, if a few blocks of the hard disk get corrupted (say the ones storing archive indexes, not quanta chunks), is there any chance that the corruption could destroy the integrity of the whole archive rather than just a few files? Do you have any good news that can reassure us (paranoid people)?

QRecall's archive structure was specifically designed to be robust and damage resistant. A QRecall archive, quite unlike a file system volume or a database, is a collection of small, self-contained data records. Each record is checksummed to ensure data integrity, and the interdependence between records is deliberately kept to a minimum. For example, in a typical file system a directory (folder) is a record that lists the files and subdirectories it contains. If that directory record is destroyed, all of the files and subfolders it contained become "lost". (Those files might be recoverable, but there's no way to know where they belong in the directory structure.) In a QRecall archive, the complete location of each folder is stored redundantly in every directory record. If a directory record is destroyed, its name, location, and most of its contents can be reconstructed from the records of its subfolders. Similarly, QRecall archives rely on a number of complex index files; however, none of the index files contain any unique information. All index files can be reconstructed from the primary set of archive records (this is one of the things the Repair command does).
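To make the redundancy argument concrete, here is a toy model in Python. The record layout, the CRC-32 checksum, and the function names are assumptions for illustration, not QRecall's actual archive format. Each directory record carries its full path, damaged records are discarded, and the index is rebuilt only from the records that survive, so a destroyed parent folder reappears with its name and location.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class DirRecord:
    """Hypothetical self-contained directory record (not QRecall's real format)."""
    path: tuple                                   # full path from the root, stored redundantly
    entries: list = field(default_factory=list)   # names of files directly inside
    checksum: int = 0

    def compute_checksum(self):
        payload = "/".join(self.path) + "\0" + "\0".join(self.entries)
        return zlib.crc32(payload.encode("utf-8"))


def make_record(path, entries=()):
    """Create a record and seal it with its checksum."""
    rec = DirRecord(tuple(path), list(entries))
    rec.checksum = rec.compute_checksum()
    return rec


def rebuild_index(records):
    """Rebuild a path -> record index using only the records themselves."""
    index = {}
    for rec in records:
        if rec.compute_checksum() == rec.checksum:  # skip damaged records
            index[rec.path] = rec
    # Every surviving record names all of its ancestors, so a destroyed parent
    # folder is recreated (name and location; its own file entries are lost
    # in this toy model).
    for path in list(index):
        for depth in range(1, len(path)):
            index.setdefault(path[:depth], make_record(path[:depth]))
    return index


def subfolders(index, path):
    """Names of the immediate subfolders of path, derived from the index."""
    n = len(path)
    return sorted(p[-1] for p in index if len(p) == n + 1 and p[:n] == path)
```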

3. When working with very large archives, visual tools partially lose their usefulness, and one must rely on text logs that can then be filtered and manipulated with an app like TextWrangler.
For example, Highlights (all items captured in a layer) is a nice feature, but would it be possible to export the highlights to a text file instead of simply revealing them graphically on screen? There could be a command to create a highlights file for a given layer, a set of layers, or all the layers of the archive (one separate .txt file per layer).

I have no plans for exporting (or importing) visual information in the form of text files from the graphical user interface. However, I do plan to write a QRecall command-line utility in the future that might satisfy your needs.

4. Most of my files are archived files from closed projects, i.e. they are not supposed to change over time (or change only once or twice in several years). Do you have any suggestions on how to keep these files' integrity under control (detecting corruption as soon as possible so I can restore the correct file version to disk using QRecall)?
I was thinking of text files again: e.g. create a file listing all the files that have never changed since the first capture (checked using MD5 checksums; I know they're OK), then create a file with all items changed once since the first capture and verify whether the change was made by the user or is a corruption, then create a file with items changed twice, etc. Applying this procedure to items changed up to 3-4 times should cover all critical long-term storage files.
A primitive strategy, I know; can you think of something better?
Of course, after a positive check, it could be possible to merge two "snapshots" of a file to reduce the effort of the next check (e.g. files changed once could be merged and then move to the "never changed" category, which needs no further checking). Is it possible to do this merge on a per-file basis instead of for whole layers?

QRecall doesn't have specific features to enforce any kind of change policy on your files; however, I can make a couple of suggestions.

To detect whether documents have been inadvertently changed, you can simply review the QRecall archive from time to time and see if any of those old project files have changed (when you shade older layers, QRecall shows only the changed items in the browser). If they shouldn't have changed, recall the earlier, presumably correct, version. And again, a command-line version of QRecall might let you design something like this that would run automatically.

QRecall can't detect data corruption in your files. Data corruption happens silently, with nothing to clue QRecall in to the fact that something has changed. If you're worried about corruption in a set of files that should not be changing, you can periodically restore those files. When QRecall recalls a file from the archive, it first checks whether the file already exists. If it does, it compares the existing file with the data in the archive, and it rewrites the file only if the comparison fails. So, in a way, restoring a set of existing files that shouldn't have changed is really an integrity check. All of the data in the QRecall archive is tested using data checksums and inter-record consistency checks, so QRecall knows that the data in the archive is valid.
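The recall-as-integrity-check behavior described above boils down to "compare, then rewrite only on mismatch". This Python sketch models only that behavior, not QRecall's implementation: `restore_if_different` is a hypothetical name, `archived_data` stands in for file data already validated from the archive, and metadata such as permissions is ignored.

```python
import os
import tempfile


def restore_if_different(path, archived_data, chunk_size=1 << 16):
    """Rewrite path with archived_data only if the file is missing or differs.

    Returns True if the file was rewritten, False if it already matched.
    """
    if os.path.exists(path) and os.path.getsize(path) == len(archived_data):
        with open(path, "rb") as f:
            offset = 0
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return False  # byte-for-byte identical: nothing to do
                if chunk != archived_data[offset:offset + len(chunk)]:
                    break  # mismatch: fall through and rewrite
                offset += len(chunk)
    # Write a temporary file in the same folder, then atomically replace the
    # original. (mkstemp creates the file with owner-only permissions.)
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder)
    with os.fdopen(fd, "wb") as out:
        out.write(archived_data)
    os.replace(tmp, path)
    return True
```

A file that silently changed on disk, even with the same size, is detected by the comparison and replaced; an intact file is left untouched.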

As for the data in the QRecall archive getting corrupted, I highly recommend scheduling a verify action to run periodically. A verify tests and validates every byte of information in the archive; even a single bit out of place will be immediately detected.
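The idea behind a verify pass can be sketched in a few lines of Python: every record carries a digest of its own payload, and verification recomputes each digest. The record format here (payload followed by a 32-byte SHA-256 digest) and the function names are assumptions for illustration; QRecall's actual checksums are not specified in this thread.

```python
import hashlib

DIGEST_SIZE = 32  # bytes in a SHA-256 digest


def seal(payload: bytes) -> bytes:
    """Append a SHA-256 digest to a record payload (hypothetical record format)."""
    return payload + hashlib.sha256(payload).digest()


def verify(records):
    """Return the indices of records whose payload no longer matches its digest."""
    bad = []
    for i, rec in enumerate(records):
        payload, digest = rec[:-DIGEST_SIZE], rec[-DIGEST_SIZE:]
        if len(rec) < DIGEST_SIZE or hashlib.sha256(payload).digest() != digest:
            bad.append(i)
    return bad


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted (simulated damage)."""
    damaged = bytearray(data)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)
```

Flipping a single bit in one sealed record is enough for `verify` to report that record's index.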

I hope that helps.

Pierpaolo Remelli

Hi James,
thanks for your quick and exhaustive reply!

I'm glad to hear that data integrity is one of your major concerns in designing QRecall. I'll make regular use of the verify and repair functions QRecall offers.

When you say you plan to develop a command-line utility, do you also mean AppleScript support? I had a look and found no dictionary for QRecall in Script Editor, so I assume it is not scriptable for the time being. It would be nice to be able to include it in an AppleScript.

As for point #4: I like QRecall's ability to keep all data under control, and I was imagining a way to use it to protect my files (the ones on the disk, not the archive) against corruption. I had another idea: what if the user could add a flag to files that shouldn't change (say, an OpenMeta tag like "Frozen" that QRecall would recognize)?
When QRecall finds a file tagged "Frozen" whose content has changed, it could not only capture the changes in the new layer but also alert the user, who can then take corrective measures. If I deliberately want to change a file, I would have to remove the "Frozen" tag first.
This would really give peace of mind about data integrity.
You would certainly get a few false positives (forgotten "Frozen" tags when changing a file), but that is better than finding an unusable file after 5 or 10 years.

What about the possibility of performing "merge" actions at the file level instead of merging two entire layers? Is it, or will it be, possible?

Thanks again,
Pierpaolo

James Bucanek

pirem71 wrote: When you say you plan to develop a command-line utility, do you also mean AppleScript support? I had a look and found no dictionary for QRecall in Script Editor, so I assume it is not scriptable for the time being. It would be nice to be able to include it in an AppleScript.

The two are very different. I have "add AppleScript support" on my list of future features, but it's been a low priority since I consider its utility to be somewhat limited. A command-line version of QRecall, on the other hand, is pretty high on the list. And since you can invoke command-line programs from AppleScript, it would provide limited AppleScript support as well.

As for point #4: I like QRecall's ability to keep all data under control, and I was imagining a way to use it to protect my files (the ones on the disk, not the archive) against corruption. I had another idea: what if the user could add a flag to files that shouldn't change (say, an OpenMeta tag like "Frozen" that QRecall would recognize)?
When QRecall finds a file tagged "Frozen" whose content has changed, it could not only capture the changes in the new layer but also alert the user, who can then take corrective measures. If I deliberately want to change a file, I would have to remove the "Frozen" tag first.
This would really give peace of mind about data integrity.
You would certainly get a few false positives (forgotten "Frozen" tags when changing a file), but that is better than finding an unusable file after 5 or 10 years.

This really isn't QRecall's job. If you want to prohibit a file from being modified, there are a number of file system features at your disposal. You can clear the write permission bit, lock the file in the Get Info window, or (if you want to go completely crazy) set the low-level "immutable" attribute for the file. All of these will prevent the file from being accidentally changed. QRecall doesn't need to be involved at all.
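For those who prefer to script these protections, here is a Python sketch. `freeze` and `thaw` are hypothetical helper names. `freeze` clears all write permission bits and, where `os.chflags` exists (macOS and other BSDs), also sets the user-immutable flag (`uchg`, `stat.UF_IMMUTABLE`), which I believe is the flag the Finder's Locked checkbox controls. It deliberately leaves the system-immutable flag (`schg`, `stat.SF_IMMUTABLE`) alone, since that one requires root.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def freeze(path, set_locked_flag=True):
    """Remove all write permission bits; on macOS/BSD also set the uchg flag."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)  # chmod first: it fails once uchg is set
    # os.chflags and st_flags only exist on BSD-derived systems such as macOS.
    if set_locked_flag and hasattr(os, "chflags"):
        flags = os.stat(path).st_flags
        os.chflags(path, flags | stat.UF_IMMUTABLE)


def thaw(path):
    """Undo freeze(): clear the uchg flag, then restore owner write permission."""
    if hasattr(os, "chflags"):
        flags = os.stat(path).st_flags
        os.chflags(path, flags & ~stat.UF_IMMUTABLE)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)
```

The order of operations matters: a file with the immutable flag set refuses permission changes, so `thaw` clears the flag before touching the mode bits.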

What about the possibility of performing "merge" actions at the file level instead of merging two entire layers? Is it, or will it be, possible?

It's not feasible. A merge occurs when the data in two layers are combined, which results in the earlier layer being deleted.

But the earlier layer can't be deleted unless all of the files are merged. So you can't merge one file without deleting that file from the earlier layer, which means that if you rewound the archive to that earlier layer and recalled all of the files, the file you merged would be missing--which I don't think is a good idea.
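The argument can be modeled in a few lines of Python, treating each layer as a mapping from path to captured version, oldest layer first. The functions are hypothetical and only illustrate the reasoning; they are not how QRecall stores or merges layers.

```python
def snapshot(layers, upto):
    """Files visible when the archive is rewound to layers[upto] (0 = oldest).

    Each layer maps path -> version; later layers override earlier ones.
    """
    view = {}
    for layer in layers[: upto + 1]:
        view.update(layer)
    return view


def merge(layers, i):
    """Merge layers[i] into layers[i + 1]; the earlier layer disappears."""
    if not 0 <= i < len(layers) - 1:
        raise IndexError("need two adjacent layers to merge")
    combined = dict(layers[i])
    combined.update(layers[i + 1])
    return layers[:i] + [combined] + layers[i + 2:]


def drop_file_from_layer(layers, i, path):
    """What a per-file 'merge' would amount to: the file leaves layer i only."""
    trimmed = dict(layers[i])
    trimmed.pop(path, None)
    return layers[:i] + [trimmed] + layers[i + 1:]
```

With `layers = [{"a.txt": "v1", "b.txt": "v1"}, {"b.txt": "v2"}, {"a.txt": "v2"}]`, a whole-layer `merge(layers, 0)` still yields a consistent view at every remaining layer, while `drop_file_from_layer(layers, 0, "b.txt")` leaves `snapshot(..., 0)` without `b.txt` at all: exactly the missing-file problem described above.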