pirem71 wrote: Hi, I discovered QRecall while reading a book by Joe Kissell, and I can say it is exactly what I was looking for: a Time Machine on steroids. Simple and effective, a pure Mac-style app.
I'm glad you found us.
1. I want to use it to back up my DEVONthink indexed folders. Each top-level folder (corresponding to a DT database) contains some 50,000-100,000 files in 10,000-15,000 sub-folders. I back them up by creating a separate QRecall archive for each of those folders. Are these sizes likely to cause QRecall any problems? Is there a limit (in number of files or in MB/GB) that a single QRecall archive must respect?
A QRecall archive (as of version 1.2.0) can be up to 6 TB in size and can easily store upwards of several billion file and data records. (It's hard to translate QRecall's data records into a number of "files", because there isn't a one-to-one correspondence.) Because of de-duplication and compression, those 6 TB of archive data can represent tens of terabytes of actual file data.
2. Partial updates of a file (sub-file updates), de-duplication, compression, etc. are all good features of a "modern" backup system (as stressed by Joe Kissell). However, I'm a bit concerned about the reliability issues these "structures" may introduce compared with the old "make a Finder copy of the entire changed file" strategy. For example, if a few blocks of the hard disk get corrupted (say, the ones storing archive indexes rather than quanta chunks), is there any chance the corruption could destroy the integrity of the whole archive and not just of a few files? Do you have any good news to reassure us paranoid people?
QRecall's archive structure was specifically designed to be robust and damage-resistant. A QRecall archive, quite unlike a file system volume or database, is a collection of small, self-contained data records. Each record is checksummed to ensure data integrity. The interdependence between records is deliberately kept to a minimum. As an example, in a typical file system a directory (folder) is a record that lists the files and sub-directories it contains. If that directory record is destroyed, all of the files and subfolders it contained become "lost". (Those files might be recoverable, but there's no way to know where they belong in the directory structure.) In a QRecall archive, the complete location of each folder is stored redundantly in every directory record. If a directory record is destroyed, the sub-folders of that directory can reconstruct its name, location, and most of its content. Similarly, QRecall archives rely on a number of complex index files; however, none of the index files contain any unique information. All index files can be reconstructed from the primary set of archive records (this is one of the things the Repair command does).
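To make the redundancy argument concrete, here is a toy model. It is a hypothetical sketch, not QRecall's actual record format: each directory record carries its full path plus a CRC-32 checksum (the `DirRecord` class and `rebuild_index` function are invented for illustration). Damaged records are simply skipped, and a lost parent directory is re-created from the paths its surviving children carry, so the lookup index is rebuilt from the primary records alone.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class DirRecord:
    """Hypothetical self-contained directory record (not QRecall's real format)."""
    path: tuple[str, ...]      # complete location, stored redundantly in every record
    entries: tuple[str, ...]   # names of the files and sub-folders it contains
    checksum: int = 0          # CRC-32 of path + entries, set by seal()

    def payload(self) -> bytes:
        return json.dumps([self.path, self.entries]).encode()

    def seal(self) -> "DirRecord":
        self.checksum = zlib.crc32(self.payload())
        return self

    def is_valid(self) -> bool:
        return zlib.crc32(self.payload()) == self.checksum


def rebuild_index(records: list[DirRecord]) -> dict[tuple[str, ...], set[str]]:
    """Rebuild a path -> entry-names index from the valid records alone.

    Damaged records are ignored. A directory whose own record was lost is
    re-created from the full paths carried by its descendants, which also
    recovers the names of the sub-folders that survived.
    """
    index: dict[tuple[str, ...], set[str]] = {}
    for rec in records:
        if rec.is_valid():
            index[rec.path] = set(rec.entries)
    for path in list(index):                 # iterate over a snapshot of the keys
        for depth in range(1, len(path)):
            index.setdefault(path[:depth], set()).add(path[depth])
    return index


# Usage: damage the middle record after sealing it.
records = [
    DirRecord(("Projects",), ("2019",)).seal(),
    DirRecord(("Projects", "2019"), ("ClientA",)).seal(),
    DirRecord(("Projects", "2019", "ClientA"), ("report.pdf",)).seal(),
]
records[1].entries = ("garbage",)
idx = rebuild_index(records)
# idx[("Projects", "2019")] == {"ClientA"}: recovered from its child's path
```

In this model, any files stored directly in the damaged folder are lost, but its place in the hierarchy and its surviving sub-folders are not, which mirrors the "most of its content" caveat above.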
3. When working with very large archives, visual tools partially lose their usefulness, and one must rely on generating text logs that can then be managed, filtered, and manipulated with an app like TextWrangler.
For example, Highlights (all items captured in a layer) is a nice feature, but would it be possible to export the highlights to a text file instead of simply revealing them graphically on screen? There could be a command to create a highlights file for a given layer, a set of layers, or all the layers of the archive (one separate .txt file per layer).
I have no plans to export (or import) visual information in the form of text files from the graphical user interface. However, I do have future plans for a QRecall command-line utility that might satisfy your needs.
4. Most of my files are archives of closed projects, i.e. they are not supposed to change over time (or change only once or twice in several years). Do you have any suggestions on how to keep these files' integrity under control (detecting a corruption as soon as possible so that the correct version of the file can be restored to disk using QRecall)?
Again, I was thinking of text files: e.g. create a file listing all the files that have never changed since the first capture (checked using an MD5 checksum; I know they're OK), then create a file with all items changed once since the first capture and verify whether each change was made by the user or is a corruption, then create a file with items changed twice, etc. Applying this procedure to items changed up to 3-4 times should cover all critical long-term storage files.
Primitive strategy, I know; can you think of something better?
Of course, after a positive check, it could be possible to merge two "snapshots" of a file to reduce the effort of the next check (e.g. files changed once could be merged and then move to the "never changed" category, which needs no further checking). Is it possible to do this merge on a per-file basis instead of for whole layers?
QRecall doesn't have specific features to enforce any kind of change policy on your files; however, I can make a couple of suggestions.
To detect if documents have been inadvertently changed, you can simply review the QRecall archive from time to time and see if any of those old project files have changed (by shading older layers, QRecall will show only changed items in the browser). If they shouldn't have changed, recall the earlier, presumably correct, version. And again, a command-line version of QRecall might let you design something like this that would run automatically.
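The MD5 manifest idea from the question can also be approximated entirely outside QRecall. The sketch below is a hypothetical standalone script (standard-library Python; the manifest layout and the function names are made up for illustration). It re-hashes every file under a folder, compares each digest with the previous run, and groups files by how many times their content has changed. MD5 is adequate for spotting accidental corruption, though not deliberate tampering.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def update_manifest(root: Path, manifest_path: Path) -> dict[int, list[str]]:
    """Re-hash every file under root and compare with the previous manifest.

    The manifest (keep it outside root) maps each relative path to
    {"md5": digest, "changes": n}. Returns relative paths grouped by how many
    times their content changed since they were first seen (0 = never).
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(root).as_posix()
        digest = md5_of(p)
        prev = old.get(rel)
        changes = prev["changes"] if prev else 0
        if prev and prev["md5"] != digest:
            changes += 1             # content differs from the last run
        new[rel] = {"md5": digest, "changes": changes}
    manifest_path.write_text(json.dumps(new, indent=2))

    groups: dict[int, list[str]] = {}
    for rel, entry in new.items():
        groups.setdefault(entry["changes"], []).append(rel)
    return groups
```

Run it periodically, keeping the manifest file outside the monitored folder. Any closed-project file whose change count goes up is a candidate for review in the QRecall browser and, if the change wasn't yours, for recalling the earlier version.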
QRecall can't detect data corruption in your files. Data corruption happens silently, with nothing to clue QRecall in to the fact that something has changed. If you're worried about corruption in a set of files that should not be changing, you can periodically perform a restore of those files. When QRecall recalls a file from the archive, it first checks whether the file already exists. If it does, it compares the existing file with the data in the archive, and it only rewrites the file if the comparison fails. So, in a way, restoring a set of existing files that shouldn't have changed is really an integrity check. All of the data in the QRecall archive is tested using data checksums and inter-record consistency checks, so QRecall knows that the data in the archive is valid.
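The compare-before-rewrite behavior described above can be modeled roughly as follows. This is a hypothetical sketch, not QRecall's code: the `archived` bytes stand in for the verified copy held in the archive, and `restore_if_changed` / `integrity_pass` are invented names. A file is rewritten only when it is missing or differs, so the list of rewritten files doubles as a report of what had been corrupted.

```python
from pathlib import Path


def restore_if_changed(target: Path, archived: bytes) -> bool:
    """Rewrite target only if it is missing or differs from the archived bytes.

    Returns True if the file had to be rewritten, False if it already matched.
    (Reads the whole file for brevity; a real tool would compare in chunks.)
    """
    if target.exists() and target.stat().st_size == len(archived):
        if target.read_bytes() == archived:
            return False             # identical: leave the file alone
    target.write_bytes(archived)     # missing or different: restore it
    return True


def integrity_pass(files: dict[Path, bytes]) -> list[Path]:
    """'Restore' a set of files that should never change; list those repaired."""
    return [path for path, data in files.items() if restore_if_changed(path, data)]
```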
As for the data in the QRecall archive getting corrupted, it's highly recommended that you schedule a verify action to run periodically. A verify will test and validate every byte of information in the archive; even a single bit of data out of place will be immediately detected.
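A minimal model of what a verify pass does, assuming (hypothetically) an archive laid out as length-prefixed records, each followed by a CRC-32 of its payload. QRecall's real layout and checksum algorithm aren't described in this thread. Walking every record and recomputing its checksum catches any single flipped bit, because CRC-32 detects all single-bit errors.

```python
import struct
import zlib

HEADER = struct.Struct(">I")      # hypothetical: 4-byte big-endian payload length
TRAILER = struct.Struct(">I")     # hypothetical: 4-byte CRC-32 of the payload


def write_records(payloads: list[bytes]) -> bytearray:
    """Serialize payloads as [length][payload][crc32] records."""
    out = bytearray()
    for p in payloads:
        out += HEADER.pack(len(p)) + p + TRAILER.pack(zlib.crc32(p))
    return out


def verify(archive: bytes) -> list[int]:
    """Read every record and return the offsets of records that fail their check."""
    bad, pos = [], 0
    while pos < len(archive):
        if pos + HEADER.size > len(archive):
            bad.append(pos)                       # truncated header
            break
        (length,) = HEADER.unpack_from(archive, pos)
        end = pos + HEADER.size + length + TRAILER.size
        if end > len(archive):
            bad.append(pos)                       # damaged length or truncated record
            break
        payload = archive[pos + HEADER.size : pos + HEADER.size + length]
        (stored,) = TRAILER.unpack_from(archive, end - TRAILER.size)
        if zlib.crc32(payload) != stored:
            bad.append(pos)                       # payload or checksum altered
        pos = end
    return bad


archive = write_records([b"alpha", b"beta", b"gamma"])
archive[17] ^= 0x01        # flip one bit inside the "beta" payload
print(verify(archive))     # [13]: offset of the damaged record
```

In this simplified layout, a flipped bit in a length field would throw off the parsing of every record after it; a real archive format would also need a way to resynchronize on the next intact record.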
I hope that helps.