QRecall Community Forum

1:1 copies, source data validation...
Forum Index » Suggestions and Feedback
Mostly Harmless


Joined: Mar 16, 2009
Messages: 2
Offline
Hello,

I just discovered QRecall on versiontracker after I had looked into everything listed at macupdate.com.
It`s clear that data integrity is important to you - unlike to some other developers of Mac backup software.
That makes QRecall an interesting candidate although here too I see a few clouds on the horizon.

What I really want is 1:1 copies of source data. No compression. I want to be able to access the data with the Finder and other programs for testing and restoring.
Would this be optionally possible with QRecall (planned) or would you consider adding this feature?
I understand this is not the way QRecall is designed to work by default but of course it would be pretty easy to implement compared to the more complex system used by default (which saves disk space).
I would probably use the more complex method for daily backups. But at least once a week I want a 1:1 backup without any tricks - just to play it safe.
If you will not offer 1:1 copies you don`t have to read the rest of my post because I am afraid this is a must for me.

I would also like to be able to turn off archiving. I will probably use it but have not made up my mind yet. It should be an option, I think.


Verifying data:
1.
From your site I learned that backed up files are verified.
Usually I would ask if this is done by comparing the backed up files bit-by-bit to the original or with hash files.
I noticed however that QRecall doesn`t necessarily store complete copies of files if they are just new versions of files that already exist.
As I wrote I am looking for a solution that does 1:1 copies and I want these to be verified either by bit-by-bit comparison to the original or by using hashes derived from the original.
To have this with QRecall would first require that you use 1:1 copies as an option.
2.
Validation of source data:
Actually this is a *very* important feature though it`s not implemented in any backup solution for the Mac that I know of (and I am pretty sure I know about pretty much all products available).
I know that at this point in time I may have to do it with an additional program but I am hoping to see this integrated in a professional backup solution for the Mac in the future.
So I am now proposing this to you:
By creating and *saving* hashes for each file that is going to be backed up from the source harddrive, both the original source data and the backup copies can be verified days or weeks later.
Why?
The original source data can be corrupted. This can happen, for example, through a crash or power cut while writing to disk, software bugs (think of an encrypted sparse image containing a complete user home folder protected by FileVault), hard drive defects (pretty likely these days), or malware.
In most cases this data corruption will not be noticed for quite some time.
Until then the defect files may already have been replacing all previous copies on daily, weekly and monthly backup drives.
So chances are that you have bought 3 external harddrives for redundancy, maybe you keep one offsite. You have carefully chosen and paid for a professional software solution.
You do all your homework but in the end you still lose data.
Not because it`s not backed up but because both the source and all backups are damaged and it wasn`t noticed.
This is worth considering, isn`t it?

http://www.taobackup.com/integrity.html
"...You will not achieve enlightenment until you control the integrity of your data, for a copy is useless if the original is corrupt."

http://www.taobackup.com/integrity_info.html
"The greatest danger to your data is not catastrophic failure, but subtle damage that goes undetected. The corruption of several files on your disk may cause great damage to your business in the long term, but go entirely unnoticed in the short term. If it does go undetected, the corrupt files will flow through your backups until there are no uncorrupted copies left."

Due to the processing power to calculate hashes and the drive space needed to store the hash data I would probably want to use this feature only on the most important data (stuff I create myself). It would be nice to be able to browse the home directory and activate a checkbox for the feature for any folder.

A scheduled validation of the source data would be what I am dreaming of.


The rest:

I want to be actively informed of any errors (with a pop-up alert panel, some blinking symbol in the menu bar, alarm sound or a clearly visible statement in a summary at the end of a backup-job).
One competitor`s solution only writes an error to the log if the backup-job was started by a schedule (if the backup is started manually it does inform the user).
What I don`t want is the backup-job to stop.
For example:
Maybe I start a backup before I leave. When I come back I discover that the backup stopped because the user(-account) had no read permission for one folder or file. All other files could have been backed up but the program stopped the whole process when the error occurred. Not very smart.

Start a backup-job when a particular volume is mounted.

What happens if a file is saved while QRecall is backing up this particular file?

At versiontracker the info says "beta".
Is it a beta or a finished product?

I would like to be able to pause a backup-process (if I need the resources to do something with the computer). It would be nice if the process could auto-start again after a while, if the computer is idle (screensaver on?), in case I forget to continue the backup (by pressing pause again before I leave the computer).
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Mostly Harmless wrote: It`s clear that data integrity is important to you - unlike to some other developers of Mac backup software.
I'm glad that message is clear. Yes, data integrity is very important.

What I really want is 1:1 copies of source data. No compression. I want to be able to access the data with finder and other programs for testing and restoring.
There are a bazillion programs that will make straight copies of files. There are ancient UNIX utilities that will checksum files. QRecall is not one of those.

Would this be optionally possible with QRecall (planned) or would you consider to add this feature?
I have this crazy dream, from time-to-time, of writing a file system plug-in that would mount a QRecall archive as though it was a volume containing the original files (read-only, of course). I'm not sure what benefit it would have so I keep shoving the feature off to be considered for a future version.

If you will not offer 1:1 copies you don`t have to read the rest of my post because I am afraid this is a must for me.
Keep reading; you might change your mind.

Verifying data:
1.
From your site I learned that backed up files are verified.
Usually I would ask if this is done by comparing the backed up files bit-by-bit to the original or with hash files.
I noticed however that QRecall doesn`t necessarily store complete copies of files if they are just new versions of files that already exist.


QRecall doesn't archive files. It archives the data in files. It breaks every file up into small blocks of data and stores each block in a massive database. Each block is uniquely identified and hashed.

When a QRecall archive verifies its data, it doesn't need the original files for comparison. It stores hashcodes and interlocking integrity checks for every single block of data, file, and directory information. This allows it to determine if any of the archive data has been altered or damaged.
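As a rough illustration of the idea (this is not QRecall's actual format or algorithm, which the thread does not describe in detail), the sketch below splits data into blocks, stores a digest next to each block, and later checks every block against its own digest without touching the original file. The 4096-byte block size, the use of SHA-256, and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store(data: bytes) -> list[tuple[str, bytes]]:
    """Return (digest, block) records, the way a toy archive might keep them."""
    return [(hashlib.sha256(b).hexdigest(), b) for b in split_blocks(data)]


def verify(records: list[tuple[str, bytes]]) -> list[int]:
    """Return the indexes of blocks whose content no longer matches its digest."""
    return [i for i, (digest, block) in enumerate(records)
            if hashlib.sha256(block).hexdigest() != digest]
```

Calling `verify(store(data))` on undamaged records returns an empty list; if a stored block is altered afterwards, its index shows up in the result, and the source file is never consulted.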

As I wrote I am looking for a solution that does 1:1 copies and I want these to be verified either by bit-by-bit comparison to the original or by using hashes derived from the original.
QRecall does the latter. But on a block-by-block basis, not a file-by-file basis.

To have this with QRecall would first require that you use 1:1 copies as an option.
2.
Validation of source data:
Actually this is a *very* important feature though it`s not implemented in any backup solution for the Mac that I know of (and I am pretty sure I know about pretty much all products available).
I've considered implementing a "compare" feature that would compare the file information in a QRecall archive with what's on the disk. Surprisingly (up to now) no one has requested this feature.

By creating and *saving* hashes for each file that is going to be backed up from the source harddrive, both the original source data and the backup copies can be verified days or weeks later.
Why?
The original source data can be corrupted.
The problem with a compare function as an integrity tool is that files change all the time. On modern computer systems, you can't blink without three files getting modified somewhere on your disk. So a compare feature can't tell you if one of your source files is damaged, only that it's different. And it will tell you that hundreds of files are different every day.

Until then the defect files may already have been replacing all previous copies on daily, weekly and monthly backup drives.
So chances are that you have bought 3 external harddrives for redundancy, maybe you keep one offsite. You have carefully chosen and paid for a professional software solution.
You do all your homework but in the end you still lose data.
Not because it`s not backed up but because both the source and all backups are damaged and it wasn`t noticed.
No software (other than the application that created it ... and even then it's not always possible) can tell you if a document has become damaged. Comparing the files on your disk with the copies in your backup only tells you what's changed, not what shouldn't have changed but did.

The more practical approach, and the one that QRecall takes, is to ensure the integrity of the copy, not the original. By keeping efficient multi-generational copies of files, should you find a corrupted copy of a source file you can go back to the archive, go through its history, and retrieve a good copy of it from before it became damaged.

http://www.taobackup.com/integrity.html
"...You will not achieve enlightenment until you control the integrity of your data, for a copy is useless if the original is corrupt."
I submit that the copy is useful for recovering from a corrupt original. That is, after all, why we take backups. But a copy is useless if it's corrupt...

http://www.taobackup.com/integrity_info.html
"The greatest danger to your data is not catastrophic failure, but subtle damage that goes undetected. The corruption of several files on your disk may cause great damage to your business in the long term, but go entirely unnoticed in the short term. If it does go undetected, the corrupt files will flow through your backups until there are no uncorrupted copies left."
I wholeheartedly agree, but so far no one has figured out how to take the "subtle" out of subtle damage. Yes, corrupted source files will percolate through your backups. And if you only make 1:1 copies each day (replacing the copy each time) you have less than 24 hours to discover that damage before it's too late. QRecall lets you keep days, weeks, months, even years worth of incremental copies in a single archive. This gives you months to discover that something has gone wrong and retrieve a valid version from your archive.

Due to the processing power to calculate hashes and the drive space needed to store the hash data I would probably want to use this feature only on the most important data (stuff I create myself). It would be nice to be able to browse the home directory and activate a checkbox for the feature for any folder.
QRecall considers all data important and performs its data integrity checks for every nibble of data in the archive.

A scheduled validation of the source data would be what I am dreaming of.
A Verify action can be scheduled to run on whatever schedule you choose. I verify archives daily (of course, I do a lot of experimentation that's likely to damage them).

I want to be actively informed of any errors (with a pop-up alert panel, some blinking symbol in the menu bar, alarm sound or a clearly visible statement in a summary at the end of a backup-job).
The QRecall activity window will indicate any action that didn't complete successfully. QRecall is also Growl savvy.

What I don`t want is the backup-job to stop.
Maybe I start a backup before I leave. When I come back I discover that the backup stopped because the user(-account) had no read permission for one folder or file. All other files could have been backed up but the program stopped the whole process when the error occurred. Not very smart.
A transient permission or read error with a single file will be logged, but won't stop a capture. Errors with the archive itself will immediately stop a capture, in order to prevent further damage to the archive.

Start a backup-job when a particular volume is mounted.
QRecall has "event" schedules that can start a capture when the volume containing the items to capture is mounted, or when the volume containing the archive is mounted. The latter lets you create a mobile volume that can be plugged into multiple machines; the capture starts as soon as the volume is plugged in.

What happens if a file is saved while QRecall is backing up this particular file?
The answer is "it depends." If the application writes a monolithic file using the common "safe save" technique, QRecall will capture the old copy of the document intact. If not, then the captured copy may be inconsistent and you'll have to perform another capture of the document while it isn't being modified before you have saved a valid copy. All file copy and backup programs suffer from the same problem, even Apple's Time Machine.
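For readers unfamiliar with the "safe save" technique mentioned above, here is a minimal sketch of the general pattern, assuming a POSIX-style filesystem where renaming within one directory is atomic. The function name `safe_save` is hypothetical, and real applications (and the macOS frameworks) implement this in their own ways.

```python
import os
import tempfile


def safe_save(path: str, data: bytes) -> None:
    """Write data to a temporary file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # readers see the old or the new file, never a mix
    except BaseException:
        # Remove the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the document at `path` is replaced in a single step, a program reading it at the same moment sees either the complete old version or the complete new one, which is why a capture of such a file stays internally consistent.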

At versiontracker the info says "beta".
Is it a beta or a finished product?
VersionTracker has apparently been purchased by download.com, and the means to edit old listings is now broken. As soon as it's fixed again, I'll update the listing.

I would like to be able to pause a backup-process (if I need the resources to do something with the computer). It would be nice if the process could auto-start again after a while, if the computer is idle (screensaver on?), in case I forget to continue the backup (by pressing pause again before I leave the computer).
In the activity window, you have two choices. Stop and Reschedule cancels the current action and schedules it to run again at some future date. Or, you can pause a running action for varying amounts of time from between 5 minutes and 4 hours. The action will automatically resume after that time, or you can manually resume it at any time.

- QRecall Development -
Mostly Harmless


Joined: Mar 16, 2009
Messages: 2
Offline
Hi James,

Thanks for taking the time to work through my catalogue of questions and suggestions

I have this crazy dream, from time-to-time, of writing a file system plug-in that would mount a QRecall archive as though it was a volume containing the original files (read-only, of course)

There may be something that could serve as a basis of this at google code.
One implementation makes .flac files appear as .wav. And this is read & write.

Keep reading; you might change your mind.

There are many reasons why I don`t want any archiving, whether in an open or a proprietary format.
A few of these:
I want to be independent of a particular backup solution; I want to be able to compare backup and original with other software; the more complex the storage mechanism, the more severe the damage might be if there is a bug; backups surely take more time when the data is packaged (be it for compression or for your method of making the most of storage space); I want to check individual files manually for peace of mind; and a damaged backup archive always endangers more than one file.
QRecall`s storage mechanism is more complex than 1:1 copies as it stores fragments of files and relies on its database to know which piece of the puzzle belongs where when the pieces have to be re-assembled for a restore.
I think the benefit is supposed to be the ability to back up and store a total amount of data that is effectively larger than what would fit within the storage device`s capacity, even if compression were used.
I admit the potential is great if you are using versioned backups or archiving.

I've considered implementing a "compare" feature that would compare the file information in a QRecall archive with what's on the disk. Surprisingly (up to now) no one has requested this feature.

I am not surprised. I wasn`t interested in this prior to 3-4 days ago when I read why this is an issue at taobackup.com
This is clearly something that is not totally obvious.
Nevertheless I agree with the author of taobackup that this is an important issue.

The problem with a compare function as an integrity tool is that files change all the time. ... a compare feature can't tell you if one of your source files is damaged, only that it's different.
...No software ... can tell you if a document has become damaged. Comparing the files on your disk with the copies in your backup only tells you what's changed, not what shouldn't have changed but did.

The modification date is changed by the operating system whenever there is write access.
If the modification date does not indicate write access since a hash signature was generated but the file has changed, then it is corrupted.
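The rule just described can be sketched roughly as follows. This is only an illustration of the proposal, not a feature of QRecall or any other product; the function names, the choice of SHA-256, and the assumption that the modification time is trustworthy are all made up for the example.

```python
import hashlib
import os


def fingerprint(path: str) -> tuple[int, str]:
    """Return (modification time in ns, SHA-256 hex digest) for a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return os.stat(path).st_mtime_ns, digest.hexdigest()


def classify(path: str, saved: tuple[int, str]) -> str:
    """Compare a file with the fingerprint saved at backup time."""
    mtime, digest = fingerprint(path)
    if digest == saved[1]:
        return "unchanged"
    if mtime == saved[0]:
        return "suspect"   # content differs but the timestamp does not
    return "modified"      # content and timestamp both changed
```

Only the "suspect" result points at possible silent corruption; "modified" is what any ordinary edit looks like. Note that every check has to read the whole file again to recompute the digest.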

By keeping efficient multi-generational copies of files, should you find a corrupted copy of a source file you can go back to the archive, go through its history, and retrieve a good copy of it before it became damaged.

The problem is that it is very likely that you won`t notice the difference.
While you can very often *see* the difference in a picture file, in case of software for instance that you bought as download and not on a DVD-Rom you can`t tell it was damaged. The software may now crash often but you will blame the developer.
There are more examples. Chances are that you will not discover it.

A Verify action can be scheduled to run on whatever schedule you choose.

So far for the destination only.

A transient permission or read error with a single file will be logged, but won't stop a capture. Errors with the archive itself will immediately stop a capture, in order to prevent further damage to the archive.

That`s what I aim at when I want 1:1 copies:
If there is just a small defect in the archive, maybe due to a bug or just a power cut when writing the backup, it may no longer be possible to restore a large number of backed up files (if data fragments were damaged which are required for re-assembly of many files or versions).
That`s why my approach is to keep it as simple as possible. But I also think that verification is necessary (some backup solutions for the Mac with 4 star ratings don`t even check the destination after writing. Funny: the developer almost seemed to present this as a feature because it speeds up the backup process. well, I hope he has a harddrive crash soon and learns that his complete backup is useless ).
So there are many simple 1:1 copy tools (free) and commercial solutions. But some have other shortcomings. Command line tools are not for me.

QRecall has "event" schedules that can start a capture when the volume containing the items to capture is mounted, or when the volume containing the archive is mounted.

Very flexible

If the application writes a monolithic file using the common "safe save" technique, QRecall will capture the old copy of the document intact. If not, then the captured copy may be inconsistent and you'll have to perform another capture of the document while it isn't being modified before you have saved a valid copy. All file copy and backup programs suffer from the same problem, even Apple's Time Machine.

Is there no way for a backup application to check if a particular file is currently being saved?

In the activity window, you have two choices. Stop and Reschedule cancels the current action and schedules it to run again at some future date. Or, you can pause a running action for varying amounts of time from between 5 minutes and 4 hours. The action will automatically resume after that time, or you can manually resume it at any time.

This is very thoughtful. The perfect option for any occasion


There are many things that speak for QRecall. The lack of a more simple 1:1 copy sort of spoils it for me. An archive or versioned backup is more or less an afterthought for me while I consider simple 1:1 copies to be important.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Mostly Harmless wrote: There are many things that speak for QRecall. The lack of a more simple 1:1 copy sort of spoils it for me. An archive or versioned backup is more or less an afterthought for me while I consider simple 1:1 copies to be important.
QRecall provides a reliable, secure, and verifiable backup solution that balances the practical needs of file management and performance with the security of multi-generational backups (and QRecall can keep more generations than any other solution available today).

If you prefer another technique, I completely understand.

- QRecall Development -
Peter B.


Joined: May 25, 2013
Messages: 9
Offline
Mostly Harmless wrote: -snip-
The modification date is changed by the operating system whenever there is write access.
If the modification date does not indicate write access since a hash signature was generated but the file has changed, then it is corrupted.
-snip-

I, too, would like to have source data validation. The method I was going to suggest is this one posted by Mostly Harmless. However, I think that not all applications or processes update the modification date of files that are written to.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Peter B. wrote: I, too, would like to have source data validation.

The way QRecall stores data, source data validation doesn't make much sense. In fact, QRecall's archive structure is specifically designed so that it does not have to rely on comparing the backed-up data with the original as a means of validation.

All data in a QRecall archive is validated via interlocking layers of checksums and consistency guards. There's no need to compare the data in the archive with the original file, because the archive stores enough redundant information to determine if the data has been damaged or altered in any way.

You could compare the data in the archive with the original files, but that would only tell you if the original file had changed or not. That's a feature that's on the to-do list, but it won't improve the security of the data in the archive.

However, I think that not all applications or processes update the modification date of files that are written to.

Applications do not update the creation, modification, attributes, or access time of file objects. That's handled automatically by the filesystem. In fact, an application would have to go to extraordinary measures to modify a file in a way that made it appear that it hadn't. I'm not aware of any apps that do this.
Peter B.


Joined: May 25, 2013
Messages: 9
Offline
I'm confident that QRecall will know when QRecall archive data has been corrupted and I understand that QRecall doesn't need the original to do this. If I think that one of my originals has become corrupted, and I haven't modified it since the last capture, I can recall a copy from the archive and then compare it to the original. However, if one of my originals has become corrupted and I don't realize it, the corrupted file will be captured into the QRecall archive the next time I do a backup. As old layers are merged, the uncorrupted file will eventually be gone from the archive. I'd have some time to recall the uncorrupted file from the archive, but I'd still have to realize that the original was corrupted. If QRecall knew that an original has changed, but it shouldn't have since the last capture (because the modification date of the original is the same as the one in the archive), then QRecall could alert the user that the original may be corrupted.

James Bucanek wrote: Applications do not update the creation, modification, attributes, or access time of file objects. That's handled automatically by the filesystem. In fact, an application would have to go to extraordinary measures to modify a file in a way that made it appear that it hadn't. I'm not aware of any apps that do this.

The only instance I can remember where this happens is with AppleScript Editor in Snow Leopard when modifying scripts saved as applications. The date modified of the application didn't change when I modified it with AppleScript Editor in Snow Leopard. However, a few minutes ago I opened a script I hadn't touched since 2011 by using "Show Package Contents" in the Finder and items inside did have modification dates that were later than the application's modification date (which, in Snow Leopard, always stayed the same as the date created). So, the files that I modified apparently did have updated modification dates but the package/application containing them did not. Apple seems to have fixed this in Lion or Mountain Lion: scripts saved as applications get updated modification dates when they are modified by AppleScript Editor in Mountain Lion.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Peter B. wrote: If QRecall knew that an original has changed, but it shouldn't have since the last capture (because the modification date of the original is the same as the one in the archive), then QRecall could alert the user that the original may be corrupted.

Unfortunately, the only way to do that would be to read every file that didn't appear to have been modified, in its entirety, every time you performed a new capture. This would require (approximately) the same amount of work as recalling the entire volume, during every capture. It would make more sense, and be a heck of a lot less work, to just keep your incremental changes longer.

The only instance I can remember where this happens is with AppleScript Editor in Snow Leopard when modifying scripts saved as applications.

That's a different issue. An application is a bundle (a folder that appears to be a single file in the Finder). The modification date of the bundle is the modification date of the bundle's folder, which may, or may not, reflect the modification dates of its contents.

This is a non-issue for QRecall. QRecall does not use the modification date of a folder to determine if the contents of that folder have changed.
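To illustrate why a bundle's own modification date says little about its contents, here is a small sketch that walks a folder and reports the newest modification time found anywhere inside it. It is only a demonstration of the filesystem behaviour described above; how QRecall actually detects changes is not covered in this thread, and the function name `newest_mtime` is hypothetical.

```python
import os


def newest_mtime(root: str) -> float:
    """Return the latest modification time of any file or folder under root."""
    latest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat so that symbolic links are not followed
            entry_mtime = os.lstat(os.path.join(dirpath, name)).st_mtime
            latest = max(latest, entry_mtime)
    return latest
```

If `newest_mtime(bundle)` is later than `os.stat(bundle).st_mtime`, something inside the bundle was modified after the bundle folder itself was last touched, which matches the Snow Leopard observation above.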
 