Chris Petit wrote: I was curious how verifying worked.
In practical terms, it works by making sure that nothing in the archive has been altered or is missing. It doesn't need the original files to accomplish this.
In technical terms, a verify action reads and checks the integrity of every record in the archive. To understand why this works, you need to know a little about how an archive is constructed.
An archive is made up of a multitude of small records written to a file called the repository. Every record is stored with a 32-bit Adler checksum along with some redundant information. QRecall can use this information to determine whether any data in that record has been disturbed or altered, even slightly. Even empty records in an archive have checksums, so QRecall can tell if something has been written where it shouldn't be.
The first phase of a verify action consists mostly of reading every record in the repository and verifying those checksums. All of the information about a captured file (what directory it is in, its metadata, and every block of its data) is stored as individual, verifiable records. So if all of the records are intact, QRecall can prove that none of the folders, files, or any of the data in those files has been altered or lost.
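To make the checksum pass concrete, here is a minimal sketch in Python. The record layout (a 4-byte length, a 4-byte Adler-32 value, then the payload) and the names pack_record and verify_records are assumptions made up for illustration; this is not QRecall's actual repository format. It only shows the general idea of walking a file record by record and recomputing each checksum.

```python
import struct
import zlib

# Hypothetical record layout, NOT QRecall's real format:
# 4-byte big-endian payload length, 4-byte Adler-32 of the payload, payload.
HEADER = struct.Struct(">II")

def pack_record(payload: bytes) -> bytes:
    """Build one record: header (length + checksum) followed by the payload."""
    return HEADER.pack(len(payload), zlib.adler32(payload)) + payload

def verify_records(repository: bytes) -> list[int]:
    """Return the offsets of records that are truncated or fail their checksum."""
    bad = []
    offset = 0
    while offset < len(repository):
        if offset + HEADER.size > len(repository):
            bad.append(offset)  # not enough bytes left for a header
            break
        length, stored = HEADER.unpack_from(repository, offset)
        start = offset + HEADER.size
        payload = repository[start:start + length]
        # A short payload or a checksum mismatch both mean the record is damaged.
        if len(payload) != length or zlib.adler32(payload) != stored:
            bad.append(offset)
        offset = start + length
    return bad

# Usage: two intact records, then the same data with one byte flipped.
repo = pack_record(b"hello") + pack_record(b"world")
damaged = bytearray(repo)
damaged[-1] ^= 0xFF  # disturb the last byte of the second record
print(verify_records(repo))
print(verify_records(bytes(damaged)))
```

Note that nothing here needs the original files: each record carries enough information to check itself.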
The rest of the verify action is spent doing two things. The first is to make sure the structure of the archive is sound. For example, records refer to other records (a directory record may contain a list of records that represent the files in that folder). This is mostly a sanity check, as it's highly improbable that all of the records would be OK while a link between records is wrong. But it's not impossible, so QRecall checks.
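As a rough illustration of that structural check, the sketch below treats the archive as a table of record IDs, each with a list of the record IDs it refers to, and reports any reference that points at a record that doesn't exist. The data model and the name find_dangling_links are assumptions for the example only; the real checks are more involved.

```python
def find_dangling_links(records: dict[int, list[int]]) -> list[tuple[int, int]]:
    """Return (record_id, missing_id) pairs for links that point at no record.

    `records` is a hypothetical model: record ID -> IDs of records it refers to,
    for example a directory record listing the records of its files.
    """
    dangling = []
    for record_id, links in sorted(records.items()):
        for target in links:
            if target not in records:
                dangling.append((record_id, target))
    return dangling

# Usage: record 1 is a "directory" that lists records 2 and 3.
sound = {1: [2, 3], 2: [], 3: []}
broken = {1: [2, 3], 2: []}  # record 3 is missing
print(find_dangling_links(sound))
print(find_dangling_links(broken))
```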
The second is to make sure that all of the auxiliary index files in the archive package agree with the information in the repository records. The individual records of the repository are very robust and designed to be resistant to data loss, but are cumbersome to access quickly. The index files make finding and updating information in the repository much quicker, but they're only good if they accurately reflect what's in the repository. QRecall reconstructs all of the information in the index files from the information in the repository records, and then compares the result to the index files in the archive package. If they match, the index files are given a clean bill of health.
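The rebuild-and-compare idea can be sketched the same way. Here the "index" is just a lookup table from a record's key to its offset in the repository; that shape, and the names rebuild_index and index_is_valid, are assumptions for illustration and say nothing about what QRecall's index files actually contain.

```python
def rebuild_index(records: list[tuple[int, str]]) -> dict[str, int]:
    """Derive a key -> offset lookup table purely from the records themselves.

    `records` is a hypothetical list of (offset, key) pairs read from the repository.
    """
    return {key: offset for offset, key in records}

def index_is_valid(records: list[tuple[int, str]], stored_index: dict[str, int]) -> bool:
    """The stored index is trusted only if it equals the freshly rebuilt one."""
    return rebuild_index(records) == stored_index

# Usage: one index that agrees with the records and one that is stale.
records = [(0, "a.txt"), (13, "b.txt")]
print(index_is_valid(records, {"a.txt": 0, "b.txt": 13}))
print(index_is_valid(records, {"a.txt": 0}))
```

The point of the design is that the records are the source of truth, so an index is never checked against itself, only against what the records say it should be.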
And that's verifying in a nutshell.