Strange performance issue
Forum Index » Problems and Bugs
Steven J Gold



Joined: 30-Jul-15 11:45
Messages: 22
Offline

I replaced a terribly slow internal HD in my laptop with a new SSD of approximately the same size, capturing the "old" volume(s) one last time before the swap.
Then I cloned the old drive to my new using Carbon Copy Cloner. Tremendous difference in SSD vs. HD performance experienced in computer snappiness.
I also renamed the new internal SSD volumes to avoid confusion with the removed hard drive (now living in an external enclosure).

Then I went to capture the new volume with QRecall. Since the newly installed and partitioned SSD has both different internal ID and different volume name, I expected QRecall to capture it as a new volume but since its contents are almost completely identical to the prior replaced volume, I expected the Capture to find 99% of the data already in the archive (it turned out to be 98.69%) and complete very rapidly.

So I was surprised when it took over 4 hours to capture 167.7GB since it actually only needed to write 1.53GB.
Most surprising was the variance in speed it reported. Sometimes it reported "1.63GB per second", but sometimes only "7.28 *MB* per second" -- that's orders of magnitude of variance! The average rate was 687 MB/min. I'm curious why it sometimes dipped into single-digit MB/sec.

Relevant facts are: the backup archive is 897 GB, on an external drive connected by USB-2. Shifted Quanta Detection is off, Capture Compression is set to maximum, Data Redundancy is None. 1,278,260 items were captured. Doing the math, it looked like 670 MB was saved by compression.
James Bucanek



Joined: 14-Feb-07 10:05
Messages: 1548
Offline

Welcome to the club. I upgraded my workhorse Mac Pro to an SSD last year and can't imagine life without it.

Steven J Gold wrote:... I expected QRecall to capture it as a new volume but since its contents are almost completely identical to the prior replaced volume, I expected the Capture to find 99% of the data already in the archive (it turned out to be 98.69%) and complete very rapidly.

So I was surprised when it took over 4 hours to capture 167.7GB since it actually only needed to write 1.53GB.

QRecall wrote 1.53GB of data, and read 335.4GB of data. Remember that de-duplication requires that every block of every source file be looked up in a gigantic database of captured quanta. Once found, the archive record containing the captured quanta is read and compared, byte-for-byte, with the data block in your file to ensure they are identical.

So even if the files you're capturing are 100% duplicates of what's in the archive, it still means QRecall has to read all of that data twice (once from the files and again from the archive).
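That lookup-then-verify flow can be sketched in Python. This is purely illustrative: QRecall's actual quanta format, hash scheme, and block size are not public, and the dict here stands in for its on-disk index.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only

def capture_block(block: bytes, archive: dict) -> int:
    """Return the number of bytes written to the archive for this block."""
    digest = hashlib.sha256(block).digest()      # look the block up in the quanta index
    stored = archive.get(digest)
    if stored is not None and stored == block:   # byte-for-byte verify against archived data
        return 0                                 # duplicate: nothing new is written
    archive[digest] = block                      # new quantum: store it
    return len(block)

archive = {}
first = capture_block(b"a" * BLOCK_SIZE, archive)   # initial capture writes the block
second = capture_block(b"a" * BLOCK_SIZE, archive)  # recapture compares instead of writing
```

Note that even the de-duplicated path still reads the archived copy for the compare, which is exactly the extra read traffic described above.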

Most of the capture speed improvements come from anticipating the data being captured or determining that a file is already captured and not reading it all. Both of those optimizations only happen when items are recaptured; they never happen during the initial capture.

Steven J Gold wrote:Most surprising was the variance in speed it reported. Sometimes it reported "1.63GB per second", but sometimes only "7.28 *MB* per second" -- that's orders of magnitude of variance! The average rate was 687 MB/min. I'm curious why it sometimes dipped into single-digit MB/sec.

QRecall has a lot of moving parts. It's really hard to tell what's going on from one moment to the next. Sometimes the capture needs to pause while directories are pre-scanned, hash tables are updated, record number indexes are pruned, or a glut of empty records are being erased. The bottom line is, unless you perform a sample of the QRecallHelper process while it appears to be stuck, I can't tell you exactly what it was doing (not that it's usually that interesting anyway).

- QRecall Development -
Steven J Gold




Ah, so that helps explain why the first backup of the new disk wasn't as fast as an initial capture -- the need to read the earlier backups for de-duplication.

It never appeared to be stuck, just varying in speed.

Thanks for the explanation!

BTW, does the existing quanta need to be decompressed for the comparison, or does the comparison operate on the compressed data?
I guess I'm asking if the de-duplication process is slower if the archive is compressed?
James Bucanek




Steven J Gold wrote:BTW, does the existing quanta need to be decompressed for the comparison, or does the comparison operate on the compressed data?
I guess I'm asking if the de-duplication process is slower if the archive is compressed?

Short answer: yes, compression adds overhead, which means it's probably going to be slower.

Uncompressed file data is used to search for a duplicate block in the archive. If found, the quanta is decompressed and compared with the file data. If no match is found, the file data is compressed and written to the archive. This is potentially faster than first compressing the file data, because decompression is always faster than compression. In other words, QRecall avoids compressing a block until it has to.
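That compress-only-when-you-must ordering can be sketched as follows (zlib and a dict index are stand-ins here; QRecall's real record format and hashing differ):

```python
import hashlib
import zlib

def store(block: bytes, archive: dict) -> bool:
    """Store a block, compressing only when no duplicate is found.

    Returns True if the block had to be compressed and written."""
    key = hashlib.sha256(block).digest()      # search keys off the *uncompressed* data
    record = archive.get(key)
    if record is not None and zlib.decompress(record) == block:
        return False                          # decompress + compare: no compression cost paid
    archive[key] = zlib.compress(block)       # only a genuinely new block gets compressed
    return True
```

A duplicate block only ever pays the (cheaper) decompression cost; compression is deferred until a block is known to be new.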

Having said that, if you have really slow archive access (USB, slow network, ...) and a relatively fast (multi-core) computer, it's possible that using compression can actually speed up actions, if the amount of time the computer takes to decompress the data is less than the amount of additional time it would have taken to read an uncompressed record from the archive.
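A back-of-the-envelope model of that trade-off (all throughput numbers are illustrative assumptions, not measurements of QRecall):

```python
def seconds_to_fetch(mb: float, read_mbps: float, ratio: float = 1.0,
                     decompress_mbps: float = 0.0) -> float:
    """Time to deliver `mb` of data: read the (possibly smaller) compressed
    record, then optionally decompress it back to full size."""
    t = (mb * ratio) / read_mbps        # read time for ratio * mb bytes
    if decompress_mbps:
        t += mb / decompress_mbps       # CPU time to expand to full size
    return t

USB2_MBPS = 30.0   # ~30 MB/s is a realistic ceiling for USB 2.0
uncompressed = seconds_to_fetch(100, USB2_MBPS)            # read 100 MB as-is
compressed = seconds_to_fetch(100, USB2_MBPS, ratio=0.5,
                              decompress_mbps=500.0)       # read 50 MB, then decompress
```

On a slow bus, reading half as many bytes plus a fast decompress comes out ahead; on a fast local SSD the ordering can flip.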

- QRecall Development -
Steven J Gold




Remember RAM Doubler from Connectix back in the 90's? It combined compression with virtual memory. I had a friend who worked on it back then, and one of the principles it was built on was the discovery that compressing/decompressing data -- in this case memory -- was always faster than writing it out to disk, by an order of magnitude. Microsoft bought Connectix in 2003, primarily to get Virtual PC. Connectix's patents on memory compression have since expired, and Apple recently used them to implement Memory Compression in what, Yosemite?, because it was faster to compress memory than to swap it to disk.

I actually found that the fastest way to move a large (40+ GB) file from a USB-2 connected disk to my laptop is to Restore it from a QRecall archive to the target disk rather than do a straight copy. I assume this is because the archive is compressed and thereby takes fewer I/O operations to "read" the file than to do a Finder copy from the external disk?

(I would never have guessed I'd use QRecall as a "faster than Finder file copier" )
James Bucanek




Steven J Gold wrote:Remember RAM Doubler from Connectix back in the 90's?

Wow, does that bring back memories. It also reminds me of Stacker (from my bad MS-DOS days).

Steven J Gold wrote:Connectix's patents on memory compression expired, and recently Apple used stuff from them to implement Memory Compression in what, Yosemite?, because it was faster to compress memory than to swap it to disk.

Startup drive I/O is pretty fast these days, so I suspect this is more to conserve disk space than for performance. Most boot drives use eSATA that can write 100MB/s, and 300MB/s is no longer uncommon. The newer SSDs can move data as fast as 500MB/s. Even with a fast 6-core CPU, it would be tough to compress 300MB of data in less than a second.

Steven J Gold wrote:I actually found that the fastest way to move a large (40+ GB) file from a USB-2 connected disk to my laptop is to Restore it from a QRecall archive to the target disk rather than do a straight copy. I assume this is because the archive is compressed and thereby takes fewer I/O operations to "read" the file than to do a Finder copy from the external disk?

(I would never have guessed I'd use QRecall as a "faster than Finder file copier" )

That's very cool, and makes perfect sense.

- QRecall Development -