Steven J Gold wrote: Any known issues with QRecall under Catalina?
Other than "it won't work at all," things are looking pretty good.

But seriously, a Catalina-compatible version is in the works. It's been a long road, because Catalina breaks many basic assumptions about the filesystem and volumes. In a nutshell, Catalina splits a bootable macOS volume into two volumes: a read-only system volume and a read-write "data" volume, where all of the modifiable files get stored. It then uses a new filesystem construct (called a "firm" link) to blend the contents of the two together so they appear to be a single volume.

Since the idea of backup software is to capture the files you'd want to restore, the new QRecall captures only the "data" half of a system/data volume pair. This actually accomplishes a long-term goal of QRecall, which was to isolate just the files you need to restore a bootable volume and not capture any of the immutable system files (which the OS installer would simply overwrite anyway).

While conceptually simple, this has required a large number of adjustments to the software. QRecall has always been "device" oriented, capturing and restoring all of the physical files on a single volume, so the idea that all of the files on a volume are, well, on that volume is deeply ingrained in the software. But we have made a lot of progress. I can't guarantee it will be ready for the Catalina release, but it should be close. I hope to have a beta within another week or so.
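If you're curious, you can see the new arrangement from the Terminal on a Catalina system. These are stock commands: diskutil lists the system/data pair as a volume group, and the firm-link mappings are enumerated in a plain text file:

diskutil apfs list
cat /usr/share/firmlinks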
Well, color me dumbfounded. Bruce, you were absolutely right. This is not a snapshot issue.

TL;DR: Try reindexing the archive.

The diagnostic report pinpointed what was eating up the time, but I have absolutely no (definitive) explanation as to why. So here's what's going on (technical details). The archive maintains a very (sometimes very, very) large hash table used to search for duplicate data that's already been added to the archive. This table is so big, it's impractical to make a copy of it every time you perform a capture. So when QRecall captures a modest amount of data, instead of copying and updating the master hash file, it writes the handful of updates to an "adjunct" hash file. This adjunct hash file is read in again when the next capture starts, on the theory that the adjunct file will be orders of magnitude smaller than the master hash file. Eventually the adjunct hash entries exceed a threshold and QRecall "bites the bullet," making a copy of the master hash file and updating it. At that point there are no adjunct entries and the whole cycle starts over again.

So back to the problem. Your capture is getting stuck reading in the adjunct hash entries. Here's the (interesting part of the) sample trace:
9814 -[CaptureCommand execute] (in QRecallHelper)
  9814 -[RepositoryCommand prepareRepository] (in QRecallHelper)
    9814 -[RepositoryPackage prepareWithMode:] (in QRecallHelper)
      9814 -[DataHash prepareWithMode:] (in QRecallHelper)
        9317 -[DataHash loadAdjunctEntries] (in QRecallHelper)
          9315 -[NegativeChecksumMap add:] (in QRecallHelper)
          2 -[DataHash addEntry:] (in QRecallHelper)
            2 -[DataHash insertEntryIndexIntoHash:forChecksum:] (in QRecallHelper)
        497 -[DataHash loadAdjunctEntries] (in QRecallHelper)
          497 DDReadFile (in QRecallHelper)
            497 read (in libsystem_kernel.dylib) + 10 [0x7fff5df83ef2]
During the sample period, 5% of the time was spent reading the adjunct file and 95% of the time was spent inserting those entries into the in-memory hash cache.

And here's where it gets weird. That insert function ([NegativeChecksumMap add:]) is literally 5 lines long. It consists of an exclusive or, a bit shift, a mask, an address calculation, and an add. A modern CPU should be able to do several hundred million of these a second. It should be so fast it shouldn't even show up in the stack trace. Yet it's accounting for 95% of the delay.

My only guess is that it might be hitting virtual memory, assuming there are other large (memory footprint) processes running at the same time. Or the negative map has been mapped into memory and the page loads are just really, really slow for some reason. Basically, what I'm saying is that VM paging/contention is the only thing I can think of that would account for this miserable performance.

So that's the problem. One "solution" would be to reindex the archive. A reindex will rebuild all of the index files, including the hash file, from scratch. At the end, the hash file will be complete and up-to-date and there won't be any adjunct entries to read or write. Of course, this just kicks the problem down the road, as the adjunct entries will again start to accumulate as small captures are completed. But start with a reindex and see if that resolves the problem.
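For the technically curious, here's a minimal C sketch of what an insert like that presumably boils down to; the names and sizes are illustrative, not QRecall's actual code:

#include <stdint.h>

#define MAP_BITS (1u << 30)            /* illustrative map size, in bits */

static uint8_t map[MAP_BITS / 8];      /* one very large bit array */

/* Fold the checksum into an index, then set a single bit:
   an xor, a shift, a mask, an address calculation, and a set. */
void map_add(uint64_t checksum)
{
    uint32_t h   = (uint32_t)(checksum ^ (checksum >> 32)); /* xor + shift */
    uint32_t bit = h & (MAP_BITS - 1);                       /* mask */
    map[bit >> 3] |= (uint8_t)(1u << (bit & 7));             /* address calc + set */
}

Note that each call touches a single byte at an essentially random offset in a very large array. If that array has been paged out, or is memory-mapped from a slow disk, every insert can stall on a page fault, which would explain how instructions this cheap end up dominating the sample.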
Bruce, if you suspect that snapshots are not the problem, the next step would be to send a diagnostic report during that 10-minute period. The diagnostic report will take a sample of all running QRecall processes. If the capture action is stuck, the report should pinpoint exactly where.
I'd be very curious to see what you discover. The magic tool is tmutil. It has a localsnapshot command to create a local snapshot of all APFS volumes. (I'm not aware of any way of creating a snapshot for a specific volume, the way QRecall does.) You can also list the snapshots on a volume (listlocalsnapshots and listlocalsnapshotdates), and delete them by date (deletelocalsnapshots). Both QRecall and macOS are responsible for deleting their stale snapshots, so any snapshots you create will eventually get deleted.
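For example (the volume path and snapshot date below are placeholders):

tmutil localsnapshot
tmutil listlocalsnapshots /Volumes/YourVolume
tmutil listlocalsnapshotdates /Volumes/YourVolume
tmutil deletelocalsnapshots 2019-09-15-120000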
Bruce, I have noticed that the Mojave snapshot process can be quite lengthy at times. I'm not sure exactly what the criteria are, but I suspect it competes with other changes being made to the volume at the same time. I first noticed this when I was stepping through a test version of the capture action in the debugger and thought the process had deadlocked. I paused the executable to find it waiting in the create_snapshot() function. After about three minutes, it finally finished and went on. If this is a problem for some reason, you can still disable snapshots by turning off the "Capture a Snapshot" option in the advanced settings. Just be warned that this option might go away in future versions, because capturing without taking a snapshot is deeply problematic in Mojave and later.
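For the curious: snapshot creation ultimately funnels into a single kernel request, so there's nothing an app can do but wait for it to return. Here's a minimal sketch using the fs_snapshot_create() interface from <sys/snapshot.h>; it's illustrative, not QRecall's actual code, and note that Apple gates this call behind an entitlement:

#include <sys/snapshot.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    /* Open the root of the volume to snapshot (path is illustrative). */
    int dirfd = open("/Volumes/YourVolume", O_RDONLY);
    if (dirfd < 0) { perror("open"); return 1; }

    /* This one call is where all the waiting happens. */
    if (fs_snapshot_create(dirfd, "com.example.test-snapshot", 0) != 0)
        perror("fs_snapshot_create");
    return 0;
}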
Actually, that's a pretty good solution, especially if you're using APFS. By default, copying a file in APFS makes a clone of that file; essentially a "snapshot" of the file that doesn't use any additional storage (until one of the files is modified). So copying your found files into a folder, capturing that folder, and then deleting those files should be remarkably fast and efficient. Pro tip: I can see at least two ways of automating this. (1) Automate your copy routine to copy your recent files into a fixed folder, then set up a QRecall action to capture that folder whenever it changes. As soon as your copy is done, the capture action will take off. (2) Use the same capture action, but run it on a schedule, and add a prolog script that performs the find and copy before the capture runs; a rough sketch of such a script follows.
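Here's what that prolog script might look like. The folder paths and the one-day window are assumptions; cp's -c option asks APFS to clone rather than copy, so the "copies" are nearly instant:

#!/bin/sh
# Hypothetical prolog script: clone files modified in the last day
# into a staging folder; the scheduled capture then archives it.
STAGE="$HOME/CaptureStaging"        # staging folder (assumed)
mkdir -p "$STAGE"
find "$HOME/Documents" -type f -mtime -1 -exec cp -c {} "$STAGE/" \;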
Erik, QRecall works at the filesystem level. So-called "smart" folders are an abstraction created by the Finder. The Finder stores the metadata needed to find and display the contents of its "smart" folder in a file, and that file is what QRecall captures. Unfortunately, QRecall cannot look into the mind of the Finder to find out what those files are, any more than it could look into iTunes to capture the audio files in a playlist. I've considered adding the ability to capture items based on some search criteria, but there are so many problems with this idea that I just keep putting it off.
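You can see this for yourself: a "smart" folder is just a property-list file under ~/Library/Saved Searches, and dumping one shows the search query, not the files it matches (the file name below is hypothetical):

plutil -p "$HOME/Library/Saved Searches/Big Files.savedSearch"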
Bernd, QRecall 3.0 (the next major release) will have new features that address cascading, cloud, and off-site backups. It also has a boatload of performance and reliability improvements. We have no firm estimate of when it will be available, but look for beta tests to start sometime this fall.
Johannes, Thanks for the confirmation! Look for a new release of QRecall that works around this issue.
What appears to be happening is that the repository.data file (the file with all the important data) is taking up more physical (allocated) space than it actually contains. In the case of Archiv 1, the repository.data file contained 40GB of data but occupied 79GB of disk space. Stunningly, Archiv 2 contained only 0.4GB of data, but occupied 151GB of disk space.

While adding sparse file support to QRecall 3.0, I've noticed that APFS can sometimes over-allocate space for a file, and that is what seems to be happening here. My working theory is that APFS is not correctly handling pre-allocation requests. As the repository.data file grows during a capture, QRecall periodically makes pre-allocation requests at the end of the file so that, if the disk suddenly runs out of space, QRecall has enough "head room" to write its wrap-up metadata and session records.

To test this theory, I've built a pre-release version of QRecall 2.1.16(1) that you can download and install. This hacked version doesn't perform any pre-allocation of the repository.data file. It won't fix the over-allocated space you have now, but if you compact the archive and perform new captures, those new captures won't cause it to over-allocate the file, assuming my theory is correct. Give this version a try and please keep me posted.
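For context, here's a minimal sketch of how a pre-allocation request is typically made on macOS, using the standard fcntl(F_PREALLOCATE) interface; it's illustrative, not QRecall's actual code:

#include <sys/types.h>
#include <fcntl.h>

/* Ask the filesystem to reserve extra space past the end of the file. */
int preallocate(int fd, off_t bytes)
{
    fstore_t store = {
        .fst_flags      = F_ALLOCATECONTIG,  /* prefer contiguous space */
        .fst_posmode    = F_PEOFPOSMODE,     /* offset relative to end of file */
        .fst_offset     = 0,
        .fst_length     = bytes,
        .fst_bytesalloc = 0
    };
    if (fcntl(fd, F_PREALLOCATE, &store) == -1) {
        store.fst_flags = F_ALLOCATEALL;     /* fall back: any free space */
        if (fcntl(fd, F_PREALLOCATE, &store) == -1)
            return -1;                       /* genuinely out of space */
    }
    return 0;
}

If APFS reserves that space but never releases the unused portion, the file would end up permanently over-allocated, which matches what you're seeing.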
Johannes, You are the third user who has reported something like this, and I'm beginning to suspect it's a bug in APFS. It also makes no sense to me that compacting the archive would make any difference. I would be interested in getting some allocation information about the archive files by running the 'ls -lskn' Terminal command on the archive (particularly at a point in time when the archive size and Finder size disagree), like this:
ls -lskn /Volumes/YourVolume/Path/To/Archive.quanta

(The first column, courtesy of the -sk options, is the allocated size in 1K blocks; compare it against the logical file length.) Secondly, I'd be interested to know if the disk repair tool in Disk Utility produces any anomalous output when you repair that volume. Finally, I'd be curious to know if there are any snapshots of that volume. You can find out by issuing the command:
tmutil listlocalsnapshots /Volumes/YourVolume
When you use the top shade to hide earlier layers, the item browser can't use the pre-calculated size hints that are normally available to it. To determine the size of a folder or package, the browser view must read all of the items it contains; since this can potentially mean millions of items, it doesn't do that automatically. If you "drill down" into that folder, and every subfolder it contains, you'll see the calculated size when you return to the top-level folder. This is easiest to do in list view.
Steven J Gold wrote: Is there any way of looking at a specific layer/backup to see what files were captured, especially if there are any large ones?
Yes, there is. Every layer is a delta, recording only those changes that have occurred since the last layer. Open the archive and drag the top and bottom layer shades to isolate a single layer. By hiding all of the changes that occurred before and after that layer, the item browser will show just those items that were captured in that layer. (If what you're looking for isn't obvious, use the View menu to show invisible items and package contents.) You can also use the shades to isolate a group of layers, showing you all of the items captured last week, for example.
Mike, Thanks for sending the diagnostic report. It would appear that you're running into an issue (read "bug") that was addressed in QRecall 2.1.14. I would suggest you start by updating to the latest QRecall (in the QRecall app choose QRecall > Check for Updates…). Once updated, verify the archive again. If you encounter errors, repair once more. I suggest choosing the default option to reconstruct your redundant data. If the repair still fails, or fails to verify afterwards, please send another diagnostic report and we'll investigate further.
Mike, A repaired archive should verify, so that's not right. Start by sending a diagnostic report. In the QRecall application choose Help > Send Report…. We'll review your logs and see what we can find.