QRecall.log may quickly fill up the entire local storage?

A weird anecdote:

Yesterday, one of my storage devices decided to activate a patch on its own schedule, an action I had expressly forbidden. It took down an iSCSI target that my Mac was using as the destination of a running capture action. When the service came back seconds later, the volume was re-mounted, and QRecall proceeded to touch the .lock file inside the archive but wouldn't write anything to the archive files themselves anymore. As I noticed a few minutes later from the alarms going off, it did write several hundred MiB/s of text into QRecall.log, though. By the time I took notice, the log file was some 180GiB in size. I panicked, killall'ed QRecallHelper, and deleted QRecall.log so it wouldn't choke my Mac by filling up its local storage.

When I came to, I realized I should have taken a sample of that log, maybe just a »tail -1000« of it, so it could be analyzed. I am sorry about that, but there's nothing I can (am willing to) do now.

I'm relatively certain that the log grew so big so fast because the same message was written to it over and over. The volume will have appeared to QRecall as a local APFS drive; maybe QRecall has code to deal with network shares dropping out but can't cope with local disks vanishing? Maybe QRecall could consider skipping such messages if they occur more than a certain number of times per period, or stop logging altogether once more than a certain amount has been logged per period?
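To make the first idea a bit more concrete, here is a minimal sketch in Python of collapsing identical consecutive messages into a single "repeated N times" line, similar in spirit to what classic syslog daemons do. The class name RepeatSuppressingLog and the summary wording are hypothetical, not anything from QRecall, whose logging code isn't shown here. It also only helps if the runaway output really was one message repeated over and over, which is an assumption.

```python
from typing import Callable, Optional


class RepeatSuppressingLog:
    """Hypothetical logger that collapses identical consecutive messages.

    The first occurrence of a message is written immediately. Identical
    follow-ups are only counted; when a different message arrives (or
    flush() is called), one summary line reports how many were skipped.
    """

    def __init__(self, write: Callable[[str], None]) -> None:
        self._write = write                # sink for finished log lines
        self._last: Optional[str] = None   # most recent message written
        self._repeats = 0                  # suppressed copies of _last

    def log(self, message: str) -> None:
        if message == self._last:
            self._repeats += 1             # duplicate: count it, write nothing
            return
        self.flush()                       # summarize the previous run, if any
        self._write(message)
        self._last = message

    def flush(self) -> None:
        if self._repeats:
            noun = "time" if self._repeats == 1 else "times"
            self._write(f"last message repeated {self._repeats} {noun}")
        self._repeats = 0


if __name__ == "__main__":
    lines: list = []
    log = RepeatSuppressingLog(lines.append)
    for _ in range(1_000_000):             # made-up runaway error message
        log.log("I/O error reading archive index")
    log.log("capture action stopped")
    log.flush()
    print(lines)
    # ['I/O error reading archive index', 'last message repeated 999999 times', 'capture action stopped']
```

A million identical lines shrink to two, while any change in the message still gets through immediately.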

I'm not claiming there to be any malfunction, just sharing this so it might get attention if it needs it. It's QRecall 2.2.14.1 on macOS Ventura 13.4 on a 2020 M1 Mac mini. As always, I'm absolutely willing to help debug whatever I can if prompted to.

Olfan,

Olfan wrote: By the time I took notice the log file was some 180GiB in size. I panicked, killall'ed QRecallHelper and deleted the QRecall.log so it wouldn't choke my Mac with clogged local storage.

That was clearly the right thing to do. Once the connection to the volume was broken, the helper process was useless anyway.

QRecall's pretty fanatic about logging everything it does, but even I'm having trouble thinking of anything that would generate 180GB of log data without stopping. Most logging is self-limiting: you get an error, or three, or a hundred, but ultimately the process gives up, logs one final "I've given up" message, and terminates.

The only code that will log an error and continue to plow ahead is during a repair, and that code (at least in QRecall 3.0) does limit the number of messages it logs before logging just a summary. There is also code that corrects slightly damaged data, but if the drive was disconnected there's no way successive corrections could be successful.

So without a peek at what was getting logged, I can't offer much in the way of useful suggestions, other than what you've already done.

But this did get me thinking.

It shouldn't be too hard to put a throttle on the log output so if the process is trying to log a ridiculous amount of data, say more than 1,000,000 messages an hour, it can simply shut the log output off.
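A rough sketch of such a throttle, assuming a simple fixed one-hour window and a cap of 1,000,000 messages per window. ThrottledLog, its parameters, and the notice texts are hypothetical and not part of QRecall. This version resumes output when the next window starts and reports how many messages were dropped; shutting the log off for the rest of the process would be an equally valid choice.

```python
import time
from typing import Callable


class ThrottledLog:
    """Hypothetical log sink that stops writing once a message budget is spent.

    At most `max_messages` lines are written per `window_seconds`. The first
    message over budget is replaced by a single notice; everything after that
    is dropped until the window rolls over.
    """

    def __init__(
        self,
        write: Callable[[str], None],
        max_messages: int = 1_000_000,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._write = write
        self._max = max_messages
        self._window = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0      # messages offered in the current window
        self._dropped = 0    # messages discarded in the current window

    def log(self, message: str) -> None:
        now = self._clock()
        if now - self._window_start >= self._window:
            # New window: report what was lost, then reset the budget.
            if self._dropped:
                self._write(f"log output resumed; {self._dropped} messages were dropped")
            self._window_start = now
            self._count = 0
            self._dropped = 0
        self._count += 1
        if self._count <= self._max:
            self._write(message)
        elif self._count == self._max + 1:
            # First message over budget: emit one notice instead of it.
            self._write("log output suspended: too many messages")
            self._dropped += 1
        else:
            self._dropped += 1
```

Injecting the clock keeps the throttle testable without waiting an hour; a real helper process would simply use the default time.monotonic.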

I'll put that on the wish list for 3.x.

Just a quick update:
this was a one-time event, it didn't repeat and I can't seem to trigger it at all (which is great).
I've moved Macs in the meantime, taking the opportunity to move to QRecall 3.0 as well.
Should I ever observe a similar event, I'll (hopefully) remember this thread and dig it up again.
