Ming-Li Wang wrote:Still got the rainbow wheel again when trying to click QR Monitor's icon in the menubar this morning. The system is otherwise normal, meaning I could launch Firefox and other applications without delay. I launched QR from the Launchpad instead, but again clicking anything on the menu gave me the rainbow wheel. Be patient, and the menu did pop up after a few sec., and I was able to get the Log window open, and send the diag. report. The toplog was also sent, but I believe you'll see a system mostly idle throughout the night. The OSX Activity Monitor again showed QR Scheduler is alive, but it's apparently not working.
After spending a lot of quality time with the logs, the crash reports, the toplog.txt files you've been collecting, and some experimenting, I think I'm getting a handle on what's going on, although I still don't know why.
I finally figured out the rainbow cursor problem: The helper and scheduler processes are hung up. It's not a system load or resource issue, these processes are simply locked up. When you launch QRecall, the first thing it does it try to contact the monitor and scheduler processes and, you guessed it, promptly locks up waiting for them to respond.
The messages will eventually time out and the QRecall app will run, but will report it can't install the monitor and it's waiting for the scheduler to start, even though it's already running.
The new version (2.0.0b12) tries to address these issues by reworking how messages are retried. The new logic might still lock up, but for no more than a minute or two (worst case). So the monitor and scheduler should recover within a reasonable amount of time (minutes not hours) and get back to running.
The root of the problem is the helper processes that are started around 4:00 AM simply stop running—and I have no idea why. They don't log any error messages (the log messages from the process simply stop), they don't terminate unexpectedly (nothing in the system log), they don't crash (no crash log for the process), they just stop running. The scheduler and monitor continue to try and communicate with the helper, which locks them up too waiting for a response that never happens.
Without a sample of what's going on within the QRecallHelper process, I'm at a loss to explain why it would do this. It also doesn't make much sense that it only does it at 4:00 AM and not other times.
QRecall 2.0.0b12 addresses some of these issues and adds additional debugging information that will hopefully allow me to get closer to the problem.
Ming-Li Wang wrote:Since the problem doesn't seem to be related to competing use of system resources (my own tentative conclusion while you investigate), I decided to take a closer look at the logs and see if I can spot anything that escaped my attention.
And you are correct, this is not resource issue.
1. A capture action is scheduled for my system partition at 3:20am daily. The archive's layer map says it's done this morning, with 793 captured items totaling 273.7 MB. I can't find any entry for it in the QR Log (see attached screenshot), even with maximum detail level.
If you send another diagnostic report I can review the log.
Aug 19 03:20:00 carlo com.apple.xpc.launchd[1] (com.qrecall.hoist[505]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): QRHoist
This is normal. Apple wants developers to switch from using Mach ports to XPC. So whenever QRecall starts a privileged service using Mach ports, the system spits out this message. But until Apple provides end-to-end XPC communications, QRecall has to use Mach ports.
Aug 19 03:20:16 carlo QRecallHelper[506]: Failed to initialize IconCache named: com.apple.iconservices with error: Error Domain=NSCocoaErrorDomain Code=4097 "Couldn?t communicate with a helper application." (connection to service named com.apple.iconservices) UserInfo=0x7ff2b941f840 {NSDebugDescription=connection to service named com.apple.iconservices}
Aug 19 03:20:16 carlo QRecallHelper[506]: Error returned from iconservicesagent: Error Domain=NSCocoaErrorDomain Code=4097 "Couldn?t communicate with a helper application." (connection to service named com.apple.iconservices) UserInfo=0x7ff2be2bf6f0 {NSDebugDescription=connection to service named com.apple.iconservices}
Aug 19 03:20:16 carlo QRecallHelper[506]: ImageIO: CGImageDestinationFinalize image destination must have at least one image
These are errors from OS X's launch services framework. As QRecall captures items, it queries the launch services API for things like the item's icon, localized display name, filename character encoding, and so on. There are number of issues with launch services going back more than a decade, and sometimes it can't/won't return this information. In more recent versions of OS X, it logs messages like this to the console. (Earlier versions of OS X would simply crash.) You'll probably notice corresponding messages in the QRecall log like "cannot obtain icon for ...". These problems are entirely cosmetic, as this information is only used to display the items in the archive browser. They're annoying, but can be safely ignored. Sometimes these issue are caused by a corrupted launch services database and rebuilding your LSDB might make these go away (Google "rebuild launch services database").
2. The capture actions scheduled for 3:50am was indeed run and concluded successfully, with nothing to capture. Don't know why QR executed it again after reboot this morning (as reported in my previous post).
If the scheduler terminated abnormally, it might not have remembered that it already ran that action would immediately run it again when restarted.
3. The first in a string of "Unable to communicate with helper" warnings (at 4:20:25 am) in the QR Log has a note that says:
[NOTE: this exception originated in the server.] | connection timeout: did not receive reply
The others says:
cannot connect with helper
port: QRecallHelper.2307054f
These are the helper communication problems that are at the root of the issue.
4. QR scheduler managed to get itself up and executed a capture job at 6:34:21am.
Small victories.