Message |
|
B12 went through the grueling early Sat. morning schedule (moved to this morning as usual) for the first time here and survived. It's still working fine, and doesn't give me spinning rainbow wheels. Found some peculiar entries in the Log Window, however. 1. Five consecutive "could not communicate with helper" entries at address 8:46am, right about when I touched my computer for the first time in the morning. There were 6 of them yesterday morning (I forgot to add Tuesday morning to the group of "Sat. morning actions", so I didn't report back yesterday morning), also at about the time I touched my computer for the first time. Note that the system sleep timer is set to "never", while HDs and the displays are allowed to sleep. 2. There are 2 errors associated with the actions that took place overnight. The first at 4:49:55am, "Helper did not start" after a successful "merge" action. Then "helped did not start" again at 5:34:55am, which stopped a "verify" action. Both errors gave the same description (with different timestamps, of course):
Distributed objects message send timed out (timeout: 462228595.910597 at time: 462228595.922145) 1
A diag report has been sent.
|
|
|
Many thanks for the detailed explanation, as always.
James Bucanek wrote:You might consider changing the schedule of your actions that capture a specific subset of files (like documents, photos, or music) to use the new "Capture Items Changed" event schedule.
This is something I've meant to try since v2 beta testing, but never got around to really set it up. Good. Will change the every-15-min. action (there's only one) right away. One question: how reliable is OSX/HFS's file change mechanism? Do I need to compliment the action with, e.g., daily capture?
|
|
|
James Bucanek wrote:
1. A capture action is scheduled for my system partition at 3:20am daily. The archive's layer map says it's done this morning, with 793 captured items totaling 273.7 MB. I can't find any entry for it in the QR Log (see attached screenshot), even with maximum detail level.
If you send another diagnostic report I can review the log.
It happened in the early morning on 8/19, Taiwan time, so it's already in the diag. report I gave you that day. B12 was installed yesterday afternoon, but it didn't work. Not a single action was executed until I rebooted the system this morning. As I'll be out of town for a few days, I won't be able to tell if B12 can survive the early morning schedule without issue until next Tuesday.
|
|
|
Since the problem doesn't seem to be related to competing use of system resources (my own tentative conclusion while you investigate), I decided to take a closer look at the logs and see if I can spot anything that escaped my attention. 1. A capture action is scheduled for my system partition at 3:20am daily. The archive's layer map says it's done this morning, with 793 captured items totaling 273.7 MB. I can't find any entry for it in the QR Log (see attached screenshot), even with maximum detail level. There are several QR related entries in the system.log at the same time, however:
Aug 19 03:20:00 carlo com.apple.xpc.launchd[1] (com.qrecall.hoist[505]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): QRHoist Aug 19 03:20:16 carlo QRecallHelper[506]: Failed to initialize IconCache named: com.apple.iconservices with error: Error Domain=NSCocoaErrorDomain Code=4097 "Couldn?t communicate with a helper application." (connection to service named com.apple.iconservices) UserInfo=0x7ff2b941f840 {NSDebugDescription=connection to service named com.apple.iconservices} Aug 19 03:20:16 carlo QRecallHelper[506]: Error returned from iconservicesagent: Error Domain=NSCocoaErrorDomain Code=4097 "Couldn?t communicate with a helper application." (connection to service named com.apple.iconservices) UserInfo=0x7ff2be2bf6f0 {NSDebugDescription=connection to service named com.apple.iconservices} ... (the previous one repeated for 12 times) Aug 19 03:20:16 carlo QRecallHelper[506]: ImageIO: CGImageDestinationFinalize image destination must have at least one image ... (Another batch of the previous two, including the repetition.)
2. The capture actions scheduled for 3:50am was indeed run and concluded successfully, with nothing to capture. Don't know why QR executed it again after reboot this morning (as reported in my previous post). 3. The first in a string of "Unable to communicate with helper" warnings (at 4:20:25 am) in the QR Log has a note that says:
[NOTE: this exception originated in the server.] | connection timeout: did not receive reply
The others says:
cannot connect with helper port: QRecallHelper.2307054f
There is no QR-related entry in the system.log at this time. 4. QR scheduler managed to get itself up and executed a capture job at 6:34:21am.
|
|
|
Before going to bed last night, I removed Crashplan's plist from /Library/LaunchDaemons, Foxtrot (a fulltext search engin) indexer plist from ~/Library/LaunchAgents, as well as all Login Items. The system was then rebooted into a state with none of my usual background unities (except QRecall) running. I also stopped Spotlight indexing for all drives. In short, the system was in a state as idle as it could be. All QR actions scheduled for early Sat. morning were again moved to this morning, and a command line window was open, running the "top" command as last night. That's it. Still got the rainbow wheel again when trying to click QR Monitor's icon in the menubar this morning. The system is otherwise normal, meaning I could launch Firefox and other applications without delay. I launched QR from the Launchpad instead, but again clicking anything on the menu gave me the rainbow wheel. Be patient, and the menu did pop up after a few sec., and I was able to get the Log window open, and send the diag. report. The toplog was also sent, but I believe you'll see a system mostly idle throughout the night. The OSX Activity Monitor again showed QR Scheduler is alive, but it's apparently not working. The Log again shows QR finished a "Capture" job without issue at 3:35am, but was "Unable to communicate with helper" at 4:20am. (No log entry in-between.) I rebooted my system again after putting everything back to normal (Crashplan and all login items), and QR has been busy catching up with the unfinished jobs last night as I write. The strange thing is, the first job QR did was a "Capture" job scheduled at 3:50am, not the "Verify" job at 4:20am. Didn't notice this the previous days. Another thing: QR has just finished all overdue actions, and the last one--a "Verify" job--found the archive corrupt. The archive is for my system partition, and it's captured daily. All recent captures came through without issue, including the one at 3:20 this morning. Yesterday's verification was canceled, according to the Log, and the one done a day earlier was successful. I'll try to repair it later. That's it for today's morning report.
|
|
|
Adrian Chapman wrote:I'm also seeing a lot of "unable to communicate" messages, but I am not aware of any slowdowns. The messages occur as a string over very short intervals, maybe a second or two at a time.
I have those as well, though the "short intervals" last a little longer here, from less than 10 sec. to about half a min. at a time. But they seem harmless, for normal QR activities resumed afterward. What I'm reporting here are not the same. QR is no longer working each morning.
|
|
|
James Bucanek wrote:Here's an experiment to try: Late in the evening, open a Terminal window and start the following command:
top -l 0 -S -s 30 -n 6 -o cpu > ~/Desktop/toplog.txt and let it run all night. When you get back to your computer in the morning, switch to the Terminal window and press Control+C to stop it.
Done and sent (only just now, though. I was busy this morning). I took a brief look myself, and didn't see anything taking too much CPU time. CrashPlan took about 2-3%, much less than I expected. This morning, everything is like yesterday. The Log again shows a string of "Unable to communicate ..." warnings since 4:20am.
|
|
|
OK, another follow-up. Still many rainbow wheels this morning despite the adjustment made to Crashplan's CPU usage. The Log still shows many "Unable to communicate with helper" warnings, several "Action could not be started" warnings, and a new one I don't remember seeing before:
Problem encountered while installing or upgrading components
QR scheduler went down last evening without my knowledge, but it obvious came online by itself for actions after that were executed normally. In fact, it's still alive according to OSX Activity Monitor, but it isn't working anymore because QR Monitor says "Waiting for scheduler to start". The last job done took place at 3:35 am this morning, a Capture job. The first "Unable to communicate with helper" warning was at 4:20 am, a Verify job, and the first one in a string of Verify and Merge actions usually scheduled on Sat. morning (and moved to this morning for testing).
|
|
|
Thank you for the detailed explanation. Since I have more actions scheduled for early Sat. morning (3-6am) when I'm most likely asleep, and it's too long to wait for the next Saturday, I rescheduled all of them to the same time this (Sun. here) morning. Surely enough, Rainbow wheels again this morning. The difference is, the scheduler didn't die. I can still see it in Activity Monitor, but it's not communicating with the helpers, several of which appear in the Activity Monitor, too. Another diag. report has been filed.
James Bucanek wrote:I see these kind of communication failures when the system is overloaded and can't allocate the kernel resources necessary to deliver the message, or can't do it fast enough. What I suspect is that when the scheduler is trying to start these actions, your system is under tremendous load; either something is using (or was using) a lot of CPU, memory, or both. This is bogging down the communications between the scheduler and helper processes, and generally making your system run very sluggishly.
Before I went to bed last night, I made sure to shotdown all my major applications (Firefox, Virtualbox, etc.), but not those "background" utilities. But now you mentioned it, one "suspect" just came to mind: Crashplan, an online backup utility, which is always running, just like QRecall. Crashplan's default (which I didn't change) is set to use "up to 80%" of CPU power (can't find any setting for memory usage) "when user is away". While QR is busy working in the middle of the night, Crashplan is busy, too, it seems. That might be the cause. I've changed Crashplan's setting to use up to 40% of CPU when I'm away. I'll reschedule the actions again and see if that changes anything. Please note, however, I've used the same setup with QRecall and Crashplan for months. It wasn't a problem with v1. (But of course, v2 is a different beast; I know.)
|
|
|
James Bucanek wrote:Please send that log via a diagnostic report. If you haven't figured this out by now, you can't send too many diagnostic reports.
Done. Another follow-up: I tried to restart the scheduler with the command line you taught me in another thread:
QRecallScheduler & It didn't work. Got
-bash: QRecallScheduler: command not found
Then I tried
sudo QRecallScheduler & and got a process id. Good. But the two pending actions are still "waiting". Restarting QR monitor didn't help. There's still no sign of QR scheduler in OSX Activity Monitor, either. The process name for the id is "sudo". Guess I'll have to logout and re-login again.
|
|
|
Ok, just came in, and found QR Monitor window open with 2 actions "waiting on scheduler". Apparently the scheduler went down again, at 12:22pm according to the log. I posted the last message at 11:44 local time, so the scheduler stopped less than an hour later. But it seemed to have run into trouble much earlier, for the two "waiting" actions (both weekly merge actions) were scheduled to take place at around 5am this morning. The rainbow wheels I saw this morning should be related, for QR seems be acting normally (no rainbow wheel) now with the scheduler gone. BTW, there are two new QR processes in the Activity Monitor that I've never seen before: QRecallHelper (2 of them). Is this normal?
|
|
|
James Bucanek wrote:Now on to my theory: If you are in the habit of leaving the QRecall application running, and its log window open, this might be your problem. As long as the log window is open, it will read in all of the log records on file. In your situation, this could be a lot.
Just a quick reply about this: no, I am not in the habit of leaving the QR application running (let alone the log window open). Only the QR Monitor icon sitting quietly in the menu bar (and the scheduler in the background, I assume). That's why only just now I found QR is giving me rainbow wheels again when trying to click on the Monitor icon. I'm going out in a min., however. Will investigate when I'm back.
|
|
|
QR became sluggish again this morning, similar to what I reported in this post (see point 4). Basically rainbow wheel spinned with every move and it's difficult to move around. Managed to get the Log window up and saw a lot of "Unable to communicate with helper" warnings. After sending in the diag report (before killing anything, as you suggested in that thread), I killed QR Monitor and it restarted automatically, as always, but the problem didn't go away. Had to logout and re-login to solve the issue. Checking the log again, and found a new warning:
Problem encountered while installing or upgrading components connection timeout: did not receive reply
Not sure if it deserved another diag report, but decided to send one in anyway.
|
|
|
Good. I like command line. Thanks!
|
|
|
James Bucanek wrote:Oh, and to make matters more complicated, QRecall 1.x uses a different method for identifying volumes. Prior to 2.0, QRecall used a combination of the filesystem identifier (part of the BSD filesystem API), volume creation date, size, and name.
Ah, this might be it. Remember I dialed my system back briefly to a state when it had v1.2.3 installed in order to confirm the fix for v1-to-v2 crash? QR might have touched the archive then since it started with the system. Never mind. At least I can reduce the number of volumes to 2.
|
|
|
|
|