Author |
Message |
9 years ago
|
#1
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
QR became sluggish again this morning, similar to what I reported in this post (see point 4). Basically rainbow wheel spinned with every move and it's difficult to move around. Managed to get the Log window up and saw a lot of "Unable to communicate with helper" warnings. After sending in the diag report (before killing anything, as you suggested in that thread), I killed QR Monitor and it restarted automatically, as always, but the problem didn't go away. Had to logout and re-login to solve the issue. Checking the log again, and found a new warning:
Problem encountered while installing or upgrading components connection timeout: did not receive reply
Not sure if it deserved another diag report, but decided to send one in anyway.
|
|
|
9 years ago
|
#2
|
James Bucanek
Joined: Feb 14, 2007
Messages: 1572
Offline
|
Ming-Li, I reviewed the diagnostic reports you sent, especially the one sent before you tried to kill or restart anything, and what I found was ... nothing. QRecall wasn't doing much of anything at the time. Both the scheduler and monitor were idle, and there were no running actions. So I suspect something else is in play, and the QRecall application might be the cause. The next time this happens, launch the Activity Monitor. Starting in the memory tab, sort processes by total memory use (greatest to least) and note which processes are using the most memory. Also note the amount of swap used. Then go to the CPU tab, sort (greatest to least) by CPU% and see if there are one or more runaway processes. Now on to my theory: If you are in the habit of leaving the QRecall application running, and its log window open, this might be your problem. As long as the log window is open, it will read in all of the log records on file. In your situation, this could be a lot. You have a lot of actions defined, and you run those actions very frequently. You're also running a beta version of the software (which produces a lot more log messages than the regular version). Your log file preferences are set to keep log messages for up to 6 months. All of these combine to create a kind of "perfect storm" that sets up the log window to read a ridiculous amount of data. If that's what happening, your system's poor performance could simply be the amount of memory and swap space consumed the QRecall application's log window. Now if you aren't leaving the log window open, then something else is bogging down your system, which is why you need to use the Activity Monitor to conform it. If it turns out that it is the QRecall log window, you can not leave it open or try reducing amount of log history you keep.
|
- QRecall Development - |
|
|
9 years ago
|
#3
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
James Bucanek wrote:Now on to my theory: If you are in the habit of leaving the QRecall application running, and its log window open, this might be your problem. As long as the log window is open, it will read in all of the log records on file. In your situation, this could be a lot.
Just a quick reply about this: no, I am not in the habit of leaving the QR application running (let alone the log window open). Only the QR Monitor icon sitting quietly in the menu bar (and the scheduler in the background, I assume). That's why only just now I found QR is giving me rainbow wheels again when trying to click on the Monitor icon. I'm going out in a min., however. Will investigate when I'm back.
|
|
|
9 years ago
|
#4
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
Ok, just came in, and found QR Monitor window open with 2 actions "waiting on scheduler". Apparently the scheduler went down again, at 12:22pm according to the log. I posted the last message at 11:44 local time, so the scheduler stopped less than an hour later. But it seemed to have run into trouble much earlier, for the two "waiting" actions (both weekly merge actions) were scheduled to take place at around 5am this morning. The rainbow wheels I saw this morning should be related, for QR seems be acting normally (no rainbow wheel) now with the scheduler gone. BTW, there are two new QR processes in the Activity Monitor that I've never seen before: QRecallHelper (2 of them). Is this normal?
|
|
|
9 years ago
|
#5
|
James Bucanek
Joined: Feb 14, 2007
Messages: 1572
Offline
|
Ming-Li Wang wrote:Ok, just came in, and found QR Monitor window open with 2 actions "waiting on scheduler". Apparently the scheduler went down again, at 12:22pm according to the log.
Please send that log via a diagnostic report. If you haven't figured this out by now, you can't send too many diagnostic reports.
I posted the last message at 11:44 local time, so the scheduler stopped less than an hour later. But it seemed to have run into trouble much earlier, for the two "waiting" actions (both weekly merge actions) were scheduled to take place at around 5am this morning. The rainbow wheels I saw this morning should be related, for QR seems be acting normally (no rainbow wheel) now with the scheduler gone.
I suspect these are related, but I doubt it's a causal relationship.
BTW, there are two new QR processes in the Activity Monitor that I've never seen before: QRecallHelper (2 of them). Is this normal?
Yes, those are the two actions that are waiting to run. You can quit them via the activity monitor. Every (major) task that is performed on an archive is performed by an instance of QRecallHelper (sometimes two instances). So if you start a recall, a verify, a merge, and a compact action, you'll see four QRecallHelper instances running. The helper also performs installs, system configuration checks, updating the sleep/wake schedule, and other privileged activities.
|
- QRecall Development - |
|
|
9 years ago
|
#6
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
James Bucanek wrote:Please send that log via a diagnostic report. If you haven't figured this out by now, you can't send too many diagnostic reports.
Done. Another follow-up: I tried to restart the scheduler with the command line you taught me in another thread:
QRecallScheduler & It didn't work. Got
-bash: QRecallScheduler: command not found
Then I tried
sudo QRecallScheduler & and got a process id. Good. But the two pending actions are still "waiting". Restarting QR monitor didn't help. There's still no sign of QR scheduler in OSX Activity Monitor, either. The process name for the id is "sudo". Guess I'll have to logout and re-login again.
|
|
|
9 years ago
|
#7
|
James Bucanek
Joined: Feb 14, 2007
Messages: 1572
Offline
|
Ming-Li Wang wrote:Another follow-up: I tried to restart the scheduler with the command line you taught me in another thread:
QRecallScheduler & It didn't work. Got
-bash: QRecallScheduler: command not found
Sorry, that was shorthand. You need to specify the path to the scheduler executable:
~/Library/Application\ Support/QRecall/QRecallScheduler &
Then I tried
sudo QRecallScheduler & and got a process id. Good.
Not good. You got a job ID for sudo. sudo then went to run the executable, couldn't find it, and quit. And it's moot anyway, because you can't run the scheduler as root.
But the two pending actions are still "waiting". Restarting QR monitor didn't help.
Restarting the monitor won't help. The monitor is just that: a monitor. It just reports on what's going on. It doesn't tell actions what to do (although it will rely commands you select in the stop menu to the action).
There's still no sign of QR scheduler in OSX Activity Monitor, either. The process name for the id is "sudo". Guess I'll have to logout and re-login again.
Even after you get the scheduler running again, you'll still need to stop those actions or choose the "Ignore Hold and Run" command from the stop button's drop-down menu in the monitor. When the scheduler starts a QRecallHelper process, it begins by placing it in a "wait for scheduler" state. Once the scheduler determines that the action is ready to run, it sends it a message that it can start. While in the "waiting" state, the action is suspended until the scheduler gives it the green light. And this is the problem I'm seeing in your log. The scheduler starts the QRecallHelper process, but then can't communicate with it ("lost communications with helper" errors). So what's happening is that the helper gets launched but never runs. I see these kind of communication failures when the system is overloaded and can't allocate the kernel resources necessary to deliver the message, or can't do it fast enough. What I suspect is that when the scheduler is trying to start these actions, your system is under tremendous load; either something is using (or was using) a lot of CPU, memory, or both. This is bogging down the communications between the scheduler and helper processes, and generally making your system run very sluggishly. And just restarting the scheduler won't help, unless the scheduler closed down cleanly and is restarted within 30 seconds. The scheduler doesn't know what QRecallHelper processes are running and, once it's forgotten about them, can't automatically reconnect with them to tell them they can run. So once you have an orphaned QRecallHelper process sitting in a "wait" state, you will either have to start it manually (using the "Ignore Hold and Run" command) or just stop the action.
|
- QRecall Development - |
|
|
9 years ago
|
#8
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
Thank you for the detailed explanation. Since I have more actions scheduled for early Sat. morning (3-6am) when I'm most likely asleep, and it's too long to wait for the next Saturday, I rescheduled all of them to the same time this (Sun. here) morning. Surely enough, Rainbow wheels again this morning. The difference is, the scheduler didn't die. I can still see it in Activity Monitor, but it's not communicating with the helpers, several of which appear in the Activity Monitor, too. Another diag. report has been filed.
James Bucanek wrote:I see these kind of communication failures when the system is overloaded and can't allocate the kernel resources necessary to deliver the message, or can't do it fast enough. What I suspect is that when the scheduler is trying to start these actions, your system is under tremendous load; either something is using (or was using) a lot of CPU, memory, or both. This is bogging down the communications between the scheduler and helper processes, and generally making your system run very sluggishly.
Before I went to bed last night, I made sure to shotdown all my major applications (Firefox, Virtualbox, etc.), but not those "background" utilities. But now you mentioned it, one "suspect" just came to mind: Crashplan, an online backup utility, which is always running, just like QRecall. Crashplan's default (which I didn't change) is set to use "up to 80%" of CPU power (can't find any setting for memory usage) "when user is away". While QR is busy working in the middle of the night, Crashplan is busy, too, it seems. That might be the cause. I've changed Crashplan's setting to use up to 40% of CPU when I'm away. I'll reschedule the actions again and see if that changes anything. Please note, however, I've used the same setup with QRecall and Crashplan for months. It wasn't a problem with v1. (But of course, v2 is a different beast; I know.)
|
|
|
9 years ago
|
#9
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
OK, another follow-up. Still many rainbow wheels this morning despite the adjustment made to Crashplan's CPU usage. The Log still shows many "Unable to communicate with helper" warnings, several "Action could not be started" warnings, and a new one I don't remember seeing before:
Problem encountered while installing or upgrading components
QR scheduler went down last evening without my knowledge, but it obvious came online by itself for actions after that were executed normally. In fact, it's still alive according to OSX Activity Monitor, but it isn't working anymore because QR Monitor says "Waiting for scheduler to start". The last job done took place at 3:35 am this morning, a Capture job. The first "Unable to communicate with helper" warning was at 4:20 am, a Verify job, and the first one in a string of Verify and Merge actions usually scheduled on Sat. morning (and moved to this morning for testing).
|
|
|
9 years ago
|
#10
|
James Bucanek
Joined: Feb 14, 2007
Messages: 1572
Offline
|
Here's an experiment to try: Late in the evening, open a Terminal window and start the following command:
top -l 0 -S -s 30 -n 6 -o cpu > ~/Desktop/toplog.txt and let it run all night. When you get back to your computer in the morning, switch to the Terminal window and press Control+C to stop it. The toplog.txt file will contain a sample of your system's load averages, total memory usage, swap space, and the top 6 CPU hogs, and other performance stats, recorded every 30 seconds. Compress that file and email it to me. If there are processes running overnight that are demanding a lot of CPU and/or memory, this should catch them.
|
- QRecall Development - |
|
|
9 years ago
|
#11
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
James Bucanek wrote:Here's an experiment to try: Late in the evening, open a Terminal window and start the following command:
top -l 0 -S -s 30 -n 6 -o cpu > ~/Desktop/toplog.txt and let it run all night. When you get back to your computer in the morning, switch to the Terminal window and press Control+C to stop it.
Done and sent (only just now, though. I was busy this morning). I took a brief look myself, and didn't see anything taking too much CPU time. CrashPlan took about 2-3%, much less than I expected. This morning, everything is like yesterday. The Log again shows a string of "Unable to communicate ..." warnings since 4:20am.
|
|
|
9 years ago
|
#12
|
Adrian Chapman
Joined: Aug 16, 2010
Messages: 72
Offline
|
I'm also seeing a lot of "unable to communicate" messages, but I am not aware of any slowdowns. The messages occur as a string over very short intervals, maybe a second or two at a time.
|
|
|
9 years ago
|
#13
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
Adrian Chapman wrote:I'm also seeing a lot of "unable to communicate" messages, but I am not aware of any slowdowns. The messages occur as a string over very short intervals, maybe a second or two at a time.
I have those as well, though the "short intervals" last a little longer here, from less than 10 sec. to about half a min. at a time. But they seem harmless, for normal QR activities resumed afterward. What I'm reporting here are not the same. QR is no longer working each morning.
|
|
|
9 years ago
|
#14
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
Before going to bed last night, I removed Crashplan's plist from /Library/LaunchDaemons, Foxtrot (a fulltext search engin) indexer plist from ~/Library/LaunchAgents, as well as all Login Items. The system was then rebooted into a state with none of my usual background unities (except QRecall) running. I also stopped Spotlight indexing for all drives. In short, the system was in a state as idle as it could be. All QR actions scheduled for early Sat. morning were again moved to this morning, and a command line window was open, running the "top" command as last night. That's it. Still got the rainbow wheel again when trying to click QR Monitor's icon in the menubar this morning. The system is otherwise normal, meaning I could launch Firefox and other applications without delay. I launched QR from the Launchpad instead, but again clicking anything on the menu gave me the rainbow wheel. Be patient, and the menu did pop up after a few sec., and I was able to get the Log window open, and send the diag. report. The toplog was also sent, but I believe you'll see a system mostly idle throughout the night. The OSX Activity Monitor again showed QR Scheduler is alive, but it's apparently not working. The Log again shows QR finished a "Capture" job without issue at 3:35am, but was "Unable to communicate with helper" at 4:20am. (No log entry in-between.) I rebooted my system again after putting everything back to normal (Crashplan and all login items), and QR has been busy catching up with the unfinished jobs last night as I write. The strange thing is, the first job QR did was a "Capture" job scheduled at 3:50am, not the "Verify" job at 4:20am. Didn't notice this the previous days. Another thing: QR has just finished all overdue actions, and the last one--a "Verify" job--found the archive corrupt. The archive is for my system partition, and it's captured daily. All recent captures came through without issue, including the one at 3:20 this morning. Yesterday's verification was canceled, according to the Log, and the one done a day earlier was successful. I'll try to repair it later. That's it for today's morning report.
|
|
|
9 years ago
|
#15
|
Ming-Li Wang
Joined: Jan 12, 2015
Messages: 82
Offline
|
Since the problem doesn't seem to be related to competing use of system resources (my own tentative conclusion while you investigate), I decided to take a closer look at the logs and see if I can spot anything that escaped my attention. 1. A capture action is scheduled for my system partition at 3:20am daily. The archive's layer map says it's done this morning, with 793 captured items totaling 273.7 MB. I can't find any entry for it in the QR Log (see attached screenshot), even with maximum detail level. There are several QR related entries in the system.log at the same time, however:
Aug 19 03:20:00 carlo com.apple.xpc.launchd[1] (com.qrecall.hoist[505]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): QRHoist Aug 19 03:20:16 carlo QRecallHelper[506]: Failed to initialize IconCache named: com.apple.iconservices with error: Error Domain=NSCocoaErrorDomain Code=4097 "Couldn?t communicate with a helper application." (connection to service named com.apple.iconservices) UserInfo=0x7ff2b941f840 {NSDebugDescription=connection to service named com.apple.iconservices} Aug 19 03:20:16 carlo QRecallHelper[506]: Error returned from iconservicesagent: Error Domain=NSCocoaErrorDomain Code=4097 "Couldn?t communicate with a helper application." (connection to service named com.apple.iconservices) UserInfo=0x7ff2be2bf6f0 {NSDebugDescription=connection to service named com.apple.iconservices} ... (the previous one repeated for 12 times) Aug 19 03:20:16 carlo QRecallHelper[506]: ImageIO: CGImageDestinationFinalize image destination must have at least one image ... (Another batch of the previous two, including the repetition.)
2. The capture actions scheduled for 3:50am was indeed run and concluded successfully, with nothing to capture. Don't know why QR executed it again after reboot this morning (as reported in my previous post). 3. The first in a string of "Unable to communicate with helper" warnings (at 4:20:25 am) in the QR Log has a note that says:
[NOTE: this exception originated in the server.] | connection timeout: did not receive reply
The others says:
cannot connect with helper port: QRecallHelper.2307054f
There is no QR-related entry in the system.log at this time. 4. QR scheduler managed to get itself up and executed a capture job at 6:34:21am.
|
Filename |
2.jpg |
Download
|
Description |
No description given |
Filesize |
75 Kbytes
|
Downloaded: |
1036 time(s) |
|
|
|
|
|
|