|
James Bucanek wrote: Here's an experiment to try: late in the evening, open a Terminal window and start the following command:
top -l 0 -S -s 30 -n 6 -o cpu > ~/Desktop/toplog.txt
and let it run all night. When you get back to your computer in the morning, switch to the Terminal window and press Control+C to stop it.
Done and sent (only just now, though; I was busy this morning). I took a brief look myself and didn't see anything taking too much CPU time. CrashPlan took about 2-3%, much less than I expected. This morning everything is the same as yesterday: the Log again shows a string of "Unable to communicate ..." warnings since 4:20am.
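For anyone repeating this experiment, here is the same overnight command with each flag annotated (macOS `top` syntax; Linux `top` takes different options), plus a small helper for skimming the result. The helper name `toplog_filter` is hypothetical, and it assumes each sample's timestamp line begins with a `YYYY/MM/DD` date; it simply keeps those timestamps and any process line mentioning QRecall or CrashPlan, so CPU spikes can be lined up against the times in the QRecall log. This is a sketch, not part of QRecall.

```shell
#!/bin/sh
# The overnight logging command from the thread, flag by flag (macOS top):
#   -l 0    logging mode; 0 means keep sampling until interrupted (Control+C)
#   -S      include global swap and purgeable-memory statistics
#   -s 30   take one sample every 30 seconds
#   -n 6    list at most 6 processes per sample
#   -o cpu  sort the process list by CPU usage
#
#   top -l 0 -S -s 30 -n 6 -o cpu > ~/Desktop/toplog.txt

# Hypothetical helper: print each sample's timestamp line (assumed to start
# with a YYYY/MM/DD date) and every line mentioning QRecall or CrashPlan,
# matched case-insensitively.
toplog_filter() {
    grep -i -E '^[0-9]{4}/[0-9]{2}/[0-9]{2} |qrecall|crashplan' "$1"
}
```

Usage: `toplog_filter ~/Desktop/toplog.txt | less`. Filtering by name rather than by CPU threshold keeps the output short while still showing what the two backup tools were doing at each sample.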
|
 |
|
OK, another follow-up. Still many rainbow wheels this morning despite the adjustment made to CrashPlan's CPU usage. The Log still shows many "Unable to communicate with helper" warnings, several "Action could not be started" warnings, and a new one I don't remember seeing before:
Problem encountered while installing or upgrading components
The QR scheduler went down last evening without my knowledge, but it obviously came back online by itself, because actions after that were executed normally. In fact, it's still alive according to OS X's Activity Monitor, but it isn't working anymore, because QR Monitor says "Waiting for scheduler to start". The last completed job was a Capture at 3:35 am this morning. The first "Unable to communicate with helper" warning was at 4:20 am, for a Verify job, the first in a string of Verify and Merge actions usually scheduled for Saturday morning (and moved to this morning for testing).
|
 |
|
Thank you for the detailed explanation. Since I have more actions scheduled for early Saturday morning (3-6am), when I'm most likely asleep, and it's too long to wait until next Saturday, I rescheduled all of them to the same time this (Sunday, here) morning. Sure enough, rainbow wheels again this morning. The difference is that the scheduler didn't die. I can still see it in Activity Monitor, but it's not communicating with the helpers, several of which appear in Activity Monitor, too. Another diagnostic report has been filed.
James Bucanek wrote: I see these kinds of communication failures when the system is overloaded and can't allocate the kernel resources necessary to deliver the message, or can't do it fast enough. What I suspect is that when the scheduler is trying to start these actions, your system is under tremendous load; something is using (or was using) a lot of CPU, memory, or both. This is bogging down the communications between the scheduler and helper processes, and generally making your system run very sluggishly.
Before I went to bed last night, I made sure to shut down all my major applications (Firefox, VirtualBox, etc.), but not the "background" utilities. But now that you mention it, one suspect comes to mind: CrashPlan, an online backup utility, which is always running, just like QRecall. CrashPlan's default setting (which I didn't change) is to use "up to 80%" of CPU power "when user is away" (I can't find any setting for memory usage). While QR is busy working in the middle of the night, CrashPlan seems to be busy, too. That might be the cause. I've changed CrashPlan's setting to use up to 40% of CPU when I'm away, and I'll reschedule the actions again to see if that changes anything. Please note, however, that I've used the same setup with QRecall and CrashPlan for months, and it wasn't a problem with v1. (But of course, v2 is a different beast; I know.)
|
 |
|
James Bucanek wrote: Please send that log via a diagnostic report. If you haven't figured this out by now, you can't send too many diagnostic reports.
Done. Another follow-up: I tried to restart the scheduler with the command line you taught me in another thread:
QRecallScheduler &
It didn't work. I got:
-bash: QRecallScheduler: command not found
Then I tried:
sudo QRecallScheduler &
and got a process ID. Good. But the two pending actions are still "waiting", and restarting QR Monitor didn't help. There's still no sign of the QR scheduler in OS X's Activity Monitor, either; the process name for that ID is "sudo". I guess I'll have to log out and log back in.
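A note on the "sudo" process name: when a command is started in the background with `&`, the shell reports the PID of the process it launched directly, which for `sudo QRecallScheduler &` is `sudo` itself. That PID therefore says nothing about whether the scheduler actually started. (The earlier "command not found" means no directory on `$PATH` contains `QRecallScheduler`; the thread doesn't give its full path, so it isn't filled in here.) A more direct check is to look for the process by exact name. The function name `is_running` below is hypothetical; `pgrep -x` is the standard exact-name match.

```shell
#!/bin/sh
# Check by exact process name whether a process is running.
# pgrep -x matches the whole process name (not a substring) and exits
# with status 0 if at least one matching process exists.
is_running() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1 is running"
    else
        echo "$1 is not running"
        return 1
    fi
}
```

Usage: `is_running QRecallScheduler`. For a broader view, `pgrep -l QRecall` (substring match, names included) lists every QRecall-related process, such as the scheduler and any QRecallHelper instances.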
|
 |
|
OK, I just came in and found the QR Monitor window open with 2 actions "waiting on scheduler". Apparently the scheduler went down again, at 12:22pm according to the log. I posted my last message at 11:44 local time, so the scheduler stopped less than an hour later. But it seems to have run into trouble much earlier, because the two "waiting" actions (both weekly merge actions) were scheduled to run at around 5am this morning. The rainbow wheels I saw this morning are probably related, since QR seems to be acting normally (no rainbow wheel) now that the scheduler is gone. BTW, there are two new QR processes in Activity Monitor that I've never seen before: QRecallHelper (2 of them). Is this normal?
|
 |
|
James Bucanek wrote: Now on to my theory: If you are in the habit of leaving the QRecall application running, and its log window open, this might be your problem. As long as the log window is open, it will read in all of the log records on file. In your situation, this could be a lot.
Just a quick reply about this: no, I am not in the habit of leaving the QR application running (let alone the log window open). Only the QR Monitor icon sitting quietly in the menu bar (and the scheduler in the background, I assume). That's why only just now I found QR is giving me rainbow wheels again when trying to click on the Monitor icon. I'm going out in a min., however. Will investigate when I'm back.
|
 |
|
QR became sluggish again this morning, similar to what I reported in this post (see point 4). Basically, the rainbow wheel spun with every move and it was difficult to get around. I managed to get the Log window up and saw a lot of "Unable to communicate with helper" warnings. After sending in the diag report (before killing anything, as you suggested in that thread), I killed QR Monitor and it restarted automatically, as always, but the problem didn't go away. I had to log out and log back in to solve the issue. Checking the log again, I found a new warning:
Problem encountered while installing or upgrading components connection timeout: did not receive reply
Not sure if it deserved another diag report, but decided to send one in anyway.
|
 |
|
Good. I like command line. Thanks!
|
 |
|
James Bucanek wrote: Oh, and to make matters more complicated, QRecall 1.x uses a different method for identifying volumes. Prior to 2.0, QRecall used a combination of the filesystem identifier (part of the BSD filesystem API), volume creation date, size, and name.
Ah, this might be it. Remember I briefly rolled my system back to a state when it had v1.2.3 installed, in order to confirm the fix for the v1-to-v2 crash? QR might have touched the archive then, since it started with the system. Never mind. At least I can reduce the number of volumes to 2.
|
 |
|
So to answer your question, in a roundabout way, some of the things you're doing (installing new hardware, repartitioning) will result in QRecall treating the new volume as a brand new volume. Other techniques, like block-level cloning, may trick QRecall into thinking the new volume is the same one it has seen before.
I'm aware of UUIDs, and except for the one time I reinstalled the system from scratch, the UUID should not have changed, since I used OS X's Disk Image tool to image and restore the system. No hardware change took place recently, either, and certainly not after I tested the v1-to-v2 crash issue when beta 10 came out a week ago. That's why it's a puzzle to me: same procedure, different results.
QRecall > Help > Guide > Advanced > Combine Items.
Doh! I read it months ago and apparently forgot about it. Sorry, and thanks! Wait, the "Combine" command is grayed out if I choose all 4 volumes. Only 3 of them (1 or 2 together with the other two) can be combined; in other words, volumes 1 and 2 can't be combined with each other. This is probably because, as reported in my previous post, the two volumes have "parallel" layers (circled in red in the screenshot). I just read "Limits of combining items" in the Help and found a note that volumes with overlapping captures can't be combined. I guess that's it. The problem is that volumes 1 and 2 shouldn't have overlapped. Looking closely, I found the "overlap" runs from 7/4 to 8/1, and 8/2 might be the day I started trying out beta 10. (Not sure, but likely.) If that is true, does it mean that both b10 and b11, when making their first backup, thought they were continuing the volume started on 7/4 (the day I reinstalled my system from scratch), and yet b11 thought, at the same time, that the volume had changed from the previous day? How is that possible?
|
 |
|
James Bucanek wrote: QRecall's scheduler is managed by the launchd service. launchd is set to run the scheduler all the time, unless the QRecallScheduler process exits with a non-zero status. launchd will also throttle, or eventually stop running, the scheduler if it repeatedly crashes or is forcibly terminated too often.
At the time of my posting, there was one "Unexpected problem; scheduler stopping immediately" error in the log, and I force quit it once (in Activity Monitor). They are hours apart. I doubt that qualifies as "too often".
Relaunching the QRecall application won't help, because it's launchd's job to run the process.
Is it possible to have a "Restart Scheduler" command on QR's menu, perhaps next to the "Restart Monitor" command (which, btw, is almost never used since the Monitor always restarts itself very quickly)?
Ming-Li Wang wrote: 1. I used "sudo qrecall captureprefs list -r -s /" to check the QR Capture Preferences settings, and it aborted with the following error midway through the process:
* failed to read directory
This isn't surprising, because you don't have permission to read those directories. Use sudo qrecall captureprefs list -r -s / instead.
I did use "sudo", didn't I?
Hmmm. The "scheduler stopping immediately" error might be the reason your scheduler isn't running...
I doubt it. As I said, the scheduler went away and never came back only after I force quit it. The error in the log happened several hours earlier without my immediate knowledge, and the scheduler apparently restarted itself without issue; otherwise I would have had nothing to force quit.
|
 |
|
I have an archive for my main system partition which was started on 1/31 this year. I've since reinstalled my system from scratch once, and reverted my system to an earlier state numerous times, though the name of the partition never changes. (The arrangement was described in greater detail in an earlier post at http://forums.qrecall.com/posts/list/522.page#2419.) Throughout that time I've always backed up the system partition to the same archive. For unknown reasons, QR sometimes considers the system to be "new" and recaptures everything. The "full" backup takes longer, but doesn't take more space than it should, thanks to QR's excellent deduplication, so I don't really mind. Today QR did it again after I 1) imaged my system partition by booting into the rescue partition, 2) restored one of my earlier system images with QR v1.2.3 installed, in order to test whether b11 would crash after installation (no, it didn't), 3) restored my system with the image produced in step 1, 4) installed QR v2 b11, and 5) ran the system capture action. As a result, I have 4 sets of captures for the same volume in the archive (see attached screenshot below). Look closer and you'll see that the newest set (set no. 1) has another "incarnation" before 8/1. Odd, isn't it? Furthermore, sets 1 and 2 have the same layers from the start to 8/1 (in the red circle). How is that possible? Another oddity: I went through the same steps described above to test b10 about a week ago, and yet b10 didn't consider the system "new" and continued to back up to the old set (set 2). Why did b11 behave differently? While I don't particularly mind QR starting a new set from time to time, this practice does have its downside: when digging through the archive for earlier versions of a file, I can't easily reach all versions of the same file. Hence, a question: is there a way to "merge" the sets? If not, please consider it a feature request. Thanks!
|
 |
|
Most of the issues I reported earlier have been fixed, as stated in the release notes. Thank you! One problem remains: I still can't restart the QR scheduler after forcing it to quit in Activity Monitor. I quit and restarted QR several times, but the scheduler never came up. A diagnostic report has been sent. The issue isn't mentioned in the release notes, I know; I just thought it might have been solved with all the changes made to the scheduler. Other issues: 1. I used "sudo qrecall captureprefs list -r -s /" to check the QR Capture Preferences settings, and it aborted with the following error midway through the process:
* failed to read directory
This, in fact, isn't new. I ran into it with b10 as well, but forgot to report it. Sorry. 2. There are a few dozen "Unable to communicate with helper" warnings and one "Unexpected problem; scheduler stopping immediately" error in the log. The error happened more than 6 hours ago, so no report was sent. I'll send a report if I spot it sooner next time.
|
 |
|
James Bucanek wrote:
Btw, I couldn't get QR scheduler to restart after having it killed in Activity Monitor (relaunching QR didn't restart the scheduler).
Please send a diagnostic report. There might be a clue as to why the scheduler didn't restart.
Done. I force quit both the scheduler and the monitor in Activity Monitor right after a reboot. The QR monitor recovered in less than a second, but the scheduler didn't. Opening QR didn't help; I had to log out and log in again to get the scheduler back.
|
 |
|
A small issue came out of this test: I removed the test archive after finishing the test, but forgot to remove the scheduled capture actions associated with it first. As a result, QR failed to execute the capture action. I've since removed the action, but now I have two ghost items in the QR pop-up menu that refuse to go away (screenshot attached below). Note in the screenshot that the two ghost items were scheduled for 0:00, but the system time was already 0:28. Clicking any of the items in the second pop-up (Stop, Stop and Reschedule, etc.) does nothing. I guess I'll have to reboot to get rid of them.