QRecall Community Forum

Lost Connection with Helper...
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James,

I continue to see the "Lost Connection with Helper" error on my Mac mini running 10.6.8. I know that you are aware of this long-standing issue...is there any fix in the works for this?

The real problem here is that when this connection to the helper is lost...the scheduled Action that was executing remains in "Running" status in the Action window...and therefore, subsequent scheduled Actions never run because they are "waiting" for the archive. Thus, the machine has to be manually checked every day to see if the condition has occurred and further actions are waiting.

Thanks,

GKG
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Gary K. Griffey wrote: The real problem here is that when this connection to the helper is lost...the scheduled Action that was executing remains in "Running" status in the Action window...and therefore, subsequent scheduled Actions never run because they are "waiting" for the archive.

That really doesn't make much sense to me.

If the monitor process loses communications with the helper process, it gets logged as a "Lost communications with Helper" error. This is often because the helper process crashed, but it can also mean that the communications pipe between the two processes is broken and the helper process is just fine. Broken communications pipes are a known problem in Lion, but I've rarely seen them in earlier versions of OS X.

The "running" status in the actions window comes from the scheduler. If the scheduler thinks the process is still running, then either the scheduler is stuck or the action really is still running (which implies that the communications pipe between the scheduler and the action is still valid). So either the action has stopped and the scheduler is confused, or the action is still running and the monitor is confused. It seems unlikely that both of those would be true, which is why I'm confused.
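
The two cases described above (helper crashed, versus pipe broken while the helper is fine) can be told apart by checking whether the process still exists once the pipe reports end-of-file. What follows is a generic Python sketch of that check, not QRecall's code: the names `spawn_helper` and `classify_lost_connection` are hypothetical, and the "helper" is a throwaway Python child process whose stdout stands in for the communications pipe.

```python
import subprocess
import sys


def spawn_helper(body: str) -> subprocess.Popen:
    """Start a stand-in 'helper'; its stdout is our one-way communications pipe."""
    return subprocess.Popen([sys.executable, "-c", body], stdout=subprocess.PIPE)


def classify_lost_connection(helper: subprocess.Popen, grace: float = 0.5) -> str:
    """Call after the pipe has reported end-of-file.

    Waits up to `grace` seconds for the helper to exit. If it is still
    running after that, only the pipe was lost.
    """
    try:
        status = helper.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        return "pipe broken, helper still running"
    return f"helper exited (status {status})"


def demo() -> None:
    # Hypothetical helper that closes its end of the pipe but keeps working.
    helper = spawn_helper("import os, time; os.close(1); time.sleep(30)")
    helper.stdout.read()  # returns once the pipe reaches end-of-file
    print(classify_lost_connection(helper))
    helper.kill()
    helper.wait()
```

The short grace period matters: a helper that is in the middle of exiting closes its pipe slightly before the operating system reports it as gone, so checking immediately could misreport a crash as a broken pipe.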

Gary K. Griffey wrote: Thus, the machine has to be manually checked every day to see if the condition has occurred and further actions are waiting.

When this happens, what do you do?

Also, I'd very much like to get a sample of your QRecall processes and a diagnostic report. The next time this happens, please do the following:

(This assumes that you've upgraded to 1.2.0b69 or later.)

1. Open the Terminal application.
2. Enter the command
   /Applications/QRecall.app/Contents/Resources/sample.sh
   and press Return.
3. Enter your administrator's password.
4. When the sample.sh script is finished, open the QRecall application.
5. Choose Help > Send Report...

- QRecall Development -
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
James Bucanek wrote: Broken communications pipes are a known problem in Lion, but I've rarely seen them in earlier versions of OS X.

Hah, wouldn't you know that not an hour after I posted this my OS X Server (running 10.6.8) got a "Lost communications" error.

But it was just that—a loss of communications between the two processes. The log says the capture completed successfully, and the actions window shows the action is no longer running. So not quite the same situation.

- QRecall Development -
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James,

Let me clarify a few points...

First, the scheduled Action that is active when the "Lost Connection with Helper" occurs does indeed complete successfully. However, the Actions window continues to show it as "Running"...and subsequent scheduled Actions that use the same archive just say "Waiting" in the Monitor window...and they never run....

My fix is to reboot the Mac. I do believe that I tried to just kill the Scheduler process once...and it did seem to change the Actions window from the "Running" status...but I was not sure if that would cause other issues...so...I am simply rebooting.

One other odd symptom is that you only see the "Lost Connection" verbiage in the log..whereas normally, I would see it in the Monitor window. I'm not sure if that helps you or not.

I will run the sample.sh script the next time this occurs...and send a diagnostic report. In fact I will send a report now...since it happened this morning.

Thanks..


GKG


James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Gary K. Griffey wrote: First, the scheduled Action that is active when the "Lost Connection with Helper" occurs does indeed complete successfully. However, the Actions window continues to show it as "Running"...and subsequent scheduled Actions that use the same archive just say "Waiting" in the Monitor window...and they never run....

Good to know. That tells me that, not only did the monitor process lose its connection with the running action, but that something has happened to the scheduler too.

Gary K. Griffey wrote: My fix is to reboot the Mac.

That will certainly do it.

Gary K. Griffey wrote: I do believe that I tried to just kill the Scheduler process once...and it did seem to change the Actions window from the "Running" status...but I was not sure if that would cause other issues...so...I am simply rebooting.

As a general rule, all QRecall processes will respond gracefully to a Quit request (not a Force Quit, just a regular Quit). When quit, the schedule and monitor processes should automatically restart.

Gary K. Griffey wrote: One other odd symptom is that you only see the "Lost Connection" verbiage in the log..whereas normally, I would see it in the Monitor window. I'm not sure if that helps you or not.

It might. The "lost connection" appears in the activity window when the monitor process loses its connection. Other processes (like the scheduler) don't have a UI, so they just log the problem.

Gary K. Griffey wrote: I will run the sample.sh script the next time this occurs...and send a diagnostic report.

At this point that's what I'm most interested in seeing—the state of the scheduler after this happens.

- QRecall Development -
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James...

The error occurred again last night. I ran the sample.sh script and sent the report.

I also have included a screen shot from the Mac mini. Notice how the Verify action still shows "Running"...even though it has actually completed. Also notice the 2 tasks in "Waiting" status due to the Verify still "Running"....as they are waiting for the same archive.

Thanks,

GKG
Attachment: QRecall.tiff (155 Kbytes, no description given)

James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Gary,

Thanks for all of the useful info.

It appears that the QRecall action process (in this case the verify) and the scheduler are struggling with a race condition.

When an action is finished, it sends a "stopped" message to the scheduler. When the scheduler receives this message it knows the action is finished and can schedule the next one to start.

What's happening on your mini is this: the action finishes, sends a stop message, and terminates. But the scheduler doesn't get that message right away. In fact, it doesn't appear to get it for a couple of minutes, long after the process has terminated. What happens after that is a confused mess of messages and communication errors, some of which get processed and some don't, leaving the scheduler thinking the action is still running, which it clearly is not.
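
The failure mode described above can be modeled in a few lines. This is a toy Python sketch under stated assumptions, not QRecall's scheduler: `ToyScheduler`, its methods, and the archive name are all hypothetical. It shows how bookkeeping that relies only on a "stopped" message leaves a stale "running" entry when that message goes missing, and how checking whether the process still exists would clear it.

```python
import subprocess
import sys


class ToyScheduler:
    """Allows one action per archive at a time."""

    def __init__(self):
        self.running = {}  # archive name -> subprocess.Popen of the action

    def start(self, archive, argv):
        """Start an action, or return None if the archive is marked busy."""
        if archive in self.running:
            return None  # the action must wait
        proc = subprocess.Popen(argv)
        self.running[archive] = proc
        return proc

    def on_stopped_message(self, archive):
        """Normal path: the action reported that it finished."""
        self.running.pop(archive, None)

    def reconcile(self):
        """Fallback: free archives whose action process has already exited."""
        stale = [a for a, p in self.running.items() if p.poll() is not None]
        for archive in stale:
            del self.running[archive]
        return stale


def demo() -> None:
    action = [sys.executable, "-c", "pass"]  # stand-in for a verify action
    sched = ToyScheduler()
    sched.start("archive-A", action).wait()  # finishes; "stopped" never arrives
    print(sched.start("archive-A", action))  # None: next action is held
    print(sched.reconcile())                 # ['archive-A']: stale entry cleared
    sched.start("archive-A", action).wait()  # now the next action can run
```

A liveness check of this kind trades one problem for another: it needs a reliable way to identify the action's process, which is straightforward here only because the toy scheduler launched the process itself.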

In an attempt to untangle what's going on, I've built a special alpha version of QRecall that logs a lot more information about the state of actions, the "stopped" message handling, and communication link errors. Please install it, wait for this to happen again, and send me another diagnostic report.

QRecall 1.2.0a73

For future reference, there's also another way around this problem. An action in the activity window that is being held by the scheduler for other actions to finish can be started anyway by right-clicking (or clicking and holding) on the action's stop/menu button and choosing the Ignore Hold and Run command. This causes the action to ignore the scheduler's suggestion to wait and start execution immediately.

- QRecall Development -
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James,

Thanks for the detailed response.

I will load the special version that you posted...and wait for the issue to occur again.

GKG
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James,

I sent a report with the new version. The error did occur again; however, this time the verify task did not remain in "running" status.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Gary K. Griffey wrote: The error did occur again; however, this time the verify task did not remain in "running" status.

Thanks, Gary. This definitely gets a little closer to the problem. I suspect, however, that the code to debug the problem is interfering with reproducing it.

Here's a new version to try. Again, drop this in and send a diagnostic report after the problem happens again.

QRecall 1.2.0a76

Thank you for your patience.

- QRecall Development -
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James,

Ok...I installed the new version...I will let you know...

Thanks again...

GKG
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
James,

Just an update...since loading the last special version that you offered...the "Lost Connection" issue has not occurred.

I will keep watching it...

Thanks,

GKG
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Gary K. Griffey wrote: Just an update...since loading the last special version that you offered...the "Lost Connection" issue has not occurred.

I'm having the same "problem" here. My Xserve running 10.6.8 that was getting this occasionally is now completely silent.

So either the problem has just decided to take a holiday, or the logging code added has subtly altered the timing enough that it changes the outcome of the race condition.

If it doesn't occur by tomorrow, go ahead and send a diagnostic report anyway. I can still review the message timing and see if there's a pattern. I'll probably release a new QRecall beta in a couple of days, without the special debugging code, and we'll see how that does.

- QRecall Development -
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
Will do...
Gary K. Griffey


Joined: Mar 21, 2009
Messages: 156
Offline
Ok..it happened again.

Report sent.

Thanks,

GKG
 