|
Mark Gerber wrote:I tried using Activity Monitor to stop the runaway process but, while QRecallHelper did disappear from Activity Monitor, the QRecall Activity window showed it was still hanging.
That would be expected. When a process simply "disappears" (which is what happens when you force quit it), it will take a while for the activity monitor to figure out that the process is never coming back and remove its status from the window.
I have submitted a support ticket to OtherWorld Computing and I'm hoping they might have some clue. Or is it possible something in the network would initiate the connect/reconnect sequence?
Sporadic network communications problems could easily be the culprit, especially if you are using wireless networking. Interference, noise, sunspots, or gremlins can often cause a wireless client to lose its connection and reconnect. This plays havoc with services that must maintain a continuous connection, like a network volume.
|
|
|
Mark Gerber wrote:How can I stop this process?
If you ever have a runaway QRecall process, it can be stopped using Activity Monitor (or the Terminal, or any similar tool of your choice). Open Activity Monitor and find the QRecallHelper process (there will be one instance of QRecallHelper for every running command or action). Select the process and click the Quit Process icon, or choose View > Quit Process. Try a plain Quit first (this is equivalent to clicking the stop button in the Activity window). If that doesn't have any effect, use Force Quit.
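For the Terminal-minded, here's a minimal sketch of the same Quit-then-Force-Quit sequence. It uses a throwaway `sleep` process as a stand-in for QRecallHelper (killing a real backup process isn't something to demonstrate casually); in real use you'd find the PID with `pgrep QRecallHelper`.

```python
import os
import signal
import subprocess

# Spawn a stand-in process (substitute the PID of QRecallHelper in real use,
# e.g. from `pgrep QRecallHelper`).
proc = subprocess.Popen(["sleep", "300"])

# Polite quit first -- SIGTERM is what Quit Process sends, and is
# equivalent to clicking the stop button in the Activity window.
os.kill(proc.pid, signal.SIGTERM)
proc.wait()

# If SIGTERM had no effect, Force Quit sends SIGKILL instead:
#   os.kill(proc.pid, signal.SIGKILL)
print("stopped with code", proc.returncode)
```

The returncode will be negative on POSIX systems (the negated signal number) when the process dies from a signal rather than exiting normally.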
Have I hopelessly messed things up and do I need to start over ...
I can't see that you've done anything wrong, but I do see some problems with your external drive in the logs. Your external drive attempts to disconnect/unmount at regular intervals. While QRecall wasn't doing anything, I found this in your log file:

2009-01-10 03:36:45.600 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 03:36:46.339 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 03:51:45.598 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 03:51:45.624 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 04:06:45.601 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 04:06:45.644 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 04:21:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 04:21:45.643 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 04:36:45.600 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 04:36:45.632 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 04:51:45.598 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 04:51:45.624 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 05:06:45.601 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 05:06:45.635 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 05:21:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 05:21:45.658 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 05:36:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 05:36:45.783 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 05:51:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 05:51:46.072 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 06:06:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 06:06:45.667 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 06:21:45.600 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 06:21:45.623 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 06:36:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 06:36:45.635 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 06:51:45.598 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 06:51:45.629 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 07:06:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 07:06:45.629 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 07:21:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 07:21:45.630 -0500 #debug# mounted volume /Volumes/MAXimus
2009-01-10 07:36:45.599 -0500 #debug# unmounted volume /Volumes/MAXimus
2009-01-10 07:36:45.667 -0500 #debug# mounted volume /Volumes/MAXimus

I don't know if the drive is being unmounted intentionally (unlikely), but it appears to be spontaneously ejecting itself during the day. It then immediately reconnects and mounts again. In every case where the volume unmounted while a capture was in progress, the capture immediately encountered problems -- which isn't terribly surprising. I suspect that the volume going off-line is the principal culprit. I would contact the drive manufacturer or look for some event that coincides with the volume being unmounted.
|
|
|
ubrgeek wrote:I de-authorized and yet when I mount the drive which I use for backups, it still launches Qrecall and wants to merge the layers.
My guess is that you have an action that is scheduled to run when the volume mounts. If not, send a diagnostic report (Help > Send Report) and we'll try to figure out what's going on.
|
|
|
Peter Pace wrote:I'm looking for software that will keep the two hard drives updated with the same information.
QRecall is not a synchronization utility. It will happily keep both of your systems backed up, but isn't designed to migrate changes from one to the other. I've used Synchronize Pro in the past, but I don't know how it stacks up against the competition today. I'd suggest searching VersionTracker and Apple's Macintosh Product Guide for "file synchronization". You'll have plenty of products to choose from.
|
|
|
Mark Gerber wrote:I'm running a copy of QRecall on my wife's MacBook Pro. I logged in to my account on the MacBook and, using the Capture Assistant, set up actions to capture the Startup Volume to an archive on an external drive attached to a Mini. The initial backup was successful, but there have been no subsequent captures or merges. The QRecall Activity window on the Macbook has four actions that are waiting.
First, send a diagnostic report from the account showing the waiting actions. I'm curious to know what the reason is.
The Actions window has red "!" in front of each action with the note, "Waiting for action to finish."
Hover the cursor over the (!) to find out why QRecall doesn't think that action can be run. If the reason is "Archive not available" then the network volume isn't mounted, or QRecall can't identify the volume as the one containing the archive.
I'm not sure, because I don't spend much time on that computer, but I think my account on the MacBook was logged out at some point. So I assume the actions could not proceed.
That depends on whether you enabled the "Start and run action while logged out" option in the Authorization tab of the QRecall Preferences. If you did not, then logging out will stop any running actions.
Is there a way to kick QRecall past this waiting state?
It depends on why it's waiting. If the archive is open in another system/account, then the archive really is busy and the actions can't proceed until it is closed. If the MacBook or the file sharing on the Mini have gotten out of sync, then restarting the MacBook, or turning the Mini's file sharing off and back on again, may solve the problem.
Should I set it up again through my wife's account, which is always logged in, and, if so, will I have to start over with the first capture or can QRecall recognize the first one on the Mini?
If you used the same identity key for both installations, it doesn't matter which account captures the item on the MacBook's hard drive. However, the ownership of the archive follows the same rules as any standard OS X document. If your account owns the archive, select the archive and choose Get Info in the Finder. You'll need to either change the owner to your wife's account, or change the group or everyone access, so that both accounts can read and write the archive.
If the MacBook is not connected to the Mini through the network, will it automatically do this or should I make sure of that before retiring for the night?
QRecall will attempt to mount the volume containing the archive. For network volumes, this requires that the account be logged in (OS X will not mount network volumes for logged-out users), that the user name and password have been saved in the keychain, and that the keychain is unlocked.
|
|
|
Steve Mayer wrote:net.inet.tcp.sockthreshold=0 net.inet.tcp.sendspace=8388608 net.inet.tcp.recvspace=8388608
This is the probable cause of your memory problems. sockthreshold is the point at which the TCP/IP stack stops allocating 64K buffers (the default) and starts allocating sendspace/recvspace-sized buffers instead. By setting it to 0 you're forcing all TCP/IP buffers to be the size of sendspace/recvspace. Normally these values are 8K, but you've increased them to 8M. So if you have 512 TCP sockets open, the kernel will allocate 8GB of RAM just for network socket buffers!

I see two problems. First, the values for sendspace/recvspace are ridiculously high. Even the fastest fibre channel networks couldn't overrun an 8MB buffer on a modern PC. And as you've found out, anything that consumes more memory is a performance cost. In this case, the performance gain of a larger buffer doesn't outweigh the performance loss of allocating so much memory. If you were running a server, had tons of memory, and a really, really fast network, it might make sense to up these values to 16K or even 32K. But you aren't running a server.

So the real mystery is why you have so many TCP/IP sockets open. I suspect that sockets are timing out and getting abandoned, which leaves them allocated until they (eventually) time out and are closed. You might consult the Netgear people to find out if stray TCP sockets are to be expected when using your NAS, or if there's some other problem.
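The buffer arithmetic above is easy to check. A quick back-of-envelope sketch (the 512-socket count is the hypothetical from the post; each socket gets one send and one receive buffer):

```python
# Kernel memory consumed by socket buffers under the posted sysctl settings.
sockets = 512
sendspace = recvspace = 8 * 1024 * 1024   # 8 MiB each, per the sysctl values

total = sockets * (sendspace + recvspace)
print(total / 2**30, "GiB")               # -> 8.0 GiB just for socket buffers
```

At the normal 8K values the same 512 sockets would need only 8 MiB total, which is why the thousand-fold increase matters so much.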
|
|
|
Steve Mayer wrote:What you mention in your last paragraph appears to be what is happening on my machine. I've seen the virtual memory up to 5.7GB when this happens and I've rebooted the machine.
You have 4GB of physical RAM, so a 5.7GB VM size isn't that much. As a counter example, I have 4.5GB of physical RAM and I'm currently running with a VM size of 68GB with no problem. The more interesting numbers are the Swap used and Page in/out counts.
More info. I'm backing up to a Netgear ReadyNAS Duo (been working great for this) and when the QRecallHelper process seems to hang, after a while I get the following in the NAS debug logs:
That's interesting, but I don't know what that means. You'll probably need to contact Netgear for an explanation.
At this point, I have 56M of Free RAM, 1.47GB of inactive RAM,
Looks normal.
QRecall is in a (Not Responding) state in Activity Monitor.
That's odd. If this happens again, capture a sample of the QRecall application (Activity Monitor, select the QRecall application process, choose Sample Process, then save the output to a text file) and send it to me.
QRecallHelper is showing 455.92MB of Real memory and 1.64GB of virtual memory.
Those look like perfectly reasonable values. One way of isolating the problem would be to capture a comparable amount of data to a new archive on a local volume. If the same thing happens, then the problem could be an interaction between QRecall and OS X 10.5.6. If it doesn't happen, then it could be a problem or memory leak with the NAS volume drivers.
|
|
|
Hello Steve, Please send a diagnostic report. It probably won't explain what's happening, but it will give me your system configuration. I haven't seen anything out of the ordinary here, but that's not to say that the 10.5.6 update isn't causing problems. Memory issues can manifest themselves quite differently on different computers. The same bug on one system might not cause any noticeable effects while bringing another system to its knees.

The overall system memory measurements (wired, active, inactive, free) aren't that helpful; the labels are vague, and memory gets used for so many purposes that it's difficult to use these numbers to pinpoint a cause. For example, I would always expect your free memory to creep towards zero as a capture progresses. The operating system uses any free memory to cache file and directory blocks, so the act of reading a bunch of files will naturally consume all of your free RAM. But that RAM really isn't "used" because the system will give it back immediately to any application that wants it. So some of your "used" memory is really used while other "used" memory isn't. Confusing, huh?

What I would be interested in knowing is whether the virtual memory size of the QRecallHelper process grows continuously, and whether your system is swapping excessively during a capture. Both of these can be observed in the Activity Monitor application. Once the capture is underway, the QRecallHelper process should have a virtual memory address space around the size of your physical RAM (up to 2GB). That size should remain fairly stable, increasing or decreasing only by a few hundred megabytes, until the capture is finished. On the other hand, if the virtual memory size continually increases, it will start to cause excessive swapping with the virtual memory store. This will cause your system to run very slowly, the QRecallHelper process could crash, or it could eventually cripple the OS.
If you see any of this kind of behavior, I'll see if I can reproduce the problem here and find out what's causing it.
|
|
|
Judith Blair wrote:Thank you for all your hard work.
You are very welcome.
|
|
|
Dr Fergus J Lalor wrote:What I need is a manual that starts with a generic version of the problem I face and tells me how to deal with it. Something like "You have just discovered that you have deleted a file that you need in a hurry. You know that this file was in a certain folder on your HD on (date). Here is how to find it in the archive. If it turns out that it's not in the archive this is probably why".
You're in the right place. That's the sort of thing that I thought the Cookbook and Q&A section of the forums would develop into. If we get enough of these common questions, it might warrant a new section in the on-line help. Here are my suggestions for finding a lost file.

If you know what folder the file is/was in:
- Open the archive
- Locate and expand the content of the folder that contains/contained the document
- Hover over the "dot" for that folder's timeline. The dot will expand into a set of play controls. Click the up arrow to rewind the archive through changes made to that folder. Stop when the document of interest appears. (Make sure Timelines are being displayed.)

If you don't know where the file is/was:
- Open the archive
- Enter a complete or partial filename into the search field (in the toolbar)
- Turn on "Show only matching items" in the search field options
- Use the bottom shader, or press Command+Up Arrow, until the item appears. Make sure you give QRecall time to finish searching each layer before moving on (all of the progress spinners will stop).
|
|
|
Mark Gerber wrote:Unfortunately, I have little understanding of how to interpret the information Activity Monitor presents. The Network graph shows occasional spikes up to around 1.56KB/sec with nothing else but the verify process running--but it will jump to 65KB/sec when grabbing a page from the web. QRecallHelper is running: the numbers under the CPU column bounce from 0.0 to 0.1, threads seem to stay around 10, and it's using 1.39GB of virtual memory.
Here's an example from my own machine: the verify is reading an archive on a small server (a Mac mini) over a 100Mb ethernet network. As you can see, the network is saturated, reading its maximum of about 11Mbytes (100Mbits) per second, continuously.

If your network is showing 1.5KB/15Kb per second, then something isn't happening. Even if the verify is reading empty space (where the quanta, file, and folder numbers won't change), the I/O should still be saturated. At a data rate of 11MB/s, the QRecallHelper should be using about 10-20% of the CPU, occasionally spiking to 100% or more (w/ multiple CPUs) as it digests index and directory information.

Caveat: you'll have to check with the makers of your NAS device that Activity Monitor can actually monitor the data being transferred to/from it. Some devices use their own network protocol and bypass the normal TCP/IP stack, so their traffic isn't accounted for by the network traffic statistics. The traffic could show up as Disk Activity or not at all.
|
|
|
Mark Gerber wrote:Is it possible to tell if those errors are related to problems transferring the data or problems writing to the drive?
I can't tell you from the logs. When QRecall makes a request for data it either gets it or an error code. An I/O error code doesn't indicate what caused the failure, just that it failed.
So, here's what I've done so far: - I successfully copied the 19 GB archive on the G3 (mini documents backup) to the NAS drive. It took about 3 hours. - Opened the archive and ran "Verify". - After about 1-1/2 hours, and with the status window still showing the same numbers for the past hour (about 70,000 quanta, 2.8 GB, and 15 folders), I felt I should force quit (I know, patience!).
In this case, patience probably isn't a virtue. A verify begins by sequentially reading all of the data in an archive. Except for brief pauses to cross-reference the data read with the quanta index or if it has to read a really big span of empty space, it should never stop or stall.
- Restarted the computer, launched QRecall, opened that archive again. - I ran Verify again and this time let it run through the night. - Now, nearly 10 hours into the process, the numbers are at 347,765 quanta, 9.92 GB, 2,280 folders, and 61,890 files. It's still cranking. I suppose that's good.
As long as the numbers are changing, then it's probably working. But 10 hours seems too long.
- But! I forgot to start and restart the G3's file server as you suggested above.
That shouldn't have anything to do with the NAS drive issue. That was just to unlock your shared archives on the G3.
Would any of these, along with the unwise force quit I did last night, compromise the Verify process that's running?
No. The verify only reads an archive. It's one of the few QRecall actions that can be stopped or killed with little or no consequences.
Is it unusual for Verify to take this long over a network?
The speed of the verify concerns me. The verify is one of the most highly pipelined QRecall processes. It reads the archive data sequentially using DMA transfer from the drive to RAM. In English: verify reads data from the archive as fast as it's possible to read data on your system. I don't know what your network speeds are, but it should certainly be able to verify the archive faster than the 3 hours it took to write it. If the verify is slow or getting stuck, then something (network, the NAS drive, ?) is stalling or retrying.

When it's done (or even before, if it's going to take forever), send another diagnostic report. Or look in the log for any data retries reported by the verify. You can also run Activity Monitor and see if the QRecallHelper process is working and what your network activity is. While a verify is running, the I/O should be pretty much saturated during the initial "Verifying archive data" phase.
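To put a rough number on "faster than the 3 hours it took to write": a back-of-envelope estimate, assuming the 19 GB archive size reported earlier and a saturated 100Mb link reading at roughly 11 MB/s:

```python
# Expected time for a sequential verify of the archive over a saturated
# 100Mb ethernet link (both figures taken from this thread).
archive_bytes = 19 * 2**30        # the 19 GB archive
rate = 11 * 10**6                 # ~11 MB/s, a saturated 100Mb link

minutes = archive_bytes / rate / 60
print(round(minutes), "minutes")  # ~31 minutes; 10 hours means something is stalling
```

A verify running ten hours against a ~31-minute baseline points squarely at retries or stalls somewhere between the NAS and the client.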
- I am now using a copy of QRecall on the G3 to verify the original archive. - Verify finished after only 26 minutes.
That's the way it's supposed to work.
|
|
|
Mark, Thanks for sending the diagnostic report. There are a couple of things that pop out of the log.

In regards to your question about your (original) archive that is not on the NAS drive: it appears to be OK, although QRecall currently (at the time the log ended) thinks that it's in use by another process. If it really isn't, the file server may have become confused. I've noticed that this can happen when the file server gets temporarily disconnected from the client (the client goes to sleep, changes IP addresses, ...). The server thinks there's still some phantom client out there that has the archive file open. The easiest way I've found to fix this is to stop and restart the file server (System Preferences > Sharing, turn file sharing off then on).

Now about that NAS drive. There are some I/O errors early in the log, which is bad; QRecall can't do much about a drive that it can't read or write. Most of the errors, however, are data errors. The suspicious thing is that they all occur around the point where the archive has grown to about 4GB. A lot of volume formats and network storage devices have problems with files larger than 4GB. (The first release of Apple's own AirPort Extreme had this problem, so don't think this is just a third-party device issue.) First make sure the NAS volume is formatted as Mac OS Extended, then make sure the NAS device doesn't have problems with files over 4GB. If you have a good archive on another volume that's larger than 5GB or so, verify it, copy it to the NAS, then verify it again there. If the copy or verify fails, the NAS probably can't handle files greater than 4GB.
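The 4GB figure isn't arbitrary: it's the boundary of a 32-bit unsigned file size, which is why FAT32 volumes and NAS firmware that tracks sizes in 32 bits choke right there. A one-line check makes the boundary concrete:

```python
# 4 GiB is the largest value a 32-bit unsigned file-size field can hold,
# which is why failures cluster as the archive grows past that point.
limit = 2**32
print(limit, "bytes =", limit / 2**30, "GiB")
```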
|
|
|
Please send a diagnostic report (Help > Send Report...). This will help isolate the problem. As a general rule, an "archive I/O error" is just that: A problem reading or writing to the archive and is not a result of what items you choose to capture. If there were problems reading items, the log would list each item that couldn't be captured and why. But since an archive error also logs the file it was trying to capture at the time, it can be confusing. The log will tell. There are two typical causes of perpetual archive problems: A corrupted volume or a hardware problem. The volume structure can be verified using Disk Utility (Disk Utility, select volume, First Aid, Verify/Repair Disk). Since you're using a NAS drive, likely hardware problems would be a failing drive, an intermittent network connection, a flaky drive controller, or (rarely) a problem with RAM. Since you're not having problems with other archives, the items that are shared only with the problem archive would be the suspects.
On another note, I'm afraid I have a bad habit of double-clicking a folder listed in an archive to open it. Of course, this begins the Recall process which I try to stop before that process is finished. Do I have to locate and delete this partially recalled folder or is it automatically deleted when the process is stopped?
This is an unfortunate bit of UI that has tripped up a lot of people, and one I intend to fix in the next release. The items are recalled into your /tmp folder. Items in this folder are cleared each time you restart, or after the items have not been opened or modified for three days. And no, QRecall won't capture them, because the /tmp folder is always excluded.
|
|
|
Mark Gerber wrote:But until then it sounds as if I will have to exclude any file that might have an open database in a folder I intend to backup (I work at home, my hours aren't regular, and I don't know if I can be disciplined enough to quit those programs that might fall into this category).
I would lean more towards ensuring that you get at least one good capture from time to time. One way of doing that would be to schedule a capture that occurs when you log out. Logging out would then guarantee a "clean" capture of all of your databases.
Is there someplace a list of those applications using CoreData?
I'm not aware of any.
Or am I perhaps being too paranoid about this problem?
Is that possible?
|
|
|
|
|