QRecall Community Forum

Problems capturing some files to NAS drive
Forum Index » Problems and Bugs
Mark Gerber


Joined: Nov 5, 2008
Messages: 20
Offline
Until today I have had success capturing data from my documents folder to an archive on another computer. Since this morning I've been having nothing but trouble capturing that same folder to an archive on a new NAS drive.

I went through the wizard last night to set up captures using both the All Users and my Home folder options.
- The log for the All Users archive shows an "archive I/O error" after running for only a few minutes.
- The Home folder log seems to indicate problems capturing a specific file. I've run that action twice since then, and each time there was a problem capturing the same file or the folder it's in. That file is an internal DEVONthink database backup, but DEVONthink Pro wasn't running at the time.

Using auto repair on the Home folder archive reduces the size of the archive from about 3.5GB to a few megabytes.

Does the fact that these files have been captured successfully to the first archive on the other computer indicate a problem with the drive, or is there a new problem with these files?

If the problem lies with the files and not the drive, should I simply exclude these files from the capture or should I be looking for some underlying problem?

On another note, I'm afraid I have a bad habit of double-clicking a folder listed in an archive to open it. Of course, this begins the Recall process which I try to stop before that process is finished. Do I have to locate and delete this partially recalled folder or is it automatically deleted when the process is stopped?

Thanks.
Mark
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Please send a diagnostic report (Help > Send Report...). This will help isolate the problem.

As a general rule, an "archive I/O error" is just that: a problem reading from or writing to the archive; it is not a result of which items you choose to capture. If there were problems reading items, the log would list each item that couldn't be captured and why. But since an archive error also logs the file it was trying to capture at the time, it can be confusing. The log will tell you which it is.

There are two typical causes of persistent archive problems: a corrupted volume or a hardware problem. The volume structure can be verified using Disk Utility (open Disk Utility, select the volume, click First Aid, then Verify or Repair Disk). Since you're using a NAS drive, likely hardware problems would be a failing drive, an intermittent network connection, a flaky drive controller, or (rarely) a problem with RAM. Since you're not having problems with other archives, the components used only by the problem archive are the prime suspects.
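
If you're comfortable in Terminal, the same check can be run with diskutil. A minimal sketch, assuming a locally mounted volume called /Volumes/Backups (the volume name is just a placeholder, and a network share can't be checked this way; the NAS's internal disk would have to be checked by the device's own utilities):

    # Check the volume structure without making any changes
    diskutil verifyVolume /Volumes/Backups

    # If problems are reported, attempt a repair
    diskutil repairVolume /Volumes/Backups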

Mark Gerber wrote:On another note, I'm afraid I have a bad habit of double-clicking a folder listed in an archive to open it. Of course, this begins the Recall process which I try to stop before that process is finished. Do I have to locate and delete this partially recalled folder or is it automatically deleted when the process is stopped?
This is an unfortunate bit of UI that has tripped up a lot of people, and one I intend to fix in the next release. The items are recalled into your /tmp folder. Items in this folder are cleared each time you restart, or after they have not been opened or modified for three days. And no, QRecall won't capture them, because the /tmp folder is always excluded.
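
If you want to see (or clear out) anything that was partially recalled, it's just sitting in the temporary folder. Nothing here is QRecall-specific, and the folder name in the rm example is only a placeholder:

    # List what's currently in the temporary folder, newest first
    ls -lt /private/tmp

    # Manually remove a partially recalled item instead of waiting for
    # the automatic cleanup (substitute the actual folder name)
    rm -r "/private/tmp/SomePartiallyRecalledFolder"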

- QRecall Development -
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Mark,

Thanks for sending the diagnostic report. There are a couple of things that pop out of the log.

Regarding your question about your (original) archive that is not on the NAS drive: it appears to be OK, although QRecall currently (at the time the log ended) thinks it's in use by another process. If it really isn't, the file server may have become confused. I've noticed that this can happen when the file server gets temporarily disconnected from the client (the client goes to sleep, changes IP addresses, ...). The server thinks there's still some phantom client out there that has the archive file open. The easiest way I've found to fix this is to stop and restart the file server (System Preferences > Sharing, turn file sharing off and then on again).
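
If you'd like to confirm the phantom-client theory before restarting file sharing, lsof on the G3 will show which processes still have the archive open. The path below is only a placeholder for wherever your archive actually lives:

    # Run on the G3 (the machine serving the archive); lists every process
    # holding files open inside the archive package
    sudo lsof +D "/Users/Shared/Backups/Documents.quanta"

If AppleFileServer shows up holding files open when no client is actually connected, that's the phantom client; restarting file sharing clears it.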

Now about that NAS drive. There are some I/O errors early in the log, which is bad; QRecall can't do much with a drive it can't read from or write to. Most of the errors, however, are data errors. The suspicious thing is that they all occur around the point where the archive has grown to about 4GB.

A lot of volume formats and network storage devices have problems with files larger than 4GB. (The first release of Apple's own AirPort Extreme had this problem, so don't think this is just a third-party device issue.) First make sure the NAS volume is formatted as Mac OS Extended, then make sure this NAS device doesn't have problems with files over 4GB. If you have a good archive on another volume that's larger than 5GB or so, verify it, copy it to the NAS, then verify it again there. If the copy or verify fails, the NAS probably can't handle files greater than 4GB.
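
A quick Terminal version of that test, if you don't have a spare multi-gigabyte archive handy: create a file larger than 4GB, copy it to the NAS, and compare checksums. The file name and volume name are placeholders:

    # Create a 5GB test file filled with zeros (mkfile without -n writes real data)
    mkfile 5g ~/bigtest.bin

    # Copy it to the NAS volume
    cp ~/bigtest.bin /Volumes/NAS/bigtest.bin

    # The two digests must match; a mismatch (or a short file) means the
    # NAS mangled the copy
    md5 ~/bigtest.bin /Volumes/NAS/bigtest.bin

    # Clean up afterwards
    rm ~/bigtest.bin /Volumes/NAS/bigtest.bin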

- QRecall Development -
Mark Gerber


Joined: Nov 5, 2008
Messages: 20
Offline
James Bucanek wrote:The server thinks there's still some phantom client out there that has the archive file open. The easiest way I've found to fix this is to stop and restart the file server (System Preferences > Sharing, turn file sharing off and then on again).

Now about that NAS drive. There are some I/O errors early in the log, which is bad; QRecall can't do much with a drive it can't read from or write to. Most of the errors, however, are data errors. The suspicious thing is that they all occur around the point where the archive has grown to about 4GB.
When you say "early on" are you referring to log entries prior to this past Sunday (7 Dec)? If so, those backups were going to the other computer (a G3). I didn't get the NAS drive until this past weekend.
Is it possible to tell if those errors are related to problems transferring the data or problems writing to the drive?

First make sure the NAS volume is formatted as Mac OS Extended, then make sure this NAS device doesn't have problems with files over 4GB. If you have a good archive on another volume that's larger than 5GB or so, verify it, copy it to the NAS, then verify it again there. If the copy or verify fails, the NAS probably can't handle files greater than 4GB.
Yesterday afternoon I spoke with the drive's tech support; the drive is formatted as Extended 3 and shouldn't have problems with files greater than 4 GB. He did echo your suggestion that I test it by copying a large file there and back again.
So, here's what I've done so far:

- I successfully copied the 19 GB archive on the G3 (mini documents backup) to the NAS drive. It took about 3 hours.
- Opened the archive and ran "Verify".
- After about 1-1/2 hours, with the status window still showing the same numbers for the past hour (about 70,000 quanta, 2.8 GB, and 15 folders), I felt I should force quit (I know, patience!).
- Restarted the computer, launched QRecall, opened that archive again.
- I ran Verify again and this time let it run through the night.
- Now, nearly 10 hours into the process, the numbers are at 347,765 quanta, 9.92 GB, 2,280 folders, and 61,890 files. It's still cranking. I suppose that's good.
- But! I forgot to stop and restart the G3's file server as you suggested above. And I did not use Verify on the original archive on the G3 before copying it to the NAS drive.

Would any of these, along with the unwise force quit I did last night, compromise the Verify process that's running?
Is it unusual for Verify to take this long over a network?

- I am now using a copy of QRecall on the G3 to verify the original archive.
- Verify finished after only 26 minutes.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Mark Gerber wrote:Is it possible to tell if those errors are related to problems transferring the data or problems writing to the drive?
I can't tell you from the logs. When QRecall makes a request for data it either gets it or an error code. An I/O error code doesn't indicate what caused the failure, just that it failed.

So, here's what I've done so far:

- I successfully copied the 19 GB archive on the G3 (mini documents backup) to the NAS drive. It took about 3 hours.
- Opened the archive and ran "Verify".
- After about 1-1/2 hours, with the status window still showing the same numbers for the past hour (about 70,000 quanta, 2.8 GB, and 15 folders), I felt I should force quit (I know, patience!).
In this case, patience probably isn't a virtue. A verify begins by sequentially reading all of the data in an archive. Except for brief pauses to cross-reference the data read with the quanta index or if it has to read a really big span of empty space, it should never stop or stall.

- Restarted the computer, launched QRecall, opened that archive again.
- I ran Verify again and this time let it run through the night.
- Now, nearly 10 hours into the process, the numbers are at 347,765 quanta, 9.92 GB, 2,280 folders, and 61,890 files. It's still cranking. I suppose that's good.
As long as the numbers are changing, then it's probably working. But 10 hours seems too long.

- But! I forgot to stop and restart the G3's file server as you suggested above.
That shouldn't have anything to do with the NAS drive issue. That was just to unlock your shared archives on the G3.

Would any of these, along with the unwise force quit I did last night, compromise the Verify process that's running?
No. The verify only reads an archive. It's one of the few QRecall actions that can be stopped or killed with little or no consequences.

Is it unusual for Verify to take this long over a network?
The speed of the verify concerns me. The verify is one of the most highly pipelined QRecall processes. It reads the archive data sequentially using DMA transfer from the drive to RAM. In English, verify reads data from the archive as fast as it's possible to read data on your system. I don't know what your network speeds are, but it should certainly be able to verify the archive faster than the 3 hours it took to write it. If the verify is slow or getting stuck, then something (network, the NAS drive, ?) is stalling or retrying.
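
One way to take QRecall out of the equation is to time a raw read from the NAS mount; if that is also slow or stalls, the problem is below QRecall. The path is a placeholder (the test file from the copy test above, or any other large file on the NAS, will do):

    # Read 1GB off the NAS and report the elapsed time
    time dd if="/Volumes/NAS/bigtest.bin" of=/dev/null bs=1m count=1024

On a healthy 100Mb network that read should take roughly a minute and a half; many minutes, or a stall, points at the network or the NAS itself.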

When it's done (or even before if it's going to take forever) send another diagnostic report. Or look in the log for any data retries reported by the verify.

You can also run Activity Monitor and see if the QRecallHelper process is working and what your network activity is. While a verify is running, the I/O should be pretty much saturated during the initial "Verifying archive data" phase.
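
If Activity Monitor's graphs are hard to read, the same information is available in Terminal. A rough sketch; en0 is an assumption, and your active network interface may be named differently:

    # Confirm the helper process is alive and see how much CPU it's using
    ps aux | grep -i "[Q]RecallHelper"

    # Sample the interface byte counters ten seconds apart; the difference
    # in the Ibytes column divided by ten is your read throughput
    netstat -I en0 -b; sleep 10; netstat -I en0 -b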

- I am now using a copy of QRecall on the G3 to verify the original archive.
- Verify finished after only 26 minutes.
That's the way it's supposed to work.

- QRecall Development -
Mark Gerber


Joined: Nov 5, 2008
Messages: 20
Offline
James Bucanek wrote:As long as the numbers are changing, then it's probably working. But 10 hours seems too long.
Sometimes it's only the timer that changes. For instance, the quanta number has been stuck at 440,506 for at least 15 minutes. But after a while, it will suddenly get over that hump and everything seems to fly. So far, I have no idea how long "a while" lasts.

I don't know what your network speeds are, but it should certainly be able to verify the archive faster than the 3 hours it took to write it. If the verify is slow or getting stuck, then something (network, the NAS drive, ?) is stalling or retrying.
I'll get back to the OWC tech support and see if they can help me determine if that's the case.

When it's done (or even before if it's going to take forever) send another diagnostic report. Or look in the log for any data retries reported by the verify.
At this point, some 16:35 into the process, I'm inclined to let it finish. I estimate the progress bar has only traversed about two-thirds of the distance, though. When it does finish, I'll send a new report.

You can also run Activity Monitor and see if the QRecallHelper process is working and what your network activity is. While a verify is running, the I/O should be pretty much saturated during the initial "Verifying archive data" phase.
Unfortunately, I have little understanding of how to interpret the information Activity Monitor presents. The Network graph shows occasional spikes up to around 1.56KB/sec with nothing else but the verify process running--but it will jump to 65KB/sec when grabbing a page from the web. QRecallHelper is running: the numbers under the CPU column bounce from 0.0 to 0.1, threads seem to stay around 10, and it's using 1.39GB of virtual memory.
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Mark Gerber wrote:Unfortunately, I have little understanding of how to interpret the information Activity Monitor presents. The Network graph shows occasional spikes up to around 1.56KB/sec with nothing else but the verify process running--but it will jump to 65KB/sec when grabbing a page from the web. QRecallHelper is running: the numbers under the CPU column bounce from 0.0 to 0.1, threads seem to stay around 10, and it's using 1.39GB of virtual memory.
Here's an example from my own machine:

[Screenshot: Activity Monitor Network graph taken during a verify]

The verify is reading an archive on a small server (a Mac mini) over a 100Mb ethernet network. As you can see, the network is saturated, reading its maximum of about 11Mbytes (100Mbits) per second, continuously. If your network is showing 1.5KB (15Kb) per second, then something isn't right. Even if the verify is reading empty space (where the quanta, file, and folder numbers won't change), the I/O should still be saturated.

At a data rate of 11MB/s, the QRecallHelper should be using about 10-20% of the CPU, occasionally spiking to 100% or more (w/ multiple CPUs) as it digests index and directory information.

Caveat: You'll have to check with the makers of the device whether Activity Monitor can actually see the data being transferred to and from your NAS. Some devices use their own network protocol and bypass the normal TCP/IP stack, so their traffic isn't accounted for in the network statistics. The traffic could show up as Disk Activity, or not at all.

- QRecall Development -
Mark Gerber


Joined: Nov 5, 2008
Messages: 20
Offline
Thanks for the screen shot of your AM window. Mine did not look like that at all during that extended verify (finished after over 28 hours). As I mentioned earlier, my peak throughput (?) was never more than around 1.5 KB/sec. More often than not, the graph showed a lone green spike followed by a long, low flat line. I envy your high green line with the big numbers.

So I don't know if this is a problem with some part of the network (which, as far as I know, could involve the cables, internal routers, or even how many people are on this party line called cable broadband), the NAS drive itself, or a glitch in the data that's being captured. My inclination is to think it's something to do with the network. I've sent another report to you and perhaps you can confirm that the data is not the problem.

Looking at the log after the verify finished, it appears that the attempts to capture data from the Mini to the NAS drive have not been successful.

This has led me to rethink the idea of using a NAS drive to back up our three computers. The bulk of the data that needs to be backed up comes from my computer (the Mini), and since a data transfer over a network is inherently slower than one to an external hard drive, I'm inclined at this point to exchange the NAS drive for an external drive (still set up as RAID 1) on the Mini. For the large, first-time captures on the other computers, I'll just hook the drive up to them directly. The rest of the time, I'll set up their copies of QRecall to mount the Mini and capture to that external RAID. The NAS drive may just be adding a layer of complexity that I don't need.

Hope that makes sense--sort of just trying to figure this out by typing out loud.

Mark
 