Dr Fergus J Lalor wrote:What I need is a manual that starts with a generic version of the problem I face and tells me how to deal with it. Something like "You have just discovered that you have deleted a file that you need in a hurry. You know that this file was in a certain folder on your HD on (date). Here is how to find it in the archive. If it turns out that it's not in the archive this is probably why".
You're in the right place. That's the sort of thing that I thought the Cookbook and Q&A sections of the forums would develop into. If we get enough of these common questions, it might warrant a new section in the on-line help. Here are my suggestions for finding a lost file.

If you know what folder the file is/was in:
- Open the archive.
- Locate and expand the folder that contains/contained the document.
- Make sure timelines are being displayed.
- Hover over the "dot" on that folder's timeline. The dot will expand into a set of play controls. Click the up arrow to rewind the archive through the changes made to that folder, and stop when the document of interest appears.

If you don't know where the file is/was:
- Open the archive.
- Enter a complete or partial filename into the search field (in the toolbar).
- Turn on "Show only matching items" in the search field options.
- Use the bottom shader, or press Command+Up Arrow, until the item appears. Give QRecall time to finish searching each layer before moving on (all of the progress spinners will stop).
Mark Gerber wrote:Unfortunately, I have little understanding of how to interpret the information Activity Monitor presents. The Network graph shows occasional spikes up to around 1.56KB/sec with nothing else but the verify process running--but it will jump to 65KB/sec when grabbing a page from the web. QRecallHelper is running: the numbers under the CPU column bounce from 0.0 to 0.1, threads seem to stay around 10, and it's using 1.39GB of virtual memory.
Here's an example from my own machine: the verify is reading an archive on a small server (a Mac mini) over a 100Mb Ethernet network. As you can see, the network is saturated, reading at its maximum of about 11MBytes (100Mbits) per second, continuously.

If your network is showing 1.5KB/15Kb per second, then something is wrong. Even when the verify is reading empty space (where the quanta, file, and folder counts won't change), the I/O should still be saturated. At a data rate of 11MB/s, QRecallHelper should be using about 10-20% of the CPU, occasionally spiking to 100% or more (with multiple CPUs) as it digests index and directory information.

Caveat: you'll have to confirm with the makers of your NAS device that Activity Monitor can actually see the data being transferred to and from it. Some devices use their own network protocol and bypass the normal TCP/IP stack, so their traffic isn't accounted for by the network statistics; it could show up as Disk Activity, or not at all.
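For a rough sanity check, here's the arithmetic behind those figures as a small Python sketch (the 88% efficiency factor is my own rough assumption for protocol overhead):

```python
# Back-of-the-envelope: what a saturated 100 Mb/s Ethernet link should deliver.
link_mbps = 100
raw = link_mbps * 1_000_000 / 8        # 12.5 MB/s theoretical maximum
effective = raw * 0.88                 # ~11 MB/s after protocol overhead (rough guess)

archive_bytes = 19 * 1_000_000_000     # the 19 GB archive from this thread
minutes = archive_bytes / effective / 60
print(f"~{effective / 1e6:.0f} MB/s saturated -> verify in ~{minutes:.0f} minutes")
```

At 1.5KB/sec, by contrast, the same verify would take months, which is why those Activity Monitor numbers point to a stall rather than merely a slow network.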
Mark Gerber wrote:Is it possible to tell if those errors are related to problems transferring the data or problems writing to the drive?
I can't tell you from the logs. When QRecall makes a request for data it either gets it or an error code. An I/O error code doesn't indicate what caused the failure, just that it failed.
So, here's what I've done so far:
- I successfully copied the 19 GB archive on the G3 (mini documents backup) to the NAS drive. It took about 3 hours.
- Opened the archive and ran "Verify".
- After about 1-1/2 hours, with the status window still showing the same numbers for the past hour (about 70,000 quanta, 2.8 GB, and 15 folders), I felt I should force quit (I know, patience!).
In this case, patience probably isn't a virtue. A verify begins by sequentially reading all of the data in an archive. Except for brief pauses to cross-reference the data against the quanta index, or when it has to read a really big span of empty space, it should never stop or stall.
- Restarted the computer, launched QRecall, and opened that archive again.
- I ran Verify again and this time let it run through the night.
- Now, nearly 10 hours into the process, the numbers are at 347,765 quanta, 9.92 GB, 2,280 folders, and 61,890 files. It's still cranking. I suppose that's good.
As long as the numbers are changing, it's probably working. But 10 hours seems far too long.
- But! I forgot to start and restart the G3's file server as you suggested above.
That shouldn't have anything to do with the NAS drive issue. That was just to unlock your shared archives on the G3.
Would any of these, along with the unwise force quit I did last night, compromise the Verify process that's running?
No. The verify only reads an archive. It's one of the few QRecall actions that can be stopped or killed with little or no consequences.
Is it unusual for Verify to take this long over a network?
The speed of the verify concerns me. The verify is one of the most highly pipelined QRecall processes: it reads the archive data sequentially, using DMA transfer from the drive to RAM. In English, verify reads data from the archive as fast as it's possible to read data on your system. I don't know what your network speeds are, but it should certainly be able to verify the archive faster than the 3 hours it took to write it. If the verify is slow or getting stuck, then something (the network, the NAS drive, ?) is stalling or retrying.

When it's done (or even before, if it looks like it's going to take forever), send another diagnostic report, or look in the log for any data retries reported by the verify. You can also run Activity Monitor to see if the QRecallHelper process is working and what your network activity is. While a verify is running, the I/O should be pretty much saturated during the initial "Verifying archive data" phase.
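To illustrate what "highly pipelined" means here, a toy Python sketch (my illustration of the general technique, not QRecall's actual code): one thread streams the archive sequentially while another checksums it, so the I/O never has to wait on the CPU:

```python
import hashlib, queue, threading

def pipelined_verify(path, chunk_size=1 << 20):
    """Read a file sequentially on one thread while checksumming on another."""
    q = queue.Queue(maxsize=8)             # small buffer between the two stages

    def reader():
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                q.put(chunk)
        q.put(None)                        # end-of-file sentinel

    threading.Thread(target=reader, daemon=True).start()
    digest = hashlib.sha1()
    while (chunk := q.get()) is not None:
        digest.update(chunk)               # CPU work overlaps the next read
    return digest.hexdigest()
```

If a structure like this runs slowly, the bottleneck is almost always the read side (the network or the drive), which is exactly why a stalled verify points at the NAS rather than at QRecall.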
- I am now using a copy of QRecall on the G3 to verify the original archive.
- Verify finished after only 26 minutes.
That's the way it's supposed to work. 
Mark, thanks for sending the diagnostic report. A couple of things pop out of the log.

Regarding your question about your (original) archive that is not on the NAS drive: it appears to be OK, although QRecall currently (at the time the log ended) thinks it's in use by another process. If it really isn't, the file server may have become confused. I've noticed this can happen when the file server gets temporarily disconnected from the client (the client goes to sleep, changes IP addresses, ...). The server thinks there's still some phantom client out there that has the archive file open. The easiest way I've found to fix this is to stop and restart the file server (System Preferences > Sharing, turn file sharing off then on).

Now about that NAS drive. There are some I/O errors early in the log, which is bad; QRecall can't do much about a drive it can't read or write. Most of the errors, however, are data errors. The suspicious thing is that they all occur around the point where the archive has grown to about 4GB. A lot of volume formats and network storage devices have problems with files larger than 4GB. (The first release of Apple's own AirPort Extreme had this problem, so don't think this is just a third-party device issue.) First make sure the NAS volume is formatted as Mac OS Extended, then make sure the device doesn't have problems with files over 4GB. If you have a good archive on another volume that's larger than 5GB or so, verify it, copy it to the NAS, then verify it again there. If the copy or verify fails, the NAS probably can't handle files greater than 4GB.
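If you don't have a spare 5GB archive handy, you can synthesize an equivalent test. A minimal sketch (the paths are hypothetical, and it assumes roughly 5GB free on both volumes):

```python
import hashlib, os, shutil

SRC = "/tmp/bigfile.bin"             # hypothetical local scratch file
DST = "/Volumes/NAS/bigfile.bin"     # hypothetical NAS mount point

# Create a ~5 GB file of zeros (sparse where the local filesystem allows).
with open(SRC, "wb") as f:
    f.seek(5 * 1024**3 - 1)
    f.write(b"\0")

def md5(path):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Copy it to the NAS and compare checksums. An I/O error near the 4 GB
# mark, or a checksum mismatch, suggests the volume can't handle >4GB files.
shutil.copyfile(SRC, DST)
print("OK" if md5(SRC) == md5(DST) else "MISMATCH")
os.remove(SRC); os.remove(DST)
```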
Please send a diagnostic report (Help > Send Report...). This will help isolate the problem.

As a general rule, an "archive I/O error" is just that: a problem reading from or writing to the archive. It is not a result of which items you chose to capture; if there were problems reading items, the log would list each item that couldn't be captured and why. But since an archive error also logs the file it was trying to capture at the time, it can be confusing. The log will tell.

There are two typical causes of perpetual archive problems: a corrupted volume or a hardware problem. The volume structure can be verified using Disk Utility (select the volume, First Aid, Verify/Repair Disk). Since you're using a NAS drive, likely hardware problems would be a failing drive, an intermittent network connection, a flaky drive controller, or (rarely) a problem with RAM. Since you're not having problems with other archives, the components that only the problem archive uses would be the suspects.
On another note, I'm afraid I have a bad habit of double-clicking a folder listed in an archive to open it. Of course, this begins the Recall process which I try to stop before that process is finished. Do I have to locate and delete this partially recalled folder or is it automatically deleted when the process is stopped?
This is an unfortunate UI behavior that has tripped up a lot of people, and one I intend to fix in the next release. The items are recalled into your /tmp folder. Items in this folder are cleared each time you restart, or after they have not been opened or modified for three days. And no, QRecall won't capture them, because the /tmp folder is always excluded.
Mark Gerber wrote:But until then it sounds as if I will have to exclude any file that might have an open database in a folder I intend to backup (I work at home, my hours aren't regular, and I don't know if I can be disciplined enough to quit those programs that might fall into this category).
I would lean more towards ensuring that you get at least one good capture from time to time. One way of doing that would be to schedule a capture that occurs when you log out. Logging out would then guarantee a "clean" capture of all of your databases.
Is there someplace a list of those applications using CoreData?
I'm not aware of any.
Or am I perhaps being too paranoid about this problem?
Is that possible? 
Mark Gerber wrote:Some of the programs I use recommend their databases not be backed up while the program is running. Specifically, I'm thinking of DEVONthink Pro, PowerMail, and SOHO Organizer (which uses OpenBase).
That's correct.
... And, of course, I'd like to capture these files a few times during the day. It's my impression the potential for damage is due to the fact that these databases needed to be closed before copying, otherwise an incomplete file will be written.
Also correct. This was discussed some time back in the "QRecall and CoreData" thread.
For this purpose, does QRecall do anything different in capturing quanta so I wouldn't have to be concerned about quitting the program to ensure a complete, undamaged capture?
Not at the moment, but I have plans to address this (and similar problems) in an upcoming release. To specifically address the issue of databases, I'm planning a new filter option that will ignore a folder full of files if any of those files are currently open for modification (write access). The capture would always be "safe," in that it would only occur when all of the files are closed.
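In the meantime, you can approximate that check yourself with lsof. A rough sketch (this is not QRecall's mechanism, the parsing is simplified, and the folder path is hypothetical):

```python
import subprocess

def folder_has_open_writers(path):
    # lsof +D lists every open file under `path`; the -F a option emits
    # one 'a<mode>' line per file, where mode 'w' or 'u' means writable.
    out = subprocess.run(["lsof", "+D", path, "-F", "a"],
                         capture_output=True, text=True).stdout
    return any(line in ("aw", "au") for line in out.splitlines())

if folder_has_open_writers("/Users/me/Databases"):   # hypothetical path
    print("a file is open for writing -- capturing now would be unsafe")
```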
By the way, I just found the screencasts on your home page. They are well done and present the information very clearly. I look forward to seeing more, in particular, one that clarifies the rolling merge options.
I wanted to do one for rolling merges too, but it needs some wickedly complicated animation and my Final Cut Express skills weren't up to it.  If I get some time to extend the series, that will be the first one I attack.
Bruce Giles wrote:Note that this system is still running Tiger Server, not Leopard Server, if that makes a difference.
Whoops, that makes a huge difference. I inadvertently linked the QRTouchXAttrItems tool against the 10.5 SDK instead of the 10.4 SDK. Download QRTouchXAttrItems again and give it another try. This version should work on both 10.4 and 10.5.
Bernard LECLAIR wrote:Do you plan to support other languages (French, Spanish, German...)?
Bonjour, I would love to localize QRecall to other languages, but the resources and time required to translate it aren't currently available. I've put localization on the to-do list for version 1.3 (which is tentatively scheduled for the summer of 2009). I'll seriously look into it again then. If anyone else would like to see QRecall translated into another language please let me know and cast a vote for what language, or languages, you'd like to have.
Bruce Giles wrote:First of all, congratulations on the release of version 1.1! Today, I upgraded our XServe running Tiger Server to QRecall 1.1. Everything seems to have worked perfectly.
That's good news.
... After it completed, the archive window reported that the size of the captured layer was about the same as typical recapture runs under 1.0.1. But the number of items captured was over 7000, where it's typically no more than around 300. Is this because it picked up (captured) extended attributes that weren't captured in 1.0.1?
That's very likely. The rules that determine when an item is recaptured changed subtly between 1.0 and 1.1, and it's very likely that you have items with directory information that triggered 1.1 to recapture them. One significant change is that 1.1 will now recapture an item if its attribute modification date changes, even if none of its attributes actually changed. In these cases, QRecall recaptures the item and stores a new metadata record for it. Since the contents of the files were unlikely to have changed, no new file data is added to the archive, just a new metadata record. Note that QRecall 1.1 won't recapture an item just because it has extended attributes and the previously captured version doesn't; see the "Utility to recapture items with extended attributes" thread for details.
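In POSIX terms, the new rule keys off an item's attribute-change time rather than just its content-modification time. A simplified illustration (not QRecall's actual logic, which works from its own metadata records):

```python
import os

def recapture_kind(path, last_capture_time):
    """Classify what the 1.1 rule described above would recapture."""
    st = os.stat(path)
    if st.st_mtime > last_capture_time:
        return "data + metadata"   # contents changed: file data is recaptured
    if st.st_ctime > last_capture_time:
        return "metadata only"     # attributes touched: a new metadata record,
                                   # but no new file data enters the archive
    return None                    # unchanged: nothing is recaptured
```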
Does my upgraded archive now contain everything that it would have had if I had started a new archive instead of upgrading the old one?
I don't know how many files you have on that volume, but I'll guess it's more than 7,000. If so, there are lots of files that weren't recaptured: the latest layer is certainly busy, but it probably doesn't contain every single item on the volume.
Warren Michelsen wrote:I have a QR archive which QR says is bad.
First of all, please send a diagnostic report (in QRecall version 1.1 or later, choose Help > Send Report...). I'm always interested in damaged archives that don't get automatically repaired by the next action, or that are damaged without an obvious cause like a corrupted volume or a failing drive.
I selected the option to recover to a new archive. When recovering, does QR move data from the old to new archive or does it copy those data?
With the copy option unchecked, QRecall will repair the archive in situ. Any corrupted data will be erased and the recoverable data is reassembled into a usable archive. With the copy option checked, the recoverable data is transferred into a new archive and the original is untouched. There must be enough space on the new archive's volume to contain a copy of all of the recoverable data from the damaged archive.
The reason I ask is: There is only 71 GB of free space remaining on the QR archive volume but the archive itself is 398 GB. Clearly there is not enough room on the archive volume to recover much if data are copied instead of moved to the new archive.
You won't be able to repair using the copy option unless you find another volume with at least 398GB of free space, so try repairing the archive with the copy option off. The copy option is really for special circumstances (such as when the damaged archive is on a failing drive or a read-only volume), or for when you might want to repair the archive several times with different options.

If you choose to repair without copying, any damaged and unrecovered data is erased. The possible downside is if you choose not to recover orphaned or partial files: those are also erased during the repair, and once erased you don't have the option of running the repair again to get them back. But recovering orphaned and partial files is for extreme cases where you absolutely must recover every possible scrap of salvageable data. If you just want to get the archive back into shape so you can start capturing again, leave those options off.
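A quick pre-flight check for the copy option is to compare the destination volume's free space against the archive's size. For example (the destination path is hypothetical; the sizes are the ones from this thread):

```python
import os

archive_size = 398 * 1024**3      # the 398 GB archive described above
dest = "/Volumes/Spare"           # hypothetical destination volume

st = os.statvfs(dest)
free_bytes = st.f_bavail * st.f_frsize
print(f"{free_bytes / 1024**3:.0f} GB free:",
      "enough for a copy repair" if free_bytes >= archive_size
      else "repair in place instead")
```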
An update: The latest version of QRecall will force the Activity window into all spaces. This can now be turned off using the QRMonitorSpacesJoinAll expert setting. See the Advanced QRecall Settings thread for the details.
ubrgeek wrote:Seems to be working now. Odd
Since Delete Item is an interactive command, there will be a pause following the delete action while QRecall re-reads the archive and updates the window. So it could take several seconds before the item actually disappears from the display.
ubrgeek wrote:Where is this functionality?
Select one or more items in the archive browser window, then choose Archive > Delete Item... You must be running 1.1.0(33) beta or later.
Christian Roth wrote:Is there a way to optimize that in some way to read and write larger chunks? I fear not in that the access offsets will probably be random in nature, and caching the whole file in memory will not be a solution (though technically possible in my case since I have enough internal RAM to hold the complete file).
Until I update QRecall to run in 64-bit mode, caching the hash.index isn't an option (it's an address space issue, more than a physical RAM issue). I've looked at several techniques for speeding up hash.index file access over the years, as it's one of the biggest performance bottlenecks in the system. The problem is trying to second guess the OS, which is already doing its own optimization. Local disk systems and network volumes all implement their own caching and read-ahead optimization. Some work extremely well with QRecall while others drag it into the mud. Implementing my own caching and read-ahead optimization may speed up the worst cases, but would probably slow down the best ones.
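To get a feel for why scattered access is so much slower than streaming, here's a toy benchmark comparing random 4KB reads against one sequential pass over the same file (the path is hypothetical; point it at any large file):

```python
import os, random, time

PATH = "/path/to/large.file"      # hypothetical; any multi-GB file will do
size = os.path.getsize(PATH)

start = time.time()
with open(PATH, "rb") as f:
    for _ in range(10_000):                  # 10,000 scattered 4 KB reads
        f.seek(random.randrange(size - 4096))
        f.read(4096)
print(f"random reads:    {time.time() - start:.1f} s")

start = time.time()
with open(PATH, "rb") as f:                  # one sequential pass
    while f.read(1 << 20):
        pass
print(f"sequential pass: {time.time() - start:.1f} s")
```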
Do you know in advance what percentage of the file needs to be rewritten, so one could estimate if reading into memory, modifying, writing back as a whole may be faster than scattered individual file accesses?
That's a good question, and that's one technique I plan to revisit in the future. Speeding up the quanta and names indexes is high on my list of optimizations.
The archive probably got corrupted either because a user in the family shut down their Mac while a capture was in progress, or because another user in the family (now, that's me...) fiddled with the network settings of the NAS the archive lives on while a capture was in progress.
99% of the time, shutting down a system before it can complete a capture should not cause any problems. The next action should auto-repair the archive and continue normally. On the other hand, I can't predict what effect "fiddling" with the network settings will have.
I'll see if I can wait long enough for the hash.index update to finish or if it will be faster to fetch the archive from the networked volume to local disk, indexing there, then moving it back to the NAS.
I suspect that just letting the reindex run its course will be pretty close to the optimal speed. If you feel adventurous and have enough local disk space, you could copy the archive from the NAS to a local drive, reindex it, then copy just the repaired index files back into the original repository package. That works because the Reindex command does not alter the primary repository.data file, although you'll have to be careful that nothing tries to update the original archive while you're doing this. It might be faster, but I can't say for sure because it involves a lot of additional copying.
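For what it's worth, the workaround sketched above looks roughly like this (the archive path and the index-file naming pattern are assumptions for illustration; only repository.data is named in this thread):

```python
import glob, os, shutil

NAS   = "/Volumes/NAS/Backup.quanta"   # hypothetical archive package on the NAS
LOCAL = "/tmp/Backup.quanta"           # local scratch copy

shutil.copytree(NAS, LOCAL)            # 1. fetch the archive to a local disk
# 2. ... run QRecall's Reindex command on the LOCAL copy here ...
for index_file in glob.glob(os.path.join(LOCAL, "*.index")):
    shutil.copy2(index_file, NAS)      # 3. copy only the rebuilt index files
                                       #    back; repository.data is untouched
```

Again, make sure no scheduled action touches the original archive while the local copy is being reindexed.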