|
Charles, You've stumbled into the crazy world of human dates and times.

An interval schedule (like every 3 days or every 90 minutes) is based on fixed time intervals from a particular point in time. You selected 28-Jul-2008 7:10 as your anchor time, so QRecall calculates 72-hour intervals from that moment. Unfortunately, 7:10 in July is an hour different from 7:10 in January, because daylight saving time shifts the clock between summer and winter. QRecall is calculating an exact number of 72-hour intervals from 28-Jul-2008 7:10, and that lands at 6:10 this month.

Solutions: You could pick a starting time in the winter (say 1-Jan-2010). That would start the verify at 7:10 in the winter and 8:10 in the summer. Better, I'd suggest using a Daily schedule and choosing two days a week to perform the verify. The Daily schedule always uses the localized time for "today", so the actual time shifts automatically according to your local time zone. For example, if you moved from France to Mexico, your verify would still run at 7:10 Mexico time (CST), whereas your current interval schedule would start running your verify at 1:10 at night (but still 7:10 in France in the summer).

(I wish I could go back in time, find the guy who invented daylight saving time, and imprison him on a desert island where he can't do any harm.)
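If you want to see the arithmetic for yourself, here's a quick Terminal illustration using the BSD date command that ships with OS X (the interval count below is arbitrary; any number of 72-hour steps that lands in the winter will do):

# Convert the anchor time (parsed in your local time zone) to epoch seconds
anchor=$(date -j -f "%d-%b-%Y %H:%M" "28-Jul-2008 07:10" +%s)
# Step forward exactly 53 intervals of 72 hours (159 days, landing in early January)
date -r $((anchor + 53 * 72 * 3600))

In any time zone that observes daylight saving time, the second command prints 6:10, not 7:10, because the wall clock shifted an hour between the summer anchor and the winter result.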
|
 |
|
Bruce Giles wrote:If I boot from the external drive, running Tiger Server, will there be any problems with using that system to restore Leopard Server onto the internal drive?
I suspect that this won't work. While the file system differences between Tiger (10.4) and Leopard (10.5) are not as huge as those between Leopard and Snow Leopard (10.6), Leopard introduced access control lists (ACLs) that I believe are critical for a few key OS files. Tiger lacks the APIs for restoring ACLs, so QRecall will ignore ACLs captured by a Leopard system. My suggestion would be to install a minimal Leopard system—it can be a 10.5 client, it doesn't have to be a server—on an external drive, copy QRecall onto it, boot from that drive, and use it to restore the 10.5 server. If you don't have a spare external drive, install Leopard on the Xserve, install QRecall, perform a "live" restore, and then immediately reboot.
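Incidentally, if you're curious which items carry ACLs, the -e flag of ls lists them:

ls -led /System /Applications

On a Leopard system you should see entries like "group:everyone deny delete" on key system folders.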
|
 |
|
Ralph Strauch wrote:I'm continuing to have problems backing up to a network drive connected to an Apple Airport Extreme. I run scheduled partial backups on my wife's iMac at 1am and on my MacBook Pro at 3am, into the same archive. Both have Qrecall 1.1.4.5 installed. Up till 12/22 both seemed to be working fine. On 12/22 the iMac log shows a "Scheduler Version mismatch" and "Autoupdate system components," so that may have been when I updated the iMac to 1.1.4.5. From the log, it looks like I had updated the MBP on 12/19.
Ralph, This is indeed the message that you get when QRecall updates itself. The log files you uploaded confirm that this is the date you updated QRecall on that system.
On 12/23, the 1am iMac backup failed due to "disk full -- Capture ran out of disk space." The disk had more than 20gb free and the backup would have been less than 100mb. The 3am MBP backup later that night ran fine. Later that day I merged a bunch of layers to free up additional space in the archive, then turned the iMac and AE off and left for a vacation. When I came back on 12/31 I started things up again.
You've run into another known problem with OS X and the Airport base station. Basically, some Apple file servers don't handle file pre-allocation requests correctly. The bug exists in older versions of OS X, the Airport base station, and some NAS storage devices. Your "disk full" problems can probably be solved by setting the QRFilePreallocateDisable expert setting in the Terminal:
defaults write com.qrecall.client QRFilePreallocateDisable -boolean true

You'll probably want to do this on all of your systems, particularly those encountering the out-of-disk-space problem. This problem was recently mentioned in the Preallocation failed thread.

Most of what you describe after that makes sense. The file allocation bug caused QRecall to abort abnormally, leaving the archive in an invalid state. It may have also caused the volume allocation map to become corrupted—operating systems have notorious difficulty dealing with disk-full conditions. Repairing the volume and then repairing the archive probably cleaned everything up.
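One more tip: after setting it, you can confirm the value took by reading it back (defaults read will print 1 once the boolean is set):

defaults read com.qrecall.client QRFilePreallocateDisable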
After the repair I have two distinct entries in the "owners and volumes" list for my MBP hard drive. One contains only recent layers which I haven't merged yet and the other contains all the layers that have been merged at some time with another layer. I also had one damaged layer at the end, with no date and only damaged content. I then did a manual full backup of the MBP. Both that and the scheduled partial backup look fine. I haven't yet reattached the drive to the AE for a backup of the iMac.
Seeing two volumes in the Owners & Volumes drawer is odd. This usually happens when you resize or reformat a volume: even though it has the same name, QRecall identifies it as a different disk drive. I'd be interested to know whether QRecall continues to add layers to the "new" volume or the old one. The damaged volume/layers in the repaired archive can be ignored or deleted; in your case they are simply artifacts of the repair process.
|
 |
|
Richard, Sorry for the delay; I didn't notice that you'd posted a follow-up.

The mini: Given the current architecture of QRecall, the mini is going to have an awfully hard time capturing and maintaining an archive over 1TB. I don't know how much memory it has, but that's probably the limiting factor. When archives get large, capturing involves a lot of small drive/network reads, which incur a lot of latency and virtual memory swapping. As I mentioned earlier, I'm in the process of reengineering the database lookup code so that it's more efficient with large archives, but that's not a reality yet.

The iMac: As for the other problems you're having, I suspect I/O problems, possibly related to using a WHS as your archive volume. I really can't hazard a guess as to what those problems might be, but you shouldn't be getting the errors and problems that you're reporting. In all cases, I'd really appreciate it if you could send me diagnostic reports (Help > Send Report) from each QRecall installation. That would let me look at your hardware configuration and the details of the reported errors.

I'm sorry that your initial encounter with QRecall wasn't more pleasant. I can't personally test every possible combination of computer and network, so user feedback and diagnostic reports are immensely helpful in improving QRecall for everyone.
Incidentally, my iMac will not blank the screen while QRecall is running, which is annoying. It blanks as soon as the job completes.
See the QRRunningMakesSystemActive setting on the Advanced QRecall Settings page.
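If that setting follows the same pattern as the other expert settings (a boolean in the com.qrecall.client domain; check the settings page for the authoritative form), turning the behavior off would look something like:

defaults write com.qrecall.client QRRunningMakesSystemActive -boolean false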
|
 |
|
Richard, First, a confession: I'm fully aware that QRecall has performance problems when the size of the archive begins to exceed 1TB. The basic problem is that QRecall must compare every block of data being added to the archive with the data that's already stored in the archive. The various indexes and tables used to do this are very efficient when the archive is modest (500GB or less), but begin to degrade when the archive exceeds 1TB. Addressing this performance issue is one of the main goals of QRecall 1.2. So the fact that things are slow isn't surprising. (By the way, you never mentioned how big your archive is.) However ...
Richard Morris wrote:I continually get "A storage or disk error occurred......the archive is probably damaged" error messages, particularly from the iMac. The Mini seems much less prone to problems and after several attempts I succeeded in backing it up.
Slow is one thing, but you shouldn't be getting corrupted archives. Please send a diagnostic report (Help > Send Report) from both systems so I can take a look at the cause of these errors in more detail. There may be something else going on.
In desperation I have broken the backup job into 4 actions and the second one just failed.
My suggestion would be not to subdivide the capture (although there are good reasons to do that too), but to limit the amount of time the capture works by setting a stop condition. QRecall captures are always incremental, so I recommend setting up a single capture that starts late at night and automatically stops in the morning. You can do the same for other long-running actions, like a compact. The idea is that the capture might not finish, but it will save its work and finalize the archive. This is important because the auto-repair feature works by reverting changes back to the last completed action. By successfully adding data in smaller increments, any future failure won't have as much to recapture. The next capture always picks up where the last one left off.
Running a rebuild/reindex after a failure takes 24 hours so not something that is practical to do after every second backup attempt.
When you get an "archive is probably damaged" message, are you letting another capture or verify action run first before trying to repair the archive? Depending on what kinds of problems you're encountering, QRecall is able to auto-recover from most kinds of capture failures. It does this automatically at the beginning of the next action. The verify action/command is particularly useful in this respect: a verify will first auto-repair the archive before the verification begins. If a damaged archive can be auto-repaired, the verify will do it. Just watch the verify's progress. Once it starts verifying the contents of the archive, it has successfully auto-repaired the archive (assuming it needed it) and you can stop the verify. If the archive can't be auto-repaired, it will immediately report a problem.
Is any one else successfully backing up to multi TB archives, which after all, are not that uncommon these days?
[Raises hand] I keep a 1.6TB archive here for testing. I use it mostly to stress-test QRecall.
It seems very slow once the archive is large.
I freely admit that my 1.6TB archive is as slow as molasses on a cold day.
Do these speeds sound right for other users? The LAN can easily handle the top speed of a USB drive (30+MB/s).
My speeds are better than yours, but you're never going to get stellar performance from this arrangement. (That's not to say it couldn't be better.) A 30MB/s transfer rate is not going to be terribly fast for a large archive, for a number of (technical) reasons. Add in network and file server latency, and it's going to slow down even more. So given these circumstances, that sounds about right.
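To put a number on it: just reading 1TB at a sustained 30MB/s is 1,000,000MB / 30MB/s ≈ 33,000 seconds, or over nine hours, before any comparing, indexing, or latency enters the picture.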
Any suggestions would be welcome.
My suggestions would be to (1) create time-limited captures, (2) run a verify after a failure (or just let the next action run as scheduled) to see if the archive can be auto-repaired, (3) send in a diagnostic report, (4) schedule a verify to run once a week, and (5) try capturing to separate archives. The last one is really an act of desperation, but will probably avoid most of the problems you've encountered. You could, for example, set up four archives: one for each internal drive and one for each external drive. The real question is how much duplicate data you have between the four drives. If you have hundreds of GBs of data duplicated on the two Macs, then I can see how you'd like to have a single archive. However, if most of the data on each system is unique, then most of the duplicate data is going to be from one capture to the next, not between systems. In the latter case, multiple archives will be nearly as efficient, and much faster.
I suppose it is possible there is some interaction with the WHS going on but it is patched to date and has performed flawlessly for the last 18 months.
That's possible, but I need to see more evidence. I'm naturally skeptical of claims about "flawless" performance, because most drive and network errors go unnoticed.

For anyone who's interested: QRecall currently imposes a 2TB limit on the size of archives. This limit is imposed by the size of the data structures needed to manage an archive of that size. I plan to increase this limit greatly in the future, but that will require a 64-bit version of QRecall.
|
 |
|
The message "waiting for permission to open archive" usually means that some other process has the archive open. It could be another backup program, Spotlight, the QRecall application (an open archive window), or a QRecall action running on another computer. One way of finding out which process has an archive file open is to use the lsof command, like this:
sudo lsof /Path/to/folder/containing/archive

There are two bugs to look out for. The first is Apple's file server: it occasionally loses contact with a client and thinks the client still has a file open when it doesn't. The solution is to restart (stop/start) file sharing. The second: the previous version of QRecall had a bug where closing an archive window didn't completely close the archive file, and any actions that tried to start would wait. If you think this might be the case, just quit the QRecall application and open it again. In very (very) rare circumstances, the archive's lock file doesn't get cleared. If you've checked all other possible sources (using lsof) and the action is still stuck, post again or send a diagnostic report (Help > Send Report).
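To make the lsof step concrete (the path here is hypothetical; point it at your own archive):

sudo lsof /Volumes/Backups/MyArchive.quanta

The COMMAND and PID columns of the output tell you which process is holding the archive open.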
|
 |
|
Christian, Thanks for the reminder to document those settings. I've updated the advanced QRecall settings page; look for QRFilePreallocateDisable and QRFilePreallocateBugWorkaroundRule towards the bottom of the page. I'd suggest starting by setting QRFilePreallocateBugWorkaroundRule to 1. This causes QRecall to assume that all volumes are affected by the pre-allocation bug and to use a workaround. This allows QRecall to continue to preallocate space (important for maintaining the integrity of the archive should the volume fill up), but avoids the pre-allocation bug in some versions of the OS and some NAS drives. If that fails to solve the problem, you can just turn off pre-allocation by setting QRFilePreallocateDisable to true.
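For the record, the Terminal command for the first setting would be (I'm assuming an integer value here, to match the "1" above; the settings page has the authoritative form):

defaults write com.qrecall.client QRFilePreallocateBugWorkaroundRule -int 1

The fallback is the same QRFilePreallocateDisable command shown earlier in this thread.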
|
 |
|
Glenn Henshaw wrote:I run a weekly verify to ensure that the archive is stable. (I also flatten the archive weekly).
I approve.
This operation is now taking 9 hours for an archive that is about 126G (on a 150G disk partition mounted on an Airport Extreme). How long should this operation take?
The bulk of the verify action is consumed with reading every byte of the archive and checking its integrity. Most of the archive is read just once, but it is read from start to finish. Doing the math:

126,000 (126GB in megabytes) / 9 (hours) / 60 (minutes) / 60 (seconds) = 3.89 MB/sec transfer rate

That sounds like an 802.11n wireless network running at close to its maximum capacity. If you're verifying the volume via a wireless network connection, then that's just about right.

- A verify over a 100Mb ethernet LAN would take about 3-4 hours. If you have another computer on your network that's connected via ethernet, schedule it to perform the weekly verify. It doesn't matter what computer does the verify; that computer doesn't even need an identity key.

- An external drive connected via FireWire 800 should take about 30 minutes. If you need to perform a fast verify, disconnect the drive from the Airport base station, connect it to any computer with QRecall installed, verify the archive, then put it back.
|
 |
|
Nicholas Sloan wrote:And on the subject of filters, is there any way to determine the path of a folder in the filters list?
And if not, can I suggest a contextual menu "Show in Finder" or a tooltip giving the path? (Perhaps this is taken care of in 1.2?)
I'm improving the interface for filtered and captured items in 1.2. In the meantime, 1.1 already shows the path in a tooltip. I have found Leopard/Snow Leopard tooltips to be a little erratic: if you're not seeing a tooltip when you hover over the item, switch to another application (like the Finder) and then back to QRecall. That usually wakes them up.
|
 |
|
Nicholas Sloan wrote:Would this be a viable feature request?
Honestly, that would be a little tricky. I have some other filtering features planned for a future version of QRecall which might obviate the need for such a feature. I'll add it to the request list and revisit it after version 1.3.
|
 |
|
Richard, There's something wrong with your installation of QRecall. The error that you're getting is an internal error. It says that QRecall can't find one of its components inside the QRecall application bundle. That should never happen. Follow these steps:

(1) Uninstall QRecall by opening the QRecall application, holding down the Shift and Option keys, and choosing the QRecall > Quit and Uninstall command.
(2) Download a fresh copy of QRecall (http://www.qrecall.com/download/).
(3) Replace your existing QRecall with the new copy.
(4) Launch QRecall, choose QRecall > Preferences, go to the Authorization tab, and pre-authorize QRecall again.

Let me know if that solves the problem.
|
 |
|
Nicholas, It's a two step process:
I know how to add excluded folders to a capture action
(1) The first step is to do just that. Open your capture actions and add the files or folders you want excluded to the Filter section of the capture action.
In the QRecall Help (Delete Item) it tells you how to delete items from an archive
(2) Then do that; open the archive, select the item(s) you excluded in step 1 and choose the Archive > Delete Item... command. Step 2 removes all traces of that item from the archive, and step 1 prevents those items from being recaptured.
|
 |
|
Thanks, Ralph, for posting this issue to the forum. Just to clarify: the issue I'm intimately aware of is when one Mac OS X system is acting as a file server (file sharing enabled) using a local QRecall archive, and another is accessing that same archive via the network. There are cache synchronization problems—where changes made by one system aren't communicated to the other—if both computers are running an earlier version of 10.5, or one is running 10.5 and the other 10.6. If both are running 10.6, or the latest version of 10.5, everything seems to work correctly.

The workaround is to periodically unmount the networked volume. This forces the remote computer to flush its disk cache and re-read the remote volume data. If you encounter this problem, eject your network volume and restart the action. If the volume remounts and the action runs OK, then there was never anything wrong with your archive. (I should note that I have exactly that situation here, and have to restart the nightly capture on my 10.5 laptop about once a week.)

Ralph's situation is a little different in that he's using an Airport Extreme base station as the file server. The Airport runs a version of Apple's file sharing service, but Apple doesn't document which version. Given that there are compatibility issues between file sharing on 10.5 and 10.6, and between 10.6 and the latest Airport, I guess it's not surprising that there might be issues when one client is running 10.5 and the other 10.6. I'm going to set up a similar configuration here and see if I can duplicate the problem.

I haven't invested a lot of time or effort into addressing this problem because (a) I keep hoping that Apple will simply fix it and it will go away, and (b) it doesn't cause any data loss. It's just incredibly annoying.
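If you'd rather script the eject than do it in the Finder, diskutil can unmount a network volume from the Terminal (the volume name here is hypothetical; substitute your own):

diskutil unmount "/Volumes/Backup Archive"

Then remount the volume in the Finder (Go > Connect to Server) before restarting the action.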
|
 |
|
Richard, Please use QRecall to send a diagnostic report (Help > Send Report...) and I'll look into it.
|
 |
|
Nicholas Sloan wrote:I was assuming that it would be a bad thing to run QRecall captures and SuperDuper clonings concurrently, ...
Not at all; that sounds perfectly harmless. Any number of processes can read the same files/folders simultaneously without affecting anything. Problems can occur when something is writing to the files being copied/captured, but that's not the case here.
It would simplify things (and save a little energy) if I could throw all my backup tasks into the same time slot.
Having two programs copy the same source items simultaneously can, however, cause a phenomenon called "thrashing", where the disk heads spend a lot of time jumping around the volume trying to satisfy the requests of both processes. The result is that trying to do two things at once actually takes more time than doing them one at a time. But that's speculation; you'd have to observe your system to determine whether running both at the same time is faster or slower than running them sequentially.
|
 |
|