QRecall Community Forum

Unable to complete large backups
Forum Index » Problems and Bugs
Richard Morris


Joined: Dec 6, 2009
Messages: 6
Offline
I have two Macs that I am trying to back up to a single QRecall archive on a network server, and after six days of around-the-clock backing up I don't seem to be getting anywhere.
I have a Mac Mini with 1.1TB spread across its internal drive and a single external USB drive, and an iMac with 0.9TB between its internal drive and an external USB drive. Both Macs are running Snow Leopard.
I am backing up to a folder on a Windows Home Server (file duplication off) over a gigabit LAN. I've seen no sign of power problems, and I have recently copied 2-3TB of data to and from the server without any problems.

I continually get "A storage or disk error occurred ... the archive is probably damaged" error messages, particularly from the iMac. The Mini seems much less prone to problems, and after several attempts I succeeded in backing it up. The difficulty now is getting the iMac to add to the archive. In desperation I have broken the backup job into 4 actions, and the second one just failed. Running a rebuild/reindex after a failure takes 24 hours, so it's not something that is practical to do after every second backup attempt.

Is anyone else successfully backing up to multi-TB archives, which, after all, are not that uncommon these days?
It seems very slow once the archive is large. Processing the 430GB on the iMac and moving 500MB of changed data into the archive took 6 hours 10 minutes. Then the next backup of the external drive failed, so I have a potentially corrupted archive again. Do these speeds sound right for other users? The LAN can easily handle the top speed of a USB drive (30+MB/s).

Any suggestions would be welcome. I am running out of ideas, and the alternative is going back to another Windows server being backed up to the Windows Home Server, which is what I was trying to replace with QRecall. I suppose it is possible there is some interaction with the WHS going on, but it is patched up to date and has performed flawlessly for the last 18 months.

Thanks

James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Richard,

First, a confession. I'm fully aware that QRecall has performance problems when the size of the archive begins to exceed 1TB. The basic problem is that QRecall must compare every block of data being added to the archive with the data that's already stored in the archive. The various indexes and tables used to do this are very efficient when the archive is modest (500GB or less), but begin to degrade when the archive exceeds 1TB. Addressing this performance issue is one of the main goals of QRecall 1.2.
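
To give you the general shape of the problem, here's a grossly simplified sketch of block-level deduplication (not the actual QRecall code; the in-memory dictionary and types are just stand-ins for the archive's on-disk indexes). Every block added to the archive is hashed and looked up in an index of blocks already stored; once that index no longer fits comfortably in memory, each lookup turns into small, random, high-latency reads, and that's where very large archives bog down.

import CryptoKit
import Foundation

// Much-simplified sketch of content-based deduplication. The dictionary here
// stands in for the archive's on-disk index of stored blocks.
struct BlockIndex {
    private var offsets: [SHA256Digest: UInt64] = [:]   // block hash -> archive offset

    // Returns the block's offset in the archive and whether it was a duplicate.
    mutating func addOrFind(_ block: Data, nextOffset: UInt64) -> (offset: UInt64, isDuplicate: Bool) {
        let digest = SHA256.hash(data: block)
        if let existing = offsets[digest] {
            return (existing, true)          // already stored: just reference it
        }
        offsets[digest] = nextOffset         // new block: it will be appended
        return (nextOffset, false)
    }
}
// With ~1TB of unique data in, say, 32KB blocks, this index holds roughly 30
// million entries; once it can't be held in RAM, every lookup can cost a
// random disk or network read.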

So the fact that things are slow isn't surprising. By the way, you never mentioned how big your archive is.

However ...

Richard Morris wrote: I continually get "A storage or disk error occurred ... the archive is probably damaged" error messages, particularly from the iMac. The Mini seems much less prone to problems, and after several attempts I succeeded in backing it up.

Slow is one thing, but you shouldn't be getting corrupted archives. Please send a diagnostic report (Help > Send Report) from both systems so I can take a look at the cause of these errors in more detail. There may be something else going on.

In desperation I have broken the backup job into 4 actions and the second one just failed.

My suggestion would be not to subdivide the capture (although there are good reasons to do that too), but to limit the amount of time the capture runs by setting a stop condition. QRecall captures are always incremental, so I recommend setting up a single capture that starts late at night and automatically stops in the morning. You can do the same for other long-running actions, like a compact.

The idea is that the capture might not finish, but it will save its work and finalize the archive. This is important because the auto-repair feature works by reverting changes back to the last completed action. By successfully adding data in smaller increments, any future failure won't have as much to recapture. The next capture will always pick up where the last one left off.
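
As a rough sketch of that idea (the CaptureSession type and its methods below are hypothetical stand-ins, not QRecall's actual API): do as much work as the time window allows, committing completed work as you go, so the next scheduled capture simply resumes where this one stopped.

import Foundation

// Hypothetical sketch of a time-limited, checkpointed capture.
protocol CaptureSession {
    func nextUncapturedItem() -> URL?   // next item not yet in the archive
    func capture(_ item: URL)           // add the item's (deduplicated) data
    func commitCheckpoint()             // archive is consistent up to this point
    func finalize()                     // close the archive cleanly
}

func runTimeLimitedCapture(_ session: CaptureSession, stopAt deadline: Date) {
    while Date() < deadline, let item = session.nextUncapturedItem() {
        session.capture(item)
        session.commitCheckpoint()      // at worst, auto-repair reverts to here
    }
    session.finalize()                  // unfinished items are picked up next time
}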

Running a rebuild/reindex after a failure takes 24 hours, so it's not something that is practical to do after every second backup attempt.

When you get an "archive is probably damaged" message, are you letting another capture or verify action run before trying to repair the archive? Depending on what kinds of problems you're encountering, QRecall is able to auto-recover from most kinds of capture failures. It does this automatically at the beginning of the next action.

The verify action/command is particularly useful in this respect. A verify will auto-repair the archive before the verification begins; if a damaged archive can be auto-repaired, the verify will do it. Just watch the verify's progress: once it starts verifying the contents of the archive, it has successfully auto-repaired it (assuming it needed it) and you can stop the verify. If the archive can't be auto-repaired, it will immediately report a problem.

Is anyone else successfully backing up to multi-TB archives, which, after all, are not that uncommon these days?

[Raises hand] I keep a 1.6TB archive here for testing. I use it mostly to stress test QRecall.

It seems very slow once the archive is large.

I freely admit that my 1.6TB archive is as slow as molasses on a cold day.

Do these speeds sound right for other users? The LAN can easily handle the top speed of a USB drive (30+MB/s).

My speeds are better than yours, but you're never going to get stellar performance from this arrangement. (That's not to say it couldn't be better.) A 30MB/s transfer rate is not going to be terribly fast for a large archive, for a number of (technical) reasons. Add in network and file server latency and it's going to slow down even more. So given these circumstances, that sounds about right.
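
To put rough numbers on it (assuming the capture had to read and hash essentially all 430GB on the source to find the 500MB of changes):

// Back-of-the-envelope check on the 6:10-hour capture of 430GB:
let bytesScanned = 430.0e9                      // ~430 GB read and hashed (assumed)
let elapsed      = (6.0 * 60 + 10) * 60         // 6 h 10 min, in seconds
print(bytesScanned / elapsed / 1e6)             // ≈ 19 MB/s effective

If that assumption holds, the elapsed time is dominated by scanning and index lookups, not by moving the 500MB of changed data.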

Any suggestions would be welcome.

My suggestions would be (1) create time-limited captures, (2) run verify after a failure (or just let the next action run as scheduled) to see if the archive can be auto-repaired, (3) send in a diagnostic report, (4) schedule a verify to run once a week, and (5) try capturing to separate archives.

The last one is really one of desperation, but it will probably avoid most of the problems you've encountered. You could, for example, set up four archives: one for each internal drive and one for each external drive. The real question is how much duplicate data you have between the four drives. If you have hundreds of GB of data duplicated on the two Macs, then I can see how you'd like to have a single archive. However, if most of the data on each system is unique, then most of the duplicate data is going to be from one capture to the next, not between systems. In the latter case, multiple archives will be nearly as efficient and much faster.

I suppose it is possible there is some interaction with the WHS going on, but it is patched up to date and has performed flawlessly for the last 18 months.

That's possible, but I need to see more evidence. I'm naturally skeptical of claims about "flawless" performance because most drive and network errors go unnoticed.

For anyone who's interested, QRecall currently imposes a 2TB limit on the size of archives. This limit comes from the size of the data structures needed to manage an archive of that size. I plan to increase this limit greatly in the future, but it will require a 64-bit version of QRecall.
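
To illustrate how fixed-width data structures cap archive size (the 32-bit record number and 512-byte granularity below are made-up numbers for the example, not the actual layout):

// 32-bit record numbers with 512-byte granularity address at most 2 TiB.
let recordCount = UInt64(UInt32.max) + 1        // 2^32 addressable records
let granularity: UInt64 = 512                   // assumed bytes per record unit
print(recordCount * granularity)                // 2,199,023,255,552 bytes ≈ 2 TB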

- QRecall Development -
Richard Morris


Joined: Dec 6, 2009
Messages: 6
Offline
Hi James,

Thanks for the quick reply. The current archive is 1.36TB, and I probably have another 500GB to add to it, depending on the level of duplication. Given the 2TB limit, I suspect I am going to have to split the archive anyway until you go 64-bit.

Breaking the job up yesterday did answer one of your questions. I broke the iMac backup into 5 sections. The first two succeeded (only 500MB of the 450GB in capture 1 needed to be moved, and capture 2 was 60GB). Capture 3 failed with a "Storage or disk error". Capture 4 then started, auto-repaired, and continued for 4 hours before failing with another "Storage or disk error". Capture 5 then started and reported an invalid header length and that I need to recreate the index.

It is nice to know 1.6TB backups do work for you so I should be able to get it working.

I'll start working my way through your list of suggestions and post back to the forum how I go.
Richard Morris


Joined: Dec 6, 2009
Messages: 6
Offline
Here is an update on the saga of getting an archive of my two Macs. Sadly, no success yet.

To start with a clean slate I deleted all the previous archives and started again. I decided to split the archives between the machines to reduce their size.

The Mini is still going and has processed 485GB in 47 hours 25 minutes. It has over 1.2TB to back up, so at this rate it is going to be far too slow to be practical. Any thoughts on the bottleneck? iStat says CPU use is low and the LAN is running at about 10% capacity.

I created a new archive for the iMac and attempted to back up the 250GB on the internal drive. It seemed to do it okay, but got stuck at the end with the message "closing archive" and stayed that way for well over 3 hours. In the end I had to restart the Mac to clear it. I opened the archive and it was empty, with no error messages reported. The archive takes up 226GB on the server, and the server reported a file conflict with repository.data.

I had a second file server (still WHS, a clean patched install), so I decided to try it to eliminate problems with the old server. I started a 6-hour limited backup of the iMac internal drive and it completed okay after backing up 100GB, closing okay after 27 minutes. I then removed the 6-hour limit to finish the drive, and it ended with the old favourite, a storage error, and a suggestion to reindex. The reindex failed after reporting damaged records.

Incidentally, my iMac will not blank the screen while QRecall is running, which is annoying. It blanks as soon as the job completes.

I've just about had enough, unless anyone has any other ideas. The combination of slow backup speed and the inability to create a single working archive after over a week of trying is wearing thin. Obviously other users are happy with QRecall, and I hoped it would give me a Mac version of Windows Home Server. Something in my setup must be upsetting it.

James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Richard,

Sorry for the delay; I didn't notice that you'd posted a follow-up.

The mini: Given the current architecture of QRecall, the mini is going to have an awfully hard time capturing and maintaining an archive over 1TB. I don't know how much memory it has, but that's probably the limiting factor. When archives get large, capturing involves a lot of small drive/network reads, which incur a lot of latency and virtual memory swapping. As I mentioned earlier, I'm in the process of reengineering the database lookup code so that it's more efficient with large archives, but that's not a reality yet.

The iMac: As for the other problems you're having, I suspect I/O problems, possibly related to using a WHS as your archive volume. I really can't hazard a guess as to what those problems might be, but you shouldn't be getting the errors and failures that you're reporting.

In all cases, I'd really appreciate it if you could send me diagnostic reports (Help > Send Report) from each QRecall installation. That would let me look at your hardware configuration and the details of the reported errors.

I'm sorry that your initial encounter with QRecall wasn't more pleasant. I can't personally test every possible combination of computer and network, so user feedback and diagnostic reports are immensely helpful in improving QRecall for everyone.

Incidentally, my iMac will not blank the screen while QRecall is running, which is annoying. It blanks as soon as the job completes.

See the QRRunningMakesSystemActive setting on the Advanced QRecall Settings page.

- QRecall Development -
Manfred Ell


Joined: Feb 18, 2008
Messages: 5
Offline
Richard Morris wrote:
I've just about had enough, unless anyone has any other ideas. The combination of slow backup speed and the inability to create a single working archive after over a week of trying is wearing thin. Obviously other users are happy with QRecall, and I hoped it would give me a Mac version of Windows Home Server. Something in my setup must be upsetting it.

I also had to give up on QR after fighting with it for a long time, for the same reasons. I frequently had problems with the archives needing repair, and the speed was unbearable. I still think the idea is a great one, but somehow it's not ready for prime time.
Dr. D


Joined: Apr 29, 2013
Messages: 9
Offline
Hello,
I am consistently getting a corrupt archive when backing up to a fresh copy of a working archive, just around the 1TB size.
As I was wondering if there is any limit in QRecall (or possibly in accessing the archive on a networked volume), I found this thread, but unfortunately no immediate pointers.
Should I start a new archive and avoid going higher than 1TB?
This will eventually become impractical (the source has 1TB capacity), but for now it might work.
Thanks!
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Dr. D wrote: I am consistently getting a corrupt archive when backing up to a fresh copy of a working archive, just around the 1TB size.
As I was wondering if there is any limit in QRecall (or possibly in accessing the archive on a networked volume) ...

All filesystems have an upper file size limit. In most modern filesystems the upper limit is so large as to be essentially unlimited, but older filesystems and file servers can limit the size of your archives.

First, consider the filesystem (format) of the volume containing your archive. Ideally, it should be Mac OS Extended (also known as HFS+). This is the native filesystem for OS X and will handle archives of any size. If it isn't Mac OS Extended, consider reformatting it, if that's an option.

If the archive is stored on a remote volume, then the combination of the server and the server volume's filesystem determines the maximum file size. You'll need to investigate both. For example, some Windows-compatible filesystems are limited to 4GB files, older AFP (Apple Filing Protocol) servers are limited to 2TB, and some Linux-based servers are limited to 1TB. Some even older filesystems, like FAT16, are limited to 2GB (not TB) files, but you would have hit that limit a long time ago.

From your other post, I suspect you're running QRecall 1.2.x. New code in QRecall 2.0 determines the maximum file size for a volume and uses that to limit the size of the archive. Unfortunately, that code isn't in QRecall 1.2.
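
If you want to probe a volume yourself, one way (just a sketch, and not necessarily what QRecall 2.0 does internally) is to ask the filesystem how many bits it uses for file sizes via pathconf. The path below is only a placeholder for a location on the mounted volume that holds your archive:

import Darwin

// Ask the filesystem containing `path` how large a file it can hold.
// Returns nil if the filesystem doesn't report a limit.
func maxFileSize(onVolumeContaining path: String) -> UInt64? {
    let bits = pathconf(path, _PC_FILESIZEBITS)
    guard bits > 0 else { return nil }            // limit not reported
    if bits >= 64 { return UInt64.max }           // effectively unlimited
    return (UInt64(1) << UInt64(bits)) - 1        // e.g. 32 bits -> ~4GB, 41 -> ~2TB
}

if let limit = maxFileSize(onVolumeContaining: "/Volumes/Server/Backups") {
    print("largest file this volume reports it can hold: \(limit) bytes")
}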

If reformatting your volume isn't an option, or the limit is in your fileserver, consider splitting your archive. You might capture your System and Applications folders to one archive and your user folders to a second archive. Excluding items you don't need to capture (like log files) and turning on compression are other ways of reducing your archive's size.
Dr. D


Joined: Apr 29, 2013
Messages: 9
Offline
Thanks!
Both the volume being backed up and the target volume are journaled HFS+ (the QRecall archive is on a sparse image with the same format).
What happens is that the backup process aborts, the archive "unbundles" (it looks like a folder), and the filesystem of the disk volume holding the sparse image is corrupted.
The first time this happened I thought it was a bad disk, so I went and bought a new one, increased the room in the sparse image, copied the archive over (a slightly older one), and ran into the same issue again.
If this is QRecall 1.2.3 running into limitations, how can I know when I am hitting the wall before it happens? Is it the number of files in the archive, or the size of the archive, or something else I can check?
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
You're not hitting any QRecall limits.

I suspect that you're hitting a possible bug in the sparse disk image that corrupts the volume when you try to write a file larger than 1TB.

I suggest writing the archive to a regular volume (not a disk image). If that's not possible, try creating a regular (non-sparse) disk image and see if that makes any difference.