QRecall Community Forum

Compression vs. performance
Forum Index » General
Author Message
Christian Roth


Joined: Jul 12, 2008
Messages: 26
Offline
Hello,

I'm pondering the most suitable compression settings for my environment. The documentation says that higher compression degrades performance. My question is: does this refer only to computation on the client machine, or does increased compression also change the traffic to the storage device?

My environment is a Mac Pro with 2 x 2.8 GHz quad-core CPUs, which I'd say is a reasonably fast machine. However, my storage device is a low-performance NAS (roughly 15 MB/s throughput). I'm wondering whether I'd get the best performance by using heavy compression (since I have a fast Mac), thereby reducing the amount of data to be transferred over the slow link, or whether that doesn't actually matter much.

If compression requires frequent, smaller disk accesses (e.g. for nearby quanta or administrative data), the benefit of transferring less actual data is probably outweighed by that overhead.
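To make my reasoning concrete, here is the rough model I have in mind, sketched in Python. Only the 15 MB/s NAS figure is measured; the amount of data, the compression ratio, and the compression speed are pure guesses:

# Rough model: is it faster to compress before sending data to a slow NAS?
# Only the NAS throughput is measured; every other figure is a guess.
data_mb       = 10_000   # new data to capture, in MB (assumed)
nas_mbps      = 15       # measured NAS throughput, MB/s
reduction     = 0.6      # assumed size reduction from compression (60%)
compress_mbps = 40       # assumed compression speed on the Mac, MB/s

uncompressed = data_mb / nas_mbps
# Worst case: compress first, then transfer (no overlap of CPU and I/O).
serial = data_mb / compress_mbps + data_mb * (1 - reduction) / nas_mbps
# Best case: compression fully overlaps the transfer, so the slower step dominates.
pipelined = max(data_mb / compress_mbps, data_mb * (1 - reduction) / nas_mbps)

print(f"uncompressed:           {uncompressed / 60:5.1f} min")
print(f"compressed, serial:     {serial / 60:5.1f} min")
print(f"compressed, overlapped: {pipelined / 60:5.1f} min")

Whether compression wins seems to hinge entirely on how the compression speed compares to the link speed, and that's exactly the part I can't judge from the outside.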

Is there some rule of thumb to go by like

Fast Mac, slow storage -> high compression
Slow Mac, fast storage -> low compression

Though archive size is not unimportant, as I am approaching the 1 TB limit of the NAS, it is not the main factor for me - what I care about is speed.

Thanks, Christian
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Christian,

I can imagine that there might be a situation where access to the archive volume is so slow (maybe 802.11b?) that compression would improve performance, but the difference between processing power and I/O speed would have to be extreme.

In practice, adding compression invariably reduces performance. I did a few tests this morning just to confirm (since I haven't benchmarked compression in a while). I set up a capture from a brand-new eight-core Xeon Mac Pro networked to a MacMini (PPC) with an external Firewire drive. The transfer rate between the Mac Pro and the Mini tops out at about 10 MB/sec. Not terribly slow, but certainly not fast.

I captured a reasonable number (9,000) of fairly compressible files (mostly word processing documents and uncompressed TIFF images used for publication). The differences between using an uncompressed archive and a highly compressed one were:

Uncompressed archive: 1.7GB
Compressed archive: 687MB (about a 60% reduction in size)

Initial capture: 04:35 (uncompressed), 06:52 (compressed) - 50% slower
Recapture (100% duplicate): 03:32 (uncompressed), 02:10 (compressed) - about 60% faster

So in the case where all the data was being recaptured, a really fast computer was able to take advantage of the slightly smaller data to outperform the uncompressed archive. But in almost all other situations, the amount of work required to compress and uncompress the data far outweighs the slight advantage of a smaller data transfer. If you average these figures (assuming that most captures are a combination of new and duplicate data), the performance is about the same. And if you work with data that resists compression (video, audio, most images), then there will be no advantage at all.
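For what it's worth, converting those timings into effective throughput over the 1.7 GB source set is just arithmetic on the numbers above:

# Effective throughput for each run, based on the 1.7 GB of source files.
source_mb = 1700
runs = {
    "initial capture, uncompressed": 4 * 60 + 35,   # 04:35
    "initial capture, compressed":   6 * 60 + 52,   # 06:52
    "recapture, uncompressed":       3 * 60 + 32,   # 03:32
    "recapture, compressed":         2 * 60 + 10,   # 02:10
}
for name, seconds in runs.items():
    print(f"{name:30s} {source_mb / seconds:5.1f} MB/s of source data")

The compressed recapture rate even exceeds the raw 10 MB/sec link speed, presumably because duplicate data never has to be written across the link again.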

In conclusion, you're not likely to see any increase in performance by increasing your compression level, and you could see significant decreases. The bottom line is that compression saves space, not time.

- QRecall Development -
[Email]
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Christian,

Having said that about the performance of compression, you might consider changing your QRCaptureFreeSpaceSweep setting. This will cause the archive to grow more quickly when capturing, but does improve the performance of the capture. You'll want to schedule an occasional compact action to recover the free space now being ignored by the captures.

- QRecall Development -
[Email]
Christian Roth


Joined: Jul 12, 2008
Messages: 26
Offline
Thank you, James, for even running benchmarks to answer my question.

I think that with my application I fall into the "Recapture" case, since probably more than 98% of the data remains the same between (re-)captures. Initial capture times aren't an issue for me (that's just a one-time operation for any new machine added to the backup); it's recaptures that happen often. If I read your figures correctly, compression may yield some small performance gains there. The data that changes here is quite compressible (Java sources, Pages documents, and XML files of various kinds). When I combine this with the possible reduction in size of the archive, I am now willing to give that strategy a try.

If I can deduce some interesting figures from the logs before and after turning on (high) compression, I'll post a follow-up.

Thanks again,
Christian
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
OK, one more suggestion.

This is a bit radical, but you might consider it. I do something similar with my system here (for completely different reasons, but that's another story).

Set up two archives on your NAS: A compressed archive for long term backups, and an uncompressed one for short term backups.

Capture to the short term archive every day, or even multiple times a day. Set up a rolling merge to keep only the last 14 days' worth of layers.

Capture to the long term archive once a week.

If you're keeping archive data for a long period of time, the sheer bulk of your existing archive is probably the biggest drain on your performance. By keeping a short term archive that never grows beyond 100GB and a dozen layers (or so), you'll see significant improvements in incremental captures.
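To make the rolling-merge part concrete, here is a toy sketch of one way such a merge could thin out a layer list. The dates, the 14-day cutoff behaviour, and the collapse-into-one-layer rule are simplifications for illustration only; the real action is configured in QRecall, not written as code:

# Toy model of a rolling merge: layers newer than the cutoff are kept
# as-is, everything older is merged into a single base layer.
from datetime import date, timedelta

def rolling_merge(layer_dates, today, keep_days=14):
    """Keep layers from the last `keep_days` days; collapse older ones into one."""
    cutoff = today - timedelta(days=keep_days)
    recent = [d for d in layer_dates if d >= cutoff]
    old    = [d for d in layer_dates if d < cutoff]
    merged = [max(old)] if old else []    # all old layers become one base layer
    return merged + sorted(recent)

today  = date(2008, 7, 20)
layers = [today - timedelta(days=n) for n in range(30)]    # 30 daily layers
print(len(rolling_merge(layers, today)), "layers remain")  # one base + the last two weeks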

Just an idea...

- QRecall Development -
[Email]
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Christian Roth wrote:When I combine this with the possible reduction in size of the archive, I am now willing to give that strategy a try.
If you're going to go for it, I'll make a couple of points:

  • Turning on compression only compresses new data until you perform a compact.

  • An automatic compact won't recompress anything until the archive has at least 4% of empty space, but you can force that by starting a compact manually from the QRecall application.

  • Recompressing a terabyte of data over a slow network connection might take weeks. Fortunately, compacts are incremental: you can stop them and restart them later. I'd suggest setting up a compact action that runs every night and setting it to stop after running for 4 or 5 hours. Also set the QRCaptureFreeSpaceSweep option, at least temporarily. Eventually the compaction will work its way through your entire archive and it will be completely compressed. (A rough time estimate follows this list.)

  • In setting up the above, you might also consider leaving the capture level compression off and setting the compact compression high. Compression will only be performed during the compact action, improving the performance of new-data captures.

  • Turning off compression later will never uncompress any of the compressed data in the archive. So you can't undo this if it doesn't work out.
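For a rough sense of scale, here's a floor on that recompression pass using only the throughput figures mentioned in this thread; a real compact shuffles far more data than one straight read and rewrite, so treat the result as optimistic:

# Lower bound on recompressing ~1 TB of archive data over a ~15 MB/s NAS.
archive_mb = 1_000_000   # roughly 1 TB of existing archive data
nas_mbps   = 15          # Christian's measured NAS throughput
reduction  = 0.6         # size reduction seen in the earlier test

read_hours  = archive_mb / nas_mbps / 3600                    # read everything back
write_hours = archive_mb * (1 - reduction) / nas_mbps / 3600  # rewrite it compressed
total_hours = read_hours + write_hours

print(f"read + rewrite: at least {total_hours:.0f} hours of raw I/O")
print(f"at ~5 hours per night:   roughly {total_hours / 5:.0f} nights, minimum")

Which is exactly why a recurring, time-limited compact action is the practical way to do it.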


- QRecall Development -
[Email]
Christian Roth


Joined: Jul 12, 2008
Messages: 26
Offline
James Bucanek wrote:Having said that about the performance of compression, you might consider changing your QRCaptureFreeSpaceSweep setting. This will cause the archive to grow more quickly when capturing, but does improve the performance of the capture. You'll want to schedule an occasional compact action to recover the free space now being ignored by the captures.


I'm not convinced this will be beneficial with large (ca. 700 GB) archives on a slow medium. My reasoning: yes, the initial capture of items is faster, but the later compacting, where a lot of data has to be moved within the file, will require quite a bit of bandwidth (reading from and writing to the slow device). I once did a compact action on that large archive and it took (I think) about 30 hours. That means that for more than a day I can't do any backups to that archive. Being self-employed, I rarely have what others call a "weekend", which is when those actions usually run.

Anyway, I'll give the compression route a try now and see how it fares in my situation.

Thanks again,
Christian
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Christian Roth wrote:I once did a compact action on that large archive and it took (I think) about 30 hours.
See my note about stopping and restarting compact actions. Compacting should never get in the way of being employed.

- QRecall Development -
[Email]
Christian Roth


Joined: Jul 12, 2008
Messages: 26
Offline
James Bucanek wrote:Recompressing a terabyte of data over a slow network connection might take weeks. Fortunately, compacts are incremental: you can stop them and restart them later.


This I did not know - I thought the compact action was atomic, and that force-cancelling it would either leave the archive in an inconsistent state (requiring a re-index or repair) or accomplish nothing. I once tried compacting the 700 GB archive on the slow NAS, and it took about 35 hours before failing near the end (or so I thought), probably due to some network problem.

Being able to compact incrementally puts the whole thing in a different light, and I'm with you: capture fast (no compression) and compact (medium to heavy) incrementally.

So now I'm determined to give it a try.

Thank you very much for the detailed insight on the matter (and the warnings...). It certainly helps me make more informed decisions and is highly appreciated!

Kind regards,
Christian
James Bucanek


Joined: Feb 14, 2007
Messages: 1568
Offline
Christian Roth wrote:I thought the compact action was atomic, and that force-cancelling it would either leave the archive in an inconsistent state (requiring a re-index or repair) or accomplish nothing.
Stopping a compact early won't reduce the archive's overall size, but in the interim, space in the archive is consolidated by packing data blocks together. Think of sliding beads on a string; you can't cut the string and make it shorter until you move all the beads to one end, but you can move some beads, stop, and come back later to move more beads, until you're done.
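If it helps to see the idea in miniature, here's a toy sketch in Python. It only illustrates resumable packing; it is not how QRecall actually stores or moves data, and the block list and per-run budget are made up:

# Toy model: pack used blocks ("beads") toward the front, a little at a
# time, and only truncate ("cut the string") once everything is packed.
def compact(blocks, budget):
    """Move at most `budget` blocks per run. Returns True when finished."""
    write = 0
    moved = 0
    for read in range(len(blocks)):
        if blocks[read] is None:          # empty space, skip it
            continue
        if read != write:
            if moved == budget:
                return False              # out of time; resume next run
            blocks[write], blocks[read] = blocks[read], None
            moved += 1
        write += 1
    del blocks[write:]                    # all packed; now it's safe to truncate
    return True

archive = ["a", None, "b", None, None, "c", "d", None, "e"]
while not compact(archive, budget=2):     # several short runs instead of one long one
    print("partial:", archive)
print("done:   ", archive)

In the toy model, each short run leaves the blocks in a consistent (just not yet fully packed) state, and the space isn't reclaimed until the final truncation.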

- QRecall Development -
[Email]
Christian Roth


Joined: Jul 12, 2008
Messages: 26
Offline
James Bucanek wrote:Just an idea...


...and maybe not the worst one. I'll think about that!

NOTE: This thread went a little beyond ping-pong, i.e. I saw some of your additional posts only after I had answered an earlier one. That broke the logical continuity in places - sorry!

- Christian
Christian Roth


Joined: Jul 12, 2008
Messages: 26
Offline
James Bucanek wrote:Think of sliding beads on a string; you can't cut the string and make it shorter until you move all the beads to one end, but you can move some beads, stop, and come back later to move more beads, until you're done.


Great analogy! Please remember it and maybe add it to the user's guide.

And well, yes, all that info is already there in the user's guide. So that's another "RTFM!" for me...

- Christian