#16 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
Gary K. Griffey wrote:Possibly, in a future release...QRecall could provide the ability to override the use of the file system events via preferences...
I will add that to the wish list.
- QRecall Development -

#17 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
Greetings James, I hope you may be able to shed some light on an error situation that I ran into yesterday using the Beta 1.2 version.

I have been capturing my primary laptop's entire hard drive every Saturday, using QRecall installed on another laptop, for the last 4 weeks. Yesterday, I added the 4th layer. The capture appeared to run normally. The primary laptop is started in target disk mode...then mounted to the second laptop. This way the entire drive of the primary laptop is quiesced at the time of the capture. This is the same laptop image that I had discussed with you in a previous message in this same thread.

Yesterday, after adding the 4th layer...I ran a QRecall Verify...as I always do...and the verify failed...I tried to Repair the archive...but this also failed. In looking at the Repair log entries...it appears that the rather large (about 30 GB) virtual machine disk file caused the error.

I know from other threads that you have written...that a virtual machine must be suspended or shutdown to make a valid backup of it. In this case, however, the laptop that contained the virtual machine was operating in target disk mode...and was mounted to my second laptop where QRecall is installed...thus...not only was the virtual machine shutdown...but OSX was quiesced on the source drive as well.

I have attached the log file from the Capture, Verify and subsequent ReIndex/Repair operations. Now, I did run a Compact operation on the archive during the week with the 3 existing layers...and I also changed both the Compression and Shifted Quanta detection on the archive...but I never had issues doing this before to an existing archive. As always...your assistance would be greatly appreciated.
Filename: QRecall.log.1
Description: No description given
Filesize: 242 Kbytes
Downloaded: 1150 time(s)

#18 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
Gary K. Griffey wrote:Yesterday, after adding the 4th layer...I ran a QRecall Verify...as I always do...and the verify failed...
That's correct. The verify detected corrupted data in the archive and/or on your hard disk.
I tried to Repair the archive...but this also failed.
Actually, the repair was successful. QRecall did log warnings about the problems it found and which items were affected, but the repair itself finished successfully, as confirmed by the verify action you performed afterwards.

Note that many (many!) archive corruption errors are the side effect of a corrupted volume structure. I would encourage everyone to use Disk Utility to repair the volume containing the archive before repairing the archive itself. If your volume has cross-linked file allocations (for example), repairing the archive will just set it up for future failure.
In looking at the Repair log entries...it appears that the rather large (about 30 GB) virtual machine disk file caused the error.
Also correct. If you did not select the option to recover damaged files, then the damaged version of that file has been deleted from your archive.
I know from other threads that you have written...that a virtual machine must be suspended or shutdown to make a valid backup of it.
That's very true. There is no backup system that can correctly copy a file that is being actively modified.
In this case, however, the laptop that contained the virtual machine was operating in target disk mode...and was mounted to my second laptop where QRecall is installed...thus...not only was the virtual machine shutdown...but OSX was quiesced on the source drive as well.
The problem wasn't that the file was being modified, but that QRecall detected that data previously stored in the archive failed its validity check(s). This can happen for a score of different reasons (data corrupted during transfer to the drive, random data loss on the drive, intermittent RAM errors, ...), but it has nothing to do with the source file or what condition it was in.

I also applaud the rigor of your backup methodology, but I personally think it's a little overkill. While it's true that you can't make a "perfect" copy of your boot volume while OS X is running, QRecall works really hard to successfully perform live captures and recalls from/to your startup volume. It's certainly problematic, and you definitely want to quit as many applications as possible (certainly any VM and disk images that you might be writing to), but it's not absolutely necessary to shut down the entire OS to make a decent backup. QRecall has the ability to capture while you're logged out, and you can even schedule captures to run while you're logged out (or not run while you're logged in). Just food for thought.

I only mention this because I firmly believe that the backup strategy that works best is the one that gets used, and the one that gets used is usually the one that runs automatically, independent of the user. I'd be much happier with an imperfect backup that occurs every day than a perfect one that I get twice a week (if I remembered to do it).

You also might consider a two-tiered backup strategy: make regular (even hourly) captures of your everyday documents, excluding things like your movie library and virtual machine images, and then continue with your complete backup strategy on a weekly or bi-weekly basis.
Now, I did run a Compact operation on the archive during the week with the 3 existing layers...and I also changed both the Compression and Shifted Quanta detection on the archive...but I never had issues doing this before to an existing archive.
That shouldn't have had any bearing on the problem you described.
- QRecall Development -

#19 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
James, Thanks for your quick reply and informative comments. If this happens again...I will try a disk repair prior to the QRecall Repair operation...

One other somewhat odd occurrence...I observed a system log message being repeated literally thousands of times during the capture operation. The messages were similar to the following:

    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter /Users/Gary/Downloads/melsLaptop.quanta/filename.index
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter eof=6534416, contentPosition=48
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter envelope at position 48, length=61464
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter added 2822 names, document size now 47343
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter envelope at position 61512, length=61464
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter added 2504 names, document size now 96272
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter envelope at position 122976, length=61448
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter added 2489 names, document size now 145261
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter envelope at position 184424, length=61464
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter added 2564 names, document size now 193893
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter envelope at position 245888, length=61456
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter added 2257 names, document size now 244054
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter envelope at position 307344, length=61464
    8/22/10 1:34:25 PM mdworker[196] _RepositoryNamesImporter added 2471 names, document size now 293148

I know that mdworker is a process used by Spotlight...I am just not sure why this type of message would be generated almost continuously during the Capture...in any event...it probably has no bearing on my corrupted archive...I just thought it might be worth a mention. Thanks...
#20 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
Gary K. Griffey wrote:I observed a system log message being repeated literally thousands of times during the capture operation.
Welcome to beta testing. The beta version of QRecall spits out tons more console and log messages than the release version, mostly so I can diagnose problems reported by beta testers. In this case, it's the QRecall Spotlight plug-in, which is normally quite laconic.

But you make an interesting observation. (And see, it's one you wouldn't have made if I had left those messages out!) Normally, Spotlight shouldn't reindex the archive until the capture (or whatever) is finished. I'm surprised that it would repeatedly reindex the archive while a single capture was in progress, but it's hard to tell from the fragment that you've sent; all of that activity is a single reindex. You'd need to look for multiple occurrences of the message "mdworker[xxxx] _RepositoryNamesImporter /Users/Gary/Downloads/melsLaptop.quanta/filename.index" during the course of a single capture. If that is happening, then that's something I need to look into.
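If it helps, here is a quick, throwaway Python sketch that counts those "start of reindex" lines in a saved, plain-text copy of the system log. The log path is hypothetical and the pattern is just lifted from your excerpt; adjust both to taste.

    # Count how many times mdworker begins a reindex of the archive's
    # filename.index in a saved, plain-text copy of the system log.
    import re
    import sys

    log_path = sys.argv[1] if len(sys.argv) > 1 else "system.log"

    # A new reindex pass starts when mdworker names the index file itself.
    start = re.compile(r"mdworker\[\d+\]\s+_RepositoryNamesImporter\s+\S+/filename\.index\s*$")

    count = 0
    with open(log_path, "r", errors="replace") as log:
        for line in log:
            if start.search(line):
                count += 1

    print("reindex passes found:", count)
    # One pass per capture is expected; more than one during a single
    # capture is what I'd want to hear about.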
- QRecall Development -

#21 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
Ok...I will investigate the log message content further and see if this multiple indexing is actually taking place.

By the way...I ran Disk Utility on the target laptop's hard drive...the one that experienced the corrupted archive...and there were no issues with the drive at all. I also checked disk permissions...no issues. So...I guess at this point..there is really no known explanation for what occurred with this particular incremental Capture that evidently corrupted the entire historic archive.

I guess that is the issue that I see moving forward using QRecall for backups with my clients...it would appear that each and every incremental Capture basically puts the entire historic archive at risk of total loss...this is unlike many other imaging products that I have used in the past where incremental backups are always written to a new physical file...and thereby do not jeopardize the integrity of the historic backups currently in existence.

I realize, of course, that I could simply make a copy of an existing archive each and every time before a subsequent Capture is performed...and then use the copy for the incremental Capture attempt...by doing so...one would not risk the entire historic archive should the new capture experience issues. This seems like it almost defeats the general "theme" of QRecall though...because duplicate archives must be created and, at least temporarily, be maintained.

Don't get me wrong...I think your product has some great features...and I do appreciate your thoughts and guidance. Thanks, Gary
#22 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
Gary K. Griffey wrote:By the way...I ran Disk Utility on the target laptop's hard drive...the one that experienced the corrupted archive...and there were no issues with the drive at all. I also checked disk permissions...no issues. So...I guess at this point..there is really no known explanation for what occurred with this particular incremental Capture that evidently corrupted the entire historic archive.
It's often hard to isolate the cause of a single random data failure, or even a series of them, without more information. The basic problem is that "stuff happens." Data storage and transfer is not nearly as perfect and repeatable as most people assume. Data in magnetic media, despite all of the clever tricks employed by modern drives to avoid it, gets lost from time to time. Data flying through USB, FireWire, SATA and over WiFi doesn't always arrive the way it was sent. And consumer-grade dynamic RAM is susceptible to the occasional bit-flip and corruption by cosmic rays.

The "problem" with QRecall is that it is almost alone among backup solutions in that it attaches a 64-bit checksum to every record of data it creates, and verifies the integrity of that data every time it reads it. When QRecall reports damaged data, it inevitably creates the initial impression that QRecall is at fault, or is somehow failing to protect your data, when in fact it's most likely the media/controller/interface/RAM/CPU/OS that's damaging the data.

What is (or should be) frightening is that so many other so-called "reliable" backup solutions make no attempt whatsoever to protect against, detect, or report any data loss. Solutions like Time Machine simply copy files and hope for the best. They couldn't tell you if your files were successfully and accurately copied if you wanted them to.
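To make that concrete, here is a minimal sketch of the general technique of checksummed records, in Python. The record layout, checksum choice, and function names are purely illustrative (this is not QRecall's actual on-disk format): every record is written with a 64-bit checksum of its payload, and that checksum is recomputed and compared every time the record is read back.

    # Illustrative only: each record is stored as
    #   [8-byte payload length][8-byte checksum][payload]
    # and the checksum is verified on every read.
    import hashlib
    import struct

    HEADER = struct.Struct(">QQ")   # payload length, 64-bit checksum

    def checksum64(payload):
        # Any decent 64-bit digest will do for a sketch.
        return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "big")

    def write_record(fh, payload):
        fh.write(HEADER.pack(len(payload), checksum64(payload)))
        fh.write(payload)

    def read_record(fh):
        header = fh.read(HEADER.size)
        if len(header) < HEADER.size:
            raise EOFError("no more records")
        length, stored = HEADER.unpack(header)
        payload = fh.read(length)
        if len(payload) != length or checksum64(payload) != stored:
            # The bytes on disk no longer match what was written.
            raise ValueError("record failed its validity check")
        return payload

    if __name__ == "__main__":
        with open("records.bin", "wb") as out:
            write_record(out, b"hello")
            write_record(out, b"archive data")
        with open("records.bin", "rb") as inp:
            print(read_record(inp), read_record(inp))

That check is all the verify action is doing, for every record in the archive, which is why a stray bit-flip anywhere between the RAM and the platter gets noticed instead of silently creeping into your backups.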
I guess that is the issue that I see moving forward using QRecall for backups with my clients...it would appear that each and every incremental Capture basically puts the entire historic archive at risk of total loss...
Not true at all. QRecall is specifically designed to protect against partial loss of data in an archive. Damaging part of an archive in no way impacts the integrity, or recoverability, of the rest of the archive.
this is unlike many other imaging products that I have used in the past where incremental backups are always written to a new physical file...
QRecall doesn't write files, it writes data records. Each data record is small, self-contained, and independently verifiable. When the data in an archive is damaged, the repair process reads and verifies every record. It then reassembles the valid records into a usable archive again. It doesn't matter if a million records were written to a single file or a million individual files. (Writing to a single file is more efficient and ultimately safer, which is why QRecall does it that way.) Either each individual record is valid or it isn't. QRecall does not rely on the file system's directory structure to organize its information.
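Continuing the sketch above (same caveat: this is the general idea, not QRecall's repair code), a record-level repair pass is conceptually just a scan that keeps every record that still verifies and drops the ones that don't. It reuses read_record() and write_record() from the previous sketch.

    # Conceptual salvage pass: copy every record that still verifies
    # into a fresh file and count the ones that don't.
    def salvage(src_path, dst_path):
        kept, lost = 0, 0
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                try:
                    payload = read_record(src)
                except EOFError:
                    break                 # clean end of file
                except ValueError:
                    # A real repair would resynchronize to the next record
                    # boundary; this sketch just stops at the damage.
                    lost += 1
                    break
                write_record(dst, payload)
                kept += 1
        return kept, lost

The point is that each record either verifies or it doesn't, independently of every other record, so damage to one record doesn't poison the rest of the archive.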
and thereby do not jeopardize the integrity of the historic backups currently in existence.
The integrity of the archive's history is never in jeopardy. QRecall's layer records are specifically designed to resist corruption caused by partial data loss. QRecall employs a method of "positive deltas" that re-records critical directory structure information in every layer. So the loss of data in an older layer won't impact the structure or integrity of subsequent layers.
I realize, of course, that I could simply make a copy of an existing archive each and every time before a subsequent Capture is performed...
No need; QRecall's already doing that behind the scenes. Most actions begin by duplicating key index files within the archive (if you peek inside an archive package during a capture or merge, you'll often see temporary "_scribble" files appear). New data is appended to the primary data file. If anything goes wrong (power loss, OS crash, ...), the partially modified files are summarily discarded and the primary data file is truncated at the point before the action began. The result is an instant rewind to a valid archive state. You'll see "auto-repair" in the log when this happens.
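The "rewind" part of that can be sketched in the same illustrative Python (made-up function name, not QRecall's implementation): note where the data file ends before the action, append the new records, and truncate back to that point if anything goes wrong. It reuses write_record() from the earlier sketch.

    # Append-with-rollback: remember where the known-good data ends,
    # append the new records, and truncate back to that point on failure.
    import os

    def append_with_rollback(data_path, new_payloads):
        with open(data_path, "ab") as fh:
            fh.seek(0, os.SEEK_END)
            checkpoint = fh.tell()        # end of the known-good data
            try:
                for payload in new_payloads:
                    write_record(fh, payload)
                fh.flush()
                os.fsync(fh.fileno())     # make sure the appends hit the disk
            except Exception:
                # Flush any buffered partial write, then cut the file back
                # to the last valid state before re-raising.
                fh.flush()
                fh.truncate(checkpoint)
                raise

If the action dies partway through, the file simply snaps back to the state it was in before the action began, which is what shows up in the log as "auto-repair".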
Don't get me wrong...I think your product has some great features...and I do appreciate your thoughts and guidance.
I hope some of this technical information will help explain the extraordinary efforts that QRecall takes to preserve your data, and detect when that data has been lost or damaged.
- QRecall Development -

#23 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
James, Thanks for the info...possibly, you could assist me in better understanding the available Repair options...it seemed to me that after executing the Repair operation...previously verified layers in the archive were indeed compromised by this subsequent re-Capture operation...which is very much contrary to what you have stated...possibly (very likely) a user error on my part when running the Repair.

As you recall...I ran the Repair operation against the archive after the verify failed...I ran it using only the first default option..."Use auto-repair information"...you further stated...from examining the log files that I attached...that the Repair was successful. However, after the Repair operation completed...I opened the archive...and saw 3 of the 4 layers now showing "Damaged Files"...which turned out to be the large virtual machine package...(the most important reason for running the backup in the first place, by the way). Thus it appeared to me that the VM package in layer 2 and 1...taken 3 and 4 weeks ago respectively...had been compromised...and thus my assumption that previously verified backup layers had now been corrupted.

Should I have run the Repair with different/additional options selected? Would this have preserved my 1st and 2nd layers? Thanks... GKG
#24 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
(Yikes, this really should have been its own thread.)
Gary K. Griffey wrote:Thanks for the info...possibly, you could assist me in better understanding the available Repair options...it seemed to me that after executing the Repair operation...previously verified layers in the archive were indeed compromised by this subsequent re-Capture operation...which is very much contrary to what you have stated...possibly (very likely) a user error on my part when running the Repair.
None of the above; the previously verified data simply stopped verifying. A capture action only adds to an archive; it doesn't rewrite portions of the archive that have already been captured. If a previously captured and verified data record later fails its verification, then it's because some external cause (magnetic media failure, a glitch in the drive controller, OS bug, whatever) either lost, changed, or overwrote that data. The last thing to happen to an archive isn't necessarily the cause. The various repair options (which I'll explain in a moment) aren't to blame, nor can they pull valid data out of thin air.
As you recall...I ran the Repair operation against the archive after the verify failed...
That's absolutely the correct thing to do.
I ran it using only the first default option..."Use auto-repair information"...you further stated...from examining the log files that I attached...that the Repair was successful. However, after the Repair operation completed...I opened the archive...and saw 3 of the 4 layers now showing "Damaged Files"...which turned out to be the large virtual machine package...(the most important reason for running the backup in the first place, by the way).
This is also correct. The archive is now "repaired" in that it is valid and internally consistent. However, data was damaged. Lost data can't be magically reinvented. QRecall examined the records and concluded that the data blocks belonging to two versions of your VM file were lost. So the two file records that contained the damaged data blocks were expunged from the archive, and the folders that contained those files were marked as "-Damaged-", indicating where the data loss occurred.
Thus it appeared to me that the VM package in layer 2 and 1...taken 3 and 4 weeks ago respectively...had been compromised...and thus my assumption that previously verified backup layers had now been corrupted.
That's correct. The data you captured a few weeks ago became damaged at some point. The verify alerted you to the issue, and the repair recovered the files and folders that weren't damaged. Looking at the log, only a single data record was corrupted in your archive. Most likely, this impacted a single data block belonging to different versions of the file in the two earliest layers. That data didn't belong to later versions of the file, so the subsequent layers were unscathed.
Should I have run the Repair with different/additional options selected?
Not really, unless you're desperate. The repair options are:
Copy recovered content to new archive: Use this only if you do not want to touch the damaged archive in any way. It's useful for testing repair settings (since it doesn't change the original archive) or repairing an archive on read-only media.
Recover lost files: If directory records in the archive are destroyed, it may leave "orphaned" file records. That is, file records that aren't contained in any folder. This option scrapes the archive and assembles all orphaned files together in a group of special "recovered" folders. Note that this may resurrect files previously deleted by a merge action.
Recover incomplete files: If a data block belonging to a file is lost, the file is deleted. With this option turned on, the file is kept, marked as "damaged", and all of the remaining (valid) data blocks are assembled into a file. The file is still incomplete (the lost data is still lost), but the remaining data is recovered. The file probably isn't usable, but it might contain usable data. This last option is probably the one you're most interested in, but it still can't recover the entire file; it can only recover the portion of the file data that wasn't lost.
Would this have preserved my 1st and 2nd layers?
No. Lost data is still lost data; you can't get it back once it's gone. Using the last option, you can get back everything else in the file that wasn't damaged, but that's usually of limited value.
- QRecall Development -

#25 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
James...thanks so much for your time...I know this thread got ridiculously long...I just think the technology/methodology that you have built into your product is really superior...and my only goal is to continue using QRecall as a valuable tool in my backup "arsenal"...

So...just to summarize...(I promise)...previously verified layers in the archive were indeed corrupted...but not via the last Capture operation...through some other, as of yet, unrecognized "event"...that is my final "takeaway" from your comments...correct?

If so...this makes me feel completely confident in your product once again going forward...expect several $40 license fees from my clients in the near future...you deserve it...just for listening to me... Thanks again... GKG
#26 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
Gary K. Griffey wrote:So...just to summarize...(I promise)...previously verified layers in the archive were indeed corrupted...but not via the last Capture operation...through some other, as of yet, unrecognized "event"...that is my final "takeaway" from your comments...correct?
Correct. And I should have let you write the reply; it was so much more succinct.
If so...this makes me feel completely confident in your product once again going forward...expect several $40 license fees from my clients in the near future...
Always good news.
- QRecall Development -

#27 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
James, I am getting a QRecall application crash in beta versions 1.2v6 and now in the just released 1.2v8. It occurs when the Re-Index operation is performed. I have attached the crash log.

Here is what I see. If you select the File ==> ReIndex option...the Open File dialogue is displayed...you then select an archive...click on Open...and the form disappears completely...and nothing appears...if you then walk through the same steps again...this time, the archive browser does appear and the Re-index operation continues...but you then receive the crash when you close the archive browser after the re-index completes. The re-index does seem to work...just the UI appears to crash afterwards. Thanks... GKG
Filename: QRecall_2010-08-28-025852_Garys-Mac-Pro.crash
Description: No description given
Filesize: 33 Kbytes
Downloaded: 960 time(s)

#28 | James Bucanek | 1 decade ago
Joined: Feb 14, 2007 | Messages: 1572
Gary K. Griffey wrote:I am getting a QRecall application crash in beta versions 1.2v6 and now in the just released 1.2v8. It occurs when the Re-Index operation is performed.
Thanks, Gary. This is a known bug. It occurs with the Repair command too, and sometimes the first time you choose either of these commands nothing happens, but the second time it works. There's also a related bug that can cause the QRecall application to crash when closing an archive window.

These are caused by a change in OS X that happened around version 10.6.2, which changed the order of events that occur when opening and closing a window. While arguably an improvement, it trips QRecall up. I've been ignoring this issue because the QRecall user interface is being largely rewritten as I write this. The new code should replace the buggy code and (hopefully) won't have any of the same problems.
- QRecall Development -

#29 | Gary K. Griffey | 1 decade ago
Joined: Mar 21, 2009 | Messages: 156
Understood... Thanks... GKG
#30 | Chris Caouette | 1 decade ago
Joined: Aug 30, 2008 | Messages: 39
Not to sidetrack the discussion too much, but I'm starting anew with my archives now that Lion is out...and I'm not worried about older versions of files as my current setup is all I need. As a 'wish list' item, I'd like to have a simple sync icon akin to what Time Machine does in the top right, rather than an activity window. Thanks, Chris
Lots of Macs here!