Ralph Strauch wrote: Are there advantages to doing occasional compactions or does QRecall work just as well with a fragmented archive, so long as there's plenty of free space on the disk?
In general, yes, QRecall works fine with a fragmented archive. It won't be as efficient as it could be, but if you have plenty of free disk space then it probably doesn't matter.
Compaction takes a long time, and there doesn't seem much point in doing it unless there are significant advantages.
The most significant benefit is that it keeps your archive compact. This makes verify and repair operations proportionally faster, and there's a slight improvement in performance overall when capturing and merging layers. Note that a compact action won't completely compact the archive unless there's at least 4% of unused space in the archive to recover. You can increase this threshold by changing the QRCompactFreeSpaceRatioMinimum advanced setting. So you can prevent the archive from doing a time-consuming compaction unless there's at least 10%, 15%, or even 25% of the archive space that's unused.
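As a rough illustration of that threshold, the decision can be sketched like this. It is only a sketch: the function name, its parameters, and the sample sizes are hypothetical and are not QRecall's code. Only the 4% default and the option of raising it come from the description above.

```python
def should_fully_compact(unused_bytes: int, archive_bytes: int,
                         min_free_ratio: float = 0.04) -> bool:
    """Return True when the unused share of the archive reaches the minimum.

    min_free_ratio stands in for the QRCompactFreeSpaceRatioMinimum
    advanced setting (0.04 is the 4% default mentioned in the post).
    """
    if archive_bytes <= 0:
        return False  # nothing to compact in an empty archive
    return unused_bytes / archive_bytes >= min_free_ratio


GB = 1000 ** 3
# 30 GB unused in a 1000 GB archive is 3%, below the 4% default.
below_default = should_fully_compact(30 * GB, 1000 * GB)
# 50 GB unused is 5%: enough for the default, but not for a 10% threshold.
above_default = should_fully_compact(50 * GB, 1000 * GB)
with_raised_threshold = should_fully_compact(50 * GB, 1000 * GB, 0.10)
```

Raising the ratio trades some wasted space for fewer long compactions.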
I don't think I've ever seen an entry for "free" in the archive inspector. All it seems to show is "undetermined."
Following a merge, QRecall doesn't actually know how much space in the archive is "free" until it performs a process known as garbage collection. This is one of the tasks of the compact action. After a compact, the "free" space value will have a known quantity (until the next merge). In earlier versions of QRecall, garbage collection was performed at the beginning of every capture. But this turned out to be awkward and too time-consuming, so now it's only done by the compact action unless you set the QRCaptureFreeSpaceSweep advanced setting to true.
Yet the status window always shows an amount of "unused space" that seems to get updated regularly.
The status window remembers the last known value until it's updated again.
Are "free" and "unused space" something different, or is the inspector just not picking up the value?
No, it's more like there's a mismatch in terminology. The inspector uses "free" and the status window uses "unused", but it's the same value.
|
|
|
RJ Muna wrote: If I change the name of a folder, will QRecall recognize that change or will it try to back up what it thinks is a new folder?
QRecall organizes items by path, so it will see it as a new folder and capture it.
The folder in question holds 300GB, so if a simple change of the folder name prompts a new 300GB layer in QRecall, it will fill up my backup volume, and the Action will break.
No new data will be added to your archive. QRecall performs block-level de-duplication. It can't be fooled by renaming, moving, copying, concatenating, appending, or splitting files. If a file contains a block of data that is identical to a block of data that's already been captured (in any other file, on any volume, by any owner), that block of data is not added to the archive again.
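A toy model may make the de-duplication idea more concrete. This is a minimal sketch under stated assumptions, not QRecall's implementation: the BlockStore class, the fixed 4096-byte block size, and the use of SHA-256 are all hypothetical choices for illustration, and fixed-size blocks are a simplification.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical; the post does not state QRecall's block size


class BlockStore:
    """Toy archive that stores each distinct block of data only once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes
        self.files = {}   # path -> list of block digests

    def capture(self, path: str, data: bytes) -> int:
        """Record the file at path and return the bytes newly added."""
        added = 0
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # only unseen blocks cost space
                self.blocks[digest] = block
                added += len(block)
            digests.append(digest)
        self.files[path] = digests
        return added


store = BlockStore()
# Four distinct 4096-byte blocks of sample data (16384 bytes in total).
payload = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))

first = store.capture("/Photos/shoot.dat", payload)            # all blocks are new
renamed = store.capture("/Photos 2012/shoot.dat", payload)     # same data, new path
appended = store.capture("/Photos 2012/shoot.dat", payload + b"x" * 100)
```

In this model the first capture adds 16384 bytes, the capture under the renamed folder adds 0, and the appended version adds only the 100 new bytes, because the path is just a label on a list of block digests.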
|
|
|
Gary, Thanks for the report. The reindex failed for a completely mundane reason: the archive data is damaged. So the good news is that it didn't fail because of any weird "no such file" error. The reindex detected an invalid record during its examination of the primary data file. Since this is on a networked volume, it's not unlikely that this was a transient data error. The simple test is to reindex again and see if the error occurs at the same position in the file. It could also mean that something has overwritten or corrupted that archive data, in which case a repair is the only way to work around it.
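The "reindex again and compare" test can be written down as a small decision rule. This is a hedged sketch: the function name and return labels are hypothetical, and the "inconclusive" branch is an assumption, since the post only covers the case where the error repeats at the same position.

```python
from typing import Optional


def interpret_second_reindex(first_error_offset: int,
                             second_error_offset: Optional[int]) -> str:
    """Compare where two reindex attempts failed in the primary data file.

    second_error_offset is None when the second reindex found no error.
    """
    if second_error_offset is None:
        return "transient"     # the error did not recur
    if second_error_offset == first_error_offset:
        return "damaged"       # same position both times: a repair is needed
    return "inconclusive"      # assumption: not covered by the post
```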
|
|
|
Gary K. Griffey wrote: but the re-index action fails also.
That's even more bizarre. I probably don't have a solution, but I'd still like a diagnostic report that includes the failed reindex.
|
|
|
Gary, I reviewed the diagnostic report, and it's the same symptom as before. When QRecall is ready to close the archive at the end of the capture, it makes a copy of a small index file and opens the copy; except, the copy can't be found or isn't there (the error -43 you see in the log). This could be an incompatibility with a rarely used file-duplication function which has caused QRecall to run into problems with other networked file systems (including Apple's own AFS). You could try deleting all of the .index files in the archive's package and then reindexing it. If the problem doesn't go away on its own, as it did last time, I'll take another look at a temporary solution. I think this problem will go away once QRecall has finished transitioning away from the now-deprecated filesystem APIs in OS X, a task that's planned to be completed before the release of Mountain Lion.
|
|
|
Gary, Thanks for the diagnostic report. I confirmed what I suspected was happening, and I think I have a fix for it. Please download and install QRecall 1.2.0a78. After a few days, please send another diagnostic report (regardless of whether the problem reappears or not). Note that a release candidate will also appear in the next few days, at which time 1.2.0a78 will want to update itself. Please postpone the update until you've let a78 run for a while and have sent the report. Thanks again, James
|
|
|
Gary K. Griffey wrote: I am just surprised that some of my other apps have not "bumped their head" on this issue...
You and me both.
I can assume then that only the Capture action attempts to read this Preference value?
Nope. Most of the QRecall processes access numerous user preference values; they read them in different ways, and at different points in their execution. Why this request for this specific value is causing the application to crash is very peculiar.
|
|
|
Gary K. Griffey wrote: I just can't believe that Apple changed that much "under the hood" in 10.8...
They only have to change one thing. I'm still not sure what the problem is. The capture crashes when it tries to read a user preference value, which is one of the simplest things an application does. I'll file a bug report when I have more info, but for now it remains a mystery.
|
|
|
Gary K. Griffey wrote: Just an update...since loading the last special version that you offered...the "Lost Connection" issue has not occurred.
I'm having the same "problem" here. My Xserve running 10.6.8 that was getting this occasionally is now completely silent. So either the problem has just decided to take a holiday, or the logging code I added has subtly altered the timing enough that it changes the outcome of the race condition. If it doesn't occur by tomorrow, go ahead and send a diagnostic report anyway. I can still review the message timing and see if there's a pattern. I'll probably release a new QRecall beta in a couple of days, without the special debugging code, and we'll see how that does.
|
|
|
Gary K. Griffey wrote: The error did occur again, however, this time, the verify task did not remain in "running" status.
Thanks, Gary. This definitely gets a little closer to the problem. I suspect, however, that the code to debug the problem is interfering with reproducing it. Here's a new version to try. Again, drop this in and send a diagnostic report after the problem happens again.
QRecall 1.2.0a76
Thank you for your patience.
|
|
|
Odd indeed. Try this: Open a terminal window and use the ls -la@ command to list the items in that directory and their extended attributes.
ls -la@ <path to folder>
If the .dmg file lists a com.apple.FinderInfo attribute, try to delete it:
xattr -d com.apple.FinderInfo <path to file>
Now try the capture again and see what happens.
|
|
|
Gary, Thanks for all of the useful info. It appears that the QRecall action process (in this case the verify) and the scheduler are struggling with a race condition. When an action is finished, it sends a "stopped" message to the scheduler. When the scheduler receives this message, it knows the action is finished and can schedule the next one to start.
What's happening on your mini is this: the action finishes, sends a "stopped" message, and terminates. But the scheduler doesn't get that message right away. In fact, it doesn't appear to get it for a couple of minutes, long after the process has terminated. What happens after that is a confused mess of messages and communication errors, some of which get processed and some of which don't, leaving the scheduler thinking the action is still running, which it clearly is not.
In an attempt to untangle what's going on, I've built a special alpha version of QRecall that logs a lot more information about the state of actions, the "stopped" message handling, and communication link errors. Please install it, wait for this to happen again, and send me another diagnostic report.
QRecall 1.2.0a73
For future reference, there's also another way around this problem. An action in the activity window that is being held by the scheduler for other actions to finish can be started anyway by right-clicking (or click and hold) on the action's stop/menu button and choosing the Ignore Hold and Run command. This causes the action to ignore the scheduler's suggestion to wait and starts execution immediately.
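The effect of a late "stopped" message can be modelled with a toy scheduler. This is a hypothetical illustration only; ToyScheduler, its methods, and the timings are invented for the sketch and do not reflect how QRecall's scheduler is written.

```python
import heapq


class ToyScheduler:
    """Holds new actions while it believes another action is still running."""

    def __init__(self):
        self.running = set()  # actions the scheduler believes are running
        self.inbox = []       # heap of (delivery_time, message, action)

    def start(self, action: str) -> None:
        self.running.add(action)

    def post(self, delivery_time: int, message: str, action: str) -> None:
        # The message becomes visible to the scheduler only at delivery_time.
        heapq.heappush(self.inbox, (delivery_time, message, action))

    def process_messages(self, now: int) -> None:
        while self.inbox and self.inbox[0][0] <= now:
            _, message, action = heapq.heappop(self.inbox)
            if message == "stopped":
                self.running.discard(action)

    def can_start_next(self) -> bool:
        return not self.running


scheduler = ToyScheduler()
scheduler.start("verify")
# The verify action exits at t=10, but its "stopped" message is delayed
# by two minutes and is delivered at t=130.
scheduler.post(130, "stopped", "verify")

scheduler.process_messages(now=11)
held_after_exit = not scheduler.can_start_next()   # True: still shown as running

scheduler.process_messages(now=130)
released_later = scheduler.can_start_next()        # True: message finally arrived
```

Between t=10 and t=130 the process is gone but the scheduler's view says "running", so any action waiting on the same archive is held.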
|
|
|
Gary K. Griffey wrote: First, the scheduled Action that is active when the "Lost Connection with Helper" occurs does indeed complete successfully. However, the Actions window continues to show it as "Running"...and subsequent scheduled Actions that use the same archive just say "Waiting" in the Monitor window...and they never run....
Good to know. That tells me that, not only did the monitor process lose its connection with the running action, but that something has happened to the scheduler too.
My fix is to reboot the mac.
That will certainly do it.
I do believe that I tried to just kill the Scheduler process once...and it did seem to change the Actions window from the "Running" status...but I was not sure if that would cause other issues...so...I am simply rebooting.
As a general rule, all QRecall processes will respond gracefully to a Quit request (not a Force Quit, just a regular Quit). When quit, the scheduler and monitor processes should automatically restart.
One other odd symptom is that you only see the "Lost Connection" verbiage in the log..whereas normally, I would see it in the Monitor window. I'm not sure if that helps you or not.
It might. The "lost connection" appears in the activity window when the monitor process loses its connection. Other processes (like the scheduler) don't have a UI, so they just log the problem.
I will run the sample.sh script the next time this occurs...and send a diagnostic report.
At this point that's what I'm most interested in seeing—the state of the scheduler after this happens.
|
|
|
James Bucanek wrote: Broken communications pipes are a known problem in Lion, but I've rarely seen it in earlier versions of OS X.
Hah, wouldn't you know that not an hour after I posted this my OS X Server (running 10.6.8) got a "Lost communications" error. But it was just that: a loss of communications between the two processes. The log says the capture completed successfully, and the actions window shows the action is no longer running. So not quite the same situation.
|
|
|
Gary K. Griffey wrote: The real problem here is that when this connection to the helper is lost...the scheduled Action that was executing remains in "Running" status in the Action window...and therefore, subsequent scheduled Actions never run because they are "waiting" for the archive.
That really doesn't make much sense to me. If the monitor process loses communications with the helper process it gets logged as a "Lost communications with Helper" error. This is often because the helper process crashed, but it can also mean that the communications pipe between the two processes is broken and the helper process is just fine. Broken communications pipes are a known problem in Lion, but I've rarely seen it in earlier versions of OS X. The "running" status in the actions window comes from the scheduler. If the scheduler thinks the process is still running, then either the scheduler is stuck or the action really is still running (which implies that the communications pipe between the scheduler and the action is still valid). So either the action has stopped and the scheduler is confused, or the action is still running and the monitor is confused. It seems unlikely that both of those would be true, which is why I'm confused.
Thus, the machine has to be manually checked every day to see if the condition has occurred and further actions are waiting.
When this happens, what do you do? Also, I'd very much like to get a sample of your QRecall processes and a diagnostic report. The next time this happens, please do the following (this assumes that you've upgraded to 1.2.0b69 or later):
1. Open the Terminal application.
2. Enter the command
/Applications/QRecall.app/Contents/Resources/sample.sh
3. Press Return.
4. Enter your administrator's password.
5. When the sample.sh script is finished, open the QRecall application.
6. Choose Help > Send Report...
|
|
|