QRecall

Mike M

I'm having some trouble understanding the behavior of QRecall. I suspect that the notion of "layers" is obfuscating things for me. Let me propose another concept, which I'll call "checkpoints."

I keep running into questions such as whether certain old versions, or an old deleted file, are still in the archive. I'd like some deleted files in certain directories to stick around a while, and others to go away quickly. I'm trying to understand how to answer these questions.

Let me propose the concept of checkpoint. If a checkpoint at time T exists in the archive, it means that the "file state" of the captured items at time T can be reconstructed.

Here is how that is different from a layer. Your documentation describes layers as containing (1) files, (2) deltas from previous layers, and/or (3) some indication that a file has been deleted.

Your layers also have numbers rather than time tags--at least that's how you always describe it in the documentation.

When layers are merged, you renumber them.

To think, then, about whether a file can be reconstructed at time T, first I have to wonder if layers have time tags. I have not gotten deeply into QRecall, so I'm not sure; however, I do know that your examples use layer numbers instead of time tags, so it's an extra step of thinking to go from the example to the idea of a time tag. Also, the numbers keep changing. It seems simpler to consider that a time tag *never* changes.

So let's say a layer is time tagged as T. Then I have to consider which of the three items mentioned above (1, 2, and 3) exist in this layer. I also have to consider what exists in earlier layers. Then I have to wrap my head around the algorithm used to reconstruct a file.

So what about the idea of checkpoints? Well, let's say there's a checkpoint at time T. I can then know, simply, that enough information is in the archive to recreate the state of all captured items at time T.

I don't have to concern myself with how that data is represented. It might be deltas, it might need to refer to prior layers, it might look at other markers in the layer... but I don't need to know any of that stuff.

Now let's consider the concept of "merging layers." This gives me a headache. I think it's actually a quite involved algorithm, and your examples have lots of parts in their diagrams.

What if, instead, we say that QRecall "deletes checkpoints"? No longer is there the concept of "merging." Instead, I can think about a "checkpoint deletion" as removing information from the archive ... and it's easy to think about that. The information lost, simply, is the state of the file system at the deleted checkpoints.

I don't have to care whether this is done via merging layers or any other algorithm. I also don't get a headache trying to think about the behavior of the system.

Now let's say that I ask myself, "hmm, does a certain deleted file exist in the archive?" The answer is, "it does if it existed at one or more checkpoints that are left in the archive." Now let's say I want to ensure that QRecall behaves in my desired manner, that is, it keeps enough useful old data, but also doesn't get too large. Well I just think about what checkpoints exist and what is the algorithm that determines the oldest possible checkpoint.
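To make that question concrete, here is a minimal sketch of the checkpoint model as described above. It assumes, purely for illustration, that each checkpoint is just a capture time mapped to the set of paths that existed at that moment; the names `checkpoints_containing` and `is_recoverable` are hypothetical, and this is a model of the mental picture, not of QRecall's archive format.

```python
from datetime import datetime

# Hypothetical checkpoint model: capture time -> set of paths that existed then.
# This illustrates the "checkpoint" idea only; it is not QRecall's data format.
Checkpoints = dict[datetime, frozenset[str]]


def checkpoints_containing(archive: Checkpoints, path: str) -> list[datetime]:
    """Return, oldest first, the times of every remaining checkpoint holding `path`."""
    return sorted(t for t, items in archive.items() if path in items)


def is_recoverable(archive: Checkpoints, path: str) -> bool:
    """A deleted file is still in the archive if any remaining checkpoint holds it."""
    return bool(checkpoints_containing(archive, path))


# Example: draft.txt is deleted between the first and second checkpoints.
archive: Checkpoints = {
    datetime(2024, 3, 1): frozenset({"notes.txt", "draft.txt"}),
    datetime(2024, 3, 2): frozenset({"notes.txt"}),
}
print(is_recoverable(archive, "draft.txt"))  # True: the 3/1 checkpoint still has it
del archive[datetime(2024, 3, 1)]            # "delete" the older checkpoint
print(is_recoverable(archive, "draft.txt"))  # False: no remaining checkpoint has it
```

In this model, answering "is my deleted file still there?" reduces to asking whether any surviving checkpoint contains it.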

One statement in the documentation about the rolling merge is, "all layers older than <something> are merged into a single layer." This is another statement that gives me a headache. I have a question at this point, if you could answer it. Can you put that in terms of checkpoints? Does this mean that any checkpoint older than the oldest time frame in the rolling merge is deleted?

One last comment about rolling merges. If I reframe the behavior in terms of checkpoints, here is how I envision it. Let's say I have a daily capture. This creates checkpoints at times T_1, T_2, T_3, and so on, where these times are 24 hours apart. Imagine that we have a timeline, and that when QRecall creates a checkpoint, it's like dropping a bread crumb onto the timeline. Let's get whimsical and imagine a path in the forest. Hansel and Gretel drop a breadcrumb every 24 hours.

Now what is a rolling merge? It's basically a bird that comes along and gobbles up some of the crumbs. Let's say I keep 28 daily layers. That means the bird ignores the most recent 28 checkpoints. But the bird looks at checkpoints older than that, and starts gobbling them up. If I have weekly layers, then the bird gobbles up 6 out of every 7 checkpoints, leaving some bread crumbs a week apart.

You could think of it like this. Six days of the week, the bird gobbles up the bread crumb that is **29** (28 + 1) days old. But one day of the week, he leaves it alone.
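The "bird" described above can be sketched in a few lines. This is a hypothetical model of the checkpoint view only, not QRecall's rolling-merge implementation: it assumes one checkpoint per day, numbered 0, 1, 2, and so on, leaves the 28 most recent alone, and among the older ones keeps only days divisible by 7 (an arbitrary choice of which crumb in each week survives).

```python
def surviving_checkpoints(days: list[int], keep_daily: int = 28, week: int = 7) -> list[int]:
    """Return the capture days whose checkpoint the hypothetical 'bird' leaves alone.

    days: one checkpoint per capture day, numbered 0, 1, 2, ...
    The `keep_daily` most recent checkpoints are never touched. Among the older
    ones, only days divisible by `week` are kept; with the default of 7, the bird
    eats six crumbs out of every seven, leaving crumbs one week apart.
    """
    ordered = sorted(days)
    cutoff = max(0, len(ordered) - keep_daily)
    older, recent = ordered[:cutoff], ordered[cutoff:]
    weekly = [d for d in older if d % week == 0]
    return weekly + recent


# Sixty daily checkpoints: day 0 (oldest) through day 59 (today).
print(surviving_checkpoints(list(range(60))))
```

With sixty days of checkpoints, the oldest 32 thin out to days 0, 7, 14, 21, and 28, while days 32 through 59 all survive untouched.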

It's a little different from the perspective of the internals of QRecall. My way of expressing it is "a checkpoint disappears," which I find to be clear. But you, as the programmer of QRecall, have to think about what that means in terms of removing and merging state in the archive. My primary point here is that QRecall's algorithm should be decoupled from the user's thought process.

Now let me stop here and ask, is this actually an accurate way of thinking about it?

Mike

James Bucanek

Mike,

I can appreciate that some of these concepts give you headaches. Filesystems are hard.

I'll address some of your specific questions, but for the most part you can think of layers as "checkpoints" or "snapshots" or whatever concept you find easiest to deal with.

In the case where you have a single capture action, so that you capture the exact same set of files each time, and nothing else, most of these concepts can be equated. Specifically:

Here is how that is different from a layer. Your documentation describes layers as containing (1) files, (2) deltas from previous layers, and/or (3) some indication that a file has been deleted.

Start by ignoring the implementation details. How QRecall represents files and folders in a layer is really immaterial. Conceptually, each layer contains a complete copy of every item captured. This is your "checkpoint."

(In reality, a layer doesn't contain anything but references to unique database records, which contain the data and metadata of those items; it's done that way because it allows QRecall to store all of this data in the minimal amount of space possible.)
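A toy model can show why "every layer conceptually holds a complete copy" and "layers only hold references to unique records" are compatible. The sketch below assumes, for illustration only, that records are identified by a SHA-256 digest of their contents; the `Archive` class and its methods are hypothetical and say nothing about how QRecall actually identifies or stores its records.

```python
import hashlib


class Archive:
    """Toy model: each layer maps paths to references (digests) of unique records.

    Hypothetical illustration only; not QRecall's actual storage format.
    """

    def __init__(self) -> None:
        self.records: dict[str, bytes] = {}     # digest -> data, each stored once
        self.layers: list[dict[str, str]] = []  # per layer: path -> digest

    def capture(self, files: dict[str, bytes]) -> int:
        """Add a layer from {path: contents} and return its (0-based) index."""
        layer: dict[str, str] = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.records.setdefault(digest, data)  # identical data is not stored again
            layer[path] = digest
        self.layers.append(layer)
        return len(self.layers) - 1

    def recall(self, layer_index: int, path: str) -> bytes:
        """Rebuild one item as it existed in the given layer."""
        return self.records[self.layers[layer_index][path]]


archive = Archive()
archive.capture({"notes.txt": b"hello", "todo.txt": b"buy milk"})
archive.capture({"notes.txt": b"hello", "todo.txt": b"buy eggs"})
print(len(archive.records))           # 3: the unchanged "hello" is stored only once
print(archive.recall(0, "todo.txt"))  # b'buy milk'
```

Each layer can still reproduce every item it captured, yet the unchanged file costs nothing extra the second time.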

Your layers also have numbers rather than time tags--at least that's how you always describe it in the documentation.

Layer numbers are simply convenient labels that make referring to them in the interface, on the command line, or in actions, simple and easy to understand. Every layer has a date, the date it was captured. This date appears in the layer pane of the archive browser.

It's also useful to know that each item has a capture date (which you can see in the inspector panel). This is the exact moment in time that specific item was captured.

So what about the idea of checkpoints? Well, let's say there's a checkpoint at time T. I can then know, simply, that enough information is in the archive to recreate the state of all captured items at time T.

That's a layer. If an item exists in a layer, then that item can be recalled by rewinding the archive back to that layer.

This is easily visualized using the browser timelines. Select a captured item in the browser, and QRecall will draw its timeline back through the layers where that item was captured, recaptured, or simply existed in. Conceptually, if a timeline intersects a layer, that item "exists" in that layer.

I don't have to concern myself with how that data is represented. It might be deltas, it might need to refer to prior layers, it might look at other markers in the layer... but I don't need to know any of that stuff.

Now you're getting the idea.

Now let's consider the concept of "merging layers." This gives me a headache. I think it's actually a quite involved algorithm, and your examples have lots of parts in their diagrams.

What if, instead, we say that QRecall "deletes checkpoints"? No longer is there the concept of "merging." Instead, I can think about a "checkpoint deletion" as removing information from the archive ... and it's easy to think about that. The information lost, simply, is the state of the file system at the deleted checkpoints.

I don't have to care whether this is done via merging layers or any other algorithm. I also don't get a headache trying to think about the behavior of the system.

Again, if you limit the example to a single, uniform capture action that captures the same set of files every time, then these concepts are equivalent. And if that makes your life easier, then use that.

Most of the rest of your description is accurate. Basically, if an item exists in a layer then you can recall that item at some future date. Rolling merges eliminate intermediate layers/checkpoints so that only the last captured version in any particular timespan is retained. Whether you imagine layers being merged or checkpoints being deleted, the results are the same.

Now let me stop here and ask, is this actually an accurate way of thinking about it?

It is, as long as your layers remain simple. But layers can get complicated.

Consider capturing your whole startup volume at 3:00, then repeatedly capturing just your home folder every hour during the day. At the end of the day you merge all of those layers together. What do you have?

You have an interesting mixture of items captured at 3:00 (your applications) and newer items captured as late as 23:00 (in your Documents folder). That's because merging is just that; you can't think of it as "deleting" all of the earlier layers, because there's data in the very first layer that's not superseded by subsequent layers.

Now take another example of two volumes, or even two separate computer systems. Volume "A" is captured to layer 1. Later, volume "B" gets captured to layer 2. You then merge those two layers. What information was deleted?

The answer is nothing. The new layer contains a complete copy of both volumes "A" and "B" because none of the items in those layer sets intersect.
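Both examples follow from treating a merge as an overlay rather than a deletion. The sketch below is a hypothetical simplification: layers are plain dictionaries of path to capture time, they are applied oldest first so a newer capture of the same path supersedes an older one, and deletion markers are ignored entirely. It illustrates the outcome described above, not QRecall's merge algorithm.

```python
def merge_layers(layers: list[dict[str, str]]) -> dict[str, str]:
    """Merge layers given oldest first into a single layer.

    A newer capture of a path replaces the older one, but anything captured
    only in an earlier layer is carried forward. Hypothetical sketch: it
    ignores deletion markers and is not QRecall's actual algorithm.
    """
    merged: dict[str, str] = {}
    for layer in layers:
        merged.update(layer)  # newer versions supersede older ones
    return merged


# Example 1: whole startup volume at 3:00, then the home folder every hour.
startup = {"/Applications/Mail.app": "03:00", "/Users/mike/Documents/report.txt": "03:00"}
hourly = [{"/Users/mike/Documents/report.txt": f"{h:02d}:00"} for h in range(4, 24)]
print(merge_layers([startup] + hourly))  # Mail.app from 03:00, report.txt from 23:00

# Example 2: two volumes whose items never intersect; the merge loses nothing.
print(merge_layers([{"A/file1": "layer 1"}, {"B/file2": "layer 2"}]))
```

In the first case the merged layer mixes capture times; in the second it is simply the union of both volumes.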

Sorry if that makes your head hurt.

Mike M

I see where the checkpoint model, or rather my idea that merging layers can be reframed as "deleting checkpoints," gets more complicated. Very interesting, and I think it was a useful thought exercise for me to write that out. Now I am in a better position to take in the information you gave at the very end. I will look at that tonight or tomorrow.

Also, sometimes when I'm dealing with a complicated concept, I need to make a diagram myself. If I find the way of arranging the diagram that makes it most appealing and clear to me, then I have learned something.

I am a math tutor and that's a trick I use sometimes: ask the student to find their own way of writing out and expressing the concept they just learned. Or ask them to pretend they are teaching it to me.

Mike

James Bucanek

Mike M wrote: Or ask them to pretend they are teaching it to me.

There's nothing that focuses one's learning like having to teach it.

Let me know if you need to teach me anything else.