A highly-available move opertaion for replicated trees [pdf] (opens in new tab)

(martin.kleppmann.com)

41 pointsa7b3fa4y ago13 comments

13 comments

Super cool. Martin Kleppmann, the lead author on this paper, is the author of Designing Data-Intensive Applications, which I haven't read but have heard pretty awesome things about.

What I have done: watched his other talks about CRDTs (conflict-free replicated data types). I'd recommend watching this video [1] for an overview of what CRDTs are, why he cares about them, and open problems (one of which he solves in this paper!).

A (potentially inaccurate) TLDR: local-first collaboration software presents is a compelling alternative to the current way collaboration is implemented in a client-server model. Specifically: if we build tooling that makes it easy to build collaborative experiences into your app without a server, then we'll potentially be able to achieve the goals of 'decentralization' (or, fuck the middleman or whatever you wanna call it) in some real sense. Note that I'm not talking about crypto coins here!

Very cool, very exciting!

[1] https://www.youtube.com/watch?v=Qytg0Ibet2E

narush4y ago

Ok, having read through the core of the algorithm, cool! Seems like the main key here is that move operations have timestamps associated with them, and this allows you to effectively linearize the operations (and ignore ones that result in invalid state).

The tradeoff: you need to store all the operations you've ever seen, as you need to be able to detect conflicts between old ones.

Question for the authors (if they are around): any ideas for how we might get to efficient pruning of old operations?

My first thought was that you could have some notion of epoch, and once you move beyond that epoch, you accept no new operations from it. But then you realize that this requires nodes to agree on moving between epochs, which seems like it might require some fancy consensus protocol / a known set of nodes / mad complexity.

I don't have any ideas. Do you all have some idea of how we might be able to prune this old log? Or even sketches of ideas?

a7b3faOP4y ago

The authors actually do briefly mention this concern in the section titled 3.7 Algorithm extensions (under Log truncation):

> However, in practice it is easy to truncate the log, because apply_op only examines the log prefix of operations whose timestamp is greater than that of the operation being applied. Thus, once it is known that all future operations will have a timestamp greater than t, then operations with timestamp t or less can be discarded from the log.

The main issue seems to be that we need to know about all the replicas that we want to be able to synchronize with - but I guess there isn't really a way around this.

1 more reply

cdiamand4y ago

Reading through the book right now. It's really illuminating. Highly recommend to anyone trying to plug any of the gaps in their knowledge of system design.

sydthrowaway4y ago

This book seems the bible of distributed systems, but I'm still looking for a MOOC that covers Distributed Systems.

martinkl4y ago

Not quite a MOOC, but if you like video, I published my lecture series here (same as what I teach to Cambridge undergraduates): https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H...

petrosagg4y ago

I must be missing something. The paper describes the algorithm of the CRDT and mentions that "timestamps ’t need to be globally unique and totally ordered".

Then it mentions multiple times that Lamport clocks/timestamp can be used as the timestamp in their system but as far as I know these only give a partial order of events. How is this reconciled in their system?

a7b3faOP4y ago

My understanding[1] is that you would not use only a Lamport timestamp but rather a tuple (Lamport, actor ID), where the actor ID is a globally unique per-node ID (say, a UUID), which would then be used as a tiebreaker in cases where comparing by the Lamport timestamp alone would give an ambiguous ordering.

This should not be problematic, since the Lamport timestamps are only equal in cases where there is guaranteed to be no causal relationship between the events (i.e. the user did not see the effects of one event before submitting the next event), so it's fine to pick the ordering arbitrarily.

[1] Based on reading the Rust implementation (https://docs.rs/crdt_tree/latest/src/crdt_tree/clock.rs.html...), since I had the same question :)

dmulligan4y ago

That's right!

aseipp4y ago

Every partially ordered set S can easily be "upgraded" to a totally ordered set by simply "extending" every element of of S with an element of another set A, where A has a total ordering. The ordering on elements of A is used to break any ties when ordering the elements of S.

So for a Lamport timestamp, you go from the 1-tuple (timestamp) to the 2-tuple (timestamp, id) where 'id' is some arbitrary ordered key, unique for each process in the system. For instance every process in the system could just generate a large GUID, or a really big integer, or anything else. Then for any event e1 and e2, if the timestamps are equal, just tiebreak based on the ordering of IDs.

(This ability to "upgrade" a Lamport timestamp to have a total ordering is actually covered in Lamport's original paper at some point IIRC, but I don't have it open.)

MauranKilom4y ago

(Someone please fix the typo in the submitted title... @dang?)

j / k navigate · click thread line to collapse

13 comments

narush4y ago

Super cool. Martin Kleppmann, the lead author on this paper, is the author of Designing Data-Intensive Applications, which I haven't read but have heard pretty awesome things about.

Very cool, very exciting!

[1] https://www.youtube.com/watch?v=Qytg0Ibet2E

narush4y ago

The tradeoff: you need to store all the operations you've ever seen, as you need to be able to detect conflicts between old ones.

Question for the authors (if they are around): any ideas for how we might get to efficient pruning of old operations?

I don't have any ideas. Do you all have some idea of how we might be able to prune this old log? Or even sketches of ideas?

a7b3faOP4y ago

The authors actually do briefly mention this concern in the section titled 3.7 Algorithm extensions (under Log truncation):

The main issue seems to be that we need to know about all the replicas that we want to be able to synchronize with - but I guess there isn't really a way around this.

1 more reply

cdiamand4y ago

Reading through the book right now. It's really illuminating. Highly recommend to anyone trying to plug any of the gaps in their knowledge of system design.

sydthrowaway4y ago

This book seems the bible of distributed systems, but I'm still looking for a MOOC that covers Distributed Systems.

martinkl4y ago

Not quite a MOOC, but if you like video, I published my lecture series here (same as what I teach to Cambridge undergraduates): https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H...

petrosagg4y ago

I must be missing something. The paper describes the algorithm of the CRDT and mentions that "timestamps ’t need to be globally unique and totally ordered".

a7b3faOP4y ago

[1] Based on reading the Rust implementation (https://docs.rs/crdt_tree/latest/src/crdt_tree/clock.rs.html...), since I had the same question :)

dmulligan4y ago

That's right!

aseipp4y ago

(This ability to "upgrade" a Lamport timestamp to have a total ordering is actually covered in Lamport's original paper at some point IIRC, but I don't have it open.)

MauranKilom4y ago

(Someone please fix the typo in the submitted title... @dang?)

j / k navigate · click thread line to collapse