I still stand by what I wrote in that blog post. Using lists rather than trees is still a good approach. And it’s super simple to implement too. You can have a working CRDT in a few dozen lines of code.
I’m happy to answer questions if anyone is curious about this stuff. I’ve been working to implement and optimise CRDTs for years at this point. Increasingly I’m seeing it as a solved problem.
In particular, I've invented a new simple text format for this which I call Recursive teXt (RX) [1]. The idea is to just develop a CRDT for RX. RX is naturally structured as a tree, and it seems to make sense to model a document as an A of blocks, a block as an A of lines and blocks, and a line as an A of characters. Here "A of" stands for some sort of CRDT array based on inserting via predecessor (and successor?). Each A-object (document, block, line) is referenced by its own id and stored in a purely functional tree (similar to how Redux would do it [3], and I think Automerge does it similarly).
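To make the shape of that concrete, here is a minimal sketch of the normalized store described above (all names and types are mine, purely illustrative): each document/block/line is its own array object in a flat id-to-object map, Redux-style, with plain Python lists standing in for whatever "A of" CRDT array ends up being used.

```python
# Illustrative sketch only: plain lists stand in for the "A of" CRDT
# arrays, and a flat dict stands in for the Redux-style normalized store.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    kind: str                                          # "document", "block", or "line"
    children: List[str] = field(default_factory=list)  # child ids, or chars for a line

store: Dict[str, Node] = {
    "doc1":   Node("document", ["block1"]),
    "block1": Node("block",    ["line1", "block2"]),   # blocks can nest blocks
    "block2": Node("block",    ["line2"]),
    "line1":  Node("line",     list("Hello")),
    "line2":  Node("line",     list("World")),
}

def render(node_id: str) -> str:
    """Flatten the tree of ids back into plain text, one line per 'line' node."""
    node = store[node_id]
    if node.kind == "line":
        return "".join(node.children)
    return "\n".join(render(child) for child in node.children)

print(render("doc1"))
```

The point of keeping every object behind its own id in a flat map is that an edit to one line only touches that line's array, leaving the rest of the tree structurally shared.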
Would be great to get your opinion on this design choice; maybe you see some obvious (or not so obvious) problems with it. One problem seems to be the one Kleppmann points out in [2, end of section 4]: when you press Enter in the middle of a line, so that the line is split into two lines, you have to deal with that in a special way. Similarly with splitting/joining blocks.
[1] https://practal.com/recursivetext/
[2] https://martin.kleppmann.com/papers/list-move-papoc20.pdf
[3] https://redux.js.org/usage/structuring-reducers/normalizing-...
> One problem seems to be the one Kleppmann points out in [2, end of section 4]: when you press Enter in the middle of a line, so that the line is split into two lines, you have to deal with that in a special way. Similarly with splitting/joining blocks.
I was about to mention this problem. We ran into it with Google Wave. The initial document model (based on an XML tree) used <line> tags for lines. We hit exactly this problem: if you press Enter in the middle of a line while someone is concurrently editing that line, how does the system handle those changes? The initial code had special split and join operations, but nobody could figure out how to make split and join work correctly in an OT system.
Wave was over a decade ago now. I don’t know if anyone has solved this problem - all the working systems that I know of bailed on this approach. It’s much easier if you just make the newline character an item that can be inserted or deleted like any other character, and then make lines a higher-order concept.
If you get this working (working = passing a fuzz test suite), I’d love to hear about it. But the well-trodden path of those who came before is to use newline characters instead.
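The newline-as-character approach can be sketched in a few lines (this is a toy illustration, not any real system's code): the replicated state is one flat character sequence, lines are derived by splitting on "\n", and pressing Enter mid-line becomes an ordinary single-character insert rather than a special split operation.

```python
# Toy sketch: one flat character list is the replicated state.
from typing import List

doc: List[str] = list("hello world")

def press_enter(doc: List[str], index: int) -> None:
    # Splitting a line is just inserting "\n" - the same code path
    # (and the same CRDT merge rules) as inserting any other character.
    doc.insert(index, "\n")

def lines(doc: List[str]) -> List[str]:
    # Lines are a derived, higher-order view over the flat sequence.
    return "".join(doc).split("\n")

press_enter(doc, 5)
print(lines(doc))  # ['hello', ' world']
```

Because the split is a plain insert, a concurrent edit elsewhere in the same line merges with it the same way any two character edits merge - no special split/join cases to get right.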
1. I made my own rope library (jumprope) using skip lists. Jumprope is about 2x faster than ropey on its own. And I have a wrapper around the skip list (called “JumpropeBuf” in code) which buffers a single incoming write before touching the skip list. This improves raw replay performance over ropey by 10-20x iirc.
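A toy analogue of that buffering idea (the real JumpropeBuf is a Rust wrapper around a skip-list rope; here a plain string stands in for the rope, and all names are mine): consecutive inserts that extend the previous one are coalesced in a small buffer, so the underlying structure sees one combined write instead of one per keystroke.

```python
# Illustrative sketch: a plain string stands in for the skip-list rope.
class BufferedRope:
    def __init__(self) -> None:
        self.rope = ""      # stand-in for the actual rope structure
        self.buf = ""       # pending, not-yet-applied text
        self.buf_pos = -1   # where the pending text will be inserted
        self.flushes = 0    # how many times we touched the "rope"

    def insert(self, pos: int, text: str) -> None:
        if self.buf and pos == self.buf_pos + len(self.buf):
            self.buf += text          # extends the buffered run: just append
        else:
            self.flush()              # non-adjacent write: apply buffer first
            self.buf, self.buf_pos = text, pos

    def flush(self) -> None:
        if self.buf:
            self.rope = self.rope[:self.buf_pos] + self.buf + self.rope[self.buf_pos:]
            self.buf = ""
            self.flushes += 1

r = BufferedRope()
for i, ch in enumerate("hello"):  # five keystrokes, typed left to right
    r.insert(i, ch)
r.flush()
print(r.rope, r.flushes)  # hello 1
```

Sequential typing is the overwhelmingly common case in editing traces, which is why a one-write buffer like this can pay off so dramatically.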
2. Text (“sequence”) CRDTs replicate a list / tree of fancy “CRDT items” (items with origin left / origin right / etc). This special data structure needs to be available both to parse incoming edits and to generate local edits.
Turns out that’s not the only way you can build systems like this. Diamond types now just stores the list of original edits: [(Edit X: insert “X” at position 12, parent versions Y, Z), …]. Then we recompute just enough of the CRDT structure on the fly when merging changes.
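A sketch of what an original-edit log like that can look like, loosely following the shape quoted above (field names are mine, not diamond types' actual format): each edit records its content, the position as the author saw it, and the parent versions (edit ids) it was made on top of.

```python
# Illustrative edit-log shape - not diamond types' real disk format.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Edit:
    id: str
    kind: str                  # "insert" or "delete"
    content: str               # text inserted (empty for a delete)
    pos: int                   # position in the document as the author saw it
    parents: Tuple[str, ...]   # versions this edit was made on top of

log = [
    Edit("A0", "insert", "hi", 0, ()),
    Edit("B0", "insert", "X", 1, ("A0",)),  # sequential: made on top of A0
    Edit("C0", "insert", "Y", 2, ("A0",)),  # concurrent with B0 (same parent)
]

def concurrent(a: Edit, b: Edit) -> bool:
    # In this tiny flat log, neither edit naming the other as a parent
    # is enough to detect concurrency; a real system walks the full DAG.
    return a.id not in b.parents and b.id not in a.parents

print(concurrent(log[1], log[2]))  # True
```

The parent-version field is what makes the log prunable and streamable: it encodes the causal DAG directly, so no separate CRDT structure needs to live in the file format.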
This has a bunch of benefits: it makes it possible to prune old changes, and it lowers memory usage (you can just stream writes to disk). The network and disk formats aren’t dependent on some weird CRDT structure that might change next week (Yjs? RGA? Fugue?). File size is also smaller.
And the best bit: linear traces don’t need the B-tree step at all. Linear traces go as fast as the rope - which, as I said above, is really really fast. Even when there are some concurrent edits and the B-tree is created, any time the document state converges on all peers we can discard all the CRDT items we generated so far and start again. B-trees are O(log n), and this change essentially keeps resetting n, which gives a constant-factor performance improvement.
The downside is that the code to merge changes is more complex now. And it’s slower for super complex traces (think dozens of concurrent branches in git).
I’m writing a paper at the moment about the algorithm. Should be up in a month or two.