Looking forward to additional use cases I can throw at this beast of a system.
Kudos to you guys!
i.e.
- a SQL interface,
- pre-packaged data structure libraries,
- monitoring,
- limitations of FoundationDB itself,
- etc.
I'm working on a talk for the upcoming FoundationDB Summit and I'd love to address some real-world questions or issues people have.
And of course there's the application side: with SQL and EXPLAIN, I can usually see bottlenecks. With distributed systems, I have a latent fear that performance suddenly tanks if, for example, some structure gets split across nodes.
- the storage backends seem less impressive than the marketing leads you to believe. The default memory backend is obviously too limited to use in production, and the “ssd” backend turns out just to be built on top of the Btree code from SQLite. Besides that, the documentation warns against using the ssd backend on macOS. Isn’t that a bit strange, considering who owns foundationdb??
- while testing, I found that it was impossible to shrink a cluster. If you add a second storage node just to test that the distributed stuff works correctly, you can’t reduce it back to a single node without destroying the entire database and starting over. If it’s possible to run everything on one node, it should be possible to shrink a cluster back to a single node.
- the storage backends have a crazy amount of write amplification (something like 3x, according to the docs). The foundationdb folks should focus on improving the underlying storage, for instance by building on lmdb or RocksDB or something. For my toy app, I abstracted my data access to use either lmdb (for local testing) or foundationdb (for production), but I ultimately ended up just using lmdb because I didn’t want to deal with fdb’s limitations and operational unknowns.
- another weird fdb limitation: the best single threaded latency you’ll get is supposedly around 1ms for small reads. The docs suggest you can achieve much better performance by scaling the cluster and number of clients. That may be true, but some applications may want high single-threaded performance. (Something like lmdb can achieve tens of thousands of reads per second)
On write amplification: a factor of 3x is not actually that unusual. The default RocksDB size amplification is 2x, and I've seen performant LSM trees with about 3x write amplification.
On the single-threaded bottleneck: this is an inherent issue whenever you put your database on the other side of a network connection. LMDB can do 10k/100k+ reads/sec on a single thread since it's just doing syscalls. As soon as you need to distribute your database across more than one machine, you need to parallelize your work for high throughput.
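The point above can be made concrete with a toy simulation. The 1 ms here is an assumed wire latency, not anything measured against FDB; the sketch only shows that overlapping in-flight reads amortizes per-request latency, which is why the docs suggest scaling the number of clients.

```python
import asyncio
import time

LATENCY = 0.001  # assumed ~1 ms network round trip per small read

async def fake_read(key):
    # Stand-in for a client -> server read: the delay is wire latency,
    # not server work, so many reads can be in flight at once.
    await asyncio.sleep(LATENCY)
    return f"value-of-{key}"

async def serial(keys):
    # One read at a time: total time ~= len(keys) * LATENCY.
    return [await fake_read(k) for k in keys]

async def concurrent(keys):
    # All reads in flight at once: the waits overlap.
    return await asyncio.gather(*(fake_read(k) for k in keys))

def timed(coro):
    t0 = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - t0

keys = [f"k{i}" for i in range(100)]
_, t_serial = timed(serial(keys))
_, t_conc = timed(concurrent(keys))
# t_serial pays 100 x 1 ms back to back; t_conc overlaps the waits.
```

The same idea applies whether the parallelism comes from async futures, threads, or simply more client processes.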
Look at "double" mode or "triple" mode. Could that be why it didn't work for you?
One of the huge problems is the lack of consistency constraints. During development our data became broken in various ways (and this has happened several times), and right now there is no way to implement even the simplest constraints at the database level.
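To be fair, while FDB has no declarative constraints, a layer can enforce them by hand inside a transaction. A toy sketch of a uniqueness check follows; the plain dict stands in for the keyspace, and `create_user`/the key shapes are made-up names. With the real bindings this body would run under `@fdb.transactional`, where serializable isolation makes the check-then-set safe.

```python
# Toy sketch: enforcing a uniqueness "constraint" by hand.
# The dict stands in for the FDB keyspace; real code would run this
# inside a transaction so the check and the writes commit atomically.
store = {}

class UniqueViolation(Exception):
    pass

def create_user(store, user_id, email):
    index_key = ("email_index", email)   # secondary index entry
    if index_key in store:               # the "constraint" check
        raise UniqueViolation(f"email {email!r} already taken")
    store[("user", user_id)] = email
    store[index_key] = user_id           # maintain the index in the same txn

create_user(store, 1, "a@example.com")
try:
    create_user(store, 2, "a@example.com")
except UniqueViolation:
    pass  # duplicate rejected, nothing written for user 2
```

The downside, as the comment says, is that every application (or layer) has to get this right itself; nothing stops a buggy writer from bypassing the check.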
I'd be very interested in using it for time series, but it looks like to get a working example up I'd need to fully parse the documentation and tutorials, since there is no "Let's build a Time-Series Database on FoundationDB from scratch" guide.
For example, if you want to read "last 5 minutes" of data, you can keep your timestamp at the end of the key in reverse order (Long.MaxValue - timestamp).
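A minimal sketch of that key encoding, assuming a hypothetical `metric/` key prefix: big-endian fixed-width integers preserve numeric order lexicographically, so subtracting the timestamp from `Long.MaxValue` makes the newest entries sort first in an ordered keyspace.

```python
import struct
import time

MAX_LONG = 2**63 - 1  # Java's Long.MaxValue, as in the comment

def reverse_ts_key(prefix, ts_millis):
    # Big-endian encoding preserves numeric order lexicographically,
    # so (MAX_LONG - ts) puts the *newest* timestamp first in key order.
    return prefix + struct.pack(">Q", MAX_LONG - ts_millis)

now = int(time.time() * 1000)
# Three samples: now, 1 s ago, 2 s ago (newest first).
keys = [reverse_ts_key(b"metric/", now - d) for d in (0, 1000, 2000)]
# A forward range scan starting at b"metric/" yields newest -> oldest,
# so "last 5 minutes" is just the first keys in the range, read until
# the decoded timestamp falls outside the window.
```

The same trick works with FDB's tuple layer by storing `MAX_LONG - ts` as the tuple element instead of packing it by hand.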
- some way to store ordered records to speed up range query
- comparison & benchmarks of FoundationDB vs other distributed DB like TiDB, CockroachDB, ScyllaDB, etc
1. What is the recommended way to filter on multiple indexes?
2. What is the recommended way to filter on one or more indexes and sort on one or more others?
3. If I use FoundationDB as my main datastore, how should I implement full text search?
4. From what I understand, FoundationDB stores keys sorted globally. How does it handle hot spots where a range of keys that is very frequently accessed is on one machine that gets overloaded?
Edit 5. How do you do backups?
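On question 3, the usual pattern on an ordered key-value store is an inverted index: one key per (term, doc_id) pair, queried with a prefix range read. A minimal in-memory sketch, where a plain dict of tuple keys stands in for FDB's globally sorted keyspace (the key shapes and function names are made up for illustration):

```python
# Minimal inverted-index sketch for full-text search on an ordered
# key-value store. A real layer would use tuple-encoded keys and a
# range read over the ("term", word) prefix.
import re

index = {}  # ("term", word, doc_id) -> True

def index_doc(doc_id, text):
    # One index entry per distinct word in the document.
    for word in set(re.findall(r"\w+", text.lower())):
        index[("term", word, doc_id)] = True

def search(word):
    # Equivalent to a range read over the ("term", word, *) prefix.
    prefix = ("term", word.lower())
    return sorted(k[2] for k in index if k[:2] == prefix)

index_doc("doc1", "FoundationDB stores keys sorted globally")
index_doc("doc2", "keys and values")
search("keys")    # matches both documents
search("sorted")  # matches only doc1
```

Ranking, stemming, and phrase queries are where it gets hard; for those, people typically pair the KV store with a dedicated search engine rather than building it all in a layer.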
What would I even do if the coordinator state is missing? That sounds like a critical internal issue to me. Is this really something I need to figure out myself, and if it is, why does the error not tell me how to check for it or repair it?
Another thing: right after I create a database with 'configure new single memory', 'status' shows: "Moving data - unknown (initializing), Sum of key-value sizes - unknown". Is this really such an edge case that the database cannot give me meaningful information about what is going on with my data right now? That is very scary to me. This is the simplest possible moment: I created the database 5 seconds earlier and ran a status command, and now it's apparently in an unknown state. What I would like is a realtime progress indicator of where data is moving and why, with percentage numbers, and if there is any error it should actually tell me the location of the log file.
The other big question mark: I don't know whether I can truly configure everything correctly so that I actually get the advertised performance. From what I've read on the forums, there seem to be a lot of things I need to consider and configure exactly the right way, from the number of processes to start to all kinds of other settings. Somebody actually made a guide, and I'm thankful for it, but in my opinion I shouldn't need to worry about any of this: FoundationDB should scan what CPUs/disks I have and suggest the optimal configuration. These are simple rules; shouldn't they be applied automatically? Imagine how many people are running FoundationDB suboptimally just because they didn't know that the default naive way of starting a few processes and letting them do some work is horribly wrong.
The third issue: what matters is how performance will look after I've built all my fancy custom layers on top and emulated all the functionality that was built into MongoDB, and whether this thing is actually as reliable and performant as it says it is. Looking at the issues I enumerated at the beginning, it really doesn't feel good to me.
What is the state of production monitoring of FoundationDB?
A SQL interface might lessen the impact of that somewhat, but then I might as well just use Postgres from the start.
Basically, I'm curious to know how FDB's memory engine performs compared to the SSD engine with a standard NAND SSD and an Intel Optane DC SSD. Something along the lines of the throughput per core and latency results on the FDB performance page [2].
[1]: https://github.com/AccelerateWithOptane/lab/issues [2]: https://apple.github.io/foundationdb/performance.html
Disclosure: I'm working at Intel and help manage our open source lab with our friends at Packet.
Most applications will need something higher level, like a SQL or document db frontend which could be built on top.
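One way such a document frontend can sit on an ordered key-value store is by flattening each document into one key per leaf value. A toy sketch, with a plain dict standing in for the keyspace and made-up key shapes (`(collection, doc_id, *path)`):

```python
# Sketch of a document layer mapping JSON onto an ordered KV store:
# flatten each document into (collection, doc_id, *path) -> value keys,
# so a range read over the (collection, doc_id) prefix rebuilds the doc.
import json

def flatten(doc, path=()):
    if isinstance(doc, dict):
        for k, v in doc.items():
            yield from flatten(v, path + (k,))
    elif isinstance(doc, list):
        for i, v in enumerate(doc):
            yield from flatten(v, path + (i,))
    else:
        yield path, doc  # leaf value

store = {}

def put_doc(collection, doc_id, doc):
    for path, value in flatten(doc):
        store[(collection, doc_id) + path] = json.dumps(value)

put_doc("users", "u1", {"name": "Ada", "tags": ["db", "kv"]})
# store now holds keys like ("users", "u1", "name") and
# ("users", "u1", "tags", 0).
```

Because sibling keys share a prefix, fetching one document, one field, or one sub-tree are all just range reads of different widths.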
I'm curious what people have started using FoundationDB for. Any interesting stories to share?
a) Rewrite the SQL code. In our case we are using Node.js, and all the SQL libraries are very, very slow; even replacing one with another is enormous work.
b) Rewrite in a new language. This was also an option, since querying Postgres can take 1 ms but parsing the response can easily take 100 ms+. That trashed our event loop and caused awful latency.
c) Rewrite on top of a high-performance NoSQL database.
We picked the last one. In the context of Node.js we were able to write a really thin layer on top of FDB that works super fast and exactly the way we needed.
In my previous startup we eventually ditched all SQL from our codebase too, since SQL databases are just too slow for low-latency messaging apps. There is no simple way to shard data, there are always random locks around your database (which block connections), and locks are really hard to debug sometimes. How do you scale a single SQL server? All of this is doable, but in FDB it was basically free.
We migrated to FDB and got almost a 100x improvement in latency/performance. And unlike the SQL code, which was very carefully crafted, we can do nasty things, like "hey, let's just poll this key every 100 ms and check for a new value" or "hey, let's do it on tens of instances at the same time". In those situations Postgres started to consume all available CPU; you can easily cripple SQL with a single instance of your app. We haven't managed to do that to FDB, at half the cost. We are often in a situation where someone commits something with a bug and, for example, starts to pull data every millisecond in N^2 streams, where N is the number of online users. In those situations we can't see any impact at all on our platform, just spikes in monitoring.
FDB is a wonderful thing: it lets you forget about optimizing the performance of your queries and about managing backups and replication. It just works!
Ouch. Is that just large amounts of data? Are you guys using pg.native?
Most people don't mind SQL; RDBMSes are good enough. People don't want to learn a new thing, which is why SQL is still here and why FDB will have a hard time competing with CockroachDB, TiDB, or Spanner.
> document DB frontend which could be built on top
A document database is low-hanging fruit on FDB (unless you want a 1-to-1 mapping with MongoDB). But my advice is to stay away from the MongoDB API.
What FDB is offering is much more powerful. Think the ease of use of MongoDB with the power of DynamoDB, plus other niceties like 'watches' (pgsql-style NOTIFY), transactions of course, and your favorite language to do the queries. Also, FDB works in standalone mode, that is, you can start using FDB on day one and grow your business with it. All you need is a good layer.
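For readers unfamiliar with watches: the idea is to block on a key changing instead of polling it. A toy illustration follows, where a `threading.Event` stands in for the watch future (in FDB's real Python bindings you would call `tr.watch(key)` inside a transaction and wait on the returned future); `WatchableStore` is a made-up class for this sketch.

```python
# Toy illustration of the watch pattern: block until a key changes
# instead of polling it every 100 ms.
import threading

class WatchableStore:
    def __init__(self):
        self._data = {}
        self._watches = {}  # key -> list of Events

    def watch(self, key):
        # Register interest in a key; caller blocks on the returned Event.
        ev = threading.Event()
        self._watches.setdefault(key, []).append(ev)
        return ev

    def set(self, key, value):
        self._data[key] = value
        for ev in self._watches.pop(key, []):
            ev.set()  # wake every watcher of this key, exactly once

store = WatchableStore()
seen = []

def waiter(ev):
    ev.wait()  # sleeps until the key is written, no polling loop
    seen.append(store._data["config"])

ev = store.watch("config")
t = threading.Thread(target=waiter, args=(ev,))
t.start()
store.set("config", "v2")  # fires the watch
t.join(timeout=2)
```

Note that a real FDB watch tells you only *that* the key changed, not the new value; the watcher re-reads the key in a fresh transaction, as the sketch does.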
> Any interesting stories to share?
I have been dabbling with key-value stores for 5 years now, so I am definitely biased. Simply said, key-value stores open perspectives you cannot imagine. FDB in particular is a great, great idea, and based on the forum interactions it looks like a good (if not fabulous) piece of software.
“it is possible for a transaction C to be aborted because it conflicts with another transaction B, but transaction B is also aborted because it conflicts with A (on another resolver), so C "could have" been committed. When Alec Grieser was an intern at FoundationDB he did some simulations showing that in horrible worst cases this inaccuracy could significantly hurt performance. But in practice I don't think there have been a lot of complaints about it.”