Broadly speaking, this is the correct style of architecture for a database engine on modern hardware. It is vastly more efficient in terms of throughput than the more traditional architectures common in open source. It also lends itself to elegant, compact implementations. I've been using similar architectures for several years now.
While I have not benchmarked their particular implementation, my first-hand experience is that these types of implementations are always at least 10x faster on the same hardware than a nominally equivalent open source database engine, so the performance claim is completely believable. One of my longstanding criticisms of open source data infrastructure has always been the very poor operational efficiency at a basic architectural level; many closed source companies have made a good business arbitraging the gross efficiency differences.
At some point several years ago, a few people noticed that if you attack the problem of scalable distribution within a single server the same way you would in a large distributed system (e.g. with a shared-nothing architecture), you can realize huge performance increases on a single machine. The caveat is that the resulting software architectures look unorthodox.
The general model looks like this:
- one process per core, each locked to a single core
- use pinned, NUMA-local RAM only (effectively eliminating cross-node memory traffic)
- direct dedicated network queue (bypass kernel)
- direct storage I/O (bypass kernel)
If you do it right, you minimize the amount of silicon that is shared between processes which has surprisingly large performance benefits. Linux has facilities that make this relatively straightforward too.
As a consequence, adjacent cores on the same CPU have only marginally more interaction with each other than cores on different machines entirely. Treating a single server as a distributed cluster of 1-core machines, and writing the software in such a way that the operating system behavior reflects that model to the extent possible, is a great architecture for extreme performance but you rarely see it outside of closed source software.
As a corollary, garbage-collected languages do not work for this at all.
Interesting. From: http://www.scylladb.com/technology/architecture/

"The Scylla design, right, is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC."

Now my question is: how portable is Scylla in terms of NIC vendors?
Just to pull an example from memory, the ubiquitous Intel 82599 10GbE NIC silicon has up to 128 TX and RX queues in hardware. IIRC, these are bundled in pairs for direct access in virtualized environments, so in principle you could have 64 virtual cores each with their own dedicated physical hardware queue. This is almost certainly what they were talking about. That is the whole point of this feature in Ethernet silicon; it gives cores (virtual or physical) dedicated network hardware off a single NIC.
Personal pet-peeve of mine. Using "TPS" or "Transactions/sec" to measure something that is in no way transactional. Maybe ops/sec, reads/sec, updates/sec, or something...
Has it been through Jepsen yet?
All the external-facing parts of Scylla are the same as Cassandra's. That includes all the ring stuff and all the network protocols.
So you should expect similar cluster behavior.
I would expect nothing.
If their numbers were astounding with a 10-, 100-, or 1000-node cluster, they would have published numbers for such setups. I call shenanigans on a report that is purposely out of line with the expected use case.
There is nothing commodity about a server with 128GB RAM.
When you introduce other nodes, you get chatter and network traffic....
Modifications to the database software must be shared, yes, but your client application is outside the reach of the AGPL and can remain proprietary.
Same trend, by the way, is in Android development.
One of the latest benchmarks I've seen, "Comparison of Programming Languages in Economics" [1], covers pure number crunching with no IO at all and reports C++ being 1.91x to 2.69x faster than Java. So for any code involving IO, the relative speedup is going to be smaller still.
By replacing bad Java code with excellent, machine-aligned C++, a 10x speedup is possible.
[1] https://github.com/jesusfv/Comparison-Programming-Languages-...
Java has a ton of overhead that C++ doesn't. Each object has metadata which results in more "cold data" in the cache. Each object is a heap allocation (unless you're lucky enough to hit the escape analysis optimization), which again leads to less cache locality because things are distributed around memory. Then there's the garbage collector. Then bounds checking.
a) IO is such a large portion of the problem; b) Hypertable isn't just way, way faster.