Please stop calling databases CP or AP (opens in new tab)

(martin.kleppmann.com)

200 pointsmartinkl11y ago34 comments

34 comments

The article convinced me that talking about databases as CP or AP is not very useful.

The article did not convince me that CAP-centered thinking is itself harmful or counterproductive. It's true that C-Consistency and A-Availability are relatively blunt definitions in a landscape of diverse and subtle possible semantics. But their usefulness is that they represent our intuitive notions of what a database should do.

If it were possible and had good performance, every single database system would provide C, A, and P, because it would make distributed databases much easier to reason about and program against. Intuitively, it really seems like a database ought to read the data that was most recently written. And it sure would be nice if a distributed database could somehow magically remain available and consistent even in the presence of a partition.

C, A, and P are useful because they are a yardstick against which we can measure a distributed database's actual capabilities against what a distributed database could do in an ideal world. The real world will fall short of that, and the CAP theorem gives a vocabulary for which part(s) of the ideal world real databases cannot satisfy.

For example, even in this article CAP terminology is used to describe interesting edge cases in ZooKeeper which, despite having generally strong consistency guarantees, does not always guarantee full capital-C Consistency.

seliopou11y ago

> The article did not convince me that CAP-centered thinking is itself harmful or counterproductive. It's true that C-Consistency and A-Availability are relatively blunt definitions in a landscape of diverse and subtle possible semantics. But their usefulness is that they represent our intuitive notions of what a database should do.

I duno, I think the two major points of the article were: 1. CAP-centered thinking can be harmful or counterproductive; and 2. by virtue of the fact that most traditional database systems don't fall into CP/AP categories, maybe those categories don't really have much to do with what we expect of databases. One could go a step further and make a case that 1 and 2 imply that CAP-centered thinking for many use-cases is harmful or counterproductive.

But I'm not gonna do that here. I do not want that hot potato.

tracker111y ago

It really depends on the amount of writes your database needs to handle, and the overall quantity of data. There are very strong CAP systems (Zookeeper comes to mind), but I wouldn't want that for very large quantities of data, as it wouldn't be able to keep up.

For example, if you are doing logging for a few million simultaneous users (say 5-8 per resource request across service layers), a single system wouldn't be able to keep up, and definitely a single system that has to coordinate each write as an atomic/consistent state.

The fact is, depending on your needs, you will need to sacrifice something to reach a scale needed by some systems and to minimize down time. It's a trade-off. And any distributed database will have real-world weaknesses.

haberman11y ago

> There are very strong CAP systems

There are not strong CAP systems -- that is the whole point of the CAP theorem. It's impossible. Am I misunderstanding you?

1 more reply

jbangert11y ago

very broad theorems like CAP (and e.g. The halting problem) are still very useful for proving things by reduction. Instead of talking about nodes failing, you could talk about a network partition isolating just that one node (sure, the node might process some things on its own, without input, or not --because it crashed-- but for CAP that is totally out of scope, and part of the underlying algorithm). Instead of talking about multi object transactions, it talks about anything that implements a register - which includes multi object transaction systems.

CAP therefore shows what real-life trade offs are feasible in the first place ( of course, there are still systems allowed by CAP that might be impossible for other reasons).

mjb11y ago

Exactly.

CAP is one of the borders of the space. It's popular because it's relatively easy to look at a claim and see if it's on the "possible" or "impossible" side of the border. Tools like that (as in your halting problem example) are extremely useful, because they can be used to cut through a lot of bullshit masquerading as subtlety.

On the other hand, as OP's post does a good job of explaining, CAP is not a good tool for exploring the area inside the boundary. We often use it to do that, and it leads to all sorts of awkwardness and miscommunication. I think OP captured that well, and it's an area I'd really like to see some alternatives getting popularized (harvest and yield, PACELC, etc). Simplified rules are useful in this space, because while it would be very useful for all practitioners to understand the whole taxonomy, there's a lot of material to understand there.

martinklOP11y ago

My beef with CAP is not that it's so broad, but that it's so narrow. It actually only proves a fairly small class of systems as impossible, and it's kinda obvious. Literature going back to the 1970s suggests that the trade-off was well known back then -- just not under such a catchy name.

Impossibility results are great, but misinterpreting them and using them to argue stuff to which they simply don't apply is not so great.

jellofiend8411y ago

What I dislike about articles like this is the author doesn't really talk about what we should be calling databases.

While calling one CP or AP may not correctly capture all the nuance of what those terms really mean it does give the user a quick idea of the capabilities of the database without having to go into a mutli-paragraph technical breakdown.

It's like describing a house as "orange", there are a million shades of "orange" and it might actually be closer to red or yellow but it gives a quick idea of what the house looks like. I feel the description CP and AP does the same, it gives a quick overview of the capabilities and if that interests you, one should be able to dig in deeper to the more technical details.

martinklOP11y ago

There are some pointers to better terminology at the end of the post — e.g. I like Terry's session guarantees (read-your-writes, monotonic reads, etc) — but they too only cover a small subset of the design space.

In the end, this stuff is still evolving, and we simply don't yet know the best way of describing things. I wouldn't want to propose some alternative classification scheme that would be no better than the one it replaces. But I do want to encourage people to be precise and thoughtful with the terminology they use.

jellofiend8411y ago

I agree with most everything you've said, but until we have good concise descriptions I think it is unfair to really say: "Never use these".

Quick less-precise overviews serve a real purpose and are very helpful to users and decision makers.

I guess my opinion is projects should never ONLY say it is CP or AP, but I think it is perfectly acceptable in your project overview to say: This database is CP*. Then have a longer technical discussion somewhere else where they use more precise and thoughtful language.

I understand and agree with the sentiment, but we humans need fuzzy short overviews. Then again a blog post titled: "All databases need an in-depth technical discussion page" doesn't quite capture the imagination.

aidenn011y ago

Nearly no distributed databases are either CP or AP in their default configuration, and CP and AP have very specific meanings, so it's more like saying a house is Pantone 300[1], when it's really some shade of cyan.

1: http://www.pantone.com/pages/pantone/colorfinder.aspx?c_id=6...

jellofiend8411y ago

Definitions are flexible and communities collectively decide what words actually mean. Sure if you are talking to PhD Data Scientists CAP have incredibly specific meanings.

I think it is clear that the much broader user/marketing/coder community has a much less rigid definition of these terms. In my opinion, fighting against terms or ideas getting watered down is a lost cause. It inevitably happens and there is very little that can be done to stop it.

As I said in another reply really the battle cry should be something along the lines of "Every database needs a real thoughtful precise technical breakdown" not "Stop using terminology that gives a brief, if imprecise, overview of the product". Because the later is almost certainly not going to happen.

1 more reply

ddispaltro11y ago

It's natural to reduce things to simple terms. I think CAP theorem caught on because previously people didn't know how to characterize distributed systems in an easy way and this gave them a framework and vocabulary to do it.

It's the same reason why companies create some index in their product (like New Relic AppDex), to take something complex and simplify it so it can be compared.

antirez11y ago

The article is good, however probably there is some issue here: "so quorum operations are not CAP-available (at least temporarily, until the database sets up additional replicas on the minority side)."

Availability is an eventual property, so technically speaking if the minority partition within a non-infinite amount of time always recovers and is able to provide a reply for our query, it is Available from the POV of CAP. However this requires the DB to be always able to create new replicas in the minority side and eventually reply to the query, and I'm not sure this is the case in Dynamo-alike systems when there is a single-node partition.

simongui11y ago

What you describe is not how quorums work in the Dynamo paper or for example in Riak (a Dynamo implementation). It does not require the DB to create replicas on the minority side of a partition. The Dynamo paper (and Riak implements this) includes Sloppy Quorums to increase availability.

MCRed11y ago

I agree. I feel it's time for a better, modern, clear elucidation of these issues- something of which the CAP theorem would be a subset.

And on top of that, I'd love a simple test harness that will run a distributed database thru its paces and produce a report.

Call it Bender, or something and then we can talk about the Bender characteristics and the Bender Report and say "this database passed the Bender report" or "bender found this database fails in this way during partitions: A & B".

This would make it relatively easy to compare systems relatively objectively.

Too often well meaning people have chosen poor databased for production systems (including bitcoin financial transactions companies!) because they didn't understand the issues well enough.

Such an open source project would be open to criticism and improvement and hopefully reasonable objective, and thus take it out of the realm of vendor propaganda and into the realm of broad understanding.

muaddirac11y ago

Jepson seems to do a lot of what you want in Bender: https://github.com/aphyr/jepsen and https://aphyr.com/tags/jepsen

karmakaze11y ago

Exactly saying something isn't CA or CP isn't as good as quantifying what happens. Two letters doesn't cut it--show me the numbers.

ble11y ago

Impossibility results are really good for proving that one or more things are impossible.

The CAP theorem is an impossibility result.

josteink11y ago

Looking at the discussion here in this thread, among these fine "experts", it seems a majority believe there are "CAP capable" systems out there.

Well I'm certainly impressed.

jongraehl11y ago

The article presents a really dumb argument: that because CAP only proves it's impossible to have CP and AP for a single record, and we care about more than one record, we shouldn't use the CAP impossibility result.

> The CAP system model is a single, read-write register – that’s all. For example, the CAP theorem says nothing about transactions that touch multiple objects: they are simply out of scope of the theorem, unless you can somehow reduce them down to a single register. [WHICH YOU CAN. DUH. SINGLE-REGISTER IMPOSSIBILITY OF CONSISTENCY APPLIES TO ALL N>=1]

vogan4211y ago

When a partition happens, you cannot communicate, you can choose to be available, or consistent.

If you choose to be available, for all operations at all disconnected sites, you may need to resolve conflicts in some fashion when or if the partition resolves.

If you choose to be consistent, you may need to deny operations to some portion of sites or for some portion of operations.

I find the ATM example to be very useful, I also try to whenever possible understand the system in terms of ledgers and accounting...they have been solving these problems also for a long time.

The formal definitions are very useful, and I would argue have unfortunately been lost in the noise. People are often trying to understand and compare many different and sometimes quickly evolving technologies, and these working definitions and misinterpretations seem to be largely related to misunderstandings or in some cases fit to the terminology of a specific vendor (which can be very confusing!)

It is also very useful to test your use cases and what you can tolerate in terms of consistency, latency, and availability with specific implementations.

I think a lot of the problem is that marketing material and documentation have collided in a similar fashion to reporting and advertising.

The unfortunate message seems to be: buyer beware

I think a similar thing has happened probably many times before, relational databases for instance and isolation levels are rarely discussed in terms of the formal definitions and much more often the vendor specific definitions.

My maybe much weaker message/ take away would be: define the terms that may be ambiguous so that you can get to the real fun of the ideas !

Dylan1680711y ago

It irks me whenever something trying to clarify CAP insists that you can't choose CA. You can. In the real world you can't avoid partitions, but you don't have to tolerate them.

The rule for Availability is that non-failing nodes must respond. If you detect a partition, you can fail the node. Make a system that shuts down when it loses a connection and you can be CA.

fred25611y ago

Maybe I'm missing the point you're trying to make, but a node that's shut down isn't exactly available.

Dylan1680711y ago

It's not the one node that shuts down, the whole system flicks a switch and ceases to be. It's incompatible with partitions.

So while you end up with something that's for most purposes worse than CP, it's still a possible configuration.

Availability does not require uptime. It only requires that if you are up, you are fully functional.

cwp11y ago

This highlights a further asymmetry in CAP. Yes, P is unavoidable if you have a distributed system. But also, availability is a property of the system when there is a partition. If you're not going to tolerate partitions, what does it mean to be available?

josteink11y ago

> If you're not going to tolerate partitions, what does it mean to be available?

It means you're not running a partitioned and distributed system, but a regular non-partitioned system.

Yes I know distributed cloud object-graph databases on selective-persist elastic firesprout-nodes are very popular and trendy these days, but lots of people just run regular, old-fashioned unpartitioned databases, like pgsql/mysql and they work just fine.

I know. Like whoah, eh?

redwood11y ago

Good point. Is it theoretically possible to guarantee a node shuts down in time the moment partition is detected? What I mean is: isn't there always a small window?

Dylan1680711y ago

Depends on what you mean by window.

If you mean a window for that node to act before it realizes there's a partition, your Consistency protocol should prevent that. Anything that requires all nodes to respond should work.

If you mean a window for the node to receive the request, without acting on it, before it fails, that doesn't really matter. From the outside you can't tell the difference between a node that failed before it got your request, and one that failed immediately after. And note this quote:

"Brewer originally only required almost all requests to receive a response. As allowing probabilistic availability does not change the result when arbitrary failures occur, for simplicity we are requiring 100% availability"

Plus, any other interpretation would make node failure fundamentally incompatible with Availability. You will always lose a request that was lodged one nanosecond before node failure.

j / k navigate · click thread line to collapse

34 comments

haberman11y ago

The article convinced me that talking about databases as CP or AP is not very useful.

seliopou11y ago

But I'm not gonna do that here. I do not want that hot potato.

tracker111y ago

haberman11y ago

> There are very strong CAP systems

There are not strong CAP systems -- that is the whole point of the CAP theorem. It's impossible. Am I misunderstanding you?

1 more reply

jbangert11y ago

CAP therefore shows what real-life trade offs are feasible in the first place ( of course, there are still systems allowed by CAP that might be impossible for other reasons).

mjb11y ago

Exactly.

martinklOP11y ago

Impossibility results are great, but misinterpreting them and using them to argue stuff to which they simply don't apply is not so great.

jellofiend8411y ago

What I dislike about articles like this is the author doesn't really talk about what we should be calling databases.

martinklOP11y ago

jellofiend8411y ago

I agree with most everything you've said, but until we have good concise descriptions I think it is unfair to really say: "Never use these".

Quick less-precise overviews serve a real purpose and are very helpful to users and decision makers.

aidenn011y ago

1: http://www.pantone.com/pages/pantone/colorfinder.aspx?c_id=6...

jellofiend8411y ago

Definitions are flexible and communities collectively decide what words actually mean. Sure if you are talking to PhD Data Scientists CAP have incredibly specific meanings.

1 more reply

ddispaltro11y ago

It's the same reason why companies create some index in their product (like New Relic AppDex), to take something complex and simplify it so it can be compared.

antirez11y ago

simongui11y ago

MCRed11y ago

I agree. I feel it's time for a better, modern, clear elucidation of these issues- something of which the CAP theorem would be a subset.

And on top of that, I'd love a simple test harness that will run a distributed database thru its paces and produce a report.

This would make it relatively easy to compare systems relatively objectively.

Too often well meaning people have chosen poor databased for production systems (including bitcoin financial transactions companies!) because they didn't understand the issues well enough.

muaddirac11y ago

Jepson seems to do a lot of what you want in Bender: https://github.com/aphyr/jepsen and https://aphyr.com/tags/jepsen

karmakaze11y ago

Exactly saying something isn't CA or CP isn't as good as quantifying what happens. Two letters doesn't cut it--show me the numbers.

ble11y ago

Impossibility results are really good for proving that one or more things are impossible.

The CAP theorem is an impossibility result.

josteink11y ago

Looking at the discussion here in this thread, among these fine "experts", it seems a majority believe there are "CAP capable" systems out there.

Well I'm certainly impressed.

jongraehl11y ago

vogan4211y ago

When a partition happens, you cannot communicate, you can choose to be available, or consistent.

If you choose to be available, for all operations at all disconnected sites, you may need to resolve conflicts in some fashion when or if the partition resolves.

If you choose to be consistent, you may need to deny operations to some portion of sites or for some portion of operations.

I find the ATM example to be very useful, I also try to whenever possible understand the system in terms of ledgers and accounting...they have been solving these problems also for a long time.

It is also very useful to test your use cases and what you can tolerate in terms of consistency, latency, and availability with specific implementations.

I think a lot of the problem is that marketing material and documentation have collided in a similar fashion to reporting and advertising.

The unfortunate message seems to be: buyer beware

My maybe much weaker message/ take away would be: define the terms that may be ambiguous so that you can get to the real fun of the ideas !

Dylan1680711y ago

It irks me whenever something trying to clarify CAP insists that you can't choose CA. You can. In the real world you can't avoid partitions, but you don't have to tolerate them.

The rule for Availability is that non-failing nodes must respond. If you detect a partition, you can fail the node. Make a system that shuts down when it loses a connection and you can be CA.

fred25611y ago

Maybe I'm missing the point you're trying to make, but a node that's shut down isn't exactly available.

Dylan1680711y ago

It's not the one node that shuts down, the whole system flicks a switch and ceases to be. It's incompatible with partitions.

So while you end up with something that's for most purposes worse than CP, it's still a possible configuration.

Availability does not require uptime. It only requires that if you are up, you are fully functional.

cwp11y ago

josteink11y ago

> If you're not going to tolerate partitions, what does it mean to be available?

It means you're not running a partitioned and distributed system, but a regular non-partitioned system.

I know. Like whoah, eh?

redwood11y ago

Good point. Is it theoretically possible to guarantee a node shuts down in time the moment partition is detected? What I mean is: isn't there always a small window?

Dylan1680711y ago

Depends on what you mean by window.

If you mean a window for that node to act before it realizes there's a partition, your Consistency protocol should prevent that. Anything that requires all nodes to respond should work.

Plus, any other interpretation would make node failure fundamentally incompatible with Availability. You will always lose a request that was lodged one nanosecond before node failure.

j / k navigate · click thread line to collapse