I've never had a cluster completely collapse on me unless things were already screwed up enough that service discovery was useless anyway, since nothing else would work either.
It just seems to me that losing your datastore makes your services unusable... at which point "discovering" them isn't really the issue. Instead, everyone wants to introduce another datastore you have to rely on, whose loss means you can't find anyone -- even if your services themselves are still functional.
Well, the problem is that as soon as you centralize, you introduce a single point of failure, which is a no-no if you're after as pure a distributed system as you can get. Distributed systems have their flaws, but single points of failure are something the field has worked past at this point; generally the tradeoffs are expressed in terms of how many faulty/Byzantine nodes a system can tolerate.
It is definitely true that if the cluster completely collapses, service discovery won't work anyway. But since that is (hopefully) very rare, the thinking here is: what if your centralized Cassandra cluster fails? You would need to replicate everything to something else, and once you start preparing for those kinds of failures, you're already building a distributed system.
NOTE: I am assuming here that you mean ONE machine running Cassandra... if you mean multiple, then the stuff below doesn't really apply, assuming Cassandra handles dynamic node changes well... but still, why not abstract? Why not make EVERY service you're running (app/db/cache/app2/utility) aware of dynamic changes to the architecture?
What do you mean by "losing your data store"? From what I understand, a Consul agent runs on every machine, and every Consul agent has an LMDB instance. If you mean losing your data store as in losing the service that provides your actual application data -- that would be the point of automatically discovering services: you could just arbitrarily add nodes that provide the "db" service, and your nodes that run "app" would automatically learn that more "db"s showed up.
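To make that concrete, here's a rough sketch of what asking Consul looks like from an app's point of view -- it just hits the local agent's HTTP API. The agent address (localhost:8500, the default) and the service name "db" are assumptions for illustration, not something from this thread:

    // Ask the local Consul agent for healthy "db" instances.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    // Just the fields we need from /v1/health/service/<name>.
    type healthEntry struct {
        Node struct {
            Address string
        }
        Service struct {
            Address string
            Port    int
        }
    }

    func main() {
        // "?passing" filters out instances with failing health checks.
        resp, err := http.Get("http://localhost:8500/v1/health/service/db?passing")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        var entries []healthEntry
        if err := json.NewDecoder(resp.Body).Decode(&entries); err != nil {
            log.Fatal(err)
        }
        for _, e := range entries {
            addr := e.Service.Address
            if addr == "" { // service address is optional; fall back to the node's
                addr = e.Node.Address
            }
            fmt.Printf("db instance at %s:%d\n", addr, e.Service.Port)
        }
    }

Any new machine that registers itself as "db" just shows up in that list -- nothing in the app had to change.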
Forgive me if this is unnecessary explanation, but:
To illustrate this -- let's say I have 3 servers: 2 are running instances of the app (5 instances each) and 1 big-RAM machine is running the DB. All 10 app instances are relying on that DB not to go down. While there are many very capable and reliable DBs out there (Cassandra, Postgres, etc.), it's dangerous to assume they will never fail.
However, the problem is: how do you just add nodes? You're going to need to either change app code, change some environment variables, or do some other kind of monkey patching to let the app processes (all 10 of them) know which DB to use. And if you look at just the problem of adding app instances for more load balancing, there are various static config files that likely need to change to accommodate them (nginx/apache config, environment variables, etc.).
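For a concrete picture of that static wiring, something like this (DB_HOST and the fallback address are made-up names, purely for illustration):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // The static approach: the DB location is frozen into an env var
        // (or worse, a literal). Pointing the app at a new or different DB
        // means editing this on every machine and restarting all 10 processes.
        dbHost := os.Getenv("DB_HOST")
        if dbHost == "" {
            dbHost = "10.0.0.5:9042" // hypothetical hard-coded fallback
        }
        fmt.Println("connecting to", dbHost)
    }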
Again, someone correct me if I'm wrong, but this is where Consul comes in. If app server 1 knows about at LEAST one of the DB nodes, you can easily add more DB nodes and ask Consul about them. So if one DB has gone down and you have Consul-aware code in place, Consul can tell your app instances where to get their database data.
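A rough sketch of what that "consul-aware code" could look like with the official Go client (github.com/hashicorp/consul/api); the service name "db" is again an assumption. A blocking query parks until the set of healthy instances changes, so the app hears about a dead or newly added DB without any config edits:

    package main

    import (
        "log"
        "time"

        "github.com/hashicorp/consul/api"
    )

    func main() {
        client, err := api.NewClient(api.DefaultConfig()) // local agent on :8500
        if err != nil {
            log.Fatal(err)
        }

        var lastIndex uint64
        for {
            // Blocks until the set of healthy "db" instances changes
            // (passingOnly=true skips instances with failing checks).
            entries, meta, err := client.Health().Service("db", "", true,
                &api.QueryOptions{WaitIndex: lastIndex})
            if err != nil {
                log.Print(err)
                time.Sleep(time.Second)
                continue
            }
            lastIndex = meta.LastIndex

            for _, e := range entries {
                log.Printf("healthy db at %s:%d", e.Service.Address, e.Service.Port)
                // hypothetical hook: re-point the app's connection pool here
            }
        }
    }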
There are some DBs that make this really easy to do (spin up more DBs that can act as masters, or just read-only replicas, or whatever) -- RethinkDB is one of them (http://rethinkdb.com/). It has a really good web interface that makes adding and managing cluster nodes as easy as starting a rethinkdb process with an extra option telling it where an existing node is. However, Cassandra doesn't seem to handle dynamic node creation (I'm going off this page: http://www.datastax.com/docs/0.8/install/cluster_init). Even if it does, the case for abstracted, dynamic service discovery still stands (Cassandra might be OK, but what about when you want to know about service x?)
I think you misunderstood what I was talking about: your explanation assumes a single physical machine running a single database instance, with adding nodes requiring manual human intervention.
We compare Consul to ZooKeeper here, but much of that applies to Cassandra as well: http://www.consul.io/intro/vs/zookeeper.html
Internally, Consul could also use something like Cassandra to store the data. However, we use LMDB, an embedded database, to avoid an expensive context switch out of the process, which lets us serve requests with lower latency and higher throughput.
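For anyone curious what "embedded" buys you, here's a rough sketch using the bmatsuo/lmdb-go bindings (illustrative only, not necessarily the binding Consul uses, and the key/value are made up): reads and writes are plain in-process function calls against a memory-mapped file -- no socket, no separate server process.

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/bmatsuo/lmdb-go/lmdb"
    )

    func main() {
        if err := os.MkdirAll("/tmp/lmdb-demo", 0755); err != nil {
            log.Fatal(err)
        }
        env, err := lmdb.NewEnv()
        if err != nil {
            log.Fatal(err)
        }
        defer env.Close()
        env.SetMaxDBs(1)
        env.SetMapSize(1 << 28) // 256 MiB memory map
        if err := env.Open("/tmp/lmdb-demo", 0, 0644); err != nil {
            log.Fatal(err)
        }

        var dbi lmdb.DBI
        // An ACID write, entirely inside this process: no network hop,
        // no context switch into a database server.
        err = env.Update(func(txn *lmdb.Txn) (err error) {
            dbi, err = txn.OpenDBI("services", lmdb.Create)
            if err != nil {
                return err
            }
            // made-up key/value, just for illustration
            return txn.Put(dbi, []byte("service/db"), []byte("10.0.0.5:9042"), 0)
        })
        if err != nil {
            log.Fatal(err)
        }

        // Reads likewise stay in-process against the memory map.
        err = env.View(func(txn *lmdb.Txn) error {
            v, err := txn.Get(dbi, []byte("service/db"))
            if err != nil {
                return err
            }
            fmt.Printf("service/db -> %s\n", v)
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }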
I just wish I could avoid having to maintain Cluster Type A for service discovery and Cluster Type B for data storage.
I dunno if LevelDB supports ACID now (or recently gained it), but that's a big difference.