I've never had a cluster completely collapse on me unless things were already screwed up enough that service discovery was useless anyway, since nothing else would work either.
It just seems to me that losing your datastore makes your services unusable... at which point "discovering" them isn't really the issue. Instead, everyone wants to introduce another datastore you have to rely on, whose loss means you can't find anyone -- even if your services themselves are still functional.
Well, the problem is that as soon as you centralize, you introduce a single point of failure, which is a no-no if you're after as pure a distributed system as you can get. Distributed systems have their flaws, but single points of failure are something the field has worked past at this point; generally the tradeoffs are expressed in terms of how many faulty/Byzantine nodes a system can tolerate.
It is definitely true that if the cluster completely collapses, service discovery won't work anyway. But since that is (hopefully) very rare, the thinking here is: what if your centralized Cassandra cluster fails? You would need to replicate everything to something else, and once you start preparing for those kinds of failures, you're already building a distributed system.
NOTE: I am assuming here that you mean ONE machine running Cassandra... if you mean multiple, then the stuff below doesn't really apply, assuming Cassandra handles dynamic node changes well... but still, why not abstract? Why not make EVERY service you're running (app/db/cache/app2/utility) aware of dynamic changes to the architecture?
What do you mean by "losing your data store"? From what I understand, a Consul agent runs on every machine, and every Consul agent has an LMDB instance. If you mean losing your data store as in losing the service that provides your actual application data -- that would be the point of automatically discovering services: you could just arbitrarily add nodes that provide the "db" service, and your nodes that run "app" would automatically learn that more "db"s showed up.
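To make that concrete, here's a rough sketch of what asking Consul looks like from an app's point of view -- it just hits the local agent's HTTP API. The agent address (localhost:8500, the default) and the service name "db" are assumptions for illustration, not something from this thread:

    // Ask the local Consul agent for healthy "db" instances.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    // Just the fields we need from /v1/health/service/<name>.
    type healthEntry struct {
        Node struct {
            Address string
        }
        Service struct {
            Address string
            Port    int
        }
    }

    func main() {
        // "?passing" filters out instances with failing health checks.
        resp, err := http.Get("http://localhost:8500/v1/health/service/db?passing")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        var entries []healthEntry
        if err := json.NewDecoder(resp.Body).Decode(&entries); err != nil {
            log.Fatal(err)
        }
        for _, e := range entries {
            addr := e.Service.Address
            if addr == "" { // service address is optional; fall back to the node's
                addr = e.Node.Address
            }
            fmt.Printf("db instance at %s:%d\n", addr, e.Service.Port)
        }
    }

Any new machine that registers itself as "db" just shows up in that list -- nothing in the app had to change.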
Forgive me if this is unnecessary explanation, but:
To illustrate this -- let's say I have 3 servers: 2 are running instances of the app (5 instances each) and 1 big-RAM machine is running the DB. All 10 app instances are relying on that DB not to go down. While there are many very capable and reliable DBs out there (Cassandra, Postgres, etc.), it's dangerous to assume they will never fail.
However, the problem is: how do you just add nodes? You're going to need to either change app code, change some environment variables, or do some other kind of monkey patching to let the app processes (all 10 of them) know which DB to use. And if you look at just the problem of adding app instances for more load balancing, there are various static config files that likely need to change to accommodate them (nginx/apache config, environment variables, etc.).
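For a concrete picture of that static wiring, something like this (DB_HOST and the fallback address are made-up names, purely for illustration):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // The static approach: the DB location is frozen into an env var
        // (or worse, a literal). Pointing the app at a new or different DB
        // means editing this on every machine and restarting all 10 processes.
        dbHost := os.Getenv("DB_HOST")
        if dbHost == "" {
            dbHost = "10.0.0.5:9042" // hypothetical hard-coded fallback
        }
        fmt.Println("connecting to", dbHost)
    }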
Again, someone correct me if I'm wrong, but this is where Consul comes in. If app server 1 knows about at LEAST one of the DB nodes, you can easily add more DB nodes and ask Consul about them. So if one DB has gone down and you have Consul-aware code in place, Consul can tell your app instances where to get their database data.
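A rough sketch of what that "consul-aware code" could look like with the official Go client (github.com/hashicorp/consul/api); the service name "db" is again an assumption. A blocking query parks until the set of healthy instances changes, so the app hears about a dead or newly added DB without any config edits:

    package main

    import (
        "log"
        "time"

        "github.com/hashicorp/consul/api"
    )

    func main() {
        client, err := api.NewClient(api.DefaultConfig()) // local agent on :8500
        if err != nil {
            log.Fatal(err)
        }

        var lastIndex uint64
        for {
            // Blocks until the set of healthy "db" instances changes
            // (passingOnly=true skips instances with failing checks).
            entries, meta, err := client.Health().Service("db", "", true,
                &api.QueryOptions{WaitIndex: lastIndex})
            if err != nil {
                log.Print(err)
                time.Sleep(time.Second)
                continue
            }
            lastIndex = meta.LastIndex

            for _, e := range entries {
                log.Printf("healthy db at %s:%d", e.Service.Address, e.Service.Port)
                // hypothetical hook: re-point the app's connection pool here
            }
        }
    }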
There are some DBs that make this really easy to do (spin up more DBs that can act as masters, or just read-only replicas, or whatever) -- RethinkDB is one of them (http://rethinkdb.com/). It has a really good web interface that makes adding and managing cluster nodes as easy as starting a rethinkdb process with an extra option telling it where an existing node is. However, Cassandra doesn't seem to handle dynamic node creation (I'm going off this page: http://www.datastax.com/docs/0.8/install/cluster_init). Even if it does, the case for abstracted, dynamic service discovery still stands (Cassandra might be OK, but what about when you want to know about service x?)
I think you misunderstood what I was talking about: your explanation assumes a single physical machine running a single database instance, with adding nodes requiring manual human intervention.
We compare Consul to ZooKeeper here, but much of that applies to Cassandra as well: http://www.consul.io/intro/vs/zookeeper.html
Internally, Consul could also use something like Cassandra to store the data. However, we use LMDB, an embedded database, to avoid an expensive context switch out of the process, which lets us serve requests with lower latency and higher throughput.
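For anyone curious what "embedded" buys you, here's a rough sketch using the bmatsuo/lmdb-go bindings (illustrative only, not necessarily the binding Consul uses, and the key/value are made up): reads and writes are plain in-process function calls against a memory-mapped file -- no socket, no separate server process.

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/bmatsuo/lmdb-go/lmdb"
    )

    func main() {
        if err := os.MkdirAll("/tmp/lmdb-demo", 0755); err != nil {
            log.Fatal(err)
        }
        env, err := lmdb.NewEnv()
        if err != nil {
            log.Fatal(err)
        }
        defer env.Close()
        env.SetMaxDBs(1)
        env.SetMapSize(1 << 28) // 256 MiB memory map
        if err := env.Open("/tmp/lmdb-demo", 0, 0644); err != nil {
            log.Fatal(err)
        }

        var dbi lmdb.DBI
        // An ACID write, entirely inside this process: no network hop,
        // no context switch into a database server.
        err = env.Update(func(txn *lmdb.Txn) (err error) {
            dbi, err = txn.OpenDBI("services", lmdb.Create)
            if err != nil {
                return err
            }
            // made-up key/value, just for illustration
            return txn.Put(dbi, []byte("service/db"), []byte("10.0.0.5:9042"), 0)
        })
        if err != nil {
            log.Fatal(err)
        }

        // Reads likewise stay in-process against the memory map.
        err = env.View(func(txn *lmdb.Txn) error {
            v, err := txn.Get(dbi, []byte("service/db"))
            if err != nil {
                return err
            }
            fmt.Printf("service/db -> %s\n", v)
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }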
I just wish I could avoid having to maintain Cluster Type A for service discovery and Cluster Type B for data storage.
I dunno if LevelDB supports ACID now (or recently gained it), but that's a big difference.