Further, CoreOS nudges you toward writing applications in a 12-factor-y way, so when the time does come for you to use a million machines, you won't need to make huge adjustments to your deployments (just plug the container and init script into fleet and let it roll).
Our cloud-config implementation is designed so that you can use the same config to set up a new machine to match an existing one, and have it automatically join the cluster. Start with 3 machines and scale up as you need.
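For example, a minimal cloud-config sketch along these lines (the discovery token is a placeholder; you'd generate a real one at https://discovery.etcd.io/new, and the exact keys depend on your CoreOS/etcd version) lets each new machine discover and join the existing cluster with the same file:

```yaml
#cloud-config
coreos:
  etcd:
    # Placeholder token -- every machine booted with the same
    # discovery URL finds the others and joins the same cluster.
    discovery: https://discovery.etcd.io/<token>
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
```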
Learning systemd is one bump you'll have to get over, but all of the major distros will be using it, so there is no better time to try it out.
If you think docker is a good fit for your application, this is kind of the next step.
Run from root partition A and apply updates only to partition B, essentially via chroot; the downtime required for an update is just the cost of a reboot. If an error occurs during boot, the machine reboots into the other partition, which holds your last known good configuration.
Seems simple and straightforward, but it's difficult to do with unmodified Debian-based distributions, so CoreOS has addressed it as a key feature.
So I like that etcd is a fundamental component of CoreOS, with these features:
1. Written from scratch in Go
2. Implements the Raft protocol for node coordination
3. Has a useful command line app
4. Runs on every node in a cluster
5. Enables auto-discovery
6. Allows CoreOS nodes to share state

Can you explain what excites you about that?
I guess what I meant by "from scratch" is that they aren't burdened by legacy code, and aren't limited to using the Paxos algorithm.
If you look at Deis, for example, it basically outsources a lot of node management to Chef Server, which in my view creates a great deal of technical debt on day one.
In fact, I had to resort to running test queries against a running etcd server just to work out the proper semantics of some of the arguments.
Is that so? When I looked at it, I distinctly remember it advising running a set of 3-9 etcd nodes (not necessarily separate from everything else).
"Our upcoming release, etcd 0.4, adds a new feature, standby mode. It is an important first step in allowing etcd clusters to grow beyond just the peers that participate in consensus."
As you (perhaps automatically) expand and collapse the cluster, you'll need to make sure to communicate to all nodes what the new cluster size is. If some nodes don't know the correct quorum count, split-brain!
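To make the arithmetic concrete, here's a minimal sketch (plain Python, not etcd's actual code) of the majority rule Raft-style consensus relies on, and why a stale cluster size is dangerous:

```python
def quorum(cluster_size):
    """Votes needed for a majority: floor(n/2) + 1."""
    return cluster_size // 2 + 1

# A 3-node cluster needs 2 votes; a 5-node cluster needs 3.
print(quorum(3))  # 2
print(quorum(5))  # 3

# Danger: if the cluster has grown to 5 nodes but some nodes
# still believe it is 3, a partition of just 2 nodes thinks it
# has a majority (2 >= quorum(3)) while the other 3 nodes also
# have a legitimate majority (3 >= quorum(5)) -- split-brain.
assert 2 >= quorum(3) and 3 >= quorum(5)
```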
Also, coordination services are typically critical, so it's important to isolate them from bugs in the ad-hoc code you're writing for your web tier, a crazy query in your database, etc.
It's much easier and safer in practice to just have 3 or 5 nodes running the coordination in isolation.
Edit: more reasons -- It's easier to deploy a coordination service to 5 nodes than 500. It's easier to debug 5 nodes than 500.
So something like a distributed lock is impossible to implement with Serf.
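To illustrate the point (a toy sketch -- the `KV` class and its methods are invented stand-ins, not etcd's real API): a distributed lock reduces to an atomic create-if-absent, which a consensus-backed store like etcd can guarantee because all writes are serialized through a single elected leader. Serf's gossip protocol is eventually consistent, so two nodes could each "acquire" the lock before hearing about the other.

```python
class KV:
    """Toy stand-in for a consensus-backed store like etcd.
    Real etcd serializes writes through Raft; a dict is enough
    here to show the compare-and-swap semantics."""
    def __init__(self):
        self.data = {}

    def create(self, key, value):
        # Atomic "create only if absent" -- the primitive a
        # coordination service must provide for locking.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)

def acquire_lock(store, name, owner):
    return store.create("/locks/" + name, owner)

store = KV()
assert acquire_lock(store, "deploy", "node-a")      # node-a wins
assert not acquire_lock(store, "deploy", "node-b")  # node-b must wait
store.delete("/locks/deploy")                       # node-a releases
assert acquire_lock(store, "deploy", "node-b")      # now node-b wins
```

With gossip alone there is no single point where the "is it absent?" check and the write happen atomically, which is exactly what the lock needs.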
Don't write about something without defining what it is for those not familiar with it (or you, or your service(s)).
Although it's not necessarily limited to config data.
I suggest reading http://sysadvent.blogspot.com/2013/12/day-20-distributed-con...