Further, CoreOS nudges you toward writing applications in a 12-factor-y way, so when the time does come for you to use a million machines, you won't need to make huge adjustments to your deployments (just plug the container and init script into fleet and let it roll).
Our cloud-config implementation is designed so that you can use the same config to set up a new machine to match an existing one, and have it automatically join the cluster. Start with 3 machines and scale up as you need.
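For example, a minimal cloud-config sketch along these lines (the discovery token is a placeholder; you'd generate a real one at https://discovery.etcd.io/new, and the exact keys depend on your CoreOS/etcd version) lets each new machine discover and join the existing cluster with the same file:

```yaml
#cloud-config
coreos:
  etcd:
    # Placeholder token -- every machine booted with the same
    # discovery URL finds the others and joins the same cluster.
    discovery: https://discovery.etcd.io/<token>
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
```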
Learning systemd is one bump you'll have to get over, but all of the major distros will be using it, so there is no better time to try it out.
If you think docker is a good fit for your application, this is kind of the next step.
Run from root partition A and apply updates only to partition B, essentially via chroot; the downtime required for an update is just the cost of a reboot. If an error occurs during boot, the machine reboots into the other partition, which holds your last known good configuration.
Seems simple and straightforward, but it's difficult to do with unmodified Debian-based distributions, so CoreOS has addressed it as a key feature.
So I like that etcd is a fundamental component of CoreOS, with these features:
1. Written from scratch in Go
2. Implements the Raft protocol for node coordination
3. Has a useful command line app
4. Runs on every node in a cluster
5. Enables auto-discovery
6. Allows CoreOS nodes to share state

Can you explain what excites you about that?
I guess what I meant by "from scratch" is that they aren't burdened by legacy code, and aren't limited to using the Paxos algorithm.
If you look at Deis, for example, it basically outsources a lot of node management to Chef Server, which in my view creates a great deal of technical debt on day one.
In fact, I had to resort to running test queries against a running etcd server just to work out the proper semantics of some of the arguments.
Is that so? When I looked at it, I distinctly remember it advising running a set of 3-9 etcd nodes (not necessarily separate from everything else).
"Our upcoming release, etcd 0.4, adds a new feature, standby mode. It is an important first step in allowing etcd clusters to grow beyond just the peers that participate in consensus."
As you (perhaps automatically) expand and collapse the cluster, you'll need to make sure to communicate to all nodes what the new cluster size is. If some nodes don't know the correct quorum count, split-brain!
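To make the arithmetic concrete, here's a minimal sketch (plain Python, not etcd's actual code) of the majority rule Raft-style consensus relies on, and why a stale cluster size is dangerous:

```python
def quorum(cluster_size):
    """Votes needed for a majority: floor(n/2) + 1."""
    return cluster_size // 2 + 1

# A 3-node cluster needs 2 votes; a 5-node cluster needs 3.
print(quorum(3))  # 2
print(quorum(5))  # 3

# Danger: if the cluster has grown to 5 nodes but some nodes
# still believe it is 3, a partition of just 2 nodes thinks it
# has a majority (2 >= quorum(3)) while the other 3 nodes also
# have a legitimate majority (3 >= quorum(5)) -- split-brain.
assert 2 >= quorum(3) and 3 >= quorum(5)
```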
Also, coordination services are typically critical, so it's important to isolate them from bugs in the ad-hoc code you're writing for your web tier, a crazy query in your database, etc.
It's much easier and safer in practice to just have 3 or 5 nodes running the coordination in isolation.
Edit: more reasons -- It's easier to deploy a coordination service to 5 nodes than 500. It's easier to debug 5 nodes than 500.
So something like a distributed lock is impossible to implement with Serf.
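To illustrate the point (a toy sketch -- the `KV` class and its methods are invented stand-ins, not etcd's real API): a distributed lock reduces to an atomic create-if-absent, which a consensus-backed store like etcd can guarantee because all writes are serialized through a single elected leader. Serf's gossip protocol is eventually consistent, so two nodes could each "acquire" the lock before hearing about the other.

```python
class KV:
    """Toy stand-in for a consensus-backed store like etcd.
    Real etcd serializes writes through Raft; a dict is enough
    here to show the compare-and-swap semantics."""
    def __init__(self):
        self.data = {}

    def create(self, key, value):
        # Atomic "create only if absent" -- the primitive a
        # coordination service must provide for locking.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)

def acquire_lock(store, name, owner):
    return store.create("/locks/" + name, owner)

store = KV()
assert acquire_lock(store, "deploy", "node-a")      # node-a wins
assert not acquire_lock(store, "deploy", "node-b")  # node-b must wait
store.delete("/locks/deploy")                       # node-a releases
assert acquire_lock(store, "deploy", "node-b")      # now node-b wins
```

With gossip alone there is no single point where the "is it absent?" check and the write happen atomically, which is exactly what the lock needs.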
Don't write about something without defining what it is for those not familiar with it (or you, or your service(s)).
Although it's not necessarily limited to config data.
I suggest reading http://sysadvent.blogspot.com/2013/12/day-20-distributed-con...