I've done distributed systems, and in fact I usually don't implement systems that are 100% correct under maximally adverse conditions, for performance and convenience reasons. But it's important to know and acknowledge where and how the system is less than 100% bullet-proof, so that you can understand and recover from failures. When you have 1000 servers, some virtual in the cloud, some in a colocation facility, and maybe 10 or 20 subtle variants in hardware, you will get failures that should not happen, but they do. For example, I've had an AWS host where the system time jumped backwards and forwards by 5 minutes every 30 seconds. And I've seen a TCP load-balancer appliance managed by a colo facility do amazingly wrong things.
The point is, you need to understand where the consistency scheme in a distributed system is "fudged", "mostly reliable", or "fine as long as conditions aren't insane", and where it is really guaranteed (which is almost nowhere). Then you can appropriately arrange for "virtually impossible" errors/failures to be detected and contained. The ad-hoc logic you bring to the problem of distributed systems doesn't really help.
Not to "dogpile", but to elucidate for others who may not know some of the history here: https://aphyr.com/posts/287-asynchronous-replication-with-fa... and the also interesting reply: http://antirez.com/news/56
About the ad-hoc logic: in this case I think the problem is actually that you want to accept only what already exists. For example, there are important distributed systems papers that assume an absolute bound on the time error among processes, using GPS units. This is a much stronger (much more synchronous) system model than the one I assume, yet it is regarded as acceptable. So you have to ask yourself, regardless of the fact that this idea is from me, whether you think that a computer, using the monotonic clock API, can count relative time with a maximum percentage of error. Yes? Then it's a viable system model.
About you linking to Aphyr: Sentinel was never designed to make Redis strongly consistent, nor to make it able to retain writes. I always analyzed the system for what it is: a failover solution that, combined with the asynchronous replication semantics of Redis, has certain failure modes, which are considered fine for the intended use case of Redis. That is, it has just a few best-effort checks to try to minimize data loss, and it has ways to ensure that only one configuration eventually wins (so that there are no permanent split-brain conditions after partitions heal), and nothing more.
I'm not sure what sense it makes to link to this. If you believe Redlock is broken, just state it in a rigorous form, and I'll try to reply.
If you ask me to be rigorous while I'm trying to make clear with facts what is wrong about Martin's analysis, you should do the same and just use arguments, not hand-waving about me being bad at distributed systems. Btw, in my blog post I'll detail everything and show why I think Redlock is not affected by unbounded network delays.
But that's optimistic concurrency control! In that case the locking service is a pure optimization and doesn't affect correctness. So why use this complex multi-node locking scheme, then, instead of a single Redis node?
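To illustrate the point that the lock is only an optimization: with optimistic concurrency control, correctness comes from an atomic compare-and-set at write time, not from holding the lock. A minimal sketch, where an in-memory versioned dict stands in for a single Redis node (all names here are illustrative, not a real Redis API):

```python
# Sketch of optimistic concurrency control with a versioned store.
# The in-memory dict stands in for a single storage node.

class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write_if_version(self, key, expected_version, value):
        """Atomic compare-and-set: the write succeeds only if nobody
        else has written since we read. This check alone guarantees
        correctness; any lock merely reduces wasted work."""
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            return False  # someone else committed first; caller retries
        self._data[key] = (version + 1, value)
        return True

store = VersionedStore()
v, _ = store.read("counter")
assert store.write_if_version("counter", v, 1)      # first writer wins
assert not store.write_if_version("counter", v, 2)  # stale writer rejected
```

With Redis itself the same shape is achievable with WATCH/MULTI/EXEC or a small Lua script, since a single node executes commands serially.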
Along with the author of this blog post, I would recommend the usage of ZooKeeper if you have a need for obtaining a lock in a distributed environment. You can read an analysis of how ZooKeeper performs under an (intentionally) partitioned network here:
Interestingly, even though HBase uses Zookeeper for coordination, it does not use it for fencing. Instead, fencing is handled by atomic "log-rolling" operations on the HDFS namenode, which can be configured to use its own quorum-based journal. (The equivalent of a "token" is the monotonically-increasing namenode operation sequence number.) So the principle is the same: the system responsible for doing mutual exclusion, and the system that actually stores the data, must be coupled.
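The principle of fencing with a monotonically increasing token can be sketched in a few lines. This is illustrative only (not HBase's or HDFS's actual API): the storage system itself remembers the highest token it has seen and rejects any writer presenting an older one.

```python
# Sketch of fencing tokens: the storage system, not the lock service,
# rejects stale writers by tracking the highest token seen so far.

class FencedStorage:
    def __init__(self):
        self.highest_token = 0
        self.log = []

    def write(self, token, record):
        # A writer that paused (GC, partition) and resumed with an old
        # token is fenced off here, at the data's doorstep.
        if token < self.highest_token:
            raise PermissionError(f"stale token {token}")
        self.highest_token = token
        self.log.append(record)

storage = FencedStorage()
storage.write(33, "old leader's write")
storage.write(34, "new leader's write")
try:
    storage.write(33, "old leader, resumed after a long pause")
except PermissionError:
    pass  # stale writer fenced off
```

The key design point is exactly the coupling described above: the mutual-exclusion decision and the data store share the token, so a lock holder that merely *believes* it still holds the lock cannot corrupt anything.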
To achieve unlimited scalability, you need to make sure that no single data store is shared between all processes and that the amount of interprocess communication (on a per-process basis) doesn't increase as you add more processes.
The more sharing you have, the less you can scale your system.
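One common way to get that property is to hash-partition keys across independent stores, so no single store is shared by all processes. A toy sketch (the list of dicts stands in for stores on separate nodes; names are illustrative):

```python
# Sketch: hash-partitioning keys across independent stores so that no
# single data store is shared between all processes.
from hashlib import sha256

N_SHARDS = 4
shards = [dict() for _ in range(N_SHARDS)]  # each would live on its own node

def shard_for(key: str) -> dict:
    # Each key maps deterministically to exactly one shard, so processes
    # touching different keys never contend on shared state.
    digest = int(sha256(key.encode()).hexdigest(), 16)
    return shards[digest % N_SHARDS]

shard_for("user:42")["user:42"] = {"name": "Ada"}
assert shard_for("user:42")["user:42"]["name"] == "Ada"
```

Adding processes then adds capacity roughly linearly, because per-process communication stays confined to the shards that process actually touches.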
Personally, I find it easier to write highly parallel code than parallel-up-to-a-certain-point code that involves locks like mutexes and/or semaphores. Highly parallel code just takes a little bit of extra planning and thinking up-front, but it saves lots of time and stress down the line.
This statement confused me. It seems to say that the packets were delayed in the network for 90 seconds before being delivered. From reading the original sources it actually sounds like packets were discarded by the switches, so the original requests were discarded and the nodes were partitioned for 90 seconds. When the partition was removed, both nodes thought they were the leader and simultaneously requested the other to shut down. Can anyone confirm? Keeping packets delayed in a network for 90 seconds would seem quite difficult (though not impossible, assuming certain bugs).
Edit: On re-reading I think this is just talking about the network stack in general, not the network. A temporary partition may delay delivery of your request until max TCP retries is exceeded on your host; if it's recovered before then, your request may arrive later than you intended.
However, yes, I was thinking about the network stack as a whole. Any kind of retries, e.g. TCP retransmission on timeout, effectively turns packet loss into packet delay (within certain bounds). Thus, even if you have a network interruption ("partition") and all packets are dropped, it could happen that after the interruption is fixed, a node receives packets that were sent before or during the interruption. For this reason, I find it helpful to think of it as delay, not just packet loss.
Edit: Why is this downvoted? It is a serious question.
In the scenario you describe, the database commit works, you send the email but... you get an error back from the email send function. Now what?
This is a basic distributed computing problem. Choosing how to solve it can have a drastic effect on your code and your infrastructure. If the secondary function is "charge a credit card", obviously you can't do that twice. But if it's merely "send an email", then maybe it's okay if people get a duplicate.
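For the can't-do-it-twice case, the standard trick is to make the operation idempotent with a deduplication key, so that retrying after an ambiguous failure is safe. A minimal sketch, where a set stands in for a durable store (all names here are illustrative):

```python
# Sketch of idempotency via a deduplication key: a retry after an
# ambiguous failure ("did the charge go through?") becomes a no-op.

processed = set()  # would be a durable table in a real system

def charge_card(idempotency_key, amount, do_charge):
    if idempotency_key in processed:
        return "already-done"  # duplicate request: safe no-op
    do_charge(amount)
    processed.add(idempotency_key)
    return "charged"

charges = []
assert charge_card("order-123", 999, charges.append) == "charged"
assert charge_card("order-123", 999, charges.append) == "already-done"
assert charges == [999]  # charged exactly once despite the retry
```

In a real system the key would be recorded in the same transaction as the charge itself, otherwise a crash between the two steps reopens the duplicate window.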
Google lease/lock/deadlock/race condition and read up on why databases tend to make for bad implementations.
Having just re-read Lynch's paper, can you explain what you mean here? I didn't see anything explicitly relying on time. It could be there is some implicit usage I didn't see. Additionally, the paper's impossibility result is about "perfectly correct consensus", which applies with and without clocks, and then it has a positive result for "partially correct consensus" (i.e. not deciding a value is a correct result). I'm not sure which you mean when you say "consensus becomes impossible", as it is either already impossible (the perfectly correct protocol) with one faulty process or (to my understanding) not dependent on time (the partially correct protocol).
p.s. great article!
However, if you do allow some timing assumptions — just enough to measure a timeout, nothing more — then consensus becomes possible. In this case, the timeout is known as an "unreliable failure detector". See Chandra and Toueg's paper for details.
In a good algorithm, the safety properties do not depend on timing, only the liveness properties do. Rather than "consensus being impossible" perhaps it would be clearer to say "consensus may never terminate" in an asynchronous system. But "consensus impossible" is the standard way of phrasing this issue in the distributed systems literature.
For example, on page 375: “Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speeds of processes or about the delay time in delivering a message. We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used.”
1. http://www.cs.princeton.edu/courses/archive/fall07/cos518/pa...
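The timeout-based "unreliable failure detector" mentioned above can be sketched very simply: it may wrongly suspect a slow-but-alive process (hence "unreliable"), and its suspicion is revocable, but it never blocks forever, which is what restores liveness. A toy version (names are illustrative, not from any real library):

```python
# Sketch of an unreliable failure detector in the Chandra-Toueg sense:
# suspect any process that has been silent longer than a timeout.

class FailureDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}  # process -> time of last heartbeat

    def heartbeat(self, process, now):
        self.last_heartbeat[process] = now

    def suspects(self, process, now):
        # May be wrong (the process might just be slow), but always answers.
        last = self.last_heartbeat.get(process, float("-inf"))
        return now - last > self.timeout

fd = FailureDetector(timeout=5)
fd.heartbeat("node-a", now=0)
assert not fd.suspects("node-a", now=3)   # recent heartbeat: trusted
assert fd.suspects("node-a", now=10)      # silence > timeout: suspected
fd.heartbeat("node-a", now=11)
assert not fd.suspects("node-a", now=12)  # suspicion is revocable
```

This matches the safety/liveness split above: an algorithm's safety shouldn't depend on the detector guessing right, only its progress does.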
I send a packet over the network. I configured the timeout to be 10 seconds. 10 seconds pass. Is the packet truly lost? Or maybe the packet was received, processed, and a response is on the way back, but it takes 20 seconds instead of 10! Or maybe it takes an hour instead of 20 seconds. Or a decade.
We don't know if the packet is truly lost or is just taking longer than $timeout_timer, and making assumptions is dangerous.
Please consider this answer [0] (which I personally understood as - YES, real systems are always partial sync): "Asynchronous systems have no fixed upper bounds. In practice, systems tend to exhibit partial synchrony, which is described as one of two models by Dwork and Lynch in Consensus in the Presence of Partial Synchrony. [1]"
The specs are submitted for review at https://github.com/openstack/openstack-specs
https://github.com/openstack/openstack-specs/tree/master/spe...
As Martin points out, though, many (most?) distributed lock implementations are or can be broken in various ways. He hints at one of the fundamental problems: consensus. Many distributed lock implementations fail simply because they cannot achieve reliable consensus, or don't even try to.
That said, distributed locks can be safe and handle reasonably high throughput. The Atomix distributed lock is one example:
http://atomix.io/atomix/docs/coordination/#distributedlock
Since consensus requires quorum and quorum requires availability, there is a risk that your ability to obtain a lock or learn about lock-related events could be affected by availability, but at least in the case of Atomix, the system is fairly resilient, with auto-failovers to passive or inactive nodes as needed (as compared to, say, a ZooKeeper-based lock).
[1] http://docs.oracle.com/cd/B28359_01/server.111/b28318/consis...
If you want to acquire some shared resource, then elect an owner for that resource.
When you perform a write, instead of the token, send a hash of the object previously read. The storage can then compare this against a hash of the resource's current state. If it doesn't match, the lock expired and the write is not accepted.
This would reduce the state to keep track of to the resources themselves.
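A quick sketch of this hash-compare idea (illustrative names, an in-memory store standing in for real storage): the writer sends the hash of the version it read, and the store accepts the write only if that hash still matches the current state.

```python
# Sketch of hash-guarded writes: compare-and-swap keyed on a hash of the
# state the writer last read, instead of a separately-tracked token.
from hashlib import sha256

def h(value: str) -> str:
    return sha256(value.encode()).hexdigest()

class HashGuardedStore:
    def __init__(self, value=""):
        self.value = value

    def write(self, read_hash, new_value):
        if read_hash != h(self.value):
            return False  # state changed since the read: lock had expired
        self.value = new_value
        return True

store = HashGuardedStore("v1")
stale = h("v1")
assert store.write(stale, "v2")      # hash matches current state: accepted
assert not store.write(stale, "v3")  # hash is now stale: rejected
```

One caveat versus monotonic tokens: if the state ever returns to a byte-identical earlier value, an old writer's hash would match again (an ABA problem), which the strictly increasing token avoids.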
Generally when I think of this problem I reach for etcd, not Zookeeper first, in the hopes of it being lighter (with a relatively vague definition of "light"), and easier to use.
Consensus can be achieved without clocks using blockchains: instead of timelocking the resource, a client can broadcast its altered version to the network after making changes. The next client can then start working on top of this changed resource, but since multiple versions might be floating around (due to timing issues in the first place), the longest chain wins. So if a client submits work done on a shorter chain, it is rejected by the network.
This has other issues, such as higher latency and reorganization risk, but it does away with clocks altogether by providing a different method of timestamping.
This is similar to the proof-of-work timestamp server that Bitcoin uses, but we can do away with proof-of-work because the resource and membership in the network are centralised.
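The longest-chain rule itself is simple to sketch. In this toy version (illustrative, and deliberately ignoring ties and reorganizations), each submitted version names its parent, and the network accepts only work that extends the longest known chain:

```python
# Sketch of "longest chain wins" without proof-of-work: work built on a
# branch shorter than the best-known chain is rejected.

chains = {"genesis": 0}  # version id -> length of the chain ending there

def submit(parent, version_id):
    """Accept new work only if it extends the longest known chain."""
    if parent not in chains:
        return False
    if chains[parent] < max(chains.values()):
        return False  # built on a shorter branch: rejected
    chains[version_id] = chains[parent] + 1
    return True

assert submit("genesis", "a1")      # extends the tip
assert submit("a1", "a2")           # extends the new tip
assert not submit("genesis", "b1")  # fork off an old, shorter point loses
```

A real longest-chain system would allow competing tips of equal length and resolve them later, which is exactly the reorganization risk mentioned above; with centralised membership, the acceptance check can simply be enforced at submission time instead.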