I think antirez is saying "skew" here when "drift" would be more appropriate. The safety property appears to refer to the difference in rates between clocks, rather than the difference in absolute values. That's a much more reasonable assumption, and is likely to be true even with very bad clock hardware over short periods of time.
Obviously the bounded drift assumption,
Exactly that, thanks for the correction. Indeed the word I used is wrong; even if it was probably understandable from the context that I was referring to drift, I'm going to replace the term.
The most obvious issue is that if Redis goes down, you could end up with problems if the processes holding the locks continue running, particularly depending on when, and in what state, Redis restarts.
Another is that if you can't guarantee strict time constraints, you have to take the approach of periodically re-acquiring (extending) your lock so as not to let it expire. Once you do this, you run the risk of something never finishing but extending its lock indefinitely.
A final issue is that you can end up with a situation where there's no guarantee that a waiting task (or whatever you call something that wants a lock or a slot in a semaphore) will ever run.
I don't really buy the argument that this is an insane violation of state/share-nothing principles. When I've needed these kinds of primitives it has rarely had to do with the application state itself - for example, I've used counting semaphores to control how many worker processes can be active. Likewise, I've used plain locks (and lock-like structures) to do things like ensure atomic/ordered writes for user sessions (I suppose a session is stateful, but it's also not really shared application state).
In any case, there are some issues, but at the cost of a minimal amount of nursing, the ease of implementation and integration often makes Redis a go-to choice for these kinds of things, particularly in resource (hardware) constrained environments. On the other hand, if you're operating at a scale where you've got multiple datacenters and such, it's a different ballgame.
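A minimal sketch of the counting-semaphore idea mentioned above (not the commenter's actual code): each worker INCRs a shared counter that carries a TTL, and backs off if the count exceeds the limit. The `FakeRedis` class is an in-memory stand-in for the few Redis commands involved, so the sketch runs without a server; a real client would use `INCR`/`DECR`/`PEXPIRE` the same way.

```python
class FakeRedis:
    """In-memory stand-in for the Redis commands this sketch uses."""
    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def decr(self, key):
        self.data[key] = self.data.get(key, 0) - 1
        return self.data[key]

    def pexpire(self, key, ttl_ms):
        pass  # TTL bookkeeping elided in the stub


def acquire_slot(r, key, limit, ttl_ms=30_000):
    """Try to take one of `limit` semaphore slots; True on success."""
    count = r.incr(key)
    if count == 1:
        r.pexpire(key, ttl_ms)  # so crashed holders don't leak slots forever
    if count > limit:
        r.decr(key)  # over capacity: give the slot back and retry later
        return False
    return True


def release_slot(r, key):
    r.decr(key)


r = FakeRedis()
assert acquire_slot(r, "workers", limit=2)
assert acquire_slot(r, "workers", limit=2)
assert not acquire_slot(r, "workers", limit=2)  # third worker is refused
release_slot(r, "workers")
assert acquire_slot(r, "workers", limit=2)      # a slot opened up again
```

Note this simple counter has the fairness problem mentioned earlier in the thread: a refused worker has no guaranteed turn, it just retries.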
https://gist.github.com/adewes/6103220
It uses Redis pipelines and watches to make sure that no race conditions occur between two processes requesting the same lock, and uses expiring keys to avoid deadlocks.
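A condensed sketch of the same two ideas (an expiring key to avoid deadlocks, and an ownership check before release so one process can't delete another's lock). The `FakeRedis` class is an in-memory stand-in; with a real client, acquisition is a single `SET key value NX PX ttl`, and the check-and-delete in `release` must be made atomic (WATCH/MULTI as in the gist, or a Lua script) rather than two separate steps as shown here.

```python
import time
import uuid


class FakeRedis:
    """In-memory stand-in for SET NX PX / GET / DEL."""
    def __init__(self):
        self.data = {}  # key -> (value, expiry timestamp)

    def set_nx_px(self, key, value, ttl_ms):
        cur = self.data.get(key)
        now = time.monotonic()
        if cur is not None and cur[1] > now:
            return False  # key exists and has not expired
        self.data[key] = (value, now + ttl_ms / 1000.0)
        return True

    def get(self, key):
        cur = self.data.get(key)
        if cur is None or cur[1] <= time.monotonic():
            return None
        return cur[0]

    def delete(self, key):
        self.data.pop(key, None)


def acquire(r, name, ttl_ms=10_000):
    token = str(uuid.uuid4())  # random value identifies this holder
    return token if r.set_nx_px(name, token, ttl_ms) else None


def release(r, name, token):
    # Only delete the lock if we still own it, so a slow process can't
    # remove a lock that has expired and been taken by someone else.
    if r.get(name) == token:
        r.delete(name)
        return True
    return False


r = FakeRedis()
t = acquire(r, "lock")
assert t is not None
assert acquire(r, "lock") is None               # second acquirer is blocked
assert release(r, "lock", "not-my-token") is False
assert release(r, "lock", t) is True
assert acquire(r, "lock") is not None           # lock is free again
```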
> so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.
I am confused. Are the locks requested sequentially, or at the same time? It seems like if they are requested sequentially, then the random backoff time would need to be a large multiple of the combined latency.
As stated in the post, ideally the client uses multiplexing to send the SET to all the instances at the same time, but this does not change much about the ratio between the chosen lock validity and the latency to acquire the lock: with 5 instances it still takes something in the "few milliseconds" range on average to set the lock with not very optimized clients, so a 1-second validity is already three orders of magnitude more.
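To put numbers on that ratio: the usable validity of a lock is roughly the TTL minus the time spent acquiring it, minus a small allowance for drift, so a few milliseconds of acquisition latency barely dents a 1-second TTL. A rough sketch of the arithmetic, with illustrative numbers (the drift factor and constant here are my assumptions, not values from the post):

```python
def effective_validity_ms(ttl_ms, acquire_elapsed_ms, drift_factor=0.01):
    """Validity remaining after acquisition: TTL minus the time spent
    acquiring the lock, minus an allowance for clock drift (the 1%
    factor and 2 ms constant are illustrative)."""
    drift_ms = ttl_ms * drift_factor + 2
    return ttl_ms - acquire_elapsed_ms - drift_ms


# With a 1000 ms TTL and ~5 ms to SET the key across 5 instances,
# almost the whole TTL is still usable:
assert effective_validity_ms(1000, 5) == 983.0

# A much slower acquisition eats proportionally more of the validity:
assert effective_validity_ms(1000, 500) == 488.0
```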
What about broadcasting acquire/release messages?
In practical terms this forces the protected code path to be "real time", that is, guaranteed to terminate (or to abort) within the specified time.
You could keep re-acquiring the lock when it gets close to expiring, and stop your process if the lock becomes un-acquirable.
Trying to remember which distributed system model it is that sort of does this. Ring? Mesh?
Think about implementing something like the TPC-C load on a distributed database. Many parts of that load can be done sharded, with a shared-nothing model. Other parts need some level of coordination, and some need true serializability (which requires significant coordination). The pieces that can be built on shared-nothing primitives should be built on shared-nothing primitives. However, it's not useful to pretend that the pieces that require coordination simply don't exist.
Similarly, immutability is a good goal. An append-only log of immutable data items is often a very good mental model, and frequently a good implementation model, for distributed databases. It isn't universally applicable, though, and may put constraints on the possible supportable operations that add complexity to other parts of the system.