I think antirez is saying "skew" here when "drift" would be more appropriate. The safety property appears to refer to the difference in rates between clocks, rather than the difference in absolute values. That's a much more reasonable assumption, and is likely to be true even with very bad clock hardware over short periods of time.
Obviously the bounded drift assumption,
Exactly that, thanks for the correction. Indeed the word I used is wrong; even if it was probably understandable from the context that I was referring to drift, I'm going to replace the term.
The most obvious issue is that if Redis goes down, you could end up with problems if the processes holding the locks continue running, particularly depending on when, and in what state, Redis restarts.
Another is that if you can't guarantee strict time constraints, you have to take the approach of periodically re-acquiring (extending) your lock so as not to let it expire. Once you do this, you run the risk of something never finishing but extending its lock indefinitely.
A final issue is that you can end up with a situation where there's no guarantee that a waiting task (or whatever you call something that wants a lock or a slot in a semaphore) will ever run.
I don't really buy the argument that this is an insane violation of state/share-nothing principles. When I've needed these kinds of primitives it has rarely had to do with the application state itself - for example, I've used counting semaphores to control how many worker processes can be active. Likewise, I've used plain locks (and lock-like structures) to do things like ensure atomic/ordered writes for user sessions (I suppose a session is stateful, but it's also not really shared application state).
In any case, there are some issues, but at the cost of a minimal amount of nursing, the ease of implementation and integration often makes Redis a go-to choice for these kinds of things, particularly in resource (hardware) constrained environments. On the other hand, if you're operating at a scale where you've got multiple datacenters and such, it's a different ballgame.
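A minimal sketch of the counting-semaphore idea mentioned above (not the commenter's actual code): each worker INCRs a shared counter that carries a TTL, and backs off if the count exceeds the limit. The `FakeRedis` class is an in-memory stand-in for the few Redis commands involved, so the sketch runs without a server; a real client would use `INCR`/`DECR`/`PEXPIRE` the same way.

```python
class FakeRedis:
    """In-memory stand-in for the Redis commands this sketch uses."""
    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def decr(self, key):
        self.data[key] = self.data.get(key, 0) - 1
        return self.data[key]

    def pexpire(self, key, ttl_ms):
        pass  # TTL bookkeeping elided in the stub


def acquire_slot(r, key, limit, ttl_ms=30_000):
    """Try to take one of `limit` semaphore slots; True on success."""
    count = r.incr(key)
    if count == 1:
        r.pexpire(key, ttl_ms)  # so crashed holders don't leak slots forever
    if count > limit:
        r.decr(key)  # over capacity: give the slot back and retry later
        return False
    return True


def release_slot(r, key):
    r.decr(key)


r = FakeRedis()
assert acquire_slot(r, "workers", limit=2)
assert acquire_slot(r, "workers", limit=2)
assert not acquire_slot(r, "workers", limit=2)  # third worker is refused
release_slot(r, "workers")
assert acquire_slot(r, "workers", limit=2)      # a slot opened up again
```

Note this simple counter has the fairness problem mentioned earlier in the thread: a refused worker has no guaranteed turn, it just retries.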
https://gist.github.com/adewes/6103220
It uses Redis pipelines and watches to make sure that no race conditions occur between two processes requesting the same lock, and uses expiring keys to avoid deadlocks.
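A condensed sketch of the same two ideas (an expiring key to avoid deadlocks, and an ownership check before release so one process can't delete another's lock). The `FakeRedis` class is an in-memory stand-in; with a real client, acquisition is a single `SET key value NX PX ttl`, and the check-and-delete in `release` must be made atomic (WATCH/MULTI as in the gist, or a Lua script) rather than two separate steps as shown here.

```python
import time
import uuid


class FakeRedis:
    """In-memory stand-in for SET NX PX / GET / DEL."""
    def __init__(self):
        self.data = {}  # key -> (value, expiry timestamp)

    def set_nx_px(self, key, value, ttl_ms):
        cur = self.data.get(key)
        now = time.monotonic()
        if cur is not None and cur[1] > now:
            return False  # key exists and has not expired
        self.data[key] = (value, now + ttl_ms / 1000.0)
        return True

    def get(self, key):
        cur = self.data.get(key)
        if cur is None or cur[1] <= time.monotonic():
            return None
        return cur[0]

    def delete(self, key):
        self.data.pop(key, None)


def acquire(r, name, ttl_ms=10_000):
    token = str(uuid.uuid4())  # random value identifies this holder
    return token if r.set_nx_px(name, token, ttl_ms) else None


def release(r, name, token):
    # Only delete the lock if we still own it, so a slow process can't
    # remove a lock that has expired and been taken by someone else.
    if r.get(name) == token:
        r.delete(name)
        return True
    return False


r = FakeRedis()
t = acquire(r, "lock")
assert t is not None
assert acquire(r, "lock") is None               # second acquirer is blocked
assert release(r, "lock", "not-my-token") is False
assert release(r, "lock", t) is True
assert acquire(r, "lock") is not None           # lock is free again
```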
> so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.
I am confused. Are the locks requested sequentially, or at the same time? It seems like if they are requested sequentially, then the random backoff time would need to be a large multiple of the combined latency.
As stated in the post, ideally the client uses multiplexing to send the SET to all the instances at the same time, but this does not change much about the ratio between the chosen lock validity and the latency to acquire the lock: with 5 instances it still takes something in the "few milliseconds" range on average to set the lock with not very optimized clients, so a 1-second validity is already three orders of magnitude more.
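To put numbers on that ratio: the usable validity of a lock is roughly the TTL minus the time spent acquiring it, minus a small allowance for drift, so a few milliseconds of acquisition latency barely dents a 1-second TTL. A rough sketch of the arithmetic, with illustrative numbers (the drift factor and constant here are my assumptions, not values from the post):

```python
def effective_validity_ms(ttl_ms, acquire_elapsed_ms, drift_factor=0.01):
    """Validity remaining after acquisition: TTL minus the time spent
    acquiring the lock, minus an allowance for clock drift (the 1%
    factor and 2 ms constant are illustrative)."""
    drift_ms = ttl_ms * drift_factor + 2
    return ttl_ms - acquire_elapsed_ms - drift_ms


# With a 1000 ms TTL and ~5 ms to SET the key across 5 instances,
# almost the whole TTL is still usable:
assert effective_validity_ms(1000, 5) == 983.0

# A much slower acquisition eats proportionally more of the validity:
assert effective_validity_ms(1000, 500) == 488.0
```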
What about broadcasting acquire/release messages?
In practical terms this forces the protected code path to be "real time", that is, guaranteed to terminate (or to abort) within the specified time.
You could keep re-acquiring the lock when it gets close to expiring, and stop your process if the lock becomes un-acquirable.
Trying to remember which distributed system model it is that sort of does this. Ring? Mesh?
Think about implementing something like the TPC-C load on a distributed database. Many parts of that load can be done sharded, with a shared-nothing model. Other parts need some level of coordination, and some need true serializability (which requires significant coordination). The pieces that can be built on shared-nothing primitives should be built on shared-nothing primitives. However, it's not useful to pretend that the pieces that require coordination simply don't exist.
Similarly, immutability is a good goal. An append-only log of immutable data items is often a very good mental model, and frequently a good implementation model, for distributed databases. It isn't universally applicable, though, and may put constraints on the possible supportable operations that add complexity to other parts of the system.