Work Distribution with Jump Consistent Hashing (opens in new tab)

(zacksiri.dev)

23 pointsartellectual2y ago8 comments

8 comments

I’m surprised that rendezvous hashing[1] isn’t more popular given that it’s considerably easier to understand and implement than consistent hashing while having all the same pleasant properties.

[1] https://en.m.wikipedia.org/wiki/Rendezvous_hashing

mjb2y ago

It is cool, but that wikipedia article sure has an axe to grind. I'm not sure I'd agree with the "simpler" or the assertion that CH needs to store tokens. And the variant of rendezvous hashing that doesn't require O(N) computation for placement isn't definitely isn't much simpler! It's a cool technique, but that's not where I would go for a comparison.

From wiki: > Since the hash function is randomizing, each of the n sites is equally likely to receive the object O. Loads are uniform across the sites.

This is a really interesting topic of its own! The first sentence is correct, the second is asymptotically correct, but untrue in a lot of practical cases. It's very easy to overestimate the degree of uniformity you get from random allocation, especially where the number of "balls" isn't very large compared to the number of "bins". More here: https://brooker.co.za/blog/2018/01/01/balls-into-bins.html

sacnoradhq2y ago

Unfortunately, the purity of simplicity here is deceptive and doesn't scale well to the real world where some jobs are resource hogs (job variance) and hang (nondeterminism and error handling). This is reinventing an HA job scheduler but without retries or load leveling. If you want even working sets without hotspots, then you need a job scheduler aware of workers' load state. This cannot happen by blindly rolling dice and giving it a cute name.

artellectualOP2y ago

Thx for your feedback. Most of the workload still happens inside a job queue however for this case we deem that it’s not necessary. Fortunately it’s working well for us, the task async part is also adaptable if the problem becomes more complex than what it needs to be then we can handle those cases accordingly. However we still believe throwing everything in a job queue isn’t always optimal either.

For error handling when the task fails it sends a message to the genserver we can also use that opportunity to handle retry, or do a strategy change where after first failure we throw it into a job queue.

This way we optimize for user experience and at the same time have a robust strategy for handling failure. I guess that can be another blog post on its own.

standapart2y ago

Nice post! Couple things that might be useful:

1. While JCH will usually be the most performant hashing method, naively, removing a node will affect all nodes of higher order. This makes the logic of node deletions somewhat more complex than (say) Discord's hash ring. This is why JCH is more common for long-term, distributed, redundant storage -- where the topology changes far less frequently.

2. For sharding, what makes distribution hard is not so much the hashing but consensus on the cluster state -- this is the hidden problem. Bryan Hunter's talk on Waterpark (https://youtu.be/9qUfX3XFi_4) is a excellent example of what you can do when you can set things up so that the topology is fixed. In fact, this approach makes things so straight forward that it is shared by Riak, where the number of vnodes is fixed.

However, if you have a rapidly changing topology (like several Kubernetes clusters that are frequently scaling up and down), you can often need some sort of consensus mechanism to make sure every node has a consistent view of the cluster. In my experience, this usually ends up being the most complex part of distribution problem to solve.

artellectualOP2y ago

Will definitely checkout discord’s hash ring!

Thank you for your feedback.

Dowwie2y ago

Worth noting Discord's consistent hash ring implementation in pure Elixir. It's very easy to use and maintained by Discord (yay): https://github.com/discord/ex_hash_ring/

zacksiri2y ago

That looks cool! Seems like I have a new toy to explore!

j / k navigate · click thread line to collapse

8 comments

User232y ago

[1] https://en.m.wikipedia.org/wiki/Rendezvous_hashing

mjb2y ago

From wiki: > Since the hash function is randomizing, each of the n sites is equally likely to receive the object O. Loads are uniform across the sites.

sacnoradhq2y ago

artellectualOP2y ago

This way we optimize for user experience and at the same time have a robust strategy for handling failure. I guess that can be another blog post on its own.

standapart2y ago

Nice post! Couple things that might be useful:

artellectualOP2y ago

Will definitely checkout discord’s hash ring!

Thank you for your feedback.

Dowwie2y ago

Worth noting Discord's consistent hash ring implementation in pure Elixir. It's very easy to use and maintained by Discord (yay): https://github.com/discord/ex_hash_ring/

zacksiri2y ago

That looks cool! Seems like I have a new toy to explore!

j / k navigate · click thread line to collapse