Mathemagical!
Problem with consistent hashing:
However, consistent hashing comes with its own problem: uneven distribution of requests. Because of its mathematical properties, when requests are uniformly distributed, consistent hashing only balances load about as well as choosing a random server for each request. And if some content is much more popular than the rest (as is usual on the internet), it can do even worse than that.
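One way to see the imbalance is to build a bare-bones hash ring, one point per server, and count where keys land (a sketch; the server names and request keys are made up):

```python
import hashlib
from bisect import bisect

def h(s):
    # Map a string to a point on the ring [0, 2**32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

servers = [f"server-{i}" for i in range(10)]
ring = sorted((h(name), name) for name in servers)
points = [p for p, _ in ring]

def lookup(key):
    # Walk clockwise to the first server point at or after the key's hash,
    # wrapping around at the end of the ring.
    return ring[bisect(points, h(key)) % len(ring)][1]

counts = {name: 0 for name in servers}
for k in range(10_000):
    counts[lookup(f"request-{k}")] += 1

# With one point per server, arc sizes (and so loads) vary widely,
# even though the 10,000 keys themselves are hashed uniformly.
print(sorted(counts.values()))
```

Real implementations mitigate (but don't eliminate) this by giving each server many virtual points on the ring.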
Problem with Power of 2 Load Balancing:
Why wasn’t there a way to say “use consistent hashing, but please don’t overload any
servers”? As early as August 2015, I had tried to come up with an algorithm based on
the power of two random choices that would do just that, but a bit of simulation said
that it didn’t work. Too many requests were sent to non-ideal servers to be worthwhile.
Instead, he used something called Consistent Hashing with Bounded Loads.[1]

[1] https://medium.com/vimeo-engineering-blog/improving-load-bal...
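A minimal sketch of the bounded-loads idea from [1] (the server names, the single-point ring construction, and the load factor c = 1.25 are illustrative choices, not Vimeo's actual implementation): walk the ring as usual, but skip any server that is already at its capacity cap.

```python
import hashlib
from bisect import bisect
from math import ceil

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

class BoundedLoadRing:
    """Sketch of Consistent Hashing with Bounded Loads: each server may
    hold at most c times the mean load; overloaded servers are skipped."""
    def __init__(self, servers, c=1.25):
        self.ring = sorted((h(name), name) for name in servers)
        self.points = [p for p, _ in self.ring]
        self.c = c                            # allowed factor above the mean
        self.load = {name: 0 for name in servers}

    def assign(self, key):
        total = sum(self.load.values()) + 1   # load including this request
        cap = ceil(self.c * total / len(self.load))
        i = bisect(self.points, h(key)) % len(self.ring)
        for step in range(len(self.ring)):
            # Walk clockwise past any server that is already at capacity.
            name = self.ring[(i + step) % len(self.ring)][1]
            if self.load[name] < cap:
                self.load[name] += 1
                return name
        raise RuntimeError("all servers at capacity")

ring = BoundedLoadRing([f"server-{i}" for i in range(10)])
for k in range(10_000):
    ring.assign(f"request-{k}")
print(max(ring.load.values()))  # never exceeds ceil(1.25 * 10000 / 10)
```

Keys still mostly land on their "home" server, so caches stay warm, but no server's load can exceed c times the mean.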
For what we're doing, we actually need to consider more than just load. Since our LBs are distributed globally, we also want to make sure we're sending requests to backends that are geographically near them.
We can do this by tracking latency between the load balancer and origin servers, then using it to restrict the candidate pool we're going to choose two from at random.
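A sketch of that combination, with hypothetical latency and in-flight numbers (the 50 ms cutoff, the backend names, and the fallback to the full pool are all illustrative choices):

```python
import random

# Hypothetical measured RTTs (ms) from this load balancer to each backend,
# and each backend's current in-flight request count.
latency = {"iad": 5, "ord": 18, "fra": 95, "nrt": 160, "syd": 190}
inflight = {"iad": 40, "ord": 12, "fra": 3, "nrt": 2, "syd": 1}

def pick_backend(max_latency_ms=50):
    # First restrict the candidate pool to geographically nearby backends...
    nearby = [b for b, ms in latency.items() if ms <= max_latency_ms]
    candidates = nearby if len(nearby) >= 2 else list(latency)
    # ...then apply the power of two random choices within that pool.
    a, b = random.sample(candidates, 2)
    return a if inflight[a] <= inflight[b] else b

print(pick_backend())  # 'ord': both candidates are nearby, ord is less loaded
```

The distant backends never enter the race, so low latency is preserved, while the two-choice comparison keeps load even among the nearby ones.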
Consistent hashing is used to always attach a request to the same host. It's the opposite of load balancing.
Load balancing algorithms (least connections, busyness, etc.) are used to distribute requests across servers as evenly as possible to maximize performance.
You want to minimize load on all servers, but you also want to pack things up efficiently (to minimize operational costs), and of course you want the benefits of caching, so you want requests from a session to land on the same node/server/box.
Basically a multi-dimensional optimization problem, completely solvable with constraints. Let the business people decide what's more important: latency, throughput, or low cost of operations.
This solves caching too since you are only ever receiving and caching user data on a single server. No cache communication required. You can enforce it on the server side for security as well.
Doesn't require a load balancer, just an extra line of code.
Keep it simple.
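The "extra line of code" here is presumably plain modulo sharding on the user id; a minimal sketch (the server names are made up):

```python
servers = ["app-0", "app-1", "app-2", "app-3"]

def server_for(user_id):
    # The same user id always maps to the same server, so that user's
    # cache stays warm and no cross-server cache traffic is needed.
    return servers[user_id % len(servers)]

print(server_for(42), server_for(43))  # app-2 app-3
```

The server side can enforce the same rule: reject any request for a user id that doesn't map to this server.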
Load balancing based on consistent hashing is the better way to implement this.
Short of something like Cassandra's ring topology, how would you use consistent hashing to add new servers and assign them requests?
user_id % num_servers may work early on, when user activity and uptake are consistent,
but what happens when user activity becomes more complex: an increase in users, some users abandoning the platform, others using it more? that complexity lacks a homogeneous distribution over the one property considered here: the user id.
what if, over time, you gain more users but the majority of people who drop the platform have user_id % num_servers == 2|11|13|17?
in this case you would have some servers working hard while others sit dormant.
what is the real distribution of the relation between activity and user_id over time? asymptotic (o)? similar to the prime numbers (i)? a gaussian distribution (ii)? a benford distribution (iii)?
whichever future data shows to be the best fit, most distributions show a strong trend toward eventually favoring some values,
which i think implies that, to ensure an even distribution of work across servers, the problem requires something with greater dimensionality than a modulo on an immutable value that is defined serially
(o) https://en.wikipedia.org/wiki/Asymptotic_analysis
(i) https://en.wikipedia.org/wiki/Prime_number_theorem
(ii) https://en.wikipedia.org/wiki/Probability_density_function
(iii) https://en.wikipedia.org/wiki/Benford%27s_law
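The churn scenario above is easy to simulate. A sketch with made-up numbers (20 servers, 100,000 users, and 90% of the users in the affected residue classes leaving):

```python
from collections import Counter

NUM_SERVERS = 20
# Start with 100,000 users, then model the churn described above:
# 90% of the users whose user_id % 20 lands in {2, 11, 13, 17} leave.
active = [uid for uid in range(100_000)
          if uid % NUM_SERVERS not in {2, 11, 13, 17}
          or (uid // NUM_SERVERS) % 10 == 0]

load = Counter(uid % NUM_SERVERS for uid in active)
# The four affected servers now sit nearly dormant (500 users each)
# while the other sixteen still carry 5,000 users each.
print(min(load.values()), max(load.values()))
```

Modulo sharding has no way to rebalance this: the mapping from user id to server is fixed regardless of how activity shifts.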
Anyway, I do have a point beyond being pedantic: this offers two advantages that a fixed sharding scheme doesn't. #1: it doesn't need to identify a piece of data on the request to shard off of. #2: it actively (though imperfectly) attempts to achieve similar utilization on every server.
In fact, we use consistent hashing when we accept requests, and two random choices when we deliver them to the apps. This works much better for _most_ of the apps we see. We're typically worried about cache data for a particular app. The app instances themselves, though, tend to be mostly stateless and disposable.
The other main problem is that it's not a consistent hash: if you grow the server pool, you typically need to reshard a lot of content.
(It's still useful in a pinch, but it helps to be aware of the tradeoffs.)
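A quick way to see the resharding cost is to count how many keys move when one server is added, under modulo sharding versus a bare one-point-per-server hash ring (a sketch with made-up keys and server names):

```python
import hashlib
from bisect import bisect

def h(s):
    # Map a string to a point on the ring [0, 2**32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

def ring_lookup(servers, key):
    # Clockwise walk to the first server point at or after the key's hash.
    ring = sorted((h(name), name) for name in servers)
    points = [p for p, _ in ring]
    return ring[bisect(points, h(key)) % len(ring)][1]

keys = [f"user-{i}" for i in range(10_000)]
old = [f"s{i}" for i in range(10)]
new = old + ["s10"]

# Modulo sharding: adding an 11th server remaps almost every key.
moved_mod = sum(h(k) % 10 != h(k) % 11 for k in keys)
# Consistent hashing: only keys on the arc claimed by the new server move.
moved_ring = sum(ring_lookup(old, k) != ring_lookup(new, k) for k in keys)
print(moved_mod, moved_ring)  # the ring moves far fewer keys
```

With modulo, roughly n/(n+1) of keys change owners when going from n to n+1 servers; with a ring, only about 1/(n+1) do.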
1) Θ(log n / log log n)
2) Θ(log log n)
It's hard to understand why this technique works so well without digging deep into the math. Roughly speaking, if you throw n balls into n bins at random, the maximum number of balls in any bin grows surprisingly quickly (because of the birthday paradox). However, if we allow ourselves to choose between two random bins instead of one, and put the ball in the one with the fewest balls in it, the maximum number of balls in any bin grows much more slowly (i.e., O(ln ln n)). Hence, that one extra random choice gets us surprisingly close to the optimal approach of comparing all bins (which would give us O(1)), without doing all that work.
1) Throw n balls into n bins, one bin chosen randomly for each ball
2) Throw n balls into n bins, two bins chosen randomly for each ball, always picking the bin with fewer balls in it
In both cases you will have n balls distributed over n bins in the end. But the number of balls in the largest bin will be different for the two processes above. In the first case the largest bin has more balls: O(log n / log log n) == O(log n). And the second case has just O(log log n) balls. So just adding an extra choice of bins made the expected largest bin exponentially smaller.
More rough intuition: if x of your bins are occupied, in the first case your next ball has x/n probability of queueing instead of finding an empty bin but in the second it's only (x/n)^2 chance to need to queue.
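The two processes are easy to simulate; a rough sketch (n = 100,000 and the fixed seed are arbitrary choices):

```python
import random

random.seed(0)       # fixed seed for repeatability
n = 100_000

# Process 1: each ball goes into one uniformly random bin.
one = [0] * n
for _ in range(n):
    one[random.randrange(n)] += 1

# Process 2: pick two random bins, put the ball in the emptier one.
two = [0] * n
for _ in range(n):
    a, b = random.randrange(n), random.randrange(n)
    two[a if two[a] <= two[b] else b] += 1

# Max load: roughly log n / log log n for process 1, log log n for process 2.
print(max(one), max(two))
```

Even at this modest n, the fullest bin under two choices is consistently smaller than under one choice.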
log(n) / log(log(n)) = log_x(n), where x = log(n) (wasn't sure how to write a logarithm base in a better way). So you get O(log_x(n)). In general the logarithm base doesn't matter for Big-O when it's a constant, but I'm not sure you can apply the same thing to a base of log(n).
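The change-of-base identity behind this is easy to check numerically (a quick sketch; n = 10^6 is an arbitrary test value):

```python
import math

n = 1_000_000.0
lhs = math.log(n) / math.log(math.log(n))   # log(n) / log(log(n))
rhs = math.log(n, math.log(n))              # log base log(n) of n
print(lhs, rhs)  # the two agree up to floating-point error
```

This only confirms the algebraic identity; it doesn't settle the Big-O question, since here the "base" log(n) grows with n rather than being a constant.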
I like 2Choice because it is not dependent on hash function design & is temporal, but I have a positive aversion to the 2^n hash distributions when it comes to data, specifically for distributed systems which need to flex up/down [1].