Mathemagical!
Problem with consistent hashing:
However, consistent hashing comes with its own problem: uneven distribution of requests. Because of its mathematical properties, when requests are uniformly distributed, consistent hashing only balances load about as well as choosing a random server for each request. And if some content is much more popular than the rest (as is usual on the internet), it can do even worse than that.
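One way to see the imbalance is to build a bare-bones hash ring, one point per server, and count where keys land (a sketch; the server names and request keys are made up):

```python
import hashlib
from bisect import bisect

def h(s):
    # Map a string to a point on the ring [0, 2**32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

servers = [f"server-{i}" for i in range(10)]
ring = sorted((h(name), name) for name in servers)
points = [p for p, _ in ring]

def lookup(key):
    # Walk clockwise to the first server point at or after the key's hash,
    # wrapping around at the end of the ring.
    return ring[bisect(points, h(key)) % len(ring)][1]

counts = {name: 0 for name in servers}
for k in range(10_000):
    counts[lookup(f"request-{k}")] += 1

# With one point per server, arc sizes (and so loads) vary widely,
# even though the 10,000 keys themselves are hashed uniformly.
print(sorted(counts.values()))
```

Real implementations mitigate (but don't eliminate) this by giving each server many virtual points on the ring.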
Problem with Power of 2 Load Balancing:
Why wasn’t there a way to say “use consistent hashing, but please don’t overload any
servers”? As early as August 2015, I had tried to come up with an algorithm based on
the power of two random choices that would do just that, but a bit of simulation said
that it didn’t work. Too many requests were sent to non-ideal servers to be worthwhile.
Instead, he used something called Consistent Hashing with Bounded Loads.[1]

[1] https://medium.com/vimeo-engineering-blog/improving-load-bal...
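A minimal sketch of the bounded-loads idea from [1] (the server names, the single-point ring construction, and the load factor c = 1.25 are illustrative choices, not Vimeo's actual implementation): walk the ring as usual, but skip any server that is already at its capacity cap.

```python
import hashlib
from bisect import bisect
from math import ceil

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

class BoundedLoadRing:
    """Sketch of Consistent Hashing with Bounded Loads: each server may
    hold at most c times the mean load; overloaded servers are skipped."""
    def __init__(self, servers, c=1.25):
        self.ring = sorted((h(name), name) for name in servers)
        self.points = [p for p, _ in self.ring]
        self.c = c                            # allowed factor above the mean
        self.load = {name: 0 for name in servers}

    def assign(self, key):
        total = sum(self.load.values()) + 1   # load including this request
        cap = ceil(self.c * total / len(self.load))
        i = bisect(self.points, h(key)) % len(self.ring)
        for step in range(len(self.ring)):
            # Walk clockwise past any server that is already at capacity.
            name = self.ring[(i + step) % len(self.ring)][1]
            if self.load[name] < cap:
                self.load[name] += 1
                return name
        raise RuntimeError("all servers at capacity")

ring = BoundedLoadRing([f"server-{i}" for i in range(10)])
for k in range(10_000):
    ring.assign(f"request-{k}")
print(max(ring.load.values()))  # never exceeds ceil(1.25 * 10000 / 10)
```

Keys still mostly land on their "home" server, so caches stay warm, but no server's load can exceed c times the mean.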
For what we're doing, we actually need to consider more than just load. Since our LBs are distributed globally, we also want to make sure we're sending requests to backends that are geographically near them.
We can do this by tracking latency between the load balancer and origin servers, then using it to restrict the candidate pool we're going to choose two from at random.
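A sketch of that combination, with hypothetical latency and in-flight numbers (the 50 ms cutoff, the backend names, and the fallback to the full pool are all illustrative choices):

```python
import random

# Hypothetical measured RTTs (ms) from this load balancer to each backend,
# and each backend's current in-flight request count.
latency = {"iad": 5, "ord": 18, "fra": 95, "nrt": 160, "syd": 190}
inflight = {"iad": 40, "ord": 12, "fra": 3, "nrt": 2, "syd": 1}

def pick_backend(max_latency_ms=50):
    # First restrict the candidate pool to geographically nearby backends...
    nearby = [b for b, ms in latency.items() if ms <= max_latency_ms]
    candidates = nearby if len(nearby) >= 2 else list(latency)
    # ...then apply the power of two random choices within that pool.
    a, b = random.sample(candidates, 2)
    return a if inflight[a] <= inflight[b] else b

print(pick_backend())  # 'ord': both candidates are nearby, ord is less loaded
```

The distant backends never enter the race, so low latency is preserved, while the two-choice comparison keeps load even among the nearby ones.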
Consistent hashing is used to always attach a request to the same host. It's the opposite of load balancing.
Load balancing algorithms (least connections, busyness, etc.) are used to distribute requests across servers as evenly as possible to maximize performance.
You want to minimize load on all servers, but you also want to pack things up efficiently (to minimize operational costs), and of course you want the benefits of caching, so you want requests from a session to land on the same node/server/box.
Basically a multi-dimensional optimization problem, completely solvable with constraints. Let the business people decide what's more important: latency, throughput, or low cost of operations.
This solves caching too since you are only ever receiving and caching user data on a single server. No cache communication required. You can enforce it on the server side for security as well.
Doesn't require a load balancer, just an extra line of code.
Keep it simple.
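The "extra line of code" here is presumably plain modulo sharding on the user id; a minimal sketch (the server names are made up):

```python
servers = ["app-0", "app-1", "app-2", "app-3"]

def server_for(user_id):
    # The same user id always maps to the same server, so that user's
    # cache stays warm and no cross-server cache traffic is needed.
    return servers[user_id % len(servers)]

print(server_for(42), server_for(43))  # app-2 app-3
```

The server side can enforce the same rule: reject any request for a user id that doesn't map to this server.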
Load balancing based on consistent hashing is the better way to implement this.
Short of something like Cassandra's ring topology, how would you use consistent hashing to add new servers and assign them requests?
user_id % num_servers may work early on, when user activity and uptake are consistent,
but what happens when user activity becomes more complex: an increase in users, some users abandoning the platform, others using it more? that complexity lacks a homogeneous distribution over the one property considered here: the user id.
what if, over time, you gain more users but the majority of people who drop the platform have user_id % num_servers == 2|11|13|17?
in this case you would have some servers working hard while others sit dormant.
what is the real distribution of the relation between activity and user_id over time? asymptotic (o)? similar to the prime numbers (i)? a gaussian distribution (ii)? a benford distribution (iii)?
whichever future data shows to be the best fit, most distributions show a strong trend toward eventually favoring some values,
which i think implies that, to ensure an even distribution of work across servers, the problem requires something with greater dimensionality than a modulo on an immutable value that is defined serially
(o) https://en.wikipedia.org/wiki/Asymptotic_analysis
(i) https://en.wikipedia.org/wiki/Prime_number_theorem
(ii) https://en.wikipedia.org/wiki/Probability_density_function
(iii) https://en.wikipedia.org/wiki/Benford%27s_law
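The churn scenario above is easy to simulate. A sketch with made-up numbers (20 servers, 100,000 users, and 90% of the users in the affected residue classes leaving):

```python
from collections import Counter

NUM_SERVERS = 20
# Start with 100,000 users, then model the churn described above:
# 90% of the users whose user_id % 20 lands in {2, 11, 13, 17} leave.
active = [uid for uid in range(100_000)
          if uid % NUM_SERVERS not in {2, 11, 13, 17}
          or (uid // NUM_SERVERS) % 10 == 0]

load = Counter(uid % NUM_SERVERS for uid in active)
# The four affected servers now sit nearly dormant (500 users each)
# while the other sixteen still carry 5,000 users each.
print(min(load.values()), max(load.values()))
```

Modulo sharding has no way to rebalance this: the mapping from user id to server is fixed regardless of how activity shifts.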
Anyway, I do have a point beyond being pedantic: this offers two advantages that a fixed sharding scheme doesn't. #1: it doesn't need to identify a piece of data on the request to shard off of. #2: it actively (though imperfectly) attempts to achieve similar utilization on every server.
In fact, we use consistent hashing when we accept requests, and two random choices when we deliver them to the apps. This works much better for _most_ of the apps we see. We're typically worried about cache data for a particular app. The app instances themselves, though, tend to be mostly stateless and disposable.
The other main problem is that it's not a consistent hash: if you grow the server pool, you typically need to reshard a lot of content.
(It's still useful in a pinch, but it helps to be aware of the tradeoffs.)
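A quick way to see the resharding cost is to count how many keys move when one server is added, under modulo sharding versus a bare one-point-per-server hash ring (a sketch with made-up keys and server names):

```python
import hashlib
from bisect import bisect

def h(s):
    # Map a string to a point on the ring [0, 2**32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

def ring_lookup(servers, key):
    # Clockwise walk to the first server point at or after the key's hash.
    ring = sorted((h(name), name) for name in servers)
    points = [p for p, _ in ring]
    return ring[bisect(points, h(key)) % len(ring)][1]

keys = [f"user-{i}" for i in range(10_000)]
old = [f"s{i}" for i in range(10)]
new = old + ["s10"]

# Modulo sharding: adding an 11th server remaps almost every key.
moved_mod = sum(h(k) % 10 != h(k) % 11 for k in keys)
# Consistent hashing: only keys on the arc claimed by the new server move.
moved_ring = sum(ring_lookup(old, k) != ring_lookup(new, k) for k in keys)
print(moved_mod, moved_ring)  # the ring moves far fewer keys
```

With modulo, roughly n/(n+1) of keys change owners when going from n to n+1 servers; with a ring, only about 1/(n+1) do.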
1) Θ(log n / log log n)
2) Θ(log log n)
It's hard to understand why this technique works so well without digging deep into the math. Roughly speaking, if you throw n balls into n bins at random, the maximum number of balls in any bin grows surprisingly quickly (because of the birthday paradox). However, if we allow ourselves to choose between two random bins instead of one, and put the ball in the one with the fewest balls in it, the maximum number of balls in any bin grows much more slowly (i.e., O(ln ln n)). Hence, that one extra random choice gets us surprisingly close to the optimal approach of comparing all bins (which would give us O(1)), without doing all that work.
1) Throw n balls into n bins, one bin chosen randomly for each ball
2) Throw n balls into n bins, two bins chosen randomly for each ball, always picking the bin with fewer balls in it
In both cases you will have n balls distributed over n bins in the end. But the number of balls in the largest bin will be different for the two processes above. In the first case the largest bin has more balls: O(log n / log log n) == O(log n). And the second case has just O(log log n) balls. So just adding an extra choice of bins made the expected largest bin exponentially smaller.
More rough intuition: if x of your bins are occupied, in the first case your next ball has x/n probability of queueing instead of finding an empty bin but in the second it's only (x/n)^2 chance to need to queue.
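The two processes are easy to simulate; a rough sketch (n = 100,000 and the fixed seed are arbitrary choices):

```python
import random

random.seed(0)       # fixed seed for repeatability
n = 100_000

# Process 1: each ball goes into one uniformly random bin.
one = [0] * n
for _ in range(n):
    one[random.randrange(n)] += 1

# Process 2: pick two random bins, put the ball in the emptier one.
two = [0] * n
for _ in range(n):
    a, b = random.randrange(n), random.randrange(n)
    two[a if two[a] <= two[b] else b] += 1

# Max load: roughly log n / log log n for process 1, log log n for process 2.
print(max(one), max(two))
```

Even at this modest n, the fullest bin under two choices is consistently smaller than under one choice.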
log(n) / log(log(n)) = log_x(n), where x = log(n) (wasn't sure how to write a logarithm base in a better way). So you get O(log_x(n)). In general the logarithm base doesn't matter for Big-O when it's a constant, but I'm not sure you can apply the same thing to a base of log(n).
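The change-of-base identity behind this is easy to check numerically (a quick sketch; n = 10^6 is an arbitrary test value):

```python
import math

n = 1_000_000.0
lhs = math.log(n) / math.log(math.log(n))   # log(n) / log(log(n))
rhs = math.log(n, math.log(n))              # log base log(n) of n
print(lhs, rhs)  # the two agree up to floating-point error
```

This only confirms the algebraic identity; it doesn't settle the Big-O question, since here the "base" log(n) grows with n rather than being a constant.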
I like 2Choice because it is not dependent on hash function design & is temporal, but I have a positive aversion to the 2^n hash distributions when it comes to data, specifically for distributed systems which need to flex up/down [1].