vruiz10y ago

Have they implemented IP failover already? I haven't heard anything. Having your own LB without being able to fail over the IP is not HA. If the LB goes down, so does your business.

brianwawok10y ago

What?

You use DNS failover and multiple load balancers.

FOO.COM A record -> 1.2.3.4, 1.2.3.5, 1.2.3.6

Then at 1.2.3.4, 1.2.3.5, and 1.2.3.6 you put a load balancer that splits load between all of your backend servers.

Any LB goes down, and DNS client retries will deal with it. If any backend server goes down, your LB will deal with it.

Using this pretty successfully at DigitalOcean right now. What is the downside? I guess client DNS retries take a few seconds, but for the rare case of a load balancer dying, that doesn't seem like a deal breaker.

gog10y ago

That is not how things work. Once your system resolves the DNS record it will keep using that record for a while (depending on the TTL of the record and other factors).

Your browser will also cache the result of the DNS lookup, and if that server goes down it will not try to do another DNS lookup for another host and your service will be unavailable.

It will also be unavailable for any new customer that gets the "faulty" IP address.

Specifying multiple DNS records will just cause your DNS server to use one of those, usually in a round robin fashion.

brianwawok10y ago

TTL does not matter because I am not adding or removing systems from my DNS record. Even during an outage, a request to my domain name will return both the broken and the working load balancers.

I am simply giving a list of servers that can answer a request. Clients know to keep trying till one works. (Which they all do. Try it!)
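
For readers who want to see the "keep trying till one works" behavior spelled out, here is a minimal Python sketch of a client walking a list of A records. It is only an illustration under assumptions: `connect_first_working` and its `resolve`/`connect` parameters are hypothetical names, and real browsers apply their own timeouts and caching, which this does not model.

```python
import socket


def connect_first_working(host, port, timeout=5.0, resolve=None, connect=None):
    """Try each address the name resolves to, in order, until one accepts.

    `resolve` and `connect` are hypothetical hooks so the loop can be
    exercised without a network; by default they use the stdlib resolver
    and a plain TCP connect.
    """
    resolve = resolve or (lambda h: socket.gethostbyname_ex(h)[2])
    connect = connect or (
        lambda ip: socket.create_connection((ip, port), timeout=timeout)
    )
    last_error = None
    for ip in resolve(host):
        try:
            return ip, connect(ip)
        except OSError as exc:
            # Dead or unreachable load balancer: fall through to the next record.
            last_error = exc
    raise ConnectionError(f"no address for {host} accepted a connection") from last_error
```

Python's own `socket.create_connection` already loops over every address returned for a hostname, so the explicit loop here is only to make the fallback visible. Note that each dead address costs up to `timeout` seconds before the next one is tried.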

ceejayoz10y ago

"Why is DNS failover not recommended?" http://serverfault.com/questions/60553/why-is-dns-failover-n...

Among other things, IE7 will pin IPs for 30 minutes, non-browser clients may have serious issues, etc.

brianwawok10y ago

Yah for sure, it is not perfect but it is pretty good.

In my use case, I don't support IE7 (it won't work at all on my SaaS app), and I only support browser clients.

I have simulated LB failures by killing nginx, and watched traffic flow over to the other LB without a big delay (in 30 seconds everyone was over).

Fancier IP failover is nice for sure, and would let some more enterprisey people in.. but for a lot of apps out there, DNS failover works great. Surprised from above how many people don't realize it exists or works so well (for so little effort).

nailer10y ago

> Among other things, IE7 will pin IPs for 30 minutes

What, regardless of TTL? That's gross.

waffle_ss10y ago

> I guess client DNS retries take a few seconds

Have you tested this in all the browsers? According to this ServerFault post[1], it could take minutes for an IP address to be considered "down" in Chromium before it cycles to the next one; Firefox apparently waits 20 seconds[2]. Those posts are dated 2011 but I can't imagine the behavior would've changed a whole lot since then. A user is not going to wait multiple minutes or even 20 seconds for a web page to render - it's effectively down.

IP failover with heartbeat or keepalived seems like a much better solution to me when feasible.

[1]: http://serverfault.com/a/328321/85897

[2]: https://bugzilla.mozilla.org/show_bug.cgi?id=641937
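
As a rough illustration of the heartbeat idea mentioned above, here is a toy Python model of a standby load balancer that claims a shared IP after several missed heartbeats. This is not how heartbeat or keepalived is actually implemented; `BackupNode`, `max_missed`, and the release-on-recovery behavior are all assumptions made for the sketch.

```python
class BackupNode:
    """Toy model of a standby LB that claims a shared IP after missed heartbeats."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # hypothetical threshold, not a keepalived setting
        self.missed = 0
        self.holds_ip = False

    def tick(self, heartbeat_received):
        """Call once per heartbeat interval; returns True while this node holds the IP."""
        if heartbeat_received:
            # Primary is alive: reset the counter and release the IP (assumed behavior).
            self.missed = 0
            self.holds_ip = False
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Primary presumed dead: take over the shared IP.
                self.holds_ip = True
        return self.holds_ip
```

The point of the model is that clients keep using the same IP throughout, so failover time is bounded by the heartbeat interval times `max_missed` rather than by per-browser connection timeouts.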

brianwawok10y ago

I have tested it in production, and seen traffic move! I am honestly not sure I ever had a customer access my site in Chromium, so that is not a deal breaker either way (assuming it wasn't also a Chrome bug).

Hacker News takes > 20 seconds to load all the time. You mash reload and go on with your life.

I think people get so hung up on "I must have the most optimal HA setup in the history of the world" that they end up having no HA, or spending thousands upon thousands of dollars to make some elaborate AWS Rube Goldberg device that lets you check off a bunch of HA boxes. I know a lot of people who did that, and their fancy AWS HA contraption totally failed in the real world because the entire US-EAST region went down and their operation depended on at least one availability zone of it working to stay up. Look how much effort Netflix puts into HA, and how many hours a year they are totally broken.

For each application, you have many competing desires. You can have an HA website or web application without using IP failover. IP failover is cool, but not without its own problems. Every solution has pros and cons. DNS round robin is not a bad solution for many classes of apps that want dead simple failover.

vruizOP10y ago

> Any LB goes down, and DNS client retries will deal with it.

How? How does the DNS client know that the IP no longer works? Do browsers today have this mechanism?

I'm not a network guy so perhaps I'm wrong but it's my understanding the problem with DNS load balancing is that you can not invalidate the TTL on the client.

brianwawok10y ago

It is up to the client. But all of the clients (browsers) out there do more or less the same thing.. they try the first DNS record.. if no response in ~30 seconds, try the second, and so on - going down the list.

TTL does not matter here because I am not yanking or adding to my DNS record. I am simply saying "Here are 3 servers.. try them in order until you find one that works".

In practice, two things help:

a) Most clients try them in order from top to bottom.

b) Most DNS servers (including DigitalOcean's) randomize the return order.

So if you do 2 DNS requests, the first will return 1.2.3.4, 1.2.3.5, 1.2.3.6, and the second will return 1.2.3.5, 1.2.3.6, 1.2.3.4.

This has the double benefit of splitting traffic more or less evenly between my load balancers, and dealing with things when one or more is dead.
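
The combination of a rotating answer and top-to-bottom clients can be sketched in a few lines of Python. This is a simplified model, assuming the resolver rotates the list by exactly one position per query (as in the example above) rather than shuffling it; `rotated_answers` and `first_choice_counts` are hypothetical helper names.

```python
from collections import Counter


def rotated_answers(records, queries):
    """Yield the record list as a rotating resolver might return it, one rotation per query."""
    for i in range(queries):
        shift = i % len(records)
        yield records[shift:] + records[:shift]


def first_choice_counts(records, queries, dead=()):
    """Count which IP each client lands on if it walks its answer top to bottom."""
    counts = Counter()
    for answer in rotated_answers(records, queries):
        for ip in answer:
            if ip not in dead:
                counts[ip] += 1
                break
    return counts
```

With all three load balancers up, 300 queries land 100 on each. Under this strict rotation, if 1.2.3.4 is dead its share goes entirely to the next address in the list (1.2.3.5 gets 200, 1.2.3.6 gets 100); a resolver that truly shuffles would spread the extra load more evenly.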

tobz10y ago

Your view is an accurate view. It takes the end user -- be it some sort of client, browser, or manual user retry -- to hit the other, alive IP(s). There's also the TTL of a bad record being dropped to consider.

You can simulate IP failover with something like Elastic Network Interfaces / Elastic IPs in AWS... it's just not going to be on the same level of speed as doing it in, say, your own rack in a datacenter. It's also subject to weirdness where you could have some sort of split brain, with nodes trying to take over interfaces in a loop. The health-checked "multiple load balancers behind a single DNS record" approach has flaws but also simplifies a lot of things.
