The phenomenon described by the author can lead to interesting social dynamics over time. The initial designer of a system understands the latency/utilization tradeoff and dimensions the system to be underutilized so as to meet latency goals. Then the system launches and is successful, so people start questioning the low utilization and apply pressure to increase it in order to reduce costs. Invariably latency goes up and customers complain. Customers escalate, and projects are started to reduce latency. People screw around at the margins, changing the number of threads etc, but the fundamental tradeoff cannot be avoided. Nobody is happy in the end. (Been through this cycle a few times already.)
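To make the tradeoff concrete, here's a quick sketch using the textbook M/M/1 queueing result, where mean time in system is service_time / (1 - utilization). The 10 ms service time is just an assumed number; the point is the shape of the curve, which is why "push utilization up, watch latency explode" keeps happening:

```python
# M/M/1 queue: mean latency = service_time / (1 - rho), rho = utilization.
service_time = 0.010  # 10 ms per request (assumed)

for rho in (0.3, 0.5, 0.7, 0.9, 0.95, 0.99):
    latency_ms = service_time / (1 - rho) * 1000
    print(f"utilization {rho:>4.0%}: mean latency {latency_ms:6.1f} ms")
```

Going from 50% to 90% utilization already quintuples mean latency, and the curve goes vertical as you approach 100%. No amount of thread-count tuning changes that.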
There's a great book called Principles of Product Development Flow. It carefully looks at the systems behind how things get built. Key to any good feedback loop is low latency. So if we want our software to get better for users over time, low latencies from idea to release are vital. But most software processes are tuned for keeping developers 100% busy (or more!), which drastically increases system latency. That latency means we get a gain in efficiency (as measured by how busy developers are) but a loss in how effective the system is (as determined by creation of user and business value).
As teams get pushed to utilize scarce, expensive developer resources to the max, they can also end up with huge latency for unanticipated requests. It's not always easy to justify keeping planned work well under a team's capacity, though, even when it leads to better overall outcomes at the end of the day.
As another abstract example that's completely disconnected from the real world: if we're running the world's capacity to make n95 masks at high utilization, it may take a while to be able to handle a sudden spike in demand.
But this:
> If you must use latency to measure efficiency, use mean (avg) latency. Yes, average latency
Not sure if I ever thought about it before, but after following the link[1] where OP talks more about it, they've convinced me. I definitely want mean latency at least in addition to the median, not the median alone.
There was an interesting article here not long ago that made the point that median is basically useless. If you load 5 resources on a page load, the odds of all of them being faster than the median (so it represents the user experience) is about 3%. You need a very high rank to get any useful information, probably with a number of 9s.
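A quick back-of-the-envelope check of that claim: if each of n independent fetches beats the p-th percentile with probability p, all of them do with probability p**n. The independence assumption is a simplification, but the numbers line up with the ~3% figure:

```python
# Probability that all n independent resource fetches beat the p-th percentile
# latency is p**n (assuming independence, which is a simplification).
n = 5
print(0.5 ** n)         # median: ~0.031, so only ~3% of page loads are "all fast"

# To have a 95% chance that every fetch beats the tracked percentile,
# each resource needs to hit p = 0.95 ** (1/n):
print(0.95 ** (1 / n))  # ~0.99, i.e. roughly a p99 per-resource target
```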
Problem is, sometimes system engineers do not know what to expect, but they still need to have a plan for this case.
The majority of computer systems do not deal with high utilization. As has been pointed out many times, computers are really fast these days, and many businesses could get through their entire lifetime on a single machine if the underlying software made efficient use of the hardware. And yet even at low utilization, we still see occasional high latency, often enough to frustrate users. Why is that? Because a lot of software these days is based on a design that intersperses low-latency operations with occasional high-latency ones. This shows up everywhere: garbage collection, disk and memory fragmentation, growable arrays, eventual consistency, soft deletions followed by actual hard deletions, etc.
What this article is advocating for is essentially an amortized analysis of throughput and latency, in which case you do have a nice and steady relationship between utilization and latency. But in a system which may never come close to full utilization of its underlying hardware resources (which is a large fraction of software running on modern hardware), this amortized analysis is not very valuable, because even at very low utilization you can still see very different latency distributions due to the aforementioned software design and whatever tweaks you make to it.
This is why many software systems don't care about the median or the mean latency, but about the p99 or p99.9 latency: there is a utilization-independent component to the statistical distribution of latency over time, and for the many software systems with low utilization of hardware resources, that component, not utilization, is the main determinant of the overall latency profile.
As an oversimplified example, suppose that your system is 10% utilized and that $BAD_THING (gc, or whatever) happens that effectively slows down the system by a factor of 10, at least temporarily. Your latency does not go up by 10x; it grows unbounded, because now your effective utilization is 100%.
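In queueing terms: slowing service by 10x multiplies utilization by 10, and the M/M/1 wait-time formula blows up the moment arrival rate reaches service rate. A minimal sketch of that arithmetic (the rates are made-up numbers matching the 10% example):

```python
def mm1_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda).
    Unbounded once arrival_rate >= service_rate."""
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)

mu, lam = 100.0, 10.0            # 10% utilized: latency ~11 ms
print(mm1_wait(lam, mu))
print(mm1_wait(lam, mu / 10.0))  # $BAD_THING makes service 10x slower:
                                 # rho hits 1.0, latency is unbounded (inf)
```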
>> If you must use latency to measure efficiency, use mean (avg) latency. Yes, average latency
What is wrong with measuring latency at the 99.99th percentile, with a clear guideline that optimising efficiency (in this article, higher utilisation) must not trade off against latency?
Because latency is part of user experience. And UX comes first before anything else.
Or does it imply that there are a lot of people who don't know the tradeoff between latency and utilisation? Because I don't know anyone who runs utilisation at 1, or even 0.5, in production.
But p99 is just one summary statistic. Most importantly, it's a robust statistic that rejects outliers. That's a very good thing in some cases. It's also a very bad thing if you care about throughput, because throughput is proportional to 1/latency, and if you reject the outliers then you'll overestimate throughput substantially.
p99 is one tool. A great and useful one, but not for every purpose.
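To illustrate how outlier rejection distorts throughput estimates, here's a toy example for a client issuing requests one at a time (so throughput really is count / total latency). The latency numbers are invented:

```python
# Hypothetical latencies (seconds) for a serial client: 99 fast requests
# plus one 1-second outlier.
latencies = [0.01] * 99 + [1.0]

true_throughput = len(latencies) / sum(latencies)    # ~50 req/s
trimmed = sorted(latencies)[:99]                     # p99-style view: drop the outlier
trimmed_throughput = len(trimmed) / sum(trimmed)     # 100 req/s

print(true_throughput, trimmed_throughput)
```

One outlier in a hundred requests, and the trimmed view overestimates throughput by roughly 2x.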
> Because I dont know anyone who has utilisation to 1 or even 0.5 in production.
Many real systems like to run much hotter than that. High utilization reduces costs, and reduces carbon footprint. Just running at low utilization is a reasonable solution for a lot of people in a lot of cases, but as margins get tighter and businesses get bigger, pushing on utilization can be really worthwhile.
- p50, to see the baseline
- p95, to see the most common latency peaks
- p99, to see what the "normal" waiting times under load were
- max, because that's what the most unfortunate customers experienced
In a typical distributed system the spread between p99 and max can be enormous, but the mindset of ensuring a smooth customer experience, with awareness that a real person had to wait that long, is exceptionally useful. You need just one slightly slower service for the worst-case latency to skyrocket. In particular, GraphQL is exceptionally bad at this without real discipline: the minimum request latency is dictated by the SLOWEST downstream service. To be fair, it was a real-time gambling operation, and we were operating within the first Nielsen threshold.
Also: bucketing these by request route was quite useful.
Let's take the median, which is also an order statistic, and a sequence of latency measurements: 0.005 s, 0.010 s, 3600 s. The median latency is 0.010 s, and that number does not tell you how bad latency can actually be. The mean latency is 1200.005 s, which is far more indicative of how bad the worst case is.
In other words, percentiles show how often a problem happens (or doesn't). Mean values show the impact of the problem.
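Running the three measurements from the comment above through the stdlib makes the contrast obvious:

```python
import statistics

samples = [0.005, 0.010, 3600.0]  # seconds, from the example above

print(statistics.median(samples))  # 0.01 -- the hour-long request vanishes
print(statistics.mean(samples))    # 1200.005 -- the outlier dominates the mean
```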
mean and all its variants won't show you a number that someone in your system necessarily experienced, but they will incorporate the entire distribution and show when it has changed.
in the context of efficiency, the mean is beneficial because it can be used to measure concurrency in a system via Little's law, and it will signal changes in your concurrency that a percentile metric won't necessarily catch.
In my experience, if you want monitoring (or measuring for performance) to provide any value whatsoever, you must measure multiple different aspects of the system all at once: percentiles, averages, load, responses, I/O, memory, etc.
The only time you would need a single metric would possibly be for alerting, and a good alert (IMHO) is one that triggers for impending doom, which the article states percentiles are good for. But I think alerts are outside of the scope of this article.
TLDR; Review of the article: `Duh`
> I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I lead engineering on AWS Lambda and our other serverless products. Before that, I worked on EC2 and EBS.