> And be warned: some will resist this and surrender to the convenience of their current mental context, betting they’ll “remember” how they did it. Time will make that bet age badly. It’s 2026 — other AI agents are already in execution loops, disciplined to code better than that.
Hard disagree: separating code from its context is exactly how you end up in the situation of needing to “remember”. Yes, helper functions and such can be useful for readability, but it's easy to overdo it and end up with incomprehensible ravioli code that does nothing terribly complicated in a terribly complicated manner.
This has given really good results in helping decide whether to extract these helper functions or not: they have to be memorable enough in name and arguments that the calling code can be understood without always diving in, and they also have to provide a meaningful compression of the logic so that it can be comprehended without jumping across many hundreds of lines.
I feel your pain. Everything is so convoluted that 7 layers down you ask yourself why you didn't learn anything useful...
Oh gee, thank you for this wrapped error result, let me try to solve a logic puzzle to see (a) where the hell it actually came from, and (b) how the hell we got there.
Am reminded also of a discussion of software engineering between John Ousterhout (of whom I'm a big fan) and Robert Martin[0][1].
Lots and lots of little components, but not in a way that actually makes anything easier to find.
When I worked on Firefox, we eventually had to remove a bunch of indirection (the interested can actually search bugzilla.mozilla.org for deCOMtamination for some instances of this), but that project wasn't a thing until there was clear evidence that there were problems with virtual function calls on hot paths.
- the application is profiled well enough to prove that some piece of code is on the hot path
- the developers are not doing a great job
The article describes a couple of straw men and even claims that they’re right in principle:
> Then someone on the team raises an eyebrow. “Isn’t that an extra function call? Indirection has a cost.” Another member quickly nods.
> They’re not wrong in principle.
But they are wrong in principle. There’s no excuse for this sort of misinformation. Anyone perpetuating it, including the blog author, clearly has no computer science education and shouldn’t be listened to, and should probably be sent to a reeducation camp somewhere to learn the basics of their profession.
Perhaps they don’t understand what a compiler does, I don’t know, but whatever it is, they need to be broken down and rebuilt from the ground up.
Also, it's pointer indirection in data structures that kills you, because uncached memory is brutally slow. Function calls to functions in the cache are normally a much smaller concern except for tiny functions in very hot loops.
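A minimal Rust sketch of that data-layout point (function names are made up for illustration): both loops compute the same sum, but the boxed version does a dependent pointer load per element, touching scattered heap allocations, while the flat version streams through one contiguous block that the prefetcher can keep in cache.

```rust
/// Sums a contiguous slice: elements sit next to each other in memory,
/// so the hardware prefetcher can stream them through the cache.
fn sum_flat(v: &[i64]) -> i64 {
    v.iter().sum()
}

/// Sums through one level of pointer indirection: each element is a
/// separate heap allocation, so every access is a dependent load that
/// may miss the cache.
fn sum_boxed(v: &[Box<i64>]) -> i64 {
    v.iter().map(|b| **b).sum()
}

fn main() {
    let flat: Vec<i64> = (0..1_000).collect();
    let boxed: Vec<Box<i64>> = (0..1_000).map(Box::new).collect();
    // Identical results; the difference is purely memory layout.
    assert_eq!(sum_flat(&flat), sum_boxed(&boxed));
}
```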
For non-async fns, the article already made this point:
> In release mode, with optimizations enabled, the compiler will often inline small extracted functions automatically. The two versions — inline and extracted — can produce identical assembly.
The article mixes together two distinct points in a rather muddled way. The first is a standard "premature optimization is the root of all evil" message, reminding us to profile the code before optimizing. The second is a reminder that async functions compile down to a state machine, so the optimization reasoning for sync functions doesn't apply.
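A quick way to see the state-machine point in a self-contained sketch (both functions are invented for illustration): calling an async fn executes no body code; it only constructs a value, and the size of that value reflects the state that must survive across an `.await`.

```rust
// No state is carried across an await, so the generated future is tiny.
async fn tiny() -> u8 {
    1
}

// The 64-byte buffer is live across the `.await`, so the compiler must
// store it inside the generated state machine.
async fn with_state() -> u64 {
    let buf = [1u64; 8];
    std::future::ready(()).await;
    buf.iter().sum()
}

fn main() {
    // Calling the fns runs nothing yet; it only builds the state machines.
    let small = std::mem::size_of_val(&tiny());
    let large = std::mem::size_of_val(&with_state());
    // The future holding state across a suspension point is bigger.
    assert!(large > small);
}
```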
[0] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...
Ideally using an allocator per request would solve this issue, but Rust has no real support for it.
One workaround is to stop using async and just use a native thread per request. But most crates and frameworks these days use async. So indeed the async abstraction is very leaky regarding its cost.
Async is the program telling the OS "I'll do it myself" about threading and context switches.
I think this type of confusion (or, more likely, people talking past one another in most cases) is a fairly common problem in discussing programming languages and specific implementations of concepts in a language. In this case, the perceived purpose of an abstraction, based on a particular “viewpoint”, leads to awkward discussions about those abstractions, their usefulness, and their semantics. I don’t know if there is a way to fix these sorts of things (even when someone is just reading a comment thread), but maybe pointing it out can serve to highlight when it happens.
Async in rust is done via cooperative scheduling. If you call await you enter a potential suspension point. You're willingly telling the scheduler you're done running and giving another task a chance to run. Compound that with something like tokio's work stealing and now you'll possibly have your task migrated to run on a different thread.
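To make "suspension point" concrete, here is a self-contained sketch using only std (no tokio; `YieldOnce` and `noop_waker` are made-up names): a hand-written future that returns `Pending` once, polled manually with a no-op waker. Every `.await` compiles down to this poll / `Pending` / resume dance, and the `Pending` return is exactly where a runtime like tokio is free to run, or steal, other tasks.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A future that suspends exactly once: the first poll returns
/// Pending (a suspension point), the second poll completes.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// A waker that does nothing, just enough to drive polls by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // The async block is a state machine; `.await` is where it can suspend.
    let mut fut = pin!(async {
        YieldOnce { yielded: false }.await; // suspension point
        42
    });
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Pending); // suspended at .await
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(42));
}
```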
If this is in a hot path, making another call to await is probably the worst thing you can do lol.
The author demonstrates later with a dead simple inlining example that the asm is equivalent. Wonder why he didn't try that with await ;)
If this function is in the hot path, the last thing you want to do is needlessly call await. You'll enter a suspension point and your task can get migrated to another thread. It is in no way comparable to the dead simple inlining example given later.
This is why you should always benchmark before making guesses, and double-check that you're even benchmarking the right thing. In this case they took the findings from a non-async benchmark and applied them to async. That will lead you to a very wrong conclusion, and to performance issues.
2. Please read the blog. That's literally what is said.
Dynamic dispatch in general is much, much faster than many people’s intuition seems to indicate. Your function doesn’t have to be doing much at all for the difference to become irrelevant. Where it matters is for inlining.
Dynamic dispatch in Rust is expected to be very slightly faster than in C++ (due to one fewer indirection, because Rust uses fat pointers instead of an object prefix).
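That layout difference is easy to observe in a small sketch (the `Speak`/`Dog` trait and type are invented for illustration): in Rust a `&dyn Trait` reference is a fat pointer carrying the data pointer and the vtable pointer side by side, so the object itself needs no vtable prefix.

```rust
use std::mem::size_of;

trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;

impl Speak for Dog {
    fn speak(&self) -> &'static str {
        "woof"
    }
}

fn main() {
    // A plain reference is one pointer wide; a trait-object reference is
    // two: (data pointer, vtable pointer). The object stays prefix-free.
    assert_eq!(size_of::<&Dog>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Speak>(), 2 * size_of::<usize>());

    // The dynamic call goes through the vtable but gives the same result.
    let d = Dog;
    let dynamic: &dyn Speak = &d;
    assert_eq!(dynamic.speak(), "woof");
}
```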
```
fn does_a_many_step_process() {
    let first_step_result_which_is_not_tied_to_details_internal_to_the_step_implementation =
        well_named_first_step_which_encapsulates_concerns();
    let second_step_result_in_same_manner =
        well_named_second_step_which_encapsulates_concerns();
    // ...etc
}
```

The logic of the process flow is essentially one kind of information. All the implementation details are another. Step functions should not hide further important steps - they should only hide hairy implementation details that other steps don't need to know about.
Other people prefer to have big blocks of code together in one place, and that's fine too. It just personally makes it harder for me to track stuff.
As for the context of the article, maintainability is almost always worth the cost of the function lookup. The proof here that the cost is almost non-existent means, to me, that maintainability is always worth the perceived (few-cycles) impact unless this is real-time code.
But the real cost is that, with a myriad of them, it's usually very difficult to get the cut right: not too small, not too big, and with a clear intent about what each one exactly does.
So nothing new: API design is hard, and naming things even more so.
This is an incomplete sentence:
> All cases where you are in a CPU intensive blocking task that, if you’re not careful, could starve all the others.