How much does Rust's bounds checking cost? (opens in new tab)

(blog.readyset.io)

204 pointsglittershark3y ago186 comments

186 comments

Jach3y ago

Always amuses me that it's current year and people think about turning off checks, even when they're pretty much free in modern* (since 1993 Pentium, which got like 80% accuracy with its primitive branch prediction?) CPUs...

"Around Easter 1961, a course on ALGOL 60 was offered … After the ALGOL course in Brighton, Roger Cook was driving me and my colleagues back to London when he suddenly asked, "Instead of designing a new language, why don't we just implement ALGOL60?" We all instantly agreed--in retrospect, a very lucky decision for me. But we knew we did not have the skill or experience at that time to implement the whole language, so I was commissioned to design a modest subset. In that design I adopted certain basic principles which I believe to be as valid today as they were then.

"(1) The first principle was security: The principle that every syntactically incorrect program should be rejected by the compiler and that every syntactically correct program should give a result or an error message that was predictable and comprehensible in terms of the source language program itself. Thus no core dumps should ever be necessary. It was logically impossible for any source language program to cause the computer to run wild, either at compile time or at run time. A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to -- they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

-Tony Hoare, 1980 Turing Award Lecture (https://www.cs.fsu.edu/~engelen/courses/COP4610/hoare.pdf)

titzer3y ago

> I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.

Here we are, 42 years later, and bounds checks are still not the default in some languages. Because performance, or something. And our computers are literally 1000x as fast as they were in 1980. So instead of paying 2% in bounds checks and getting a merge 980x faster, we get 2-3x more CVEs, costing the economy billions upon billions of dollars a year.

nine_k3y ago

Removing bounds checks is a stark example of a premature optimization.

You can remove bounds checks when you can prove that the index won't ever get out of bounds; this is possible in many cases, such as iteration with known bounds.

femto3y ago

Isn't this a job for the compiler? The default would be to have boundary checking, but if the compiler can prove that the index is always in range it can drop the boundary check. From the user's perspective, the boundary check is always there. Most vector operations would have provable boundaries.

Edit:

Based on the benchmark code linked to in dahfizz 's comment, it seems that Rust does the above for vectors, but there are situations where the bounds can't be proved, so boundary checking can't be removed. How common is this case in practice?

1 more reply

MaxBarraclough3y ago

This is the approach used by SPARK Ada. [0]

The norms there, from what I gather, are that you compile with runtime checks enabled unless you've used the SPARK prover tools to verify the absence of runtime errors, in which case you can safely disable runtime checks in your builds.

[0] https://docs.adacore.com/spark2014-docs/html/ug/en/usage_sce...

dataangel3y ago

They’re nowhere near free. Branch prediction table has finite entries, instruction cache has finite size, autovectorizing is broken by bounds checks, inlining (the most important optimization) doesn’t trigger if functions are too big because of the added bounds checking code, etc. This is just not great benchmarking — no effort to control for noise.

dahfizz3y ago

> autovectorizing is broken by bounds checks

This is the big one. You pay a 50% penalty for actual CPU bound, iteration heavy code with bounds checking enabled.

https://github.com/matklad/bounds-check-cost

zozbot2343y ago

The proper way of addressing that is to manually hoist bound checks out of "hot" loops. Not just remove them altogether.

camkego3y ago

This should be the article.

Running this with 1.65 on an Intel 12400 gets a nearly 4x speedup when bounds checking is not needed. Just wow.

Bounds checking avoidance is important when it becomes a significant chunk of your hot-path.

moloch-hai3y ago

For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

If the compiler can't do that by itself, a library should do it.

The real issue is whether the information about the true size of the memory region involved is available at the point where it is needed. This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

Rust claims responsibility for enforcing safety in the compiler, with libraries using "unsafe" to delegate some of that to themselves. Users then trust the compiler and libraries to get it right. In C++, the compiler provides base semantics while libraries take up the whole responsibility for safety. Users can trust libraries similarly as in Rust, to similar effect.

Modern C++ code typically does no visible operations with pointers at all, and most often does not index directly in arrays, preferring range notation, as in Rust, achieving correctness by construction. A correct program is implicitly a safe program.

varajelle3y ago

> This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

What catch up does Rust need to do?

Rust has slice that know the size of its data built in the language, while C++ doesn't. And Rust has stricter const and mutability rules that facilitates optimizations.

As for the implementation, Rust use LLVM which is also the backend used by one of the popular C++ compiler.

1 more reply

nextaccountic3y ago

> For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

LLVM sometimes does this, but when it doesn't, you may insert asserts to guide the optimizer, as explained here https://news.ycombinator.com/item?id=33808853

I think this technique works in C and C++ too (if you use clang or gcc)

1 more reply

pjmlp3y ago

Unfortunely I only see Modern C++ on C++ conference talks and on my hobby projects.

Most of the stuff I see at work, is quite far from this ideal reality, starting with Android's codebase, or the various ways C++ gets used in Microsoft frameworks.

1 more reply

kelnos3y ago

At the risk of moving the goalposts: so what? The vast majority of applications running out there would not be impacted meaningfully in the least by taking that performance hit.

Bounds checking should be the default, and then only when someone has proved through benchmarking and profiling that it's actually a problem for their application, should they even consider turning it off.

imtringued3y ago

Bounds checks are the easiest type of code to branch predict. You just assume they never trigger, suddenly you have a 99.99% hit rate on them. When they trigger you don't care about the branch misprediction at all because the program is already busted and security is more important.

TylerE3y ago

Yeah, I didn’t find it compelling either.

If your conclusion is “no signal, just noise” boost the input until the signal becomes apparent. If that means writing such a massive loop that the program takes an hour to run, fine.

fear913y ago

I have removed signed-integer based values bounds checking in a compiler once and before I noticed, I got a nice 3.8% performance gain in a large diverse benchmarking suite. While not expensive, bounds checks are certainly not free.

kelnos3y ago

And that's the thing, really. Sure, they're not free, but they're pretty cheap. In a project like OpenSSL, if C had bounds checking, it should be enabled, all the time. Sure, disable it for some random bit of custom high-perf software where you really need to eke out the last few percent of performance. But for 99% of everything else, leave the bounds checking turned on.

Kamq3y ago

> Sure, disable it for some random bit of custom high-perf software where you really need to eke out the last few percent of performance.

And that's the rub, isn't it? Getting that last bit of perf in the inner hot loop has traditionally often required people to write the entire thing in languages that are unsafe (ffi overhead often being enough that you can't just wrap up the hot loop and call the bit that needs to be performant from elsewhere).

I wonder how much good interop could get us.

netr0ute3y ago

I don't know, because if you're writing programs for things like IoT or embedded, then you're dealing with nothing but wimpy little low power processors where removing a bounds check gives you huge increases. Then, it makes no sense to have checks by default if you're going to get rid of them anyway.

2 more replies

userbinator3y ago

If you want to experience what using a "safe" language looks like, complete with bounds checking amongst other things, there's the JavaScript ecosystem.

...which has evolved its own, much worse, horrors instead.

Before that, there was Java.

Do not want.

jackmott423y ago

Occasionally small changes like this will result in bigger than expected performance improvements. An example of this happened once with C#, when two very tiny changes, each of which were borderline measurable, combined they made a big difference.

IIRC it was in the List.Add method, a very commonly used function in the C# core libs. First one programmer refactored it to very slightly reduce how many instructions were output when compiled. Then a second programmer working on the jit compiler optimizations which also affected this Add method making it a little smaller as well.

Alone, each change was hard to even measure, but seemed like they should be a net win at least in theory. Combined, the two changes made the Add method small enough to be an in-lining candidate! Which meant in real programs sometimes very measurable performance improvements result.

As others in this post have noted, a removed bounds check might also unblock vectorization optimizations in a few cases. One might be able to construct a test case where removing the check speeds thing up by a factor of 16!

brundolf3y ago

One thing that I assume reduces this problem even further is the prevalent use of iterators in Rust. I almost never index an array or vector directly, which means it's impossible for me to use an out of bounds index, and I'd be really surprised if rustc and/or LLVM don't somehow take advantage of that fact (maybe just through unchecked indexing in the standard library's iterator functions)

oconnor6633y ago

I think LLVM is smart enough to optimize regular `for i in 0..stuff.len()` loops into the same assembly as `for i in &stuff` loops in almost all cases. I imagine this sort of "we can tell that i is always less than len" optimization is a big contributor to the low cost of bounds checks. In some large portion of cases where they're not needed, the optimizer can already see that.

ridiculous_fish3y ago

IME loops with induction variables (integer indexes) often produces better codegen than with iterators. Compare the these two Rust functions for inverting bits: https://rust.godbolt.org/z/cE4vPdbdY

This got improved in Rust 1.65 just this month, but the point stands.

edit: ARM64 compilation is even sillier. https://rust.godbolt.org/z/PEsbeGxWP

akshaykarthik3y ago

Interestingly enough, with `-C opt-level=3` both functions yield the same assembly.

I wonder if there's some pass that's missing or not done at the lower opt level.

1 more reply

brundolf3y ago

Yeah, I wondered about those too. Less straightforward than pure iterator usage but still plausible

tialaramex3y ago

Yes, for example the relatively modern (didn't exist two years ago) implementation of IntoIterator for arrays themselves, gives you an iterator which doesn't use bounds checks since it is going to give exactly each of the things in the array once and it knows exactly how many of them there are.

est313y ago

It will still implement that with a loop over the array where there is a (bounds) check for the integer when it's being incremented (the i < len in int i; for(i = 0; i < len; i++)). That's no different from a loop over a slice, both of which eliminate the bounds check in the body of the loop (the one in list[i]). Arrays do have an advantage however, compilers can see their size so if you have an array (or array reference) and index it with a constant, then that bounds check will be eliminated. Also, array references are cheaper than slices because slices always contain the length.

nindalf3y ago

> Arrays do have an advantage however, compilers can see their size

Wouldn’t vectors have the same advantage in Rust? If we’re iterating over a vector, it’s proveable at compile time that the length is not being modified during the iteration.

2 more replies

winrid3y ago

IIRC the Rust in Action book claimed iterators usually omit bounds checks, but if you manually index into the array then it's more likely the bounds check won't be optimized out.

Animats3y ago

"It seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible."

Right. The myth that bounds checking is expensive may have come from some terrible compilers in the early days. Berkeley Pascal was a notable example. Each bounds check was a subroutine call.

The common cases for bounds checks are:

- It's in an inner loop iterating over arrays. That's the case where the highest percentage of the time goes into the bounds check. It's also the case likely to be optimized out. This is the case people worry about.

- It's in code that doesn't do a lot of subscript operations. So it doesn't matter.

- It's in non-optimizable code that does a lot of subscript operations. That's unusual, but it does come up. An modern case might be Unreal Engine's Nanite meshes, which have lots of small offsets within the data stream. On the other hand, if you don't check that stuff, it's a great attack vector.

moloch-hai3y ago

Each check burns a branch prediction slot, even if it always goes the same way. That may eject a branch predictor whose prediction matters.

imtringued3y ago

Then it sounds like our branch predictors are shit if they can't deal with simple things like this.

saagarjha3y ago

This is exactly what they are designed to do and they do their job well, but they can't do it for free.

1 more reply

rfoo3y ago

A consistent 5 ms difference in micro-benchmarks is definitely not "measurement noise". Noise averages out way before accumulating to 5ms. There must be a reason and it mostly likely relates to the change. So you can confidently say that removing bounds checking (at least with how you did it) is a regression.

... that being said, I'd argue that the most beneficial memory-safety feature of Rust is about temporal things (i.e. prevents UAF etc) instead of spatial ones.

whatshisface3y ago

Well, there is both random and systemic error in any experiment, and if 5ms is small relative to anything you'd expect (or there is some other reason to discount it) then it might be related to a problem in the benchmarking setup that's too small to be worth resolving. Any test is good to within some level of accuracy and they don't always average out to infinitely good if you rerun them enough times.

joosters3y ago

The 5ms isn't the key number. It's 5ms extra over a 28ms baseline, that's about 18% difference. If your noise threshold is 18%, then I think you have to accept that the benchmark probably isn't any good for this stated task.

viraptor3y ago

https://github.com/bheisler/criterion.rs is good for tests like that. It will give you much more than a single number and handle things like outliers. This makes identifying noisy tests simpler.

1 more reply

spullara3y ago

A benchmarking harness without error bars?

dataangel3y ago

It could be noise in a benchmark that does IO, or has determinism problems.

rfoo3y ago

Here is the benchmark output from the post:

  news_app/ranges_and_joins/cached
     time:   [28.583 ms 29.001 ms 29.526 ms]
     thrpt:  [277.45 Kelem/s 282.48 Kelem/s 286.61 Kelem/s]

  news_app/ranges_and_joins/cached
     time:   [33.271 ms 33.836 ms 34.418 ms]
     thrpt:  [238.01 Kelem/s 242.11 Kelem/s 246.22 Kelem/s]

Given that 33.836/(1000/(242.11*1000)) ~= 8192, my understanding is the time reported here is how long it takes to do 8192 queries. Also it reports three metrics (should be min, median and max). All these means the benchmark harness did run the test for a lot of times and the 5 ms different is not random at all.

dathinab3y ago

Anything between nothing and one most likely correct branch predicted to _not_ jump "branch iff int/pointer > int/pointer".

This kind of bounds check are normally not ever violated (in well formed code) so branch prediction predicts them correctly nearly always.

It also is (normally) just jumping in the bad case, which means with a correct branch predictions thy can be really cheap.

And then cpu "magic" tends to be optimized for that kind of checks at they appear in a lot of languages (e.g. Java).

Then in many cases the compiler can eliminate the checks partially.

For example any many kinds of for-each element iterations the compiler can infer that the result of the conditionally loop continuation check implies the bounds check. Combine that with loop unrolling which can reduce the number of continuation checks and you might end up with even less.

Also bounds checks tend to be an emergency guard, so you tend to sometimes do checks yourself before indexing and the compiler can often use that to eliminate the bounds check.

And even if you ignore all optimizations it's (assuming in bounds) "just" at most one int/pointer cmp (cheap) followed by a conditional branch which doesn't branch (cheap).

ridiculous_fish3y ago

Branches add control flow which can inhibit other optimizations, such as vectorization. Compare the codegen of these two functions to double the first 64 elements in a u8 slice: https://rust.godbolt.org/z/hccWGv889

The unchecked version is fully unrolled and vectorized using multiple registers. The checked version must use a loop.

Part of what's going on here is that panics are "recoverable." If the out-of-bounds write occurs at index 61, this will panic, but the writes to lower indexes must have gone through. This means the panic cannot be hoisted out of the loop.

mastax3y ago

One technique is to add asserts before a block of code to hoist the checks out. The compiler is usually smart enough to know which conditions have already been checked. Here's a simple example: https://rust.godbolt.org/z/GPMcYd371

This can make a big difference if you can hoist bounds checks out of an inner loop. You get the performance without adding any unsafe {}.

sltkr3y ago

Funnily there is an off-by-one error in your example; if you fix it, the generated assembly is even more efficient.

mastax3y ago

Hah! There's one reason not to switch everything to `unsafe get_unchecked()`.

est313y ago

Yeah this is because the error message printed contains the location of the error as well as the attempted index. Thus, there are differences between the bounds failures and the optimizer can't hoist the check out (plus probably some concerns due to side effects of opaque functions).

mastax3y ago

I wonder if there could be a flag to tell the compiler that you don't care about getting the exact distinct panic message for each bounds check, please optimize it. I suppose the assert is a flag, in a way, but I mean something more global and automatic. Maybe the compiler could emit a single shared basic block per function that just says "out of bounds access in function foo".

We've learned to accept that when you turn on optimizations, you lose some lines and variables from your debug info. This is a pretty similar trade-off.

Arnavion3y ago

You can use a custom `#[panic_handler]` item that ignores its `PanicInfo` arg and just aborts. The optimizer should notice that it doesn't need to bother with unique messages. However currently this requires either being a no_std program or compiling libstd without its handler, since otherwise its handler will conflict with yours.

Although, if one is building their own libstd anyway, then I believe compiling with `--build-std-features panic_immediate_abort` should also have the same effect.

kibwen3y ago

Hm? The existence of panic messages doesn't preclude the optimizer from hoisting a bounds check out of the loop.

est313y ago

Actually that made me think... you might be right. I just saw this opinion in earlier threads and repeated it but upon second inspection I either remembered it wrongly, or it is a wrong theory.

I think the issue is more the side effects than the panic message. I have tried making a side effect free loop like for i in 0..44 { v[i]; } but it compiled down to a "if array length is larger than limit X, then call panic_bounds_check". On the other hand, if you replace the v[i] with soon-stable black_box(v[i]), you see that the loop remains. It doesn't know what black_box is doing so it has to run the code. The optimizer is in this case very happy if you have a check before the loop.

https://rust.godbolt.org/z/5x46edeTb

Someone3y ago

But Rust doesn’t have a spec (https://doc.rust-lang.org/reference/ gets closest, but explicitly states “Rust compilers, including rustc, will perform optimizations. The reference does not specify what optimizations are allowed or disallowed” and “this book is not normative”), so it doesn’t promise what kind of error you’ll get or when.

I would think a Rust compiler could hoist the check outside of the loop at least sometimes (it might not want to do it if the loop made changes that are visible from outside the loop, such as changing a volatile variable or doing I/O)

tialaramex3y ago

Rust doesn't have "volatile variables" that's a weird C thing which then ends up in C++ and related languages because nobody wants to touch this mess.

The purpose of "volatile" is to mark MMIO so that the memory reads and writes don't get optimised out because they actually perform I/O. Everywhere you see volatile abused to do something else (yes including Unix signal handlers) that's because C has a hammer and so now everything looks like a nail to C programmers. In a few cases this abuse is guaranteed to work (and results in a lot of heavy lifting elsewhere to achieve that) in most cases it's just luck.

Rust has generic intrinsics which perform memory read/write that won't get optimised out, which is the actual thing C needed when volatile was invented.

dahfizz3y ago

This is a pretty unsatisfying benchmark. Can we pin the thread to a core and re-run a few times to de-noise? And how about using an actual CPU bound program? Even a significant speedup in the code will be lost in an application like this where you spend so much time in I/O.

constantcrying3y ago

I imagine the reason bounds check are cheap is because of the branch predictor. If you always predict the in bounds path, the check is almost free.

You also do not really care about flushing the pipe on an out of bounds index, since very likely normal operations can not go on and you move over to handling/reporting the error, which likely has no need for significant throughput.

Also I would just like to note that safe arrays aren't a unique rust feature. Even writing your own in C++ is not hard.

int_19h3y ago

It's not hard, but when the idiomatically used containers aren't bounds-checked, most code out in the wild won't be, either. Worse yet if you are writing a library and have to interop with other code which will also use those idiomatic types.

These days, C++ really should be compiled with bounds-checked indexing and iterators by default. Unfortunately, this is still not a scenario that is well-supported by tooling.

pjmlp3y ago

On VC++ it is quite easy to do so,

https://learn.microsoft.com/en-us/cpp/standard-library/check...

The hard part is changing the mentality from whoever sits at the keyboard.

int_19h3y ago

FWIW it looks like they're planning to kill it off for release builds: https://github.com/microsoft/STL/issues/277

In my experience with them, the performance hit is far more substantial that a few percent, except in situations where the compiler can elide the checks altogether. For example, simply iterating over std::vector with _ITERATOR_DEBUG_LEVEL=1 is twice as slow if you work with iterators explicitly instead of writing it as a range-for.

And I'm not sure if it can be substantially better, given that C++ as designed simply needs to do more checks to ensure validity - to catch cases like comparing iterators belonging to different containers, or iterators getting invalidated when containers get resized or when the corresponding element is deleted outright. This all can't be done with simple ranges or slices, which is why VC checked iterators maintain a reference to the parent container.

1 more reply

jackmott423y ago

Yep, unless your code is wrong, the bounds check will always be predictable. Which makes it free in a sense. But sometimes it will block other optimizations, and it takes up space in the caches.

pantalaimon3y ago

That would be bad on Embedded where MCUs usually don’t do any branch prediction.

dathinab3y ago

> If you always predict the in bounds path, the check is almost free.

Note that you often only branch in the "bad" case, which means even on systems without branch prediction it tends to be not very expensive, and compilers can also eliminate a lot of bounds checks.

pjmlp3y ago

Yeah, but it only works in practice if you are only one coding the application, or there are strict rules in place to not do C style arrays.

moloch-hai3y ago

The instructions generated make a big difference. Modern processor specifications commonly quote how many instructions of a type can be "retired" in a cycle. They can retire lots of conditional branches at once, or branches and other ops, when the branches are not taken.

So it matters whether the code generator produces dead branches that can be retired cheaply. Probably, optimizers take this into account for built-in operations, but they know less about the happy path in libraries.

This is a motivation for the "likely" annotations compilers support. The likely path can then be made the one where the branch is not taken. Code on the unhappy path can be stuck off in some other cache line, or even another MMU page, never fetched in normal operation.

The cost seen here is likely from something else, though. Keeping array size in a register costs register pressure, or comparing to a stack word uses up cache bandwidth. Doing the comparison burns an ALU unit, and propagating the result to a branch instruction via the status register constrains instruction order.

Even those might not be at fault, because they might not add any extra cycles. Modern processors spend most of their time waiting for words from memory: just a few cycles for L1 cache, many more for L2 or L3, an eternity for actual RAM. They can get a fair bit done when everything fits in registers and L1 cache, and loops fit in the micro-op cache. Blow any of those, and performance goes to hell. So depending how close your code is to such an edge, extra operations might have zero effect, or might tank you.

Results of measurements don't generalize. Change something that looks like it ought to make no difference, and your performance goes up or down by 25%. In that sense, the 10% seen here is noise just because it is hard to know what might earn or cost you 10%.

dathinab3y ago

In rust there is `#[cold]` for functions as well as (nightly only) `likely(cond)`/`unlikely(cond)` and some tricks you can have something similar in stable rust.

Also branch paths which lead guaranteed to an panic tend to be treated as "unlikely" but not sure how far this is guaranteed.

bjourne3y ago

The reason performance decreased when he removed bounds checking is because asserting bounds is very useful to a compiler. Essentially, the compiler emits code like this:

    1. if (x >= 0) && (x < arr_len(arr))
    2.    get element from array index x
    3. else 
    4.     throw exception
    5. do more stuff

The compiler deduces that at line 5 0 <= x < arr_len(arr). From that it can deduce that abs(x) is a no op, that 2*x won't overflow (cause arrays can only have 2^32 elements), etc. Without bounds checking the compiler emits:

    1. get element from array index x
    2. do more stuff

So the compiler doesn't know anything about x, which is bad. The solution which apparently is not implemented in Rust (or LLVM, idk) is to emit code like the following:

    1. assert that 0 <= x < arr_len(arr)
    2. get element from array index x
    3. do more stuff

est313y ago

Interesting observation. So one should instead do the comparison with something like:

    1. if (x >= 0) && (x < arr_len(arr))
    2.    get element from array index x
    3. else 
    4.     core::hint::unreachable_unchecked
    5. do more stuff

Where unreachable_unchecked transmits precisely such information to the optimizer: https://doc.rust-lang.org/stable/std/hint/fn.unreachable_unc...

CrendKing3y ago

If what you said is true, then this is not Rust specific, and we should observe performance improvement in all languages, should they introduce bounds-checking during code emission. Is there compiler flag that we can turn on bounds-checking in GCC for C++ programs? Is there research to compare performance before and after that flag?

kibwen3y ago

> The solution which apparently is not implemented in Rust (or LLVM, idk) is to emit code like the following:

The solution to what? You appear to be suggesting that compilers should insert bounds checks even when the programmer doesn't ask for them, which, sure, I'm down for that, but the whole point of this discussion is that the bounds checking that currently gets done has a negligible cost in practice.

bjourne3y ago

Maybe I wasn't clear. The compiler should compile code without bounds checking as if bounds checking occur. The likely reason for the slowdown reported in the blog post is that Rust doesn't do this, which may be a compiler bug.

vore3y ago

I'm not sure I follow: where is abs(x)?

layer83y ago

It’s an example of what could occur within “do more stuff”. The mentioned 2*x is another example.

killingtime743y ago

Can someone smarter than me enlighten me when you would consider disabling bounds checking for performance? In ways the compiler is not already doing so? The article starts with a bug that would have been prevented by bounds checking. It's like talking about how much faster a car would go if it didn't have to carry the extra weight of brakes.

returningfory23y ago

I think the point of the article is the other way around: when starting from a language like C that doesn't have bound checking, moving to Rust will involve adding bounds checks and then an argument will be made that this will regress performance. So to test that hypothesis you start with the safe Rust code, and then remove the bounds check to emulate what the C code might be like. If, as in the article, you find that performance is not really affected, then it makes a C-to-Rust migration argument more compelling.

ablob3y ago

Making the migration in order to find out if it was worth it appears to be quite an expensive test of an hypothesis.

emn133y ago

What's the alternative, really?

emn133y ago

Sometimes the extra speed can be relevant. Knowing what the upside _can_ be can help inform the choice whether it's worth it.

Secondly, even assuming you want runtime bounds checking everywhere, then this is still a useful analysis because if you learn that bounds-checking has no relevant overhead - great! No need to look at that if you need to optimize. But if you learn that it _does_ have an overhead, then you have the knowledge to guide your next choices - is it enough to be worth spending any attention on? If you want the safety, perhaps there's specific code paths you can restructure to make it easier for the compiler to elide the checks, or the branch predictor to make em smaller? Perhaps you can do fewer indexing operations altogether? Or perhaps there's some very specific small hot-path you feel you can make an exception for; use bounds-checking 99% of the time, but not in that spot? All of these avenues are only worth even exploring if there's anything to gain here in the first place.

And then there's the simple fact that having a good intuition for where machines spend their time makes it easier to write performant code right off the bat, and it makes it easier to guess where to look first when you're trying to eek out better perf.

Even if you like or even need a technique like bounds checking, knowing the typical overheads can be useful.

d265f2783y ago

I've seen bounds checks being compiled to a single integer comparison followed by a jump (on x86 at least). This should have a negligible performance impact for most programs running on a modern, parallel CPU. However, for highly optimized programs that constantly saturate all processor instruction ports, bounds checks might of course become a bottleneck.

I think the most preferable solution (although not always possible) would be to use iterators as much as possible. This would allow rustc to "know" the entire range of possible indexes used at runtime, which makes runtime bounds checking redundant.

Some old benchmarks here: https://parallel-rust-cpp.github.io/v0.html#rustc

jackmott423y ago

In Rust you can usually use iterators to avoid bounds checks. This is idiomatic and the fast way to do things, so usually when using Rust you don't worry about this at all.

But, occasionally, you have some loop that can't be done with an iterator, AND its part of a tight loop where removing a single conditional jump matters to you. When that matters it is a real easy thing to use an unsafe block to index into the array without the check. The good news is then at least in your 1 million line program, the unsafe parts are only a few lines that you are responsible for being sure are correct.

pornel3y ago

In tight numeric code that benefits from autovectorization.

Bound checks prevent some optimizations, since they're a branch with a significant side effect that the compiler must preserve.

heleninboodler3y ago

I don't really see this as a person who wants to turn off the bounds checking for any real reason, but as someone who just wants to have an idea of what the cost of that bounds checking is.

pitaj3y ago

Sometimes the programmer can prove that bounds checks are unnecessary in a certain situation, but the compiler can't prove that itself, and the programmer can't communicate that proof to the compiler. Bounds checks can result in lost performance in some cases (very tight loops), so unsafe facilities exist as a workaround (like `get_unchecked`).

masklinn3y ago

> It's like talking about how much faster a car would go if it didn't have to carry the extra weight of brakes.

And there’s folks who do exactly that.

constantcrying3y ago

>Can someone smarter than me enlighten me when you would consider disabling bounds checking for performance?

Because it is faster. Worst case you are triggering a branch miss, which is quite expensive.

>It's like talking about how much faster a car would go if it didn't have to carry the extra weight of brakes.

So? Every wasted CPU cycle costs money and energy. Especially for very high performance applications these costs can be very high. Not every car needs brakes, if it doesn't need to stop by itself and crashing hurts nobody they are just waste.

dathinab3y ago

In some very hot code (most times loops with math stuff were it prevents some optimizations and/or the compiler fails to eliminate it) it can lead to relevant performance improvements.

Because of this you might find some rust code which opts out of bounds check for such code using unsafe code.

But this code tends to be fairly limited and often encapsulated into libraries.

So I agree that for most code doing so is just plain stupid. In turn I believe doing it on a program or compilation unit level (instead of a case by case basis) is (nearly) always a bad idea.

tragomaskhalos3y ago

I would expect an iterator into a slice to not incur any bounds checking, as the compiler can deduce the end pointer one time as start + size. So idiomatic looping should be maximally efficient you'd hope.

lmkg3y ago

The compiler shouldn't have to deduce anything, an Iterator shouldn't have a bounds check to begin with. It ought to be using unsafe operations under the hood, because it can guarantee they will only be called with valid arguments.

tylerhou3y ago

Safe iterators have to have bounds checks for dynamically sized arrays; otherwise, how you prevent iterators from walking past the end?

apendleton3y ago

In Rust at least, once you instantiate the iterator, the array it's iterating over can't be mutated until the iterator is dropped, and that can be statically guaranteed at compile time. So you don't need to bounds-check at every access; you can decide at the outset how many iterations there are going to be, and doing that number of iterations will be known not to walk past the end.

1 more reply

meindnoch3y ago

Calling ‘next’ on an iterator involves a bounds check.

tester7563y ago

I've been shocked when I've heard C programmers being actually concerned about performance penalty of checks

like, why bother? CPUs in next 2 years will win that performance anyway

and your software will be safer

Gigachad3y ago

Most online debates are filled with illogical opinions on theoretical issues. You get people on this site complaining that they have to spend money on a catalytic converter because it's not required for the car to run and only prevents other people from getting cancer.

flohofwoe3y ago

For proper bounds checking in C you first need to communicate the "bounds" to be "checked" to all the places where it matters, just a pointer isn't enough. Unfortunately many old-school C APIs (including the stdlib) often don't pass pointer-size pairs around, but just pointers (and IMHO the biggest problem in the C world is not so much the language, but outdated APIs like the C stdlib or POSIX which have mostly been designed in the K&R era and which basically "encourage" unsafe usage).

Other then that, I doubt that any reasonably pragmatic and experienced C programmer will ever argue against runtime bounds checking from a performance point of view. Even in hot loops one can usually move the bounds checking to a place outside the loop.

pjmlp3y ago

My experience in some gamedev and embedded development circles is that there are several that will religiously argue against it, without ever having run a profiler to validate their perception of the world.

flohofwoe3y ago

FWIW, I'm also coming out of the game dev world, and we shipped all our games with (custom) asserts enabled (which includes bounds checking in containers). I did regular performance tests and the enabled asserts were never a big enough performance hit to justify removing them (around 1..3% across an entire frame), the ability to get 'clean' crash reports from out in the wild, triggered by asserts instead of random segfaults was more than worth it (we also had special 'hot path asserts' which were not included in the release builds though, but those were rarely used). The only downside is that the executable gets around 25% bigger (and easier to reverse engineer) because of all the embedded assert-condition strings - but I guess this could have been solved differently (since we were using our own assert macros anyway).

1 more reply

bluecalm3y ago

A lot of people don't care about security but they do care about performance. Some examples: game engines, various solvers, simulations. This stuff runs 24h/day to produce results. A few percent here and there is a huge cost and time penalty, especially that in this kind of code bounds checks and similar have bigger consequences than in your typical CRUD app. Another example is programming for small devices which are not upgraded for years. It just kills user experience if you accumulate tens of milliseconds of performance (and again on those devices the penalty is usually bigger because CPUs are not that smart about prediction).

If you care about security of processing outside input the are many other options (or you can use sanitizers or safe practices in C). For significant part of the C programming world it's just not important though.

dahfizz3y ago

This is mostly true. The one exception is when bounds checking prevents the compiler from vectorizing your code. Then you may pay up to a 50% penalty for the innocent looking `if`.

ReactiveJelly3y ago

Sounds like we need auto-vectorized bounds checks

dataangel3y ago

Single threaded perf gains have slowed down, this isn’t true anymore.

humanrebar3y ago

All of the CPUs? C runs a lot of places.

tester7563y ago

If you need perf, then consider

better algorithms, better data structures, multi-threading, branchless programming (except safety), data-oriented design

and then elimination of checks, not first.

1 more reply

tayistay3y ago

What if a compiler were to only allow an array access when it can prove that it's in bounds? Wherever it can't you'd have to wrap the array access in an if, or otherwise refactor your code to help the compiler. Then you'd have no panicking at least and more predictable performance.

constantcrying3y ago

>What if a compiler were to only allow an array access when it can prove that it's in bounds?

Even very good static analysis tools have a hard time doing this. In a language like C++ this would effectively mean that very few index operations can be done naively and compile times are significantly increased. Performance is likely reduced as well over the trivial alternative of using a safe array.

tylerhou3y ago

With bounds checking by default, even if a compiler can't statically prove that an index is in bounds, if the index in practice is always in bounds, the compiler inlines the check/branch into the calling function, and you're not starved of branch prediction resources, the check will be "free" because the branch will always be predicted as taken.

nigeltao3y ago

That's how WUFFS (Wrangling Untrusted File Formats Safely) works:

https://github.com/google/wuffs#what-does-compile-time-check...

LoganDark3y ago

WUFFS is awesome. It won't even let you add two ints without proving they cannot overflow

est313y ago

Rust has a tool for that, it's iterators. It is only limited however.

dathinab3y ago

also `slice.get()` will return on option

and using a range check manually before an index will normally optimize the internal bounds check away

glittersharkOP3y ago

there's a cheeky link to idris's vector type in the second paragraph: https://www.idris-lang.org/docs/idris2/current/base_docs/doc... which accomplishes just that

piwi3y ago

The article mentions measurement noise several times without addressing the uncertainty. It would help to add statistical tests, otherwise the spread could let us conclude the opposite of what is really happened, just because we are out of luck.

mjcohen3y ago

Back in the 80's, I was programming in Fortran on a VAX 780. I had converted a complex eigenvalue-eigenvector routine from Algol to Fortran, and, after verifying that it worked, decided to see how much bounds checking added to the runtime. I figured since so much array referencing was done that this would be a worst case scenario. In that particular situation, it added about 30%. I decided that this was well worth it and kept array bounds checking on in all my code.

moloch-hai3y ago

That is pretty silly. Your eigen library had in it everything it needed to ensure its own safety, so anything more just added slowness.

A check performed in a library, or a condition ensured in a library, is wholly as good as the same work done in the compiler. Compilers are not magic, they are just programs.

pitaj3y ago

Very interesting. One thing that I'm curious about is adding the bounds-check assertion to `get_unchecked` and seeing if that has a significant effect.

masklinn3y ago

Happens from time to time, I’ve seen folks going around libraries looking for “perf unsafe” and benching if removing the unsafe actually lowered performances.

One issue on that front is a question of reliability / consistency: on a small benchmark chances are the compiler will always trigger to its full potential because there’s relatively little code, codegen could be dodgier in a context where code is more complicated or larger.

Then again the impact of the bounds check would also likely be lower on the non-trivial code (on the other hand there are also threshold effects, like branch predictor slots, icache sizes, …).

sammy22553y ago

“ I’d say that either it’s entirely explainable by measurement noise”

How about you do something about that? Pin it to a single core? Run it a few thousand times?

titzer3y ago

For Virgil, there is a switch to turn off bounds checking, for the only reason to measure their cost. (It's not expected that anyone ever do this for production code). Bounds checks do not appear to slow down any program that matters (TM) by more than 2%. That's partly because so many loops automatically have bounds checks removed by analysis. But still. It's negligible.

dataangel3y ago

A 2% efficiency difference is tens if not hundreds of millions of dollars for Google, Meta, etc. Globally it’s enormous.

titzer3y ago

And yet globally both of these companies spend 20% of CPU cycles on TLB misses. Should we turn off virtual memory protections and go back to raw physical memory (or something?)

lowbloodsugar3y ago

I am a Rust fan, but 10% degradation in performance (29ms to 33ms) is not "a pretty small change" nor "within noise threshold". If the accuracy of the tests are +/- 10% then that needs to be proven and then fixed. I didn't see any evidence in the article that there is, in fact, a 10% error, and it looks like there is a genuine 10% drop in performance.

hra5th3y ago

To be clear, removing the bounds checks led to the observed performance degradation. Your statement beginning with "I am a Rust fan, but..." suggests that you might have interpreted it as the other way around

lowbloodsugar3y ago

I certainly did. Thank you for pointing that out. That suggests then that the problem is the tests are bogus.

glittersharkOP3y ago

a 10% drop in performance with bounds checks removed, mind you - so if anything the bounds checks are improving performance.

pantalaimon3y ago

The more likely explanation is that the test is bunk.

Or maybe the unsafe access acts like volatile in C and disables any optimization/reordering because the compiler thinks it’s accessing a register.

Jweb_Guru3y ago

Unsafe accesses do not act that way, they compile to exactly the same code as array accesses in C.

The tests aren't bunk. There are a variety of reasons why the assertions generated by array index checks can be useful for LLVM, and there is also a fair amount of noise in any end to end test like this. The main point is that it clearly isn't a primary bottleneck (which should be pretty obvious in a test that takes 30 ms and performs under 2000 bounds checks).

vlovich1233y ago

I had written a piece of code that was trying to process things at disk line speed and bounds checking was the first bottleneck I discovered in profiling. I think this is very dependent on the application. It’s unlikely to pop in many applications, but if you’re really trying to push the machine, it seems possible to be the first bottleneck.

cyber_kinetist3y ago

This is one of the main reason you should use indices instead of pointers to store references to other objects (regardless of if you’re using Rust or C++): memory safety. Indices can be bound-checked, pointers can’t.

api3y ago

It's a lot cheaper than having an RCE and being completely pwned.

dataangel3y ago

If you make no changes what is the difference across benchmark runs? I’m very skeptical.

worewood3y ago

99. ..% of "real" applications are bottlenecked by I/O, be it disk or network or synchronization (as in, waiting for something else to happen)

If you're not in a HPC or heavily resource constrained context you can safely ignore the performance implications of choosing whatever programming language you like.

azylman3y ago

This isn't really true, some languages handle I/O much better than other languages. We migrated a Python application to Go that was about as simple as you can get and mostly blocked by I/O (call DB, transform storage format to Thrift, respond to caller with Thrift) and saw SUBSTANTIAL improvements in performance. Approximately 40% improvement in p99 latency and, more notably, 15x improvement in throughput.

gigatexal3y ago

TL;DR - in the test bounds checking vs no checks showed no noticeable difference. Very good article though. Worth reading.

carl_dr3y ago

Not too long, did read :

The benchmark went from 28.5ms to 32.9ms.

That as a percentage is 15% and is huge, it’s not noise.

The test is flawed in some way, the article is disappointing in that the author didn’t investigate further.

dale_glass3y ago

MySQL is a huge amount of code doing a variety of things in each query -- networking, parsing, IO, locking, etc. Each of those can easily have significant and hard to predict latencies.

Benchmarking that needs special care, and planning for whatever it is you want to measure. A million trivial queries and a dozen very heavy queries are going to do significantly different things, and have different tradeoffs and performance characteristics.

carl_dr3y ago

The benchmark was specifically testing the hot path of a cached query in their MySQL caching proxy. MySQL wasn’t involved at all.

I agree completely that benchmarks need care, hence my point that the article is disappointing.

The author missed the chance to investigate why removing bounds checks seemed to regress performance by 15%, and instead wrote it off as “close enough.”

It would have been really interesting to find out why, even if it did end up being measurement noise.

1 more reply

gigatexal3y ago

Right but it went up when turning off bounds checking. Which is crazy no?

bugfix-663y ago

Similarly, you can turn off bounds-checking in Go like this:

  go build -gcflags=-B

and see if it helps. Generally the assembly looks better, but it doesn't really run faster on a modern chip.

Do your own test, and keep the results in mind next time somebody on Hacker News dismisses Go because of the "overwhelming cost of bounds checking".

masklinn3y ago

> next time somebody on Hacker News dismisses Go because of the "overwhelming cost of bounds checking".

That’s certainly one criticism I don’t remember ever seeing.

viraptor3y ago

There's a few examples like this https://news.ycombinator.com/item?id=32256038 if you search comments for "go bounds checking"

pjmlp3y ago

It appears that the flag keeps being undocumented, though.

https://pkg.go.dev/cmd/compile

j / k navigate · click thread line to collapse

186 comments

Jach3y ago

-Tony Hoare, 1980 Turing Award Lecture (https://www.cs.fsu.edu/~engelen/courses/COP4610/hoare.pdf)

titzer3y ago

nine_k3y ago

Removing bounds checks is a stark example of a premature optimization.

You can remove bounds checks when you can prove that the index won't ever get out of bounds; this is possible in many cases, such as iteration with known bounds.

femto3y ago

Edit:

1 more reply

MaxBarraclough3y ago

This is the approach used by SPARK Ada. [0]

[0] https://docs.adacore.com/spark2014-docs/html/ug/en/usage_sce...

dataangel3y ago

dahfizz3y ago

> autovectorizing is broken by bounds checks

This is the big one. You pay a 50% penalty for actual CPU bound, iteration heavy code with bounds checking enabled.

https://github.com/matklad/bounds-check-cost

zozbot2343y ago

The proper way of addressing that is to manually hoist bound checks out of "hot" loops. Not just remove them altogether.

camkego3y ago

This should be the article.

Running this with 1.65 on an Intel 12400 gets a nearly 4x speedup when bounds checking is not needed. Just wow.

Bounds checking avoidance is important when it becomes a significant chunk of your hot-path.

moloch-hai3y ago

For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

If the compiler can't do that by itself, a library should do it.

varajelle3y ago

> This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

What catch up does Rust need to do?

Rust has slice that know the size of its data built in the language, while C++ doesn't. And Rust has stricter const and mutability rules that facilitates optimizations.

As for the implementation, Rust use LLVM which is also the backend used by one of the popular C++ compiler.

1 more reply

nextaccountic3y ago

> For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

LLVM sometimes does this, but when it doesn't, you may insert asserts to guide the optimizer, as explained here https://news.ycombinator.com/item?id=33808853

I think this technique works in C and C++ too (if you use clang or gcc)

1 more reply

pjmlp3y ago

Unfortunely I only see Modern C++ on C++ conference talks and on my hobby projects.

Most of the stuff I see at work, is quite far from this ideal reality, starting with Android's codebase, or the various ways C++ gets used in Microsoft frameworks.

1 more reply

kelnos3y ago

At the risk of moving the goalposts: so what? The vast majority of applications running out there would not be impacted meaningfully in the least by taking that performance hit.

imtringued3y ago

TylerE3y ago

Yeah, I didn’t find it compelling either.

If your conclusion is “no signal, just noise” boost the input until the signal becomes apparent. If that means writing such a massive loop that the program takes an hour to run, fine.

fear913y ago

kelnos3y ago

Kamq3y ago

> Sure, disable it for some random bit of custom high-perf software where you really need to eke out the last few percent of performance.

I wonder how much good interop could get us.

netr0ute3y ago

2 more replies

userbinator3y ago

If you want to experience what using a "safe" language looks like, complete with bounds checking amongst other things, there's the JavaScript ecosystem.

...which has evolved its own, much worse, horrors instead.

Before that, there was Java.

Do not want.

jackmott423y ago

brundolf3y ago

oconnor6633y ago

ridiculous_fish3y ago

IME loops with induction variables (integer indexes) often produces better codegen than with iterators. Compare the these two Rust functions for inverting bits: https://rust.godbolt.org/z/cE4vPdbdY

This got improved in Rust 1.65 just this month, but the point stands.

edit: ARM64 compilation is even sillier. https://rust.godbolt.org/z/PEsbeGxWP

akshaykarthik3y ago

Interestingly enough, with `-C opt-level=3` both functions yield the same assembly.

I wonder if there's some pass that's missing or not done at the lower opt level.

1 more reply

brundolf3y ago

Yeah, I wondered about those too. Less straightforward than pure iterator usage but still plausible

tialaramex3y ago

est313y ago

nindalf3y ago

> Arrays do have an advantage however, compilers can see their size

Wouldn’t vectors have the same advantage in Rust? If we’re iterating over a vector, it’s proveable at compile time that the length is not being modified during the iteration.

2 more replies

winrid3y ago

IIRC the Rust in Action book claimed iterators usually omit bounds checks, but if you manually index into the array then it's more likely the bounds check won't be optimized out.

Animats3y ago

"It seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible."

Right. The myth that bounds checking is expensive may have come from some terrible compilers in the early days. Berkeley Pascal was a notable example. Each bounds check was a subroutine call.

The common cases for bounds checks are:

- It's in code that doesn't do a lot of subscript operations. So it doesn't matter.

moloch-hai3y ago

Each check burns a branch prediction slot, even if it always goes the same way. That may eject a branch predictor whose prediction matters.

imtringued3y ago

Then it sounds like our branch predictors are shit if they can't deal with simple things like this.

saagarjha3y ago

This is exactly what they are designed to do and they do their job well, but they can't do it for free.

1 more reply

rfoo3y ago

... that being said, I'd argue that the most beneficial memory-safety feature of Rust is about temporal things (i.e. prevents UAF etc) instead of spatial ones.

whatshisface3y ago

joosters3y ago

viraptor3y ago

https://github.com/bheisler/criterion.rs is good for tests like that. It will give you much more than a single number and handle things like outliers. This makes identifying noisy tests simpler.

1 more reply

spullara3y ago

A benchmarking harness without error bars?

dataangel3y ago

It could be noise in a benchmark that does IO, or has determinism problems.

rfoo3y ago

Here is the benchmark output from the post:

  news_app/ranges_and_joins/cached
     time:   [28.583 ms 29.001 ms 29.526 ms]
     thrpt:  [277.45 Kelem/s 282.48 Kelem/s 286.61 Kelem/s]

  news_app/ranges_and_joins/cached
     time:   [33.271 ms 33.836 ms 34.418 ms]
     thrpt:  [238.01 Kelem/s 242.11 Kelem/s 246.22 Kelem/s]

dathinab3y ago

Anything between nothing and one most likely correct branch predicted to _not_ jump "branch iff int/pointer > int/pointer".

This kind of bounds check are normally not ever violated (in well formed code) so branch prediction predicts them correctly nearly always.

It also is (normally) just jumping in the bad case, which means with a correct branch predictions thy can be really cheap.

And then cpu "magic" tends to be optimized for that kind of checks at they appear in a lot of languages (e.g. Java).

Then in many cases the compiler can eliminate the checks partially.

Also bounds checks tend to be an emergency guard, so you tend to sometimes do checks yourself before indexing and the compiler can often use that to eliminate the bounds check.

And even if you ignore all optimizations it's (assuming in bounds) "just" at most one int/pointer cmp (cheap) followed by a conditional branch which doesn't branch (cheap).

ridiculous_fish3y ago

The unchecked version is fully unrolled and vectorized using multiple registers. The checked version must use a loop.

mastax3y ago

This can make a big difference if you can hoist bounds checks out of an inner loop. You get the performance without adding any unsafe {}.

sltkr3y ago

Funnily there is an off-by-one error in your example; if you fix it, the generated assembly is even more efficient.

mastax3y ago

Hah! There's one reason not to switch everything to `unsafe get_unchecked()`.

est313y ago

mastax3y ago

We've learned to accept that when you turn on optimizations, you lose some lines and variables from your debug info. This is a pretty similar trade-off.

Arnavion3y ago

Although, if one is building their own libstd anyway, then I believe compiling with `--build-std-features panic_immediate_abort` should also have the same effect.

kibwen3y ago

Hm? The existence of panic messages doesn't preclude the optimizer from hoisting a bounds check out of the loop.

est313y ago

Actually that made me think... you might be right. I just saw this opinion in earlier threads and repeated it but upon second inspection I either remembered it wrongly, or it is a wrong theory.

https://rust.godbolt.org/z/5x46edeTb

Someone3y ago

tialaramex3y ago

Rust doesn't have "volatile variables" that's a weird C thing which then ends up in C++ and related languages because nobody wants to touch this mess.

Rust has generic intrinsics which perform memory read/write that won't get optimised out, which is the actual thing C needed when volatile was invented.

dahfizz3y ago

constantcrying3y ago

I imagine the reason bounds check are cheap is because of the branch predictor. If you always predict the in bounds path, the check is almost free.

Also I would just like to note that safe arrays aren't a unique rust feature. Even writing your own in C++ is not hard.

int_19h3y ago

These days, C++ really should be compiled with bounds-checked indexing and iterators by default. Unfortunately, this is still not a scenario that is well-supported by tooling.

pjmlp3y ago

On VC++ it is quite easy to do so,

https://learn.microsoft.com/en-us/cpp/standard-library/check...

The hard part is changing the mentality from whoever sits at the keyboard.

int_19h3y ago

FWIW it looks like they're planning to kill it off for release builds: https://github.com/microsoft/STL/issues/277

1 more reply

jackmott423y ago

Yep, unless your code is wrong, the bounds check will always be predictable. Which makes it free in a sense. But sometimes it will block other optimizations, and it takes up space in the caches.

pantalaimon3y ago

That would be bad on Embedded where MCUs usually don’t do any branch prediction.

dathinab3y ago

> If you always predict the in bounds path, the check is almost free.

Note that you often only branch in the "bad" case, which means even on systems without branch prediction it tends to be not very expensive, and compilers can also eliminate a lot of bounds checks.

pjmlp3y ago

Yeah, but it only works in practice if you are only one coding the application, or there are strict rules in place to not do C style arrays.

moloch-hai3y ago

dathinab3y ago

In rust there is `#[cold]` for functions as well as (nightly only) `likely(cond)`/`unlikely(cond)` and some tricks you can have something similar in stable rust.

Also branch paths which lead guaranteed to an panic tend to be treated as "unlikely" but not sure how far this is guaranteed.

bjourne3y ago

The reason performance decreased when he removed bounds checking is because asserting bounds is very useful to a compiler. Essentially, the compiler emits code like this:

    1. if (x >= 0) && (x < arr_len(arr))
    2.    get element from array index x
    3. else 
    4.     throw exception
    5. do more stuff

    1. get element from array index x
    2. do more stuff

So the compiler doesn't know anything about x, which is bad. The solution which apparently is not implemented in Rust (or LLVM, idk) is to emit code like the following:

    1. assert that 0 <= x < arr_len(arr)
    2. get element from array index x
    3. do more stuff

est313y ago

Interesting observation. So one should instead do the comparison with something like:

    1. if (x >= 0) && (x < arr_len(arr))
    2.    get element from array index x
    3. else 
    4.     core::hint::unreachable_unchecked
    5. do more stuff

Where unreachable_unchecked transmits precisely such information to the optimizer: https://doc.rust-lang.org/stable/std/hint/fn.unreachable_unc...

CrendKing3y ago

kibwen3y ago

> The solution which apparently is not implemented in Rust (or LLVM, idk) is to emit code like the following:

bjourne3y ago

vore3y ago

I'm not sure I follow: where is abs(x)?

layer83y ago

It’s an example of what could occur within “do more stuff”. The mentioned 2*x is another example.

killingtime743y ago

returningfory23y ago

ablob3y ago

Making the migration in order to find out if it was worth it appears to be quite an expensive test of an hypothesis.

emn133y ago

What's the alternative, really?

emn133y ago

Sometimes the extra speed can be relevant. Knowing what the upside _can_ be can help inform the choice whether it's worth it.

Even if you like or even need a technique like bounds checking, knowing the typical overheads can be useful.

d265f2783y ago

Some old benchmarks here: https://parallel-rust-cpp.github.io/v0.html#rustc

jackmott423y ago

In Rust you can usually use iterators to avoid bounds checks. This is idiomatic and the fast way to do things, so usually when using Rust you don't worry about this at all.

pornel3y ago

In tight numeric code that benefits from autovectorization.

Bound checks prevent some optimizations, since they're a branch with a significant side effect that the compiler must preserve.

heleninboodler3y ago

I don't really see this as a person who wants to turn off the bounds checking for any real reason, but as someone who just wants to have an idea of what the cost of that bounds checking is.

pitaj3y ago

masklinn3y ago

> It's like talking about how much faster a car would go if it didn't have to carry the extra weight of brakes.

And there’s folks who do exactly that.

constantcrying3y ago

>Can someone smarter than me enlighten me when you would consider disabling bounds checking for performance?

Because it is faster. Worst case you are triggering a branch miss, which is quite expensive.

>It's like talking about how much faster a car would go if it didn't have to carry the extra weight of brakes.

dathinab3y ago

In some very hot code (most times loops with math stuff were it prevents some optimizations and/or the compiler fails to eliminate it) it can lead to relevant performance improvements.

Because of this you might find some rust code which opts out of bounds check for such code using unsafe code.

But this code tends to be fairly limited and often encapsulated into libraries.

So I agree that for most code doing so is just plain stupid. In turn I believe doing it on a program or compilation unit level (instead of a case by case basis) is (nearly) always a bad idea.

tragomaskhalos3y ago

lmkg3y ago

tylerhou3y ago

Safe iterators have to have bounds checks for dynamically sized arrays; otherwise, how you prevent iterators from walking past the end?

apendleton3y ago

1 more reply

meindnoch3y ago

Calling ‘next’ on an iterator involves a bounds check.

tester7563y ago

I've been shocked when I've heard C programmers being actually concerned about performance penalty of checks

like, why bother? CPUs in next 2 years will win that performance anyway

and your software will be safer

Gigachad3y ago

flohofwoe3y ago

pjmlp3y ago

flohofwoe3y ago

1 more reply

bluecalm3y ago

dahfizz3y ago

This is mostly true. The one exception is when bounds checking prevents the compiler from vectorizing your code. Then you may pay up to a 50% penalty for the innocent looking `if`.

ReactiveJelly3y ago

Sounds like we need auto-vectorized bounds checks

dataangel3y ago

Single threaded perf gains have slowed down, this isn’t true anymore.

humanrebar3y ago

All of the CPUs? C runs a lot of places.

tester7563y ago

If you need perf, then consider

better algorithms, better data structures, multi-threading, branchless programming (except safety), data-oriented design

and then elimination of checks, not first.

1 more reply

tayistay3y ago

constantcrying3y ago

>What if a compiler were to only allow an array access when it can prove that it's in bounds?

tylerhou3y ago

nigeltao3y ago

That's how WUFFS (Wrangling Untrusted File Formats Safely) works:

https://github.com/google/wuffs#what-does-compile-time-check...

LoganDark3y ago

WUFFS is awesome. It won't even let you add two ints without proving they cannot overflow

est313y ago

Rust has a tool for that, it's iterators. It is only limited however.

dathinab3y ago

also `slice.get()` will return on option

and using a range check manually before an index will normally optimize the internal bounds check away

glittersharkOP3y ago

there's a cheeky link to idris's vector type in the second paragraph: https://www.idris-lang.org/docs/idris2/current/base_docs/doc... which accomplishes just that

piwi3y ago

mjcohen3y ago

moloch-hai3y ago

That is pretty silly. Your eigen library had in it everything it needed to ensure its own safety, so anything more just added slowness.

A check performed in a library, or a condition ensured in a library, is wholly as good as the same work done in the compiler. Compilers are not magic, they are just programs.

pitaj3y ago

Very interesting. One thing that I'm curious about is adding the bounds-check assertion to `get_unchecked` and seeing if that has a significant effect.

masklinn3y ago

Happens from time to time, I’ve seen folks going around libraries looking for “perf unsafe” and benching if removing the unsafe actually lowered performances.

Then again the impact of the bounds check would also likely be lower on the non-trivial code (on the other hand there are also threshold effects, like branch predictor slots, icache sizes, …).

sammy22553y ago

“ I’d say that either it’s entirely explainable by measurement noise”

How about you do something about that? Pin it to a single core? Run it a few thousand times?

titzer3y ago

dataangel3y ago

A 2% efficiency difference is tens if not hundreds of millions of dollars for Google, Meta, etc. Globally it’s enormous.

titzer3y ago

And yet globally both of these companies spend 20% of CPU cycles on TLB misses. Should we turn off virtual memory protections and go back to raw physical memory (or something?)

lowbloodsugar3y ago

hra5th3y ago

lowbloodsugar3y ago

I certainly did. Thank you for pointing that out. That suggests then that the problem is the tests are bogus.

glittersharkOP3y ago

a 10% drop in performance with bounds checks removed, mind you - so if anything the bounds checks are improving performance.

pantalaimon3y ago

The more likely explanation is that the test is bunk.

Or maybe the unsafe access acts like volatile in C and disables any optimization/reordering because the compiler thinks it’s accessing a register.

Jweb_Guru3y ago

Unsafe accesses do not act that way, they compile to exactly the same code as array accesses in C.

vlovich1233y ago

cyber_kinetist3y ago

api3y ago

It's a lot cheaper than having an RCE and being completely pwned.

dataangel3y ago

If you make no changes what is the difference across benchmark runs? I’m very skeptical.

worewood3y ago

99. ..% of "real" applications are bottlenecked by I/O, be it disk or network or synchronization (as in, waiting for something else to happen)

If you're not in a HPC or heavily resource constrained context you can safely ignore the performance implications of choosing whatever programming language you like.

azylman3y ago

gigatexal3y ago

TL;DR - in the test bounds checking vs no checks showed no noticeable difference. Very good article though. Worth reading.

carl_dr3y ago

Not too long, did read :

The benchmark went from 28.5ms to 32.9ms.

That as a percentage is 15% and is huge, it’s not noise.

The test is flawed in some way, the article is disappointing in that the author didn’t investigate further.

dale_glass3y ago

MySQL is a huge amount of code doing a variety of things in each query -- networking, parsing, IO, locking, etc. Each of those can easily have significant and hard to predict latencies.

carl_dr3y ago

The benchmark was specifically testing the hot path of a cached query in their MySQL caching proxy. MySQL wasn’t involved at all.

I agree completely that benchmarks need care, hence my point that the article is disappointing.

The author missed the chance to investigate why removing bounds checks seemed to regress performance by 15%, and instead wrote it off as “close enough.”

It would have been really interesting to find out why, even if it did end up being measurement noise.

1 more reply

gigatexal3y ago

Right but it went up when turning off bounds checking. Which is crazy no?

bugfix-663y ago

Similarly, you can turn off bounds-checking in Go like this:

  go build -gcflags=-B

and see if it helps. Generally the assembly looks better, but it doesn't really run faster on a modern chip.

Do your own test, and keep the results in mind next time somebody on Hacker News dismisses Go because of the "overwhelming cost of bounds checking".

masklinn3y ago

> next time somebody on Hacker News dismisses Go because of the "overwhelming cost of bounds checking".

That’s certainly one criticism I don’t remember ever seeing.

viraptor3y ago

There's a few examples like this https://news.ycombinator.com/item?id=32256038 if you search comments for "go bounds checking"

pjmlp3y ago

It appears that the flag keeps being undocumented, though.

https://pkg.go.dev/cmd/compile

j / k navigate · click thread line to collapse