Here's an entire blog post with proof:
https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...
> Signed integer expression simplification [...]
This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to an unspecified width at least as wide as the operands.
> the index calculations are typically done using 32-bit int
No, they're done using size_t; that's what size_t is for.
> undefined overflow ensures that a[i] and a[i+1] are adjacent.
No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.
(Also, note that this does not mean the optimizer is allowed to access memory at &a[i+1], since that might be across a page boundary (or even the wraparound from address ...FFF to 0), which makes adjacency less helpful than one might hope.)
> Value range calculations [...] Loop analysis and optimization [...]
Allowed but not required to retain excess bits on signed overflow.
> This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to an unspecified width at least as wide as the operands.
> > Value range calculations [...] Loop analysis and optimization [...]
> Allowed but not required to retain excess bits on signed overflow.
What does that even mean? That the code would have different behavior with a different compiler, different optimizations or when you slightly change it (depending on whether the compiler chooses to keep wider intermediate results or not)?
If I understand you correctly, then depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3, because in the first case the compiler would be simplifying the expression but in the second case it couldn't.
That would be quite surprising, to say the least.
> No, they're done using size_t; that's what size_t is for.
  for (int i = 0; i < 256; i++)
      a[i] = b[i] + c[i];
So you've never seen code like this? Not all arrays are huge, and for small indices people usually go for `int`.

Also, when doing arithmetic with indices, you might need to represent a negative index, so `size_t` wouldn't work. You'd need to use `ssize_t`, which is a signed integer type that would benefit from these optimizations.
But even then you might use an `int` if you know the arithmetic result will fit.
> No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.
Not if `i` is a signed integer, say, `int` or `int8_t`. Which is the point of the optimization.
If x > TYPEOFX_MAX-1 or y > TYPEOFY_MAX-3 (i.e. if x+1 or y+3 overflows), then this can already happen with the more (maximally) vague "signed overflow is completely undefined" policy; wider intermediates just mean that code is not allowed to do other things, like crash or make demons fly out of your nose.
If x+[a or 1] and y+[3 or b] don't overflow, then computing them in a wider signed integer type has no effect, since the same values are produced as in the original type.
More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opportunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs). In cases where you would not have gotten undefined behaviour, there is no signed integer overflow, so the values you get are not partially-undefined, and work the same way as in the signed-overflow-is-undefined-behaviour model. ... you know what; table:
  signed overflow is \  | no overflow    | unsigned overflow | signed overflow
  undefined behaviour:  | correct result | truncated mod 2^n | your sinus cavity is a demon roost
  wide intermediates:   | correct result | truncated mod 2^n | one of a predictable few probably-incorrect results

And the blog post lists dozens of optimizations which can indeed be performed due to that decision.
The benefits of these optimizations will vary depending on the actual code and on the architecture/CPU, of course.
It will be greater for hot and tight loops that benefit from these optimizations and less important for cold parts of the code.
It will also be greater for more limited CPUs and less important for more sophisticated ones.