Undefined Behavior in C and C++ (2024)

(russellw.github.io)

101 points | imadr | 9mo ago | 233 comments

One has to add that of the 218 instances of UB in ISO C23, 87 are in the core language. Of those, we have already removed 26 and are in the process of removing many others. You can find my latest update here (there has been some further progress since then): https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3529.pdf

tialaramex 9mo ago

A lot of that work is basically fixing documentation bugs, labelled "ghosts" in your text. Places where the ISO document is so bad as a description of C that you would think there's Undefined Behaviour but it's actually just poorly written.

Fixing the document is worthwhile, and certainly a reminder that WG21's equivalent effort needs to make the list before it can even begin that process on its even longer document, but practical C programmers don't read the document and since this UB was a "ghost" they weren't tripped by it. Removing items from the list this way does not translate to the meaningful safety improvement you might imagine.

There's not a whole lot of movement there towards actually fixing the problem. Maybe it will come later?

taneq 9mo ago

> practical C programmers don't read the document and since this UB was a "ghost" they weren't tripped by it

I would strongly suspect that C compiler implementers very much do read the document, though. Which, as far as I can see, means "ghosts" could easily become actual UB (and worse, sneaky UB that you wouldn't expect.)

tialaramex 9mo ago

The previous language might leave a C compiler developer very confused, because it seems as though they can choose something else without it being specified what, but almost invariably they'll eventually realise: oh, it's just badly worded and didn't mean "should" there.

It's like one of those tricky self-referential parlor box statements. "The statement on this box is not true"? Thanks I guess. But that's a game, the puzzles are supposed to be like that, whereas the mission of the ISO document was not to confuse people, so it's good that it is being improved.

Sharlin 9mo ago

If I understand correctly, the "ghosts" are vacuously UB. As in, the standard specifies that if X, then UB, but X can in fact never be true according to the standard.

uecker 9mo ago

Fixing the actual problems is work-in-progress (as my document also indicates), but naturally it is harder.

But the original article also complains about the number of trivial UB.

ncruces 9mo ago

And yet, I see P1434R0 seemingly trying to introduce new undefined behavior around integer-to-pointer conversions, where previously you had reasonably sensible implementation-defined behavior (the conversions "are intended to be consistent with the addressing structure of the execution environment").

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...

gpderetta 9mo ago

Pointer provenance already existed before, but the standards were contradictory and incomplete. This is an effort to more rigorously nail down the semantics.

I.e., the UB already existed, but it was not explicit: it had to be inferred from the whole text, and the boundaries were fuzzy. Remember that anything not explicitly defined by the standard is implicitly undefined.

Also remember: just because you can legally construct a pointer doesn't mean it is safe to dereference.

ncruces 9mo ago

The current standard still says integer-to-pointer conversions are implementation defined (not undefined) and furthermore "intended to be consistent with the addressing structure of the execution environment" (that's a direct quote).

I have an execution environment, Wasm, where doing this is pretty well defined, in fact. So if I want to read the memory at address 12345, which is within bounds of the linear memory (and there's a builtin to make sure), why should it be undefined behavior?

And regarding pointer provenance, why should going through pointer-to-integer and integer-to-pointer conversions try to preserve provenance at all, and be undefined behavior in situations where that provenance is ambiguous?

The reason I'm using integer (rather than pointer) arithmetic is precisely so I don't have to be bound by pointer arithmetic rules. What good purpose does it serve for this to be undefined (rather than implementation defined), beyond preventing certain programs from being meaningfully written at all?

I'm genuinely curious.
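
For concreteness, here is a minimal sketch of the kind of code in question: integer arithmetic on an address, followed by a conversion back to a pointer. The name `byte_at` is hypothetical, and the sketch assumes a flat address space in which `uintptr_t` exists and the conversions are simple bit-for-bit mappings (as on mainstream GCC/Clang targets); ISO C itself only calls these conversions implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: steps through a buffer with integer arithmetic
   instead of pointer arithmetic, then converts back to a pointer.
   Assumes a flat address space; this is not guaranteed by ISO C. */
unsigned char *byte_at(unsigned char *base, uintptr_t offset)
{
    uintptr_t addr = (uintptr_t)base; /* pointer-to-integer conversion */
    addr += offset;                   /* plain unsigned integer arithmetic */
    return (unsigned char *)addr;     /* integer-to-pointer conversion */
}
```

Whether the returned pointer may be dereferenced is what the provenance proposals try to pin down; in this sketch the caller is expected to keep the address inside the object that `base` points into.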

JonChesterfield 9mo ago

Pointer provenance was certainly not here in the 80s. That's a more modern creation seeking to extract better performance from some applications at a cost of making others broken/unimplementable.

It's not something that exists in the hardware. It's also not a good idea, though trying to steer people away from it proved beyond my politics.

kazinator 9mo ago

Undefined behavior only means that ISO C doesn't give requirements, not that nobody gives requirements. Many useful extensions are instances where undefined behavior is documented by an implementation.

Including a header that is not in the program, and not in ISO C, is undefined behavior. So is calling a function that is not in ISO C and not in the program. (If the function is not anywhere, the program won't link. But if it is somewhere, then ISO C has nothing to say about its behavior.)

Correct, portable POSIX C programs have undefined behavior in ISO C; only if we interpret them via IEEE 1003 are they defined by that document.

If you invent a new platform with a C compiler, you can have it such that #include <windows.h> reformats all the attached storage devices. ISO C allows this because it doesn't specify what happens if #include <windows.h> successfully resolves to a file and includes its contents. Those contents could be anything, including some compile-time instruction to do harm.

Even if a compiler's documentation doesn't grant that a certain instance of undefined behavior is a documented extension, the existence of a de facto extension can be inferred empirically through numerous experiments: compiling test code and reverse engineering the object code.

Moreover, the source code for a compiler may be available; the behavior of something can be inferred from studying the code. The code could change in the next version. But so could the documentation; documentation can take away a documented extension the same way as a compiler code change can take away a de facto extension.

Speaking of object code: if you follow a programming paradigm of verifying the object code, then undefined behavior becomes moot, to an extent. You don't trust the compiler anyway. If the machine code has the behavior which implements the requirements that your project expects of the source code, then the necessary thing has been somehow obtained.

throw-qqqqq 9mo ago

> Undefined behavior only means that ISO C doesn't give requirements, not that nobody gives requirements. Many useful extensions are instances where undefined behavior is documented by an implementation.

True, most compilers have sane defaults in many cases for things that are technically undefined (like taking sizeof(void) or doing pointer arithmetic on a void pointer). But not all of these cases can be saved by sane defaults.

Undefined behavior means the compiler can replace the code with whatever. So if you e.g. compile optimizing for size, the compiler may rip out the offending code, as replacing it with nothing yields the greatest size optimization.

See also John Regehr's collection of UB-Canaries: https://github.com/regehr/ub-canaries

Snippets of software exhibiting undefined behavior, executing e.g. both the true and the false branch of an if-statement or none etc. UB should not be taken lightly IMO...

eru 9mo ago

> [...] undefined behavior, executing e.g. both the true and the false branch of an if-statement or none etc.

Or replacing all your mp3s with a Rick Roll. Technically legal.

(Some old version of GHC had a hilarious bug where it would delete any source code with a compiler error in it. Something like this would technically be legal for most compiler errors a C compiler could spot.)

pjmlp 9mo ago

Unfortunately it also means that when the programmer fails to understand what undefined behaviour is exposed in their code, the compiler is free to take advantage of that to do the ultimate performance optimizations as a means to beat compiler benchmarks.

The code change might come in something as innocent as a bug fix to the compiler.

account42 9mo ago

Ah yes, the good old "compiler writers only care about benchmarks and are out to hurt everyone else" nonsense.

I for one am glad that compilers can assume that things that can't happen according to the language do in fact not happen and don't bloat my programs with code to handle them.

adwn 9mo ago

> I for one am glad that compilers can assume that things that can't happen according to the language do in fact not happen and don't bloat my programs with code to handle them.

Yes, unthinkable happenstances like addition on fixed-width integers overflowing! According to the language, signed integers can't overflow, so code like the following:

    int new_offset = current_offset + 16;
    if (new_offset < current_offset)
        return -1; // Addition overflowed, something's wrong

can be optimized to the much leaner

    int new_offset = current_offset + 16;

Well, I sure am glad the compiler helpfully reduced the bloat in my program!
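
For contrast, a sketch of the same check written so that the overflowing addition is never evaluated, which leaves the compiler nothing to assume away. `advance_offset` and its parameters are hypothetical names, not anything from the article.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores current_offset + step in *new_offset and
   returns true, or returns false if the sum would not fit in an int.
   The range test happens before the addition, so no signed overflow
   is ever evaluated. */
bool advance_offset(int current_offset, int step, int *new_offset)
{
    if (step >= 0 ? current_offset > INT_MAX - step
                  : current_offset < INT_MIN - step)
        return false; /* would overflow; *new_offset is left untouched */
    *new_offset = current_offset + step;
    return true;
}
```

GCC and Clang also offer `__builtin_add_overflow`, which performs a checked addition in a single call.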

titzer 9mo ago

Moral hazard here. The rest of us, and all of society, now rests on a huge pile of code written by incorrigible misers who imagined themselves able to write perfect, bug-free code that would go infinitely fast because bad things never happen. But see, there's bugs in your code and other people pay the cost.

quietbritishjim 9mo ago

> Including a header that is not in the program, and not in ISO C, is undefined behavior.

What is this supposed to mean? I can't think of any interpretation that makes sense.

I think ISO C defines the executable program to be something like the compiled translation units linked together. But header files do not have to have any particular correspondence to translation units. For example, a header might declare functions whose definitions are spread across multiple translation units, or define things that don't need any definitions in particular translation units (e.g. enum or struct definitions). It could even play macro tricks which means it declares or defines different things each time you include it.

Maybe you mean it's undefined behaviour to include a header file that declares functions that are not defined in any translation unit. I'm not sure even that is true, so long as you don't use those functions. It's definitely not true in C++, where it's only a problem (not sure if it's undefined exactly) if you ODR-use a function that has been declared but not defined anywhere. (Examples of ODR-use are calling or taking the address of the function, but not, for example, using sizeof on an expression that includes it.)
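
As a small illustration on the C side, the sketch below declares a function that is never defined and mentions it only inside `sizeof`. The names are hypothetical; the expectation, assuming a typical compiler and linker, is that no reference to the function is emitted, because the operand of `sizeof` is not evaluated here.

```c
#include <stddef.h>

/* Declared but deliberately never defined anywhere (hypothetical name). */
int never_defined(void);

/* The call below sits inside sizeof, whose operand is not evaluated
   (it is not a variable-length array), so no call is generated and the
   missing definition should not matter at link time. */
size_t size_of_call_result(void)
{
    return sizeof(never_defined());
}
```

Calling `never_defined()` outside `sizeof`, or taking its address, would be expected to fail at link time instead.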

kazinator 9mo ago

> I can't think of any interpretation that makes sense

Start with a concrete example. A header that is not in our program, or described in ISO C. How about:

  #include <winkle.h>

Defined behavior or not? How can an implementation respond to this #include while remaining conforming? What are the limits on that response?

> But header files do not have to have any particular correspondence to translation units.

A header inclusion is just a mechanism that brings preprocessor tokens into a translation unit. So, what does the standard tell us about the tokens coming from #include <winkle.h> into whatever translation unit we put it into?

Say we have a single file program and we made that the first line. Without that include, it's a standard-conforming Hello World.

im3w1l 9mo ago

I think we are slowly getting closer to the crux of the matter. Are you saying that it's a problem to include files from a library since they are "not in our program"? What does that phrase actually mean? What are the bounds of "our program" anyway? Couldn't it be the set {main.c, winkle.h}?

quietbritishjim 9mo ago

Do you just mean an attempt to include a file path that couldn't be found? That's not a correct usage of the term "program" – that refers to the binary output of the compilation process, whereas you're talking about the source files that are the input to the compilation. That sounds a bit pedantic but I really didn't understand what you meant.

I just checked, and if you attempt to include a file that cannot be found (in the include path, though it doesn't use that exact term) then that's a constraint violation and the compiler is required to stop compilation and issue a diagnostic. Not undefined behaviour.

gpderetta 9mo ago

You are basically trying to explain the difference between a conforming program and a strictly conforming one.

safercplusplus 9mo ago

A couple of solutions in development (but already usable) that more effectively address UB:

i) "Fil-C is a fanatically compatible memory-safe implementation of C and C++. Lots of software compiles and runs with Fil-C with zero or minimal changes. All memory safety errors are caught as Fil-C panics." "Fil-C only works on Linux/X86_64."

ii) "scpptool is a command line tool to help enforce a memory and data race safe subset of C++. It's designed to work with the SaferCPlusPlus library. It analyzes the specified C++ file(s) and reports places in the code that it cannot verify to be safe. By design, the tool and the library should be able to fully ensure "lifetime", bounds and data race safety." "This tool also has some ability to convert C source files to the memory safe subset of C++ it enforces"

tialaramex 9mo ago

Fil-C is interesting because, as you'd expect, it takes a significant performance penalty to deliver this property. If it's broadly adopted, that would suggest that - at least in this regard - C programmers genuinely do prioritise their simpler language over mundane ideas like platform support or performance.

The resulting language doesn't make sense for commercial purposes but there's no reason it couldn't be popular with hobbyists.

eru 9mo ago

Well, you could also treat Fil-C as a sanitiser, like memory-san or ub-san:

Run your test suite and some other workloads under Fil-C for a while, fix any problems it reports, and if it doesn't report any problems after a while, compile the whole thing with GCC afterwards for your release version.

safercplusplus 9mo ago

Right. And of course there are still less-performance-sensitive C/C++ applications (curl, postfix, git, etc.) that could have memory-safe release versions.

But the point is also to dispel the conventional wisdom that C/C++ is necessarily intrinsically unsafe. It's a tradeoff between safety, performance and flexibility/compatibility. And you don't necessarily need to jump to a completely different language to get a different tradeoff.

Fil-C sacrifices some performance for safety and compatibility. The traditional compilers sacrifice some safety for performance and flexibility/compatibility. And scpptool aims to provide the option of sacrificing some flexibility for safety and performance. (Along with the other two tradeoffs available in the same program). The claim is that C++ turns out to be expressive enough to accommodate the various tradeoffs. (Though I'm not saying it's always gonna be pretty :)

laauraa 9mo ago

>Uninitialized data

They at least fixed this in C++26. No longer UB, but "erroneous behavior". Still some random garbage value (so an uninitialized pointer will likely lead to disastrous results still), but the compiler isn't allowed to fuck up your code, it has to generate code as if it had some value.

tialaramex 9mo ago

It won't be a "random garbage value" but is instead a value the compiler chose.

In effect if you don't opt out your value will always be initialized but not to a useful value you chose. You can think of this as similar to the (current, defanged and deprecated as well as unsafe) Rust std::mem::uninitialized()

There were earlier attempts to make this value zero, or rather, as many 0x00 bytes as needed, because on most platforms that's markedly cheaper to do, but unfortunately some C++ would actually have worse bugs if the "forgot to initialize" case was reliably zero instead.

eru 9mo ago

What are these worse bugs?

tialaramex 9mo ago

The classic thing is, we're granting user credentials - maybe we're a login process, or a remote execution helper - and we're on Unix. In some corner case we forget to fill out the user ID. So it's "random noise". Maybe in the executable distributed to your users it was 0x4C6F6769 because the word "Login" was in that memory in some other code and we never initialized it so...

Bad guys find the corner case and they can now authenticate as user 0x4C6F6769, which doesn't exist, and so that's useless. But - when we upgrade to C++26 with the hypothetical zero "fix" now they're root instead!
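
One way to avoid depending on either outcome is to initialize explicitly to a value that can never be granted. The sketch below is only an illustration: `UID_UNSET`, `struct login_result` and both functions are hypothetical, and a plain `uint32_t` stands in for a real Unix user ID type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "no user" sentinel that is never handed out as a real ID. */
#define UID_UNSET UINT32_MAX

struct login_result {
    uint32_t uid; /* stand-in for a Unix user ID; 0 would be root */
};

/* Start from the sentinel so a forgotten assignment cannot alias a real user. */
struct login_result login_result_init(void)
{
    struct login_result r = { .uid = UID_UNSET };
    return r;
}

/* Fail closed: only an ID that was explicitly assigned is accepted. */
bool login_result_is_valid(const struct login_result *r)
{
    return r->uid != UID_UNSET;
}
```

With this shape, the forgotten corner case yields a rejected login whether or not the compiler would have zero-filled the field.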

kazinator 9mo ago

C also fixed it in its way.

Access to an uninitialized object defined in automatic storage, whose address is not taken, is UB.

Access to any uninitialized object whose bit pattern is a non-value, likewise.

Otherwise, it's good: the value implied by the bit pattern is obtained and computation goes on its merry way.

account42 9mo ago

That's unfortunate.

fattah25 9mo ago

Rust here, Rust there. We are just talking about C, not Rust. Why do we have to use Rust? If you are talking about memory safety, why does no one recommend the Ada language instead of Rust?

We have Zig, Hare, Odin and V too.

ViewTrick1002 9mo ago

> Ada language instead of Rust

Because it never achieved mainstream success?

And Zig, for example, is very much not memory safe, which a cursory search for "segfault" in the Bun repo quickly tells you.

https://github.com/oven-sh/bun/issues?q=is%3Aissue%20state%3...

lifthrasiir 9mo ago

More accurately speaking, Zig helps with spatial memory safety (e.g. out-of-bounds access) but doesn't help with temporal memory safety (e.g. use-after-free), which Rust excels at.

pjmlp 9mo ago

Which is something that even PL/I predating C already had.

ViewTrick1002 9mo ago

As long as you are using the "ReleaseSafe" build mode and not "ReleaseFast" or "ReleaseSmall".

johnisgood 9mo ago

> Because it never achieved mainstream success?

And with this attitude it never will. With Rust's hype, it would.

pjmlp 9mo ago

None of them solve use after free, for example.

Ada would rather be a nice choice, but most hackers love their curly brackets.

the__alchemist 9mo ago

Even within the Rust OSS community it's irritating. They will try to cancel people for writing libs using `unsafe`, and make APIs difficult to use by wrapping things in multiple layers of traits, then claim using other patterns is unsafe/unsound/UB. They make claims that things like DMA are "advanced topics", and "We haven't figured it out yet/found a good solution yet". Love Rust/hate the Safety Inquisition. Or say things like "Why use Rust if you don't use all the safety features and traits"... which belittles Rust as a one-trick lang!

agalunar 9mo ago

A small nit: the development of Unix began on the PDP-7 in assembly, not the PDP-11.

(The B language was implemented for the PDP-7 before the PDP-11, which are rather different machines. It’s sometimes suggested that the increment and decrement operators in C, which were inherited from B, are due to the instruction set architecture of the PDP-11, but this could not have been the case. Per Dennis Ritchie:¹

> Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few “auto-increment” memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own.

Another person puts it this way:²

> It's a myth to suggest C’s design is based on the PDP-11. People often quote, for example, the increment and decrement operators because they have an analogue in the PDP-11 instruction set. This is, however, a coincidence. Those operators were invented before the language [i.e. B] was ported to the PDP-11.

In any case, the PDP-11 usually gets all the love, but I want to make sure the other PDPs get some too!)

[1] https://www.bell-labs.com/usr/dmr/www/chist.html

[2] https://retrocomputing.stackexchange.com/questions/8869

VivaTechnics 9mo ago

We switched to Rust. Generally, are there specific domains or applications where C/C++ remain preferable? Many exist, but are there tasks that Rust fundamentally cannot handle or for which it is a weak choice?

pjmlp 9mo ago

Yes, all the industries where C and C++ are the industry standards, like Khronos APIs, POSIX, CUDA, DirectX, Metal, console devkits, LLVM and GCC implementation, ...

Not only are you faced with creating your own wrappers if no one else has done it already, but the tooling for IDEs and graphical debuggers also assumes either C or C++, so it won't be there for Rust.

Ideally the day will come where those ecosystems might also embrace Rust, but that is still decades away maybe.

uecker 9mo ago

Advantages of C are short compilation time, portability, long-term stability, widely available expertise and training materials, less complexity.

IMHO you can today deal with UB just fine in C if you want to by following best practices, and the reasons given when those are not followed would also rule out use of most other safer languages.

simonask 9mo ago

This is a pet peeve, so forgive me: C is not portable in practice. Almost every C program and library that does anything interesting has to be manually ported to every platform.

C is portable in the least interesting way, namely that compilers exist for all architectures. But that's where it stops.

snovymgodym 9mo ago

> C is not portable in practice. Almost every C program and library that does anything interesting has to be manually ported to every platform.

I'm guessing you mean that every cross-platform C codebase ends up being plastered in cascading preprocessor code to deal with OS and architecture differences. Sure, that's true, but you still have to do some porting work regardless of the language you choose.

But honestly, is there any language more portable than C? I struggle to come up with one.

If someone told me "I need a performant language that targets all major architectures and operating systems, but also maybe I want to run it on DOS, S390X, an old Amiga I have in my closet, and any mystery-meat microcontroller I can find." then I really wouldn't have a better answer for them than C89.

If C isn't portable then nothing is.

pjmlp 9mo ago

Back in the 2000s I had lots of fun porting code across several UNIX systems: AIX, Solaris, HP-UX, Red Hat Linux.

A decade earlier I also used Xenix and DG/UX.

That is a nice way to learn how "portable" C happens to be, even between UNIX systems, its birthplace.

uecker 9mo ago

Compilers existing is essential and not trivial (and it is usually also what other languages then build on). The conformance model of C also allows you to write programs that are portable without change to different platforms. This is possible: my software runs on 20 different architectures without change. That one can then also adapt it to make use of specific features of different platforms is quite natural in my opinion.

lifthrasiir 9mo ago

> short compilation time

> IMHO you can today deal with UB just fine in C if you want to by following best practices

In other words, short compilation time has been traded off against wetware brainwashing... well, adjustment time, which makes the supposed advantage much less desirable. It is still an advantage, I reckon, though.

uecker 9mo ago

I do not understand what you are trying to say, but it seems to be some hostile rambling.

bluetomcat 9mo ago

Rust encourages a rather different "high-level" programming style that doesn't suit the domains where C excels. Pattern matching, traits, annotations, generics and functional idioms make the language verbose and semantically-complex. When you follow their best practices, the code ends up more complex than it really needs to be.

C is a different kind of animal that encourages terseness and economy of expression. When you know what you are doing with C pointers, the compiler just doesn't get in the way.

eru 9mo ago

Pattern matching should make the language less verbose, not more. (Similar for many of the other things you mentioned.)

> When you know what you are doing with C pointers, the compiler just doesn't get in the way.

Alas, it doesn't get in the way of you shooting your own foot off, too.

Rust allows unsafe and other shenanigans, if you want that.

bluetomcat 9mo ago

> Pattern matching should make the language less verbose, not more.

In the most basic cases, yes. It can be used as a more polished switch statement.

It's the whole paradigm of "define an ad-hoc Enum here and there", encoding rigid semantic assumptions about a function's behaviour with ADTs, and pattern matching for control-flow. This feels like a very academic approach, and modifying such code to alter its opinionated assumptions isn't fun.

za_creature 9mo ago

> When you know what you are doing with C pointers, the compiler just doesn't get in the way.

Tell me you use -fno-strict-aliasing without telling me.

Fwiw, I agree with you and we're in good[citation needed] company: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg...
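
For context, -fno-strict-aliasing is a GCC/Clang option that stops the optimizer from assuming that pointers to different types do not alias. A sketch of the usual portable alternative to pointer-cast type punning, copying the bytes with `memcpy`, follows; `float_bits` is a hypothetical name and the sketch assumes `float` is 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float assumed to be 32 bits");

/* Hypothetical helper: returns the object representation of a float.
   Reading through *(uint32_t *)&f would violate the aliasing rules;
   copying the bytes does not, and compilers typically turn the memcpy
   into a single register move. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The bit pattern returned depends on the platform's floating-point format.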

pizza234 9mo ago

Yes, based on a few attempts chronicled in articles from different sources, Rust is a weak choice for game development, because it's too time-consuming to refactor.

bakugo 9mo ago

There's also the fact that a lot of patterns that are commonly used in game development are fundamentally at odds with the borrow checker.

Relevant: https://youtu.be/4t1K66dMhWk?si=dZL2DoVD94WMl4fI

simonask 9mo ago

Basically all of those problems originate with the tradition of conflating pointers and object identity, which is a problem in Rust as soon as you have ambiguous ownership or incongruent access patterns.

It's also very often not the best way to identify objects, for many reasons, including performance (spatial locality is a big deal).

These problems go away almost completely by simply using `EntityID` and going through `&mut World` for modifications, rather than passing around `EntityPtr`. This pattern gives you a lot of interesting things for free.

Defletter 9mo ago

Yup, this one (https://news.ycombinator.com/item?id=43824640) comes to mind. The first comment says "Another failed game project in Rust", hinting that this is very common.

ramon156 9mo ago

We've only had 6-7 years of game dev in Rust. Bevy is coming along nicely and will hopefully remove these pain points.

flohofwoe 9mo ago

"With Steiner's attack, everything will be all right" ;)

As shitty as C++ is from today's PoV, the entire gaming industry switched over within around 3 years towards the end of the 90s. 6..7 years is a long time, and a single engine (especially when it's more or less just a runtime without editor and robust asset pipeline) won't change the bigger picture that Rust is a pretty poor choice for gamedev.

pizza234 9mo ago

The articles describe how the problem is inherent in the language.

If we exclude AAA games, probably the vast majority of games nowadays don't need manual memory management for the game core (C# was a popular choice, it seems). I guess that if one really needs manual memory management, languages with moderate memory safety would be a more appropriate choice (support libraries/frameworks being equal, which they certainly aren't).

I've used Bevy, and ECS is not an appropriate choice for every game (I wouldn't actually advise it unless there is a specific need). It requires very careful design over the whole lifecycle (ECS-based games very easily tend to become a mess), which is exactly the opposite of what one wants for rapid prototyping.

account42 9mo ago

And there are millions of game engines written in C++. Many of them have also been coming along nicely for years.

Making a nontrivial game with them is a wholly different story.

mgaunard 9mo ago

Rust forces you to code in the Rust way, while C or C++ let you do whatever you want.

nicoburns 9mo ago

> C or C++ let you do whatever you want.

C and C++ force you to code in the C and C++ ways. It may be that that's what you want, but they certainly don't let me code how I want to code!

mgaunard 9mo ago

There are no C or C++ ways. It's widely known that every codebase is its own dialect.

mckravchyk 9mo ago

If you wanted to develop a cross-platform native desktop / mobile app in one framework without bundling / using a web browser, only Qt comes to mind, which is C++. I think there are some bindings though.

jandrewrogers 9mo ago

An application domain where C++ is notably better is when the ownership and lifetimes of objects are not knowable at compile-time, only being resolvable at runtime. High-performance database kernels are a canonical example of code where this tends to be common.

Beyond that, recent C++ versions have much more expressive metaprogramming capability. The ability to do extensive codegen and code verification within C++ at compile-time reduces lines of code and increases safety in a significant way.

imadr (OP) 9mo ago

I haven't used Rust extensively so I can't make any criticism besides that I find compilation times to be slower than C

ost-ing 9mo ago

I find with C/++ I have to compile to find warnings and errors, while with Rust I get more information automatically due to the modern type and linking systems. As a result I compile Rust significantly fewer times, which is a massive speed increase.

Rust's tooling is hands down better than C/++, which contributes to a more streamlined and efficient development experience.

bch 9mo ago

> Rust's tooling is hands down better than C/++, which contributes to a more streamlined and efficient development experience

Would you expand on this? What was your C tooling/workflow that was inferior to your new Rust experience?

kazinator 9mo ago

The popular C compilers are seriously slow, too. Orders of magnitude slower than the C compilers of yesteryear.

ykonstant 9mo ago

I also hear that Async Rust is very bad. I have no idea; if anyone knows, how does async in Rust compare to async in C++?

01HNNWZ0MV43FF 9mo ago

I have yet to use async in C++, but I did work on a multi-threaded C++ project for a few years.

Rust is nicer for async and MT than C++ in every way, I am pretty sure.

But it's still mid. If you use Rust async aggressively you will struggle with the borrow checker, and the architecture ends up in channel hell.

If you follow the "one control thread that does everything and never blocks" style you can get far, but the language does not give you much help in doing that style neatly.

I have never used Go. I love a lot of Go projects like Forgejo and SyncThing. Maybe Go solved async. Rust did not. C++ did not even add good tagged unions yet.

ViewTrick1002 9mo ago

> I also hear that Async Rust is very bad.

Not sure where this is coming from.

Async rust is amazing as long as you only mix in one more hard concept. Be it traits, generics or whatever. You can confidently write and refactor heavily multithreaded code without being deathly afraid of race conditions etc. and it is extremely empowering.

The problem comes when trying to write async generic traits in a multithreaded environment.

Then just throwing stuff at the wall and hoping something sticks will quickly lead you into despair.

teunispeters9mo ago

Embedded hardware, any processor Rust doesn't support (there are many), and any place where code size is critical. Rust has a BIG base size for an application, uselessly so at this time. I'd also love to see whether it offers anything of use in those spaces, especially where no memory allocation takes place at all. C and (to a lesser extent) C++ are both very good in those spaces.

steveklabnik9mo ago

You can absolutely make small rust programs, you just have to actually configure things the right way. Additionally, the Rust language doesn’t have allocation at all, it’s purely a library concern. If you don’t want heap allocations, then don’t include them. It works well.

The smallest binary rustc has produced is like ~145 bytes.

teunispeters9mo ago

That is far from my only concern. But it's good to see Rust is finally paying attention to binary sizes. And the overwhelming complexity of Rust code is definitely not a gain when one is working in embedded spaces anyway. I am however really REALLY annoyed with the aggressive sales tactics of the Rust community.


m-schuetz9mo ago

Prototyping in any domain. It's nice to have a quick-and-dirty way to rapidly evaluate ideas and solutions.

eru9mo ago

I don't think either C or C++ was ever a great language for prototyping? (And definitely not better than Rust.)

m-schuetz9mo ago

Please try not to be obnoxious and turn this into a language war.


eru9mo ago

> Generally, are there specific domains or applications where C/C++ remain preferable?

Well, anything where your people have more experience in the other language or the libraries are a lot better.

mrheosuper9mo ago

Rust can do inline ASM, so finding a task Rust "fundamentally cannot handle" is almost impossible.

eru9mo ago

That's almost as vacuous as saying that Rust can implement universal Turing machines or that Rust can do FFI?

kazinator9mo ago

In C, using uninitialized data is undefined behavior only if:

- it is an automatic variable whose address has not been taken; or

- the uninitialized object's bits are such that it takes on a non-value representation.

pizlonator9mo ago

I don’t buy the “it’s because of optimization” argument.

And I especially don’t buy that UB is there for register allocation.

First of all, that argument only explains UB of OOB memory accesses at best.

Second, you could define the meaning of OOB by just saying “pointers are integers” and then further state that nonescaping locals don’t get addresses. Many ways you could specify that, if you cared badly enough. My favorite way to do it involves saying that pointers to locals are lazy thunks that create addresses on demand.

OskarS9mo ago

No, it's absolutely because of optimization. For instance, C++20 defined signed integer representation as having two's complement, but signed integer overflow is still undefined behaviour. The reason is that if you compile with flags that make it defined, you lose a few percentage points of performance (primarily from preventing loop unrolling and auto-vectorization).

Same thing with e.g. strict aliasing or the various UB that exists in the standard library. For instance, it's UB to pass a null pointer to strlen. Of course, you can make that perfectly defined by adding an `if` to strlen that just returns 0. But then you're adding a branch to every strlen, and C is simply not willing to do that for performance reasons, so they say "this is UB" instead.
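To make the strlen trade-off concrete, here is a minimal sketch of what the "defined" variant would look like. `checked_strlen` is a made-up name for illustration, not a standard function; the point is only that the null case costs one extra branch on every call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper, not part of ISO C: gives a NULL argument a
   defined result at the cost of one extra branch per call. */
size_t checked_strlen(const char *s)
{
    if (s == NULL)
        return 0;        /* the check plain strlen does not perform */
    return strlen(s);    /* s is known to be non-null here */
}
```

Callers that already know their pointer is valid pay for the branch anyway, which is the cost the standard library avoids by leaving the null case undefined.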

Pretty much every instance of UB in standard C or C++ is there because making it defined would either hamper the optimizer or make standard library functions slower. They don't just make things UB for fun.

pizlonator9mo ago

This isn’t the reason why the UB is in the spec in the first place. The spec left stuff undefined to begin with because of lack of consensus over what it should do.

For example the reason why 2s complement took so long is because of some machine that ran C that still existed that was 1s complement.

> The reason is that if you compile with flags that make it defined, you lose a few percentage points of performance (primarily from preventing loop unrolling and auto-vectorization).

I certainly don’t lose any perf on any workload of mine if I set -fwrapv

If your claim is that implementers use optimization as the excuse for wanting UB, then I can agree with that.

I don’t agree that it’s a valid argument though. The performance wins from UB are unconvincing, except maybe on BS benchmarks that C compilers overtune for marketing reasons.

OskarS9mo ago

> For example the reason why 2s complement took so long is because of some machine that ran C that still existed that was 1s complement.

You're misunderstanding me: as of C++20, there is no representation for signed integers in C++ other than two's complement (no ones' complement, no sign-magnitude, nothing else), but signed overflow is still UB. It's not because of obscure machines or hardware; such hardware is not relevant for C++20 and later. The reason for it is performance. From the accepted paper [1]:

> The following polls were taken, and corresponding modifications made to the paper. The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior. This direction was motivated by:

> * Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs

You may disagree, you may think they're wrong, but their motivation is performance, that's why this is UB. It's right there in black and white. This was C++, not C, but it's not at all unthinkable that the C standard will also mandate two's complement at some point, and if they do, they will almost certainly keep signed overflow undefined for exactly the same reason.

It's not hard to write code that optimizes much better when you use signed loop variables. One of my favorite examples is this function [2] to turn a 3D mesh inside out by flipping the edges of each triangle in a triangle mesh. The godbolt link has two versions of the same function, one with a signed loop variable, one with an unsigned one. The signed one auto-vectorizes and optimizes much better because it can assume that the loop variable never overflows (this version is C++, it's trivial to rewrite it in C and get the same results).
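For readers who don't want to open the link, a rough C sketch of the kind of function pair being described follows. This is not the code behind the godbolt link; the names `flip_signed` and `flip_unsigned` and the index-buffer layout (three indices per triangle, `count` a multiple of 3) are assumptions for illustration. Whether a given compiler actually vectorizes either version depends on the compiler, target and flags.

```c
/* Reverse the winding of each triangle by swapping its last two
   vertex indices. Precondition (assumed): count is a multiple of 3. */

/* Signed index: overflow of i would be UB, so the optimizer may
   assume i, i + 1 and i + 2 never wrap around. */
void flip_signed(unsigned *idx, int count)
{
    for (int i = 0; i < count; i += 3) {
        unsigned tmp = idx[i + 1];
        idx[i + 1] = idx[i + 2];
        idx[i + 2] = tmp;
    }
}

/* Unsigned index: wraparound is defined behaviour, so the generated
   code must stay correct even for inputs where i would wrap. */
void flip_unsigned(unsigned *idx, unsigned count)
{
    for (unsigned i = 0; i < count; i += 3) {
        unsigned tmp = idx[i + 1];
        idx[i + 1] = idx[i + 2];
        idx[i + 2] = tmp;
    }
}
```

Both functions compute the same result for valid inputs; the only difference is what the compiler is allowed to assume about the loop counter.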

This is why signed overflow is UB.

[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...

[2]: https://godbolt.org/z/a1P5Y17fn


account429mo ago

I wish there was a way to opt into undefined behavior for unsigned overflow. It's rare that wraparound is actually what you want, and in many cases overflow is still a bug. It sucks to have to either miss out on potential optimizations or miss out on the guarantee that the value can't be negative.

uecker9mo ago

I recently filed a bug for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116193

OskarS9mo ago

You do need some way to overflow properly, because sometimes that is what you want. A common example would be PRNGs, which frequently rely on overflow (the classic LCG, for instance). You could argue that should just be a library function or something (e.g. `add_with_overflow`), though that's more C++ than C.
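As a small illustration of wraparound being the intended behaviour, here is a sketch of one LCG step in C. The multiplier and increment are the widely published Numerical Recipes constants; the function name `lcg_next` is made up for this example.

```c
#include <stdint.h>

/* One step of a 32-bit linear congruential generator. The arithmetic
   is done in an unsigned type, where overflow wraps by definition,
   and the cast keeps the result to 32 bits. */
uint32_t lcg_next(uint32_t state)
{
    return (uint32_t)(state * 1664525u + 1013904223u);
}
```

The `u` suffixes keep the whole expression in unsigned arithmetic, so the overflow is the defined, modular kind the generator relies on.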

You are absolutely, 100% correct though: I've never seen a case where accidental overflow doesn't start causing bugs anyway. Like, the Pac-Man kill screen is caused by a byte overflowing (it happens on level 256), and the game goes insane. Pac-Man was written in assembly where overflow is defined behavior, but that doesn't matter at all, the game is still broken. If signed overflow is essentially always a bug anyway, why not make it UB and optimize around it? Especially since it is super-valuable in being able to unroll loops.

People always bring up signed integer overflow as an argument for why UB is scary, and it always seemed like such a bad argument to me. Like, I can understand why people think UB has gone too far in C/C++, but signed overflow is such a bad example. It's one of the most sensible bits of UB in the entire standard, IMHO.

pizlonator9mo ago

-fwrapv

j16sdiz9mo ago

> First of all, that argument only explains UB of OOB memory accesses at best.

It explains a lot of the loop unrolling and integer overflow cases as well.

gpderetta9mo ago

> nonescaping locals don’t get addresses

inlining, interprocedural optimizations.

For example, something as trivial as an accessor member function would be hard to optimize.

pjmlp9mo ago

Safer languages manage similar optimizations without having to rely on UB.

gpderetta9mo ago

Well, yes, safer languages prevent pointer forging statically, so provenance is trivially enforced.

And I believe that provenance is an issue in unsafe Rust.


pizlonator9mo ago

Inlining doesn’t require UB

gpderetta9mo ago

I didn't claim that. What I mean is that if a pointer escapes into an inlined function and no further, it will still prevent further optimizations if we apply your rule that only non-escaping locals don't get addresses. The main benefit of inlining is that it is effectively a simple way to do interprocedural optimizations. E.g.

  inline void add(int* to, int what) { *to += what; }
  void foo();
  int bar() {
      int x = 0;
      add(&x, 1);
      foo();
      return x;
  }

By your rules, optimizing bar to return the constant 1 would not be allowed.


tialaramex9mo ago

> Second, you could define the meaning of OOB by just saying “pointers are integers"

This means losing a lot of optimisations, so in fact when you say you "don't buy" this argument you only mean that you don't care about optimisation. Which is fine, but this does mean the "improved" C isn't very useful in a lot of applications, might as well choose Java.

pizlonator9mo ago

> This means losing a lot of optimisations

You won’t lose “a lot” of optimizations and you certainly won’t lose enough for it to make a noticeable difference in any workload that isn’t SPEC

IshKebab9mo ago

This asserts that UB was deliberately created for optimisation purposes; not to handle implementation differences. It doesn't provide any evidence though and that seems unlikely to me.

The spec even says:

> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

No motivation is given that I could find, so the actual difference between undefined and implementation defined behaviour seems to be based on whether the behaviour needs to be documented.

flohofwoe9mo ago

I'd say the original intent of UB was not the sort of "optimizer exploits" we see today, but to allow wiggle room for supporting vastly different CPUs without having to compromise runtime performance or increasing compiler complexity to balance performance versus correctness. Basically an escape hatch for compilers. The difference to IB also has always been quite fuzzy.

Also the C spec has always been a pragmatic afterthought, created and maintained to establish at least a minimal common feature set expected of C compilers.

The really interesting stuff still only exists outside the spec in vendor language extensions.

agent3279mo ago

I, once again, disagree with the premise that UB is a necessary precondition for optimisation, or that it exists to allow for optimisation. You do not need UB to unroll a loop, inline a function, lift an object or computation out of a loop, etc. Moreover, _most_ UB does not assist in optimisation at all.

The two instances where UB allows for optimisation are as follows:

1. The 'signed overflow' UB allows for faster array indexing. By ignoring potential overflow, the compiler can generate code that doesn't check for accidental overflow (which would require masking the array index and recomputing the address on each loop iteration). I believe the better solution would be to use a dedicated type for array indexing that will never overflow (size_t would do fine) and to make signed overflow at least implementation-defined, if not outright fully defined, after a suitable period during which compilers warn if you use a too-small type for array indexing.

2. The 'aliasing' UB does away with the need to read/write values to/from memory each time they're used, and is extremely important to performance optimisation.
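A short C sketch of the aliasing point, under the assumption that `float` is 32 bits wide (checked at compile time below). The function names `fill` and `float_bits` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* A store through a float* is not allowed to modify an int object,
   so the compiler may load *count once instead of re-reading it
   after every store to out[i]. */
void fill(float *out, const int *count, float value)
{
    for (int i = 0; i < *count; i++)
        out[i] = value;
}

/* Defined way to look at a float's bits: copy the bytes rather than
   dereferencing a uint32_t* that points at a float object. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that genuinely needs to reinterpret storage can use `memcpy` as above or, with GCC and Clang, be built with -fno-strict-aliasing at some cost in optimisation.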

But the rest? Most of it does precisely nothing for performance. At 'best', the compiler uses detected UB to silently eliminate code branches, but that's something to be feared, not celebrated. It isn't an optimisation if it removes vital program logic, because the compiler could 'demonstrate' that it could not possibly take the removed branch, on account of it containing UB.
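The branch-elimination hazard mostly bites checks that are written in terms of the overflow having already happened (for example, testing `a + b < a` on signed ints). A minimal sketch of a check that never performs the overflowing operation follows; `add_checked` is a hypothetical helper, not a standard function.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true if the sum fits in an int.
   Returns false and leaves *out untouched otherwise. The range test
   happens before the addition, so no signed overflow ever occurs and
   the optimizer has no UB to reason from. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

GCC and Clang also offer `__builtin_add_overflow` for the same purpose; the portable version above is shown because it relies on nothing beyond `<limits.h>`.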

The claim in the linked article ("what every C programmer should know") that use of uninitialized variables allows for additional optimisation is incorrect. What happens instead is this: if the compiler sees you declare a variable and then read from it before writing to it, it has detected UB, and since the rule is that "the compiler is allowed to assume UB does not occur", it uses that as 'evidence' that the code branch will never execute and can be eliminated. It does not make things go faster; it makes them go _wrong_.

Undefined behaviour, ultimately, exists for many reasons: because the standards committee forgot a case, because the underlying platforms differ too wildly, because you cannot predict in advance what the result of a bug may be, to grandfather in broken old compilers, etc. It does not, in any way, shape, or form, exist _in order to_ enable optimisation. It _allows_ it in some cases, but that is not, and never was, the goal.

Moreover, the phrasing of "the compiler is allowed to assume that UB does not occur" was originally only meant to indicate that the compiler was allowed to emit code as if all was well, without introducing additional tests (for example, to see if overflow occurred or if a pointer was valid) - clearly that would be very expensive or downright infeasible. Unfortunately, over time this has enabled a toxic attitude to grow that turns minor bugs into major disasters, all in the name of 'performance'.

The two bullet points towards the end of the article are both true: the compiler SHOULD NOT behave like an adversary, and the compiler DOES NEED license to optimize. The mistake is thinking that UB is a necessary component of such license. If that were true, a language with more UB would automatically be faster than one with less. In reality, C++ and Rust are roughly identical in performance.

roman_soldier9mo ago

Just use Zig, it fixes all this

grougnax9mo ago

Worst languages ever.

compiler-guy9mo ago

Jack Sparrow: “… but you have heard of them.”

The dustbin of programming languages is jam packed with elegant, technically terrific, languages that never went anywhere.

OskarS9mo ago

C and C++ are languages that brought us UNIX, the Linux kernel, macOS and Windows, the interpreters of virtually every other language in the world, powering virtually all software in the world as well as the vast majority of embedded devices.

Chill the fuck out.

account429mo ago

Except for all the others.

imadrOP9mo ago

I haven't used Rust extensively so I can't make any criticism besides that I find compilation times to be slower than C

ost-ing9mo ago

Rusts tooling is hands down better than C/++ which aids to a more streamlined and efficient development experience

bch9mo ago

> Rusts tooling is hands down better than C/++ which aids to a more streamlined and efficient development experience

Would you expand on this? What was your C tooling/workflow that was inferior to your new Rust experience?

1 more reply

kazinator9mo ago

The popular C compilers are seriously slow, too. Orders of magnitude compared to C compilers of yesteryear.

ykonstant9mo ago

I also hear that Async Rust is very bad. I have no idea; if anyone knows, how does async in Rust compare to async in C++?

01HNNWZ0MV43FF9mo ago

I am yet to use async in c++, but I did work on a multi threaded c++ project for a few years

Rust is nicer for async and MT than c++ in every way. I am pretty sure.

But it's still mid. If you use Rust async aggressively you will struggle with the borrow checker and the architecture results of channel hell.

If you follow the "one control thread that does everything and never blocks" you can get far, but the language does not give you much help in doing that style neatly.

I have never used Go. I love a lot of Go projects like Forgejo and SyncThing. Maybe Go solved async. Rust did not. C++ did not even add good tagged unions yet.

2 more replies

ViewTrick10029mo ago

> I also hear that Async Rust is very bad.

Not sure where this is coming from.

The problem comes when trying to write async generic traits in a multithreaded environment.

Then just throwing stuff at the wall and hoping something sticks will quickly lead you into despair.

teunispeters9mo ago

steveklabnik9mo ago

The smallest binary rustc has produced is like ~145 bytes.

teunispeters9mo ago

1 more reply

m-schuetz9mo ago

Prototyping in any domain. It's nice to do some quick&dirty way to rapidly evaluate ideas and solutions.

eru9mo ago

I don't think C nor C++ were ever great languages for prototyping? (And definitely not better than Rust.)

m-schuetz9mo ago

Please try not to be obnoxious and turn this into a language war.

1 more reply

eru9mo ago

> Generally, are there specific domains or applications where C/C++ remain preferable?

Well, anything were your people have more experience in the other language or the libraries are a lot better.

mrheosuper9mo ago

Rust can do inline ASM, so finding a task Rust "fundamentally cannot handle" is almost impossible.

eru9mo ago

That's almost as vacuous as saying that Rust can implement universal Turing machines are that Rust can do FFI?

kazinator9mo ago

In C, using uninitialized data is undefined behavior only if:

- it is an automatic variable whose address has not been taken; or

- the uninitialized object' bits are such that it takes on a non-value representation.

pizlonator9mo ago

I don’t buy the “it’s because of optimization argument”.

And I especially don’t buy that UB is there for register allocation.

First of all, that argument only explains UB of OOB memory accesses at best.

OskarS9mo ago

pizlonator9mo ago

This isn’t the reason why the UB is in the spec in the first place. The spec left stuff undefined to begin with because of lack of consensus over what it should do.

For example the reason why 2s complement took so long is because of some machine that ran C that still existed that was 1s complement.

> The reason is that if you compile with flags that make it defined, you lose a few percentage points of performance (primarily from preventing loop unrolling and auto-vectorization).

I certainly don’t lose any perf on any workload of mine if I set -fwrapv

If your claim is that implementers use optimization as the excuse for wanting UB, then I can agree with that.

I don’t agree that it’s a valid argument though. The performance wins from UB are unconvincing, except maybe on BS benchmarks that C compilers overtune for marketing reasons.

OskarS9mo ago

> For example the reason why 2s complement took so long is because of some machine that ran C that still existed that was 1s complement.

> * Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs

This is why signed overflow is UB.

[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...

[2]: https://godbolt.org/z/a1P5Y17fn

1 more reply

account429mo ago

uecker9mo ago

I recently filed a bug for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116193

OskarS9mo ago

pizlonator9mo ago

-fwrapv

j16sdiz9mo ago

> First of all, that argument only explains UB of OOB memory accesses at best.

It explains many loop-unrolling and integer-overflow cases as well.
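A rough sketch of the loop case, in C. The function name and signature are made up for illustration, and whether a given compiler actually performs these transformations depends on the target and flags.

```c
/* Sums data[start..n] inclusive, using a signed int index.
   Since overflowing i++ would be undefined, the compiler may assume
   i never wraps. That lets it treat the loop as finite with a known
   trip count (n - start + 1 when start <= n) and widen i to a 64-bit
   induction variable, both of which help unrolling and vectorizing.
   If i were defined to wrap, n == INT_MAX would make the loop endless
   and the compiler would have to account for that case. */
long sum_range(const int *data, int start, int n) {
    long total = 0;
    for (int i = start; i <= n; i++) {
        total += data[i];
    }
    return total;
}
```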

gpderetta9mo ago

> nonescaping locals don’t get addresses

inlining, interprocedural optimizations.

For example, something as trivial as an accessor member function would be hard to optimize.

pjmlp9mo ago

Safer languages manage similar optimizations without having to rely on UB.

gpderetta9mo ago

Well, yes, safer languages prevent pointer forging statically, so provenance is trivially enforced.

And I believe that provenance is an issue in unsafe Rust.


pizlonator9mo ago

Inlining doesn’t require UB

gpderetta9mo ago

  inline void add(int* to, int what) { *to += what; }
  void foo();  // defined elsewhere; never receives the address of x
  int bar() {
      int x = 0;
      add(&x, 1);
      foo();
      return x;
  }

By your rules, optimizing bar to return the constant 1 would not be allowed.


tialaramex9mo ago

> Second, you could define the meaning of OOB by just saying “pointers are integers"

pizlonator9mo ago

> This means losing a lot of optimisations

You won’t lose “a lot” of optimizations and you certainly won’t lose enough for it to make a noticeable difference in any workload that isn’t SPEC

IshKebab9mo ago

This asserts that UB was deliberately created for optimisation purposes; not to handle implementation differences. It doesn't provide any evidence though and that seems unlikely to me.

The spec even says:

> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

No motivation is given that I could find, so the actual difference between undefined and implementation defined behaviour seems to be based on whether the behaviour needs to be documented.

flohofwoe9mo ago

Also the C spec has always been a pragmatic afterthought, created and maintained to establish at least a minimal common feature set expected of C compilers.

The really interesting stuff still only exists outside the spec in vendor language extensions.

agent3279mo ago

The two instances where UB allows for optimisation are as follows:

2. The 'aliasing' UB does away with the need to read/write values to/from memory each time they're used, and is extremely important to performance optimisation.
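A small C sketch of the aliasing point. The function names are hypothetical, and the bit pattern mentioned in the comments assumes float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* An int and a float are not allowed to alias, so the compiler may
   assume the store through f cannot change *i. It can then keep the
   value 1 in a register and return it without reloading from memory. */
int store_both(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Reading a float's bits through a uint32_t pointer would break the
   aliasing rule. Copying the bytes with memcpy is the defined way, and
   compilers typically turn it into a plain register move. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Without the aliasing rule, store_both would have to reload *i after the store to *f, since the two pointers could refer to the same memory.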

roman_soldier9mo ago

Just use Zig, it fixes all this

grougnax9mo ago

Worst languages ever.

compiler-guy9mo ago

Jack Sparrow: “… but you have heard of them.”

The dustbin of programming languages is jam packed with elegant, technically terrific, languages that never went anywhere.

OskarS9mo ago

Chill the fuck out.

account429mo ago

Except for all the others.
