So I see your counterpoints are all "so just don't do that, then".
And the point of my post is that this particular "just don't do that, then" has never been achieved by humans.
If there's no example of a program without these bugs in a language, then I do think it's fair to blame the language. A knife with 16 blades and no handle.
> Expecting C to handle "address zero" in physical memory in ways that conflict with NULL in source code denotes a complete lack of understanding of what a program is.
Like the post says, it's rare that programmers actually want a pointer to memory address zero. But in my experience most programmers who even encounter that have this "complete lack of understanding", as you put it.
For example, you seem to underestimate how wrong placing negative values in a signed char is: ordinary character encodings do not use negative codes, so either those negative values are not characters and they have no business being treated as such, or something strange and experimental is going on.
We have 54 years of empirical data showing that literally nobody can follow this approach and reach UB-freeness. Sticking to the plan is like the gambler in debt who just needs to work their system a little longer and they'll get rich.
By this logic we don't need any traffic rules other than "just don't crash or hit anyone". And we can aspire to an absolute dictatorship, all we need to do is "just" choose the benevolent one.
Of course we should always try to not make mistakes. But given more than half a century of empirical data that nobody has been able to avoid UB, ever, it takes quite some hubris to say "but it might work for us".
> you seem to underestimate how wrong placing negative values in a signed char is
Shrug. You don't make that mistake. There are thousands of mistakes like it, especially in C or C++.
Of course "don't do that". That is not the same as "So just don't do that!". The former is good advice. The latter is one of a million rules, and to expect even experts (see OpenBSD) to never make a mistake is unrealistic to say the least.
You may even have spotted the UB in https://pooladkhay.com/posts/first-kernel-patch/. But you would not spot all of them. Nobody in history has.
First of all, traffic rules are good, and similar to good C programming rules: check number value ranges when there is a chance of casting or overflow, check Inf and NaN floating point values, declare alignment strategically (e.g. in all memory allocations) to avoid misaligned pointers and variables, and so on. Such rules have alternatives and exceptions and must not be part of the language.
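As a rough illustration of the first few rules in that list (range checks before casts and arithmetic, Inf/NaN checks, aligned allocations), here is a minimal C sketch. The function names `double_to_int`, `add_int` and `alloc_aligned64` are made up for this example, the 64-byte alignment is an arbitrary choice, and it assumes a C11 library that provides `aligned_alloc`.

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a double to int only if that is defined.
   Converting NaN, Inf, or a value outside int's range to int is UB. */
static bool double_to_int(double d, int *out)
{
    if (!isfinite(d))
        return false;
    /* The conversion truncates toward zero, so accept values strictly
       between INT_MIN - 1 and INT_MAX + 1. */
    if (d <= (double)INT_MIN - 1.0 || d >= (double)INT_MAX + 1.0)
        return false;
    *out = (int)d;
    return true;
}

/* Hypothetical helper: signed overflow is UB, so check before adding. */
static bool add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical helper: allocate at least n bytes aligned to 64.
   C11 aligned_alloc expects the size to be a multiple of the alignment. */
static void *alloc_aligned64(size_t n)
{
    size_t rounded = (n + 63) / 64 * 64;
    if (rounded < n)              /* n + 63 wrapped around (unsigned, so defined) */
        return NULL;
    return aligned_alloc(64, rounded == 0 ? 64 : rounded);
}
```

Each helper reports failure instead of performing the operation, leaving the caller to decide what an out-of-range value means; that is exactly the kind of policy that varies per program and doesn't belong in the language.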
Second, nobody needs perfection and "UB-freeness": it is reasonable to assume that many cases of UB won't be a problem, either because a library will be used correctly and they won't happen, or because the C implementation is neither weird nor hostile and they will be as benign as defined or implementation-defined behaviour, or simply because we avoid doing something known to be inexact or hard to write correctly.
Practical programming requires knowing the relevant rules for what one is doing and learning new ones by making, diagnosing and overcoming mistakes; not omniscience, and definitely not the unfounded feeling of omniscience and unlimited resources that LLMs can give.
EDIT: I insist on the signed char example because it would be terribly wrong (processing who-knows-what as if it were a sequence of characters) even without undefined behaviour, even in different languages.
Sure. You only care about the ones that manifest security issues, stability issues, or other corruption. But of course those change over time as compilers change.
So while far from every instance of UB will manifest in a problem, every single one has the potential to, by a low percentage. They're all tiny liabilities that add up.
But which ones will? Reminds me of https://www.lesswrong.com/posts/ooypcn7qFzsMcy53R/infinite-c...
> because the C implementation is neither weird nor hostile
Some people definitely were screaming at GCC for being hostile when it removed the NULL check in the kernel:
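  int foo = bar->baz;   /* bar is dereferenced first, so the compiler may assume bar != NULL */
  if (!bar) {           /* ...and then delete this check as dead code */
    return -EINVAL;
  }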
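For contrast, a minimal sketch of the ordering that avoids the problem: test the pointer before dereferencing it. `struct thing` and `read_baz` are hypothetical stand-ins for the kernel code, and `EINVAL` here comes from the userspace `<errno.h>`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical type standing in for whatever `bar` pointed to. */
struct thing {
    int baz;
};

/* The NULL check comes before any dereference, so the compiler has no
   grounds to conclude that bar is non-NULL and remove the test. */
static int read_baz(const struct thing *bar, int *out)
{
    if (!bar)
        return -EINVAL;
    *out = bar->baz;
    return 0;
}
```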
> the unfounded feeling of omniscience and unlimited resources that LLMs can give.

I definitely don't have that. I'm not saying LLMs find all bugs (now or in the future), nor that they are an unlimited resource.
I'm just saying that for finding UB and subtle bugs, they find orders of magnitude more, especially in C and C++.
I am not saying they find a strict superset of bugs, compared to a human. But take me running this against cosmopolitan libc: https://news.ycombinator.com/item?id=48206377. It took me basically zero human time to spin it off, it took a couple of minutes (5.5 in xhigh effort) to run, and it found 5-10 cases of UB, one of which I think is a user-visible parsing error of SSH keys. Another is a set of double-frees, which is definitely a thing that gets exploited over and over.
Would I have found these, in an unknown-to-me codebase no less, given manual source code reading all day? Of course not. Would I have found them with the likes of UBSan? jart claims to have used it (https://news.ycombinator.com/item?id=48205545), and apparently it didn't catch them.
LLMs are just one of the tools to use. But for this job they do better than any tool or human has in the last half century.
> I insist on the signed char example because it would be terribly wrong
The char situation is terrible in C. It's perfectly safe to hold bytes in a char, signed char, and unsigned char, and convert between them. But then integer promotion rules combine with the historical choice of having isdigit take an int to break things.
If isdigit took a char, of any signedness, then there wouldn't be a problem. But it also has to accept EOF, a negative int, and that ruins it.
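A minimal sketch of the usual workaround, assuming the bytes live in plain `char` buffers: convert to `unsigned char` before calling `isdigit`, so a byte like 0xE9 arrives as 233 rather than as a negative value that isn't EOF. The helper names `byte_is_digit` and `count_digits` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* isdigit() takes an int that must be EOF or a value representable as
   unsigned char. Passing a plain char holding a negative value (e.g. a
   UTF-8 byte >= 0x80 where char is signed) is UB, so convert first. */
static bool byte_is_digit(char c)
{
    return isdigit((unsigned char)c) != 0;
}

/* Count digits in a NUL-terminated byte string without ever passing
   a negative non-EOF value to isdigit(). */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (byte_is_digit(*s))
            n++;
    return n;
}
```

The cast has to be repeated at every call site (or hidden in a wrapper like this), which is the point: the safe version is easy once you know the rule, and the unsafe version compiles silently.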
> processing who-knows-what as if it were a sequence of characters
A "char" hasn't been "a character" in any meaningful sense in a long, long time. That is, a char is not a code point or a grapheme cluster. For byte processing it's fine, since the char types cast between each other perfectly well. Or do you have some interesting example?