thomashabets2 · 5d ago

Author here.

So I see your counter points are all "so just don't do that, then".

And the point of my post is that this particular "just don't do that, then" has never been achieved by humans.

If there's no example of a program without these bugs in a language, then I do think it's fair to blame the language. It's a knife with 16 blades and no handle.

> Expecting C to handle "address zero" in physical memory in ways that conflict with NULL in source code denotes a complete lack of understanding of what a program is.

Like the post says, it's rare that programmers actually want a pointer to memory address zero. But in my experience most programmers who even encounter that have this "complete lack of understanding", as you put it.

HelloNurse · 5d ago

"Just don't do that" is the correct approach to errors, even when they are easy to overlook and the programming language provides many opportunities for mistakes.

For example, you seem to underestimate how wrong placing negative values in a signed char is: ordinary character encodings do not use negative codes, so either those negative values are not characters and they have no business being treated as such, or something strange and experimental is going on.

thomashabets2 (OP) · 5d ago

> "Just don't do that" is the correct approach to errors

We have 54 years of empirical data showing that literally nobody can follow this approach and reach UB-freeness. Sticking to the plan is like the gambler in debt who just needs to work their system a little longer, and then they'll become rich.

By this logic we don't need any traffic rules other than "just don't crash or hit anyone". And we can aspire to an absolute dictatorship, all we need to do is "just" choose the benevolent one.

Of course we should always try to not make mistakes. But given more than half a century of empirical data that nobody has been able to avoid UB, ever, it takes quite some hubris to say "but it might work for us".

> you seem to underestimate how wrong placing negative values in a signed char is

Shrug. You don't make that mistake. There are thousands of mistakes like it, especially in C or C++.

Of course "don't do that". That is not the same as "So just don't do that!". The former is good advice. The latter is one of a million rules, and to expect even experts (see OpenBSD) to never make a mistake is unrealistic to say the least.

You may even have spotted the UB in https://pooladkhay.com/posts/first-kernel-patch/. But you would not spot all of them. Nobody in history has.

HelloNurse · 5d ago

While, for the purpose of avoiding gratuitous mistakes, C is a serious disadvantage compared to less low-level languages, your discussion of UB pitfalls in C is aimed at a strawman.

First of all, traffic rules are good, and similar to good C programming rules: check number value ranges when there is a chance of casting or overflow, check Inf and NaN floating point values, declare alignment strategically (e.g. in all memory allocations) to avoid misaligned pointers and variables, and so on. Such rules have alternatives and exceptions and must not be part of the language.
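As an illustration of the first two rules in C, here is a minimal sketch, assuming a hosted C99-or-later implementation. The names `checked_add` and `checked_double_to_int` are hypothetical helpers, not from any library. The point is that both checks have to happen before the operation, because the overflowing add or the out-of-range conversion is itself the UB.

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out only if the sum fits in an int.
   Signed overflow is UB, so the range test is done before adding. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical helper: converts d to int only if it is finite and its
   truncated value is representable. Converting NaN, Inf, or an
   out-of-range value to int is UB in the core C standard. */
bool checked_double_to_int(double d, int *out)
{
    if (!isfinite(d))
        return false;
    /* Truncation toward zero is valid iff INT_MIN - 1 < d < INT_MAX + 1. */
    if (d <= (double)INT_MIN - 1.0 || d >= (double)INT_MAX + 1.0)
        return false;
    *out = (int)d;
    return true;
}
```

Returning a success flag and writing through a pointer keeps the caller from ever seeing a value computed by a UB operation.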

Second, nobody needs perfection and "UB-freeness": it is reasonable to assume that many cases of UB won't be a problem, either because a library will be used correctly and they won't happen, or because the C implementation is neither weird nor hostile and they will be as benign as defined or implementation-defined behaviour, or simply because we avoid doing something known to be inexact or hard to write correctly.

Practical programming requires knowing the relevant rules for what one is doing and learning new ones by making, diagnosing and overcoming mistakes; not omniscience, and definitely not the unfounded feeling of omniscience and unlimited resources that LLMs can give.

EDIT: I insist on the signed char example because it would be terribly wrong (processing who-knows-what as if it were a sequence of characters) even without undefined behaviour, even in different languages.

thomashabets2 (OP) · 5d ago

> Second, nobody needs perfection and "UB-freeness"

Sure. You only care about the ones that manifest security issues, stability issues, or other corruption. But of course those change over time as compilers change.

So while far from every instance of UB will manifest as a problem, every single one has a small chance of doing so. They're all tiny liabilities that add up.

But which ones will? Reminds me of https://www.lesswrong.com/posts/ooypcn7qFzsMcy53R/infinite-c...

> because the C implementation is neither weird nor hostile

Some people definitely were screaming at GCC for being hostile when it removed the NULL check in the kernel:

    int foo = bar->baz;
    if (!bar) {
      return -EINVAL;
    }
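The issue is the order: `bar->baz` dereferences `bar` before the check, and since dereferencing a null pointer is UB, the compiler may conclude that `bar` is non-null and delete the `if` as dead code. Below is a minimal user-space sketch of the safe ordering. `struct bar_type` and `get_baz` are hypothetical stand-ins for the kernel code, and `EINVAL` assumes a POSIX `<errno.h>`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel struct in the snippet above. */
struct bar_type {
    int baz;
};

/* Test the pointer before the first dereference. In the snippet above the
   dereference comes first, which lets the compiler drop the check. */
int get_baz(const struct bar_type *bar, int *out)
{
    if (bar == NULL)
        return -EINVAL;
    *out = bar->baz;
    return 0;
}
```

GCC's `-fno-delete-null-pointer-checks` disables this particular optimization, but reordering the check is what makes the code correct on any compiler.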

> the unfounded feeling of omniscience and unlimited resources that LLMs can give.

I definitely don't have that. I'm not saying LLMs find all bugs (now or in the future), nor that they are an unlimited resource.

I'm just saying that for finding UB and subtle bugs, they find orders of magnitude more, especially in C and C++.

I am not saying they find a strict superset of bugs compared to a human. But take, for example, my run against Cosmopolitan libc: https://news.ycombinator.com/item?id=48206377. It took basically zero human time to kick off, ran for a couple of minutes (5.5 in xhigh effort), and found 5-10 cases of UB, one of which I think is a user-visible parsing error for SSH keys. Another is a set of double frees, which is definitely a class of bug that gets exploited over and over.
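A double free means calling `free()` twice on the same pointer, which is UB and a well-known exploitation primitive. One common mitigation is to null the pointer as part of releasing it, since `free(NULL)` is defined to do nothing. This is a generic sketch, not the fix for the Cosmopolitan bugs: `struct key_buf`, `key_buf_release` and `key_buf_set` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of one heap buffer. */
struct key_buf {
    char *data;
    size_t len;
};

/* Frees the buffer and nulls the pointer. Because free(NULL) is a no-op,
   releasing the same struct twice is harmless, whereas calling free()
   twice on the same non-null pointer is UB. */
void key_buf_release(struct key_buf *kb)
{
    free(kb->data);
    kb->data = NULL;
    kb->len = 0;
}

/* Replaces the contents with a copy of src; returns -1 on allocation failure. */
int key_buf_set(struct key_buf *kb, const char *src)
{
    size_t n = strlen(src);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, n + 1); /* includes the terminating NUL */
    key_buf_release(kb);      /* drop any previous buffer first */
    kb->data = copy;
    kb->len = n;
    return 0;
}
```

This only protects the one pointer that gets nulled. Other copies of the same pointer elsewhere can still be freed twice.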

Would I have found these, in a codebase unknown to me no less, by reading the source manually all day? Of course not. Would I have found them with the likes of UBSan? jart claims to have used it (https://news.ycombinator.com/item?id=48205545), and apparently didn't find them.

LLMs are just one of the tools to use. A tool that does better than any tool or human has done in the last half century.

> I insist on the signed char example because it would be terribly wrong

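The char situation is terrible in C. It's perfectly safe to hold bytes in a char, signed char, and unsigned char, and convert between them. But then integer promotion rules combine with the historical choice of having isdigit take an int to break things: that int must be EOF or a value representable as unsigned char, and a negative char promoted to int is neither, so passing it is UB.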

If isdigit took a char, of any signedness, then there wouldn't be a problem. But that EOF ruins it.
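The usual workaround is to convert through `unsigned char` before calling any `<ctype.h>` function, which maps every byte into the range `isdigit` accepts. A minimal sketch, where `is_digit_byte` and `count_digits` are hypothetical helper names:

```c
#include <ctype.h>
#include <stddef.h>

/* Where plain char is signed, a byte like 0xE9 becomes a negative int after
   promotion, and passing that to isdigit() is UB. Converting to unsigned
   char first yields a value in 0..UCHAR_MAX, which is always allowed. */
int is_digit_byte(char c)
{
    return isdigit((unsigned char)c) != 0;
}

/* Counts the decimal digits '0'..'9' in a NUL-terminated byte string. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (is_digit_byte(*s))
            n++;
    return n;
}
```

Calling `isdigit(*s)` directly in the loop would be the buggy version on any platform where `char` is signed and the input contains non-ASCII bytes.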

> processing who-knows-what as if it were a sequence of characters

A "char" hasn't been "a character" in any meaningful sense in a long, long time. It is neither a code point nor a grapheme cluster. For byte processing, since the char types cast perfectly fine between each other, it's fine. Or do you have some interesting counterexample?
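To illustrate that a `char` is a byte rather than a character, this sketch counts code points in a UTF-8 string. `utf8_codepoints` is a hypothetical helper, and it assumes the input is already valid UTF-8 (it does not validate).

```c
#include <stddef.h>

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Bytes are read as unsigned char so the bit test is well defined. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, `"caf\xc3\xa9"` ("café") is five chars but four code points, and `"e\xcc\x81"` (e plus a combining acute accent) is two code points that render as one grapheme cluster.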

dminik · 5d ago

Just don't fall bro. It's that easy. No railings required.
