Better HN

greysphere 5d ago

The first example is dereferencing an integer pointer. That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.
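To make the distinction concrete, here is a minimal sketch assuming the article's `foo` simply returns `*p` (the thread names `int foo(const int* p)`, but this body is a guess). The dereference is well-defined whenever the caller passes a pointer to a live, correctly aligned `int`; the UB question only arises at call sites that pass something else. `sum_first_two` is a hypothetical caller added for illustration.

```c
#include <stddef.h>

/* Hypothetical body for the article's foo(): a plain dereference.
   Well-defined as long as p points to a live, correctly aligned int. */
int foo(const int *p) {
    return *p;
}

/* Valid call sites: each argument is the address of a real int object,
   so every dereference inside foo() is well-defined. */
int sum_first_two(const int *arr, size_t n) {
    if (n < 2) {
        return 0;
    }
    return foo(&arr[0]) + foo(&arr[1]);
}
```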

An honest discussion would be something more like: 'Dereferencing pointers can lead to UB when the pointers are invalid. Here are N examples of that. Maybe avoid using pointers. Maybe consider how other languages avoid pointers. Maybe these shouldn't be UB and should instead be some other class of error.' An even more honest discussion would present the upsides of having pointers and the upsides of having these errors be UB.

Instead, the article (and your comment) takes this valid operation and presents it as invalid. Imagine you're a new programmer just starting to wrap your head around pointers, and you stumble across this article. You see the first example, and it looks exactly like what you would expect a dereference to look like. But the article claims it's wrong, and now you're confused. So you dig into the article more closely, are exposed to terms like UB, alignment, and type coercion, and come away more confused, scared, and disinclined to understand pointers. This is classic FUD: a technique to manipulate, not educate.

Pointers have pros and cons. UB has pros and cons. Let's try to educate people about them.


stevenhuang 5d ago

There is an important distinction here about the technical meaning of UB that is lost on many.

UB simply means the operation you intend to perform has no defined semantics under the ISO C specification. That is all. Understand what this means, but do not read further into it. It is easy to read further into it, as you have and many do, and conclude that it MUST result in incorrect behaviour, but that is not the claim. The claim is rather that once you write UB, you are no longer writing C the language with a defined spec, and any number of degrees of freedom (architecture, toolchain, etc.) can now cause code that once behaved correctly to behave incorrectly. That is the danger.

> That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

This is incorrect. The moment you express this in source code, it is already UB with respect to the C abstract machine.

> 6.3.2.3, sentence 755: "If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined."

https://c0x.shape-of-code.com/6.3.2.3.html
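The quoted rule is conditional on the resulting pointer's alignment, so one way to stay on the defined side of it is to check alignment before converting. A minimal sketch, assuming the platform provides `uintptr_t` (it is optional in C) and maps pointers to integers as plain addresses, as mainstream platforms do; the standard leaves that mapping implementation-defined, so this is a practical test rather than a guarantee. `is_aligned` and `as_int_ptr_if_aligned` are hypothetical names. Passing the check settles only the conversion; reading a byte buffer through the resulting `int *` can still run afoul of C's effective-type (aliasing) rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Practical alignment test: assumes the implementation-defined
   pointer-to-integer mapping yields a plain address.
   `align` must be a power of two. */
static bool is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical helper: only perform the conversion when the result
   would be correctly aligned for int; otherwise return NULL. */
const int *as_int_ptr_if_aligned(const void *p) {
    if (!is_aligned(p, _Alignof(int))) {
        return NULL;
    }
    return (const int *)p;
}
```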

The important distinction is to KNOW this is still UB; whether the operation yields the expected behaviour on your platform and architecture is completely a separate question.

The reason this is of utmost importance is that the C compiler operates on the C abstract machine.

If you violate the language's invariants, the compiler can (keyword: can) emit WRONG code, and it will be CORRECT to do so because C unfortunately allows it. When this happens, it is silent and deadly and a pain to debug. The point of all this seeming language lawyering is not FUD; it is genuine frustration with these footguns of the language that we are trying to share with others. Understanding UB correctly really is what separates those who know C from those who "know" C.

Things will work, and then they won't. That can be fine in most cases but not in others. If you use C in 2026, you need to understand this.

> come away more confused and scared

This is the correct take. One ought to be more confused and scared after learning about UB; the language simply leaves things under-specified, and it is up to the developer to understand when they are engaging in UB.

Once UB is acknowledged, one ought to impress upon oneself that the software being built depends ever more on the whims of the particular compiler (clang/gcc), compiler flags (optimizations), architecture, and runtime environment.

greysphere (OP) 5d ago

Maybe I'm misunderstanding. Here is what I'm trying to say.

"Accessing an object which is not correctly aligned" - this is UB

"As an example of this, take this code: ..." - this (code) is not UB.

Is this incorrect somehow?

You could interpret the second sentence as 'under the assumption of an unaligned pointer, let's look at what this seemingly innocuous (and correct) code does.'

But that's not what they did. They presented that code as if it were incorrect (following the article's whole premise, 'Everything in C is UB'). That's what the whole article does: it takes a topic with real concerns, presents 'normal' code, and then implies the code is the issue (and therefore the language), not the premise.

You know what would be better? An example that clearly shows the complete path from the premise to the issue, i.e. some code that generates an unaligned pointer and then uses it. Why did the author not do that? Surprise: because it's actually pretty hard to write code that is 'guaranteed' to produce unaligned behavior.

    int foo[10];
    int *bar = (int *)(((uintptr_t)&foo) + 1);

Is this an unaligned access? You don't know, because you don't know the size or alignment of int. (Not to mention that it looks ridiculous. By only showing 'reasonable' code as the example, the article preempts the common 'uh, just don't do that' criticism.)

And in fact the ambiguity of alignments and sizes is the whole point: they are given the privilege/footgun of being left to the implementation in C so that compilers are easier to write. It's very debatable whether this was/is a good idea, but that's where the debate should be, not illusorily ascribed to derefing pointers.
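Because the size and alignment of `int` are left to the implementation, code that silently depends on particular values can state those assumptions so they fail at compile time rather than at run time. A minimal sketch using C11 `_Static_assert`; the specific values asserted (8-bit bytes, a 4-byte `int` with 4-byte alignment) are illustrative assumptions, not requirements of C, and `misalignment_of_offset` is a hypothetical helper.

```c
#include <limits.h>
#include <stddef.h>

/* Illustrative environment assumptions, checked at compile time.
   Where they do not hold, the build fails here instead of producing
   subtly wrong behavior later. */
_Static_assert(CHAR_BIT == 8, "assumes 8-bit bytes");
_Static_assert(sizeof(int) == 4, "assumes a 4-byte int");
_Static_assert(_Alignof(int) == 4, "assumes 4-byte int alignment");

/* How far a byte offset from a correctly aligned base (such as the foo
   array in the snippet above) lands past an int boundary.
   Nonzero means an int access at that offset would be misaligned. */
size_t misalignment_of_offset(size_t byte_offset) {
    return byte_offset % _Alignof(int);
}
```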

If I'm misunderstanding, please let me know. Specifically, either (1) confirm whether you're claiming that the literal code in the first box of the article is UB, or (2) write some literal code that is UB in the vein of the article's first claim. I think that would help me bridge the gap between us.

stevenhuang 5d ago

Edit: I think part of the confusion is that we were addressing different parts of the article's first example. You were referencing the int foo(..) snippet (which I agree has no UB), while I was referencing the parse_packet() snippet (which has UB by construction), which was also part of the first example :).

You are beginning to understand. Yes, surprisingly, it is (1) that is being claimed.

The expression alone is UB. Yes, you read that right: in source code, it's already UB. Why? Because the ISO spec defines UB that way. But what this means in practice, i.e. whether "it works", is an entirely separate question, specific to the toolchain, hardware, runtime, the alignment of the pointer in question, and so on.

There is nuance here, and that's why this topic is debated to death, because it's hard to explain and it is genuinely complex.

When people say something is UB, they mean the behaviour is undefined with respect to ISO C.

The behaviour that actually matters IS determined by the toolchain, hardware, runtime, and alignment of the pointer in question.

But that's exactly it: the latter is not what we mean when we say something is UB. When we say something is UB, we are talking about the ISO C spec. The important follow-up question when knowingly invoking UB, then, is to ensure your environment is "correct", because you have crossed into realms entirely outside the auspices of the ISO C spec. Ergo, you are now in UB land; what you thought was the foundation of your codebase, the ISO C spec, has turned into quicksand.

It is this implicit, undocumented dependence on factors external to the source code that is a huge source of bugs and surprises.

So take this example from the article. Yes, it is UB by construction.

(Edit: I copied the wrong fragment initially. If you were talking about the int foo(const int* p) fragment, yes, that block is not UB by construction.)

    bool parse_packet(const uint8_t* bytes) {
            const int* magic_intp = (const int*)bytes;   // UB!
            int magic_raw = foo(magic_intp);  // Probably crashes on SPARC.
            int magic = ntohl(magic_raw); // this is fine, at least.
            […]
    }
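For comparison, one common way to read such a magic number without either the cast or a potentially unaligned load is to assemble it byte by byte. This is a sketch, assuming the magic is a 4-byte big-endian (network order) value at the start of the packet, as the `ntohl` call suggests; `read_be32` is a hypothetical helper, and it returns `uint32_t` rather than the fragment's `int`. Because it only ever reads `uint8_t` objects, there is no alignment or aliasing question, and the result is the same on big- and little-endian hosts.

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from four bytes.
   Only uint8_t objects are read, so `bytes` may have any alignment. */
uint32_t read_be32(const uint8_t *bytes) {
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

Calling `read_be32(pkt + 1)` on a byte array is fine even though `pkt + 1` is almost certainly not 4-byte aligned.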

Why?

> Because the compiler is not obligated to generate assembly instructions that work on unaligned pointers. Because it’s UB.

Does it actually work, though? It might and it might not; there is simply no guarantee from the language. But that's all it says. It may very well work on your arch, platform, and toolchain indefinitely. But, circling back, code written like this is brittle, and that is why UB is to be avoided.
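Avoiding the UB here usually need not cost anything. A common idiom is to copy the bytes into a real `int` with `memcpy`, which sidesteps both the alignment requirement and the aliasing rule (a byte buffer may not be read back through an `int` lvalue, although any object's bytes may be inspected through `unsigned char`). A minimal sketch; `load_host_int` and `store_host_int` are hypothetical names, the value is in host byte order (so network-order data would still need `ntohl` or equivalent), and mainstream optimizing compilers typically turn the fixed-size `memcpy` into a single load where the target permits it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an int stored in a byte buffer without violating the aliasing
   or alignment rules: copy the bytes into a real int object. */
int load_host_int(const uint8_t *bytes) {
    int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* The permitted direction: viewing an int's bytes through unsigned char.
   Writes sizeof(int) bytes to `out`, which may have any alignment. */
void store_host_int(int value, uint8_t *out) {
    const unsigned char *src = (const unsigned char *)&value;
    for (size_t i = 0; i < sizeof value; i++) {
        out[i] = src[i];
    }
}
```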

And to your point:

> that's where the debate should be, not illusorily ascribed to derefing pointers.

But that is where the debate is. People just do not understand what UB actually means. The article is correct: everything in C is UB. The takeaway is not that all C code is therefore irredeemably broken (well, to some people it does mean that, anywho...). The takeaway is that most C code IS in fact more delicate than one may originally believe, because ISO C is under-specified to allow for specialization by toolchain, architecture, hardware, and so on.

So it is incumbent on the developer when writing C to correctly acknowledge when they are invoking UB, and to do so intentionally with the awareness that things may just randomly break one day.

greysphere (OP) 5d ago

Thanks, yes, I think we had some confusion over foo() vs parse(), and I was referring to foo().

But even for the parse() example, the issue is the aliasing rules (and not the alignment, though that could still be an issue depending on the input!). Aliasing isn't even mentioned in the article. Instead, the example presents this thing 'people do all the time' and labels it 'UB' without even identifying the actual issue.

On its own I could forgive that; making a precise example is tricky (particularly with alignment issues). But this is repeated: milliseconds() is characterized as 'UB' because its inputs could be outside the representable range. Again, the function is not UB; certain inputs can trigger UB.
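The function-versus-inputs distinction can be made concrete. A minimal sketch, assuming a hypothetical `milliseconds_checked` that converts whole seconds held in an `int` (the article's actual `milliseconds()` signature is not shown in this thread): `seconds * 1000` is well-defined for most inputs and is UB only when the product does not fit in an `int`, so the checked variant rejects exactly those inputs before multiplying.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical checked conversion from seconds to milliseconds.
   Signed overflow is UB, and seconds * 1000 overflows only for inputs
   outside [INT_MIN / 1000, INT_MAX / 1000], so those are rejected
   before the multiplication happens. */
bool milliseconds_checked(int seconds, int *out_ms) {
    if (seconds > INT_MAX / 1000 || seconds < INT_MIN / 1000) {
        return false;  /* product would not be representable */
    }
    *out_ms = seconds * 1000;
    return true;
}
```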

Then the function pointer example conflates the assignment (fine) with the call (UB). The statement 'NULL compares unequal to any object or function' is a red herring, since the example assigns NULL to a function _pointer_, which can be NULL. The honest example of the statement is:

    void foo() = NULL;

which won't even compile because it violates the thing that was just said (among other reasons). The UB is the call below and has nothing to do with equality with NULL.
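Separating the two halves makes the point: storing NULL in a function pointer is well-defined, a real function's address never compares equal to NULL, and only calling through a null function pointer is UB. A minimal sketch with a hypothetical `handler_fn` type, `double_it` handler, and fallback value chosen for illustration:

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Example handler; any real function's address compares unequal to NULL. */
static int double_it(int x) {
    return 2 * x;
}

/* Assigning NULL to a function pointer is fine; calling a null function
   pointer is UB, so the call is guarded. */
int dispatch(handler_fn handler, int arg, int fallback) {
    if (handler == NULL) {
        return fallback;
    }
    return handler(arg);
}

/* Accessor so callers can obtain a non-null handler. */
handler_fn default_handler(void) {
    return double_it;
}
```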

The repeated pattern of saying one thing, showing a 'reasonable' example that implies it's related to what was just said, and concluding from that invalid relation that everything in C is UB feels dishonest (particularly when it's so easy to talk about UB in C honestly, because there is so much to be legitimately concerned about!)

I appreciate you and your advocacy about UB in C, and I think I agree with most of your points; they're worth discussing. That's why the article itself is frustrating to me: we don't need to be tricksy when talking about UB, it's already tricky enough!
