
DSMan195276 · 9y ago · 0 points

I would agree that strict-aliasing is a pain point for a lot of C devs, which is unfortunate. I'd only suggest that, in general, if the strict-aliasing rule is coming into play you're probably already doing something really shady to begin with. Like in this example, casting a `long *` to an `int *` is likely a bad way to go about things even without worrying about strict-aliasing. In a lot of ways, I'd say that problems with the strict-aliasing rule are a symptom of a larger problem. If you can convince them that what they're doing is just bad coding practice to begin with, you might have a better time making them write correct code in the long run.

Now if you're working more directly with hardware (which is of course possible/likely with C), then it might just be easier to disable strict-aliasing altogether if you can, since identifying all the spots where it might be a problem tends to be difficult.


caf · 9y ago

In the really simple cases (like accessing a float as a long, or similar), you're right.
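For the simple case, a minimal sketch of reading a float's bits without going through a mismatched pointer might look like this. The helper name `float_bits` is made up for illustration, and the sketch assumes `float` is a 32-bit IEEE-754 type, which is common but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/*
 * The shady version would be:
 *     uint32_t bits = *(uint32_t *)&f;
 * which reads a float object through an lvalue of an unrelated type.
 */

/* Hypothetical helper: copy the object representation instead of aliasing it. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* byte-wise copy, no aliasing violation */
    return bits;
}
```

Mainstream compilers typically turn the fixed-size `memcpy` into a plain register move, so this usually costs nothing over the cast.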

The problem is the interpretation that's been applied to aggregate types:

  struct a {
    int variant;
  };

  struct b {
    int variant;
    long data;
  };

I have a pointer to a struct b - can I cast it to a pointer to struct a and still dereference 'variant'? The member has the correct type and is guaranteed to live at the start of the struct in both cases. The prevailing opinion seems to be "no" (see https://trust-in-soft.com/wp-content/uploads/2017/01/vmcai.p... e.g. example st1).

The BSD socket API was built on exactly this kind of type punning.

DSMan195276 (OP) · 9y ago

> I have a pointer to a struct b - can I cast it to a pointer to struct a and still dereference 'variant'?

I think it is a bit of a gray area, but personally my understanding has always been that no, that is invalid. The C standard does make one point fairly clear: strict-aliasing revolves around the idea that an object can only be considered to be one type of data (or a `char` array, the only exception). Your example would be invalid for the reason that you can't treat an object of `struct b` as though it is a `struct a` - the fact that they share the same preamble doesn't change that. To be clear about what I'm saying: `struct b` can alias an `int`. `struct a` can also alias an `int`. But `struct b` can't alias a `struct a`, and because of this an `int` accessed through a `struct a` can't be accessed through a `struct b`.

That said, in general I find this to usually be a fixable problem, which also has (IMO) a cleaner implementation:

    struct head {
        int variant;
    };

    struct a {
        struct head h;
    };

    struct b {
        struct head h;
        long data;
    };

Now you can take a pointer to a `struct b` object and treat it like a pointer to a `struct head` object (Because it is a `struct head` object). You could do the same thing with objects of type `struct a`. So now you can cast both of them to `struct head` and examine the `variant` of both. Then later you could cast it back to the proper type.
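A rough sketch of how that looks in use, assuming made-up tag values `VARIANT_A` and `VARIANT_B` and a hypothetical helper `payload_or` (neither is from any real codebase):

```c
#include <stddef.h>

enum { VARIANT_A = 1, VARIANT_B = 2 }; /* hypothetical tag values */

struct head {
    int variant;
};

struct a {
    struct head h;
};

struct b {
    struct head h;
    long data;
};

/* Hypothetical helper: return the payload of a struct b, else a fallback. */
static long payload_or(struct head *h, long fallback)
{
    if (h->variant == VARIANT_B) {
        /* h points at the first member of a struct b, so a pointer to it,
           suitably converted, points back at the enclosing struct b. */
        struct b *pb = (struct b *)h;
        return pb->data;
    }
    return fallback;
}
```

The cast back to `struct b *` is only done after the tag has been checked, so every access goes through the type the object was declared with.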

This approach to aggregate types is heavily used in the Linux kernel and other places (including most of my own code). The `container_of` macro makes it even nicer to use (though the legality of the `container_of` macro is an interesting debate...).
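For anyone who hasn't seen it, here is a simplified sketch of a `container_of`-style macro built on `offsetof`. This is not the kernel's actual definition (that one adds type checking with compiler extensions), and `struct c` and `data_of` are hypothetical names. Whether the pointer arithmetic is strictly conforming is exactly the debate mentioned above, so treat it as illustrative.

```c
#include <stddef.h>

/* Simplified sketch: step back from a member pointer to the enclosing object. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct head {
    int variant;
};

/* Hypothetical struct where the head is NOT the first member. */
struct c {
    long data;
    struct head h;
};

/* Hypothetical helper: recover the enclosing struct c from its embedded head. */
static long data_of(struct head *h)
{
    return container_of(h, struct c, h)->data;
}
```

The advantage over a plain cast is that the embedded `struct head` no longer has to sit at offset zero.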

> The BSD socket API was built on exactly this kind of type punning.

Kinda. It's actually surprising how close it comes to this issue (and it does skirt it), but I believe it's actually possible to use BSD sockets without ever violating the strict-aliasing rule (though of course, there are ways of using it which would arguably violate the rule). In most cases for BSD sockets, strict-aliasing is more likely going to be broken in the kernel, not your code.

To note though, the strict-aliasing rule only applies to dereferencing pointers. You can cast pointers back and forth all day; you just have to ensure that when you're done you're treating the object as you originally declared it. Thus, if you pass a pointer to a `struct sockaddr_in` to `bind` and cast it to a `struct sockaddr *`, the strict-aliasing rule isn't violated because you never dereferenced the cast pointer.

Going along with that, as long as you correctly declare your `struct sockaddr`s from the beginning you won't have any strict-aliasing woes. The only situation where this could technically be a problem is `accept` and `recvfrom`, since they are the only functions that give a `struct sockaddr` back. But assuming you already know what address-family the socket is using, you can declare the correct `struct sockaddr` for that family from the start, cast it and pass it to `accept` or `recvfrom`, and then use it as your originally declared type without breaking strict-aliasing.
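As a sketch of what I mean, assuming the listening socket is known to be AF_INET (the wrapper name `accept_ipv4` is hypothetical):

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical wrapper: accept on a socket that is known to be AF_INET. */
static int accept_ipv4(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t len = sizeof *peer;

    memset(peer, 0, sizeof *peer);
    /* The pointer is converted for the call but never dereferenced as a
       struct sockaddr in our own code. */
    return accept(listen_fd, (struct sockaddr *)peer, &len);
}
```

After the call returns, the caller reads `peer->sin_addr` and `peer->sin_port` through the `struct sockaddr_in` it declared, so the user side never accesses the object through a mismatched type.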

Of course, it's also worth keeping in mind that the BSD sockets interface came before C89. You definitely wouldn't design it the same way if you were to do it today.

caf · 9y ago

> Kinda. It's actually surprising how close it comes to skirting this issue (And it does skirt it), but I believe it's actually possible to use BSD sockets without ever violating the strict-aliasing rule (Though of course, there are ways of using it which would arguably violate the rule). In most cases for BSD sockets, strict-aliasing is more likely going to be broken in the kernel, not your code.

Well, firstly it's pretty unsatisfying to hear that yes, this API contravenes strict aliasing restrictions, but only on the library side! - essentially that it is impossible to implement the sockets C API in C.

That aside, this still excludes long-standing examples like embedding a pointer to struct sockaddr in your client struct, which points to either a sockaddr_in, sockaddr_in6 or sockaddr_un depending on where that client connected from (well, you can still do it, but now you can't examine the sockaddr's sa_family member to see what type the address really is - you need to have a redundant, duplicate copy of that field in the client struct itself).

The situation is similar with sockaddr_storage. The whole point of that type is to allow you to stash either AF_INET or AF_INET6 addresses in the same object and then examine the ss_family field to see what it really is - the text in POSIX says:

  The <sys/socket.h> header shall define the sockaddr_storage structure, which shall be:

    Large enough to accommodate all supported protocol-specific address structures

    Aligned at an appropriate boundary so that pointers to it can be cast as pointers to protocol-specific address structures and used to access the fields of those structures without alignment problems

( http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys... )
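For illustration, one workaround that avoids the cast the POSIX text describes is to copy the bytes out into the protocol-specific struct. This is only a sketch under the assumption that the storage was filled in by something like `accept`; the helper name `storage_port` is made up.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: port in host byte order, or -1 for other families. */
static int storage_port(const struct sockaddr_storage *ss)
{
    /* ss_family is read through the object's own declared type. */
    if (ss->ss_family == AF_INET) {
        struct sockaddr_in in4;
        memcpy(&in4, ss, sizeof in4); /* copy instead of casting the pointer */
        return ntohs(in4.sin_port);
    }
    if (ss->ss_family == AF_INET6) {
        struct sockaddr_in6 in6;
        memcpy(&in6, ss, sizeof in6);
        return ntohs(in6.sin6_port);
    }
    return -1;
}
```

It works, but it costs a copy, and it is clearly not the usage the quoted text has in mind.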

> Of course, it's also worth keeping in mind that the BSD sockets interface came before C89. You definitely wouldn't design it the same way if you were to do it today.

Well, the aforementioned sockaddr_storage came about after C89.

And wasn't C89 supposed to be about codifying existing practice, anyway?


TorKlingberg · 9y ago

A common class assignment or interview question is to write your own memcpy. Towards the end you usually start optimizing it by copying multiple bytes at once. That is undefined behavior. You cannot just cast a pointer to uint32_t* and start using it, unless the underlying object is actually uint32_t. In practice it works fine, so people don't care. We'll see what future compilers will do, especially when the homemade memcpy is inlined somewhere.
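A sketch of a word-at-a-time copy that stays within the rules, assuming the standard `memcpy` is available for the individual word loads and stores (which is admittedly circular for a real `memcpy` implementation; a freestanding one would rely on a compiler builtin instead). The name `copy_words` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy routine: moves 4 bytes per step without ever
   dereferencing a uint32_t pointer into the caller's buffers. */
static void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w); /* word load, typically a single instruction */
        memcpy(d, &w, sizeof w); /* word store */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--) /* copy the remaining tail bytes one at a time */
        *d++ = *s++;
    return dst;
}
```

Like the real `memcpy`, this assumes the source and destination do not overlap.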

Another one is a custom malloc backed by a static char array. You're allowed to access any object as char*, but not the other way around. A static char array is always a char array, and accessing it through a pointer to anything else is a strict aliasing violation. Only the built-in malloc and siblings can create untyped memory.

rwj · 9y ago

You can cast from char* to uint32_t* and start using it. I forget the exact standardese, but there is an exception for char*.

TorKlingberg · 9y ago

There is an exception for accessing any object through a character type pointer, but not the other way around. uint32_t is not a character type, and it doesn't matter if it was cast to a char* first.

Also, apparently uint8_t may not be a character type.


yorwba · 9y ago

Only if the pointer is correctly aligned for the uint32_t data type. Otherwise you might get problems with unaligned memory accesses. (Like when you get some data over the wire that is clearly just a memory dump of a C struct, so you just do a pointer cast. Boom, unaligned read.)
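One way to sidestep both the alignment and the aliasing problem is to assemble the value from individual bytes. A minimal sketch, assuming the wire format is little-endian (the helper name `load_u32_le` is made up):

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from any address,
   aligned or not, using only byte accesses. */
static uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

This also makes the result independent of the host's byte order, which a pointer cast is not.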
