For instance, say we write a function that rotates an array of N bytes: it moves the low M bytes to the top of the array, and shuffles the remaining N - M bytes down to the bottom. This function will work fine with zero-byte memmove or memcpy operations in the special case when N == 0, because the pointer will be valid.
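A minimal sketch of such a rotate, assuming a heap scratch buffer is acceptable. The name rotate_bytes and its return convention are made up for illustration. Every memcpy/memmove call below can legitimately get a size of zero, and it stays well-defined only because every pointer involved is valid:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rotate an n-byte array: the low m bytes move to the top, and the
 * remaining n - m bytes shift down to the bottom.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated.
 * (Hypothetical helper, not taken from any library.) */
int rotate_bytes(unsigned char *a, size_t n, size_t m)
{
    assert(m <= n);

    /* Ask for at least 1 byte so a null return always means failure,
     * and so tmp is a valid pointer even when m == 0. */
    unsigned char *tmp = malloc(m ? m : 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp, a, m);            /* save the low m bytes (m may be 0)   */
    memmove(a, a + m, n - m);     /* shift the rest down (may be 0 too)  */
    memcpy(a + (n - m), tmp, m);  /* put the saved bytes on top          */

    free(tmp);
    return 0;
}
```

With n == 0 and m == 0 all three copies are zero-sized, which is fine as long as a points at a real (if empty) array rather than being null.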
Now say we have something like this:
    struct buf {
        char *ptr;
        size_t size;
    };
We would like it so that when the size is zero, we don't have an allocated buffer there. But we'd like to support a zero-sized memcpy in that case: memcpy(buf->ptr, whatever, 0), or likewise in the other direction. We now have to check for buf->ptr being buf in the code that deals with resizing.
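One way to live with this today, sketched under the assumption that an empty buffer keeps ptr == NULL: skip the memcpy call whenever the size is zero, so a null pointer never reaches it. The helper names buf_copy_out and buf_copy_in are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct buf {
    char *ptr;     /* assumed to be NULL whenever size == 0 */
    size_t size;
};

/* Copy the buffer's bytes to dst and return how many were copied.
 * For an empty buffer memcpy is skipped, so b->ptr == NULL is fine. */
size_t buf_copy_out(const struct buf *b, char *dst)
{
    assert(b != NULL);
    if (b->size != 0)
        memcpy(dst, b->ptr, b->size);
    return b->size;
}

/* Overwrite the buffer's bytes from src (src must supply b->size bytes).
 * Same guard in the other direction. */
void buf_copy_in(struct buf *b, const char *src)
{
    assert(b != NULL);
    if (b->size != 0)
        memcpy(b->ptr, src, b->size);
}
```

The cost is an extra branch at every call site, which is exactly the annoyance being described.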
Here is a snag in the C language related to zero sized arrays. The call malloc(0) is allowed to return a null pointer, or a non-null pointer that can be passed to free.
Oops! In the one case, the pointer may not be used with a zero-sized memcpy; in the other case it can.
This also goes for realloc(NULL, 0) which is equivalent to malloc(0).
And, OMG I just noticed ...
In C99, this was valid: realloc(ptr, 0), where ptr is a valid, allocated pointer. You could realloc an object to zero.
I'm looking at the April 2023 draft (N3096). It states that realloc(ptr, 0) is undefined behavior.
When did that happen?
[0] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
When those implementations eventually pick up C23, they surely could fix the bug as well. At best this should have been an erratum/defect report against the previous standard, so that the previous standards document the behavior of the implementations of said standards.
The requirements in C99 and before are perfectly clear. realloc is described as liberating the old pointer, and then allocating a new one as if by malloc. (Except that it magically has access to both objects so it can transfer the necessary bytes that must be transferred from the old to the new.)
It is perfectly clear what happens when size is zero. No byte can be copied from the old object, if any. The behavior is like free(oldptr) followed by return malloc(newsize).
Your IQ would have to be well below 85 to misunderstand the requirements.
And those requirements are still there; there is still the description of realloc in terms of freeing the old pointer and allocating a new object with malloc.
There was no need to insert a gratuitous removal of definedness for the size zero case, given that malloc handles it.
Applications now have to do this:
    void *sane_realloc(void *ptr, size_t size)
    {
        if (size == 0) {
            // behave literally as required in C99
            free(ptr);
            return malloc(0);
        }
        return realloc(ptr, size);
    }
Supposedly because a few vendors were not able to code this logic in their realloc functions?
It's very strange. I wrote my own memory allocator and I can't figure out the right way to handle this. Eliminating the need for these "technically" valid pointers that can't actually be accessed because they're zero sized seems like the better solution.
> When did that happen?
More importantly, why did that happen? People have told me that I should care about the C standards committee because they take backwards compatibility very seriously. Then they come out with breaking changes like these.
Mainly, that it has supported that before and programs rely on it.
Programs written to the C99 standard can resize a dynamic vector down to empty with realloc(ptr, 0). The pointer coming from that will be the same as if malloc(0) had been called.
So now, that has been taken away; those programs can now make demons fly out of your nose.
Thank you, ISO C!
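For what it's worth, a vector resize can be written so that it never calls realloc with a size of zero at all. This is only a sketch: the struct int_vec and int_vec_resize names are hypothetical, and it assumes an empty vector is represented by a null data pointer rather than by whatever malloc(0) returns.

```c
#include <assert.h>
#include <stdlib.h>

struct int_vec {
    int *data;     /* NULL when len == 0 (assumption of this sketch) */
    size_t len;
};

/* Resize to new_len elements without ever calling realloc(ptr, 0).
 * New elements are left uninitialized.
 * Returns 0 on success, -1 on failure (vector unchanged). */
int int_vec_resize(struct int_vec *v, size_t new_len)
{
    assert(v != NULL);

    if (new_len == 0) {
        free(v->data);               /* free(NULL) is a no-op */
        v->data = NULL;
        v->len = 0;
        return 0;
    }
    if (new_len > (size_t)-1 / sizeof *v->data)
        return -1;                   /* byte count would overflow */

    int *p = realloc(v->data, new_len * sizeof *v->data);
    if (p == NULL)
        return -1;

    v->data = p;
    v->len = new_len;
    return 0;
}
```

The trade-off is that an empty vector now has a null data pointer, so it runs straight into the zero-sized memcpy problem discussed elsewhere in the thread.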
> Do allocators really keep track of these null allocations? That would require keeping state for every single address in the worst case...
Implementations of malloc(0) that don't return null are required to return a unique object. To do that, all they have to do is pretend that the size is some nonzero value like 1 byte. (The application must not assume that there is any byte there that can be accessed).
Another option is to treat them as being of size 1.
(In theory you could do endless allocations of size 0, and eventually you'd run out of space, even though you've allocated 0 bytes in total. But you end up in exactly that situation, whatever the allocation size, if you don't take bookkeeping overhead into account!)
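The "pretend it's 1 byte" approach can be sketched as a thin wrapper over malloc. The names alloc_unique, release and live_allocations are made up for illustration; the counter is only there to make the bookkeeping visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of allocations handed out and not yet released. */
static size_t live_allocations;

/* Allocate size bytes; a request for 0 is rounded up to 1, so every
 * successful call returns a distinct non-null pointer and NULL always
 * means failure. Callers must still not touch the byte of a 0-size
 * allocation. */
void *alloc_unique(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p != NULL)
        live_allocations++;
    return p;
}

/* Release a pointer obtained from alloc_unique (NULL is ignored). */
void release(void *p)
{
    if (p == NULL)
        return;
    assert(live_allocations > 0);
    live_allocations--;
    free(p);
}
```

Each zero-sized allocation still consumes real heap space plus whatever per-block overhead the underlying allocator has, which is the point of the parenthetical above.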
[1]: https://github.com/rust-lang/unsafe-code-guidelines/issues/4...
If you ensure that the 'zero page' (so to speak) is empty you can also exploit this property for optimizations, and in some cases the emscripten toolchain will do so.
i.e. if you have
    struct MyArray<T> {
        uint length;
        T items[0];
    }
you can elide null pointer checks and just do a single direct bounds check before dereferencing an element, because for a nullptr, (&ptr->length) == nullptr, and if you reserve the zero page and keep it empty, (nullptr)->length == 0.
This complicates the idea of 'passing nothing' because now it is realistically possible for your code to get passed nullptr on purpose and it might be expected to behave correctly when that happens, instead of asserting or panicking like it would on other (sensible) targets.
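The zero-page trick itself can't be written in portable C, since dereferencing a null pointer is undefined there. A portable stand-in for the same "one bounds check, no null check" idea is a shared empty sentinel object used in place of null. This is a swapped-in technique, not what emscripten does, and all names below (int_array, EMPTY_ARRAY, int_array_new, int_array_get) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct int_array {
    unsigned length;
    int items[];          /* flexible array member */
};

/* Shared empty array: length 0, no items. Plays the role of the
 * reserved, zero-filled page, so callers pass this instead of NULL. */
static const struct int_array EMPTY_ARRAY = { 0 };

/* Allocate an array of n uninitialized ints; returns NULL on failure. */
struct int_array *int_array_new(unsigned n)
{
    struct int_array *a = malloc(sizeof *a + n * sizeof a->items[0]);
    if (a != NULL)
        a->length = n;
    return a;
}

/* Returns 1 and stores the element in *out if index is in range,
 * otherwise returns 0. The bounds check is the only runtime check. */
int int_array_get(const struct int_array *a, unsigned index, int *out)
{
    assert(a != NULL);    /* debug aid only; callers use EMPTY_ARRAY */
    if (index >= a->length)
        return 0;
    *out = a->items[index];
    return 1;
}
```

Reading through &EMPTY_ARRAY always fails the bounds check, which is the behavior the zero page gives a null pointer on that target.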
E.g., I am pretty sure Go relies on some of the behavior described here: that the 0 page is unmapped, and that accesses will trap. This is why Go code will sometimes SIGSEGV despite being an almost memory-safe language: Go is explicitly depending on that trap (and it permits Go, in those cases, to elide a validity check). (Vs. some memory accesses will incur a bounds check & panic, if Go cannot determine that they will definitely land in the first page; Go there must emit the validity check, and failing it is a panic, instead of a SIGSEGV.)
IIRC, Linux doesn't permit unprivileged processes, at least, to map address 0. (Although I can't find a source for that right now.)
¹Yes, in most languages this is UB … but what I'm saying is that having it trap makes errors — usually security errors — obvious & fail, instead of really letting the UB just do whatever and really going off into "it's really undefined now" territory.
> But suppose we want an empty (length zero) slice.
So is there an actual rationale for this? I've written the memory allocator and am in the process of developing the foreign interface. I've been wondering if I should explicitly support zero length allocations. Even asked this a few times here on HN but never got an answer. It seems to be a thing people sort of want but for unknown reasons.
I definitely see the benefits of well-defined arithmetic on null pointers. As a data type though it seems to me that any pointer could be a zero sized allocation.
There are a bunch of languages where empty arrays are "falsy", and in those it's not advisable to use the two to differentiate valid states. Feels like the same could apply here.
The C++ type discussed is much newer than Rust (std::span was standardized in C++ 20).
Yes, in many cases what C++ APIs mean here isn't a slice of zero Ts at all but None, and Rust has an appropriate type for that, Option<&[T]>, which works as expected. So in many cases where people have built an API which they think takes &[T], and are trying to make it work with the unsafe functions mentioned, it's actually Option<&[T]> they needed anyway; they don't even have a type-correct design.
Pass something with a 0 length, pointing to NULL. Enjoy your blue screens and kernel panics.
The solution is clear: just ignore the C spec. It’s total garbage. Of course you can memcpy between any ptr values if the count is zero and those ptr values don’t have to point to anything.
Making it UB to pass null to memcpy means that after the call, the pointer can be assumed to be non-null. So a later if(ptr) can be constant folded. Maybe faster.
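To make that concrete, here is a sketch of the pattern, assuming the rules discussed in this thread (null to memcpy is undefined even when the size is zero). Both function names are hypothetical, and whether a given compiler actually folds the check depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unguarded: since a null argument to memcpy is undefined here, a
 * compiler is allowed to assume dst is non-null after the call and
 * reduce the return expression to 1. */
int copy_then_check(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return dst != NULL;
}

/* Guarded: memcpy is skipped when n == 0, so a null pointer never
 * reaches it on that path and the check keeps its meaning. */
int copy_then_check_guarded(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n != 0)
        memcpy(dst, src, n);
    return dst != NULL;
}
```

Only the guarded version may be called as copy_then_check_guarded(NULL, NULL, 0); doing that with the unguarded one is the undefined case being argued about.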
I'm in agreement with you on this but your compiler probably isn't.
No need for that.
> the pointer is assumed to be non-null
Just give us an option to tell the compiler to stop assuming nonsense like that. I'm gonna make it standard on my makefiles just like -fno-strict-aliasing and -fwrapv.
There's no use trying to work around C standard problems. Compilers should just be told to define the undefined and to disable everything that can't be defined. Then we can write code on solid foundations instead of quicksand.
> Or your own memcpy with a different name.
I wish. I couldn't escape that function even on my freestanding nolibc project. The compilers will happily emit calls to memcpy and memset all by themselves whenever they feel like it and god help you if you don't provide them because for some reason this nonsense can't be disabled.
My impression is that Zig doesn't have a documented memory model that cares about things like whether an address corresponds to an allocation or not, so problems relating to this sort of thing cannot come up yet :)
https://github.com/ziglang/zig/commit/32e0dfd4f0dab351a024e7...
From the title, I assumed that this article was going to be about either (a) permissive grading standards at university or (b) chronic constipation.