But that's not how C "rolls", and we'll never get that. So I guess we now have 41384 ways to do buffer overflow guards.
I agree that this will never be implemented as a standard, but I think that's a good thing. Higher-level languages push against their boundaries nonstop. Java has libraries and frameworks that fundamentally change the syntax and functionality of the language. C knows what it is. If you want something it can't do, it promises that you can either build it yourself or switch to a different tool.
All of this to say, C has a single suggested way of doing this: use a different language. That's part of why we built them.
Note that C does have strong conventions, such as that strings are terminated by a zero byte. Nothing in the language demands that, it’s just a convention! C could adopt better conventions.
I'm really glad that C doesn't do this, personally. It would reduce one of the main advantages of the language.
Is there a library that you recommend for this?
1. Define a macro function for retrieving the length of an array:
#define LEN(arr) (sizeof (arr) / sizeof (arr)[0])
2. Don't introduce macro constants for array lengths; hard-code the length in the declaration and use LEN to retrieve it. Example:
int a[100];
...
for (i = 0; i < LEN(a); i++) {
...
}
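3. Define a macro function for dynamic array allocation (it needs <errno.h>, <stdio.h>, <stdlib.h> and <string.h>). Wrapping the body in do { ... } while (0) makes it behave as a single statement, even after an unbraced if:
#define NEW_ARRAY(ptr, n) \
do { \
(ptr) = malloc((n) * sizeof (ptr)[0]); \
if ((ptr) == NULL) { \
fprintf(stderr, "Memory allocation failed: %s\n", strerror(errno)); \
exit(EXIT_FAILURE); \
} \
} while (0)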
4. When you create a function with an array argument, also add an argument for the array length.
5. Use a convention for naming the length of array pointer targets, for instance by adding the suffix "Len". Example:
int *b, bLen = 100;
...
NEW_ARRAY(b, bLen); /* nice to know that b and bLen belong together */
...
SomeFunction(b, bLen, ...);
...
for (i = 0; i < bLen; i++) {
...
}
6. Define your own safe wrappers around unsafe standard library functions, or use someone else's code that does that.
In my experience, the most likely failure is a function writing past the bounds of a buffer that was passed to it as an argument. In that case, make sure the size of the array is always included as an argument, as you said in 4.
Even worse, even if you declare the parameter with an array type, it will still decay to a pointer. Basically, this macro will only work if you use it in the same function where the array is defined.
You should either add overflow checking to the macro or, even better, just use the damn libc API and call calloc. Or, if you really insist on avoiding the zeroing overhead, there's reallocarray(NULL, ...) if you use a reasonably modern libc.
int (*data)[datalen];
This requires you to dereference it once to get an array, then dereference it a second time to get a value. The advantage is that the array value can be used the same way as a normal array on the stack, including passing it to the array length macro you describe.
There is an RFC for the Clang frontend proposing bounds checking reminiscent of Microsoft's SAL. [2]
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3003.pdf
[2] https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-...
#define LEN(NAME) (sizeof *NAME / sizeof(*NAME)[0])
I think gcc has a warning for this pattern now: when the size of a pointer is divided by the size of its referent type.
More importantly, it has an odd extra level of indirection. The traditional definition is:
#define LEN(ARRAY) (sizeof ARRAY / sizeof (ARRAY)[0])
With the extra indirection, to use LEN on an array we have to take its address:
char *array[5];
LEN(&array); // -> 5
If we use LEN(array), which is an easy mistake, we get:
sizeof *array / sizeof (*array)[0]
which is sizeof (char *) / sizeof (char)
which is sizeof (char *)
which is likely 4 or 8.
I do see that LEN is supposed to be (only) used in conjunction with ARR:
#define ARR(TYPE, NAME, COUNT) TYPE(*NAME)[COUNT]
but that isn't enforced. One idea would be to add a "secret" prefix or suffix to NAME, like blah_ ## NAME, so that the name cannot be referenced without going through the macros. That is, if we define ARR(int, foo, 42), there is no declared identifier foo; the macro actually declares blah_foo, and LEN(foo) knows that and adds the same prefix. Then mistakenly using LEN(foo) on something not declared with ARR will most likely reference an undeclared identifier.
#define AT(NAME, IDX) \
((typeof(&(*NAME)[0])) \
((ASSERT(((size_t)IDX) * sizeof(*NAME)[0] < sizeof *NAME, \
"Buffer Overflow. Index [%lu] is out of range [0-%lu]", \
((size_t)IDX), ((sizeof *NAME / sizeof(*NAME)[0]) - 1))), \
((uchar *)*NAME) + ((size_t)IDX) * sizeof(*NAME)[0]))
Some of this might be pushed into a non-inlined run-time support function. That function could be static and defined in the header to keep the library header-only, but ideally there would be a .c file so it's defined only once.
When you factor in the definition of ASSERT, and the ERRLOG macro it uses, it's a lot of cruft for just one array access.
Some compile-time options (via preprocessor macros) to control the bloat would be useful; e.g. a way of compiling it so that AT will just predictably crash, without a detailed error message with __FILE__ and __LINE__ and all. Basically just the check, with a branch to some code that calls abort() if it's out of bounds.
Also doesn't cover all the string and memory buffer manipulations.
SAL and Frama-C are the bare minimum for security in C code.
It's a nice thought, don't get me wrong, but it's hard enough to convince people to add `-fsanitize=...` to their compiler flags. An entire separate static analysis tool with its own learning curve (and its own set of idiosyncrasies) doesn't really qualify for "bare minimum" IMO.
[1] https://learn.microsoft.com/en-us/cpp/code-quality/understan...
C still wins by far when writing libraries that will be used by lots of other people. It doesn't matter what language they are using; they will be able to add a library written in C very easily. With C++ or Rust libraries, however, even with appropriate bindings for the target language, users will need to bring in an entirely new compiler toolchain that may or may not exist on the target architecture. The C toolchain, by contrast, will exist for that architecture and be robust.
For new side projects, pick what you want to use of course. But for existing codebases and projects that aspire to have maximum impact, I recommend fully considering tradeoffs instead of thinking in terms of "clear winners".
C++ keeps C's crap array type as its native array type. You need to reach into the C++ standard library to get the awkward library type std::array<type,N>, and only then do you get an array type that remembers how big it is and has some basic features like swap.
Microsoft's security team is on record that adopting Rust doesn't mean they will walk away from C++.
Unless you mean arrays of anything, as in typeless dynamic languages, I don't see anything awkward about STL arrays in C++.
It makes writing (and hiring for) a low-level project in C++ a much more complex task. It may have benefits, it may not. But C++ is so huge that it's difficult to judge whether it would offer an advantage.
And then there's the minefield of tooling in embedded development...
As a C++ developer, that sounds strange. Can you point me to some documentation about "dependent types"?
That being said, some don't want more macros either.
Can be turned off on demand for relevant symbols.
> larger gap between source code and ISA
There's already a huge gap between C code and machine code (see: Undefined Behavior). C hasn't been a "portable assembler" for a very long time.
> impedance mismatch when working with C APIs
C++ has no problem working with C APIs.
Modern C (2019) - https://news.ycombinator.com/item?id=36167820 - June 2023 (19 comments)
I try to avoid the malloc(n * sizeof (...)) pattern as much as possible. Sure there are lots of cases where it can never overflow, and you might save a bit of overhead from the zeroing and overflow checking, but most of that overhead might also be imaginary depending on allocator internals, and even kernel internals. It's the sort of thing it only makes sense to optimise when you've already squeezed out every bit of performance. And by then you've probably minimised dynamic allocation as much as possible anyway.
It's also very easy to think something like "well, n is passed in as a parameter, but it's a static function, and I know all the callers. So it's fine".
But now every caller in the future has to be aware of this possibility.
Can you clarify: what possibility should you be aware of with malloc that you don't need to be aware of with calloc?
size_t SomeIndex() {
static size_t example_index = 0;
return example_index++ % 2;
}
int main() {
NEW(int, arr, 1);
// This buffer overflow is not detected:
*AT(arr, SomeIndex()) = 42;
return 0;
}
"'Oil,' says I, 'never explodes. It's the gas that forms that explodes.' But I shakes hands with him, anyway.
...
"'Listen,' says I. 'I instruct her to keep her lamp clean and well filled. If she does that it can't burst. And with the sand in it she knows it can't, and she don't worry.
— O. Henry, The Man Higher Up
EDIT: although it seems like this loses much of its power once you start passing these buffers around to functions that don't use these macros.
Here is my oob.c program. I will show the output, and then the content of "oob.h".
#include <stdlib.h>
#include <stdio.h>
#include "oob.h"
int oob_fail(const char *file, int line)
{
fprintf(stderr, "%s:%d:out of bounds array access\n", file, line);
abort();
}
/*
* Declare properties of array type x
*/
#define ARRAY_ELTYPE_x int /* element type is int */
#define ARRAY_SIZE_x 7 /* number of elements is 7 */
/*
* Ensure array type x is fully declared at file scope
*/
ARRAY_FULLTYPE(x);
/*
* Inform the OOB module that the identifiers p and a are
* used as variables related to type x: either pointers
* to it or values.
*/
#define ARRAY_TYPEOF_p x
#define ARRAY_TYPEOF_a x
int get_elem(ARRAY_TYPE(x) *p, int i)
{
return APREF(p, i);
}
int main(void)
{
ARRAY_TYPE(x) a = ARRAY_INIT(1, 2, 3);
for (size_t i = 0; i <= ARRAY_SIZEOF(a); i++)
printf("a[%zd] == %d\n", i, get_elem(&a, i));
return 0;
}
Output:
$ ./oob
a[0] == 1
a[1] == 2
a[2] == 3
a[3] == 0
a[4] == 0
a[5] == 0
a[6] == 0
oob.c:31:out of bounds array access
Aborted (core dumped)
The content of "oob.h":
#ifndef OOB_H_435E_FDE9
#define OOB_H_435E_FDE9
int oob_fail(const char *file, int line);
#define OOB_PREFIX oob_ident_
#define OOB_XCAT(X, Y) X ## Y
#define OOB_CAT(X, Y) OOB_XCAT(X, Y)
#define ARRAY_ELTYPE(T) OOB_CAT(ARRAY_ELTYPE_, T)
#define ARRAY_SIZE(T) OOB_CAT(ARRAY_SIZE_, T)
#define ARRAY_TAG(T) OOB_CAT(ARRAY_TAG_, T)
#define ARRAY_FULLTYPE(T) \
struct ARRAY_TAG(T) { \
ARRAY_ELTYPE(T) a[ARRAY_SIZE(T)]; \
}
#define ARRAY_TYPE(T) struct ARRAY_TAG(T)
#define ARRAY_TYPEOF(V) OOB_CAT(ARRAY_TYPEOF_, V)
#define ARRAY_SIZEOF(V) ARRAY_SIZE(ARRAY_TYPEOF(V))
#define ARRAY_INIT(...) { { __VA_ARGS__ } }
#define AREF(ARRAY, I) \
(((size_t) (I) >= ARRAY_SIZEOF(ARRAY)) \
? oob_fail(__FILE__, __LINE__), (ARRAY).a[0] \
: (ARRAY).a[I])
#define APREF(PARRAY, I) \
(((size_t) (I) >= ARRAY_SIZEOF(PARRAY)) \
? oob_fail(__FILE__, __LINE__), (PARRAY)->a[0] \
: (PARRAY)->a[I])
#endif
Preprocessor invoked on oob.c (snipped down to the relevant part after the run-time support function oob_fail):
struct ARRAY_TAG_x { int a[7]; };
int get_elem(struct ARRAY_TAG_x *p, int i)
{
return (((size_t) (i) >= 7) ? oob_fail("oob.c", 31), (p)->a[0] : (p)->a[i]);
}
int main(void)
{
struct ARRAY_TAG_x a = { { 1, 2, 3 } };
for (size_t i = 0; i <= 7; i++)
printf("a[%zd] == %d\n", i, get_elem(&a, i));
return 0;
}
It's clean enough to be readable (except, of course, that code dense with AREF or APREF calls will be a mess). It uses arrays wrapped in structs, so you can pass arrays by value.
You have to make a list of the variables involved and write some #define lines for them. The same goes for the array types.