Neverflow: C macros that guard against buffer overflows

(github.com)

120 points | sertraline | 2y ago | 143 comments

eqvinox2y ago

The problem with C and buffer overflows isn't that you can't guard against them, or that there is no existing, reusable code to do so — it's that none of this functionality is standardized. Adding another one to the existing 41383 ways of doing this is in fact the exact opposite of what's needed. Ideally C needs one way of doing this, and that would be described in the standard.

But that's not how C "rolls", and we'll never get that. So I guess we now have 41384 ways to do buffer overflow guards.

ActorNightly2y ago

There is value in actually understanding what someone is doing in regards to protecting against buffer overflows, instead of relying on well established patterns.

hinkley2y ago

Not when I’m trying to orchestrate third party libraries.

6D794163636F7562y ago

C never has just one way to do something. myArr[5] == 5[myArr] == (insert pointer arithmetic that I won't write here without a compiler check). I think that part of C's beauty is that it gives you freedom. Freedom to shoot yourself in the foot, freedom to write hyper efficient code, and freedom to choose another tool.

I agree that this will never be implemented as a standard, but I think that's a good thing. Higher level languages push against their boundaries non stop. Java has libraries and frameworks that fundamentally change the syntax and functionality of the language. C knows what it is. If you want something that it can't do it promises that you can either build it yourself or switch to a different tool.

All of this to say, C has a single suggested way of doing this: using a different language. That's part of why we built them

adastra222y ago

Those are syntactic sugar for the same thing though. Array[5] is just shorthand for *(Array + 5), which is why 5[Array] also works (because addition is commutative).

Note that C does have strong conventions, such as that strings are terminated by a zero byte. Nothing in the language demands that, it’s just a convention! C could adopt better conventions.
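
To make the desugaring above concrete, here is a minimal sketch. `same_element` is a made-up name, not something from the thread or from any library; it only compares the addresses that the three spellings designate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when arr[i], *(arr + i) and i[arr] all designate the
 * same object. By definition a[i] means *(a + i), and because adding
 * a pointer and an integer is commutative, i[a] means the same thing. */
bool same_element(int *arr, size_t i)
{
    return &arr[i] == arr + i && &i[arr] == arr + i;
}
```

The caller still has to guarantee that `i` is within the array; none of the three spellings performs a bounds check.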

shrimp_emoji2y ago

Checked arithmetic has been implemented in the standard with `stdckdint.h`, so give it 50 more years!

JohnFen2y ago

> Ideally C needs one way of doing this, and that would be described in the standard.

I'm really glad that C doesn't do this, personally. It would reduce one of the main advantages of the language.

nix0n2y ago

> existing, reusable code to do so

Is there a library that you recommend for this?

augustk2y ago

Even without array bounds checking, a bit of discipline and smart conventions will go a long way toward reducing errors:

1. Define a macro function for retrieving the length of an array:

  #define LEN(arr) (sizeof (arr) / sizeof (arr)[0])

2. Don't introduce macro constants for array lengths; hard code the length in the declaration and use LEN to retrieve it. Example:

  int a[100];
  ...
  for (i = 0; i < LEN(a); i++) {
     ...
  }

3. Define a macro function for dynamic array allocation:

  /* Wrapped in do { } while (0) so the macro acts as a single statement. */
  #define NEW_ARRAY(ptr, n) \
     do { \
        (ptr) = malloc((n) * sizeof (ptr)[0]); \
        if ((ptr) == NULL) { \
           fprintf(stderr, "Memory allocation failed: %s\n", strerror(errno)); \
           exit(EXIT_FAILURE); \
        } \
     } while (0)

4. When you create a function with an array argument, also add an argument for the array length.

5. Use a convention for naming the length of array pointer targets, for instance by adding the suffix `Len'. Example:

  int *b, bLen = 100;
  ...
  NEW_ARRAY(b, bLen);  /* nice to know that b and bLen belong together */
  ...
  SomeFunction(b, bLen, ...);
  ...
  for (i = 0; i < bLen; i++) {
    ...
  }

6. Define your own safe wrappers around unsafe standard library functions or use someone else's code that does that.
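
Items 4 to 6 can be combined in one small wrapper. This is only a sketch: `copy_string` is a hypothetical helper, not a standard library function, and its return convention (0 for success, -1 for truncation) is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst without ever writing more than dst_len bytes.
 * dst is always NUL-terminated when dst_len > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated or if
 * dst_len is 0. */
int copy_string(char *dst, size_t dst_len, const char *src)
{
    if (dst_len == 0)
        return -1;

    size_t src_len = strlen(src);
    /* Leave one byte for the terminator. */
    size_t n = src_len < dst_len - 1 ? src_len : dst_len - 1;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return src_len < dst_len ? 0 : -1;
}
```

With a local array the length comes from the array itself, as in `copy_string(buf, sizeof buf, input)`, so the two cannot drift apart.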

ChrisRR2y ago

The issue with 1 is that it only works until you pass an array into a function by pointer, then the macro no longer works.

In my experience it's most likely that a function will write past the bounds of a buffer that's been passed as an argument. In that case, make sure the size of the array is always included as an argument, as you said in 4.

ollien2y ago

> The issue with 1 is that it only works until you pass an array into a function by pointer, then the macro no longer works.

GCC even has a warning for this.

Even worse, even if you specify the argument to be "of the type" array, it will actually still decay to a pointer. Basically, this macro will only work if you use it in the same function where the array is defined.

https://godbolt.org/z/vr4za73qq
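
The decay can also be checked without relying on compiler warnings. A minimal sketch, assuming a C11 compiler for `_Generic`; the function names are invented for illustration:

```c
#include <stddef.h>

#define LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* Although p is declared with array syntax, its type is adjusted to
 * `int *`. Taking its address therefore yields `int **`, not
 * `int (*)[100]`, and LEN(p) in here would compute
 * sizeof(int *) / sizeof(int) instead of 100. */
int param_is_pointer(int p[100])
{
    return _Generic(&p, int **: 1, default: 0);
}

/* Inside the function that defines the array, it is a real array and
 * LEN gives the element count. */
size_t local_len(void)
{
    int a[100];
    return LEN(a);
}
```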

mzs2y ago

see item 4

mtlmtlmtlmtl2y ago

Your allocation macro can lead to an undersized allocation, and from there to a heap overflow, if the multiplication wraps around. Which can definitely be exploitable.

You should either add overflow checking to the macro or even better just use the damn libc api and call calloc. Or if you really insist on avoiding zeroing overhead, there's reallocarray(NULL, ...) if you use a reasonably modern libc.
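
One way to add the overflow check, sketched under the assumption that returning NULL on overflow is acceptable to the caller. `alloc_array` is a hypothetical name, not a libc function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for n elements of elem_size bytes each. Returns NULL
 * rather than a too-small buffer when n * elem_size does not fit in
 * size_t. */
void *alloc_array(size_t n, size_t elem_size)
{
    if (elem_size != 0 && n > SIZE_MAX / elem_size)
        return NULL;                 /* the product would wrap around */
    return malloc(n * elem_size);
}
```

The division keeps the check itself free of overflow. The memory is left uninitialized, which is the one thing this saves compared with `calloc(n, elem_size)`.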

f33d51732y ago

You could extend point 1. by making a convention of always declaring pointers to arrays like so:

  int (*data)[datalen];

This requires you to dereference it once to get an array, then dereference it a second time to get a value. The advantage is that the array value can be used the same as a normal array on the stack, including passing it to the array length macro you describe.
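
A sketch of this convention, assuming a compiler that supports variably modified types (GCC and Clang do). `fill_squares` is an invented name, and `LEN` is repeated from point 1 so the example stands alone.

```c
#include <stddef.h>
#include <stdlib.h>

#define LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* Allocates datalen ints through a pointer-to-array, fills them, and
 * returns the element count that LEN reports for *data.
 * Returns 0 if datalen is 0 or the allocation fails. */
size_t fill_squares(size_t datalen)
{
    if (datalen == 0)
        return 0;                    /* zero-length array types are not allowed */

    int (*data)[datalen] = malloc(sizeof *data);
    if (data == NULL)
        return 0;

    for (size_t i = 0; i < LEN(*data); i++)
        (*data)[i] = (int)(i * i);

    size_t len = LEN(*data);         /* the length travels with the type */
    free(data);
    return len;
}
```

Note that `sizeof *data` is still `datalen * sizeof(int)` computed without an overflow check, so a very large `datalen` needs the same care as any other allocation.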

teo_zero2y ago

Isn't this exactly what the fine article does?

mzs2y ago

Nice! I don't like how C's null-terminated char arrays play with this. Ideally this would somehow enforce a guard null byte at the end of each array, not included in the size.

nerpderp822y ago

> bit of discipline

hgs32y ago

C23 improved struct compatibility so you might be able to leverage that to craft macros that better emulate slices. [1]

There is an RFC proposal for the Clang frontend for adding bounds checking reminiscent of Microsoft's SAL. [2]

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3003.pdf

[2] https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-...

uecker2y ago

You may be interested in this: https://github.com/uecker/noplate.git

kazinator2y ago

The following is error prone: it can be mistakenly applied to a pointer:

    #define LEN(NAME) (sizeof *NAME / sizeof(*NAME)[0])

I think gcc has a warning for this pattern now: when the size of a pointer is divided by the size of its referent type.

More importantly, it has an odd extra level of indirection. The traditional definition is:

    #define LEN(ARRAY) (sizeof ARRAY / sizeof (ARRAY)[0])

This means that to use LEN on an array, we have to take the address:

   char *array[5];
   LEN(&array);  // -> 5

If we use

   LEN(array);

which is an easy mistake, we get:

    sizeof *array / sizeof (*array)[0]

which is

    sizeof (char *) / sizeof (char)

which is

    sizeof (char *)

which is likely 4 or 8.

I do see that LEN is supposed to be (only) used in conjunction with ARR:

    #define ARR(TYPE, NAME, COUNT) TYPE(*NAME)[COUNT]

but that isn't enforced. An idea would be to add some "secret" prefix or suffix to NAME like blah_ ## NAME, so that name cannot be referenced without going through the macros; i.e. if we define ARR(int, foo, 42) then there is no declared identifier foo; it actually declares blah_foo, and LEN(foo) knows about that, also adding the prefix. Thus mistakenly using LEN(foo) on something not declared with ARR will likely be a reference to an undeclared identifier.
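
The prefix idea might look like the following sketch. The `nf_` prefix and `demo_len` are made up here, and this is not how Neverflow itself defines `ARR` and `LEN`.

```c
#include <stddef.h>

/* ARR(int, foo, 42) declares nf_foo, never foo, so the only way to
 * reach the array is through macros that add the same prefix. */
#define ARR(TYPE, NAME, COUNT) TYPE (*nf_##NAME)[COUNT]
#define LEN(NAME) (sizeof *nf_##NAME / sizeof (*nf_##NAME)[0])

size_t demo_len(void)
{
    int storage[42] = {0};
    ARR(int, foo, 42) = &storage;    /* int (*nf_foo)[42] = &storage; */

    /* LEN(storage) would fail to compile: nf_storage is undeclared. */
    return LEN(foo);
}
```

The cost is the one raised in the reply below: a plain identifier `foo` no longer exists, so passing the array to ordinary functions needs yet another macro.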

skullchap2y ago

It's so funny, but I actually had this in 0.0.1 for the exact same reason. I removed it in 0.0.2 today after complaints that it complicates things and is a bit confusing. It made it harder to pass VLAs to functions. Maybe if I find a better way I will bring name mangling back, but for now being able to pass arrays to functions and maintain the same flexibility is more important imo.

kazinator2y ago

The expansion of the AT macro seems a bit bloated:

  #define AT(NAME, IDX)                                         \
    ((typeof(&(*NAME)[0]))                                      \
    ((ASSERT(((size_t)IDX) * sizeof(*NAME)[0] < sizeof *NAME,   \
    "Buffer Overflow. Index [%lu] is out of range [0-%lu]",     \
    ((size_t)IDX), ((sizeof *NAME / sizeof(*NAME)[0]) - 1))),   \
    ((uchar *)*NAME) + ((size_t)IDX) * sizeof(*NAME)[0]))

Some of this might be pushed into a non-inlined run-time support function. That could be static and defined in the header, to keep it header-only, but ideally there would be a .c file so it's defined only once.

When you factor in the definition of ASSERT, and the ERRLOG macro that it is using, it's a lot of cruft for just one array access.

Some compile-time options (via preprocessor macros) to control the bloat would be useful; e.g. a way of compiling it so that AT will just predictably crash, without a detailed error message with __FILE__ and __LINE__ and all. Basically just the check, with a branch to some code that calls abort() if it's out of bounds.
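
A rough sketch of that suggestion, not Neverflow's actual code: the check moves into one small function, and a hypothetical `NF_TERSE` switch drops the message so that a failed check only calls `abort()`.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Shared run-time support: returns idx when it is in range, otherwise
 * aborts. Define NF_TERSE (a made-up option) to omit the message. */
static size_t nf_check(size_t idx, size_t len, const char *file, int line)
{
    if (idx >= len) {
#ifndef NF_TERSE
        fprintf(stderr, "%s:%d: index %zu is out of range [0-%zu]\n",
                file, line, idx, len - 1);
#else
        (void)file;
        (void)line;
#endif
        abort();
    }
    return idx;
}

/* NAME is a pointer to an array, as in the macros quoted above. */
#define LEN(NAME) (sizeof *(NAME) / sizeof (*(NAME))[0])
#define AT(NAME, IDX) \
    (&(*(NAME))[nf_check((IDX), LEN(NAME), __FILE__, __LINE__)])
```

Each use of `AT` now expands to a single call plus an ordinary subscript, and a negative index converts to a huge `size_t` that fails the same comparison.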

nerpderp822y ago

Benchmark it after -O3, does it really matter ?

pjmlp2y ago

Interesting idea, although given the demotion of VLAs to an optional feature in C11, it isn't necessarily portable.

Also doesn't cover all the string and memory buffer manipulations.

SAL and Frama-C are the bare minimum for security in C code.

e4m22y ago

Frama-C as a bare minimum is a pipe dream.

It's a nice thought, don't get me wrong, but it's hard enough to convince people to add `-fsanitize=...` to their compiler flags. An entire separate static analysis tool with its own learning curve (and its own set of idiosyncrasies) doesn't really qualify for "bare minimum" IMO.

pjmlp2y ago

Thankfully the ongoing cybersecurity laws will change that mindset.

uecker2y ago

We will make VM-types, i.e. pointers to VLAs, mandatory in C23.

kovac2y ago

What is SAL?

hgs32y ago

Source-code annotation language (SAL) [1].

[1] https://learn.microsoft.com/en-us/cpp/code-quality/understan...

pjmlp2y ago

Besides the sibling comment, SAL was born out of the security efforts to fix Windows XP that ended up with the release of Windows XP SP2.

inetknght2y ago

Why use C and keep reinventing things that C++ provides?

epistasis2y ago

If one is ready to switch languages, then the clear winner is Rust over C++, and I say that as someone who avoided diving into Rust for years because it seemed completely overhyped and with too much cryptic syntax.

C still wins by far when writing libraries that will be used by lots of other people. Doesn't matter what language they are using, they will be able to add in a library written in C very easily. With C++ or Rust libraries, however, even with appropriate bindings for the target language, users of the library will need to bring in an entirely new compiler tool chain that may or may not exist on the target architecture. But the C tool chain will exist for that architecture and be robust.

humanrebar2y ago

Availability of C++ tooling is much, much closer to availability of C tooling (often it's the same tool!) compared to Rust. Adopting Rust isn't the same category of conversion at all.

For new side projects, pick what you want to use of course. But for existing codebases and projects that aspire to have maximum impact, I recommend fully considering tradeoffs instead of thinking in terms of "clear winners".

kllrnohj2y ago

Exporting a stable C ABI/API in no way requires writing the implementation in C. See Android's NDK for a rather widely deployed example. All the APIs are C, yet none of the implementations are C. Same thing works great in Rust, too. You can trivially export C from a Rust implementation.

coliveira2y ago

People using C will not change to your language-du-jour, please stop.

ActorNightly2y ago

Rust is by far not mature enough for serious development. Recent shenanigans with crablang are a strong sign of it going down the route of Java, i.e a corporate developed language with offshoots, which will end up with Rust being in the same crappy state.

tialaramex2y ago

But this isn't something the C++ language provides, which is hilarious.

C++ keeps C's crap array type as its native array type. You need to reach into the C++ standard library to get this awkward library type, std::array<type,N> and then finally you get an array type that remembers how big it is and has some basic features like swap.

pjmlp2y ago

True, but it also adds a lot of features that help to easily migrate to saner features without rewriting the world and throwing away 30 years of tooling.

Microsoft security team is on the record that just because they are adopting Rust, they won't shy away from C++.

FpUser2y ago

>"std::array<type,N>"

Unless you mean array of anything like in typeless dynamic languages I do not see anything awkward about STL arrays in C++.

ChrisRR2y ago

For me the issue is that using C++ brings every single feature in with it. It's very easy to hire developers who know the entirety of the C language, but C++ has every feature you could ever want and multiple ways of achieving the same thing.

It makes writing (and hiring) a low-level project in C++ a much more complex task. It may have benefits, it may not. But C++ is so huge that it's difficult to judge whether it would offer an advantage.

And then there's the minefield of tooling in embedded development...

hgsgm2y ago

Knowing every feature of C means they have to learn custom patterns on top of the C to make things work, and that almost always means horrific unhygienic macros.

pjmlp2y ago

I keep having that discussion since the C vs C++ Usenet flamewars....

uecker2y ago

These are dependent types which C++ does not have at all. The C support is fairly weak though... But most programming language people I know agree that dependent types are the way to guard against overflow with minimal overhead. So hope we can evolve C in this direction.

inetknght2y ago

> These are dependent types which C++ does not have at all.

As a C++ developer, that sounds strange. Can you point me to some documentation about "dependent types"?

zffr2y ago

There’s lots of software already written in C that needs to be updated and maintained

properclass2y ago

the obvious answer is that one does not want some things that C++ entails, three examples:

- name mangling
- larger gap between source code and ISA
- impedance mismatch when working with C APIs

that being said, some do not want more macros either

adwn2y ago

> name mangling

Can be turned off on demand for relevant symbols.

> larger gap between source code and ISA

There's already a huge gap between C code and machine code (see: Undefined Behavior). C hasn't been a "portable assembler" for a very long time.

> impedance mismatch when working with C APIs

C++ has no problem working with C APIs.

dang2y ago

Related ongoing thread:

Modern C (2019) - https://news.ycombinator.com/item?id=36167820 - June 2023 (19 comments)

JonChesterfield2y ago

Runtime bounds check tied to fprintf and abort via macros. Allocation by calloc.

mtlmtlmtlmtl2y ago

The calloc part is one of the most common blind spots I see among C programmers.

I try to avoid the malloc(n * sizeof (...)) pattern as much as possible. Sure there are lots of cases where it can never overflow, and you might save a bit of overhead from the zeroing and overflow checking, but most of that overhead might also be imaginary depending on allocator internals, and even kernel internals. It's the sort of thing it only makes sense to optimise when you've already squeezed out every bit of performance. And by then you've probably minimised dynamic allocation as much as possible anyway.

It's also very easy to think something like "well, n is passed in as a parameter, but it's a static function, and I know all the callers. So it's fine".

But now every caller in the future has to be aware of this possibility.

lelanthran2y ago

> But now every caller in the future has to be aware of this possibility.

Can you clarify: what possibility should you be aware of with malloc that you don't need to be aware of with calloc?

cornstalks2y ago

This evaluates macro parameters multiple times, so if the parameters have side effects or evaluate inconsistently this won't work. For example:

    size_t SomeIndex() {
      static size_t example_index = 0;
      return example_index++ % 2;
    }

    int main() {
      NEW(int, arr, 1);
      // This buffer overflow is not detected:
      *AT(arr, SomeIndex()) = 42;
      return 0;
    }
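
One possible way around the double evaluation, sketched for `int` arrays only: pass the index to a function, so the index expression runs exactly once. `at_checked` and `alternating_index` are hypothetical names, and this `AT` is a simplified stand-in rather than Neverflow's macro.

```c
#include <stddef.h>
#include <stdlib.h>

/* The index arrives as a function argument, so whatever expression
 * produced it was evaluated once, before the check and the access. */
static inline int *at_checked(int *base, size_t len, size_t idx)
{
    if (idx >= len)
        abort();                     /* out of bounds: stop before touching memory */
    return base + idx;
}

/* NAME is a pointer to an array of int. It appears more than once,
 * but only IDX is expected to carry side effects. */
#define AT(NAME, IDX) \
    at_checked(*(NAME), sizeof *(NAME) / sizeof (*(NAME))[0], (IDX))

/* Same idea as SomeIndex above: yields 0, 1, 0, 1, ... on successive calls. */
static size_t alternating_index(void)
{
    static size_t n = 0;
    return n++ % 2;
}
```

A type-generic version would need `typeof` or a per-type helper, which is part of why the original macro repeats its arguments.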

frabert2y ago

Never heard of a serious buffer overflow caused by _constant_ indices. Does it work with AT(arr, i), or only with AT(arr, 10)?

oleganza2y ago

"'Brother,' says he, 'greetings. Didn't I see you in Southern Missouri last summer selling colored sand at half-a-dollar a teaspoonful to put into lamps to keep the oil from exploding?'

"'Oil,' says I, 'never explodes. It's the gas that forms that explodes.' But I shakes hands with him, anyway.

...

"'Listen,' says I. 'I instruct her to keep her lamp clean and well filled. If she does that it can't burst. And with the sand in it she knows it can't, and she don't worry.

— O. Henry, The Man Higher Up

CyberDildonics2y ago

Did you mean to reply somewhere else? This thread is about bounds checking arrays in the C programming language.

heylemao2y ago

Yeap, that's the whole point of it

frabert2y ago

Huh I misinterpreted the error messages in the example, I thought those were compiler output. This is quite cool then.

EDIT: although, it seems like this loses much of its power once you start passing these buffers around to functions that do not use these macros.

uecker2y ago

See also here for my experiments, but it relies on UBSan for bounds checking: https://github.com/uecker/noplate.git

norir2y ago

The best way to deal with this kind of thing is to write a small language that transpiles to the subset of C that you are using.

kazinator2y ago

Here is a different take on it. We can use #define to inform the header about the properties of certain symbols.

Here is my oob.c program. I will show the output, and then the content of "oob.h".

  #include <stdlib.h>
  #include <stdio.h>
  #include "oob.h"

  int oob_fail(const char *file, int line)
  {
    fprintf(stderr, "%s:%d:out of bounds array access\n", file, line);
    abort();
  }

  /*
   * Declare properties of array type x
   */
  #define ARRAY_ELTYPE_x int    /* element type is int */
  #define ARRAY_SIZE_x 7        /* number of elements is 7 */

  /*
   * Ensure array type x is fully declared at file scope
   */
  ARRAY_FULLTYPE(x);

  /*
   * Inform the OOB module that the identifiers p and a are
   * used as variables related to type x: either pointers
   * to it or values.
   */
  #define ARRAY_TYPEOF_p x
  #define ARRAY_TYPEOF_a x

  int get_elem(ARRAY_TYPE(x) *p, int i)
  {
     return APREF(p, i);
  }

  int main(void)
  {
     ARRAY_TYPE(x) a = ARRAY_INIT(1, 2, 3);

     for (size_t i = 0; i <= ARRAY_SIZEOF(a); i++)
        printf("a[%zd] == %d\n", i, get_elem(&a, i));

     return 0;
  }

Output:

  $ ./oob
  a[0] == 1
  a[1] == 2
  a[2] == 3
  a[3] == 0
  a[4] == 0
  a[5] == 0
  a[6] == 0
  oob.c:31:out of bounds array access
  Aborted (core dumped)

The content of "oob.h"

  #ifndef OOB_H_435E_FDE9
  #define OOB_H_435E_FDE9

  int oob_fail(const char *file, int line);

  #define OOB_PREFIX oob_ident_
  #define OOB_XCAT(X, Y) X ## Y
  #define OOB_CAT(X, Y) OOB_XCAT(X, Y)

  #define ARRAY_ELTYPE(T) OOB_CAT(ARRAY_ELTYPE_, T)
  #define ARRAY_SIZE(T) OOB_CAT(ARRAY_SIZE_, T)
  #define ARRAY_TAG(T) OOB_CAT(ARRAY_TAG_, T)

  #define ARRAY_FULLTYPE(T)                                                     \
    struct ARRAY_TAG(T) {                                                       \
      ARRAY_ELTYPE(T) a[ARRAY_SIZE(T)];                                         \
    }

  #define ARRAY_TYPE(T) struct ARRAY_TAG(T)

  #define ARRAY_TYPEOF(V) OOB_CAT(ARRAY_TYPEOF_, V)
  #define ARRAY_SIZEOF(V) ARRAY_SIZE(ARRAY_TYPEOF(V))

  #define ARRAY_INIT(...) { { __VA_ARGS__ } }

  #define AREF(ARRAY, I)                                                        \
    (((size_t) (I) >= ARRAY_SIZEOF(ARRAY))                                      \
     ? oob_fail(__FILE__, __LINE__), (ARRAY).a[0]                               \
     : (ARRAY).a[I])

  #define APREF(PARRAY, I)                                                      \
    (((size_t) (I) >= ARRAY_SIZEOF(PARRAY))                                     \
     ? oob_fail(__FILE__, __LINE__), (PARRAY)->a[0]                             \
     : (PARRAY)->a[I])

  #endif

Preprocessor invoked on oob.c (snipped down to the relevant part after the run-time support function oob_fail):

  struct ARRAY_TAG_x { int a[7]; };


  int get_elem(struct ARRAY_TAG_x *p, int i)
  {
     return (((size_t) (i) >= 7) ? oob_fail("oob.c", 31), (p)->a[0] : (p)->a[i]);
  }

  int main(void)
  {
     struct ARRAY_TAG_x a = { { 1, 2, 3 } };

     for (size_t i = 0; i <= 7; i++)
        printf("a[%zd] == %d\n", i, get_elem(&a, i));

     return 0;
  }

It's clean enough to be readable (except, of course, code dense with AREF or APREF calls will be a mess). Uses arrays wrapped in structs, so you can pass arrays by value.

You have to make a list of your variables that are involved and write some #define lines for them.

Same for the array types.

j / k navigate · click thread line to collapse

143 comments

eqvinox2y ago

But that's not how C "rolls", and we'll never get that. So I guess we now have 41384 ways to do buffer overflow guards.

ActorNightly2y ago

There is value in actually understanding what someone is doing in regards to protecting against buffer overflows, instead of relying on well established patterns.

hinkley2y ago

Not when I’m trying to orchestrate third party libraries.

6D794163636F7562y ago

All of this to say, C has a single suggested way of doing this: using a different language. That's part of why we built them

adastra222y ago

Those are syntactic sugar for the same thing though. Array[5] is just shorthand for *(Array + 5), which is why 5[Array] also works (because addition is commutative).

Note that C does have strong conventions, such as that strings are terminated by a zero byte. Nothing in the language demands that, it’s just a convention! C could adopt better conventions.

3 more replies

shrimp_emoji2y ago

Checked arithmetic has been implemented in the standard with `ckdint.h`, so give it 50 more years!

JohnFen2y ago

> Ideally C needs one way of doing this, and that would be described in the standard.

I'm really glad that C doesn't do this, personally. It would reduce one of the main advantages of the language.

nix0n2y ago

> existing, reusable code to do so

Is there a library that you recommend for this?

augustk2y ago

Even without array bounds checking, a bit of discipline and smart conventions will go a long way of reducing errors:

1. Define a macro function for retrieving the length of an array:

  #define LEN(arr) (sizeof (arr) / sizeof (arr)[0])

2. Don't introduce macro constants for array lengths; hard code the length in the declaration and use LEN to retrieve it. Example:

  int a[100];
  ...
  for (i = 0; i < LEN(a); i++) {
     ...
  }

3. Define a macro function for dynamic array allocation:

  #define NEW_ARRAY(ptr, n) \
     (ptr) = malloc((n) * sizeof (ptr)[0]); \
     if ((ptr) == NULL) { \
        fprintf(stderr, "Memory allocation failed: %s\n", strerror(errno)); \
        exit(EXIT_FAILURE); \
     }

4. When you create a function with an array argument, also add an argument for the array length.

5. Use a convention for naming the length of array pointer targets, for instance by adding the suffix `Len'. Example:

  int *b, bLen = 100;
  ...
  NEW_ARRAY(b, bLen);  /* nice to know that b and bLen belong together */
  ...
  SomeFunction(b, bLen, ...);
  ...
  for (i = 0; i < bLen; i++) {
    ...
  }

6. Define your own safe wrappers around unsafe standard library functions or use someone else's code that does that.

ChrisRR2y ago

The issue with 1 is that it only works until you pass an array into a function by pointer, then the macro no longer works.

ollien2y ago

> The issue with 1 is that it only works until you pass an array into a function by pointer, then the macro no longer works. GCC even has a warning for this.

https://godbolt.org/z/vr4za73qq

2 more replies

mzs2y ago

see item 4

mtlmtlmtlmtl2y ago

Your allocation macro can lead to heap underflows if the multiplication wraps around. Which can definitely be exploitable.

f33d51732y ago

You could extend point 1. by making a convention of always declaring pointers to arrays like so:

  int (*data)[datalen];

teo_zero2y ago

Isn't this exactly what the fine article does?

mzs2y ago

Nice! I don't like how C has null terminated char arrays plays with this. Ideally this would somehow enforce a guard null byte at the end of each array not included in the size.

nerpderp822y ago

> bit of discipline

hgs32y ago

C23 improved struct compatibility so you might be able to leverage that to craft macros that better emulate slices. [1]

There is an RFC proposal for the Clang frontend for adding bounds checking reminiscent of Microsoft's SAL. [2]

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3003.pdf

[2] https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-...

uecker2y ago

You may be interested in this: https://github.com/uecker/noplate.git

kazinator2y ago

The following error prone: it can be mistakenly applied to a pointer:

#define LEN(NAME) (sizeof NAME / sizeof(NAME)[0])

I think gcc has a warning for this pattern now: when the size of a pointer is divided by the size of its referent type.

More importantly, it has an odd extra level of indirection. The traditional definition is:

#define LEN(ARRAY) (sizeof ARRAY / sizeof (ARRAY)[0])

This means that to use LEN on an array, we have to take the address:

   char *array[5];
   LEN(&array);  // -> 5

If we use

   LEN(array);

which is an easy mistake, we get:

    sizeof *array / sizeof (*array)[0]

which is

    sizeof (char *) / sizeof (char)

which is

    sizeof (char *)

which is likely 4 or 8.

I do see that LEN is supposed to be (only) used in conjunction with ARR:

    #define ARR(TYPE, NAME, COUNT) TYPE(*NAME)[COUNT]

skullchap2y ago

kazinator2y ago

The expansion of the AT macro seems a bit bloated:

  #define AT(NAME, IDX)                                         \
    ((typeof(&(*NAME)[0]))                                      \
    ((ASSERT(((size_t)IDX) * sizeof(*NAME)[0] < sizeof *NAME,   \
    "Buffer Overflow. Index [%lu] is out of range [0-%lu]",     \
    ((size_t)IDX), ((sizeof *NAME / sizeof(*NAME)[0]) - 1))),   \
    ((uchar *)*NAME) + ((size_t)IDX) * sizeof(*NAME)[0]))

When you factor in the definition of ASSERT, and the ERRLOG macro that is using, it's a lot of cruft for just one array access.

nerpderp822y ago

Benchmark it after -O3, does it really matter ?

pjmlp2y ago

Interesting idea, although given the demotion into optional feature in C11, it isn't necessarly portable.

Also doesn't cover all the string and memory buffer manipulations.

SAL and Frama-C are the bare minimum for security in C code.

e4m22y ago

Frama-C as a bare minimum is a pipe dream.

pjmlp2y ago

Thankfully the ongoing cybersecurity laws will change that mindset.

1 more reply

uecker2y ago

We will make VM-types, i.e. pointers to VLAs, mandatory in C23.

kovac2y ago

What is SAL?

hgs32y ago

Source-code annotation language (SAL) [1].

[1] https://learn.microsoft.com/en-us/cpp/code-quality/understan...

pjmlp2y ago

Besides the sibling comment, SAL was born out of the security efforts to fix Windows XP that ended up with the release of Windows XP SP2.

inetknght2y ago

Why use C and keep reinventing things that C++ provides?

epistasis2y ago

humanrebar2y ago

Availability of C++ tooling is much, much closer to availability of C tooling (often it's the same tool!) compared to Rust. Adopting Rust isn't the same category of conversion at all.

2 more replies

kllrnohj2y ago

1 more reply

coliveira2y ago

People using C will not change to your language-du-jour, please stop.

4 more replies

ActorNightly2y ago

1 more reply

tialaramex2y ago

But this isn't something the C++ language provides, which is hilarious.

pjmlp2y ago

True, but it also adds lot of features that help to easily migrate to saner features without rewriting the world and throw away 30 years of tooling.

Microsoft security team is on the record that just because they are adopting Rust, they won't shy away from C++.

1 more reply

FpUser2y ago

>"std::array<type,N>"

Unless you mean array of anything like in typeless dynamic languages I do not see anything awkward about STL arrays in C++.

1 more reply

ChrisRR2y ago

And then there's the minefield of tooling in embedded development...

hgsgm2y ago

Knowing every feature of C means they have to learn custom patterns on top of the C to make things work, and that almost always means horrific unhygienic macros.

pjmlp2y ago

I keep having that discussion since the C vs C++ Usenet flamewars....

uecker2y ago

inetknght2y ago

> These are dependent types which C++ does not have at all.

As a C++ developer, that sounds strange. Can you point me to some documentation about "dependent types"?

zffr2y ago

There’s lots of software already written in C that needs to be updated and maintained

properclass2y ago

the obvious answer is that one does not want some things that C++ entails, three examples: - name mangling - larger gap between source code and ISA - impedance mismatch when working with C APIs

that being said, some do not want more macros either

adwn2y ago

> name mangling

Can be turned off on demand for relevant symbols.

> larger gap between source code and ISA

There's already a huge gap between C code and machine code (see: Undefined Behavior). C hasn't been a "portable assembler" for a very long time.

> impedance mismatch when working with C APIs

C++ has no problem working with C APIs.

dang2y ago

Related ongoing thread:

Modern C (2019) - https://news.ycombinator.com/item?id=36167820 - June 2023 (19 comments)

JonChesterfield2y ago

Runtime bounds check tied to fprintf and abort via macros. Allocation by calloc.

mtlmtlmtlmtl2y ago

The calloc part is one of the most common blind spots I see among C programmers.

It's also very easy to think something like "well, n is passed in as a parameter, but it's a static function, and I know all the callers. So it's fine".

But now every caller in the future has to be aware of this possibility.

lelanthran2y ago

> But now every caller in the future has to be aware of this possibility.

Can you clarify: what possibility should you be aware off with malloc that you don't need to be aware of with calloc?

2 more replies

cornstalks2y ago

This evaluates macro parameters multiple times, so if the parameters have side effects or evaluate inconsistently this won't work. For example:

    size_t SomeIndex() {
      static size_t example_index = 0;
      return example_index++ % 2;
    }

    int main() {
      NEW(int, arr, 1);
      // This buffer overflow is not detected:
      *AT(arr, SomeIndex()) = 42;
      return 0;
    }

1 more reply

frabert2y ago

Never heard of a serious buffer overflow caused by _constant_ indices. Does it work with AT(arr, i), or only with AT(arr, 10)?

oleganza2y ago

"'Brother,' says he, 'greetings. Didn't I see you in Southern Missouri last summer selling colored sand at half-a-dollar a teaspoonful to put into lamps to keep the oil from exploding?'

"'Oil,' says I, 'never explodes. It's the gas that forms that explodes.' But I shakes hands with him, anyway.

...

"'Listen,' says I. 'I instruct her to keep her lamp clean and well filled. If she does that it can't burst. And with the sand in it she knows it can't, and she don't worry.

— O. Henry, The Man Higher Up

CyberDildonics2y ago

Did you mean to reply somewhere else? This thread is about bounds checking arrays in the C programming language.


heylemao2y ago

Yeap, that's the whole point of it

frabert2y ago

Huh I misinterpreted the error messages in the example, I thought those were compiler output. This is quite cool then.

EDIT: although, it seems like this loses much of its power once you start passing these buffers around to functions that do not use these macros.


uecker2y ago

See also here for my experiments, but it relies on UBSan for bounds checking: https://github.com/uecker/noplate.git

norir2y ago

The best way to deal with this kind of thing is to write a small language that transpiles to the subset of C that you are using.


kazinator2y ago

Here is a different take on it. We can use #define to inform the header about the properties of certain symbols.

Here is my oob.c program. I will show the output, and then the content of "oob.h".

  #include <stdlib.h>
  #include <stdio.h>
  #include "oob.h"

  int oob_fail(const char *file, int line)
  {
    fprintf(stderr, "%s:%d:out of bounds array access\n", file, line);
    abort();
  }

  /*
   * Declare properties of array type x
   */
  #define ARRAY_ELTYPE_x int    /* element type is int */
  #define ARRAY_SIZE_x 7        /* number of elements is 7 */

  /*
   * Ensure array type x is fully declared at file scope
   */
  ARRAY_FULLTYPE(x);

  /*
   * Inform the OOB module that the identifiers p and a are
   * used as variables related to type x: either pointers
   * to it or values.
   */
  #define ARRAY_TYPEOF_p x
  #define ARRAY_TYPEOF_a x

  int get_elem(ARRAY_TYPE(x) *p, int i)
  {
     return APREF(p, i);
  }

  int main(void)
  {
     ARRAY_TYPE(x) a = ARRAY_INIT(1, 2, 3);

     for (size_t i = 0; i <= ARRAY_SIZEOF(a); i++)
        printf("a[%zu] == %d\n", i, get_elem(&a, i));

     return 0;
  }

Output:

  $ ./oob
  a[0] == 1
  a[1] == 2
  a[2] == 3
  a[3] == 0
  a[4] == 0
  a[5] == 0
  a[6] == 0
  oob.c:31:out of bounds array access
  Aborted (core dumped)

The content of "oob.h"

  #ifndef OOB_H_435E_FDE9
  #define OOB_H_435E_FDE9

  int oob_fail(const char *file, int line);

  #define OOB_PREFIX oob_ident_
  #define OOB_XCAT(X, Y) X ## Y
  #define OOB_CAT(X, Y) OOB_XCAT(X, Y)

  #define ARRAY_ELTYPE(T) OOB_CAT(ARRAY_ELTYPE_, T)
  #define ARRAY_SIZE(T) OOB_CAT(ARRAY_SIZE_, T)
  #define ARRAY_TAG(T) OOB_CAT(ARRAY_TAG_, T)

  #define ARRAY_FULLTYPE(T)                                                     \
    struct ARRAY_TAG(T) {                                                       \
      ARRAY_ELTYPE(T) a[ARRAY_SIZE(T)];                                         \
    }

  #define ARRAY_TYPE(T) struct ARRAY_TAG(T)

  #define ARRAY_TYPEOF(V) OOB_CAT(ARRAY_TYPEOF_, V)
  #define ARRAY_SIZEOF(V) ARRAY_SIZE(ARRAY_TYPEOF(V))

  #define ARRAY_INIT(...) { { __VA_ARGS__ } }

  #define AREF(ARRAY, I)                                                        \
    (((size_t) (I) >= ARRAY_SIZEOF(ARRAY))                                      \
     ? oob_fail(__FILE__, __LINE__), (ARRAY).a[0]                               \
     : (ARRAY).a[I])

  #define APREF(PARRAY, I)                                                      \
    (((size_t) (I) >= ARRAY_SIZEOF(PARRAY))                                     \
     ? oob_fail(__FILE__, __LINE__), (PARRAY)->a[0]                             \
     : (PARRAY)->a[I])

  #endif

Preprocessor invoked on oob.c (snipped down to the relevant part after the run-time support function oob_fail):

  struct ARRAY_TAG_x { int a[7]; };


  int get_elem(struct ARRAY_TAG_x *p, int i)
  {
     return (((size_t) (i) >= 7) ? oob_fail("oob.c", 31), (p)->a[0] : (p)->a[i]);
  }

  int main(void)
  {
     struct ARRAY_TAG_x a = { { 1, 2, 3 } };

     for (size_t i = 0; i <= 7; i++)
        printf("a[%zu] == %d\n", i, get_elem(&a, i));

     return 0;
  }

It's clean enough to be readable (except, of course, code dense with AREF or APREF calls will be a mess). Uses arrays wrapped in structs, so you can pass arrays by value.

You have to make a list of your variables that are involved and write some #define lines for them.

Same for the array types.
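The pass-by-value point can be seen in isolation, independent of the macros above. A minimal sketch, assuming a hypothetical struct int7 wrapper and made-up function names: a struct parameter is copied, whereas a bare array parameter decays to a pointer into the caller's storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper type: a fixed-size array inside a struct. */
struct int7 { int a[7]; };

/* Receives a copy of the whole array; writes stay local to the callee. */
int sum_and_clear(struct int7 v)
{
    int sum = 0;
    for (size_t i = 0; i < sizeof v.a / sizeof v.a[0]; i++) {
        sum += v.a[i];
        v.a[i] = 0;             /* modifies only the callee's copy */
    }
    return sum;
}

/* A bare array parameter is really a pointer to the caller's elements. */
void clear_first(int a[7])
{
    a[0] = 0;                   /* modifies the caller's array */
}

void demo_pass_by_value(void)
{
    struct int7 x = { { 1, 2, 3 } };    /* remaining elements are zero */
    int bare[7] = { 1, 2, 3 };

    assert(sum_and_clear(x) == 6);
    assert(x.a[0] == 1);        /* caller's struct is untouched */

    clear_first(bare);
    assert(bare[0] == 0);       /* caller's bare array was changed */
}
```

The wrapper also keeps the element count available to sizeof inside the callee, which a decayed pointer does not.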
