Kristine1975 10y ago

>I suspect this has a -lot- to do with performance.

It's questionable whether people wanted that performance though, at least when it resulted in less security. About bounds checking in ALGOL 60: https://en.wikipedia.org/wiki/Bounds_checking

> A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous.

__david__10y ago

> It's questionable whether people wanted that performance though, at least when it resulted in less security.

There's no question about it, the "ANSI C Rationale" makes it very clear what they considered "the spirit of C"[1]:

> - Trust the programmer.

> - Don't prevent the programmer from doing what needs to be done.

> - Keep the language small and simple.

> - Provide only one way to do an operation.

> - Make it fast, even if it is not guaranteed to be portable.

> The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule. An example of this willingness to live with what the machine does can be seen in the rules that govern the widening of char objects for use in expressions: whether the values of char objects widen to signed or unsigned quantities typically depends on which byte operation is more efficient on the target machine.

> One of the goals of the Committee was to avoid interfering with the ability of translators to generate compact, efficient code. In several cases the Committee has introduced features to improve the possible efficiency of the generated code; for instance, floating point operations may be performed in single-precision if both operands are float rather than double.

[1] http://www.lysator.liu.se/c/rat/title.html Quoted section is found here: http://www.lysator.liu.se/c/rat/a.html#1
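To make the char-widening example from the Rationale concrete, here is a minimal sketch in Rust, where both choices have to be spelled out explicitly. The function names `widen_signed` and `widen_unsigned` are made up for illustration; they stand in for the two behaviours a C implementation may pick for plain `char`.

```rust
/// Widen a byte the way a "signed char" target would: sign-extend.
fn widen_signed(byte: u8) -> i32 {
    (byte as i8) as i32 // reinterpret the bits as signed, then sign-extend
}

/// Widen a byte the way an "unsigned char" target would: zero-extend.
fn widen_unsigned(byte: u8) -> i32 {
    byte as i32 // zero-extend
}

fn main() {
    for byte in [0x41u8, 0xFF] {
        println!(
            "{:#04x}: signed -> {}, unsigned -> {}",
            byte,
            widen_signed(byte),
            widen_unsigned(byte)
        );
    }
}
```

Bytes below 0x80 widen to the same value either way; 0xFF becomes -1 under sign extension and 255 under zero extension, which is the portability gap the Committee accepted for speed.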

wahern10y ago

"The block structure of ALGOL 60 induced a stack allocation discipline. It had limited dynamic arrays, but no general heap allocation. The substantially redesigned ALGOL 68 had both heap and stack allocation. It also had something like the modern pointer type, and required garbage collection for the heap. The new language was complex and difficult to implement, and it was never as successful as its predecessor."

-- http://www.memorymanagement.org/mmref/lang.html

Adding runtime bounds checking of automatic storage arrays (i.e. arrays on the stack) is relatively easy in C, at least until the compiler runs into illegal type punning. The real problem in implementing these compiler safeguards comes with crossing translation units, or with heap blocks. There's a reason languages like Rust and Go rely heavily on static linking and stack allocation; it's more difficult or more costly to implement those safeguards when the compiler can't see all the source code, or pointers pass through an opaque layer. Nothing in C precludes automatic bounds checking of all array access, via fat pointers or lookup tables. Fabrice Bellard's Tiny C compiler implemented precise bounds checking for both automatic and dynamic storage-allocated objects a decade before UBSan and ASan. Even deriving an invalid pointer crashed the app at the precise point where it happened. That widely-used C compilers don't do that is a strong hint there are other, real-world constraints in place.
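As a rough sketch of the fat-pointer idea mentioned above: the pointer carries its length along, and every access is checked against it. This is written in Rust purely to show the concept, not how Tiny C or any other C compiler implements it; `FatPtr` and `load` are hypothetical names, and a checking C compiler would presumably trap where this returns `None`.

```rust
use std::marker::PhantomData;

/// A base address plus the number of elements it may legally reach.
struct FatPtr<'a> {
    base: *const i32,
    len: usize,
    _borrow: PhantomData<&'a [i32]>, // ties the pointer to the array's lifetime
}

impl<'a> FatPtr<'a> {
    fn new(array: &'a [i32]) -> Self {
        FatPtr { base: array.as_ptr(), len: array.len(), _borrow: PhantomData }
    }

    /// Checked load: the bound travels with the pointer, so no external
    /// lookup is needed to validate the index.
    fn load(&self, index: usize) -> Option<i32> {
        if index < self.len {
            // SAFETY: index < len, and `base` points at `len` live elements
            // for the lifetime 'a.
            Some(unsafe { *self.base.add(index) })
        } else {
            None
        }
    }
}

fn main() {
    let array = [10, 20, 30];
    let p = FatPtr::new(&array);
    println!("{:?} {:?}", p.load(2), p.load(3));
}
```

The cost is visible in the struct: the pointer is twice as wide, which is one reason such a scheme gets awkward once pointers cross into separately compiled code that expects a plain address.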

Also, in a language like Java it's not uncommon to see people reinventing dynamic heap allocation using char arrays, susceptible to all the same overflow problems. When you see people doing that, that should be a hint that a language like C might work well.

I don't understand all the C hate. Then again, I have no problem employing various languages according to the task, or creating DSLs. I suppose if I was wedded to a single language or to the idea of a single language, C would look much worse to me.

dbaupp10y ago

> There's a reason languages like Rust and Go rely heavily on static linking and stack allocation

This is untrue: Rust certainly does not do any whole-program optimisation by default when linking statically, nor is there a difference between putting an array on the stack or on the heap. While it is true that code can benefit from whole-program optimisation, it isn't the default in either language, just like it isn't the default in C.

wahern10y ago

Languages which bake in automatic bounds checking at every access rely on optimization to recover the performance hit. Without static linking, automatic GC, and other constructs that's very difficult.

LTO notwithstanding, once you add those more sophisticated constructs, iterating the language becomes more difficult. You don't hit upon the best method for implementing various types the first time, or the second time, or even the third time. glibc is backwards compatible for programs compiled over 15 years ago (GCC's fixinclude hacks notwithstanding). You'll never see that with Rust's or Go's standard library, just like you never saw that with C++.

My point wasn't that static linking was necessary. My point was that static linking is indicative of other tradeoffs that most people don't understand. Static linking isn't just about making packaging easier. It's also about making it easier to write and implement the compiler and standard environment.

My more abstract point is that people who think C is on its last legs don't understand the whole picture. There's nothing intrinsic to C that makes it unsafe. Fabrice's compiler was perfectly capable of implementing the C standard to the letter. What makes C unsafe are the requirements found in the niches where C exists, and those requirements don't magically disappear because the name of the language changes.

Rust supports unsafe code, but writing Rust that is rigorously robust in the face of OOM situations, or that needs to implement use-case memory management strategies, requires relying almost exclusively on unsafe code. (Try using Rust without boxing, for example, as is necessary if you want to catch OOM.) If you don't need those things, you probably don't need a low-level language, either. I love C, but I also love languages like Lua with lexical closures and stackless coroutines. Languages like Rust and even C++ exist at a middle ground that is very unappealing to me.

C isn't standing still, either. Strategies like SafeStack (see http://dslab.epfl.ch/proj/cpi/) can provide substantially the same safety guarantees as Rust in terms of real-world attack vectors, without having to modify any existing C software, and without giving up performance.

None of this is to say languages like Rust are useless. Just that the harms and inevitable demise of C per se are, IMHO, greatly exaggerated. And if and when a language like Rust grows in usage, I doubt it will supplant C so much as open and populate virgin territory.

pcwalton10y ago

> C isn't standing still, either. Strategies like SafeStack (see http://dslab.epfl.ch/proj/cpi/) can provide substantially the same safety guarantees as Rust in terms of real-world attack vectors, without having to modify any existing C software, and without giving up performance.

That paper indicates that you do in fact give up performance, and the performance is comparable to existing SFI techniques. SafeStack itself is insufficient to prevent UAF problems with the heap. CPI prevents them, but with significant overhead. And you still don't get full memory safety.

Manishearth10y ago

> Try using Rust without boxing, for example, as is necessary if you want to catch OOM.

It's not necessary, you can plug in a custom allocator that works differently and use boxing as usual.

There are plans for more robust custom allocator APIs that make this even easier to handle.

Also, really, even if Rust didn't have this, the situation wouldn't be worse than C. In C you have to malloc and free things manually. In Rust you can do that too. Rust's abort-on-OOM is a stdlib thing (which can be overridden as previously mentioned).
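A minimal sketch of the "malloc and free things manually" route in Rust, using the raw `std::alloc` functions, which report failure with a null pointer instead of aborting. The helper name `roundtrip` and the marker value are made up for illustration.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocate `len` bytes by hand, store a marker in the first byte, read it
/// back, then free the block. Returns `None` when the request cannot be
/// satisfied, so the caller decides how to react.
fn roundtrip(len: usize) -> Option<u8> {
    if len == 0 {
        return None; // `alloc` must not be called with a zero-sized layout
    }
    // Rejects sizes that overflow the maximum allocation size.
    let layout = Layout::array::<u8>(len).ok()?;
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        return None; // out of memory: reported, not aborted
    }
    // SAFETY: `ptr` is non-null and valid for `len >= 1` bytes, and it is
    // freed exactly once with the layout it was allocated with.
    unsafe {
        ptr.write(42);
        let value = ptr.read();
        dealloc(ptr, layout);
        Some(value)
    }
}

fn main() {
    println!("{:?}", roundtrip(64));
    println!("{:?}", roundtrip(usize::MAX));
}
```

The `unsafe` blocks are the price of doing it by hand, which is the same position a C programmer is in with `malloc` and `free`.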

dbaupp10y ago

> Languages which bake in automatic bounds checking at every access rely on optimization to recover the performance hit.

The performance hit is generally negligible, especially with abstractions like iterators in Rust that avoid them entirely, and standard optimisations that can lift the checks out of loops... optimisations that do not need any of the things you say that the compilers want. The cost of calling code in a different dynamic library (e.g. getting the dynamic symbol address and then doing the actual call) is going to be much greater than whatever bounds checks it does in almost all situations.
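A small sketch of the difference, with made-up function names: the indexed loop asks for a bounds check on every `data[i]` (which the optimiser can often remove or hoist), while the iterator version never produces an index that could be out of range in the first place.

```rust
/// Indexed loop: each `data[i]` is a checked access.
fn sum_indexed(data: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

/// Iterator: walks the slice directly, so there is no index to check.
fn sum_iter(data: &[u64]) -> u64 {
    data.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("{} {}", sum_indexed(&data), sum_iter(&data));
    // An out-of-range access is reported rather than reading past the end.
    println!("{:?}", data.get(10));
}
```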

> There's a reason languages like Rust and Go rely heavily on static linking and stack allocation.

As I just said, this is factually false. Static linking is entirely orthogonal to bounds-checking optimisations (neither Rust nor Go do whole program optimisations when linking statically, so it can't be the motivation for it), as is putting data on the stack. GC seems even more irrelevant, especially to Rust which doesn't have one.

> My point was that static linking is indicative of other tradeoffs that most people don't understand. Static linking isn't just about making packaging easier.

But it isn't indicative! In Rust's case, linking statically is for packaging: the reason is the ABI is unstable, so dynamically linking is very annoying to manage and many of its benefits are inhibited.

> There's nothing intrinsic to C that makes it unsafe.

The forever-growing list of CVEs caused by basic mistakes in C code says otherwise. Things like overrunning a buffer or reusing a freed pointer are not at all caused by domain specific constraints, they're the price one pays for using 40 year old technology. You can see this in modern tools that try to assist with getting safer C: they are often using things that didn't exist when C was created. (And, don't get me wrong, C is here to stay, even if all new C development was stopped today, and so efforts to make it safer are very good, but at some point we have to face the reality of C/stop the C-apologism.)

> Fabrice's compiler was perfectly capable of implementing the C standard to the letter.

This is essentially meaningless for two connected reasons: the major problem with C is the holes in the standard (undefined behaviour), not compiler bugs; and people want fast code, so they need optimisations, which often exploit undefined behaviour.

> (Try using Rust without boxing, for example, as is necessary if you want to catch OOM.)

Boxing or not is irrelevant to safety: using Box in fact allows more aggressive `unsafe` code (one can rely on address-stability to correctly sidestep the compiler's normal checks). Rust-the-language effectively knows nothing about the stack or heap when reasoning about safety: it does reason about stack scopes, but it doesn't care where the data is actually positioned in memory: Box<T> is isomorphic to a plain T in this respect.
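A tiny sketch of that last point, with hypothetical names `bump` and `demo`: the same function, under the same borrowing rules, works on a value stored inline and on one behind a `Box`.

```rust
/// Takes an exclusive borrow; it neither knows nor cares where the
/// integer lives.
fn bump(counter: &mut u32) -> u32 {
    *counter += 1;
    *counter
}

fn demo() -> (u32, u32) {
    let mut inline_value: u32 = 0; // stored in the local frame
    let mut boxed_value: Box<u32> = Box::new(0); // stored on the heap
    let a = bump(&mut inline_value);
    let b = bump(&mut boxed_value); // `&mut Box<u32>` coerces to `&mut u32`
    (a, b)
}

fn main() {
    println!("{:?}", demo());
}
```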

In any case, the power of Rust is the ability to wrap code into safe abstractions: if there is a particular feature the standard library doesn't provide (yet), external libraries have the power to create APIs that have the same level of safety, maybe with a bit of `unsafe` internally. You can see this even in "use-case memory management" situations like a kernel: http://os.phil-opp.com/modifying-page-tables.html
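As a minimal sketch of "a bit of `unsafe` internally" behind a safe API, here is a fixed-capacity stack that never touches the heap and reports a full buffer to the caller instead of aborting. `FixedStack` is a made-up type for illustration, not something from the standard library or the linked post.

```rust
use std::mem::MaybeUninit;

/// Fixed-capacity stack stored inline: no heap allocation, so no OOM abort.
pub struct FixedStack<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize, // invariant: slots[..len] are initialised
}

impl<T, const N: usize> FixedStack<T, N> {
    pub fn new() -> Self {
        FixedStack {
            slots: std::array::from_fn(|_| MaybeUninit::uninit()),
            len: 0,
        }
    }

    /// Hands the value back if the stack is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.slots[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: this slot was initialised by `push`, and `len` was
        // decremented first, so it will not be read a second time.
        Some(unsafe { self.slots[self.len].assume_init_read() })
    }
}

impl<T, const N: usize> Drop for FixedStack<T, N> {
    fn drop(&mut self) {
        // Drop whatever is still stored.
        while self.pop().is_some() {}
    }
}

fn main() {
    let mut stack: FixedStack<String, 2> = FixedStack::new();
    println!("{:?}", stack.push("a".to_string()));
    println!("{:?}", stack.push("b".to_string()));
    println!("{:?}", stack.push("c".to_string())); // full: value returned
    println!("{:?}", stack.pop());
}
```

Callers only see `push` and `pop`; the single `unsafe` line relies on an invariant that the type itself maintains.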

pcwalton10y ago

The lack of bounds checking is one of the biggest problems in C, but there are worse problems (use after free) that nobody has even thought of a solution for.

> That widely-used C compilers don't do that is a strong hint there are other, real-world constraints in place.

Yes. Those constraints are self-inflicted wounds caused by the fact that C wasn't designed for this. If you have a proper iterator API, a culture of unsigned array indexing, widespread use of a size_t equivalent instead of int for loops, etc. etc. these issues vanish.
