    return ({goto L; 0;}) && ({L: 5;});
It probably has a bug, will be hard to debug, and isn't more performant than writing it in a clearer way. And unfortunately, while the examples here are probably all contrived, there are plenty of real-life cases where code as bad as this gets into production systems. So why are we still writing code like this?
The answer is backward compatibility. Not just of compilers, but of tools and skill sets: people are unwilling to support multiple versions of C and want their code to run forever.
Objective-C and C++ add functionality to C, but they don't remove the C functionality that allows these kinds of problems.
This points to a need for a new language that avoids these issues. I think Rust is the answer, but I would like to see more languages try to fill that gap; competition is healthy.
Programs written in C and C++ may have issues because the languages assume the programmer knows what they are doing. This assumption leads to some great solutions to hard problems because the programmer is essentially free to do what they want.
Of course, this assumption, as with most others, doesn't always hold true. This doesn't mean there is a problem with the language. The problem is with the programmer.
If you're going to write something like "return ({goto L; 0;}) && ({L: 5;});", no language is going to save you.
C and C++ are still used today, in part, because modern languages try to restrict the programmer. Rather than assume the programmer knows what they are doing, they assume the programmer is stupid and needs help to cross the road. The restrictions modern languages put in place by assuming stupidity prohibit certain solutions, and as such C and C++ will remain the go-to systems languages.
We do not need new languages. What we need is programmers who won't abuse the languages we already have.
There’s an assumption here that it would be impossible to design a language which would make these solutions available without being as error-prone. Existing languages may be less capable than C, but that’s only because equally capable languages with less risk haven’t been created (Rust may be a solution; I'm not sure yet).
What exactly do you think can’t be done in a language that is less error-prone?
> We do not need new languages. What we need is programmers who won't abuse the languages we already have.
You’re part of the problem. It takes incredible hubris to say something like this, to think that it’s even possible for a human to do this.
Every nontrivial networking program written in C has security holes caused by memory management issues. If you’re going to claim that these errors are caused by bad programmers, then every C programmer is a bad programmer, because every C programmer has written bugs like this. If you’re claiming that bugs caused by C’s error-prone semantics are programmers abusing the language, then using C is equivalent to abusing C. The very best C programmers write bugs in C that they wouldn’t write in a language like Rust.
A system which depends on humans being perfect is bound to fail. There’s simply no way you can reasonably debate this fact.
Every other engineering field has redundancy: multiple layers of error checking that catch mistakes before they ship.
Until you see this as a problem then you’re a danger to any mission-critical product you work on. Not understanding that using C is a risk displays a shocking level of naiveté for a professional in this field. I’m not saying C is never a good choice. I write a lot of C myself, but I do so with the awareness that my code is not being checked adequately and that I have to take extreme measures to ensure that my code is well-validated.
They already existed back when C was UNIX only, but then UNIX became widespread...
That meant it was relatively simple to 'see' the assembly language 'behind' a given C function or stretch of code; it didn't take much to get inside the head of a C compiler, so you could be reasonably sure that a simple piece of C would result in a similarly simple piece of assembly out the other end.
That, of course, was well and good when it was reasonably simple to predict actual performance from glancing at assembly code, which assumes opcode performance (as opposed to, say, cache performance) dominates how fast the code runs.
Now... how many of those things still hold true on desktop and server class hardware?
So as it stands, C is still your best bet when you are looking for that optimal translation. Intel has recently made some effort to augment it in ways that fully utilize new CPUs' various parallel pipelines and specific functionality; ISPC is one example.
The data still has to be arranged optimally for the hardware in order for SIMD code to have any benefit (and at this point, writing SIMD code is straightforward). You also still need to be experienced with the capabilities of the hardware to have any chance of writing good ISPC code (although this is true of C, as well as of any shading language).
That said, using it to target SSE and AVX with the same code is attractive.
And yet the best practice of the time was not to use C for time-critical applications. If your hypothesis were true, why would, say, all those NES programmers write all that assembly?
It was decidedly not nearly optimal a few years later on microprocessors like the 8080, Z80, and 6502, which were highly register starved, 8-bit rather than 16-bit, with non-orthogonal instructions and registers, etc.
As for "not to use C for time-critical applications", both then and now people sometimes write critical inner loops in assembly; it's just less common now because compilers are much more sophisticated.
But C was indeed used for "time-critical applications", aside perhaps from inner loops, back in the 70s, certainly on PDP-11s, and sometimes on less ideal microprocessors.
> why would, say, all those NES programmers write all that assembly?
Several reasons. First and foremost, consoles like the NES were highly RAM starved even by the standards of the day. The PDP-11/70 had 64k of instruction space and a separate 64k of data per process, with a total amount of system RAM of up to something like a megabyte.
The NES had 2k of RAM onboard -- although cartridges could extend that -- and the register-starved 6502.
Another big reason is that, in every era, games are always pushing the limits of the hardware, and developers were typically quite willing to code in assembly if they believed it would give them a 20% edge in speed or decrease in space.
But there was also a mythos (one that hasn't completely disappeared) that assembly would yield vastly more than a 10%-20% speed increase over the high level languages of the day, including C, so most developers never even considered anything but assembler.
It also was not uncommon at the time for many of those game programmers to know only assembler, and perhaps Basic, but no other language.
The availability of C compilers for various platforms was not so universal then as it is now, especially on non-Unix systems, and the non-Unix C compilers, when available, were not necessarily at the same level of quality as the Unix C compilers.
Last but not least, C had not yet taken the world by storm, and a lot of those developers and companies had never even heard of C, and the ones that had heard of it were pretty dubious, more often than not.
To elaborate, the second expression has an underflow at 1 - sizeof(int) on an unsigned integer (sizeof yields an unsigned size_t, so the usual arithmetic conversions make the subtraction unsigned), which is perfectly well defined:
"if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type."
The right shift is fine on a signed or unsigned integer. For the unsigned case (which is this one due to operator precedence), the behavior is well defined. For signed, implementation defined.
EDIT: The right shift is in fact UB assuming sizeof(int) <= 4.
I know of three different ways in which platforms implement shifts by greater than the word size.
I think it's just pointing out the difference between '&' and '&&'.
They do invoke implementation defined behaviour, but not undefined.
    return x == (1 && x);

Into this:

    movl $0, %eax
    andl $1, %eax
    cmpl %eax, %eax
    # Result in %eax
That will (obviously) return 1 every time, because we compare %eax against %eax. The reason for this is that the value of 'x' changes half-way through the computation (because the computation is done in %eax, which is where 'x' is assumed to be). This is valid because 'x' is uninitialized, so it doesn't have a defined value.

Unless size_t is wider than 32 bits, it has undefined behavior. That's why it returns 0; it could as well be 42, or the program could terminate with or without a diagnostic message, etc.
If something is simple for the compiler-writer, then simple things do yield simple results.
If something is simple for the programmer, simple things often yield quite complex results.
For example, in a language that's simple for the compiler-writer, (1/10) times 10 is only very rarely 1. 0 is a common answer, as is some fraction which is almost, but not completely, unlike 1.
In a language which is simple for the programmer, Heaven, Earth, and minor deities will be moved to make (1/10) times 10 come out to the obvious, simple answer.
And you do realize that one of the simplest languages for compiler writers, Lisp, doesn't have to move heaven and earth to make that calculation work out how you want: it has exact rational arithmetic, so dividing 1 by 10 yields the exact fraction 1/10.
The problems are with the corner cases like the ones he mentions at the beginning of his post.
    Last-Modified: Fri, 29 Oct 2010 16:59:15 GMT
And the changelog for CIL suggests that it may be significantly older:

> When I (George) started to write CIL I thought it was going to take two weeks. Exactly a year has passed since then and I am still fixing bugs in it

Ok, we'll put 2010 on it even though it is likely quite a bit older. An upper bound is better than nothing.
The CIL paper was published in 2002. Actually the whole project looks interesting—arguably more so than the currently posted page. It should have its own HN thread sometime. https://news.ycombinator.com/item?id=836735 was a while ago!
The website for it hasn't been updated, though.