First C compiler ported to GCC

(github.com)

137 points | vegesm | 5y ago | 87 comments

ggm 5y ago

The

  a[b] implemented as *(a+b)

thing is how we were taught to think about array indexing in the CS lectures of the 70s.

st_goliath 5y ago

And that's how it's still taught nowadays.

Both the C89 and the C99 standard draft contain the following:

> The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

In fact, the expressions a[b], *(a + b), and b[a] are equivalent.

Here is a perfectly valid snippet of C code that will print out 't':

    putchar(3["test"]);
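
A minimal sketch of that equivalence, for anyone who wants to try it. The function names are made up for illustration; the only thing taken from the standard is the quoted rule that E1[E2] means (*((E1)+(E2))), which, together with addition being commutative, is why the swapped spelling is accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the three spellings of "element i of s" agree.
   The caller must make sure i is within the string. */
int subscript_forms_agree(const char *s, size_t i)
{
    assert(s != NULL);

    char via_index   = s[i];      /* the usual spelling */
    char via_pointer = *(s + i);  /* what s[i] is defined to mean */
    char via_swapped = i[s];      /* *(i + s), the same sum written the other way */

    return via_index == via_pointer && via_pointer == via_swapped;
}

/* The example from the comment above: 3["test"] is "test"[3]. */
char fourth_char_of_test(void)
{
    return 3["test"];
}
```

Calling subscript_forms_agree("test", 3) should return 1, and fourth_char_of_test() should return 't'.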

mschuster91 5y ago

I understand why that example works, but I struggle to find a valid use case, aside from code golfing...

kragen 5y ago

It isn't a use case; it is a drawback of the C array and pointer semantics:

- Array values decay to pointers in rvalue contexts (though not as the argument of sizeof);

- a[b] is syntactic sugar for *(a+b).
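
A rough sketch of the first point, decay and the sizeof exception. The function names are hypothetical and the array length 8 is arbitrary; this only assumes a hosted C99 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter is a pointer: whatever array the caller had has
   already decayed by the time we get here. */
size_t size_in_callee(const int *values)
{
    assert(values != NULL);
    return sizeof values;           /* size of a pointer */
}

/* As the operand of sizeof, the array does not decay. */
size_t size_in_owner(void)
{
    int values[8] = {0};
    return sizeof values;           /* 8 * sizeof(int) */
}

/* Passing the array by name hands the callee a pointer to element 0. */
size_t size_after_passing(void)
{
    int values[8] = {0};
    return size_in_callee(values);  /* values decays to &values[0] */
}
```

size_in_owner() reports the whole array, while size_after_passing() reports only the size of an int pointer, because the length is lost at the call.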

— ⁂ —

These two design decisions have some desirable results:

- Arrays, including strings, can be in effect passed as arguments to functions without implementing a special parameter-passing mechanism for arrays.

- Functions on arrays are implicitly generic over the array length, rather than that length being a part of their type. (When this isn't what you want you should probably be using a struct instead.)

- Array iteration state can be represented as a pointer, preventing bugs in which you index the wrong array. In a sense a single pointer represents an array range or slice, as long as you have some way to identify the array end, like nul-termination in strings or a separate length argument.

- You can change a variable (including a struct field) from being an embedded array to being a pointer to an array allocated elsewhere—or vice versa—without changing the code that uses it. (But if this had been a significant design consideration, -> wouldn't be a separate operator from . in C.)

- It's easy to create new "arrays" at runtime: just return a pointer to some memory.
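
To make the "generic over length" and "pointer as iteration state" points concrete, here is a small sketch. sum_range and sum_with_length are invented names, and the half-open [first, last) convention is just one common way to mark the array end.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the half-open range [first, last). The two pointers are the whole
   iteration state; there is no separate index that could be applied to
   the wrong array. */
long sum_range(const int *first, const int *last)
{
    long total = 0;

    assert(first <= last);
    for (const int *p = first; p != last; ++p)
        total += *p;
    return total;
}

/* Same function for any array length: the length is an argument,
   not part of the parameter type. */
long sum_with_length(const int *values, size_t count)
{
    return sum_range(values, values + count);
}
```

The same sum_with_length works for a 4-element array and a 4000-element one, which is the upside; nothing stops a caller from passing the wrong count, which is the downside discussed next.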

— ⁂ —

Like all design tradeoffs, these also have some drawbacks, which are so severe that no language of the current millennium has followed C's lead on this, although many of C's other design decisions are wildly popular:

- Bounds checking is impossible.

- Alias analysis for optimization is infeasible.

- If you aren't using a sentinel, you have to pass in a separate argument containing the array length whenever you pass in an array pointer, or stuff these base and limit fields into a slice struct, or something.

- Arguably, these decisions are hard to separate from the fact that C strings are terminated by a sentinel value and thus are not binary-safe.

- 3["hello"] is legal C.

— ⁂ —

Of these five drawbacks, the fifth seems like it may not be as severe as the other four?
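
A sketch of two of the points above: the "slice struct" workaround with a manual bounds check, and why sentinel-terminated strings are not binary-safe. struct byte_slice, slice_get and the demo data are all hypothetical names chosen for this example, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: base pointer plus explicit length. */
struct byte_slice {
    const unsigned char *base;
    size_t len;
};

/* Bounds-checked read: returns 1 and stores the byte, or 0 if i is out of range. */
int slice_get(struct byte_slice s, size_t i, unsigned char *out)
{
    assert(out != NULL);
    if (i >= s.len)
        return 0;
    *out = s.base[i];
    return 1;
}

/* Five payload bytes with a NUL in the middle, plus the literal's final NUL. */
static const char demo_data[] = "ab\0cd";

/* strlen stops at the first sentinel, so it sees only "ab". */
size_t length_seen_by_strlen(void)
{
    return strlen(demo_data);
}

/* An explicit length sees all five payload bytes. */
size_t length_seen_by_slice(void)
{
    return sizeof demo_data - 1;
}

/* Returns the byte at index i of the demo data, or -1 if i is out of range. */
int demo_byte_at(size_t i)
{
    struct byte_slice s = { (const unsigned char *)demo_data, sizeof demo_data - 1 };
    unsigned char b;

    return slice_get(s, i, &b) ? (int)b : -1;
}
```

The check in slice_get only works because the length travels with the pointer; a bare char * gives the callee nothing to check against.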

charcircuit 5y ago

Not everything needs a valid use case. It can exist just for fun.

cat199 5y ago

Speculating, but I don't think it's about a use case so much as about it being a simple thing to implement ('C is portable assembly'), which probably carried through to our more current notion of this being a 'language level' thing.

kevincox 5y ago

How does this work in C++ with operator overloading? Are they still the same? That would make for some interesting obfuscated code.

edflsafoiewq 5y ago

No, they're not. It works by: if either a or b in a[b] is a class/enumeration type, call a.operator[](b).

jcelerier 5y ago

> How does this work in C++ with operator overloading?

you can't overload int::operator[](...)

sgtnoodle 5y ago

Are those not equivalent expressions in modern C?

I imagine there are more optimal and less optimal ways of actually doing the indexing in machine code and the former may be better semantics, but I would think a compiler would generate identical machine code for both.

tinus_hn 5y ago

I’m pretty sure you have to keep the size of the objects in mind.

throwaway525142 5y ago

No, these are equivalent.

The size of the objects is implicitly taken into account by the compiler; it knows the size of the objects from the type of the pointer.

amelius 5y ago

"Pointer arithmetic" takes care of that. Adding an integer to a pointer multiplies the integer by the size of the pointed-to type and adds the result to the pointer.
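
A small sketch of that scaling, measuring the distance in bytes by viewing both pointers as char *. The function names and the scratch array size of 16 are arbitrary choices for the example.

```c
#include <assert.h>
#include <stddef.h>

/* How many bytes does "p + n" move when p is an int pointer? */
size_t bytes_advanced_int(size_t n)
{
    int storage[16];
    assert(n <= 16);                         /* stay within (or one past) the array */

    int *p = storage;
    int *q = p + n;                          /* moves n elements, not n bytes */
    return (size_t)((char *)q - (char *)p);  /* distance measured in bytes */
}

/* Same experiment with a different element type. */
size_t bytes_advanced_double(size_t n)
{
    double storage[16];
    assert(n <= 16);

    double *p = storage;
    double *q = p + n;
    return (size_t)((char *)q - (char *)p);
}
```

bytes_advanced_int(3) comes out as 3 * sizeof(int) and bytes_advanced_double(3) as 3 * sizeof(double), so the element size never has to appear in a[b] or *(a+b).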

raverbashing 5y ago

Hence why it "can be written" as b[a] as well

Edit: it doesn't blow up, not even with -Wall and -std=c99

st_goliath 5y ago

> (yes it will probably blow up in modern compilers, or at least give you a warning)

Nope. For the code snippet I posted an hour ago, even with -pedantic -Wall -Wextra gcc won't issue any warnings. And why should it? It's perfectly standards conformant, because the standard actually defines the [] operator through the equivalent addition expression.

superjan 5y ago

I think the reason the behavior is still there is that it is not used. There is no gain in changing the standard, and a compiler warning could draw criticism. Why waste your time solving a non-problem?

quotemstr 5y ago

if (x = 5) is standards conforming but every compiler that warns about anything warns about potentially confusing = and ==.

MaxBarraclough 5y ago

> why should it?

It's extremely poor style, even if the behaviour is identical.

stephencanon 5y ago

It will not “blow up” in modern compilers, nor can it, because that’s _how the operator is defined_.

nspain 5y ago

That's how we were taught a few years ago too! It really helped it "click" that array elements are stored contiguously.

dvko 5y ago

It’s also in the K&R book IIRC, stating that the two are equivalent.

burstmode 5y ago

Did this compiler really support "auto" as a variable type, as seen in the fizzbuzz example?

1ris 5y ago

"auto" in C is a storage class specifier, like "register", "extern" or "static".

https://en.cppreference.com/w/c/language/storage_duration

It was considered pretty useless by most, so C++11 recycled the keyword to mean something different.

jwilk 5y ago

mywittyname 5y ago

Auto is the implicit default, right? As in function scoped, stack allocated, and lives until the function returns?

K&R (Second Ed.) makes no mention of the auto keyword in Section 1.10, but it does say:

> Each local variable in a function comes into existence only when the function is called, and disappears when the function is exited. This is why such variables are usually known as automatic [sic] variables[...]

1ris 5y ago

Yes, exactly. That's why there is no need for it in modern C. This compiler, however, is different: the type is optional (and assumed to be int). Say you have a variable declaration "auto int i;". Back then you could omit int; now you can omit auto.
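
A short sketch of "auto" spelled out as a storage class, next to "static" for contrast. The function names are invented for the example. This is meant to be compiled as C; a C++11 compiler would reject "auto int".

```c
#include <assert.h>

/* "auto" written explicitly: automatic storage, which is already the
   default for a local variable, so the keyword adds nothing here. */
int count_auto(void)
{
    auto int n = 0;
    n++;
    assert(n == 1);   /* a fresh n is created on every call */
    return n;
}

/* "static" is a different storage class: one n that outlives each call. */
int count_static(void)
{
    static int n = 0;
    n++;
    return n;
}
```

Repeated calls to count_auto() keep returning 1, while count_static() returns 1, then 2, and so on. Dropping "int" and writing only "auto n" is the old implicit-int form discussed elsewhere in this thread, which the example deliberately avoids.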

bawolff 5y ago

Auto is not a type, it means local variable. I think it's still a thing, just it's the default so nobody uses it.

layoutIfNeeded 5y ago

Hehe, “auto” used to mean “automatic storage” aka the stack. Then much later C++ repurposed the keyword for type deduction.

captainmuon 5y ago

"auto" is probably the storage class; it tells what kind of variable this is: automatic, as opposed to "register", which would ask for the variable to be kept in a register, or "static" or "extern".

The type is not given at all, I think by default it would be "int".

dvt 5y ago

> The type is not given at all, I think by default it would be "int".

Yep, this is called the "implicit int" rule, and it was specifically outlawed[1] by C99 and onward.

[1] https://herbsutter.com/2015/04/16/reader-qa-why-was-implicit...

ufo 5y ago

One of the unusual things in this early version of C is that "int" can be used for any word-sized value, including pointers. The type system was very loose.

Turing_Machine 5y ago

Even back then this was considered poor practice, however. The first edition of K&R had a subsection entitled "Pointers are Not Integers" (I don't know if that's still in modern editions).

Blikkentrekker 5y ago

> types of the function parameters are not checked, anything can be passed to any function

Still a better type system than Twilight.

IncRnd 5y ago

No. "auto" is not a type but a storage class that means automatically allocated instead of being allocated to a register, extern-al to the file, or in the static code segment.

Taniwha 5y ago

"The compiler runs only in 32 bit mode as the original code assumes that the pointer size and word size are the same." ... which was um, 16-bits

vesinisa 5y ago

I think he means that int and pointer address must be interchangeable. As long as that holds, the size can be either 16 bits or 32 bits.

On a PDP-11 int would have been 16-bit. On x86 32 bits. But on x86_64 int is 32 bits but pointers are 64-bit. The easiest way to retain the original assumption with minimal changes to the historical source code while targeting a modern CPU is to compile in 32-bit mode.
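
A quick way to check the assumption on a given target, sketched with made-up function names. It also shows the modern alternative, uintptr_t from <stdint.h>, which is an optional type in the standard but is assumed here to be available, as it is on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if an int is exactly as wide as a pointer, which is what the
   original compiler source takes for granted. */
int int_is_pointer_sized(void)
{
    return sizeof(int) == sizeof(void *);
}

/* Stores a pointer in an integer and gets it back, using uintptr_t
   rather than int so the round trip does not depend on the target. */
int roundtrip_through_uintptr(int value)
{
    int object = value;
    uintptr_t bits = (uintptr_t)(void *)&object;  /* pointer -> integer */
    int *back = (int *)(void *)bits;              /* integer -> pointer */

    assert(back == &object);
    return *back;
}
```

On a typical x86_64 build int_is_pointer_sized() returns 0, and with -m32 it returns 1, which matches the reasoning for compiling the port in 32-bit mode.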

Taniwha 5y ago

My original comment was rather tongue in cheek, but I have actually ported this compiler (well, a later version of it, from the v6 release) to a 32-bit target. It was a different time, and C was a different, definitely more forgiving and simpler language. With other systems languages like BCPL, Bliss, etc. around, the whole 'int is the same as a pointer' idea was definitely a way of thinking about stuff at the time.

jart 5y ago

Why can't it be 64-bit? I don't see any reason why we can't have an ILP64 data model. If int and int* were both 64-bit then it would restore so much of the original beauty of C.

Someone 5y ago

It can be, and is, on platforms where supporting large arrays (if integers are 32 bits, arrays can ‘only’ have 2³¹ entries (#)) is deemed more important than memory usage.

(#) https://software.intel.com/content/www/us/en/develop/documen... seems to imply that limit is 2³¹-1. I don’t understand why that would be true.

dboreham 5y ago

It can be, but people have arranged for it to not be, presumably because they don't feel the storage cost of making all integers 8 bytes is justified.

nils-m-holm 5y ago

Yes, and on 16-bit and 32-bit systems, sizeof(int) == sizeof(int*). On 64-bit systems, this is most probably not the case. This is a common roadblock when porting old C programs.

29athrowaway 5y ago

If the first C compiler was written in C... how could it be the first C compiler? How could you compile the first C compiler?

dvt 5y ago

The first (or proto) C compiler was written in B†[1] (called NB, or New B). This is the first C compiler written in C.

† Or maybe some variant of BCPL -- I'm not exactly sure how functionally different the two were.

[1] https://www.bell-labs.com/usr/dmr/www/chist.html

onei 5y ago

The very first B compiler was written in BCPL by Ken Thompson. B later became self-hosting, i.e. the BCPL compiler compiled the B compiler, but this had another set of challenges due to the extreme memory constraints. It was an iterative process where a new feature was added such that it pushed the memory limit and then the compiler was rewritten to use the new feature to bring the memory usage down.

C was heavily inspired by B and, I suspect, written in B as well. Alternatively, BCPL was extremely portable as it compiled to OCode (what we'd recognise today as bytecode), so that might have been another option. The assignment operators of the form =+ are straight from B and were later changed to += due to Dennis Ritchie's personal taste.

aap_ 5y ago

The first B compiler was actually written in TMG and, once it was bootstrapped that way, in B itself. BCPL was only the inspiration for the language.

sramsay 5y ago

Wow, TMG was a new one for me. From the Wiki article on it:

"Douglas McIlroy ported TMG to an early version of Unix. According to Ken Thompson, McIlroy wrote TMG in TMG on a piece of paper and "decided to give his piece of paper his piece of paper," hand-compiling assembly language that he entered and assembled on Thompson's Unix system running on PDP-7."

We are not worthy, friends. We are not worthy.

haecceity 5y ago

How was gcc bootstrapped?

Koshkin 5y ago

Wikipedia has a relevant tidbit:

https://en.wikipedia.org/wiki/GNU_Compiler_Collection#Histor...

Nicksil 5y ago

Bootstrapping

https://en.wikipedia.org/wiki/Bootstrapping_(compilers)

Blikkentrekker 5y ago

If it could bootstrap itself, then there would be no need to port it to GCC.

From how I read it, it is not capable of bootstrapping itself, and an earlier C compiler in BCPL existed, this is the first C compiler written in C itself.

dmitrygr 5y ago

this port is [optionally] a cross compiler - it will run on x86/arm/whatever and produce pdp11 assembly

on an actual pdp11 it CAN bootstrap

IncRnd 5y ago

It was bootstrapped. For compilers, it is sort of "a thing" to finally bootstrap your new language compiler in your new language.

ForOldHack 5y ago

The question of if the first C compiler was written in C, how could it be the first C Compiler?

Because to be the first, it has to be bootstrapped in an intermediate host language… You have to get a parser running, then the syntax, then the etc… etc…

( immense plug of the Ahl book here…)

To be the first compiler in a language, as was pointed out long before I was born, the compiler has to compile itself, so before it could compile itself, it had to have other language processing programs creating the parsing, the syntax, the etc…

Porting it to GCC just means that they could compile it with GCC. The big test is to get it to compile itself on whatever platform is the target platform, because finally, if it cannot generate object code/machine language in the target machine’s binary, then it's not really ported.

Later on, UNIX came with tools to build compilers with, YACC and LEX.

If they got it to produce PDP-7 code, it's not really much of a port, really.

29athrowaway 5y ago

It was a rhetorical question.

It wasn't the first C compiler, it was the first self-hosted C compiler, which is different.

pabs3 5y ago

https://bootstrappable.org/

johan_felisaz 5y ago

Probably off topic, but are there real examples of compiler attacks due to bootstrapping? I had not heard of them before reading the sci-fi classic Accelerando by C. Stross.

goldsteinq 5y ago

Oh, the thing with bootstrapping attacks is that you never know whether there are any real examples.

I recommend this[0] paper by Ken Thompson dated 1984 and still relevant.

[0]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...

polska 5y ago

I do not understand why we should create a C compiler ported to GCC.

vkoskiv 5y ago

Not a C compiler. This is the original C compiler from ~1972. It's just an experiment to bring a bit of computing history to life.

polska 5y ago

Oh! At first I did not understand, but now it looks reasonable. Thanks!
