C2 Lang design (2014) [pdf]

(c2lang.org)

41 points | asaka | 10y ago | 21 comments

adrusi 10y ago

I understand why it's attractive to integrate more and more into the language proper, but I don't like it. Almost all tools which still embrace the unix way of doing things are written in C, and it's not just because that community is conservative. C feels like unix, because it's one of the only languages that abides by that philosophy. C-the-language is only part of the experience of programming in C. Programming in C includes DSLs and code generation and makefiles. Even what we consider C-the-language can be decomposed into preprocessor code and actual C.

That said, C needs to be improved, but I think it should be done via the build system, adding layers on top of C, in a tasteful, thought-out way. Some ideas below:

If we're expanding on C by adding build system complexity, Makefiles need to be improved. Make is a great tool, but it's unityped, like shell scripts. And it's essentially a preprocessor over shell, which leads to a mess of sigils. Maybe redesigning make as an embedded language in some lisp would do it. Additionally we could unify the notion of linking to a library and importing code into the makefile to allow dependencies to specify build steps. This could make some of the added build complexity implicit.

Namespaces could probably be implemented as a preprocessor, taking module declarations and import statements and converting all identifiers into their prefix-qualified equivalents, emitting warnings when there would be a collision (like if module "foo" declared "bar_baz" and module "foo_bar" declared "baz").
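The collision case is easy to reproduce. Below is a minimal sketch, assuming the preprocessor simply joins module and identifier with an underscore; the helper names `qualify` and `collides` are hypothetical and only illustrate the check such a tool would need before emitting its warning.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mangling rule: module "foo" + identifier "bar" -> "foo_bar".
   Returns 1 on success, 0 if the result did not fit in out. */
int qualify(const char *module, const char *ident, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s_%s", module, ident);
    return n >= 0 && (size_t)n < outsz;
}

/* Returns 1 when two (module, identifier) pairs mangle to the same C name. */
int collides(const char *m1, const char *i1, const char *m2, const char *i2)
{
    char a[128], b[128];

    if (!qualify(m1, i1, a, sizeof a) || !qualify(m2, i2, b, sizeof b))
        return 0;
    return strcmp(a, b) == 0;
}
```

With this rule, `collides("foo", "bar_baz", "foo_bar", "baz")` is true because both pairs become `foo_bar_baz`. A separator that cannot appear in ordinary identifiers would avoid the ambiguity, at the cost of less readable symbol names.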

Rust-style syntax-case macros can be implemented as a preprocessor.

Go-style defer statements could be implemented in a preprocessor, and avoid the somewhat verbose goto-style error handling.

As I said, this approach requires a lot of care to avoid adding a huge amount of complexity to the language.
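For reference, the goto-style error handling mentioned above usually looks something like the sketch below. The function `read_prefix` is a made-up example, not something from the C2 document; the point is only that each cleanup step sits at the bottom of the function, far from the resource it releases, which is what a defer statement would tidy up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads up to cap-1 bytes of the file at path into a fresh, NUL-terminated
   buffer. Returns the buffer (caller frees it) or NULL on any failure. */
char *read_prefix(const char *path, size_t cap)
{
    char *buf = NULL;
    FILE *f = NULL;
    size_t n;

    if (cap == 0)
        return NULL;

    f = fopen(path, "rb");
    if (f == NULL)
        goto out;

    buf = malloc(cap);
    if (buf == NULL)
        goto out_close;

    n = fread(buf, 1, cap - 1, f);
    if (ferror(f)) {
        free(buf);
        buf = NULL;
        goto out_close;
    }
    buf[n] = '\0';

out_close:
    fclose(f);   /* with a defer statement this would sit next to fopen */
out:
    return buf;
}
```

A defer-aware preprocessor would presumably rewrite something like `defer fclose(f);` placed after the `fopen` call into exactly this label-and-goto shape.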

xamuel 10y ago

Thoughts as I read:

1. "uninitialized var usage is error": unfortunately impossible without at least one of the following compromises: Automatically initialize variables (wastes CPU); False alarms (see Java); Built-in formal proof system; or, Require compilers to solve the halting problem.

2. Removed keyword "static": kills one of my favorite tricks, "self-init'ing functions".

3. New keyword "as": A good invention in Pythonland. Good call to bring this in.

4. New keyword "nil": Redundant with NULL?

5. Example - Base Types: Uses uint8 in place of char. This obscures intent and makes code less readable. Compare: int library_fnc(char asterisk errmsg) versus int library_fnc(uint8 asterisk errmsg). (HN wants to turn my asterisks into italics...) In the former it's clear errmsg is a string, in the latter it's not clear (it could be a pointer to a flag).

6. Example - function types. Doesn't one usually typedef the function pointer, rather than the function itself? So making that require two lines is annoying. Aside from that, the author is right that C has confusing function pointer typedef syntax.

7. Multi-part array initialization: Encourages unmaintainable code. Depending on what's in those "..."'s, might require compiler to solve halting problem?

8. Multi-pass parsing: Trades maintainability for instant gratification.

9. Symbol accessibility: The author makes "public" (and implicit "private") modify entire structs rather than individual fields...

10. Multi-file module: May lead to unmaintainable code

11. I'm worried about the language arbitrarily defining things like "the results of building are stored in the 'output' directory". OTOH the recipe.txt idea could help standardize what amounts to a lot of ad hoc Makefile programming.

12. Build process difference: Theoretically could speed up compilation. I'm worried for social reasons. In module-based languages, we tend to fall into module hell: one symptom being the infamous 20-page stacktrace (see: Java, Clojure, etc.) The nature of C's #include incentivizes shallow dependency trees (a very good thing).

13. "Language scope": trades portability for convenience

14. Tooling: This shouldn't be part of the language, it should be separate.
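For readers unfamiliar with the trick in point 2, "self-init'ing functions" presumably refers to something like the sketch below, where a function-local `static` flag lets the function set up its own state on first call. The names `lookup_square` and `init_count` are illustrative assumptions, and `init_count` exists only to make the one-time initialization observable.

```c
#include <stddef.h>

/* Demonstration only: counts how often the table was built. */
int init_count = 0;

/* Returns i*i for i in 0..15 from a lazily built table, or -1 if i is out of range. */
int lookup_square(int i)
{
    static int table[16];   /* persists across calls, invisible outside the function */
    static int ready = 0;   /* zero until the first call has built the table */

    if (!ready) {
        for (int k = 0; k < 16; k++)
            table[k] = k * k;
        ready = 1;
        init_count++;
    }
    if (i < 0 || i >= 16)
        return -1;
    return table[i];
}
```

Callers never need a separate init call, which is the appeal. Note that this form is not thread-safe without extra synchronization.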

xyzzyz 10y ago

You can always typedef it if you want it. The point is that naming a basic numeric type "char" is not only confusing, but also wrong in the world that's no longer all ASCII.

cremno 10y ago

>2. Removed keyword "static": kills one of my favorite tricks, "self-init'ing functions".

The removal of that keyword with several different meanings doesn't mean there isn't/won't be a replacement:

http://c2lang.org/site/language/variables/#local-keyword

>4. New keyword "nil": Redundant with NULL?

https://groups.google.com/d/msg/comp.std.c/fh4xKnWOQuo/IAaOe...

>5. Example - Base Types: Uses uint8 in place of char. This obscures intent and makes code less readable.

http://c2lang.org/site/language/basic_types/

C2 apparently still has char; however, it doesn't seem to be as weird as C's (a distinct type that is either signed or unsigned). It is simply int8.

jeffreyrogers 10y ago

I agree with a lot of your thoughts. Here are some areas where I disagree:

> 1. "uninitialized var usage is error": unfortunately impossible without at least one of the following compromises: Automatically initialize variables (wastes CPU); False alarms (see Java); Built-in formal proof system; or, Require compilers to solve the halting problem.

I don't think this is true. It should be pretty easy to detect if a variable is initialized or not. I can potentially see how a false alarm would arise, but I don't think that matters in practice. (All the situations I'm imagining involve writing bad code)
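One shape such a false alarm can take is sketched below; `pick` is a made-up function. The variable is written and read under the same condition, so it is never read uninitialized, but a checker that does not correlate the two `if` tests (Java's definite-assignment rule is one example) would still reject it.

```c
/* Returns value*2 when have_value is nonzero, otherwise 0. */
int pick(int have_value, int value)
{
    int x;                  /* deliberately left uninitialized */

    if (have_value)
        x = value * 2;

    /* ... unrelated work could sit here ... */

    if (have_value)
        return x;           /* x is always assigned on this path */
    return 0;
}
```

Whether this counts as "bad code" is the judgment call being debated: initializing `x` at its declaration silences the checker at the cost of one store.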

> 10. Multi-file module: May lead to unmaintainable code

Go does this already and it is fine.

> 12. Build process difference: Theoretically could speed up compilation. I'm worried for social reasons. In module-based languages, we tend to fall into module hell: one symptom being the infamous 20-page stacktrace (see: Java, Clojure, etc.) The nature of C's #include incentivizes shallow dependency trees (a very good thing).

I can see why this is a concern, but I think it is more a problem with JVM languages because of the type of programming Java encourages.

> 14. Tooling: This shouldn't be part of the language, it should be separate.

I thought this way too. Then I used Go and realized the huge benefit tooling integrated into the language provides. (Go has other problems, but tooling is not one of them).

eli_gottlieb 10y ago

>I don't think this is true. It should be pretty easy to detect if a variable is initialized or not. I can potentially see how a false alarm would arise, but I don't think that matters in practice. (All the situations I'm imagining involve writing bad code)

It's easy to trace the code paths between a variable's declaration and its usage, as long as those don't involve procedure calls. Once they do, the problem becomes "static-analysis complete".

JadeNB 10y ago

Isn't this just explaining why such usage can't be made a static error? It seems to me that raising runtime errors would avoid each of these compromises (but maybe that's clearly not what was meant).

anon4 10y ago

1. Works perfectly in Java. Note that in Java, fields are initialised to known empty values, while using a local var that hasn't been explicitly initialised is an error. If the compiler can't prove your code is correct, then it's not obviously correct, which means I have to sit down and carefully think about if it's correct or not. Just write simple code.

Though I concede that pointers make things very hard. Let's say you have

    void foo(int*);
    void bar() {
        int x;
        foo(&x); /* Is this an error? */
    }

You can't tell if x will be initialised or is expected to be initialised. You don't even know if foo will read one int, or expects an array of ints. I would have really preferred it if they did something on that front. Maybe have to declare foo(int[n] a), which you then call as foo([1]&x). There was a paper on extending C with exactly this - though their syntax was foo(size_t n, int a[n]), but I haven't been able to find it. One big plus is that you could, when compiling in debug mode, insert checks for every access. In general, I really want the successor to C to disambiguate between pointers and dynamically-allocated arrays.

4. Same reason C++ added nullptr (and I really wish they'd use the same keyword) - if you have a function int foo(int), foo(NULL) compiles fine, but foo(nullptr) doesn't. This matters especially in C++, where converting a void ptr to another pointer type requires a cast, so NULL can't be ((void ptr)0); it must be just 0. Once you disallow casting a void ptr to any other ptr, you run into this problem.

5. This is more a problem of C's strings being naked pointers to chars. Every serious piece of code I've seen uses some sort of string wrapper struct. Though they might want to alias char to uint8, for this.

6. Function types should really be pointers, I agree.

7. yeah...

8. Practically every other language does it. It's kind of ludicrous to want every function declared before its usage, when the compiler can just collect all declarations, then see where they're used.

9. I think that's about the right level of granularity. Seems like you'd use struct embedding to separate the public and private fields, e.g.

    type Foo_priv { private fields }
    type Foo { public fields; Foo_priv priv; }

The fact you can't create a Foo from outside the module seems like a bonus to me.

10. I'd be in favour of enforcing Java's "project directory structure must mirror module structure", i.e. module a.b.c's files are all in src/a/b/c. I've heard a lot of C# devs lament that the files for a certain package are strewn everywhere.

11. Kind of a sore point, but yeah. I think emulating Java would be the right thing again - give the compiler a list of dirs to check for compiled modules when doing module resolution, and say where the output directory is.

12. yeah

13. Sounds like #pragma. You can't really escape the fact that when you have N compilers for your language, every one will support its own extensions.

14. I think his point is that the language makes these tools easy to write.
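On point 1, the `foo(size_t n, int a[n])` form is already legal C99 syntax (a variably modified parameter), so the idea can be sketched in plain C. The functions `fill` and `sum` below are illustrative. Note the assumption being made: standard C treats `a[n]` here as documentation only and adjusts it to a plain pointer, so nothing is checked at compile time or run time.

```c
#include <stddef.h>

/* Writes value into the n ints starting at a. The [n] states the expected
   extent but is not enforced by the language. */
void fill(size_t n, int a[n], int value)
{
    for (size_t i = 0; i < n; i++)
        a[i] = value;
}

/* Sums the n ints starting at a. */
int sum(size_t n, const int a[n])
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

A single `int x` would be passed as `fill(1, &x, 7)`, which at least makes the "one int or an array?" question explicit at the call site. The debug-mode bounds checks suggested above would have to come from the compiler or a language extension.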

_kst_ 10y ago

> This is more a problem of C's strings being naked pointers to chars.

A C string is a sequence of characters, not a pointer. C strings are manipulated using `char*` pointers.
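The distinction shows up directly in code. In this small sketch (all names are illustrative), the array is the string, and the pointer is just a way of referring to part of it.

```c
#include <stddef.h>
#include <string.h>

/* The string itself: an array of 6 chars, 'h' 'e' 'l' 'l' 'o' '\0'. */
char greeting[] = "hello";

/* Size of the array holding the string, terminator included. */
size_t greeting_storage(void)
{
    return sizeof greeting;
}

/* A char* is only a handle into that sequence; this one points at "ello". */
const char *greeting_tail(void)
{
    return greeting + 1;
}

/* Length as seen through the pointer: chars before the terminator. */
size_t tail_length(void)
{
    return strlen(greeting_tail());
}
```

Here `greeting_storage()` is 6 and `tail_length()` is 4, while `sizeof(const char *)` depends on the platform and says nothing about the string.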

bluejekyll 10y ago

Is there an advantage to this over say a more modern and safe language like Rust? It seems to be just reducing the complexity of the language, but doesn't look like it will reduce memory related bugs.

rwmj 10y ago

Since you have to rewrite everything, you might as well switch to another language (we switched to OCaml). If there was an incremental path or a safe subset of C or something like that, that would be more interesting.

tempodox 10y ago

My thoughts exactly. If nothing bad happens, I think Rust will be a worthy successor of C in low-level land. If I need a higher abstraction level, I use either Lisp if I want dynamic typing, or OCaml if I want static typing. Together, those three cover a vast spectrum.

addendum: That doesn't mean C2 is not a worthwhile experiment. I might even try it out when a compiler is available.

steveklabnik 10y ago

We have pretty good C interop, so you don't have to re-write everything, you can do it in chunks. Firefox isn't suddenly going to be reimplemented in Rust, for example, it will be library by library, bit by bit.

Of course, you can do that with some other languages too, but our lack of runtime and no-overhead makes it significantly better, in my opinion.

qznc 10y ago

You can use D, which shares a lot of syntax with C, although you cannot directly reuse C code because there is no preprocessor in D. Some people use D as a "C development compiler".

The incremental path is the official C standards. C will probably gain modules for example.

jeffreyrogers 10y ago

This is cool. I've often toyed with a similar idea of creating a language that improves/fixes the things C messed up. If you aren't worried about safety (memory bugs can largely be avoided by changing how you do memory allocation, i.e. switch from individual mallocing to region based memory management) then C is actually a pretty nice language since it is simple enough to hold the entire language in your head. Plus it's nice to know how things are actually laid out in memory. The problems C2 solves are really the main things that frustrate me about C: header files, lack of a build system, no modules, spiraling type signatures.
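As a rough illustration of the region-based approach mentioned above, here is a minimal bump-allocator sketch. The `Arena` type and its functions are assumptions made for this example, not part of C or C2, and it relies on C11 for `_Alignof` and `max_align_t`.

```c
#include <stddef.h>
#include <stdlib.h>

/* A fixed-size region: allocations bump `used`; everything is freed at once. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} Arena;

/* Reserves cap bytes for the region. Returns 1 on success, 0 on failure. */
int arena_init(Arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hands out size bytes aligned for any object type, or NULL if the region is full. */
void *arena_alloc(Arena *a, size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;

    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Releases every allocation made from the region in one call. */
void arena_free_all(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

Because individual objects are never freed, there is no per-object `free` to forget or to call twice; the lifetime question collapses to "when does the region die?".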

yoklov 10y ago

Looks extremely interesting. I'm skeptical of some of its claims (faster compilation when incremental compilation is removed sounds unlikely, for example), but nothing that worrisome.

Anybody have any experience? Is it still basically a toy language?

aidenn0 10y ago

The main site makes it look like the language is about a year old.

I have no doubt that they can make the parsing stage faster, as they won't be parsing the same headers over and over again. This is often a bottleneck in C++, but much less often in C. (I've seen 10+MB C++ files after preprocessing).

buserror 10y ago

If people used precompiled headers (which have been available for about 20 years) that problem wouldn't be there... However I'm not sure that the parsing is such a bottleneck these days; I think all the complex constructs are a lot more of a time sink (templates etc). That and linking! Takes an hour to link the webkit library with the regular linker :-)

felixangell 10y ago

Very similar to Ark: www.github.com/ark-lang/ark. Except Ark has no GC, has tagged enums, ownership is enforced, and a few other smaller differences.
