Is libgccjit not “a nice library to give access to its internals”?
The MIDI protocol is pretty good for what it is designed for, and you can make it work for actual real networking, but the connections will be clunky, unergonomic, and will be missing useful features that you really want in a networking protocol.
I'd be very interested if the author could provide a post with a more in-depth view of the passes, as suggested!
Yes, please!
It seems that the terminology has evolved, as we now speak more broadly of frontends and backends.
So, I'm wondering: are Bison and Flex (or equivalent tools) still in use by modern compilers? Or are the lexers and parsers built directly into GCC, LLVM, ...?
There was some research on parsing C++ with GLR but I don't think it ever made it into production compilers.
Other, more sane languages with unambiguous grammars may still choose to hand-write their parsers for all the reasons mentioned in the sibling comments. However, I would note that, even when using a parsing library, almost every compiler in existence will use its own AST, and not reuse the parse tree generated by the parser library. That's something you would only ever do in a compiler class.
Also I wouldn't say that frontend/backend is an evolution of previous terminology; it's just that parsing is not considered an "interesting" problem by most of the community, so the focus has moved elsewhere (to everything from AST design through optimization and code generation).
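To make the "own AST" point above concrete: the parse tree a library hands you records every token and grouping, while the tree the rest of the compiler wants is usually a much smaller structure you define yourself. A minimal Rust sketch (names purely illustrative, not from any real compiler):

```rust
// A hand-designed AST for a tiny expression language. A parser library's
// concrete parse tree (every token, parenthesis, and whitespace node) would
// be lowered into this; the rest of the compiler never sees the parse tree.

#[derive(Debug)]
enum Expr {
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Number(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // (1 + 2) * 3; note the AST keeps no trace of the parentheses.
    let e = Expr::Mul(
        Box::new(Expr::Add(
            Box::new(Expr::Number(1)),
            Box::new(Expr::Number(2)),
        )),
        Box::new(Expr::Number(3)),
    );
    println!("{}", eval(&e));
}
```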
Personally I love the (Rust) combo of logos for lexing, chumsky for parsing, and ariadne for error reporting. Chumsky has options for error recovery and good performance, and ariadne is gorgeous (there is another Rust alternative, miette; both are good).
The only thing chumsky is lacking is incremental parsing. There is a chumsky-inspired library for incremental parsing called incpa, though.
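For anyone curious what that looks like in practice, here is a minimal chumsky sketch (written against the 0.9-era API, which differs from 1.0; in a real setup logos would supply the token stream and each error would be handed to ariadne or miette for rendering):

```rust
use chumsky::prelude::*;

// Toy parser for comma-separated integers, e.g. "1, 2, 3".
// Sketch only: chumsky 0.9-era API, parsing over chars; with logos you
// would parse over its tokens instead.
fn parser() -> impl Parser<char, Vec<i64>, Error = Simple<char>> {
    let int = text::int(10)
        .map(|s: String| s.parse::<i64>().unwrap())
        .padded();
    int.separated_by(just(',').padded()).then_ignore(end())
}

fn main() {
    match parser().parse("1, 2, 3") {
        Ok(nums) => println!("{nums:?}"),
        Err(errors) => {
            // In a real compiler each error would be fed to ariadne (or
            // miette) to render a labelled, source-highlighted diagnostic.
            for e in errors {
                eprintln!("parse error at {:?}: {}", e.span(), e);
            }
        }
    }
}
```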
In typical modern compilers "frontend" is basically everything involving analyzing the source language and producing a compiler-internal IR, so lexing, parsing, semantic analysis and type checking, etc. And "backend" means everything involving producing machine code from the IR, so optimization and instruction selection.
In the context of Rust, rustc is the frontend (and it is already a very big and complicated Rust program, much more complicated than just a Rust lexer/parser would be), and then LLVM (typically bundled with rustc though some distros package them separately) is the backend (and is another very big and complicated C++ program).
AFAIK the reason is solely error messages: the customization available with handwritten parsers is just way better for the user.
The hard part about compiling Rust is not really parsing; it's the type system, including borrow checking, generics, trait solving (which is Turing-complete in itself), name resolution, and drop checking, and of course all of these features interact in fun and often surprising ways. Also macros. Also all the "magic" types in the StdLib that require special compiler support.
This is why e.g. `rustc` has several different intermediate representations. You no longer have "the" AST: you have token trees, HIR, THIR, and MIR, and then that's lowered to LLVM or Cranelift or libgccjit. Important parts of the type system are handled at each of these stages.
In particular, it makes parsing look like the huge, difficult part of the problem. This is my main problem with the Dragon Book.
In practice everyone uses hacky informal recursive-descent parsers because they're the only way to get good error messages.
Most roll their own for three reasons: performance, context, and error handling. Bison/Menhir et al. make it easy to write a grammar and get started, but in exchange you get less flexibility overall. It becomes difficult to handle context-sensitive parts, do error recovery, and give the user meaningful errors that describe exactly what's wrong. Usually if there's a small syntax error we want to try to tell the user how to fix it instead of just producing "Syntax error", and that requires being able to fix the input and keep parsing.
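To make that concrete, here is a toy hand-rolled recursive-descent parser (illustrative only, not from any real compiler). The point is how natural it is to emit a specific, contextual message and keep parsing instead of stopping at the first "Syntax error":

```rust
// Toy hand-written recursive-descent parser for expressions like "1 + (2 * 3)".
// Illustrative only; the interesting part is the error reporting and recovery.

#[derive(Debug, Clone, PartialEq)]
enum Token { Num(i64), Plus, Star, LParen, RParen }

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
    errors: Vec<String>,
}

impl Parser {
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    fn bump(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    // expr := term ('+' term)*
    fn expr(&mut self) -> i64 {
        let mut v = self.term();
        while self.peek() == Some(&Token::Plus) {
            self.bump();
            v += self.term();
        }
        v
    }

    // term := atom ('*' atom)*
    fn term(&mut self) -> i64 {
        let mut v = self.atom();
        while self.peek() == Some(&Token::Star) {
            self.bump();
            v *= self.atom();
        }
        v
    }

    fn atom(&mut self) -> i64 {
        match self.bump() {
            Some(Token::Num(n)) => n,
            Some(Token::LParen) => {
                let v = self.expr();
                if self.peek() == Some(&Token::RParen) {
                    self.bump();
                } else {
                    // Context-aware message plus recovery: report exactly what
                    // is missing, pretend the ')' was there, and keep parsing.
                    self.errors.push(format!(
                        "missing ')' to close the '(' (at token {})", self.pos
                    ));
                }
                v
            }
            other => {
                let found = match other {
                    Some(t) => format!("{t:?}"),
                    None => "end of input".to_string(),
                };
                self.errors.push(format!("expected a number or '(', found {found}"));
                0 // recover with a dummy value so parsing can continue
            }
        }
    }
}

fn main() {
    // "(1 + 2 * 3" with a missing closing paren.
    let tokens = vec![
        Token::LParen, Token::Num(1), Token::Plus,
        Token::Num(2), Token::Star, Token::Num(3),
    ];
    let mut p = Parser { tokens, pos: 0, errors: Vec::new() };
    let value = p.expr();
    for e in &p.errors {
        eprintln!("error: {e}");
    }
    println!("value = {value}");
}
```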
Menhir has a new mode where the parser is driven externally; this allows your code to drive the entire thing, which requires a lot more machinery than fire-and-forget but also affords you more flexibility.
The rest of the f*cking owl is the interesting part.
GCC's approach is deliberate; plus, even if they wanted to change it, who would take on the effort of making the existing C, C++, Objective-C, Objective-C++, Fortran, Modula-2, Algol 68, Ada, D, and Go frontends adopt the new architecture?
Even clang, with all the LLVM modularization, is going to take a couple of years to move from plain LLVM IR to an MLIR dialect for C-based languages: https://github.com/llvm/clangir
The idea is that you should link the front and back ends, to prevent out-of-process GPL runarounds. But because of that, the mingling of the front and back ends ended up winning out over attempts to stay modular.
I am not familiar enough with gcc to know how it impacts out-of-tree free projects or internal development.
The decision was taken a long time ago; it may be worth revisiting.
That said, if Rust is going to continue entrenching itself in widely used open source software, it should at least be able to be compiled by the mainline GPL compiler used by the open source community. Permissive licenses are useful and appreciated in some contexts, but the GPL'd character of the Linux stack's core is worth fighting to hold onto.
It's not Rust in open source I have a problem with; it is Rust being added to existing software that I use, where I don't want it. A piece of open source software written in Rust is, from my perspective, equivalent to proprietary software. I'll use it, but I will always prefer software I can control/edit/hack on for the key portions of my stack.
The language itself I find wonderful, and I suspect that it will get significantly better. Being GPL-hostile, centralized without proper namespacing, and having a Microsoft dependency through Github registration is aggravating. When it all goes bad, all the people silencing everyone complaining about it will play dumb.
If there's anything I would want rewritten in something like Rust, it would be an OS kernel.
Never attribute to malice that which can be adequately explained by apathy. We have, unfortunately, reached a point where most people writing new software default to permissive and don't sufficiently care about copyleft. I wish we hadn't, but we have. This is not unique to Rust.
Ironically, we're better off when existing projects migrate to Rust, because they'll keep their licenses, while rewrites do what most new software does, and default to permissive.
Personally, I'm happy every time I see a new crate using the GPL.
> GPL-hostile
Rust is not GPL-hostile. LLVM was the available tool that spawned a renaissance of new languages; GCC wasn't. The compiler uses a permissive license; I personally wish it were GPL, but it isn't. But there's nothing at all wrong with writing GPLed software in Rust, and people do.
> having a Microsoft dependency through Github registration is aggravating
This one bugs a lot of us, and it is being worked on.
Not sure if it is particularly hostile. There are several GPL crates like Slint.
> Microsoft dependency through Github registration is aggravating
This one is concerning.
Many forget that Microsoft went from "FOSS is bad" to now having their fingers in many key FOSS projects.
They are naturally not the only ones; a developer has got to eat, and big tech gladly pays the bills when it fits their purposes.