But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.
Can anyone think of any reasonably common stacks using LLVM as a JIT? There's Mono, but that's a non-default mode; not sure if it's typically used. The Python unladen-swallow experiment failed. WebKit had a short-lived FTL JavaScript optimization pass, but that was replaced by B3.
Which is just a long-winded way to suggest that LLVM is not likely to be ideal as a JIT, at least based on what past projects have done.
(Not trying to imply that writing C to disk is better, but it may well be simpler & more flexible - not worthless qualities for an initial implementation).
The main place where LLVM bites you is compatibility. There simply is none. This is a constant drain on your resources and a lot of projects can't afford to keep up. There is even a project on LLVM's own home page which was on 3.4 for a long time and has just recently upgraded to 3.8 [2].
But if the alternative is shelling out to a C compiler? I'll take LLVM any day. The issue is not just the overhead of a call to an external program, it's all the extra complexity that comes along with that. It is very, very easy for this approach to break, especially when you consider the breadth of C compilers that exist, and all the possible ways they can be configured. In contrast, LLVM is "just" a library that you link to.
If anything, I'd bet plain C is much simpler because it hasn't changed much, and is very unlikely ever to do anything very surprising on any future platform - which cannot be said of raw LLVM.
And of course shelling out is a bit of a hassle, but hey; it's a well-trodden path on unix. It's not the fastest, greatest interop in the world, but it's good enough for a lot of things.
(and wow- terra sounds impressive!)
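For reference, the shell-out path being discussed can be sketched in a few lines of Python. This is only a minimal sketch, not any project's actual implementation: the helper names find_compiler and compile_and_load, the file names, and the compiler search order are assumptions made up for illustration, and the flags assume a Unix-like toolchain that accepts -shared and -fPIC.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile


def find_compiler():
    """Return the path of the first C compiler found on PATH, or None."""
    for name in ("cc", "gcc", "clang"):  # assumed search order, not a standard
        path = shutil.which(name)
        if path:
            return path
    return None


def compile_and_load(c_source, workdir):
    """Write c_source to disk, build a shared library from it, and load it."""
    compiler = find_compiler()
    if compiler is None:
        raise RuntimeError("no C compiler found on PATH")
    src = os.path.join(workdir, "generated.c")  # hypothetical file names
    lib = os.path.join(workdir, "generated.so")
    with open(src, "w") as handle:
        handle.write(c_source)
    result = subprocess.run(
        [compiler, "-O1", "-shared", "-fPIC", "-o", lib, src],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The user only ever sees the C compiler's view of the generated code.
        raise RuntimeError("compile failed:\n" + result.stderr)
    return ctypes.CDLL(lib)


if __name__ == "__main__":
    if find_compiler() is None:
        print("no C compiler available; skipping demo")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            module = compile_and_load(
                "int add(int a, int b) { return a + b; }\n", tmp
            )
            print(module.add(2, 3))
```

Even in this small sketch, the moving parts raised in the thread are visible: the compiler has to exist on the user's machine, the flags have to suit it, and any failure surfaces as raw compiler output about generated code.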
I'll just say that my views come mainly from experience, specifically ECL (Embeddable Common Lisp, a CL implementation) and (this was further back, so my memory is fuzzy) a tool for generating executables from Perl scripts. I don't think I'm using an especially unusual setup, or unusual compilers, and I would guess that these tools probably target a very narrow subset of C. Despite this, my experience with these sorts of tools has been anything but "works out of the box". On the contrary, there appear to be a great number of degrees of freedom, even with standard-ish setups, that can trip up these tools. Because of the additional layers of abstraction, the error messages you get are very poor. Some header file is missing or in an unexpected place, or worse some generated code fails to compile. As an end-user, it's basically impossible to debug these in a reasonable way.
You can certainly have internal errors using LLVM, but in my experience fewer of them are platform-dependent. Therefore there is a greater chance that something that works for the developer will work for the user. Also, if error handling is done properly, a failure that does occur can often be mapped back to the original source program. This is much better as far as usability goes, since the user almost never wants to debug some compiler's generated code.
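To make "mapped back to the original source program" concrete, here is a generic sketch of the idea. It is not how ECL, LLVM, or any other tool mentioned here does it: emit_c, translate_diagnostic, the diagnostic format, and the file names are all assumptions for illustration. The generator records which source line each generated line came from, and a diagnostic about the generated file is rewritten to point at the user's program.

```python
import re


def emit_c(statements):
    """Join (source_line, c_text) pairs into C code plus a line map.

    The returned map goes from 1-based generated line number to the
    source line that produced it.
    """
    lines = []
    line_map = {}
    for source_line, c_text in statements:
        for piece in c_text.splitlines():
            lines.append(piece)
            line_map[len(lines)] = source_line
    return "\n".join(lines) + "\n", line_map


# Assumed diagnostic shape: "file:line:col: message" or "file:line: message".
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?:\d+:)?\s*(?P<rest>.*)$"
)


def translate_diagnostic(message, line_map, source_name):
    """Rewrite a diagnostic about generated code to cite the original source."""
    match = DIAGNOSTIC.match(message)
    if match is None:
        return message  # not in the assumed format; pass it through untouched
    source_line = line_map.get(int(match.group("line")))
    if source_line is None:
        return message  # no known origin for that generated line
    return f"{source_name}:{source_line}: {match.group('rest')}"


if __name__ == "__main__":
    code, line_map = emit_c([(1, "int x = 1;"), (3, "int y =\n  x + ;")])
    print(translate_diagnostic(
        "generated.c:3:7: error: expected expression", line_map, "script.src"
    ))
```

When the output is C, the language's own #line directive is another way to get a similar effect, since it makes the compiler itself report the original file and line.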
Yea, it's annoying. For PostgreSQL I've decided to focus on the C API wherever possible for exactly that reason. A bit more painful to write, but not even remotely as quickly moving. Obviously there are parts where that's not possible - but even there I've decided to localize that as much as possible.
[1]: https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.pdf
We just added an LLVM-based JIT to PostgreSQL. Don't think we have quite the same issues as JITing generic interpreted languages though, because the planner gives us much more information about the likely cost of executing a query. So the need for a super-fast baseline JIT isn't as big.
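A rough sketch of what cost-based gating can look like, for readers who haven't seen it. The threshold names mirror PostgreSQL's jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost settings, and the defaults are the documented ones as far as I know, but the JitThresholds class, the choose_jit_steps function and the step names are hypothetical and are not PostgreSQL's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JitThresholds:
    # Names follow PostgreSQL's settings; values are assumed defaults.
    jit_above_cost: float = 100_000
    jit_inline_above_cost: float = 500_000
    jit_optimize_above_cost: float = 500_000


def choose_jit_steps(estimated_cost, thresholds=JitThresholds()):
    """Pick which JIT steps to run for a query with the given planner cost.

    Returns a set of hypothetical step names. A negative threshold
    disables the corresponding step.
    """
    def exceeds(limit):
        return limit >= 0 and estimated_cost > limit

    if not exceeds(thresholds.jit_above_cost):
        return set()  # cheap query: just interpret it
    steps = {"compile"}
    if exceeds(thresholds.jit_inline_above_cost):
        steps.add("inline")
    if exceeds(thresholds.jit_optimize_above_cost):
        steps.add("optimize")
    return steps


if __name__ == "__main__":
    for cost in (50_000, 200_000, 600_000):
        print(cost, sorted(choose_jit_steps(cost)))
```

The point is that the expensive work is only attempted when the estimated query cost suggests it will pay for itself, which is why a very fast baseline tier matters less in this setting.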
> But that doesn't mean you can't use a conventional compiler stack like LLVM as a JIT and get excellent code - it's just going to take its own sweet time doing so.
I think that's partially due to people using the expensive default pipeline when using optimization. A lot of those passes either don't make sense for the source language, or don't make sense for the first foreground JIT pass.
The biggest issue I have with LLVM wrt JITing is that its error handling isn't really good enough. It's fine to just fatal error if you're in an AOT compiler world, but that's much less acceptable inside a database. There are moves to make at least parts of LLVM exception safe, but ...
PostgreSQL - although I doubt that's the sort of thing you had in mind!
After LLVM 3.4 or so with the forcible move to “MCJIT” (now ORCJIT maybe?) it suddenly got even more painful though. While the Module system in LLVM was always abused by the JIT, it was a sad day for many of us who instead pinned to 3.4 for a while. I haven’t followed up in a while to see how the newer JITs have progressed, but I believe the last-layer JIT for Safari uses LLVM as well.
tl;dr: for the right time versus execution speed trade-off, LLVM is still awesome.
Since you have some experience - do you think shelling out would have been much more painful?
Shelling out (which I’ve also done) is okay, but you never get to really teach the backend what you know. That is, no matter how hard you try, you can’t teach gcc, icc, or clang that you know it’s safe to just fetch this function pointer off a struct and that it’s stable. Writing a simple pass in LLVM though is incredibly straightforward. You can even do a simple inliner, that knows how to inline just the runtime callsites you care about.
Like the WebKit folks and the HHVM folks before them: dynamic languages have enough complexity that you often get most of the win from a “basic compilation” (compared to say C/C++) so after you’ve proven out what you need, you roll your own.
Shelling out though would be strictly worse than the LLVM in-memory approach, since it gets you no additional benefit (in some ways it’s harder, since you can’t just say “jump to this address”), you lose a lot of upside (custom passes, letting you tune optimizations and instruction selection beyond simply -O0, -O1, etc.), and then you get to require users to have a compiler on their box.
I’d personally look at nanojit or the other JIT libraries before shelling out to a regular compiler.