In their "tiered mode", they put sampling instrumentation into the AOT-compiled native code, and if they detect a hotspot, they regenerate fully instrumented native code from the bytecode using the C1 (fast) JIT, which then lets the C2 JIT apply its full optimizations to the code as if AoT had never been involved.
Since the invention of tracing JITs, I've often wondered why languages don't package a compact serialized SSA form such as LLVM bitcode or SafeTSA together with functions stored as lists of pointers to space-optimized compilations of extended basic blocks (straight-line code), similar to how some Forth compilers generate threaded code. A threaded-code dispatcher over these straight-line segments of native code would have minimal overhead, and when a simple SIGPROF lightweight sampler detected a hotspot, a tracing version of the dispatcher could collect a trace and then generate native code from the visited traces using the stored SSA for the basic blocks.
In this way, they'd have a lightweight tracing JIT for re-optimizing native code.
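A toy sketch of the dispatch idea (my own illustration, not taken from any real system): model each precompiled straight-line segment as a callable unit and let the dispatcher simply walk the list. The loop below is roughly the per-segment overhead being described; a real system would thread through native code pointers rather than Java lambdas.

```java
public class ThreadedDemo {
    interface Segment { int run(int acc); }

    static int demo() {
        // Each entry stands in for a space-optimized compilation of one
        // straight-line extended basic block.
        Segment[] code = {
            acc -> acc + 2,   // segment 0
            acc -> acc * 3,   // segment 1
            acc -> acc - 1    // segment 2
        };
        int acc = 5;
        // The entire "threaded code" dispatcher: just step through segments.
        for (Segment s : code) acc = s.run(acc);
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // (5 + 2) * 3 - 1 = 20
    }
}
```

A tracing variant of this loop would additionally record which segments were visited, then hand that trace (plus the stored SSA) to a compiler.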
SDE didn't propose starting with SSA, but it could easily work with an SSA representation. SDE basically functions as a compression mechanism for a semantic IR, building a dictionary during compression/decompression in a way reminiscent of LZW. So instead of storing straight bytecode, you store a compact higher-level representation (which could very well be SSA) that is structured so that you can generate code while "decompressing" it, reusing generated code fragments as "templates" for later fragments.
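For the LZW analogy, here is a minimal sketch (my own toy, not SDE itself) of the key property: the dictionary is rebuilt in lockstep during decompression, which is what lets previously seen fragments be reused as templates for later ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwSketch {
    // Classic LZW compression over a string of 8-bit characters.
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        List<Integer> out = new ArrayList<>();
        String w = "";
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc; // keep extending the current dictionary match
            } else {
                out.add(dict.get(w));
                dict.put(wc, dict.size()); // new entry learned on the fly
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    public static void main(String[] args) {
        // Repeated substrings get replaced by dictionary references.
        System.out.println(compress("ABABABA"));
    }
}
```

In SDE the dictionary entries are program fragments rather than strings, so "decompression" can double as code generation.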
An implementation was built in Oberon: a compact tree representation (you could do a DAG with some adjustments) that mirrors your code-generation order, and it was used, e.g., to support PPC and M68k from the same "binaries" in MacOberon. The way it was structured makes retaining arbitrary higher-level structure of the programs very straightforward.
I keep wanting to do something with SDE, but life keeps intervening... I see it as a huge shame that more work didn't go into exploring that alternative to straight-up bytecode, but it simply had too little of a head start on Java, and I believe Franz moved to Java for his subsequent research on code generation.
[1] https://en.wikipedia.org/wiki/Semantic_dictionary_encoding
You may be interested to look further into Eclipse OMR, which is a generic VM used by IBM for many of their runtimes (including J9). The Testarossa JIT support landed last week, and although it doesn't support a bitcode form directly, there are optimisations that can be used to separate the static parts of a class from the dynamic parts, to facilitate loading. There is an IL for both the JIT and the interpreter to use.
I (and others) have noted that for more than a decade, it seems that Java would have been better off under IBM than under Sun/Oracle (SWT vs. Swing/AWT, jikes vs. javac, Jalapeno/JikesRVM vs. not much interesting research until Graal, etc.) It's really a shame IBM didn't buy up Sun's Java intellectual property at fire sale prices.
Commercial JDKs always offered AOT compilation, the problem is that people nowadays apparently don't buy compilers anymore unless forced to do so (e.g. embedded, consoles...).
Desktop Java also had many other problems, which can be summarised as "the JVM is its own OS". You can't write an application in Java that has a native look and feel. Or at least you couldn't for the first several significant years of its life and even now I don't think there's a good story for writing a simple native application. Meanwhile you could grab wxWidgets or Qt (and there goes your budget for a java compiler) and have a native-looking cross-platform application. Which very few did, because back then Mac OSX didn't exist, Apple were on their death bed and "Linux Desktop Environment" was even more of a joke than it is today.
So yeah, it didn't make any bit of sense to develop Java desktop apps given that you already had a large pool of proficient C++ developers, the only platform you cared about was Windows and Java GUI libraries insisted on reinventing their own look and feel. Oh and you could always just buy Delphi if you didn't want to suffer C++ (again, for a fraction of the price of a commercial Java compiler).
Nowadays people wrap a bunch of javascript in an electron instance, but this only happened after the web took off and nobody really looks at native desktop apps much. If this AOT work can give us fully contained native executables that we can distribute without having the user install Java and with significantly better performance than nodejs, maybe Java on the desktop can still happen.
Lest the title be changed:
AOT compilation is coming to Java 9 (java.net)
18 points by hittaruki 37 minutes ago https://www.youtube.com/watch?v=Xybzyv8qbOc
The project seems to have gone slower than I expected, perhaps because Chris Thalinger moved to Twitter. This implies it will be in Java 9 (in a limited fashion).
http://alblue.bandlem.com/2016/09/javaone-hotspot.html
The presentation wasn't recorded but there is a video recorded from a DocklandsLJC event which is on InfoQ:
https://www.infoq.com/presentations/hotspot-memory-data-stru...
That potentially includes the fully resolved types of objects (ie devirtualization), branch prediction (stronger than the CPU can do; for instance, if a value is only used inside a branch that's never taken, don't bother mutating it), data sizes (this "array" is only ever size 2, store it in registers), dead code elimination (keeps the compiled code small), and a whole bunch more fun stuff.
Stuff like this makes me nervous. Performance is already a complex topic, and stuff like this makes it even more complex. Unnecessarily so. If we were talking about a very high-level programming language (say, Prolog), you could argue that the expressiveness benefits outweigh the cost of the runtime system's complexity. But Java isn't even as expressive as C++, let alone Prolog.
> fully resolved types of objects (ie devirtualization)
C++ (and similar languages: D, Rust, etc.) and MLton (a Standard ML implementation) have been using monomorphization for ages, which is a compile-time analogue of devirtualization. Moreover, monomorphization has important advantages over devirtualization:
(0) It's completely predictable. You don't need to guess when it will happen. It happens iff the concrete type (and its relevant vtables, if necessary) can be determined at compile-time: https://blog.rust-lang.org/2015/05/11/traits.html
(1) It's always a sound optimization, so it doesn't have to be undone at runtime under any circumstances.
(2) It's relatively simple to implement. In fact, a compiler front-end can completely monomorphize a program before handing it over to the back-end for target code generation.
> if a value is only used inside a branch that's never taken, don't bother mutating it)
The best way to handle unreachable branches is to avoid creating them in the first place. With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.
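As a hedged illustration in modern Java (sealed types and switch patterns arrived years after this thread, so this is an anachronistic sketch of the same idea): a sealed hierarchy lets the compiler prove a switch exhaustive, so no unreachable default branch ever exists to be optimized away.

```java
// The sealed keyword closes the hierarchy: Circle and Square are the
// only possible Shapes, which the compiler can rely on.
sealed interface Shape permits Circle, Square {}
record Circle(double r) implements Shape {}
record Square(double s) implements Shape {}

public class ShapeDemo {
    static double area(Shape sh) {
        return switch (sh) {
            case Circle c -> Math.PI * c.r() * c.r();
            case Square q -> q.s() * q.s();
            // no default branch: the compiler knows the cases are exhaustive
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(3)));
    }
}
```

This is the static-analysis route to the same end the JIT reaches by profiling: dead branches never make it into the program at all.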
> data sizes (this "array" is only ever size 2, store it in registers)
C and similar languages natively handle statically sized arrays, so there's no need for runtime profiling and analysis just to determine that an array will always have size 2.
ML does something even better: you just use tuples (in this case, pairs), which reflect your intent much better than using arrays whose size has to be tested or guessed.
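Java's closest analogue today (a sketch using records, which arrived in Java 16, long after this thread): encode the size-2 structure in the type itself, so nothing has to be profiled or guessed at runtime.

```java
// A pair as a record: the "array is only ever size 2" fact lives in the
// type, where both the reader and the compiler can see it statically.
record Pair(int first, int second) {
    int sum() { return first + second; }
}

public class PairDemo {
    public static void main(String[] args) {
        Pair p = new Pair(1, 2);
        System.out.println(p.sum());
    }
}
```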
---
What I take away from this is that the JVM's supposedly “fancy” optimizations exist primarily to work around the Java language's lack of amenability to static analysis.
I always got the sense the world is waiting for a statically typed Python that compiles to native code with Go's CPU performance. I suppose Nim might fit that bill but a shame it doesn't have compatibility with Python's or even the extent of a language like Go's libraries. And if possible, an imperative language that interfaces with OTP.
And that said, I can see why Erlang/Elixir wouldn't make as much sense, or even work, with native AOT compilation due to its feature set (thinking of stuff like hot code reloading). But I've never grasped why Java or Python were better off with JITs or interpreters than AOT compilation. It seems a type system such as Go's is simple enough and allows for good gains in both CPU performance and memory usage. Add in that you don't need to install anything and there's less to think about when deploying, and it seems like a no-brainer. Please feel free to fill me in on this or tell me where I went wrong.
https://web.archive.org/web/20050420081440/http://java.sun.c...
When the appliance market didn't pan out, they went for web browsers and Java applets. Bytecodes were a feature because browsers didn't execute native code, and because they allowed for sandboxing to limit the attack surface.
Even when Java became more popular on the server than in the browser, the "write once, run everywhere" was considered a major feature: The same bytecode could be distributed everywhere; no need to maintain a heap of different build environments for different CPU architecture and OS combinations.
Abstracting the CPU has worked out pretty well for the Java platform. Look at how easy the 64-bit transition was for the Java world vs the C++ world. Visual Studio is still not a 64-bit app, and yet Java IDEs hardly even noticed the change. The transition on Linux was just a disaster zone; every distro came up with its own way of handling the incompatible flavours of each binary.
In addition, a simple JIT compiled instruction set makes on the fly code generation a lot easier in many cases and it's a common feature of Java frameworks. For instance the java.lang.reflect.Proxy feature is one I was using just the other day and it works by generating and loading bytecode at runtime. On the fly code generation is considered a black art for native apps and certainly extremely non portable, but is relatively commonplace and approachable in Java.
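For the curious, here is a minimal, self-contained example of java.lang.reflect.Proxy (a standard API; the Greeter interface is just an illustration). The JVM generates and loads a new class at runtime that implements the interface, routing every call to the handler.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface Greeter { String greet(String name); }

public class ProxyDemo {
    static Greeter make() {
        // The handler receives every method call on the generated proxy.
        InvocationHandler handler = (proxy, method, args) ->
            "Hello, " + args[0] + " (via " + method.getName() + ")";
        // newProxyInstance generates bytecode for a Greeter implementation
        // and loads it on the fly.
        return (Greeter) Proxy.newProxyInstance(
            Greeter.class.getClassLoader(),
            new Class<?>[] { Greeter.class },
            handler);
    }

    public static void main(String[] args) {
        System.out.println(make().greet("world"));
    }
}
```

Doing the equivalent in a native AOT-compiled app means generating machine code into executable memory by hand, which is why it's considered a black art there.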
So they could keep the WORA story and still offer AOT as an option, which actually most commercial JDKs do.
Just Sun was against providing it at all on Java SE, but they actually supported it on Java Embedded.
Talking about AOT compilation at Sun was taboo, and I remember seeing a few forum discussions where former employees disclosed this.
Plenty of other platforms do support bytecodes, JIT and AOT on the same toolchain.
Java is old. It's seen a lot of CPU architectures come and go over the years. When it started out x86, SPARC and POWER, were important. Then it saw a mass migration from x86 to amd64 on the desktop and server side, and an explosion in the importance of ARM in mobiles (several flavours).
Along the way it's seen lots of smaller proprietary architectures come and go too, like the exotic DSP-oriented processors found in BluRay players and pre-smartphone phones and like the Azul Vega architecture that was specifically designed for executing business Java.
And don't forget that even amd64 is not a homogeneous architecture. It adds new CPU instructions pretty regularly and thus can be seen as a long line of compatible but different CPU architectures. Java apps transparently get support for all of them on the fly, without having to recompile the world. You see the benefit when you realise the size of Maven Central ... there are JARs out there that are still useful and good even a decade after they were compiled, yet they still get optimised to full speed using the latest CPU instructions no matter what kind of computer you use.
What costs java performance these days is not the quality of the JIT compilers or even the garbage collectors. It's the object layout that is not very cache-friendly. There is lots of pointer-chasing going on since there are no arrays-of-structs.
Valhalla[0] promises to improve the data-layout issue at some point in the future, while Graal may allow compiler writers to cram some more optimizations into the JITs.
I'm not aiming this just at you, but I think many people (node.js users in particular come to mind) don't realise just how good the JVM is, performance-wise. I'm not a great fan of Java the language, but the JVM is top class.
The primary thing people seem to like about Go is that it produces single native binaries. You can do that with Java too (I gave an example of Avian further up the thread), but people don't tend to bother because distributing a single JAR is not much harder and avoids any assumptions about what OS the recipient might have. Go users seem invariably to be writing programs for their own use and Go doesn't really "do" shared libraries, so they don't ever encounter the problem of distributing a binary of the wrong flavour because they don't distribute binaries at all.
By the way, in Java 8 there's a tool that produces Mac, Linux and Windows standalone packages and installers that don't depend on any system JVM. I've used it to distribute software successfully, although I had to make my own online update system for it. In Java 9 it's being extended quite a bit with the new "jlink" tool that does something similar to static linking ... the output of jlink is either a directory that's a standalone JRE image optimised and stripped to have only the modules your app needs, or you can combine it with the other tool to get a MacOS DMG (with an icon, code signing etc), Windows MSI/EXE (ditto), or a Linux DEB/RPM/tarball.
This isn't a single file at runtime of course, it's a single directory, but basically any complex native app will have data files and some sort of package too so that's not a big deal.
Most commercial JDKs do support AOT compilation to native code, and alongside Java library and eco-system, it definitely makes it more than a solid competitor to Go.
The problem is that free AOT compilers were never much of a match for the ones in commercial JDKs, and in this day and age most developers don't pay for compilers unless forced to do so.
So Java AOT compilers are usually only used by enterprise companies.
For Go, .go -> native
For Java, .java -> .class -> package .jar -> AOT native
For Go part I might be wrong, not working on Go professionally.
That sort of makes no sense. How can you incur a real performance hit if the uncompiled method is rarely called?
Of course, I'm not an expert on JVMs, so I wouldn't know whether their analysis is synchronous or asynchronous or a mix of both.
From what I could gather, this is the process one would follow to get native code:
.java -> javac -> .class (still cross-platform bytecode) -> jaotc -> .so native code
My general impression is that the design of classloaders is pretty actively hostile to making JVM startup fast.
AOT and JIT are not mutually exclusive. From the proposal itself:
> AOT libraries can be compiled in two modes:
> Non-tiered AOT compiled code behaves similarly to statically compiled C++ code in that no profiling information is collected and no JIT recompilations will happen.
> Tiered AOT compiled code does collect profiling information. The profiling done is the same as the simple profiling done by C1 methods compiled at Tier 2. If AOT methods hit the AOT invocation thresholds these methods are being recompiled by C1 at Tier 3 first in order to gather full profiling information. This is required for C2 JIT recompilations to be able to produce optimal code and reach peak application performance.
The other thing it adds is the backing of a giant, like Oracle, which can bring stability and peace of mind to some people, when deciding whether to adopt the technology or not.
- can ship something in time
- and that it will be generally available for developers (looking at how hard Oracle pushes their Java department to invent commercial features they can sell, I'm not sure about that)
Looking at it, I assume that this will go the way of GWT ... not starting from "how can we make Java a good citizen in this new ecosystem?", but "here we have 100% of Java, the JDK and the JVM ... how can we compile this with full fidelity into X?".
Some other JVMs (at least Azul's Zing) try to solve this by caching profiling information to speed up code generation.
https://www.youtube.com/watch?v=Xybzyv8qbOc
Basically they thought it'd de-opt too much. I'm not totally sure that's the case, but they'd be the experts on that.