https://prog21.dadgum.com/47.html: “By the mid-1990s, Borland was citing build times of hundreds of thousands of lines of source per minute”
I know it’s a different language, but I don’t think Java is significantly harder to parse than Pascal; bytecode doesn’t have to be heavily optimised (it gets heavily morphed at runtime anyway), and computers are a lot faster than they were in the 1990s.
Also, recently (2020) https://news.ycombinator.com/item?id=24735366 said: “Delphi 2 can compile large .pas files at 1.2M lines per second.”
Or am I mistaken in the idea that Java isn’t hard to parse? If so, why is it hard to parse? Annotations? Larger programs with lots of dependencies?
Delphi did some things that made it unusually fast to parse, like being single-pass (which meant you could not arrange your code as you saw fit, since backreferences didn't work). Also, javac suffers from being JIT-compiled, so a lot of CPU is wasted each time it's invoked unless you use daemons, as Gradle does.
But also, the Delphi compiler was IIRC at least partly written in assembly not Delphi itself.
You could make javac much faster just by compiling it with GraalVM but then you'd lose the ability to load plugins like annotation processors. Delphi's compiler wasn't pluggable in any way (at that time?).
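As a rough sketch of what that might look like (assuming a GraalVM distribution with native-image installed; com.sun.tools.javac.Main is javac's actual entry point in the jdk.compiler module, but the exact native-image flags may vary by GraalVM version):

```shell
# Ahead-of-time compile javac itself into a native binary:
# no JVM startup cost, no JIT warm-up on every invocation.
native-image --module jdk.compiler/com.sun.tools.javac.Main -o fast-javac

# Use it like the regular javac. The trade-off mentioned above applies:
# annotation processors could no longer be loaded dynamically.
./fast-javac -d out src/main/java/Example.java
```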
The actual consequence is that you had to declare things at the beginning of the block. It handled forward declarations just fine. This had minimal impact on actually "arranging your code."
Would it be faster? To start up, sure, but I'd imagine the compiler to rate quite well on the scale of how much it benefits from the dynamic runtime optimizations that are only possible with jit compilation.
Right, this is an example of the Java speed being terrible, given that Borland was almost as good thirty years ago. 32,000 lines per second is roughly 100,000 lines every three seconds, or about 2 million lines per minute. Compare the statistic that isn't thirty years old:
> “Delphi 2 can compile large .pas files at 1.2M lines per second.”
> But it is nowhere near how fast the Java compiler can run.
And then it explains why that initial build was compiling only 32k lines per second.
I am not very familiar with those myself, simply aware of them.
[1] https://github.com/apache/maven-mvnd
[2] https://docs.gradle.org/current/userguide/gradle_daemon.html
For Gradle, if you turn off the Gradle daemon, it gets even slower than the numbers presented, going from 4+ seconds to 10+ seconds per compile:
lihaoyi mockito$ git diff
diff --git a/gradle.properties b/gradle.properties
index 377b887db..3336085e7 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -1,4 +1,4 @@
-org.gradle.daemon=true
+org.gradle.daemon=false
org.gradle.parallel=true
org.gradle.caching=true
org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8 \
lihaoyi mockito$ ./gradlew clean; time ./gradlew :classes --no-build-cache
10.446
10.230
10.268

Hi, nice article! I wholeheartedly agree with the conclusion after 10 years of fighting with Maven for performance gains (which I always measured in minutes, not seconds).
A slow feedback cycle is the root of all evil.
* compile only
* compile/test only
* compile/install jar only, skip source/javadoc packages
* checkstyle only
* static analysis only
* code coverage only
* Skip PGP (you DO check your artifact signatures, right?)
The beauty of this is you can create a corporate super pom that defines all of these for you and they can be inherited by every project in your org.
Finally, if you have a large multi-module project, run with -T2C to parallelize module builds. If your tests are written in a sane/safe manner, -DuseUnlimitedThreads=true -Dparallel=classes -DforkCount=2C will also give you a major boost.
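Putting these together, a typical "fast" invocation might look like the sketch below. The skip properties are the conventional ones for the standard plugins (javadoc, source, checkstyle, jacoco, gpg), but the exact property names depend on the plugin versions wired into your pom:

```shell
# Parallel module builds (2 threads per core); skip tests, javadoc/source jars,
# checkstyle, coverage, and PGP signing -- compile + install only:
mvn -T2C install -DskipTests \
    -Dmaven.javadoc.skip=true -Dmaven.source.skip=true \
    -Dcheckstyle.skip=true -Djacoco.skip=true -Dgpg.skip=true

# When you do run tests, parallelize them too (requires thread-safe tests):
mvn -T2C verify -DuseUnlimitedThreads=true -Dparallel=classes -DforkCount=2C
```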
We have a giant code base (500,000 SLOC) with 28 Maven modules, and a full build takes less than 40 seconds with compile/install only. You often don't need to do a full build, but even with tests it's about 3 minutes on an M3 Max.
I'm dating myself, but 20 years ago we had Ant. It was very much a blunt hammer, like CMake. Maven came along and said, "how about we choose a definitive way to organize projects and make that repeatable?" You lose flexibility but gain speed/reuse.
I see Gradle more of a replacement for Ant than Maven in this regard. Infinite flexibility, at the cost of speed/reuse.
Wait, I was under the impression that maven dependency plugin has a command for exactly that (dependency:build-classpath or something like that)?
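Yes, roughly; a sketch of how that could be combined with plain javac (mdep.outputFile is the dependency plugin's property for writing the resolved classpath to a file):

```shell
# Have Maven resolve the dependency classpath once and write it to a file:
mvn dependency:build-classpath -Dmdep.outputFile=cp.txt

# Then bypass Maven entirely and call javac directly:
javac -cp "$(cat cp.txt)" -d out $(find src/main/java -name '*.java')
```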
I'm kind of intrigued by Mill, but I've fallen into the same trap I've observed in others: I'm over-indexed, having already sunk a lot of mental capacity into learning Bazel and its equivalent systems.
The benefit of moving to another system has to be big enough to outweigh that sunk cost.
* java (on a cold JVM): 18,000-32,000 lines per second on a single core
* java (on a hot JVM): 102,000-115,000 lines per second on a single core
* golang: 28,000 lines per second on 12 cores
> From this study we can see the paradox: the Java compiler is blazing fast, while Java build tools are dreadfully slow. Something that should compile in a fraction of a second using a warm javac takes several seconds (15-16x longer) to compile using Maven or Gradle.
edit: typos
* Gradle compiling 100,000 lines of Java at ~5,600 lines/s
* Maven compiling 500,000 lines of Java at ~5,100 lines/s
* Mill compiling at ~25,000 lines/s on both the above whole-project benchmarks
Again, this is not to put Java down but to have a proper discussion about compiler speeds. I'm not interested in "your" tool being faster than "my" tool; I want to understand the compilation speeds of different programming languages and what impacts them. Java and Go have similar execution speeds and similarly simple type systems (no implicits etc.), so they should be similar. That's beside the obvious problem of comparing compilation speeds on two totally different CPUs and machines: do we compare compilers or machines?
I was just asking "how fast does Golang compile?" because I'm interested in compilation speeds and CPU usage across compilers (Rust, despite its "slowness", seems to have the best CPU utilization of the compilers I've checked over the years).
I've been using Java from 1996 on for two decades.
A side note: the article is hard to read; it's not clear how much IO there is. It seems to use LOC to mean "all lines", including empty lines and comments, rather than "lines of code" (most tools today mean NCLOC when they say LOC). I'm also not sure why they chose Netty for the test, with 500k lines, and then only used a subproject with 50k lines.
"Compiling 116,000 lines of Java per second is very fast. That means we should expect a million-line Java codebase to compile in about 9 seconds, on a single thread."
They get the score from compiling 50k lines without IO, it seems, and then extrapolate to 1M lines. Does that also fit into memory? No GC? And no IO? At least one would need to check whether a file has changed on disk, or would the OS do that with a cache check?
"compile your code, build tools add a frankly absurd amount of overhead ranging from ~4x for Mill to 15-16x for Maven and Gradle!"
IO? Check for changed files?
My anecdotal evidence suggests it is not as slow as many think. It compiles a 500k LoC project (that's the count of all code, dependencies included) in ~10s on my M2 Pro.
But it's getting better.
Tokenization [1] is the process of splitting text into components called tokens. It's usually the first stage of parsing.
[1]: https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization
https://en.m.wikipedia.org/wiki/Java_processor
The JVM is just there to run the code on other systems, and it is the most common way of running compiled Java code.