Whenever I see a speed boost for doing what is conceptually the same thing, I'm always curious where the fat was cut. What did we give up? You can dump the resulting assembly with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, and a diff might be revealing.
My hunch is that the line from the tutorial: `@CFunction(transition = Transition.NO_TRANSITION)` makes all the difference. Explanation of NO_TRANSITION from [0]:
No prologue and epilogue is emitted. The C code must not block and must not call back to Java. Also, long running C code delays safepoints (and therefore garbage collection) of other threads until the call returns.
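For context, here is a minimal sketch of what such a binding looks like with the GraalVM Native Image SDK. The `cos` binding and class name are illustrative; this is a declaration fragment that only compiles against the `org.graalvm.nativeimage` SDK and links when built with native-image, so it can't run standalone:

```java
import org.graalvm.nativeimage.c.function.CFunction;
import org.graalvm.nativeimage.c.function.CFunction.Transition;

public class MathBindings {
    // NO_TRANSITION skips the Java-to-native thread-state bookkeeping:
    // the C code must not block or call back into Java, and it delays
    // safepoints (and thus GC) in other threads while it runs.
    @CFunction(transition = Transition.NO_TRANSITION)
    public static native double cos(double x);
}
```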
Which is probably great for BLAS-like calls. This lines up with my understanding from Cliff Click's great talk "Why is JNI Slow?"[1], which basically says that to be faster you need to make assumptions about what the native code could and couldn't do, and that developers would generally shoot themselves in the foot.
[0]: https://github.com/oracle/graal/blob/master/sdk/src/org.graa... [1]: https://www.youtube.com/watch?v=LoyBTqkSkZk
"JNI is slow", being the conventional wisdom, and knowing just how frequent the calls would be, people had ignored it as an option.
Then one of the devs who was most bothered by the bottleneck had an hour spare, threw the conventional wisdom out the window, dropped in JNI calls to a standard (highly optimised) library, and re-benchmarked. 40% performance boost. Further experiments found that "JNI is slow" isn't as true as conventional wisdom had it.
https://android.googlesource.com/platform/libcore/+/master/d...
EDIT: I forgot to mention @CriticalNative as well
https://android.googlesource.com/platform/libcore/+/master/d...
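On Android/ART, @CriticalNative makes a similar trade of safety for speed. A sketch, assuming a hypothetical `nativeAdd` function exported by an already-loaded native library (names are illustrative; this only compiles on Android, where `dalvik.annotation.optimization` is available):

```java
import dalvik.annotation.optimization.CriticalNative;

public class FastMath {
    // @CriticalNative methods must be static, take only primitive
    // arguments, and receive no JNIEnv/jclass, so ART can invoke the
    // native code with far less per-call bookkeeping than plain JNI.
    @CriticalNative
    static native int nativeAdd(int a, int b);
}
```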
GCC recognized extern "Java" in headers generated from class files. You could then call (gcj-compiled) Java classes from C++ as if they were native C++ classes, as well as implement Java "native" methods in natural C++.
The whole thing performed a lot better than JNI since it was, more or less, just using the standard platform calling conventions. Calling a native CNI method from Java had the same overhead as any regular Java virtual method call.
Ultimately, GCJ faded away because there wasn't a great deal of interest in native Java compilation back then, and there were too many compatibility challenges in the pre-OpenJDK days. But it's interesting to see many of its ideas coming back now in the form of Graal/GraalVM.
Most third-party commercial Java SDKs do have support for native compilation, especially in the embedded space.
Around 2009 GCJ suffered an exodus of developers to OpenJDK.
You can follow along here:
http://mail.openjdk.java.net/pipermail/panama-dev/
The same project is also adding support for writing vector code in Java (SSE, AVX etc).
I can say for a fact that Panama is not seriously targeting this space. We implement a ton of that native code today, working with C++ and actual Android. We also handle GPUs. Project Panama is only targeting C, and even then will only do it in a cross-platform, non-committal fashion. They aren't doing it the way they should in order to properly target native vectorized code.
We know this from experience, because this is all we do: https://github.com/deeplearning4j/deeplearning4j https://github.com/bytedeco/javacpp-presets
We tried seeing if we could get some of this work into the JDK, but their goals fundamentally compete with what it takes to get vector math to be fast. It's also not nearly as ambitious as it needs to be to handle real-world tensor workloads.
John Rose of Oracle:
Panama is not just about C headers. It is about building a framework in which any data+function schema of APIs can be efficiently plugged into the JVM. So it's not just C or C++ but protocol specs and persistent memory structures and on-disk formats and stuff not invented yet. We've been relentless about designing the framework down to essential functionality (memory access and procedure calls), not just our (second-)favorite language or compiler.
The important deliverable of Panama is therefore not Posix bindings, but rather a language-neutral memory layout-and-access mechanism, plus a language-neutral (initially ABI-compliant) subroutine invocation mechanism. The jextract tool grovels over ANSI C (soon C++) schemas and translates to the layouts and function calls, bound helpfully to Java APIs with unsurprising names. But the jextract tool is just the first plugin of many.
We do look forward to building more plugins for more metadata formats outside the Java ecosystem, such as what you are building.
In fact, I expect that, in the long run, we will not build all of the plugins, but that people who invent new data schemas (or even data+function schemas or languages) will consider using our tools (layouts, binder, metadata annotations) to integrate with Java, instead of the standard technique, which is to write a set of Java native functions from scratch, or (if you are very clever) with tooling. The binder pattern, in particular, seems to be a great way to spin repetitive code for accessing data structures of all sorts, not just C or Java. I hope it will be used, eventually, in preference to static protocol compilers. The JVM is very good at on-line optimization, even of freshly spun code, so it is a natural framework for building a binder.
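As a concrete illustration of that "layout-and-access plus ABI-compliant invocation" design: Panama's FFI side later shipped as the java.lang.foreign API (finalized in Java 22; a preview feature before that). A minimal sketch calling libc's strlen through a downcall handle, assuming a Java 22+ runtime:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.charset.StandardCharsets;

public class StrlenDemo {
    // Bind C's size_t strlen(const char *s) as a Java MethodHandle.
    static long cStrlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MemorySegment addr = linker.defaultLookup().find("strlen").orElseThrow();
        MethodHandle strlen = linker.downcallHandle(
                addr, FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the string into off-heap memory as a NUL-terminated C string.
            byte[] bytes = (s + "\0").getBytes(StandardCharsets.UTF_8);
            MemorySegment cStr = arena.allocate(bytes.length);
            MemorySegment.copy(bytes, 0, cStr, ValueLayout.JAVA_BYTE, 0, bytes.length);
            return (long) strlen.invoke(cStr);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(cStrlen("hello, panama")); // prints 13
    }
}
```

No JNI stub, no hand-written glue: the FunctionDescriptor is the "layout" and the Linker supplies the ABI-compliant call.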
>They aren't doing it the way they should be in order to properly target native vectorized code.
Which is interesting since Intel is the one contributing the majority of the vector code changes.
https://twitter.com/sundararajan_a/status/101507363642677248...