Of course you can implement any shared/consistent memory transaction on top of shards -- after all, the CPU implements shared memory on top of message-passing in a shared-nothing architecture, too. It's just that you then end up implementing that abstraction yourself. If you need it, someone has to implement it, and if you need it at the machine level, it's better to rely on the hardware implementation than to re-implement the same thing in software. Naive implementations end up creating much more contention (i.e. slow communication) than a sophisticated use of the hardware's concurrency (i.e. communication) instructions.
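To make that last point concrete, here's a toy Java sketch (mine, not from the original discussion): a counter guarded by a software lock versus one that leans directly on the hardware compare-and-swap instruction through `AtomicLong`. Both are correct under concurrency, but the lock serializes every increment in software, while CAS lets the hardware arbitrate.

```java
import java.util.concurrent.atomic.AtomicLong;

// Naive: every increment serializes through a software lock,
// re-implementing in software what the hardware already provides.
class LockedCounter {
    private long value;
    synchronized void increment() { value++; }
    synchronized long get() { return value; }
}

// Relies on the hardware compare-and-swap instruction via AtomicLong.
class CasCounter {
    private final AtomicLong value = new AtomicLong();
    void increment() { value.incrementAndGet(); }
    long get() { return value.get(); }
}

public class ContentionDemo {
    public static void main(String[] args) throws InterruptedException {
        CasCounter c = new CasCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(c.get()); // -> 400000, no lost updates
    }
}
```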
My point is that if you're providing a shared-memory abstraction to your user (like arbitrary transactions) -- even at a very high level -- then your architecture isn't "shared-nothing", period. Somewhere in your stack there's an implementation of a shared-memory abstraction. And if you decide to call anything that doesn't use shared memory at the CPU/OS level "shared nothing", then that's an arbitrary and rather senseless distinction, because even at the CPU/OS level, shared memory is implemented on top of message-passing.

So the cost of a shared abstraction is incurred when it's provided to the user, and is completely independent of how it's implemented. The only way to avoid it is to restrict the programming model and not provide the abstraction. If doing that is fine for the user -- great, but there's no way to have this abstraction without paying for it.
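Here's a minimal sketch of the layering being described (the names `OwnedRegister`/`update` are mine, purely illustrative): the state lives on a single owner thread, clients interact with it only by message passing over a queue, and yet what the clients observe is a linearizable shared-memory cell -- a "shared" abstraction built entirely on shared-nothing plumbing.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongUnaryOperator;

public class OwnedRegister {
    // Request message: an operation to apply, plus a channel for the reply.
    record Request(LongUnaryOperator op, CompletableFuture<Long> reply) {}

    private final BlockingQueue<Request> inbox = new LinkedBlockingQueue<>();
    private long state = 0; // touched ONLY by the owner thread -- nothing is shared

    public OwnedRegister() {
        Thread owner = new Thread(() -> {
            while (true) {
                try {
                    Request r = inbox.take();
                    // Requests are applied one at a time, so every
                    // read-modify-write is trivially atomic.
                    state = r.op().applyAsLong(state);
                    r.reply().complete(state);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        owner.setDaemon(true);
        owner.start();
    }

    // To the caller this looks like an atomic shared-memory update;
    // under the hood it's a message and a reply.
    public long update(LongUnaryOperator op) throws Exception {
        CompletableFuture<Long> reply = new CompletableFuture<>();
        inbox.put(new Request(op, reply));
        return reply.get();
    }

    public static void main(String[] args) throws Exception {
        OwnedRegister r = new OwnedRegister();
        System.out.println(r.update(x -> x + 41)); // -> 41
        System.out.println(r.update(x -> x + 1));  // -> 42
    }
}
```

Note that the queue itself is where the cost shows up: every "shared" access is a round trip through the owner, which is exactly the contention/communication cost the abstraction incurs regardless of which layer implements it.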
And JNI is better now[1] (I've used JNR to implement FUSE filesystems in pure Java). JNR will serve as the basis for "JNI 2.0" -- Project Panama[2]. And thanks!
[1]: https://github.com/jnr/jnr-ffi
[2]: http://openjdk.java.net/projects/panama/