1. Lucene is trying to get Approximate Nearest Neighbours (ANN) search working for semantic search purposes: https://issues.apache.org/jira/browse/LUCENE-9004 https://github.com/apache/lucene/issues/10047
2. The Panama Vector API allows CPUs that support it to accelerate vector operations: https://openjdk.org/jeps/438
So this allows fast ANN on Lucene for semantic search!
How did people do this before Lucene supported it? Only through entirely different tools?
Java already uses SIMD out of the box through JIT auto-vectorization, but the optimizer can miss opportunities. With this API it is far more likely that SIMD instructions get used when the CPU has them, and from the first compilation, so performance should improve.
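To make the SIMD point concrete, here is a minimal sketch in plain Java of the dot product that vector distance code boils down to. It deliberately does not call the Panama Vector API (that needs the incubator module enabled on the JVM); it only emulates the lane-wise shape of explicit SIMD code with several independent accumulators. `DotProductSketch`, `LANES`, `dotScalar` and `dotLanes` are names made up for illustration, not anything from Lucene.

```java
final class DotProductSketch {
    // Hypothetical lane count for illustration; real lane counts depend on the CPU.
    private static final int LANES = 8;

    /** Plain scalar loop: one accumulator, strict left-to-right addition order. */
    static float dotScalar(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Lane-wise version: LANES independent partial sums, reduced at the end. */
    static float dotLanes(float[] a, float[] b) {
        checkSameLength(a, b);
        float[] acc = new float[LANES];
        // Largest multiple of LANES that fits in the input.
        int bound = a.length - (a.length % LANES);
        int i = 0;
        for (; i < bound; i += LANES) {
            // Each lane only ever touches its own accumulator,
            // which is what a SIMD multiply-add does in one instruction.
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] += a[i + lane] * b[i + lane];
            }
        }
        // Horizontal reduction of the partial sums.
        float sum = 0f;
        for (float partial : acc) {
            sum += partial;
        }
        // Scalar tail for the elements that did not fill a whole group of lanes.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }
}
```

Note that the two methods add the products in a different order, so on arbitrary float inputs the results can differ in the last bits. As far as I know that reordering is something the JIT is not free to do on its own for floating-point sums, which is one reason an explicit API helps here.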
By performing query expansion based on features of documents within the search results. Very efficient and effective if you have indexed the right features.
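A rough sketch of that kind of expansion (often called pseudo-relevance feedback), assuming each top hit exposes a list of indexed feature terms. `QueryExpansionSketch`, `expand` and its parameters are hypothetical names, and no Lucene API is used; in a real setup the features would come from the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class QueryExpansionSketch {
    /**
     * Returns the original query terms followed by up to maxNewTerms feature
     * terms that occur in the most top hits and are not already in the query.
     */
    static List<String> expand(List<String> queryTerms,
                               List<List<String>> topHitFeatures,
                               int maxNewTerms) {
        Set<String> original = new HashSet<>(queryTerms);
        Map<String, Integer> docCounts = new HashMap<>();
        for (List<String> features : topHitFeatures) {
            // Count each feature at most once per document.
            for (String feature : new HashSet<>(features)) {
                if (!original.contains(feature)) {
                    docCounts.merge(feature, 1, Integer::sum);
                }
            }
        }
        // Rank by document count (descending), then alphabetically for stable output.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(docCounts.entrySet());
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> expanded = new ArrayList<>(queryTerms);
        for (int i = 0; i < ranked.size() && i < maxNewTerms; i++) {
            expanded.add(ranked.get(i).getKey());
        }
        return expanded;
    }
}
```

The expanded term list is then run as a second query. Counting per document rather than per occurrence is just one possible choice; weighting by score or term rarity would also be reasonable.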
But it comes with continued challenges, if I understand correctly:
- The Panama Vector API is still incubating, and Java has taken its time providing an official way to use SIMD. It could all change in Java 22.
- It only works on Java 20, with a very specific set of flags passed to the JVM. It’ll take time for this change to make it into Elasticsearch and Solr
- Panama itself is a weird and very low-level API.
- Lucene organizes the HNSW vector index graph alongside its inverted index segments. And these need to be merged/compacted periodically. Merging HNSW graphs, as I understand it, is computationally difficult as the graph gets rebuilt.
Using Solr with Java 8 is still quite common.
Maybe just what I've seen but Solr usually has users sticking with older versions of Java.
Will this allow Lucene to drop hnswlib and get similar performance using native Java?
Java has taken its sweet time exposing these optimizations in a consistent way. They're available by turning on a flag. But the API for using these optimizations is fairly brittle and could change in the next Java version.
But long story short, they implemented something to turn on the flag, and improve their own HNSW performance. Fingers crossed Java gets its act together.