I'd love to see those results independently verified, and I'd also love a good explanation of how they're getting such great performance.
Typically, the recipe is to keep the hot parts of the data structure in CPU caches (SRAM) and lean heavily on SIMD. At the time of those measurements, USearch used ~100 custom kernels for different data types, similarity metrics, and hardware platforms. The upcoming release of the underlying SimSIMD micro-kernels project will push that number beyond 1000, so we should be able to squeeze out a lot more performance later this year.
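To make the "many kernels" idea concrete (this is an illustrative sketch, not USearch's or SimSIMD's actual internals): you can think of the kernel zoo as a dispatch table keyed by data type and metric, where each entry would, in a real library, be a hand-written SIMD routine selected at runtime for the CPU's feature set.

```python
# Hypothetical sketch of kernel dispatch. Each "kernel" here is plain Python
# standing in for a hand-vectorized implementation (AVX-512, NEON, SVE, ...).

def dot_f32(a, b):
    # Inner product over float32 vectors.
    return sum(x * y for x, y in zip(a, b))

def sqeuclidean_f32(a, b):
    # Squared Euclidean distance over float32 vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hamming_u8(a, b):
    # Popcount of XOR-ed bytes -- the kind of kernel that maps to a single
    # population-count instruction per lane on modern CPUs.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# One entry per (data type, metric) pair; a real library multiplies this
# by the number of supported instruction sets, hence hundreds of kernels.
KERNELS = {
    ("f32", "dot"): dot_f32,
    ("f32", "sqeuclidean"): sqeuclidean_f32,
    ("u8", "hamming"): hamming_u8,
}

def distance(dtype, metric, a, b):
    return KERNELS[(dtype, metric)](a, b)
```

The point of the table is that adding a data type or metric is a new entry, not a rewrite of the search structure above it.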
That said, self-reported numbers only go so far; it'd be great to see USearch in more third-party benchmarks like VectorDBBench or ANN-Benchmarks. Those would make for a much more interesting comparison!
On the technical side, USearch has some impressive work, and you're right that SIMD and cache optimization are well-established techniques (definitely part of our toolbox too). Curious about your setup, though: vector search has a fairly uniform compute pattern, so while 100+ custom kernels are great for adapting to different hardware (something we're also pursuing), I suspect most of the gain comes from a core set of techniques, especially when optimizing for peak QPS on a given machine and index type. Looking forward to seeing what your upcoming release brings!
And we always welcome independent verification—if you have any questions or want to discuss the results, feel free to reach out via GitHub Issues or our Discord.
You're absolutely right that a basic HNSW implementation is relatively straightforward. But achieving this level of performance required going beyond the usual techniques.
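For context on what "relatively straightforward" means here: the heart of HNSW is a greedy best-first walk over a proximity graph, repeated per layer. A minimal sketch of that search step, assuming a prebuilt adjacency list (illustrative only, not any particular library's code):

```python
import heapq

def greedy_search(graph, vectors, query, entry, ef=4):
    """Greedy best-first search over one proximity-graph layer.

    graph:   node -> list of neighbor nodes (prebuilt adjacency list)
    vectors: node -> vector (list of floats)
    query:   target vector
    entry:   starting node for the walk
    ef:      beam width (size of the dynamic result list)
    Returns the top-ef (distance, node) pairs, nearest first.
    """
    def dist(u):
        return sum((x - y) ** 2 for x, y in zip(vectors[u], query))

    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap of nodes to expand
    best = [(-dist(entry), entry)]        # max-heap (negated) of current top-ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0] and len(best) >= ef:
            break  # nearest unexpanded candidate is worse than the worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-d, n) for d, n in best)
```

The "beyond the usual techniques" part is everything this sketch leaves out: the layer hierarchy, neighbor-selection heuristics during construction, memory layout, prefetching, and the SIMD distance kernels underneath.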
A better comparison would be with Meta's FAISS.
This sort of behaviour is now absolutely rampant in the AI industry.
Average latency across ~500 queries per collection per database:
Qdrant: 21.1ms
LanceDB: 5.9ms
Zvec: 0.8ms
Both Qdrant and LanceDB are running with Inverse Document Frequency enabled, so that's a slight performance hit; Zvec is running with HNSW.
The overlap of answers between the three is virtually identical with the same default ranking.
So yes, Zvec is incredible, but the gotcha is that Zvec is fast because it's primarily constrained by local disk performance, and the data must live on local disk. You may have a central repository storing the data, but every instance running Zvec needs a high-performance local disk attached. I mounted blobfuse2 object storage to test, and Zvec's numbers went to over 100ms, so disk is almost all that matters.
My take? The way Zvec behaves right now, it will be amazing for on-device vector lookups, but not as helpful for cloud vectors.
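For anyone wanting to reproduce this kind of comparison, the measurement side is simple; a sketch of a latency harness (the `search_fn` callable is a placeholder for whichever client you're testing, not any database's actual API):

```python
import time
import statistics

def measure_latency(search_fn, queries):
    """Run each query once and report latency stats in milliseconds.

    search_fn: callable taking one query (placeholder for the DB client call)
    queries:   iterable of query payloads, e.g. ~500 per collection
    """
    timings_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(timings_ms),
        "p50_ms": statistics.median(timings_ms),
    }
```

Worth noting that means hide tail behavior; reporting p50/p99 alongside the average gives a fairer picture, especially when one system is disk-bound and the others aren't.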
Just a bit of context on the storage behavior: Zvec currently uses memory-mapped files (mmap) by default, so once the relevant data is warmed up in the page cache, performance should be nearly identical regardless of whether the underlying storage is local disk or object storage—it's essentially in-memory at that point. The 100ms latency you observed with blobfuse2 likely reflects cold reads (data not yet cached), which can be slower than local disk in practice. Our published benchmarks are all conducted with sufficient RAM and full warmup, so the storage layer's latency isn't a factor in those numbers.
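To make the warmup point concrete, here is a minimal stdlib sketch of the general mmap-plus-page-cache pattern (the file path is hypothetical, and this is not Zvec's code): touching one byte per page faults the whole page into the OS page cache, after which reads are served from RAM regardless of whether the backing store is a local NVMe drive or a blobfuse2 mount. On a cold object-store mount, every one of those faults is a network round trip, which would explain the 100ms+ numbers.

```python
import mmap

def warm_mapping(path):
    """Memory-map a file and touch every page once to pull it into the page cache."""
    page = mmap.PAGESIZE
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    checksum = 0
    for offset in range(0, len(mm), page):
        checksum ^= mm[offset]  # reading one byte faults in the entire page
    return mm, checksum
```

Once this pass completes, repeated searches over the mapped region never hit the storage layer at all, which is why warmed benchmarks look storage-independent.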
For benchmarks, you could prepare a Phoronix Test Suite module (https://www.phoronix-test-suite.com/) to facilitate replication on a variety of machines.
It would be great to see how it compares to Faiss / HNSWLib etc. I'd consider integrating it into txtai as an ANN backend.
See e.g., https://scikit-learn.org/stable/auto_examples/neighbors/plot...