- Indexed HN posts and comments into PostgreSQL with pgvector (HNSW index)
- Embeddings generated with OpenAI's embedding model
- Queries run as nearest-neighbor vector searches, with a typical response time under 50ms
- The whole thing runs on a single Postgres instance, no separate vector DB
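For anyone wanting to try a similar setup, here is a rough sketch of what the schema, HNSW index, and nearest-neighbor query can look like with pgvector. The table name `hn_items`, its columns, and the 1536-dimension embedding size are hypothetical assumptions for illustration, not the actual schema behind this search. The SQL is held in Python strings alongside a small helper that formats a query embedding in pgvector's text input format (`[x,y,...]`).

```python
# Minimal sketch of a pgvector schema and nearest-neighbor query for HN items.
# ASSUMPTIONS (hypothetical, not the author's actual setup): the table name
# hn_items, its columns, and a 1536-dimensional embedding size.

EMBEDDING_DIM = 1536  # assumption: output size of the OpenAI embedding model

SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS hn_items (
    id         bigint PRIMARY KEY,   -- HN item id
    kind       text NOT NULL,        -- 'story' or 'comment'
    title      text,
    body       text,
    created_at timestamptz NOT NULL,
    embedding  vector({EMBEDDING_DIM})
);

-- HNSW index on cosine distance; m and ef_construction are pgvector's defaults
CREATE INDEX IF NOT EXISTS hn_items_embedding_hnsw
    ON hn_items USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# <=> is pgvector's cosine-distance operator. Ordering by it with a LIMIT
# lets the planner use the HNSW index. The WHERE clause filters on an
# ordinary column in the same table, with no second datastore involved.
KNN_SQL = """
SELECT id, title, embedding <=> %s::vector AS distance
FROM hn_items
WHERE kind = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(vec):
    """Format floats in pgvector's text input format, e.g. [1.0,0.5]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def knn_params(query_embedding, kind="story", k=10):
    """Build the parameter tuple for KNN_SQL (the query vector appears twice)."""
    if len(query_embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(query_embedding)}")
    literal = to_vector_literal(query_embedding)
    return (literal, kind, literal, k)
```

With a DB-API driver that uses `%s` placeholders, such as psycopg2, a call would look roughly like `cur.execute(KNN_SQL, knn_params(query_vec, kind="comment", k=20))`. One caveat worth knowing: pgvector applies a `WHERE` filter after the approximate index scan, so a very selective filter can return fewer than `k` rows; the pgvector documentation discusses raising `hnsw.ef_search` for that case.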
I built this partly because I wanted a better way to search HN, and partly to dogfood my own project — Rivestack (https://rivestack.io), a managed PostgreSQL service with pgvector baked in. I wanted to see how pgvector holds up with a real dataset at a reasonable scale. A few things I learned along the way:
- HNSW vs. IVFFlat matters a lot at this scale. HNSW gave me much better recall with acceptable index build times.
- Storing embeddings alongside relational data in the same DB simplifies things enormously: there's no syncing between a vector store and your main DB.
- pgvector has gotten surprisingly fast in recent versions.
- For most use cases, you really don't need a dedicated vector database.
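"Recall" here means the fraction of the true nearest neighbors that an approximate index actually returns. One way to compare HNSW and IVFFlat on your own data is to compute the exact top-k for a sample of queries by brute force and measure how much of it each index recovers. The sketch below is a minimal pure-Python version of that measurement; the function names are hypothetical, and it is not a description of how recall was measured for this project.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, i.e. what pgvector's <=> computes for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors closest to query.

    vectors maps item id -> embedding. This scans every vector, which is
    exactly the work an approximate index like HNSW or IVFFlat avoids.
    """
    ranked = sorted(vectors, key=lambda item_id: cosine_distance(query, vectors[item_id]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

In practice the approximate ids would come from the indexed query, and the exact ids either from a brute-force pass like this or from running the same query in Postgres with index scans disabled (for example `SET enable_indexscan = off`). Averaging `recall_at_k` over a few hundred queries then gives a number to tune against, using `hnsw.ef_search` for HNSW or `ivfflat.probes` for IVFFlat, both of which trade query speed for recall.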
The search is free to use. Rivestack has a free tier too if anyone wants to try something similar. Happy to answer questions about the architecture, pgvector tuning, or anything else.