We built Retake to fix two issues: keeping vectors in sync with Postgres in real time is difficult, and most vector databases aren’t built for hybrid search.
A quick refresher: “keyword search” refers to a technique where results are scored based on the appearance of exact words or terms. “Semantic search” uses vector embeddings to understand the meaning behind those words. Hybrid search combines these two approaches to enhance the precision and relevance of results.
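To make "combining" concrete, here is a minimal sketch of one common way to merge a keyword ranking with a semantic ranking: reciprocal rank fusion (RRF). This is a generic technique shown for illustration only, not necessarily how Retake or OpenSearch fuse results. The function name, the document IDs, and the constant k=60 are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two searches, best match first.
keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. ordered by BM25 score
semantic_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. ordered by vector similarity

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

RRF only needs ranks, so it avoids having to normalize BM25 scores and vector similarities onto a common scale. A weighted sum of normalized scores is another common choice.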
To implement semantic or hybrid search today, most organizations run batch jobs that update their search engine or vector database using ETL tools or custom data pipelines. We’ve seen firsthand how time-consuming and costly this can be, as moving vectors often requires re-embedding the entire data source.
We’ve also seen how many vector databases lack crucial features of “traditional” search: keyword-based (BM25) search, faceting/aggregations, highlighting, efficient filtering, etc.
Here’s how Retake works: our core is built on top of OpenSearch, which acts as both the search engine and the vector database. We use logical-replication-based Change Data Capture (CDC) to stay in sync with Postgres, so documents and vectors are updated incrementally and in real time. Finally, Python and TypeScript SDKs make it easy to integrate Retake into your application. There’s no need to manage separate vector databases and search engines, upload and embed documents, or run expensive reindexing jobs. All you need to think about is writing search queries.
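As a rough illustration of why CDC avoids full reindexing, the sketch below applies row-level change events to an in-memory index and re-embeds only the rows that changed. Everything in it is hypothetical: the event shape, the apply_change function, and the toy_embed stand-in are assumptions for the example and do not reflect Retake's internals or any real embedding model.

```python
def toy_embed(text):
    # Stand-in for a real embedding model (assumption): returns a tiny
    # 2-number "vector" derived from the text so the example is runnable.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def apply_change(index, event):
    """Apply one row-level change event to an in-memory search index.

    Only the affected row is touched, so an update re-embeds a single
    document instead of the whole table.
    """
    op, row_id = event["op"], event["id"]
    if op == "delete":
        index.pop(row_id, None)
    elif op in ("insert", "update"):
        text = event["row"]["body"]
        index[row_id] = {"text": text, "vector": toy_embed(text)}
    else:
        raise ValueError(f"unknown op: {op}")
    return index


# Hypothetical stream of changes, in commit order.
events = [
    {"op": "insert", "id": 1, "row": {"body": "hello"}},
    {"op": "insert", "id": 2, "row": {"body": "world"}},
    {"op": "update", "id": 2, "row": {"body": "hybrid search"}},
    {"op": "delete", "id": 1},
]

index = {}
for event in events:
    apply_change(index, event)

print(index)
```

In a real deployment the events would come from a Postgres logical replication stream and the index would be the search engine rather than a Python dict, but the per-row shape of the work is the same.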
The easiest way to get started with Retake is by running our Docker Compose stack:
git clone https://github.com/getretake/retake.git
cd retake/docker && docker compose up
Retake is Apache licensed and our repo is here: https://github.com/getretake/retake. For next steps, see our quick start guide: https://docs.getretake.com/quickstart
We’d love your feedback on our solution to hybrid search. Our focus right now is on nailing the basics, but we’d also love to hear what you think we should focus on next.