I am filing provisional patents on some core technical approaches so I can share more openly with early design partners and investors.
Curious to hear from folks who have raised a Pre-Seed/Seed round or worked with early-stage companies:
- Do provisionals meaningfully help in fundraising or partnerships?
- Or were they mostly noise until later rounds / real traction?
I am trying to calibrate how much time and energy to put into IP versus just shipping and building user traction at this stage.
Would love to hear real-world experiences.
I was curious about the buzz around file-based (keyword) RAG as opposed to vector RAG, so I benchmarked Tantivy (keyword search) vs. Chroma (vector search) to quantify the trade-offs in modern RAG pipelines. I used 5 datasets: CodeXGlue, MS MARCO, SQuAD, HotpotQA, and SciQ.
- Indexing/embedding was 76x slower for vectors (on the order of seconds vs. milliseconds), and query latency was 11x slower.
- On SciQ, keyword search outperformed vectors by 32% (MRR). Terms like "Mitochondria" act as exact lookup keys, not semantic cues, and vectors tended to drift toward semantically similar but factually incorrect answers.
- On HotpotQA, I noticed a trend where vectors find the "answer" document but miss the "bridge" document, because the bridge isn't semantically similar to the query. Finding the right document is not the same as having enough context to prove the answer.
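The SciQ result is consistent with how BM25-style keyword scoring works: a term that appears in few documents gets a high inverse document frequency, so one exact match can outweigh many common words. Tantivy ranks full-text queries with BM25; the sketch below is a toy pure-Python BM25, not Tantivy's API, and the whitespace tokenizer, the Lucene-style IDF, and the k1=1.2 / b=0.75 values are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption); real engines use configurable analyzers.
    return text.lower().split()


class BM25:
    """Toy in-memory BM25 ranker for illustration only (not Tantivy's API)."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency: how many documents contain each term.
        self.df = Counter(t for d in self.docs for t in set(d))

    def idf(self, term: str) -> float:
        # Lucene-style IDF: rare terms get large weights, ubiquitous ones near zero.
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: str, idx: int) -> float:
        doc = self.docs[idx]
        tf = Counter(doc)
        # Length normalisation: longer-than-average documents are penalised.
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 3) -> list[int]:
        # Rank by descending score; ties broken by document index.
        ranked = sorted(range(self.n), key=lambda i: (-self.score(query, i), i))
        return ranked[:k]


docs = [
    "the mitochondria is the powerhouse of the cell",
    "the chloroplast is where photosynthesis happens in the cell",
    "the nucleus stores the genetic material of the cell",
]
index = BM25(docs)
print(index.search("powerhouse of the cell"))  # prints [0, 2, 1]
```

Here the mitochondria document wins because "powerhouse" occurs only there, while "the" and "cell" appear in every document and contribute little.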
The Data (MRR):
| Dataset | Domain | Keyword MRR (Tantivy) | Vector MRR (Chroma) | Winner |
| :--- | :--- | ---: | ---: | :--- |
| CodeXGlue | Code | 0.29 | 0.91 | Vector (+213%) |
| SciQ | Science | 0.81 | 0.61 | Keyword (+32%) |
| HotpotQA | Reasoning | 0.55 | 0.50 | Keyword (+10%) |
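MRR only credits the first relevant hit, which is why it can still look reasonable on HotpotQA when the bridge document is missing. A minimal sketch, assuming binary relevance labels; `all_supporting_at_k` is a hypothetical stricter check (not part of the benchmark above) that requires every supporting document in the top k, and `relative_gain` shows how a "Winner" percentage can be derived from two MRR scores.

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    # 1/rank of the first relevant document; 0.0 if none was retrieved.
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    # Average reciprocal rank over (ranked_results, relevant_ids) pairs.
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def all_supporting_at_k(ranked: list[str], relevant: set[str], k: int) -> bool:
    # Hypothetical stricter multi-hop check: every supporting document
    # (e.g. both the bridge and the answer document) must be in the top k.
    return relevant.issubset(ranked[:k])


def relative_gain(winner: float, loser: float) -> float:
    # Percentage by which the winning MRR exceeds the losing one.
    return (winner - loser) / loser * 100


# HotpotQA-style toy query: the answer doc is ranked first, the bridge doc fourth.
ranked = ["answer_doc", "distractor_1", "distractor_2", "bridge_doc"]
supporting = {"answer_doc", "bridge_doc"}
print(mean_reciprocal_rank([(ranked, supporting)]))  # prints 1.0
print(all_supporting_at_k(ranked, supporting, k=3))  # prints False
print(round(relative_gain(0.55, 0.50)))  # prints 10
```

In this toy case MRR is a perfect 1.0 even though the bridge document falls outside the top 3, which is the gap the HotpotQA observation describes.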
Curious to learn if others have similar observations or views.