gdad on Hacker News

Ask HN: Do provisional patents matter for early-stage startups?

I am a solo founder building in AI B2B infra.

I am filing provisional patents on some core technical approaches so I can share more openly with early design partners and investors.

Curious from folks who have raised Pre-Seed/Seed or worked with early-stage companies:

- Do provisionals meaningfully help in fundraising or partnerships?

- Or were they mostly noise until later rounds / real traction?

I am trying to calibrate how much time/energy to put into IP vs just shipping + user traction at this stage.

Would love to hear real world experiences.

Ran 5k queries on 50k documents to understand the file vs. vector RAG debate

I was curious about the noise around file-based RAG as opposed to vector RAG, so I benchmarked Tantivy against Chroma to quantify the trade-offs in modern RAG pipelines. I used 5 datasets: CodeXGlue, MS MARCO, SQuAD, HotpotQA, and SciQ.

- Indexing/embedding was 76x slower for vectors (on the order of seconds vs. milliseconds). Query latency was 11x slower.

- In SciQ, keyword search outperformed vectors by 32% (MRR). Terms like "Mitochondria" are specific keys, not semantics. Vectors tended to drift toward semantically similar but factually incorrect answers.

- In HotpotQA, I noticed a trend where vectors find the "answer" document but miss the "bridge" document because it isn't semantically similar to the prompt. Finding the right document is not the same as having enough context to prove the answer.
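The HotpotQA point can be made concrete with two retrieval checks: one that only asks whether the answer document was retrieved, and a stricter one that asks whether every supporting document (answer plus bridge) made it into the top-k. This is a minimal sketch, not the code used in the benchmark; the function names and document IDs are hypothetical.

```python
from typing import Sequence, Set


def answer_hit_at_k(ranked_ids: Sequence[str], answer_id: str, k: int) -> bool:
    """True if the answer document appears in the top-k results."""
    return answer_id in ranked_ids[:k]


def full_support_at_k(ranked_ids: Sequence[str], supporting_ids: Set[str], k: int) -> bool:
    """True only if every supporting document (answer + bridge) is in the top-k."""
    return supporting_ids.issubset(ranked_ids[:k])


# Hypothetical ranking for one multi-hop question: the answer document
# ranks first, but the bridge document falls outside the top 3.
ranked = ["doc_answer", "doc_x", "doc_y", "doc_bridge"]
support = {"doc_answer", "doc_bridge"}

hit = answer_hit_at_k(ranked, "doc_answer", k=3)
full = full_support_at_k(ranked, support, k=3)
```

Here `hit` is true while `full` is false, which is the gap described above: the retriever found the answer document but not enough context to prove the answer.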

The Data (MRR):

| :--- | :--- | :--- | :--- | :--- |
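For readers unfamiliar with the metric, here is a minimal sketch of how MRR (mean reciprocal rank) is usually computed, assuming one relevant document per query. The query IDs, document IDs, and function name are hypothetical and are not taken from the benchmark.

```python
from typing import Dict, List


def mean_reciprocal_rank(results: Dict[str, List[str]], relevant: Dict[str, str]) -> float:
    """Average of 1/rank of the relevant document per query (0 if it was not retrieved)."""
    if not results:
        return 0.0
    total = 0.0
    for query_id, ranked_ids in results.items():
        target = relevant[query_id]
        if target in ranked_ids:
            # Ranks are 1-based: the first result contributes 1.0, the second 0.5, etc.
            total += 1.0 / (ranked_ids.index(target) + 1)
    return total / len(results)


# Hypothetical example: relevant doc at rank 1, at rank 2, and missing.
results = {
    "q1": ["d1", "d7", "d9"],
    "q2": ["d4", "d2", "d8"],
    "q3": ["d5", "d6", "d0"],
}
relevant = {"q1": "d1", "q2": "d2", "q3": "d3"}

mrr = mean_reciprocal_rank(results, relevant)  # (1 + 0.5 + 0) / 3 = 0.5
```

Because the metric rewards placing the relevant document first, a 32% MRR gap reflects ranking position and not only whether the document was retrieved at all.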

Curious to learn if others have similar observations or views.

2 points by gdad 2 months ago | 0 comments

gdad

Recent submissions

Comparing how 3 AI assistants implement memory

Ask HN: Do provisional patents matter for early-stage startups?

Show HN: Memory plugin for OpenClaw; cross-platform context sync with major LLMs

Ran 5k queries on 50k documents to understand the file vs. vector RAG debate
