Show HN: HighSNR – Cut length and noise from your LLM context

(high-snr.com)

6 points | gskm | 10d ago | 5 comments

gskm (OP) 8d ago

Update on the benchmark numbers: the results in the original post were computed with a looser tokenizer, which made the budget less strict than it should have been. We've since fixed that, and the budget is now accurate end-to-end. Corrected numbers at 90% budget: HotpotQA F1 71.57 vs. full-context baseline 69.71, beating it by a wider margin than previously reported; Qasper 46.25 vs. 47.22 (~98% of full-context quality). Updated results and scripts: https://github.com/HighSNRInc/highsnr-benchmarks
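For anyone who wants to sanity-check the comparison, here is a minimal sketch of the arithmetic behind those figures. It uses only the four F1 scores quoted above; the `compare` helper and the dictionary layout are illustrative and are not taken from the benchmark scripts.

```python
# F1 scores quoted in the comment above (90% token budget vs. full context).
results = {
    "HotpotQA": {"budgeted": 71.57, "full_context": 69.71},
    "Qasper": {"budgeted": 46.25, "full_context": 47.22},
}


def compare(budgeted: float, full_context: float) -> tuple[float, float]:
    """Return (absolute F1 difference, budgeted score as a % of full context)."""
    delta = round(budgeted - full_context, 2)
    relative = round(100 * budgeted / full_context, 1)
    return delta, relative


summary = {name: compare(**scores) for name, scores in results.items()}

for name, (delta, relative) in summary.items():
    print(f"{name}: {delta:+.2f} F1, {relative}% of full-context quality")
```

This gives +1.86 F1 (102.7%) for HotpotQA and -0.97 F1 (97.9%) for Qasper, which matches the "~98%" figure above.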

One thing worth clarifying: there's no model in the processing pipeline. The ranking is fully deterministic: the same input always produces the same output. This means it's fast enough for synchronous calls, runs well on commodity CPUs without GPUs, and can handle high throughput without the latency or cost overhead of an inference step.
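To make "deterministic, no model" concrete, here is a toy sketch of a model-free selector working under a token budget. This is not HighSNR's ranking, which isn't described in this thread. It assumes a plain word-overlap score, a word count as a stand-in for real tokenization, and hypothetical names (`tokens`, `select`) throughout.

```python
import re


def tokens(text: str) -> list[str]:
    """Crude stand-in for a tokenizer: lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select(passages: list[str], budget: int, hint: str = "") -> list[str]:
    """Greedily keep passages that fit within `budget` tokens.

    Passages are ranked by word overlap with the hint; ties (and the
    no-hint case) fall back to original order, so the result depends
    only on the inputs.
    """
    hint_words = set(tokens(hint))

    def sort_key(item: tuple[int, str]) -> tuple[int, int]:
        index, passage = item
        overlap = len(hint_words & set(tokens(passage)))
        return (-overlap, index)  # higher overlap first, then earlier passages

    kept, used = [], 0
    for index, passage in sorted(enumerate(passages), key=sort_key):
        cost = len(tokens(passage))
        if used + cost <= budget:
            kept.append(index)
            used += cost
    # Return the survivors in their original document order.
    return [passages[i] for i in sorted(kept)]


passages = [
    "The cat sat on the mat.",
    "Paris is the capital of France.",
    "Tokens cost money.",
]
with_hint = select(passages, budget=9, hint="capital of France")
without_hint = select(passages, budget=9)
```

There is no sampling and no inference step, so repeated calls with the same arguments return the same selection. That is the property described above, even though the real scoring is presumably more sophisticated.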

imitliagkas 8d ago

Very useful! I really don't like paying for unnecessary tokens, and it works very fast for me. Do you have any standard practices to recommend for the optional hint in the call? It seems to help significantly.

gskm (OP) 8d ago

Thanks! Glad it's working well for you.

A few practical tips:

1. Pass the user's query directly. In the benchmark, the hint is literally the question. That's the simplest and most effective approach for RAG.
2. Keep it concise (a sentence or two). Natural language works fine.
3. Skip it for summarization. When there's no specific query, omitting the hint lets the optimizer select for overall document coverage, which is probably what you want.
4. Biggest impact at lower budgets. The hint shines most when the optimizer has to be selective: at 50% budget on Qasper, for example, the hint adds nearly 6 F1 points (41.27 vs. 35.35).
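In code, the tips above might look roughly like the sketch below. The function name `build_request` and the field names `context`, `budget`, and `hint` are hypothetical placeholders, not the actual HighSNR API; check the service's documentation for the real parameter names.

```python
from typing import Optional


def build_request(context: str, budget: float, query: Optional[str] = None) -> dict:
    """Assemble a request payload (field names are hypothetical).

    - RAG: pass the user's question as `query`; it is forwarded as the hint.
    - Summarization: leave `query` as None so no hint is sent at all.
    """
    if not 0 < budget <= 1:
        raise ValueError("budget must be a fraction in (0, 1]")

    request = {"context": context, "budget": budget}

    # Collapse stray whitespace so the hint stays a short natural-language string.
    hint = " ".join((query or "").split())
    if hint:
        request["hint"] = hint
    return request


rag_request = build_request("...retrieved passages...", 0.5, "  Who wrote\nthe paper? ")
summary_request = build_request("...long document...", 0.9)
```

Leaving the hint out entirely, rather than sending an empty string, keeps the summarization case unambiguous regardless of how the service treats blank hints.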

kostas77 9d ago

A really high-potential service for reducing the OPEX of AI services and effectively increasing the context window! And it's really straightforward to test.

gskm (OP) 9d ago

Thank you! That's exactly the goal: drop-in token savings without changing your LLM pipeline. If you give it a spin, I'd love to hear how it works on your data. We're actively tuning the ranking based on early feedback, so any input helps shape the product.
