One thing worth clarifying: there's no model in the processing pipeline. The ranking is fully deterministic, so the same input always produces the same output. Because there's no inference step, it's fast enough for synchronous calls, runs well on commodity CPUs with no GPU, and handles high throughput without the latency or cost that inference would add.
A few practical tips:
1. Pass the user's query directly. In the benchmark, the hint is literally the question. That's the simplest and most effective approach for RAG.
2. Keep it concise (a sentence or two). Natural language works fine.
3. Skip it for summarization. When there's no specific query, omitting the hint lets the optimizer select for overall document coverage, which is probably what you want.
4. Biggest impact at lower budgets. The hint matters most when the optimizer has to be selective. For example, at a 50% budget on Qasper, the hint adds nearly 6 F1 points (41.27 vs. 35.35).
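
To make the tips above concrete, here is a toy sketch of model-free, hint-aware selection. It is not the optimizer's actual algorithm or API: `select_chunks`, `budget_ratio`, and the word-overlap scoring are all hypothetical stand-ins, chosen only to illustrate the behavior described (deterministic output, query passed as the hint, coverage-oriented selection when the hint is omitted).

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_chunks(chunks, budget_ratio, hint=None):
    """Hypothetical stand-in for a hint-aware optimizer (illustration only).

    With a hint, chunks are scored by word overlap with the hint.
    Without one, chunks are scored by how many not-yet-covered words they add,
    which approximates selecting for overall document coverage.
    """
    token_sets = [_tokens(c) for c in chunks]
    costs = [len(c.split()) for c in chunks]  # cost = whitespace word count
    budget = int(sum(costs) * budget_ratio)
    hint_tokens = _tokens(hint) if hint else set()

    picked, covered, used = [], set(), 0
    remaining = list(range(len(chunks)))
    while remaining:
        def score(i):
            if hint_tokens:
                return len(token_sets[i] & hint_tokens)
            return len(token_sets[i] - covered)

        # Ties go to the earliest chunk, so the result never depends on chance.
        best = max(remaining, key=lambda i: (score(i), -i))
        remaining.remove(best)
        if used + costs[best] <= budget:
            picked.append(best)
            covered |= token_sets[best]
            used += costs[best]
    # Return the kept chunks in their original document order.
    return [chunks[i] for i in sorted(picked)]


chunks = [
    "The model was trained on 40k scientific abstracts.",
    "We thank the reviewers for feedback.",
    "Evaluation uses the F1 metric.",
    "Training took three days on eight machines.",
]

# RAG: pass the user's question directly as the hint.
hinted = select_chunks(chunks, budget_ratio=0.5, hint="Which metric is used for evaluation?")

# Summarization: omit the hint and let coverage drive the selection.
unhinted = select_chunks(chunks, budget_ratio=0.5)
```

Because every step is plain set arithmetic with a fixed tie-break, repeated calls with the same arguments return the same chunks, which is the property that makes a model-free pipeline safe to call synchronously.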