Show HN: Tilth v0.5.0 –> ~40% cheaper AI code navigation (160 runs, 3 models)

4 points · jahala · 2mo ago · 2 comments

Smart code reading for humans and AI agents. Tilth is what happens when you give ripgrep, tree-sitter, and cat a shared brain.

—

v0.5.0 was about figuring out why models weren’t using tilth tools consistently — even when they were available.

Results vs baseline (built-in tools only):

Sonnet 4.6: -44% $/correct (84% → 94% accuracy, 31% fewer turns)

Opus 4.6: -39% $/correct (91% → 92% accuracy, 37% fewer turns)

Haiku 4.5: -38% $/correct (54% → 73% accuracy, 7% fewer turns)
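For readers unfamiliar with the metric: a minimal sketch of how a "$/correct" figure could be computed, assuming it means total spend divided by the number of correctly answered tasks (the benchmark README has the actual definition). The function names and the numbers below are hypothetical and are not taken from the benchmark.

```python
def cost_per_correct(total_cost_usd: float, correct: int) -> float:
    """Total spend divided by the number of correct answers (assumed definition)."""
    if correct <= 0:
        raise ValueError("need at least one correct answer")
    return total_cost_usd / correct


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change versus the baseline; negative means cheaper."""
    return (candidate - baseline) / baseline


# Made-up illustration values, NOT benchmark data.
baseline = cost_per_correct(total_cost_usd=12.0, correct=40)   # 0.30 $/correct
with_tools = cost_per_correct(total_cost_usd=9.0, correct=50)  # 0.18 $/correct
change = relative_change(baseline, with_tools)

print(f"{change:+.0%}")
```

Under this definition the metric improves when a run gets cheaper and also when more answers are correct, so a model can show a large $/correct drop even with only a small reduction in turns.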

—

https://github.com/jahala/tilth/

Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...

PS: I don't have the budget to run the benchmark often (especially with Opus), so if any token whales have the capacity to run some benchmarks, please feel free to PR the results.

2 comments

joknoll · 2mo ago

I love the idea of improving models not only by giving them more "cognitive" power, but also by improving the harness, where improvements seem to be very low-hanging fruit compared to advancing frontier models. This could also make older/smaller models viable for coding agents.

jahala (OP) · 2mo ago

Hey @joknoll - in the benchmarks, I'm seeing very positive results with Haiku, getting quicker and more correct answers. So I think you're absolutely right that harness improvements will be a natural part of "sharpening" most models - especially the smaller ones with less reasoning capability.
