I’m building Sift, a drop-in gateway that makes LLM tool use far more reliable when tools return large JSON payloads.

The usual pattern is that agents paste raw tool outputs directly into the prompt, which quickly blows up the context window, triggers truncation and compaction, and leads to incorrect answers once earlier results disappear.

Sift sits between the model and its tools (MCP, APIs, CLIs), stores the full payload locally as an artifact (indexed in SQLite), and returns only a compact schema plus an artifact_id. When the model needs something from the data, it runs a tiny Python query against the stored artifact instead of reasoning over thousands of tokens of JSON.

In benchmarks across 103 questions on real datasets, this approach cut input tokens by ~95% and improved answer accuracy from ~33% to ~99%.

Repo: https://github.com/lourencomaciel/sift-gateway
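
The core pattern can be sketched roughly like this. To be clear, this is a minimal illustration of the idea, not Sift's actual API: the function names, the schema format, and the table layout here are all assumptions; only the general flow (store full payload in SQLite, hand back a schema plus artifact_id, query on demand) comes from the description above.

```python
import json
import sqlite3
import uuid

# Illustrative sketch only -- names and schema format are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, payload TEXT)")

def summarize(value):
    """Return a compact shape (types + keys) instead of the raw data."""
    if isinstance(value, dict):
        return {k: summarize(v) for k, v in value.items()}
    if isinstance(value, list):
        head = summarize(value[0]) if value else "empty"
        return [head, f"... {len(value)} items"]
    return type(value).__name__

def store_artifact(payload):
    """Persist the full tool output; give the model only a schema + id."""
    artifact_id = str(uuid.uuid4())
    con.execute("INSERT INTO artifacts VALUES (?, ?)",
                (artifact_id, json.dumps(payload)))
    return artifact_id, summarize(payload)

def query_artifact(artifact_id, fn):
    """Run a small Python function over the stored payload
    instead of pasting thousands of tokens of JSON into the prompt."""
    (raw,) = con.execute("SELECT payload FROM artifacts WHERE id = ?",
                         (artifact_id,)).fetchone()
    return fn(json.loads(raw))

# Example: a large tool result the model never sees in full.
payload = {"users": [{"name": f"u{i}", "score": i} for i in range(1000)]}
aid, schema = store_artifact(payload)
# schema is tiny: {"users": [{"name": "str", "score": "int"}, "... 1000 items"]}
top = query_artifact(aid, lambda d: max(u["score"] for u in d["users"]))
```

The point of the sketch is the token asymmetry: the schema the model receives is a few dozen tokens regardless of payload size, while the query runs against the full data.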