The model doing your reasoning is the wrong tool for shuffling your data. That one observation is the thesis of our research paper, Task-Specific Model Routing, and it's the reason Lineman exists.
A modern coding agent spends most of its context window on data, not thinking. A 1,200-line file you needed three functions from. A 400-line test log where one assertion failed. A web page wrapped in kilobytes of navigation markup. Your frontier model reads all of it at frontier-model prices, and every one of those tokens crowds out the reasoning that actually solves your problem. You end up paying your most expensive model to do clerical work.
One model, two very different jobs
The work an agent does splits cleanly into two kinds. Reasoning is deciding what to change, why, and how, and it genuinely needs a frontier model. Data handling is reading a file, triaging a build, scanning search results, fetching a page. That work is high-volume and low-judgment, so it plays to neither your most capable model's strengths nor its price tag.
Treating those two jobs as one workload is where the waste creeps in. The paper argues they should be routed to different places.
Keep the expensive model thinking. Send everything else to a faster, cheaper sidekick.
What routing looks like in practice
Lineman puts a fast secondary model in front of the data-heavy tool calls. When your agent reads a large file or triages output, that secondary model compresses the result down to what matters before it ever reaches your primary model. Your reasoning model receives a focused, relevant summary instead of raw bulk, and it never sees the tokens you didn't need.
The effect is twofold:
- Cost. The high-volume work happens on cheaper infrastructure, so you stop paying frontier rates to read boilerplate.
- Focus. A leaner context window means the reasoning model spends its attention on the problem rather than on sifting noise.
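To make the routing idea concrete, here is a minimal sketch in Python. It is not Lineman's API: every name in it (`ToolResult`, `DATA_HEAVY_TOOLS`, `route_tool_result`, the `min_chars` threshold) is hypothetical, and a simple line filter stands in for the secondary model so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout; this illustrates the routing idea only.
# A summarizer takes (raw_output, task_hint) and returns condensed text.
Summarizer = Callable[[str, str], str]

# Assumed set of tool calls treated as data handling rather than reasoning.
DATA_HEAVY_TOOLS = {"read_file", "run_tests", "web_fetch", "search"}


@dataclass
class ToolResult:
    tool: str    # name of the tool the agent called
    output: str  # raw text the tool returned


def route_tool_result(
    result: ToolResult,
    task_hint: str,
    summarize: Summarizer,
    min_chars: int = 2000,  # assumed cutoff: small outputs pass through untouched
) -> str:
    """Return the text the primary (reasoning) model should see."""
    if result.tool not in DATA_HEAVY_TOOLS or len(result.output) < min_chars:
        return result.output
    # Large, data-heavy output: condense it before it reaches the primary model.
    return summarize(result.output, task_hint)


def keep_matching_lines(raw: str, task_hint: str) -> str:
    """Stand-in for the secondary model: keep only lines mentioning the hint."""
    hint = task_hint.lower()
    kept = [line for line in raw.splitlines() if hint in line.lower()]
    # Fall back to the raw text if nothing matched, so no information is lost.
    return "\n".join(kept) or raw


# A 400-line test log where one assertion failed.
log = "\n".join(f"test_{i} PASSED" for i in range(400))
log += "\ntest_400 FAILED: assert 1 == 2"

condensed = route_tool_result(ToolResult("run_tests", log), "failed", keep_matching_lines)
print(condensed)
```

In a real setup the `summarize` argument would call the secondary model rather than filter lines. The point of the sketch is the shape of the decision: the primary model only ever receives the return value of `route_tool_result`.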
The measured result
We validated the approach across 180 tasks in six benchmark suites, including real-world GitHub issues from SWE-bench Lite:
| Result | Figure |
|---|---|
| Average token reduction across 180 tasks | 53% |
| SWE-bench Lite token reduction | ~48% |
| Quality retention on external benchmarks | 98.3% |
| Token reduction on internal data-heavy tasks | up to 75% |
The important part is that the savings cost almost nothing in accuracy: quality retention on external benchmarks was 98.3%. Cutting the data tax is a different thing from cutting intelligence, and the routing argument only holds if quality survives the compression. In these benchmarks, it does.
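For a rough sense of scale, the percentages above can be turned into token counts. The sketch below assumes the reported reduction applies to the tokens your primary model would otherwise read. The one-million-token baseline is a hypothetical figure chosen for illustration, not a measurement, and the function name is ours.

```python
def primary_tokens_after_routing(baseline_tokens: int, reduction: float) -> int:
    """Tokens left for the primary model after a fractional reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


# Hypothetical baseline: one million tokens without routing.
baseline = 1_000_000

average_case = primary_tokens_after_routing(baseline, 0.53)    # 53% average reduction
swe_bench_case = primary_tokens_after_routing(baseline, 0.48)  # ~48% on SWE-bench Lite
best_case = primary_tokens_after_routing(baseline, 0.75)       # up to 75% on data-heavy tasks

print(average_case, swe_bench_case, best_case)
```

This counts tokens only. The secondary model's own usage is a separate, cheaper line item that this sketch does not estimate.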
The full method, the per-suite numbers, and the case for why this generalises beyond coding agents are in the whitepaper. If you'd rather see it on your own work, the savings show up on the first large file you read, and the benchmarks page has the receipts.