Short answer: Most of a high Claude Code bill comes from tool output: the file reads, build and test logs, and search results the agent loads into context. It is not the model's reasoning. On Lineman's benchmarks that output is over half of a typical bill, and because every token in the context window is re-billed as input on each new turn, the same bulky output gets paid for over and over.
If you've just opened a Claude Code invoice and felt your stomach drop, you're not misreading it. Coding agents really are token-hungry, and the cost usually isn't where people think it is. This walks through what you're actually paying for, how to check it yourself, and how to get it down.
What a Claude Code bill is actually made of
Every message you send is billed on the tokens it processes, and those tokens come from four places:
- The system prompt and your CLAUDE.md, sent on almost every turn.
- Your own prompts, usually small.
- The model's reasoning and replies. Most people assume this is the bulk of the bill. It usually isn't.
- Tool output: the file reads, command output from tests and builds and installs, grep and search results, and any web pages the agent fetches. This is almost always the biggest line.
Some scale for context. Anthropic's own published figure is roughly $13 per developer per active day and $150–250 per developer per month across enterprise deployments, with costs staying under $30 a day for 90% of users (Anthropic's Claude Code cost docs). Teams working in large repositories tend to land at the top of that range, and the reason is nearly always how much tool output is flowing into the context window.
On our own benchmarks, tool output accounts for over half of a typical bill. The methodology is on the benchmarks page.
Why tool output is the silent majority
Three things make tool output expensive. It's large: open a 1,200-line file to use three functions and you've loaded all 1,200 lines, and a noisy test run or a verbose install log can be thousands of tokens on its own. It's mostly beside the point, too. The model needs the failing assertion, the function signature, the one error line, not the megabyte wrapped around them. And it's sticky. Once raw output lands in the context window, it gets re-sent as input on every following turn until you clear or compact, which is what turns a one-off cost into a compounding one (see context compounding for the full mechanism).
That last property is why a long session feels like it gets pricier as it goes. It does.
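To put rough numbers on the compounding, here is a back-of-the-envelope sketch. It assumes a deliberately simplified model: the context stays a fixed size for the whole session, every turn re-sends all of it as input, and any caching discounts are ignored. The function name and the token counts are made up for illustration, not measured.

```python
def cumulative_input_tokens(base_tokens: int, tool_tokens: int, turns: int) -> int:
    """Input tokens billed over a session when one tool output stays in context.

    base_tokens: tokens present on every turn (system prompt, prompts, replies)
    tool_tokens: size of a single tool output that landed in context
    turns:       number of turns the output stays in context, first turn included

    Simplifying assumptions: context size is constant and no caching discount applies.
    """
    return (base_tokens + tool_tokens) * turns


# Hypothetical session: 5,000 tokens of baseline context over 20 turns.
raw = cumulative_input_tokens(base_tokens=5_000, tool_tokens=12_000, turns=20)
trimmed = cumulative_input_tokens(base_tokens=5_000, tool_tokens=600, turns=20)

print(f"raw output kept in context: {raw:,} input tokens")
print(f"trimmed output in context:  {trimmed:,} input tokens")
print(f"difference:                 {raw - trimmed:,} input tokens")
```

In this toy model a single 12,000-token dump costs 240,000 input tokens over 20 turns, not 12,000. Real sessions grow turn by turn and may benefit from caching, so treat this as a way to reason about the shape of the cost, not a bill estimate.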
How to see where your tokens go
Don't guess, measure:
- Run /context. It breaks down what's filling your window right now. In most sessions the biggest blocks are large file reads, long command output, and search results, alongside the system prompt and CLAUDE.md.
- Check your usage dashboard. The Claude/Anthropic console shows token spend over time, and community tools that parse session logs can attribute cost per session.
- Watch for the spikes. A single "read this whole file," an npm test that dumps thousands of lines, or a repo-wide grep are the usual culprits.
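If you want a rough feel for the same breakdown outside a session, a few lines of Python can rank pieces of text by approximate size. This is not how /context works internally. It is a hand-rolled sketch that assumes about four characters per token, which is a crude rule of thumb and not a real tokenizer, and the block names and sizes below are invented.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def rank_blocks(blocks: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, estimated_tokens, share_of_total) sorted largest first."""
    sizes = {name: estimate_tokens(text) for name, text in blocks.items()}
    total = sum(sizes.values())
    rows = [(name, n, n / total) for name, n in sizes.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical context contents; the strings stand in for real text.
blocks = {
    "system_prompt": "x" * 8_000,
    "user_prompt": "x" * 400,
    "file_read": "x" * 48_000,
    "test_log": "x" * 60_000,
}

ranked = rank_blocks(blocks)
for name, tokens, share in ranked:
    print(f"{name:<14} {tokens:>7,} tokens  {share:6.1%}")
```

Paste in a saved test log or a file you tend to open whole and the ordering usually makes the point on its own.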
How to bring the bill down
There are two kinds of fix, and you want both.
The manual levers, covered in full in 11 ways to cut Claude Code token costs:
- Route easy work to a cheaper model (Sonnet is about a fifth of Opus per token).
- Run /clear when you switch tasks, and /compact on long ones.
- Keep CLAUDE.md lean so you're not re-sending rules every turn.
- Read files in scoped ranges instead of whole.
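The scoped-read lever is easy to picture in code. The helper below is a minimal sketch of the idea, not anything Claude Code ships: `read_range` is a hypothetical name, and it simply returns a 1-indexed, inclusive line range so that only those lines would ever reach the context window.

```python
from itertools import islice


def read_range(path: str, start: int, end: int) -> str:
    """Return lines start..end of a file (1-indexed, inclusive).

    Lines before `start` are skipped and reading stops after `end`,
    so the rest of the file is never returned to the caller.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    with open(path, encoding="utf-8") as fh:
        return "".join(islice(fh, start - 1, end))
```

Asking for the 30 lines around one function in a 1,200-line file means 30 lines land in context, and 30 lines get re-sent on later turns.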
The automated lever is to compress tool output before it ever reaches the model. The manual list tends to underplay this one, precisely because it's the tactic you'd otherwise have to remember on every single tool call. Lineman intercepts data-heavy tool calls and hands your primary model a compact, task-relevant summary in place of the raw dump, which cuts 40%+ of tokens on our benchmarks while holding output quality (see benchmarks). Since the bulk never enters context, you never pay for it, not on the first turn and not on any later one.
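To make "compress before it reaches the model" concrete, here is a deliberately naive stand-in. It is not Lineman's method, which isn't described here: it is a plain keyword filter that keeps lines that look like failures, plus one line of context on each side, and replaces everything else with a count. The pattern, the function name, and the sample log are all assumptions for illustration.

```python
import re

# Assumed markers of "interesting" lines; a real filter would be task-aware.
SIGNAL = re.compile(r"error|fail|assert|traceback|exception", re.IGNORECASE)


def compress_log(log: str, context: int = 1, max_lines: int = 40) -> str:
    """Keep signal lines plus `context` neighbours; summarise the rest as a count."""
    lines = log.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if SIGNAL.search(line):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))
    kept = [lines[i] for i in sorted(keep)][:max_lines]
    omitted = len(lines) - len(kept)
    return "\n".join(kept + [f"[{omitted} lines omitted]"])


# Hypothetical test run: 1,000 passing lines around one failure.
noise = [f"PASS test_{i}" for i in range(500)]
log = "\n".join(
    noise
    + ["FAIL test_checkout", "AssertionError: expected 200, got 500"]
    + noise
)

summary = compress_log(log)
print(summary)
```

On this sample, 1,002 lines shrink to five, and the failing assertion survives. A keyword filter like this will miss failures that don't match its pattern, which is the gap a task-aware summarizer is meant to close.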
The fastest first move
Open a session, run /context, and look at the biggest line. Nine times out of ten it's tool output. Go after that first, by hand with /clear and scoped reads, or automatically so you don't have to think about it, and the bill drops quickly. Once you know where your tokens are actually going, compare your plan options on the pricing page.