← Full guide: reduce LLM token costs
Diagnostic

Why is my Claude Code bill so high?

Most of a Claude Code bill is tool output: file reads, logs, and search results, not the model's reasoning. Here's how to find where your tokens go and bring the bill down.

Grant Unwin · Founder, Lineman

Short answer: Most of a high Claude Code bill comes from tool output: the file reads, build and test logs, and search results the agent loads into context. It is not the model's reasoning. On Lineman's benchmarks that output is over half of a typical bill, and because every token in the context window is re-billed as input on each new turn, the same bulky output gets paid for over and over.

If you've just opened a Claude Code invoice and felt your stomach drop, you're not misreading it. Coding agents really are token-hungry, and the cost usually isn't where people think it is. This walks through what you're actually paying for, how to check it yourself, and how to get it down.

What a Claude Code bill is actually made of

Every message you send is billed on the tokens it processes, and those tokens come from four places:

  1. The system prompt and your CLAUDE.md, sent on almost every turn.
  2. Your own prompts, usually small.
  3. The model's reasoning and replies. Most people assume this is the bulk of the bill. It usually isn't.
  4. Tool output: the file reads, command output from tests and builds and installs, grep and search results, and any web pages the agent fetches. This is almost always the biggest line.

Some scale for context. Anthropic's own published figure is roughly $13 per developer per active day and $150–250 per developer per month across enterprise deployments, with costs staying under $30 a day for 90% of users (Anthropic's Claude Code cost docs). Teams working in large repositories tend to land at the top of that range, and the reason is nearly always how much tool output is flowing into the context window.

On our own benchmarks, tool output accounts for over half of a typical bill. The methodology is on the benchmarks page.

Why tool output is the silent majority

Three things make tool output expensive. It's large: open a 1,200-line file to use three functions and you've loaded all 1,200 lines, and a noisy test run or a verbose install log can be thousands of tokens on its own. It's mostly beside the point, too. The model needs the failing assertion, the function signature, the one error line, not the megabyte wrapped around them. And it's sticky. Once raw output lands in the context window, it gets re-sent as input on every following turn until you clear or compact, which is what turns a one-off cost into a compounding one (see context compounding for the full mechanism).

That last property is why a long session feels like it gets pricier as it goes. It does.

How to see where your tokens go

Don't guess, measure:

  • Run /context. It breaks down what's filling your window right now. In most sessions the biggest blocks are large file reads, long command output, and search results, alongside the system prompt and CLAUDE.md.
  • Check your usage dashboard. The Claude/Anthropic console shows token spend over time, and community tools that parse session logs can attribute cost per session.
  • Watch for the spikes. A single "read this whole file," an npm test that dumps thousands of lines, or a repo-wide grep are the usual culprits.

How to bring the bill down

There are two kinds of fix, and you want both.

The manual levers, covered in full in 11 ways to cut Claude Code token costs:

  • Route easy work to a cheaper model (Sonnet is about a fifth of Opus per token).
  • Run /clear when you switch tasks, and /compact on long ones.
  • Keep CLAUDE.md lean so you're not re-sending rules every turn.
  • Read files in scoped ranges instead of whole.

The automated lever is to compress tool output before it ever reaches the model. The manual list tends to underplay this one, precisely because it's the tactic you'd otherwise have to remember on every single tool call. Lineman intercepts data-heavy tool calls and hands your primary model a compact, task-relevant summary in place of the raw dump, which cuts 40%+ of tokens on our benchmarks while holding output quality (see benchmarks). Since the bulk never enters context, you never pay for it, not on the first turn and not on any later one.

The fastest first move

Open a session, run /context, and look at the biggest line. Nine times out of ten it's tool output. Go after that first, by hand with /clear and scoped reads, or automatically so you don't have to think about it, and the bill drops quickly. Once you know where your tokens are actually going, compare your plan options on the pricing page.

Frequently asked questions

Why is my Claude Code bill so high?
Most of a Claude Code bill is tool output: the file reads, test and build logs, and search results the agent pulls into context, not the model's reasoning. On Lineman's benchmarks that's over half a typical bill, and because every token in the context window is re-billed as input on each new turn, verbose output you read once gets paid for again on every later message.
How much should Claude Code cost per developer?
Anthropic's own figure is roughly $13 per developer per active day and $150–250 per developer per month across enterprise deployments, with costs under $30 a day for 90% of users (per Anthropic's Claude Code cost docs, code.claude.com/docs/en/costs). Heavy users on large repositories run higher, largely because of how much tool output enters context.
Where do my Claude Code tokens actually go?
Run /context to see your window broken down. In most sessions the biggest consumers are large file reads, long command output (tests, builds, installs), and search results, plus the system prompt and CLAUDE.md. The model's actual reasoning is usually a minority of the spend.
GU

Grant Unwin

Founder, Lineman

Grant is the founder of Lineman, where he works on cutting the token cost of agentic coding. He writes about how AI coding tools bill, where the spend actually goes, and how to reduce it without losing output quality.

More on cutting token costs