Guide

How to reduce LLM token costs when coding (2026 guide for dev teams)

LLM coding costs are driven by context compounding and verbose tool output. The four levers that bring them down: model routing, context hygiene, prompt discipline, and automatic tool-output compression.

Grant Unwin · Founder, Lineman

Short answer: LLM coding costs are driven by context compounding (every token in the window is re-billed as input on each turn) and by verbose tool output (file reads, logs, search results), which accounts for over half of a typical bill in Lineman's data. Four levers bring the cost down: match the model to the task, keep context clean, tighten prompts, and automatically compress tool output before it reaches the model.

The two things driving your bill

Almost every guide on this topic tells you to fix the symptom by hand. To fix the cause, you need to understand two mechanics. First, context compounding: models are stateless, so every turn re-sends the whole conversation as input, and anything bulky you read once is paid for again on every later turn. Second, tool output: the file reads, logs, and search results an agent loads are large, low-signal, and sticky. Put together, that's why a session gets more expensive the longer it runs, and why tool output is over half of a typical bill on our benchmarks.
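To make the compounding concrete, here is a minimal sketch of the arithmetic. It is a simplified model, not a billing calculator: it assumes the full window is re-sent as input every turn, and it ignores model output joining the context and any prompt-caching discounts. The session numbers are hypothetical.

```python
def cumulative_input_tokens(new_tokens_per_turn):
    """Total input tokens billed when every turn re-sends the whole window."""
    context = 0  # tokens currently sitting in the window
    billed = 0   # input tokens billed so far
    for new_tokens in new_tokens_per_turn:
        context += new_tokens  # this turn's prompt and tool output join the window
        billed += context      # the entire window is sent as input
    return billed

# Hypothetical 10-turn session with a 200-token prompt each turn.
# In the first variant, turn 1 also loads an 8,000-token file read.
with_read = cumulative_input_tokens([8_200] + [200] * 9)
without_read = cumulative_input_tokens([200] * 10)

print(with_read)                 # 91000
print(without_read)              # 11000
print(with_read - without_read)  # 80000: the 8,000-token read, billed on all 10 turns
```

In this toy session the single file read accounts for 80,000 of the 91,000 input tokens, even though it was loaded only once.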

The four levers

1. Match the model to the task

Sonnet costs about three-fifths of what Opus does per token ($3/$15 vs $5/$25 per million input/output, June 2026) and handles most coding. Reserve the expensive model for genuinely hard reasoning. The model price is a multiplier on every token you spend, so routing is often one of the biggest savings.
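A rough sketch of what routing is worth, using the per-million prices quoted above. The token counts are hypothetical, and `pick_model` is an illustrative placeholder for whatever rule your team uses to decide a task is hard; it is not part of any tool.

```python
# USD per million tokens (input, output), as quoted above for June 2026.
PRICES = {"sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}

def session_cost(model, input_tokens, output_tokens):
    """Cost in USD of a session at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def pick_model(task_is_hard):
    """Hypothetical routing rule: pay for Opus only when the task needs it."""
    return "opus" if task_is_hard else "sonnet"

# Hypothetical session: 2M input tokens, 100k output tokens.
sonnet = session_cost("sonnet", 2_000_000, 100_000)
opus = session_cost("opus", 2_000_000, 100_000)
print(f"sonnet ${sonnet:.2f} vs opus ${opus:.2f}")  # sonnet $7.50 vs opus $12.50
```

Because the price applies to every token, the same ratio holds whatever the session size: at these prices the Sonnet run costs 60% of the Opus run.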

2. Keep context clean

Run /clear when you switch tasks and /compact on long ones (it sheds 60–80% of the active context), and watch /context so you can see what's filling the window. This directly counters context compounding.
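As an illustration of why compacting mid-session pays off, the sketch below models one compaction that sheds 70% of the window (the midpoint of the 60–80% range above). It is a simplified model with hypothetical numbers: it ignores the cost of the compaction step itself, model output, and caching.

```python
def billed_input(new_tokens_per_turn, compact_after_turn=None, shed_percent=70):
    """Input tokens billed over a session, optionally compacting once."""
    context = 0
    billed = 0
    for turn, new_tokens in enumerate(new_tokens_per_turn, start=1):
        context += new_tokens
        billed += context  # the whole window is re-sent this turn
        if turn == compact_after_turn:
            # Keep only the summarised remainder of the window.
            context = context * (100 - shed_percent) // 100
    return billed

# Hypothetical session: 20 turns, 2,000 new tokens per turn.
turns = [2_000] * 20
print(billed_input(turns))                         # 420000
print(billed_input(turns, compact_after_turn=10))  # 280000
```

In this toy session, one compaction at the halfway point removes a third of the total input tokens, because the 14,000 tokens it sheds would otherwise have been re-billed on each of the remaining ten turns.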

3. Tighten prompts and CLAUDE.md

Your CLAUDE.md is re-sent on nearly every turn, so trim it to durable rules; say what you need once. Small per turn, but it compounds across a whole session.
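A back-of-the-envelope sketch of that compounding, assuming the file is re-sent on every turn and using the rough heuristic of about four characters per token. Both are assumptions for illustration; real tokenizers and caching will change the exact figures.

```python
def standing_overhead_tokens(file_text, turns, chars_per_token=4):
    """Rough input tokens spent re-sending a standing file on every turn."""
    approx_tokens = len(file_text) // chars_per_token  # crude size estimate
    return approx_tokens * turns

# Hypothetical files: a 12,000-character CLAUDE.md vs a trimmed 2,000-character one,
# over a 50-turn session.
verbose = "x" * 12_000
lean = "x" * 2_000
print(standing_overhead_tokens(verbose, 50))  # 150000
print(standing_overhead_tokens(lean, 50))     # 25000
```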

4. Automatically compress tool output

The first three levers are things you have to remember every session. The largest cost, tool output, can be handled automatically: intercept the data-heavy tool calls and hand the model a compact, task-relevant summary instead of the raw dump. Lineman does exactly this for Claude Code over the MCP standard, cutting 40%+ of tokens on our benchmarks while holding output quality. Because the bulk never enters context, it's never billed, not once and not on any later turn.
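To show the general idea of compressing tool output, here is a deliberately naive sketch. It is not Lineman's method and it does not show the MCP interception step. It is just a filter that keeps the first and last few lines of a log plus any line matching an error pattern. The function name, the pattern, and the sample log are all hypothetical.

```python
import re

def compress_log(raw, pattern=r"error|fail|warning", head=3, tail=3):
    """Keep the first/last few lines plus any line matching `pattern`."""
    lines = raw.splitlines()
    if len(lines) <= head + tail:
        return raw  # already short; nothing to cut
    wanted = re.compile(pattern, re.IGNORECASE)
    keep = set(range(head)) | set(range(len(lines) - tail, len(lines)))
    keep |= {i for i, line in enumerate(lines) if wanted.search(line)}
    out = []
    previous = -1
    for i in sorted(keep):
        if i - previous > 1:
            # Tell the model how much was dropped so it can ask for more.
            out.append(f"[... {i - previous - 1} lines omitted ...]")
        out.append(lines[i])
        previous = i
    return "\n".join(out)

# Hypothetical test log: 100 lines, one failure in the middle.
log_lines = [f"test_{i} passed" for i in range(100)]
log_lines[50] = "test_50 FAILED: expected 1, got 2"
compressed = compress_log("\n".join(log_lines))
print(len(log_lines), "->", len(compressed.splitlines()))  # 100 -> 9
```

A real compressor has to be task-aware (what is relevant depends on what the model is trying to do), which a fixed regex cannot capture. The point of the sketch is only that the dropped lines never enter context, so they are never billed on this turn or any later one.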

Which fix for which symptom

If you… | Do this | Read
Shocked by a bill, don't know why | Diagnose where the tokens go | Why is my bill so high?
Want a checklist of fixes | Work the levers in impact order | 11 ways to cut token costs
Cost climbs as the session goes on | Understand context compounding | Context compounding explained
Comparing tools / overlapping subscriptions | Compare cost models, then consolidate | Claude Code vs Cursor vs Copilot

In this guide

Diagnostic: Why is my Claude Code bill so high?
Alarmed by the bill? The silent majority of the cost is re-read tool output, not reasoning. How to diagnose it with /context and fix it.

How-to: 11 ways to cut Claude Code token costs without losing quality
Model routing, /clear vs /compact, leaner prompts, and cutting tool-output bloat: what each saves and when it applies.

Explainer: Context compounding: why every Claude Code message costs more than the last
The canonical explainer: per-turn re-billing, why tool output is the worst offender, and what actually stops the compounding.

Comparison: AI coding tool costs in 2026: Claude Code vs Cursor vs Copilot (and how to spend less)
What each tool charges, how they bill, the overlapping-subscriptions trap, and the universal way to spend less.

Frequently asked questions

Why is my Claude Code bill so high?
Most of it is tool output: file reads, build and test logs, and search results loaded into context, not the model's reasoning. On Lineman's benchmarks that's over half a typical bill, and because every token in context is re-billed each turn, the same output is paid for repeatedly.
How much does Claude Code cost per developer per month?
Roughly $150–250 per developer per month across enterprise deployments, or about $13 per developer per active day, with costs under $30 a day for 90% of users (per Anthropic's Claude Code cost docs, code.claude.com/docs/en/costs). Heavy use on large repositories trends higher.
Where do my tokens actually go in Claude Code?
Run /context to see the breakdown. The largest blocks are usually large file reads, long command output, and search results, plus the system prompt and CLAUDE.md. The model's reasoning is typically a minority of the spend.
What is context compounding?
Every turn re-sends the whole conversation, including all earlier tool output, as input. Bulky output read once is re-billed on every later turn, so cost grows with session length even when your prompts stay short.
Why does every Claude Code message cost more than the last?
Because each turn pays for the entire accumulated context, not just your new prompt. As tool output piles up, the per-turn input grows, so later messages cost more than earlier ones.
Why does tool output use so many tokens?
It's large, mostly irrelevant to the reasoning, and sticky: once in context it's re-billed every turn. A single file read or test log can be thousands of tokens the model never needed in full.
How do I reduce Claude Code token usage?
Route easy work to a cheaper model, /clear at task boundaries, /compact long sessions, keep CLAUDE.md lean, and scope file reads. Then compress tool output automatically, which is the largest single lever and needs no workflow change.
Does /clear or /compact save more tokens?
/clear saves more per use because it empties the window, but you lose all context. /compact keeps a summary, cutting 60–80% of active context while preserving continuity. Clear at task boundaries; compact mid-task.
Should I use Sonnet or Opus to save money?
Use the cheapest model that does the job. Sonnet is about three-fifths the price of Opus per token ($3/$15 vs $5/$25 per million input/output, June 2026) and handles most coding. Reserve Opus for genuinely hard reasoning.
Does a larger context window cost more?
Not by itself. You pay for tokens used, not window size. But larger windows make it easy to hold more in context, and every token in context is re-billed each turn, so they tend to raise cost unless you manage what's in them.
Can I reduce token costs without changing my workflow?
Yes. The manual tactics need discipline every session, but compressing tool output before it reaches the model happens automatically: you keep prompting exactly as you do now while the largest cost is cut. That's the lane Lineman occupies.
How do I compress tool output before it reaches the model?
Use a tool that intercepts data-heavy tool calls and returns a compact, task-relevant summary instead of the raw output. Lineman does this for Claude Code via the MCP standard, cutting 40%+ of tokens on its benchmarks while holding output quality.

Grant Unwin

Founder, Lineman

Grant is the founder of Lineman, where he works on cutting the token cost of agentic coding. He writes about how AI coding tools bill, where the spend actually goes, and how to reduce it without losing output quality.