
11 ways to cut Claude Code token costs without losing quality

A complete, honest list of ways to reduce Claude Code token usage, from model routing and context hygiene to the one lever you don't have to remember: automatic tool-output compression.

Grant Unwin · Founder, Lineman

Short answer: To cut Claude Code token costs without losing quality: route easy work to a cheaper model, run /clear at task boundaries, /compact long sessions, keep CLAUDE.md lean, and scope your file reads. The biggest hidden lever is compressing tool output before it reaches the model, so it never gets billed at all.

Here are eleven concrete ways to spend fewer tokens. The first ten are manual tactics, listed roughly in order of impact. For each one I've noted why it works, how much it tends to save, and when it applies. They're all real and worth doing. The eleventh is the tactic you don't have to remember.

1. Match the model to the task

This is the biggest knob you control by hand. Sonnet runs about three-fifths of the price of Opus per token ($3 / $15 vs $5 / $25 per million input/output tokens as of June 2026), a saving of roughly 40%, and it handles most edits, reviews, and routine coding without breaking a sweat. Save Opus for genuinely hard reasoning. Because model choice multiplies every token you spend, it's often the largest of the manual savings, and it applies any time the work isn't a hard reasoning problem.
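
To make the multiplier concrete, here's a minimal sketch that prices the same session on both models. The per-million prices are the ones quoted above; the session size (2M input tokens, 100k output tokens) is a made-up example, and the function name is mine, not part of any tool.

```python
# USD per million tokens, as quoted in this article (June 2026).
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 5.00, "output": 25.00},
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one session at the listed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical session size, for illustration only.
sonnet = session_cost("sonnet", input_tokens=2_000_000, output_tokens=100_000)
opus = session_cost("opus", input_tokens=2_000_000, output_tokens=100_000)
print(f"Sonnet ${sonnet:.2f} vs Opus ${opus:.2f}")
```

At these prices the hypothetical session costs $7.50 on Sonnet and $12.50 on Opus. Because both the input and output rates differ by the same factor, the ratio is the same whatever mix of tokens you plug in.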

2. Run /clear at task boundaries

/clear resets the context window to empty. The moment you finish one task and start an unrelated one, everything from the old task is dead weight that's still being re-billed every turn. Clearing removes the whole accumulated context at once, so the saving is large. Use it at a genuine boundary, once you no longer need the earlier history.
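
Here's a rough sketch of why dead weight is expensive. It assumes the simplest possible billing model: every turn re-sends the whole accumulated context as input, and each turn adds a fixed number of tokens. Real sessions vary, and prompt caching lowers the price of repeated context, so treat the numbers as a shape, not a bill. The function and the turn sizes are hypothetical.

```python
def billed_input_tokens(turn_sizes, clear_before=()):
    """Sum the input tokens billed when each turn re-sends the full context.

    turn_sizes   -- tokens added to the context by each turn
    clear_before -- turn indexes at which the context is reset first
    """
    context = 0
    billed = 0
    for index, size in enumerate(turn_sizes):
        if index in clear_before:
            context = 0  # simulate /clear: drop the accumulated history
        context += size
        billed += context  # the whole context is sent again this turn
    return billed


turns = [5_000] * 20  # hypothetical: 20 turns, each adding 5k tokens
no_clear = billed_input_tokens(turns)
with_clear = billed_input_tokens(turns, clear_before={10})
print(no_clear, with_clear)
```

Under these assumptions, one clear at the halfway point takes the session from 1,050,000 billed input tokens to 550,000. The growth is quadratic in session length, which is why a single reset at a real task boundary saves so much.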

3. Use /compact on long tasks

/compact swaps the running transcript for a summary, keeping continuity while shedding the bulk. It usually cuts 60–80% of the active context, which adds up fast on a long session. Reach for it mid-task, when you still need the thread but it's grown heavy. (Not sure whether to clear or compact? /clear saves more but loses everything; /compact keeps the gist.)

4. Watch /context and act on it

You can't fix what you can't see. /context shows what's filling your window at any moment. It doesn't save tokens on its own, but it tells you which of these other levers to pull, so run it whenever a session starts to feel expensive or slow. There's more on reading the breakdown in "Why is my Claude Code bill so high".

5. Keep CLAUDE.md lean

CLAUDE.md is re-sent on nearly every turn, so every paragraph you add becomes a recurring tax. Keep it to durable, high-value rules and cut anything situational. The per-turn saving is small, but it compounds across a whole session, so it's worth a periodic review.
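
A back-of-envelope sketch of that recurring tax, assuming the file is sent on every turn and using the rough rule of thumb of about four characters per token. Both are assumptions: the real tokenizer will give different counts, and caching changes what repeated tokens cost. The helper name is hypothetical.

```python
def recurring_tokens(text: str, turns: int, chars_per_token: float = 4.0) -> int:
    """Estimate the tokens a always-included file adds over a session.

    Uses a crude characters-per-token heuristic, not a real tokenizer.
    """
    estimated_tokens = len(text) / chars_per_token
    return round(estimated_tokens * turns)


# Stand-in for an 8,000-character CLAUDE.md over a 50-turn session.
claude_md = "x" * 8_000
session_total = recurring_tokens(claude_md, turns=50)
print(session_total)
```

With these made-up inputs, a file of about 2,000 tokens adds up to roughly 100,000 tokens over 50 turns. Halving the file halves that figure, which is the whole argument for a periodic trim.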

6. Tighten your prompts

Long, repetitive prompts cost tokens up front and then add to the context that gets re-billed later. Say what you need once, clearly. The saving here is modest, but the habit pays off, especially if you tend to paste a wall of context into every message.

7. Read files in scoped ranges, not whole

Reading a whole file to use three functions drags in everything else with it. Ask for the specific symbol, function, or line range instead. On big files that's a large saving, and it applies any time you only need part of a file.
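
To show what a scoped read looks like in practice, here's a small sketch that pulls a single function or class out of Python source using the standard library's ast module, so only that symbol needs to go into context. This is an illustration of the idea, not how Claude Code reads files internally; extract_symbol and the sample source are hypothetical.

```python
import ast
from typing import Optional


def extract_symbol(source: str, name: str) -> Optional[str]:
    """Return the source text of the named function or class, or None."""
    tree = ast.parse(source)
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(tree):
        if isinstance(node, wanted) and node.name == name:
            # Slice out exactly the lines and columns this node spans.
            return ast.get_source_segment(source, node)
    return None


SAMPLE = '''
def a():
    return 1

def b(x):
    return x * 2

def c():
    return 3
'''

snippet = extract_symbol(SAMPLE, "b")
print(snippet)
```

Here only the two lines of b come back instead of the whole file. On a real module with hundreds of lines, the same request shape ("just this symbol") is where the saving comes from.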

8. Don't re-read what's already in context

Agents will sometimes re-open a file they already read earlier in the session, paying for it a second time. A quick "you already have this open" keeps the duplicate out. Over a long session that keeps touching the same files, it adds up to a moderate saving.

9. Prefer targeted search over broad dumps

A wide grep or a "show me everything" can return thousands of low-signal lines. Narrow the pattern and the path. On a sizeable codebase that's a large saving on any search.
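
As a sketch of "narrow the pattern and the path", the function below searches only files matching a glob under one directory, uses a specific regex, and stops after a fixed number of hits. The function name, the default glob, and the cap of 20 are all arbitrary choices for illustration.

```python
import re
from pathlib import Path


def targeted_search(root, pattern, glob="**/*.py", max_hits=20):
    """Return up to max_hits 'path:line: text' matches under root."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits  # cap the output instead of dumping everything
    return hits


# Example call (the directory and pattern are placeholders):
# targeted_search("src/billing", r"def parse_config\b")
```

The cap matters as much as the pattern. A search that can return at most 20 short lines has a known worst-case cost, while an unbounded one does not.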

10. Hand data-heavy loops to subagents

The grep → read → grep → read loops, and the "run the tests, then classify the failures" loops, burn primary-model context on shuffling data around. Delegate those loops to a subagent and the expensive model stays focused on decisions. It's a meaningful saving on exploration-heavy or triage work.

11. Compress tool output automatically

Every tactic above is something you have to remember to do. The largest single cost, tool output at over half a typical bill, can instead be handled for you: intercept the data-heavy tool calls and hand the model a compact summary in place of the raw dump. That's what Lineman does, cutting 40%+ of tokens on our benchmarks while holding output quality. Because the bulk never enters context, it's never billed, not once and not on any later turn (the context-compounding trap). It's the biggest single lever here, it runs automatically, and it needs no change to how you already work.
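
To illustrate the general idea of compressing tool output, here's a deliberately simple sketch that turns verbose test-runner output into a one-line tally plus the failing lines. This is not Lineman's implementation, which isn't described here. The function name, the PASSED/FAILED/ERROR markers, and the line cap are assumptions chosen to keep the example small.

```python
def compress_test_output(raw: str, max_lines: int = 20) -> str:
    """Summarize test output: a tally line plus the failing lines only."""
    lines = raw.splitlines()
    failures = [ln for ln in lines if "FAILED" in ln or "ERROR" in ln]
    passed = sum(1 for ln in lines if "PASSED" in ln)

    summary = [
        f"{passed} passed, {len(failures)} failed or errored "
        f"({len(lines)} raw lines)"
    ]
    summary.extend(failures[:max_lines])
    if len(failures) > max_lines:
        summary.append(f"... {len(failures) - max_lines} more omitted")
    return "\n".join(summary)


# Hypothetical raw output: 50 passing tests and one failure.
raw_output = "\n".join(
    [f"tests/test_a.py::test_{i} PASSED" for i in range(50)]
    + ["tests/test_b.py::test_x FAILED"]
)
compact = compress_test_output(raw_output)
print(compact)
```

In this toy case 51 lines become 2, and the line the model actually needs (the failure) survives intact. A real compressor has to be much more careful about what counts as signal, but the billing logic is the same: whatever never enters context is never re-sent.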

Where to start

Diagnose first with /context, then pull the levers in impact order: model routing, clear and compact, scoped reads. After that, make the biggest cost (tool output) disappear automatically so you're not policing it by hand. Once you know your real usage, compare plans on the pricing page.

Frequently asked questions

How do I reduce Claude Code token usage?
The biggest levers are: route easy work to a cheaper model, run /clear when you switch tasks, /compact long sessions, keep CLAUDE.md lean, and cut tool-output bloat, the largest hidden cost. Compressing tool output automatically tends to save the most, because it's the one lever you don't have to remember to pull on every task.
Does /clear or /compact save more tokens?
/clear saves more per use because it resets the context window to empty, but you lose all prior context. /compact keeps a summary of the conversation, typically cutting 60–80% of the active context while preserving continuity. Use /clear at a genuine task boundary; use /compact mid-task when you still need the history.
Should I use Sonnet or Opus to save money?
Use the cheapest model that completes the task well. Sonnet costs roughly three-fifths of what Opus does per token ($3/$15 vs $5/$25 per million input/output, June 2026) and handles most edits, reviews, and routine coding. Reserve Opus for genuinely hard reasoning. Matching model tier to task difficulty is one of the largest single savings available.

Grant Unwin

Founder, Lineman

Grant is the founder of Lineman, where he works on cutting the token cost of agentic coding. He writes about how AI coding tools bill, where the spend actually goes, and how to reduce it without losing output quality.
