Short answer: to cut Claude Code token costs without losing quality, route easy work to a cheaper model, run /clear at task boundaries, /compact long sessions, keep CLAUDE.md lean, and scope your file reads. The biggest hidden lever is compressing tool output before it reaches the model, so it never gets billed at all.
Here are eleven concrete ways to spend fewer tokens, roughly in order of impact. For each one I've noted why it works, how much it tends to save, and when it applies. The manual tactics are all real and worth doing. The last one is the tactic you don't have to remember.
1. Match the model to the task
This is the biggest manual knob. Sonnet costs about 60% of what Opus does per token ($3 / $15 vs $5 / $25 per million input/output tokens as of June 2026) and handles most edits, reviews, and routine coding without breaking a sweat. Save Opus for genuinely hard reasoning. Because model choice multiplies every token you spend, it's often the largest saving on the list, and it applies any time the work isn't a hard reasoning problem.
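To see what the price gap means for your own workload, a rough calculation like the one below can help. It uses only the per-million prices quoted above; the token counts are a hypothetical workload, and the function name is made up for illustration.

```python
# Prices in USD per million tokens, taken from the figures quoted above.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 5.00, "output": 25.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for one model at the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 200k output tokens.
sonnet = cost_usd("sonnet", 2_000_000, 200_000)
opus = cost_usd("opus", 2_000_000, 200_000)
print(f"Sonnet: ${sonnet:.2f}  Opus: ${opus:.2f}")
```

Swap in your own token counts to see how much of your bill is model choice rather than volume.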
2. Run /clear at task boundaries
/clear resets the context window to empty. The moment you finish one task and start an unrelated one, everything from the old task is dead weight that's still being re-billed every turn. Clearing removes the whole accumulated context at once, so the saving is large. Use it at a genuine boundary, once you no longer need the earlier history.
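The re-billing effect is easy to underestimate, so here is a small sketch of it. It assumes a simplified model where every turn re-sends the whole accumulated context at full price and ignores any caching discounts; the turn sizes are hypothetical.

```python
def billed_input_tokens(turn_sizes, clear_before=()):
    """Sum input tokens billed when each turn re-sends the whole context.

    turn_sizes: tokens added to the context on each turn.
    clear_before: 0-based turn indexes where the context is emptied first.
    """
    context = 0
    billed = 0
    for index, size in enumerate(turn_sizes):
        if index in clear_before:
            context = 0  # simulate clearing at a task boundary
        context += size
        billed += context  # the full context is sent again this turn
    return billed


# Hypothetical session: 20 turns, 2,000 new tokens per turn,
# with an unrelated second task starting at turn 10.
turns = [2_000] * 20
without_clear = billed_input_tokens(turns)
with_clear = billed_input_tokens(turns, clear_before={10})
print(without_clear, with_clear)
```

In this toy model the billed input grows with the square of the session length, which is why one clear at the boundary removes nearly half the total.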
3. Use /compact on long tasks
/compact swaps the running transcript for a summary, keeping continuity while shedding the bulk. It usually cuts 60–80% of the active context, which adds up fast on a long session. Reach for it mid-task, when you still need the thread but it's grown heavy. (Not sure whether to clear or compact? /clear saves more but loses everything; /compact keeps the gist.)
4. Watch /context and act on it
You can't fix what you can't see. /context shows what's filling your window at any moment. It doesn't save tokens on its own, but it tells you which of these other levers to pull, so run it whenever a session starts to feel expensive or slow. There's more on reading the breakdown in "Why is my Claude Code bill so high?"
5. Keep CLAUDE.md lean
CLAUDE.md is re-sent on nearly every turn, so every paragraph you add becomes a recurring tax. Keep it to durable, high-value rules and cut anything situational. The per-turn saving is small, but it compounds across a whole session, so it's worth a periodic review.
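A back-of-the-envelope estimate shows how the recurring tax compounds. This sketch assumes roughly four characters per token, which is only a rule of thumb and not a real tokenizer, and it assumes the file is re-sent on every turn; the file sizes are hypothetical.

```python
def recurring_tokens(text: str, turns: int, chars_per_token: float = 4.0) -> int:
    """Estimate total tokens billed for text that is re-sent on every turn."""
    return round(len(text) / chars_per_token * turns)


# Hypothetical instruction files: 8,000 characters vs a trimmed 2,000.
before = recurring_tokens("x" * 8_000, turns=50)
after = recurring_tokens("x" * 2_000, turns=50)
print(before, after, before - after)
```

To try it on a real file, pass in the contents of your own CLAUDE.md and a turn count that matches a typical session.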
6. Tighten your prompts
Long, repetitive prompts cost tokens up front and then add to the context that gets re-billed later. Say what you need once, clearly. The saving here is modest, but the habit pays off, especially if you tend to paste a wall of context into every message.
7. Read files in scoped ranges, not whole
Reading a whole file to use three functions drags in everything else with it. Ask for the specific symbol, function, or line range instead. On big files that's a large saving, and it applies any time you only need part of a file.
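As an illustration of what "scoped" means in practice, the sketch below pulls one function or one line range out of a Python source string using only the standard library. The helper names and the sample module are hypothetical; this shows the idea, not how Claude Code reads files internally.

```python
import ast
from itertools import islice


def function_source(source: str, name: str):
    """Return the source text of the named function, or None if absent."""
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            return ast.get_source_segment(source, node)
    return None


def line_range(lines, start: int, end: int):
    """Return lines start..end (1-based, inclusive) from an iterable of lines."""
    return list(islice(lines, start - 1, end))


# Hypothetical module with three functions; only one is needed.
module = (
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "def parse(text):\n"
    "    return text.split()\n"
    "\n"
    "def save(path, data):\n"
    "    open(path, 'w').write(data)\n"
)

snippet = function_source(module, "parse")
print(snippet)
print(line_range(module.splitlines(), 4, 5))
```

Either route hands over two lines instead of eight here; on a file with thousands of lines the gap is far larger.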
8. Don't re-read what's already in context
Agents will sometimes re-open a file they already read earlier in the session, paying for it a second time. A quick "you already have this open" keeps the duplicate out. Over a long session that keeps touching the same files, it adds up to a moderate saving.
9. Prefer targeted search over broad dumps
A wide grep or a "show me everything" can return thousands of low-signal lines. Narrow the pattern and the path. On a sizeable codebase that's a large saving on any search.
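Narrowing a search comes down to three limits: where you look, what you match, and how many hits you accept. The sketch below shows all three in a small standard-library search helper. The function name, parameters, and sample files are hypothetical and only illustrate the shape of a targeted search.

```python
import re
import tempfile
from pathlib import Path


def search(root, pattern, glob="**/*.py", max_hits=20):
    """Return up to max_hits 'path:line: text' matches under root."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for lineno, line in enumerate(handle, 1):
                if regex.search(line):
                    hits.append(f"{path}:{lineno}: {line.rstrip()}")
                    if len(hits) >= max_hits:
                        return hits  # stop early instead of dumping everything
    return hits


# Hypothetical project: one source file and one notes file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "auth.py").write_text("def login(user):\n    return user\n", encoding="utf-8")
    (src / "notes.txt").write_text("login notes\n", encoding="utf-8")

    broad = search(src, r"login", glob="*")
    narrow = search(src, r"def login\b", glob="*.py")

print(len(broad), len(narrow))
```

The hit cap matters as much as the pattern: it bounds the worst case when a pattern turns out to be broader than expected.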
10. Hand data-heavy loops to subagents
The grep → read → grep → read loops, and the "run the tests, then classify the failures" loops, burn primary-model context on shuffling data around. Delegate those loops to a subagent and the expensive model stays focused on decisions. It's a meaningful saving on exploration-heavy or triage work.
11. Compress tool output automatically
Every tactic above is something you have to remember to do. The largest single cost, tool output at over half a typical bill, can instead be handled for you: intercept the data-heavy tool calls and hand the model a compact summary in place of the raw dump. That's what Lineman does, cutting 40%+ of tokens on our benchmarks while holding output quality. Because the bulk never enters context, it's never billed, not once and not on any later turn, which sidesteps the context-compounding trap. It's the biggest single lever here, it runs automatically, and it needs no change to how you already work.
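To make the idea concrete, here is a minimal sketch of summarizing a noisy test log before it reaches the model. It is not Lineman's implementation, which this article doesn't describe. The log format and function name are hypothetical, and a real compressor would need to handle many more tool types.

```python
def summarize_test_output(raw: str, max_lines: int = 20) -> str:
    """Keep a pass/fail count and the failing lines; drop everything else."""
    lines = raw.splitlines()
    failures = [line for line in lines if "FAILED" in line or "ERROR" in line]
    passed = sum(1 for line in lines if "PASSED" in line)

    summary = [f"{passed} passed, {len(failures)} failed or errored"]
    summary.extend(failures[:max_lines])
    if len(failures) > max_lines:
        summary.append(f"... {len(failures) - max_lines} more omitted")
    return "\n".join(summary)


# Hypothetical log: 200 passing tests and one failure.
raw = "\n".join(
    [f"tests/test_mod.py::test_{i} PASSED" for i in range(200)]
    + ["tests/test_mod.py::test_x FAILED - AssertionError"]
)
compact = summarize_test_output(raw)
print(compact)
print(len(raw), "->", len(compact), "characters")
```

The model still sees everything it needs to act on (what failed and why), while the 200 passing lines never enter context.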
Where to start
Diagnose first with /context, then pull the levers in impact order: model routing, clear and compact, scoped reads. After that, make the biggest cost (tool output) disappear automatically so you're not policing it by hand. Once you know your real usage, compare plans on the pricing page.