How do I reduce Claude Code token usage?

The biggest levers are routing easy work to a cheaper model, running /clear when you switch tasks, running /compact on long sessions, keeping CLAUDE.md lean, and cutting tool-output bloat, which is the largest hidden cost. Compressing tool output automatically tends to save the most, because it's the one lever you don't have to remember to pull on every task.

Does /clear or /compact save more tokens?

/clear saves more per use because it resets the context window to empty, but you lose all prior context. /compact keeps a summary of the conversation, typically cutting 60–80% of the active context while preserving continuity. Use /clear at a genuine task boundary; use /compact mid-task when you still need the history.

11 ways to cut Claude Code token costs without losing quality

Short answer: route easy work to a cheaper model, run /clear at task boundaries, run /compact on long sessions, keep CLAUDE.md lean, and scope your file reads. The biggest hidden lever is compressing tool output before it reaches the model, so it never gets billed at all.

Here are eleven concrete ways to spend fewer tokens, roughly in order of impact. For each one I've noted why it works, how much it tends to save, and when it applies. The manual tactics are all real and worth doing. The last one is the tactic you don't have to remember.

1. Match the model to the task

This is the single biggest knob. At the prices quoted here, Sonnet runs about three-fifths of the price of Opus per token ($3 / $15 vs $5 / $25 per million input/output tokens as of June 2026) and handles most edits, reviews, and routine coding without breaking a sweat. Save Opus for genuinely hard reasoning. Because model choice multiplies every token you spend, it's often the largest saving on the list, and it applies any time the work isn't a hard reasoning problem.
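To make the multiplier concrete, here is a minimal sketch that prices one session on each model using the per-million-token figures quoted above. The workload (2M input tokens, 200k output tokens) and the `session_cost` helper are made up for illustration; plug in your own numbers.

```python
# Prices in USD per million tokens, copied from the figures quoted in this article.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 5.00, "output": 25.00},
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a session with the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload, not a measurement.
sonnet = session_cost("sonnet", 2_000_000, 200_000)
opus = session_cost("opus", 2_000_000, 200_000)
print(f"Sonnet ${sonnet:.2f} vs Opus ${opus:.2f}")
```

The same token counts cost less on the cheaper model, and the gap scales linearly with everything else you do in the session.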

2. Run `/clear` at task boundaries

/clear resets the context window to empty. The moment you finish one task and start an unrelated one, everything from the old task is dead weight that's still being re-billed every turn. Clearing removes the whole accumulated context at once, so the saving is large. Use it at a genuine boundary, once you no longer need the earlier history.
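A rough way to see the "re-billed every turn" effect is to add up the input tokens sent over a run of turns, with and without stale context riding along. This is a simplified model with made-up numbers: it assumes the whole window is re-sent each turn and ignores prompt caching, which can lower the price of re-sent tokens.

```python
def billed_input_tokens(stale_tokens: int, new_tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent over `turns` turns.

    `stale_tokens` is leftover context from a previous task that is
    re-sent on every turn until it is cleared.
    """
    total = 0
    context = stale_tokens
    for _ in range(turns):
        context += new_tokens_per_turn  # each turn adds fresh material
        total += context                # the whole window goes out again
    return total


# Hypothetical session: 50k tokens of old-task context, 20 turns of new work.
kept = billed_input_tokens(stale_tokens=50_000, new_tokens_per_turn=2_000, turns=20)
cleared = billed_input_tokens(stale_tokens=0, new_tokens_per_turn=2_000, turns=20)
print(f"without /clear: {kept:,}  with /clear: {cleared:,}  saved: {kept - cleared:,}")
```

In this toy model the stale 50k tokens are paid for once per turn, so the saving grows with the number of turns that follow the boundary.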

3. Use `/compact` on long tasks

/compact swaps the running transcript for a summary, keeping continuity while shedding the bulk. It usually cuts 60–80% of the active context, which adds up fast on a long session. Reach for it mid-task, when you still need the thread but it's grown heavy. (Not sure whether to clear or compact? /clear saves more but loses everything; /compact keeps the gist.)
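As a quick worked example of the 60–80% range, the sketch below computes what is left of a window after compaction. The 120k-token starting size is hypothetical, and `compacted_size` is just arithmetic, not anything Claude Code exposes.

```python
def compacted_size(context_tokens: int, reduction_pct: int) -> int:
    """Context size after removing `reduction_pct` percent of it."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return context_tokens * (100 - reduction_pct) // 100


window = 120_000  # hypothetical active context before compaction
low = compacted_size(window, 80)   # best case in the quoted range
high = compacted_size(window, 60)  # worst case in the quoted range
print(f"{window:,} tokens -> between {low:,} and {high:,} after /compact")
```

Every later turn then carries the smaller window, which is where the saving accumulates.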

4. Watch `/context` and act on it

You can't fix what you can't see. /context shows what's filling your window at any moment. It doesn't save tokens on its own, but it tells you which of these other levers to pull, so run it whenever a session starts to feel expensive or slow. There's more on reading the breakdown in "Why is my Claude Code bill so high?".

5. Keep `CLAUDE.md` lean

CLAUDE.md is re-sent on nearly every turn, so every paragraph you add becomes a recurring tax. Keep it to durable, high-value rules and cut anything situational. The per-turn saving is small, but it compounds across a whole session, so it's worth a periodic review.

6. Tighten your prompts

Long, repetitive prompts cost tokens up front and then add to the context that gets re-billed later. Say what you need once, clearly. The saving here is modest, but the habit pays off, especially if you tend to paste a wall of context into every message.

7. Read files in scoped ranges, not whole

Reading a whole file to use three functions drags in everything else with it. Ask for the specific symbol, function, or line range instead. On big files that's a large saving, and it applies any time you only need part of a file.
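To illustrate how much smaller a scoped read can be, here is a small sketch that pulls one function out of a Python file with the standard `ast` module. It is not how Claude Code reads files; `extract_function` and the sample source are hypothetical, and in practice you would simply ask for the function or line range in your prompt.

```python
import ast


def extract_function(source: str, name: str) -> str:
    """Return the source text of the first function called `name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(name)


# Hypothetical file contents, standing in for a much larger module.
SAMPLE = '''\
import os

def load(path):
    with open(path) as fh:
        return fh.read()

def parse(text):
    return [line.split(",") for line in text.splitlines()]

def unused_helper():
    return os.getcwd()
'''

snippet = extract_function(SAMPLE, "parse")
print(snippet)
print(f"{len(snippet)} of {len(SAMPLE)} characters")
```

Only the requested function reaches the reader; the imports and unrelated helpers stay out.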

8. Don't re-read what's already in context

Agents will sometimes re-open a file they already read earlier in the session, paying for it a second time. A quick "you already have this open" keeps the duplicate out. Over a long session that keeps touching the same files, it adds up to a moderate saving.

9. Prefer targeted search over broad dumps

A wide grep or a "show me everything" can return thousands of low-signal lines. Narrow the pattern and the path. On a sizeable codebase that's a large saving on any search.
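The difference between a broad and a targeted search can be sketched in a few lines. The `search` helper, the directory layout, and the file contents below are all invented for the demonstration; the point is only that tightening both the pattern and the path shrinks what comes back.

```python
import re
import tempfile
from pathlib import Path


def search(root: Path, pattern: str, glob: str = "**/*", max_hits: int = 50) -> list[str]:
    """Return 'path:line: text' for lines under `root` matching `pattern`."""
    regex = re.compile(pattern)
    hits: list[str] = []
    for path in sorted(root.glob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append(f"{rel}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits  # cap the output instead of dumping everything
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "docs").mkdir()
    (root / "src" / "auth.py").write_text(
        'def login(user):\n    return f"login ok: {user}"\n\n'
        'def logout(user):\n    return f"logout ok: {user}"\n',
        encoding="utf-8",
    )
    (root / "docs" / "notes.md").write_text(
        "The login flow calls login() and then redirects.\nlogin errors are logged.\n",
        encoding="utf-8",
    )

    broad = search(root, "login")                          # every mention, everywhere
    narrow = search(root, r"^def login\b", "src/**/*.py")  # the definition only

print(len(broad), "broad hits;", len(narrow), "narrow hit")
print(narrow[0])
```

On a real codebase the broad query is the one that returns thousands of lines, so the same narrowing pays off far more than it does in this toy tree.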

10. Hand data-heavy loops to subagents

The grep → read → grep → read loops, and the "run the tests, then classify the failures" loops, burn primary-model context on shuffling data around. Delegate those loops to a subagent and the expensive model stays focused on decisions. It's a meaningful saving on exploration-heavy or triage work.

11. Compress tool output automatically

Every tactic above is something you have to remember to do. The largest single cost, tool output at over half a typical bill, can instead be handled for you: intercept the data-heavy tool calls and hand the model a compact summary in place of the raw dump. That's what Lineman does, cutting 40%+ of tokens on our benchmarks while holding output quality. Because the bulk never enters context, it's never billed, not once and not on any later turn (the context-compounding trap). It's the biggest single lever here, it runs automatically, and it needs no change to how you already work.
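As a rough illustration of the idea (and explicitly not Lineman's implementation), the sketch below collapses a verbose test log into a count plus the failing lines. The log format, where each line ends in PASSED or FAILED, and the `compress_test_output` helper are assumptions made for the example.

```python
def compress_test_output(raw: str, max_failures: int = 20) -> str:
    """Collapse a verbose test log into counts plus the failing lines only."""
    passed = 0
    failures: list[str] = []
    for line in raw.splitlines():
        if line.endswith("PASSED"):
            passed += 1
        elif line.endswith("FAILED"):
            failures.append(line)

    shown = failures[:max_failures]
    summary = [f"{passed} passed, {len(failures)} failed"]
    summary.extend(shown)
    if len(failures) > len(shown):
        summary.append(f"... {len(failures) - len(shown)} more failures omitted")
    return "\n".join(summary)


# Hypothetical raw tool output: 200 passing tests and one failure.
raw_log = "\n".join(
    [f"tests/test_api.py::test_case_{i} PASSED" for i in range(200)]
    + ["tests/test_api.py::test_timeout FAILED"]
)

compact = compress_test_output(raw_log)
print(compact)
print(len(raw_log), "->", len(compact), "characters")
```

The model still learns what it needs to act on (one failure, and which test), while the 200 passing lines never enter the context.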

Where to start

Diagnose first with /context, then pull the levers in impact order: model routing, clear and compact, scoped reads. After that, make the biggest cost (tool output) disappear automatically so you're not policing it by hand. Once you know your real usage, compare plans on the pricing page.
