
More on Tokens: How Extended Thinking Gets You Deep Reasoning Without Killing Your Context Window

Extended thinking uses a separate token budget that doesn't bloat your context window. Here's what that means and why it matters.

Tags: AI, Claude, Token Management, Learning in Public, Claude Code, Developer Productivity


Published: December 4, 2025 • 6 min read

If you haven't read this blog post yet, I suggest you stop here and head to it first.

Are you done reading it now? Well great!

In that post, I told you that there was so much more I was learning about tokens. In this post, we are going to explore how tokens are managed for Extended Thinking.


Quick Recap

We have established that when you send messages to an LLM and it responds to you, both of these (input and output) use up tokens available within the context window of that specific chat.

But what happens when you turn on Extended Thinking?


The Extended Thinking Feature

Hopefully you have read this blog post where I talked about the Extended Thinking feature and how it lets you read your LLM's mind.

Now the natural question to ask at this point is: does the text generated in that dropdown where Claude does its thinking use up tokens?


What I'm Learning

Here is what I am learning:

Extended thinking uses a separate "thinking budget" of tokens.

What this means:

  • Thinking tokens are still generated, so they still count toward billing and usage
  • However, they are NOT carried forward in the context window
  • They do not influence responses to subsequent messages
  • They only influence the response to the specific message for which they were generated
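For anyone curious what this looks like when calling Claude through the API: Anthropic's Messages API exposes extended thinking as a `thinking` parameter with its own token budget. Here's a minimal sketch of the request payload — we only build the dictionary (actually sending it requires an API key), and the model name and budget values are just examples:

```python
# Sketch of an extended-thinking request payload for the Anthropic
# Messages API. We only build the dict here; sending it requires a
# client and an API key. Model name and budgets are example values.
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 16000,
    # The thinking budget is separate from the visible response:
    # it caps how many reasoning tokens the model may generate.
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [
        {"role": "user", "content": "Why is this query slow?"}
    ],
}

print(request["thinking"]["budget_tokens"])  # 10000
```

The point of the sketch: the reasoning budget is configured separately from `max_tokens`, which is exactly the "separate thinking budget" idea above.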

Visualizing the Token Flow

Here's a good way to visualize it:

Message 1: You ask a complex question
├── Claude thinks: ~5,000 tokens of reasoning (you see this in the dropdown)
└── Claude responds: ~500 tokens

Message 2: Context window contains:
├── Your original question
├── The 500-token response
└── The 5,000 thinking tokens are GONE
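The flow above can be written as a tiny accounting simulation. The token counts are illustrative, not measured — the point is just which numbers persist and which don't:

```python
# Toy accounting for the two-message flow above. Token counts are
# illustrative. Thinking tokens still cost money as output tokens,
# but only the prompt and response persist into the next turn.
def run_turn(context_tokens, prompt, thinking, response):
    billed = thinking + response          # thinking tokens are still billed
    context_tokens += prompt + response   # ...but they don't persist
    return context_tokens, billed

context = 0
context, billed_1 = run_turn(context, prompt=50, thinking=5000, response=500)

print(billed_1)  # 5500 tokens billed for message 1
print(context)   # 550 tokens carried into message 2 -- the 5,000 are gone
```

Message 2 starts from a 550-token context, not a 5,550-token one. That gap is the whole feature.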

The Key Implication

This has a significant implication:

You can have Claude think extensively without accelerating toward conversation limits.

However, there's a tradeoff:

  • If you need Claude to reference its reasoning later, it can't
  • The thinking tokens only last for that specific response
  • They disappear after the response is generated

Bottom line: Extended thinking lets you take advantage of deep reasoning without bloating your context window. Depth of analysis does not cost you conversation longevity.
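To put a (hypothetical) dollar figure on the example above: thinking tokens are billed at the output-token rate, so a 5,000-token reasoning pass plus a 500-token reply costs the same as a 5,500-token reply. The price below is an assumed $15 per million output tokens — check current pricing before relying on it:

```python
# Rough cost arithmetic for the example above, at an assumed output
# price of $15 per million tokens. Thinking tokens are billed at the
# output rate even though they don't persist in the context window.
PRICE_PER_TOKEN = 15 / 1_000_000

thinking_tokens = 5000
response_tokens = 500

cost = (thinking_tokens + response_tokens) * PRICE_PER_TOKEN
print(f"${cost:.4f}")  # $0.0825
```

So the tradeoff is money-per-message, not context-per-conversation.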


Hey, Future Prisca Here! (December 8, 2025)

I promised I would write more if I got new information, and I did! The section below contains inaccuracies about verbose output being equivalent to Extended Thinking. They are actually different features. Read this follow-up post for the corrected explanation.


The Verbose Output Mystery

Remember how I talked about turning on verbose output in Claude Code in the terminal, and how I said it was the equivalent of the optional Extended Thinking feature you get with Claude on the web?

Today, I am realizing that I may have been wrong about that.

There is still so much I am unsure about when it comes to the verbose output feature, but let me tell you what I have been thinking about.


Extended Thinking: On vs Off

First, when we turn on Extended Thinking on the web:

  • Claude generates thinking tokens
  • Then generates response tokens

But when it is turned off:

  • Claude generates the response tokens directly
  • The thinking does not happen "under the hood" where we can't see it
  • It simply does not happen at all
  • The model just goes straight to predicting the response tokens

The Question About Verbose Output

Now, with verbose output, we don't exactly turn it on or off. We set it to true or false.

This technically should mean the same thing as on or off, but it still makes me wonder:

Does Extended Thinking always happen when using Claude Code?

Two possibilities:

  1. The thinking tokens are always generated, and we only get to see the model's thoughts when we set verbose output to true
  2. The thinking tokens only get generated when you set verbose output to true
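One way to tell these two possibilities apart would be to compare billed output tokens for the same prompt with verbose output true and false. If thinking always happens (possibility 1), the usage should be similar either way; if thinking only happens when verbose is true (possibility 2), the verbose-off run should be much cheaper. The numbers below are made up for illustration — real ones would come from the API's usage report:

```python
# Sketch of how billed-token usage could distinguish the two
# possibilities. The token counts passed in are made up; real
# values would come from the API's usage report for the same prompt.
def likely_possibility(tokens_verbose_on, tokens_verbose_off):
    # Possibility 1: thinking always happens, so verbose off bills
    # roughly the same output tokens as verbose on.
    if tokens_verbose_off >= 0.8 * tokens_verbose_on:
        return 1
    # Possibility 2: thinking is skipped, so verbose off is much cheaper.
    return 2

print(likely_possibility(5500, 5400))  # similar usage -> possibility 1
print(likely_possibility(5500, 600))   # much cheaper  -> possibility 2
```

I haven't run this comparison myself yet — it's just the experiment I'd reach for.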

My Best Hypothesis

I'll conclude this post with my best hypothesis so far.

Claude Code is designed for complex coding tasks, so I think it is more likely that Extended Thinking is enabled by default in its API calls and the verbose output setting simply controls visibility.

Here's my reasoning:

  • Coding tasks benefit significantly from reasoning before acting - we literally use Claude Code to make file changes that are potentially destructive, so thinking first is safer
  • The setting is called "verbose OUTPUT" - not "enable thinking" - this suggests it controls what's shown, not what happens

What I'm Still Uncertain About

There is definitely a lot that I am still uncertain about when it comes to this feature. If I get new information, I will write more about it.

But this was fun regardless.

As always, thanks for reading!
