More on Tokens: How Extended Thinking Gets You Deep Reasoning Without Killing Your Context Window
Published: December 4, 2025 • 6 min read
If you are reading this and you haven't read this blog post yet, I suggest you stop now and head over to it first.
Are you done reading it now? Well great!
In that post, I told you that there was so much more I was learning about tokens. In this post, we are going to explore how tokens are managed for Extended Thinking.
Quick Recap
We have established that when you send messages to an LLM and it responds to you, both of these (input and output) use up tokens available within the context window of that specific chat.
But what happens when you turn on Extended Thinking?
The Extended Thinking Feature
Hopefully you have read this blog post where I talked about the Extended Thinking feature and how it lets you read your LLM's mind.
Now the natural question to ask at this point is: does the text generated in that dropdown where Claude does its thinking use up tokens?
What I'm Learning
Here is what I am learning:
Extended thinking uses a separate "thinking budget" of tokens.
What this means:
- Thinking tokens are generated and billed like any other output tokens
- However, they are NOT added to the context window for later turns
- They do not influence the responses to subsequent messages
- They only influence the response to the specific message for which they were generated
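On the API side, this "thinking budget" is an explicit parameter you set per request. Here is a minimal sketch of what such a request body looks like, assuming the parameter names from the Anthropic Messages API as I understand them (the model id is just a placeholder for illustration):

```python
# Sketch of a Messages API request body with Extended Thinking enabled.
# Parameter names follow the Anthropic API as I understand it; treat this
# as illustrative, not authoritative.

def build_request(question: str, budget: int = 10_000) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 16_000,          # must be larger than the thinking budget
        "thinking": {
            "type": "enabled",
            "budget_tokens": budget,   # cap on tokens spent on reasoning
        },
        "messages": [{"role": "user", "content": question}],
    }

request = build_request("Why is the sky blue?")
print(request["thinking"]["budget_tokens"])
```

The point is that the budget is separate from and smaller than max_tokens: you are telling the model how much it may "spend" on reasoning before it writes the answer.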
Visualizing the Token Flow
Here's a good way to visualize it:
Message 1: You ask a complex question
├── Claude thinks: ~5,000 tokens of reasoning (you see this in the dropdown)
└── Claude responds: ~500 tokens
Message 2: Context window contains:
├── Your original question
├── The 500-token response
└── The 5,000 thinking tokens are GONE
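The flow in that diagram can be turned into a toy accounting model. The token counts below are made up; what matters is which bucket each kind of token lands in — billing gets everything, but the context window never sees the thinking:

```python
# Toy model of token accounting across a conversation. The numbers are
# invented; the point is that thinking tokens are billed but never
# carried forward into the context window.

context = []       # what the model sees on the next turn
billed_tokens = 0  # what you pay for overall

def send(question_tokens, thinking_tokens, response_tokens):
    global billed_tokens
    billed_tokens += question_tokens + thinking_tokens + response_tokens
    context.append(("user", question_tokens))
    context.append(("assistant", response_tokens))
    # thinking_tokens are deliberately NOT appended to context

# Message 1: a complex question with heavy thinking
send(question_tokens=100, thinking_tokens=5_000, response_tokens=500)

context_size = sum(tokens for _, tokens in context)
print(billed_tokens)  # 5600: the thinking is paid for
print(context_size)   # 600: the thinking is gone from the window
```

So by Message 2, the window holds only 600 tokens of history even though 5,600 tokens were generated and billed.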
The Key Implication
This has a significant implication:
You can have Claude think extensively without accelerating toward conversation limits.
However, there's a tradeoff:
- If you need Claude to reference its reasoning later, it can't
- The thinking tokens only last for that specific response
- They disappear after the response is generated
Bottom line: Extended Thinking lets you take advantage of deep reasoning without bloating your context window. Depth of analysis does not cost you conversation longevity.
Hey, Future Prisca Here! (December 8, 2025)
I promised I would write more if I got new information, and I did! The section below contains inaccuracies about verbose output being equivalent to Extended Thinking. They are actually different features. Read this follow-up post for the corrected explanation.
The Verbose Output Mystery
Now, remember how I talked about turning on verbose output for Claude Code in the terminal, and said it was the equivalent of the optional Extended Thinking feature you get with Claude on the web?
Today, I am realizing that I may have been wrong about that.
There is still so much I am unsure about when it comes to the verbose output feature, but let me tell you what I have been thinking about.
Extended Thinking: On vs Off
First, when we turn on Extended Thinking on the web:
- Claude generates thinking tokens
- Then generates response tokens
But when it is turned off:
- Claude generates the response tokens directly
- The thinking does not happen "under the hood" where we can't see it
- It simply does not happen at all
- The model just goes straight to predicting the response tokens
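One way to picture the on/off difference is in the shape of the response itself. In the API, a response is a list of content blocks, and as I understand it, the thinking block simply never appears when the feature is off. A sketch (the exact field names are my assumption, loosely mirroring the Anthropic API):

```python
# Sketch of response content with Extended Thinking on vs off.
# Block shapes loosely mirror the Anthropic Messages API; the exact
# field names are assumptions for illustration.

def respond(question: str, thinking_enabled: bool) -> list[dict]:
    blocks = []
    if thinking_enabled:
        # Thinking tokens are generated first, before the answer
        blocks.append({"type": "thinking", "thinking": "step-by-step reasoning..."})
    # With thinking off, the model goes straight to the answer
    blocks.append({"type": "text", "text": "final answer"})
    return blocks

on = respond("hard question", thinking_enabled=True)
off = respond("hard question", thinking_enabled=False)
print([b["type"] for b in on])   # ['thinking', 'text']
print([b["type"] for b in off])  # ['text']
```

With the feature off there is no hidden thinking block to reveal; it was never generated.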
The Question About Verbose Output
Now, with verbose output, we don't exactly turn it on or off. We set it to true or false.
This technically should mean the same thing as on or off, but it still makes me wonder:
Does Extended Thinking always happen when using Claude Code?
Two possibilities:
- The thinking tokens are always generated, and we only get to see the model's thoughts when we set verbose output to true
- The thinking tokens only get generated when we set verbose output to true
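The two possibilities can be written out as tiny models. Both functions below are hypothetical, not real Claude Code internals; they just make the distinction concrete:

```python
# Two competing hypotheses about Claude Code's verbose output setting.
# Both functions are hypothetical models for illustration, not real
# Claude Code internals.

def hypothesis_a(verbose: bool) -> tuple[bool, bool]:
    """Thinking always happens; verbose only controls visibility."""
    thinking_generated = True
    thinking_shown = verbose
    return thinking_generated, thinking_shown

def hypothesis_b(verbose: bool) -> tuple[bool, bool]:
    """Thinking only happens when verbose output is true."""
    thinking_generated = verbose
    thinking_shown = verbose
    return thinking_generated, thinking_shown

# With verbose set to false, the hypotheses disagree about generation:
print(hypothesis_a(False))  # (True, False): thinking happens, but hidden
print(hypothesis_b(False))  # (False, False): thinking never happens
```

Notice that the two are indistinguishable from the outside when verbose is false and thinking is hidden either way; the difference would only show up in billing or in response quality.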
My Best Hypothesis
I'll conclude this post with my best hypothesis so far.
Claude Code is designed for complex coding tasks, so I think it is more likely that Extended Thinking is enabled by default in its API calls and the verbose output setting simply controls visibility.
Here's my reasoning:
- Coding tasks benefit significantly from reasoning before acting - we literally use Claude Code to make file changes that are potentially destructive, so thinking first is safer
- The setting is called "verbose OUTPUT" - not "enable thinking" - this suggests it controls what's shown, not what happens
What I'm Still Uncertain About
There is definitely a lot that I am still uncertain about when it comes to this feature. If I get new information, I will write more about it.
But this was fun regardless.
As always, thanks for reading!