
Claude God Tip #6: How Claude's Compacting Feature Affects LLM Instance Cloning

Claude just compacted our conversation. Here's what that means for extracting your AI's personality and why you need to clone BEFORE it happens.

Claude · LLM Instance Cloning · Token Management · Developer Productivity · AI Training


Published: November 23, 2025 • 6 min read

I just handed the draft of my 35th blog post to Claude to do its thing. You know, refine it following the rules I specified over a series of back-and-forth conversations: exactly how I want the refinement done and the exact format the final output should take. I talk more about that here. You should check it out.

The thing is, as I provided the draft, I got this message:

"Compacting our conversation so we can keep chatting. This takes about 1-2 minutes."

My Initial Reaction

Hmm. The only other place I have seen a similar (but not quite identical) message is when working with Claude Code in my terminal.

I immediately thought this was it: Claude had shipped a feature that lets conversations continue without limit. My next thought was that Claude is really out there trying to make the methodologies I come up with and explain, like LLM Instance Cloning, useless.

But I was wrong. Yay!

What Claude's Compacting Feature Actually Does

Here is exactly what Claude's Compacting feature does:

  • It summarizes older messages to free up token space
  • It keeps recent messages intact
  • It compresses early conversation into condensed summaries

The goal of all this is to allow you to continue the same chat longer instead of starting fresh. The key word is longer, but I don't think it means that the chat will never end, although I'll keep observing this new feature over the next few days.
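To make that concrete, here is a rough sketch of what a compaction step could look like. This is my own illustration in Python, not Anthropic's actual implementation; the summarize() helper is a stand-in for whatever model call produces the condensed summary.

# Illustrative sketch only, not Anthropic's actual compacting code.
from typing import Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}

def estimate_tokens(messages: List[Message]) -> int:
    # Rough heuristic: about 4 characters per token. Real systems use a tokenizer.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(older: List[Message]) -> str:
    # Placeholder: in practice this would be another model call that
    # condenses the older turns into a short summary.
    return f"[Summary of {len(older)} earlier messages]"

def compact(history: List[Message], budget: int, keep_recent: int = 50) -> List[Message]:
    # Summarize older messages to free up token space; keep recent ones intact.
    if estimate_tokens(history) < budget or len(history) <= keep_recent:
        return history  # nothing to compact yet
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "assistant", "content": summarize(older)}
    return [summary] + recent

The important part is the last line: the recent messages survive verbatim, while everything older is reduced to one condensed summary.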

Testing the Token Usage After Compacting

After refining my prompt, I sent that Claude instance this:

Read any file in this project, then tell me our current token usage.

Its output was:

Current Token Usage

Tokens Used: 150,267 out of 190,000
Tokens Remaining: 39,733
Percentage Used: ~79%

We're in the RED zone! 🚨

Selon les méthodes que tu viens d'écrire dans ton Claude God Tip #5:

  • Green (0-50% used) ✗
  • Yellow (50-75%) ✗
  • Red (75%+) ✓

Nous avons environ 39,733 tokens restants, ce qui représente environ 21% de notre budget initial.

Tu devrais probablement considérer extraire la méthodologie de cette conversation bientôt si tu veux capturer tout ce qu'on a accompli aujourd'hui (blog refinement patterns, voice preservation, etc.) avant qu'on atteigne la limite!

On a fait énormément de travail ensemble aujourd'hui. C'est peut-être le moment de créer un cloning prompt pour cette instance? 🤔

(In English: According to the methods you just wrote about in your Claude God Tip #5, we have about 39,733 tokens remaining, which is roughly 21% of our initial budget. You should probably consider extracting the methodology from this conversation soon if you want to capture everything we accomplished today (blog refinement patterns, voice preservation, etc.) before we hit the limit! We did a huge amount of work together today. Maybe it's time to create a cloning prompt for this instance? 🤔)

You can see that the output above is a mix of English and French. Well, I'm learning French, and I've found that one of the best ways to integrate it into my daily life is adding this prompt to my project's instructions: "For every response you provide to me, 3-5 sentences should be in French to help me learn French on the go." You can also see that Claude actually pays attention to the blog posts I write, which is cool.

But anyways, the point is that the conversation does not have unlimited tokens. Even after compacting, I only have 39,733 tokens left in that conversation.
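As far as I know, there's no API that reports the running token count of a Claude.ai chat, but if you keep your own copy of a conversation you can estimate its size. Here is a sketch assuming the anthropic Python SDK's messages.count_tokens endpoint; the model name and the 190,000 budget are placeholders taken from my example above, not official limits.

# Sketch: estimate how much of a token budget a saved transcript uses.
# Assumes the anthropic Python SDK's messages.count_tokens endpoint;
# the model name and budget are placeholders, not official limits.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

transcript = [
    {"role": "user", "content": "Here is the draft of my 35th blog post..."},
    {"role": "assistant", "content": "Here is the refined version..."},
    # ...the rest of the saved conversation...
]

count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=transcript,
)

BUDGET = 190_000  # the figure Claude reported in my chat, used here as an example
used = count.input_tokens
print(f"Tokens used: {used:,} / {BUDGET:,} ({used / BUDGET:.0%})")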

Why This Feature Is Important for LLM Instance Cloning

The compacting process preserves:

  • Full details about recent conversations
  • Key decisions and context details it identified
  • Important preferences it established

However, you could lose:

  • Nuance from early messages
  • The exact wording of older prompts
  • Some training refinement details

This means that if Claude compacts BEFORE you extract your AI's personality, you've lost the detailed training history.

You should extract your cloning prompt BEFORE compacting happens to capture the full methodology that you've established through back-and-forth iterations.

This compacting feature seems to trigger at around 80-90% of token capacity, extending the conversation before you hit the hard token limits.
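If you track usage the way Tip #5 describes, a small check like the one below can nudge you to extract your cloning prompt well before that range. The zones are the ones from Tip #5; the 80% trigger is my own estimate, not a documented number.

# Sketch: map token usage to the Tip #5 zones and warn before compacting kicks in.
# The 0.80 compaction threshold is my own estimate, not a documented Anthropic value.
def zone(used: int, budget: int) -> str:
    pct = used / budget
    if pct < 0.50:
        return "GREEN: keep working."
    if pct < 0.75:
        return "YELLOW: start drafting your cloning prompt."
    if pct < 0.80:
        return "RED: extract your cloning prompt now, before compacting triggers."
    return "RED (80%+): compacting may have already happened, extract immediately."

# The numbers from my chat (~79% used) print the "extract now" warning.
print(zone(150_267, 190_000))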

What You See vs What Claude Sees

Now, does that mean that when you scroll up on your chat, the messages you sent previously and their responses will be summarized? NO.

From what I understand, the compacting happens in Claude's context/memory only. It's about managing what Claude can "see" and process, not about modifying the actual chat interface that you see.

So when compacting happens:

You can still:

  • Scroll up and see all your original messages exactly as you wrote them
  • View the full conversation history unchanged

Claude's internal view:

  • Has a compressed/summarized version of older messages to save token space
  • Can only access full detail for recent messages

This is similar to how, in a very long conversation, Claude might not have full access to messages from way back even though you can still scroll up and see them.

The compacting is about Claude's working memory/context window, not about the stored chat history that appears in the UI.

Visual Example

Your view (always):

Message 1: [Your full original prompt about blog refinement]
Message 2: [Claude's full original response]
...
Message 200: [Latest message]

Claude's internal view after compacting:

Summary: "User trained me on blog refinement with specific preferences 
for voice, structure, hyperlinks..."
...
Message 150-200: [Full detail of recent messages]
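Here's the same distinction sketched in code: the transcript you scroll through is never rewritten, while the context assembled for the model is rebuilt from a summary plus the recent tail. Again, this is a mental model, not Claude's internals.

# Sketch of the mental model, not Claude's internals: the UI transcript is
# never modified; only the context handed to the model is rebuilt.
class Conversation:
    def __init__(self):
        self.transcript = []  # what YOU see: every message, verbatim, forever

    def add(self, role: str, content: str) -> None:
        self.transcript.append({"role": role, "content": content})

    def model_context(self, keep_recent: int = 50) -> list:
        # What CLAUDE works from after compacting: a condensed summary of the
        # older turns plus the recent messages in full detail.
        if len(self.transcript) <= keep_recent:
            return list(self.transcript)
        older = self.transcript[:-keep_recent]
        summary = {"role": "assistant",
                   "content": f"[Summary of {len(older)} earlier messages]"}
        return [summary] + self.transcript[-keep_recent:]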

The Critical Implication for Instance Cloning

Once again, this is important for LLM Instance Cloning because:

Even though YOU can see the original training messages, Claude can't access them in detail after compacting.

So if you try to extract the methodology post-compaction, Claude is working from summaries, not the actual refinement conversations. You'll get a diluted version of your carefully trained instance.

The Action Item

Use the token tracking methods from Tip #5 to extract your cloning prompt BEFORE you hit that 80-90% threshold where compacting triggers.

Otherwise, you risk losing the nuanced details that made your instance perfectly trained in the first place.

I hope this information is useful to anyone who uses LLM Instance Cloning.

As always, thanks for reading!
