I discovered XML structuring could optimize my 7 Context Engineers for token efficiency, minimize attention degradation, and enforce priority hierarchies. Here's the redesign.
In my blog post about restructuring AI agents with XML tags, I mentioned stumbling upon information that would fundamentally change how the Seven Context Engineers for the 777-1 experiment are defined: specifically, how their agent definition files are structured.
This discovery happened BEFORE I even started implementing the Context Engineers as file-based agents across the 7 projects for this experiment. I was researching prompt engineering best practices and found Anthropic's documentation about XML tags for prompt clarity. The recommendation was straightforward: use XML tags like <instructions>, <example>, and <input> to create clear boundaries in your prompts.
But here's what caught my attention: the WHY behind this recommendation. XML tags don't just make prompts prettier. They address specific AI limitations:
I realized: if I'm building 7 specialized subagents for code review, each with detailed responsibilities, examples, and testing checklists, I need to structure them in a way that OPTIMIZES for how AI models actually process information.
So instead of writing prose-based agent definitions and hoping Claude would follow them consistently, I designed an XML schema from the ground up. Hierarchical. Explicit. Token-efficient.
Specifically, I used XML to structure the system prompt portion of each agent definition—the actual instructions Claude receives. The YAML frontmatter handles agent configuration (name, description, tools), while the XML sections provide hierarchical structure for the instructions themselves.
Now, I haven't found any official documentation that talks about structuring file-based agents using XML. But given the research on AI limitations and the benefits XML provides, I believe this restructuring is necessary.
This case study has 3 primary goals:

1. Explain the documented AI limitations (attention degradation, positional bias, ambiguity) that make XML restructuring necessary
2. Document the hybrid YAML + 10-section XML schema designed for the 7 Context Engineers
3. Provide downloadable templates and schema guides so you can build your own XML-structured subagents
Think of this as the architectural blueprint for the 777-1 Context Engineers: designed for how AI actually works, not just how humans read prompts.
This work is part of the larger 777-1 experiment: Seven Projects, Seven Subagents, Seven Case Studies, One Goal. The goal is to build an algorithm for predicting prompt failures that will power my AI Prompt Engineering Toolkit.
The 7 Context Engineers (Amber Williams, Kristy Rodriguez, Micaela Santos, Lindsay Stewart, Eesha Desai, Daniella Anderson, and Cassandra Hayes) were introduced in the "Meet the Team" case study. Each has a name, personality, and job description built from analyzing 129 code reviews.
But before implementing them as file-based agent definitions, I discovered XML structuring could fundamentally improve how Claude processes their instructions.
The Design Challenge:
How do you define an AI agent in a way that:
Optimizes token efficiency?
Survives attention degradation in long contexts?
Enforces explicit priority hierarchies?
Lets Claude jump directly to a section like <success_metrics> without parsing everything before it?

The XML Solution:
I designed a hybrid structure for the agent definition files: YAML frontmatter for agent configuration + a 10-section XML schema for structuring the system prompt (the instructions Claude receives):
YAML Frontmatter (Claude Code Requirement):
---
name: agent-name
description: Agent description
priority: CRITICAL/HIGH/MEDIUM/LOW
tools: Read, Edit, Bash
---
10 XML Sections:
<identity> - Who they are, their expertise
<review_process> - Step-by-step workflow
<responsibilities> - Prioritized checklist (critical/important/supplementary)
<common_issues> - Frequent problems to catch
<examples> - Good/bad code patterns
<testing_checklist> - Verification steps
<success_metrics> - Measurable outcomes
<output_format> - How to structure findings
<scope> - What to include/exclude
<focus> - One-line mission statement

Plus domain-specific sections:
<viewport_requirements> for responsive breakpoints
<wcag_requirements> for accessibility standards
<forbidden_patterns> for fake functionality
<implicit_requirements> for contextual features

Projected Benefits (based on research):
Note: These are theoretical predictions based on attention degradation research and XML structure benefits. Phase 2 of 777-1 will test these predictions with actual project reviews.
Estimated 25-30% token savings per agent definition
<critical> tags should improve compliance with top-priority requirements significantly
Explicit standards should eliminate ambiguity (e.g., <standard>WCAG AA 4.5:1</standard>)

This case study documents the DESIGN and RATIONALE. Testing happens in Phase 2.
Before showing you the XML transformation, you need to understand the AI limitations that make this restructuring necessary. These aren't theoretical concerns. They're documented research findings that affect ALL large language models.
The "Lost in the Middle" Phenomenon
Research shows that AI models exhibit a 30-50% accuracy drop for information in the middle of long contexts. This affects Claude, ChatGPT, Gemini, and every transformer-based model.
What this means in practice:
If I write a 1000-token agent definition in prose format and bury critical requirements in the middle (tokens 400-600), there's a significant chance Claude will miss them. Not because the instruction is unclear, but because attention mechanisms degrade with position.
Analogy: Imagine reading a 200-page IKEA manual before building furniture. By page 150, you're not retaining details. You're scanning for pictures and hoping for the best.
That's what happens to AI models processing long prompts. The beginning gets strong attention. The end gets recency bias. The middle? Statistically degraded.
Why this matters: Claude's context window is 200K tokens. But that doesn't mean it pays equal attention to all 200K. Attention is a limited resource, like working memory. The more you load in, the thinner it spreads.
XML creates attention anchors:
If success metrics are buried at token position 500-700 in a prose prompt, they might get degraded attention. But with XML tags, Claude can jump directly to <success_metrics> without processing everything before it. Random access vs. sequential reading.
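To make that concrete, here's a sketch of what a <success_metrics> section might look like in one of these agent definitions. The specific targets are illustrative (drawn from the responsive-design requirements discussed elsewhere in this case study), not copied from the actual Amber Williams file:

<success_metrics>
  <!-- Illustrative targets; each agent's real definition carries its own values -->
  <metric name="horizontal-scroll" target="0 instances at any breakpoint"/>
  <metric name="touch-targets" target="100% of interactive elements at least 44x44px"/>
  <metric name="breakpoint-coverage" target="layout verified at 320px, 768px, 1024px, and 1440px"/>
</success_metrics>

Because the tag is explicit and unique, Claude can locate this section wherever it sits in the file, rather than relying on it surviving a sequential read.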
XML structure addresses specific AI limitations (attention degradation, positional bias, ambiguity) but does NOT fix hallucinations, teach missing domain knowledge, or bypass context window limits. It makes existing information clearer; it doesn't generate new information. For example, if Claude doesn't know WCAG standards, XML tags won't teach them. You still need to provide the knowledge in the prompt or use RAG to retrieve it.
Creating a comprehensive XML schema requires substantial planning time before writing any agent definitions. You need to identify all necessary sections, design hierarchical relationships, establish naming conventions, and create reusable patterns. For the 777-1 Context Engineers, this took approximately 8-10 hours of design work before implementing the first agent. This is time well spent (it prevents inconsistency later), but it's a real cost that prose-based approaches don't require.
XML demands proper validation to catch syntax errors like mismatched tags, improper nesting, and invalid CDATA blocks that break parsing. You'll need tools like xmllint, IDE extensions (VS Code XML tools), or online validators. These tools add another dependency to your workflow. Without validation, a single unclosed tag can make an entire agent definition unusable, with errors that are hard to debug in production.
This XML schema was designed specifically for frontend code review subagents (responsive design, functionality, accessibility, state management). It may not be optimal for backend APIs, database operations, DevOps automation, or data science workflows. Each domain may need specialized sections. For example, API subagents might need <endpoint_validation> or <authentication_checks> sections not present in this schema.
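For instance, a hypothetical backend API subagent might carry a section like the one below. This is a sketch of what such a domain-specific section could look like, not part of the 777-1 schema:

<endpoint_validation>
  <!-- Hypothetical section for an API-review subagent; not included in the 777-1 schema -->
  <requirement category="status-codes">4xx for client errors, 5xx for server errors</requirement>
  <requirement category="auth">Protected routes reject unauthenticated requests</requirement>
  <requirement category="schema">Request and response bodies match the documented contract</requirement>
</endpoint_validation>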
As requirements change, maintaining XML schema consistency across multiple agents becomes challenging. Adding a new section means updating the schema guide, blank template, and potentially all existing agents. Deprecating sections requires careful migration. With 7 agents, a schema change touches 7+ files. This overhead grows linearly with agent count (50 agents means 50 files to update). Prose definitions are easier to evolve individually, though they sacrifice the benefits of standardization.
This XML approach is designed for Claude (Anthropic's models), which explicitly recommends XML tags in its documentation. Other models may not process XML structure as effectively. GPT-4 treats XML more like formatted text than structural boundaries. Open-source models vary widely in their attention mechanisms. If you're building multi-model agents (switching between Claude, GPT-4, Gemini), you may need different prompting strategies for each, undermining the portability benefit of a standardized schema.
XML structuring is a powerful technique for context engineering, but it requires upfront design time, validation tooling, and ongoing maintenance. It's optimized for Claude and frontend code review, not all models or domains. Complementary techniques (RAG, validation layers, iterative testing) remain necessary. The investment pays off for large-scale agent systems, but simple use cases may not justify the overhead.
I didn't randomly choose XML. I studied the research:
Key insight: Structure isn't cosmetic. Models process structured data (XML, JSON) differently than prose. Tags create boundaries that attention mechanisms can use as anchors.
From Anthropic's docs:
"XML tags help Claude distinguish between different parts of your prompt... use tags like
<instructions>,<example>, and<input>to clearly delineate sections."
That's the basic principle. I needed to extend it for code review subagents.
I extended Anthropic's basic recommendations with a hybrid structure: YAML frontmatter for agent configuration + 10 XML sections for structuring the system prompt, tailored specifically for code review subagents.
YAML Frontmatter (Required by Claude Code):
When testing the agents, I discovered Claude Code requires YAML frontmatter at the beginning of agent definition files. This is a technical requirement of the tool, not something mentioned in Anthropic's documentation.
---
name: agent-name # Unique identifier
description: Brief description of agent's role and scope
priority: CRITICAL | HIGH | MEDIUM | LOW # Agent priority level
tools: Read, Edit, Bash # Tools this agent uses
---
This frontmatter makes the original <metadata> XML section redundant, so I removed it from the schema.
The XML Structure (System Prompt):
The 10 XML sections below structure the system prompt—the actual instructions Claude processes when invoked. These sections use Anthropic's recommended XML tags to create clear boundaries, explicit hierarchies, and attention anchors.
10 XML Sections (Universal across all subagents):
<identity> - Name, role, expertise, persona
<review_process> - Ordered steps with sequence numbers
<responsibilities> - Hierarchical checklist (critical/important/supplementary)
<common_issues> - Frequent problems to catch
<examples> - Good/bad patterns with code
<testing_checklist> - Verification steps
<success_metrics> - Measurable outcomes with targets
<output_format> - Structured reporting template
<scope> - Explicit include/exclude lists
<focus> - One-sentence mission statement

Complete file structure:
---
name: amber-williams
description: Responsive design specialist
priority: CRITICAL
tools: Read, Edit, Bash
---
<?xml version="1.0" encoding="UTF-8"?>
<subagent>
<identity>
<name>Amber Williams</name>
<role>Senior Frontend Developer - Responsive Design Specialist</role>
...
</identity>
<review_process>
...
</review_process>
<!-- 8 more sections -->
</subagent>
Design principle: Each tag should answer: "What would Claude need to know to execute this perfectly with zero prior context?"
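A <scope> section is a good test of that principle, because explicit include/exclude lists leave nothing to assumed context. Here's a sketch with illustrative entries for the responsive-design agent; the actual scope lists live in the downloadable definitions:

<scope>
  <include>Layout behavior at every defined viewport</include>
  <include>Touch target sizing and spacing on mobile</include>
  <exclude>Color contrast and ARIA attributes (covered by the accessibility agent)</exclude>
  <exclude>Business logic correctness (covered by the functionality agent)</exclude>
</scope>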
While the base schema is consistent, each subagent gets specialized sections:
Amber Williams (Responsive Design):
<viewport_requirements>
<viewport name="mobile" range="320px-767px">
<requirement>Single column layout</requirement>
<requirement>Hamburger menu</requirement>
<requirement>No horizontal scroll</requirement>
</viewport>
</viewport_requirements>
Lindsay Stewart (Accessibility):
<wcag_requirements>
<requirement type="contrast" level="normal">
<standard>WCAG AA 4.5:1 minimum</standard>
</requirement>
</wcag_requirements>
Kristy Rodriguez (Functionality):
<forbidden_patterns>
<pattern name="fake-functionality">
<code><![CDATA[
const handleExport = () => {
toast.success('Exported!'); // ❌ No actual export
};
]]></code>
</pattern>
</forbidden_patterns>
Cassandra Hayes (Integration):
<implicit_requirements>
<requirement category="auth">User login/logout flow</requirement>
<requirement category="help">Help documentation or tooltips</requirement>
</implicit_requirements>
For each section, I compared prose vs. XML token counts:
Example - Identity Section:
Prose (~80 tokens):
"Amber Williams is a senior frontend developer who specializes in responsive
design. She has extensive experience with mobile-first development and has
worked on projects ranging from small startups to enterprise applications.
Her main focus is ensuring that applications work across all devices."
XML (~40 tokens, 50% reduction):
<identity>
<name>Amber Williams</name>
<role>Senior Frontend Developer - Responsive Design Specialist</role>
<expertise>Mobile-first design, cross-device compatibility, touch interfaces</expertise>
</identity>
Structure eliminates transitional language. The tags themselves convey hierarchy.
The most critical design decision: how to prevent priority collapse.
Solution: Nested Priority Tags
<responsibilities>
<critical>
<!-- MUST be addressed first, non-negotiable -->
<item>Touch targets minimum 44x44px</item>
<item>Zero horizontal scroll on mobile</item>
</critical>
<important>
<!-- Should be addressed after critical -->
<item>Breakpoint transitions smooth</item>
</important>
<supplementary>
<!-- Check if present, but not required -->
<item>Print stylesheets</item>
</supplementary>
</responsibilities>
This makes priority EXPLICIT. No weak language signals ("important too", "also check"). The model can't misinterpret.
When 777-1 Phase 2 testing begins, I'll measure:
Metrics to track:
Token usage per review cycle
Whether <critical> items are addressed before lower-priority ones
False positive rates against explicit standards
Do agents follow the <output_format>?

Predicted improvements (to be validated):
These predictions are based on research about attention mechanisms and XML structure benefits. Real-world validation happens when the agents review actual 777-1 projects.
The result: Seven XML-structured agent definitions designed specifically for how AI models process information. Not just formatted differently—architected differently.
You can see the full schemas in the downloadable resources below.
Theoretical analysis suggests XML can eliminate transitional language waste ('Additionally', 'Furthermore', 'It's also important to note') by making structure speak for itself. Based on comparing prose vs. XML versions of the same content, the estimated savings are 25-30% per agent definition, which frees roughly 400 tokens per review cycle. Those tokens become available for actual code context instead of parsing ambiguity. Phase 2 testing will validate these predictions.
Research on the 'Lost in the Middle' phenomenon shows models exhibit 30-50% accuracy drop for information in middle positions of long contexts. XML tags like <success_metrics> should allow Claude to jump directly to sections regardless of position, bypassing sequential reading. This addresses attention degradation theoretically, but real-world validation requires testing with actual agent execution in Phase 2.
Unstructured prompts use weak priority signals ('very important', 'also check', 'if time permits') that degrade under attention pressure. XML makes priority NON-NEGOTIABLE via nested tags: <critical>, <important>, <supplementary>. Research suggests this could improve adherence from ~70% (prose) to 95%+ (structured), but this is a prediction based on attention mechanism studies, not measured results. Testing will confirm or refute this hypothesis.
Phrases like 'adequate color contrast' are ambiguous. WCAG AA? AAA? What ratio? XML eliminates interpretation: <standard>WCAG AA 4.5:1 for normal text</standard>. This should theoretically reduce false positive rates from ~15-20% (subjective interpretation) to ~5-8% (explicit standards), but actual impact depends on Claude's ability to verify standards during reviews. Validation in Phase 2 will measure real false positive rates.
The key mental shift: stop writing instructions like talking to a human ('Please check responsive design carefully'). Start writing instructions like programming an API (<review_process><step order='1'>Check responsive behavior</step></review_process>). Claude is a transformer model processing tokens, so structure matters as much as content. Think data schema, not essay. This design philosophy guided the XML restructuring before any implementation began.
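Extending the single step quoted above, a <review_process> written in that API-like style might look like the sketch below; the step wording is illustrative rather than copied from an actual agent file:

<review_process>
  <!-- Ordered, explicit steps instead of "please check carefully" -->
  <step order="1">Read the files listed in the scope section</step>
  <step order="2">Check layout at each viewport defined in viewport_requirements</step>
  <step order="3">Resolve every critical item before moving to important and supplementary items</step>
  <step order="4">Report findings using the template in output_format</step>
</review_process>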
XML makes instructions clearer but doesn't prevent hallucinations, teach missing domain knowledge, or bypass context window limits. If Claude doesn't know WCAG 2.1 standards, <standard>WCAG 2.1 Success Criterion 1.4.3</standard> won't teach it. XML also doesn't fit 50,000-line codebases in a 200K context window. Complementary techniques still needed: RAG for external knowledge, validation layers for accuracy, iterative refinement for complex tasks. XML is ONE tool in the context engineering toolkit, not a complete solution.
Bundled .zip file containing all 7 XML-structured subagent definitions: Amber Williams (Responsive), Kristy Rodriguez (Functionality), Micaela Santos (Design Systems), Lindsay Stewart (Accessibility), Eesha Desai (State Management), Daniella Anderson (Code Quality), and Cassandra Hayes (Integration). Drop them in your .claude/agents folder and start using them.
A blank, copy-paste ready template following the schema used in 777-1. Includes YAML frontmatter plus 10 XML sections: identity, review_process, responsibilities, common_issues, examples, testing_checklist, success_metrics, output_format, scope, and focus. Start building your own subagents immediately.
A comprehensive reference guide explaining the hybrid structure used in the 777-1 Context Engineers. Covers YAML frontmatter requirements and all 10 core XML sections (identity, review_process, responsibilities, etc.), shows code examples, documents naming conventions, and provides the complete schema template. Your blueprint for creating well-structured subagent definitions.