TL;DR
When building AI-powered workflows, you have two powerful patterns:
- Skills: Add capabilities to your AI (like installing plugins)
- Sub-Agents: Delegate tasks to AI specialists (like hiring consultants)
Anthropic's research found a multi-agent system outperforming a single agent by 90.2% on an internal evaluation. Use skills for persistent knowledge, sub-agents for complex workflows, and a hybrid of the two for the best results.
The Problem
My AI assistant kept searching the web for the same Next.js patterns and FastAPI best practices in every conversation. Each search cost 5-10 seconds and ~8,000 tokens. After 10 queries, I’d burned 80,000 tokens on redundant searches.
The question: Should I create a sub-agent for validation, a skill for documentation access, or both?
What Are Skills?
Anthropic introduced Agent Skills in October 2025 to equip AI agents with specialized knowledge.
Think of skills like installing a library:
Before: Your code
After: Your code + New Library = More capabilities
Skills extend what the AI can do by providing:
- 📚 Domain knowledge (cached documentation)
- 🛠️ Executable tools (validation scripts, formatters)
- 🧠 Specialized expertise (security patterns, API references)
Key Innovation: Progressive Disclosure
Skills load only what’s needed, not everything at once:
User: "How do I fetch data in Next.js?"
Loads: SKILL.md (500 tokens) + nextjs/data-fetching.md (2,000 tokens)
Skips: Everything else
Result: 94% reduction in context usage (50,000 → 3,000 tokens)
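For concreteness, here is a minimal sketch of what such a skill could look like. The `name` and `description` frontmatter fields follow Anthropic's published SKILL.md format; the skill name, topic list, and reference filenames are hypothetical examples for this Next.js/FastAPI scenario:

```markdown
---
name: framework-docs
description: Cached Next.js and FastAPI reference docs. Use when answering questions about data fetching, routing, or API patterns in these frameworks.
---

# Framework Docs

Read only the reference file that matches the question, then answer from it:

- Next.js data fetching  → nextjs/data-fetching.md
- Next.js routing        → nextjs/routing.md
- FastAPI dependencies   → fastapi/dependencies.md
- FastAPI error handling → fastapi/errors.md

Never load more than one reference file per question.
```

Only the frontmatter and this short body sit in context by default; the per-topic files are read on demand, which is where the 94% reduction comes from.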
What Are Sub-Agents?
Sub-agents are specialized AI workers that execute complex tasks independently and return condensed results.
Think of sub-agents like delegating to a specialist consultant:
You: "I need a comprehensive code review"
Consultant: *Goes away, does deep analysis*
Consultant: *Returns executive summary*
Sub-agents:
- 🧠 Have their own context window (isolation)
- 🎯 Specialize in specific tasks
- ⚡ Can run in parallel
- 📊 Return condensed summaries
Example: Code validation
- Main AI spawns the validate-nextjs sub-agent
- Sub-agent processes 20,000 tokens in its isolated context
- Returns a condensed 2,500-token report
- Main conversation stays clean (87.5% context savings)
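In Claude Code, a sub-agent like this lives as a markdown file under `.claude/agents/` with YAML frontmatter. Below is a hedged sketch rather than a drop-in definition: the `validate-nextjs` name, the `tools` list, and the prompt wording are assumptions for this example.

```markdown
---
name: validate-nextjs
description: Validates Next.js code against cached best practices and returns a short prioritized report. Use after significant changes to Next.js files.
tools: Read, Grep, Glob, Bash
---

You are a Next.js validation specialist.

1. Read the changed files and the relevant cached framework docs.
2. Check data fetching, routing, and caching patterns against best practices.
3. Return ONLY a condensed report: issues grouped by severity, one line each,
   with file and line references. Do not echo file contents back.
```

The key design choice is step 3: the sub-agent burns its own context on the heavy reading and hands back only the summary.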
The Research: Why Multi-Agent Matters
In Anthropic's internal research evaluation, a multi-agent system with Claude Opus 4 as orchestrator and Claude Sonnet 4 sub-agents outperformed single-agent Claude Opus 4 by 90.2%.
Why Sub-Agents Outperform:
- Context Management: Isolated context per task, return only summaries
- Parallelization: Run multiple tasks simultaneously
- Specialization: Excellent at one thing instead of merely okay at everything
Example: Code review across 3 languages
Single Agent: 30 seconds, 60,000 tokens
Multi-Agent: 10 seconds, 6,000 tokens (3x faster, 90% fewer tokens)
When to Use Skills vs Sub-Agents
Use Skills When You Need:
✅ Persistent Capabilities
- Documentation access in every conversation
- Company-specific knowledge base
- Internal API references
✅ Reducing Redundant Operations
- Stop searching web for same patterns
- Cached framework best practices
- Reusable code snippets
✅ Progressive Knowledge
- Large knowledge bases (load sections on demand)
- API documentation
- Design system components
Example Use Cases: Framework docs, coding standards, API references, security best practices
Use Sub-Agents When You Need:
✅ Complex Workflows
- Multi-step validation
- Code review with prioritization
- Iterative fix-validate loops
✅ Parallelization
- Run tests + build + deploy simultaneously
- Validate multiple languages at once
- Process large datasets in chunks
✅ Context Isolation
- Keep main conversation clean
- Process large PR comments
- Deep analysis without polluting context
Example Use Cases: Code validation pipelines, PR review workflows, security audits, performance optimization
The Hybrid Architecture
Neither approach alone was optimal for my documentation caching problem:
- Skills only: Great for access, but who updates the cache?
- Sub-agents only: Great for validation, but can’t access docs in normal chat
Solution: Combine them
.claude/
├── skills/
│   └── framework-docs/      # Skill: Knowledge access
└── agents/
    ├── refresh-docs.md      # Sub-agent: Maintenance
    └── validate-all.md      # Sub-agent: Uses skill
How It Works:
Interactive Use:
User: "What's the Next.js data fetching pattern?"
AI: *Uses framework-docs skill* → Instant, 3,000 tokens
Automated Validation:
User: "Validate my code"
Main AI: *Spawns validate-all sub-agent*
Sub-agent: *Uses framework-docs skill* → 2,500 tokens
Monthly Maintenance:
User: "Refresh framework docs"
AI: *Spawns refresh-docs sub-agent* → Updates skill cache
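The maintenance piece can itself be an agent definition. A rough sketch of a `refresh-docs` agent follows, assuming it is allowed to fetch pages and rewrite the skill's cached files; the tool list, paths, and size budget are illustrative choices, not requirements.

```markdown
---
name: refresh-docs
description: Refreshes the framework-docs skill cache. Use on request or when framework versions change.
tools: Read, Write, WebFetch
---

You maintain the framework-docs skill.

1. Fetch the current official docs for each topic listed in SKILL.md.
2. Rewrite the matching reference file under .claude/skills/framework-docs/
   (e.g. nextjs/data-fetching.md), keeping each file under ~2,000 tokens.
3. Note the source URL and retrieval date at the top of each file.
4. Report which files changed and which stayed the same.
```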
Performance Comparison
Token Usage (10 Queries)
| Approach | Tokens | vs Baseline |
|---|---|---|
| Web search every time | 80,000 | Baseline |
| Skill (cached docs) | 12,000 | 85% reduction |
| Sub-agent (batch validation) | 2,500 | 97% reduction |
Time (10 Queries)
| Approach | Time | vs Baseline |
|---|---|---|
| Web search every time | 50 seconds | Baseline |
| Skill (cached docs) | Instant | 100% faster |
| Sub-agent (parallel) | 5 seconds | 90% faster |
Reliability
| Approach | Consistency | Offline |
|---|---|---|
| Web search | ❌ Varies by search ranking | ❌ No |
| Skill (cached docs) | ✅ Always same | ✅ Yes |
| Sub-agent | ✅ Uses skill data | ✅ Yes |
Key Lessons
1. Progressive Disclosure is Essential
❌ Bad: Load all 50,000 tokens of docs
✅ Good: Load overview (500 tokens), then section on demand (2,000 tokens)
2. Let Sub-Agents Maintain Skills
✅ Skill: framework-docs (provides knowledge)
✅ Sub-agent: refresh-docs (maintains the skill)
✅ Sub-agent: validate-all (uses the skill)
3. Hybrid Beats Solo
Skills only: Great access, manual updates
Sub-agents only: Great workflows, no persistent knowledge
Hybrid: Best of both (85%+ efficiency gains)
4. Parallelize with Sub-Agents
In Anthropic's evaluation, multi-agent systems beat a single agent by 90.2%:
Sequential: Validate TS → Validate Py → Validate Go (30s)
Parallel: Spawn 3 sub-agents simultaneously (10s, 3x faster)
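One simple way to wire up that fan-out is a project-level instruction (for example in CLAUDE.md) telling the main assistant to spawn the per-language validators in parallel. A sketch, with hypothetical validator agent names:

```markdown
## Validation

When asked to validate the codebase:

1. Spawn the validate-typescript, validate-python, and validate-go sub-agents
   in parallel, one per language present in the repo.
2. Wait for all of the condensed reports.
3. Merge them into a single summary ordered by severity and show only that
   summary; never paste a sub-agent's full output into the conversation.
```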
When NOT to Use This Architecture
Skip Skills If:
❌ Information changes constantly
❌ One-time use (not worth setup overhead)
❌ Simple queries (web search is fine for rare lookups)
Skip Sub-Agents If:
❌ Task is trivial (single-step, no complexity)
❌ Need full context (sub-agents return summaries)
❌ Can’t parallelize (sequential dependencies)
Skip Hybrid If:
❌ Simple use case (don’t over-engineer)
❌ No maintenance needed (static documentation)
❌ Rare access (setup cost > benefit)
Conclusion
The solution wasn’t choosing Skills OR Sub-Agents. It was understanding:
- Skills = Capabilities (persistent knowledge, tools)
- Sub-Agents = Tasks (workflows, validation)
- Hybrid = Synergy (skills provide knowledge, agents do work)
Results:
- ✅ 85% reduction in tokens (80,000 → 12,000)
- ✅ 100% faster responses (instant vs 5-10 seconds)
- ✅ Offline capable (cached docs)
- ✅ Consistent answers
- ✅ Automated maintenance
Key Insight:
“Skills extend what your AI can do. Sub-agents delegate what your AI should do. Use skills for knowledge, sub-agents for work, and hybrid for comprehensive solutions.”