TL;DR
When building AI-powered workflows, you have two powerful patterns:
- Skills: Add capabilities to your AI (like installing plugins)
- Sub-Agents: Delegate tasks to AI specialists (like hiring consultants)
Anthropic's research found a multi-agent system outperforming a single agent by 90.2% on an internal evaluation. Use skills for persistent knowledge, sub-agents for complex workflows, and a hybrid of the two for the best results.
The Problem
My AI assistant kept searching the web for the same Next.js patterns and FastAPI best practices in every conversation. Each search cost 5-10 seconds and ~8,000 tokens. After 10 queries, I’d burned 80,000 tokens on redundant searches.
The question: Should I create a sub-agent for validation, a skill for documentation access, or both?
What Are Skills?
Anthropic introduced Agent Skills in October 2025 to equip AI agents with specialized knowledge.
Think of skills like installing a library:
Before: Your code
After: Your code + New Library = More capabilities
Skills extend what the AI can do by providing:
- 📚 Domain knowledge (cached documentation)
- 🛠️ Executable tools (validation scripts, formatters)
- 🧠 Specialized expertise (security patterns, API references)
Key Innovation: Progressive Disclosure
Skills load only what’s needed, not everything at once:
User: "How do I fetch data in Next.js?"
Loads: SKILL.md (500 tokens) + nextjs/data-fetching.md (2,000 tokens)
Skips: Everything else
Result: 94% reduction in context usage (50,000 → 3,000 tokens)
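For concreteness, here is a minimal sketch of what such a skill could look like. The `name` and `description` frontmatter fields follow Anthropic's published SKILL.md format; the skill name, topic list, and reference filenames are hypothetical examples for this Next.js/FastAPI scenario:

```markdown
---
name: framework-docs
description: Cached Next.js and FastAPI reference docs. Use when answering questions about data fetching, routing, or API patterns in these frameworks.
---

# Framework Docs

Read only the reference file that matches the question, then answer from it:

- Next.js data fetching  → nextjs/data-fetching.md
- Next.js routing        → nextjs/routing.md
- FastAPI dependencies   → fastapi/dependencies.md
- FastAPI error handling → fastapi/errors.md

Never load more than one reference file per question.
```

Only the frontmatter and this short body sit in context by default; the per-topic files are read on demand, which is where the 94% reduction comes from.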
What Are Sub-Agents?
Sub-agents are specialized AI workers that execute complex tasks independently and return condensed results.
Think of sub-agents like delegating to a specialist consultant:
You: "I need a comprehensive code review"
Consultant: *Goes away, does deep analysis*
Consultant: *Returns executive summary*
Sub-agents:
- 🧠 Have their own context window (isolation)
- 🎯 Specialize in specific tasks
- ⚡ Can run in parallel
- 📊 Return condensed summaries
Example: Code validation
- Main AI spawns the validate-nextjs sub-agent
- Sub-agent processes 20,000 tokens in its isolated context
- Returns a condensed 2,500-token report
- Main conversation stays clean (87.5% context savings)
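In Claude Code, a sub-agent like this lives as a markdown file under `.claude/agents/` with YAML frontmatter. Below is a hedged sketch rather than a drop-in definition: the `validate-nextjs` name, the `tools` list, and the prompt wording are assumptions for this example.

```markdown
---
name: validate-nextjs
description: Validates Next.js code against cached best practices and returns a short prioritized report. Use after significant changes to Next.js files.
tools: Read, Grep, Glob, Bash
---

You are a Next.js validation specialist.

1. Read the changed files and the relevant cached framework docs.
2. Check data fetching, routing, and caching patterns against best practices.
3. Return ONLY a condensed report: issues grouped by severity, one line each,
   with file and line references. Do not echo file contents back.
```

The key design choice is step 3: the sub-agent burns its own context on the heavy reading and hands back only the summary.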
The Research: Why Multi-Agent Matters
In Anthropic's internal research evaluation, a multi-agent system with Claude Opus 4 as orchestrator and Claude Sonnet 4 sub-agents outperformed single-agent Claude Opus 4 by 90.2%.
Why Sub-Agents Outperform:
- Context Management: Isolated context per task, return only summaries
- Parallelization: Run multiple tasks simultaneously
- Specialization: Excellent at one thing instead of merely okay at everything
Example: Code review across 3 languages
Single Agent: 30 seconds, 60,000 tokens
Multi-Agent: 10 seconds, 6,000 tokens (3x faster, 90% fewer tokens)
When to Use Skills vs Sub-Agents
Use Skills When You Need:
✅ Persistent Capabilities
- Documentation access in every conversation
- Company-specific knowledge base
- Internal API references
✅ Reducing Redundant Operations
- Stop searching web for same patterns
- Cached framework best practices
- Reusable code snippets
✅ Progressive Knowledge
- Large knowledge bases (load sections on demand)
- API documentation
- Design system components
Example Use Cases: Framework docs, coding standards, API references, security best practices
Use Sub-Agents When You Need:
✅ Complex Workflows
- Multi-step validation
- Code review with prioritization
- Iterative fix-validate loops
✅ Parallelization
- Run tests + build + deploy simultaneously
- Validate multiple languages at once
- Process large datasets in chunks
✅ Context Isolation
- Keep main conversation clean
- Process large PR comments
- Deep analysis without polluting context
Example Use Cases: Code validation pipelines, PR review workflows, security audits, performance optimization
The Hybrid Architecture
Neither approach alone was optimal for my documentation caching problem:
- Skills only: Great for access, but who updates the cache?
- Sub-agents only: Great for validation, but can’t access docs in normal chat
Solution: Combine them
.claude/
├── skills/
│   └── framework-docs/      # Skill: Knowledge access
└── agents/
    ├── refresh-docs.md      # Sub-agent: Maintenance
    └── validate-all.md      # Sub-agent: Uses skill
How It Works:
Interactive Use:
User: "What's the Next.js data fetching pattern?"
AI: *Uses framework-docs skill* → Instant, 3,000 tokens
Automated Validation:
User: "Validate my code"
Main AI: *Spawns validate-all sub-agent*
Sub-agent: *Uses framework-docs skill* → 2,500 tokens
Monthly Maintenance:
User: "Refresh framework docs"
AI: *Spawns refresh-docs sub-agent* → Updates skill cache
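The maintenance piece can itself be an agent definition. A rough sketch of a `refresh-docs` agent follows, assuming it is allowed to fetch pages and rewrite the skill's cached files; the tool list, paths, and size budget are illustrative choices, not requirements.

```markdown
---
name: refresh-docs
description: Refreshes the framework-docs skill cache. Use on request or when framework versions change.
tools: Read, Write, WebFetch
---

You maintain the framework-docs skill.

1. Fetch the current official docs for each topic listed in SKILL.md.
2. Rewrite the matching reference file under .claude/skills/framework-docs/
   (e.g. nextjs/data-fetching.md), keeping each file under ~2,000 tokens.
3. Note the source URL and retrieval date at the top of each file.
4. Report which files changed and which stayed the same.
```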
Performance Comparison
Token Usage (10 Queries)
| Approach | Tokens | vs Baseline |
|---|---|---|
| Web search every time | 80,000 | Baseline |
| Skill (cached docs) | 12,000 | 85% reduction |
| Sub-agent (batch validation) | 2,500 | 97% reduction |
Time (10 Queries)
| Approach | Time | vs Baseline |
|---|---|---|
| Web search every time | 50 seconds | Baseline |
| Skill (cached docs) | Instant | 100% faster |
| Sub-agent (parallel) | 5 seconds | 90% faster |
Reliability
| Approach | Consistency | Offline |
|---|---|---|
| Web search | ❌ Varies by search ranking | ❌ No |
| Skill (cached docs) | ✅ Always same | ✅ Yes |
| Sub-agent | ✅ Uses skill data | ✅ Yes |
Key Lessons
1. Progressive Disclosure is Essential
❌ Bad: Load all 50,000 tokens of docs
✅ Good: Load overview (500 tokens), then section on demand (2,000 tokens)
2. Let Sub-Agents Maintain Skills
✅ Skill: framework-docs (provides knowledge)
✅ Sub-agent: refresh-docs (maintains the skill)
✅ Sub-agent: validate-all (uses the skill)
3. Hybrid Beats Solo
Skills only: Great access, manual updates
Sub-agents only: Great workflows, no persistent knowledge
Hybrid: Best of both (85%+ efficiency gains)
4. Parallelize with Sub-Agents
In Anthropic's evaluation, multi-agent systems beat a single agent by 90.2%:
Sequential: Validate TS → Validate Py → Validate Go (30s)
Parallel: Spawn 3 sub-agents simultaneously (10s, 3x faster)
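One simple way to wire up that fan-out is a project-level instruction (for example in CLAUDE.md) telling the main assistant to spawn the per-language validators in parallel. A sketch, with hypothetical validator agent names:

```markdown
## Validation

When asked to validate the codebase:

1. Spawn the validate-typescript, validate-python, and validate-go sub-agents
   in parallel, one per language present in the repo.
2. Wait for all of the condensed reports.
3. Merge them into a single summary ordered by severity and show only that
   summary; never paste a sub-agent's full output into the conversation.
```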
When NOT to Use This Architecture
Skip Skills If:
❌ Information changes constantly
❌ One-time use (not worth setup overhead)
❌ Simple queries (web search is fine for rare lookups)
Skip Sub-Agents If:
❌ Task is trivial (single-step, no complexity)
❌ Need full context (sub-agents return summaries)
❌ Can’t parallelize (sequential dependencies)
Skip Hybrid If:
❌ Simple use case (don’t over-engineer)
❌ No maintenance needed (static documentation)
❌ Rare access (setup cost > benefit)
Conclusion
The solution wasn’t choosing Skills OR Sub-Agents. It was understanding:
- Skills = Capabilities (persistent knowledge, tools)
- Sub-Agents = Tasks (workflows, validation)
- Hybrid = Synergy (skills provide knowledge, agents do work)
Results:
- ✅ 85% reduction in tokens (80,000 → 12,000)
- ✅ 100% faster responses (instant vs 5-10 seconds)
- ✅ Offline capable (cached docs)
- ✅ Consistent answers
- ✅ Automated maintenance
Key Insight:
“Skills extend what your AI can do. Sub-agents delegate what your AI should do. Use skills for knowledge, sub-agents for work, and hybrid for comprehensive solutions.”