TL;DR

When building AI-powered workflows, you have two powerful patterns:

  • Skills: Add capabilities to your AI (like installing plugins)
  • Sub-Agents: Delegate tasks to AI specialists (like hiring consultants)

This post explores when to use each, backed by Anthropic research showing a multi-agent system outperforming a single agent by 90.2% on an internal evaluation.

Key Takeaway: Use skills for persistent knowledge (cached documentation), sub-agents for complex workflows (validation, analysis), and hybrid for best results.


The Problem: Choosing the Wrong Abstraction

I recently faced a common challenge: My AI assistant kept searching the web for the same Next.js patterns, FastAPI best practices, and Anthropic documentation every single conversation.

Each search cost:

  • ⏱️ 5-10 seconds of latency
  • 🎫 ~8,000 tokens per query
  • 🔄 Inconsistent results (search rankings change)
  • 💻 Requires internet connection

After 10 queries, I’d burned 80,000 tokens on redundant searches.

I knew I needed to cache this documentation, but the question was: What’s the right architecture?

Should I:

  1. Create a sub-agent that validates code against cached docs?
  2. Build a skill that makes docs available in every conversation?
  3. Use a hybrid approach?

Turns out, understanding the distinction between Skills and Sub-Agents completely changed my approach.


What Are Skills? (New from Anthropic, October 2025)

Anthropic introduced Agent Skills in October 2025 as a way to equip AI agents with specialized knowledge and capabilities.

The Mental Model

Think of skills like installing a library or plugin:

Before: Your code
After:  Your code + New Library = More capabilities

Skills extend what the AI can do by providing:

  • 📚 Domain knowledge (cached documentation)
  • 🛠️ Executable tools (validation scripts, formatters)
  • 🧠 Specialized expertise (security patterns, API references)

How Skills Work

.claude/skills/
└── framework-docs/
    ├── SKILL.md              # Skill definition
    ├── nextjs/
    │   ├── standards.md      # Cached Next.js docs
    │   └── patterns.md
    ├── fastapi/
    │   └── standards.md      # Cached FastAPI docs
    └── scripts/
        └── validate.sh       # Executable validation

When you ask: “What’s the Next.js App Router pattern?”

The AI:

  1. Loads the framework-docs skill
  2. Reads nextjs/standards.md (progressive disclosure)
  3. Answers instantly from cached knowledge
  4. No web search needed

The Key Innovation: Progressive Disclosure

Skills don’t dump everything into context. They load only what’s needed:

User: "How do I fetch data in Next.js?"

Loads ONLY:
├── SKILL.md (500 tokens)          # Overview
└── nextjs/data-fetching.md (2,000 tokens)  # Relevant section

SKIPS:
├── nextjs/routing.md
├── nextjs/server-components.md
└── fastapi/* (not needed for this query)

Result: 94% reduction in context usage (50,000 tokens → 3,000 tokens)
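To make the idea concrete, here’s a minimal Python sketch of progressive disclosure. The file layout mirrors the skill structure above; the loader itself is hypothetical — in practice the agent runtime does this for you:

```python
from pathlib import Path

def load_skill_context(skill_dir: Path, topic: str) -> str:
    """Load only the skill overview plus the one section relevant to a query.

    Illustrative sketch: real skill loading is handled by the agent runtime;
    this just shows the progressive-disclosure idea.
    """
    parts = [(skill_dir / "SKILL.md").read_text()]  # always load the overview
    section = skill_dir / "nextjs" / f"{topic}.md"  # hypothetical layout
    if section.exists():
        parts.append(section.read_text())  # load only the matching section
    return "\n\n".join(parts)              # routing.md, fastapi/* never loaded
```

The point is what it *doesn’t* read: unrelated sections never enter the context, which is where the token savings come from.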


What Are Sub-Agents?

Sub-agents are specialized AI workers that execute complex tasks independently and return condensed results.

The Mental Model

Think of sub-agents like delegating to a specialist consultant:

You: "I need a comprehensive code review"
Consultant: *Goes away, does deep analysis*
Consultant: *Returns executive summary*

Sub-agents:

  • 🧠 Have their own context window (isolation)
  • 🎯 Specialize in specific tasks (code review, validation, refactoring)
  • ⚡ Can run in parallel (multiple specialists working simultaneously)
  • 📊 Return condensed summaries (not full details)

How Sub-Agents Work

.claude/agents/
├── validate-nextjs.md        # Autonomous validation
├── coderabbit-pr-fixer.md    # Fix PR issues systematically
└── refresh-docs.md           # Update cached documentation

When you ask: “Validate my Next.js code against best practices”

The main AI:

  1. Spawns a validate-nextjs sub-agent
  2. Sub-agent runs independently with own context
  3. Sub-agent reads files, validates code (20,000 tokens in its context)
  4. Returns a condensed report (~2,500 tokens)
  5. Main conversation stays clean (87.5% context savings)
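The contract can be sketched in a few lines: heavy work happens inside the sub-agent, and only a short summary crosses back into the main conversation. Everything here is illustrative — `run_subagent` is a stand-in, not a real API:

```python
def run_subagent(task: str, files: list[str]) -> str:
    """Stand-in for spawning a sub-agent with its own isolated context.

    Imagine ~20,000 tokens of file reading and validation happening in
    here; the caller only ever receives the condensed summary string.
    """
    findings = [f"{f}: ok" for f in files]  # placeholder for real analysis
    return f"{task}: {len(findings)} files checked, 0 critical issues"

# The main conversation holds only this one-line report:
report = run_subagent("validate-nextjs", ["app/page.tsx", "lib/db.ts"])
```

The asymmetry — large context in, small summary out — is the whole mechanism behind the context savings.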

The Research: Why Multi-Agent Matters

Anthropic’s research on multi-agent systems revealed a stunning finding:

A multi-agent system with Claude Opus 4 as orchestrator and Claude Sonnet 4 sub-agents outperformed single-agent Claude Opus 4 by 90.2% on Anthropic’s internal research evaluation

Why Sub-Agents Outperform Single Agents

Context Management:

  • Single agent: Maintains full context across entire task (burns tokens)
  • Sub-agents: Isolated context per task, return only summaries

Parallelization:

  • Single agent: Sequential execution (slow)
  • Sub-agents: Run multiple tasks simultaneously (fast)

Specialization:

  • Single agent: Generalist (okay at everything)
  • Sub-agents: Specialists (excellent at one thing)

Example: Code review across 3 languages

Single Agent:
Review TypeScript → Review Python → Review Go
Total time: 30 seconds
Context: 60,000 tokens (all 3 reviews in context)

Multi-Agent:
Spawn 3 sub-agents in parallel:
├── validate-typescript.md
├── validate-python.md
└── validate-go.md
Total time: 10 seconds (3x faster)
Context: 6,000 tokens (only summaries in main context)
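A rough simulation of the parallel speed-up, using Python threads as stand-ins for sub-agents (the `sleep` simulates each validator’s independent work; times are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def validate(language: str) -> str:
    """Stand-in for a language-specific validation sub-agent."""
    time.sleep(0.1)  # simulate the sub-agent's independent work
    return f"{language}: passed"

languages = ["typescript", "python", "go"]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order, so summaries line up with languages
    summaries = list(pool.map(validate, languages))
parallel_s = time.monotonic() - start
# Wall-clock time is ~0.1s instead of ~0.3s sequentially, and the caller
# holds only the three one-line summaries, not the full analyses.
```

Real sub-agents are bounded by model latency rather than `sleep`, but the shape of the win is the same: wall-clock time approaches the slowest single task, not the sum of all tasks.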

When to Use Skills vs Sub-Agents

Use Skills When You Need:

Persistent Capabilities

  • Documentation access in every conversation
  • Company-specific knowledge base
  • Internal API references

Reducing Redundant Operations

  • Stop searching web for same patterns
  • Cached framework best practices
  • Reusable code snippets

Progressive Knowledge

  • Large knowledge bases (load sections on demand)
  • API documentation
  • Design system components

Example Use Cases:

✓ Framework documentation (Next.js, React, FastAPI)
✓ Company coding standards
✓ Internal API references
✓ Security best practices
✓ Design system patterns

Use Sub-Agents When You Need:

Complex Workflows

  • Multi-step validation
  • Code review with prioritization
  • Iterative fix-validate loops

Parallelization

  • Run tests + build + deploy simultaneously
  • Validate multiple languages at once
  • Process large datasets in chunks

Context Isolation

  • Keep main conversation clean
  • Process large PR comments (tens of thousands of tokens)
  • Deep analysis without polluting context

Example Use Cases:

✓ Code validation pipelines
✓ PR review and fix workflows
✓ Security audits
✓ Performance optimization
✓ Codebase refactoring

The Hybrid Architecture: Best of Both Worlds

For my documentation caching problem, neither approach alone was optimal:

  • Skills only: Great for access, but who updates the cache?
  • Sub-agents only: Great for validation, but can’t access docs in normal chat

Solution: Combine them.

The Architecture

.claude/
├── skills/
│   └── framework-docs/         # ← Skill: Knowledge access
│       ├── SKILL.md
│       ├── nextjs/standards.md
│       ├── fastapi/standards.md
│       └── anthropic/agent-standards.md
└── agents/
    ├── refresh-docs.md         # ← Sub-agent: Maintenance
    └── validate-all.md         # ← Sub-agent: Uses skill for validation

How It Works Together

Interactive Use (Skill shines):

User: "What's the Next.js data fetching pattern?"
AI: *Uses framework-docs skill*
AI: "Here's the pattern from cached docs..."
Time: Instant
Tokens: 3,000 (vs 8,000 with web search)

Automated Validation (Sub-agent + Skill):

User: "Validate my code against all standards"
Main AI: *Spawns validate-all sub-agent*
Sub-agent: *Loads framework-docs skill*
Sub-agent: *Validates in isolated context*
Sub-agent: *Returns condensed report*
Time: Fast (parallel processing)
Tokens: 2,500 in main context (vs 20,000 without sub-agent)

Maintenance (Sub-agent maintains Skill):

User: "Refresh framework docs" (monthly)
Main AI: *Spawns refresh-docs sub-agent*
Sub-agent: *Fetches latest from web*
Sub-agent: *Updates framework-docs skill cache*
Result: Skill now has fresh data for next conversation

Real-World Example: Documentation Caching

Let me show you the before/after of implementing this architecture.

Before: Web Search Hell

Conversation 1:
User: "How do I fetch data in Next.js?"
AI: *WebSearch: "Next.js data fetching 2025"*
AI: *Parses 5,000 tokens of search results*
AI: "Here's what I found..."
Cost: 8,000 tokens, 5 seconds

Conversation 2 (tomorrow):
User: "What's the Next.js App Router pattern?"
AI: *WebSearch again* (doesn't remember)
AI: *Parses results again*
AI: "Here's what I found..."
Cost: 8,000 tokens, 5 seconds

10 queries: 80,000 tokens, 50 seconds

After: Hybrid Architecture

One-Time Setup:

# Create skill structure
.claude/skills/framework-docs/
├── SKILL.md
└── nextjs/
    ├── standards.md
    └── patterns.md

# Create refresh agent
.claude/agents/refresh-docs.md

# Initial fetch (run once)
User: "Refresh framework docs"
AI: *Spawns refresh-docs sub-agent*
Sub-agent: *Fetches Next.js docs*
Sub-agent: *Saves to framework-docs skill*

Every Conversation After:

Conversation 1:
User: "How do I fetch data in Next.js?"
AI: *Loads framework-docs skill*
AI: *Reads nextjs/standards.md*
AI: "Here's the pattern from cached docs..."
Cost: 3,000 tokens, instant

Conversation 2 (tomorrow):
User: "What's the Next.js App Router pattern?"
AI: *Already has skill loaded*
AI: *Reads nextjs/patterns.md*
AI: "Here's the pattern..."
Cost: 1,000 tokens (skill already loaded), instant

10 queries: 12,000 tokens (85% reduction), instant
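The savings are simple arithmetic. Using the per-query costs assumed throughout this post (estimates, not measurements):

```python
# Per-query token costs quoted in the post (assumptions, not measurements):
WEB_SEARCH = 8_000      # search + parsing results, every query
SKILL_FIRST = 3_000     # first query: SKILL.md overview + one section
SKILL_CACHED = 1_000    # later queries: skill already loaded

queries = 10
baseline = queries * WEB_SEARCH                      # 80,000 tokens
cached = SKILL_FIRST + (queries - 1) * SKILL_CACHED  # 12,000 tokens
reduction = 1 - cached / baseline                    # 0.85 → 85% saved
```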

Monthly Maintenance:

User: "Refresh framework docs"
AI: *Spawns refresh-docs sub-agent*
Sub-agent: *Updates cache*
Result: Fresh data for next month

Performance Comparison

Let’s quantify the impact:

Token Usage (10 Queries)

| Approach | Tokens | vs Baseline |
| --- | --- | --- |
| Web search every time | 80,000 | Baseline |
| Skill (cached docs) | 12,000 | 85% reduction |
| Sub-agent (batch validation) | 2,500 | 97% reduction |

Time (10 Queries)

| Approach | Time | vs Baseline |
| --- | --- | --- |
| Web search every time | 50 seconds | Baseline |
| Skill (cached docs) | Instant | ~100% faster |
| Sub-agent (parallel) | 5 seconds | 90% faster |

Accuracy

| Approach | Consistency | Offline |
| --- | --- | --- |
| Web search | ❌ Varies by search ranking | ❌ No |
| Skill (cached docs) | ✅ Always the same | ✅ Yes |
| Sub-agent | ✅ Uses skill data | ✅ Yes |

Implementation Guide

Step 1: Create the Skill

.claude/skills/framework-docs/SKILL.md:

---
name: framework-docs
description: Access cached framework documentation
version: 1.0.0
---

# Framework Documentation Skill

This skill provides instant access to frequently-used framework documentation.

## Available Frameworks

- Next.js (App Router, Server Components, Data Fetching)
- FastAPI (Async patterns, Pydantic, Testing)
- Anthropic (Agent authoring best practices)

## Usage

Reference docs by framework:
- Next.js: See `nextjs/standards.md`
- FastAPI: See `fastapi/standards.md`
- Anthropic: See `anthropic/agent-standards.md`

## Last Updated

Check `last-updated.json` for cache freshness.
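If you ever want to read skill metadata yourself — say, from a build script — a naive frontmatter parser is a few lines. This is a sketch for flat `key: value` frontmatter only, not an official parser:

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Naive parser for the `---`-delimited frontmatter in a SKILL.md file.

    Sketch only: assumes flat `key: value` lines with no nesting or lists.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: stop before the markdown body
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta
```

For anything beyond this shape (nested keys, lists), reach for a real YAML library instead.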

Step 2: Create the Refresh Sub-Agent

.claude/agents/refresh-docs.md:

---
name: refresh-docs
description: Update framework documentation cache
tools: Read, Write, WebFetch, WebSearch, Bash
---

You are a documentation caching specialist. Your job is to fetch the latest
framework documentation and update the framework-docs skill cache.

## Steps

1. Check current cache dates in `last-updated.json`
2. For each framework:
   - Fetch latest documentation from official sources
   - Extract best practices and patterns
   - Update corresponding markdown file
   - Update timestamp in `last-updated.json`
3. Report what was updated and cache freshness

## Sources

- Next.js: https://nextjs.org/docs
- FastAPI: https://fastapi.tiangolo.com
- Anthropic: https://docs.anthropic.com

## Output Format

Provide summary:
- ✅ Framework updated
- 📅 Last fetched date
- 📝 Major changes (if any)

Step 3: Create the Validation Sub-Agent

.claude/agents/validate-all.md:

---
name: validate-all
description: Validate code against all framework standards
tools: Read, Bash, Grep
---

You are a code validation specialist. Validate code against cached
framework standards from the framework-docs skill.

## Steps

1. Load the framework-docs skill
2. Identify which frameworks are used in the project
3. Read relevant standard files from the skill
4. Validate code against standards
5. Categorize issues by severity (Critical/Major/Minor)
6. Return condensed report with specific line numbers

## Output Format

**Framework**: Next.js
**Files Validated**: 12
**Issues Found**: 3

Critical:
- file.ts:42 - Using Pages Router instead of App Router

Major:
- file.ts:15 - Not using Server Components for data fetching

Minor:
- file.ts:8 - Could use parallel data fetching pattern

Step 4: Use It

# One-time setup
User: "Refresh framework docs"
AI: *Runs refresh-docs sub-agent*

# Interactive use (any conversation)
User: "What's the Next.js data fetching pattern?"
AI: *Uses framework-docs skill*
AI: "Here's the pattern..."

# Validation workflow
User: "Validate my Next.js code"
AI: *Spawns validate-all sub-agent*
AI: *Sub-agent uses framework-docs skill*
AI: "Here's your validation report..."

# Monthly maintenance
User: "Refresh framework docs"
AI: *Updates cache for next month*

Lessons Learned

1. Progressive Disclosure is Key

Don’t load everything into context. Use skills’ progressive disclosure:

❌ Bad: Load all 50,000 tokens of docs
✅ Good: Load overview (500 tokens), then section on demand (2,000 tokens)

2. Let Sub-Agents Maintain Skills

Skills provide capabilities. Sub-agents do work:

✅ Skill: framework-docs (provides knowledge)
✅ Sub-agent: refresh-docs (maintains the skill)
✅ Sub-agent: validate-all (uses the skill)

3. Hybrid Beats Solo

Neither skills nor sub-agents alone were optimal:

Skills only: Great access, but manual updates
Sub-agents only: Great workflows, but no persistent knowledge
Hybrid: Best of both (85%+ efficiency gains)

4. Version Your Cache

Track when docs were fetched:

{
  "nextjs": {
    "fetched": "2025-10-18",
    "version": "15.0",
    "ttl_days": 60
  }
}
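A refresh sub-agent (or a plain cron script) can use this metadata to decide what is stale. A hedged sketch, assuming the JSON shape above:

```python
import json
from datetime import date, timedelta

def stale_entries(cache_json: str, today: date) -> list[str]:
    """Return frameworks whose cached docs have outlived their TTL.

    Assumes the last-updated.json shape shown above: one object per
    framework with `fetched` (ISO date) and `ttl_days` fields.
    """
    cache = json.loads(cache_json)
    stale = []
    for name, entry in cache.items():
        fetched = date.fromisoformat(entry["fetched"])
        if today - fetched > timedelta(days=entry["ttl_days"]):
            stale.append(name)  # past its TTL: schedule a refresh
    return stale
```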

5. Parallelize with Sub-Agents

Multi-agent systems are 90.2% better than single-agent:

Sequential: Validate TS → Validate Py → Validate Go (30s)
Parallel: Spawn 3 sub-agents simultaneously (10s, 3x faster)

When NOT to Use This Architecture

This hybrid approach isn’t always the answer:

Skip Skills If:

❌ Information changes constantly (don’t cache volatile data)
❌ One-time use (not worth setup overhead)
❌ Simple queries (web search is fine for rare lookups)

Skip Sub-Agents If:

❌ Task is trivial (single-step, no complexity)
❌ Need full context (sub-agents return summaries)
❌ Can’t parallelize (sequential dependencies)

Skip Hybrid If:

❌ Simple use case (don’t over-engineer)
❌ No maintenance needed (static documentation)
❌ Rare access (setup cost > benefit)


The Future: Agent Skills Ecosystem

Anthropic introduced skills in October 2025. This is just the beginning.

I predict we’ll see:

Skill Marketplaces:

- "react-native-docs" skill (community-maintained)
- "company-api-standards" skill (enterprise)
- "security-patterns" skill (security teams)

Skill Composition:

.claude/skills/
├── framework-docs/        # General docs
├── company-kb/            # Company-specific
└── project-patterns/      # Project-specific

AI: *Composes all three for comprehensive knowledge*

Skill Versioning:

framework-docs@2.0.0 (latest)
framework-docs@1.5.0 (stable)

Automated Skill Updates:

Sub-agent: refresh-docs runs automatically monthly
Skill: framework-docs always fresh without manual intervention

Conclusion

When I started this journey, I was frustrated by redundant web searches burning tokens and time.

The solution wasn’t choosing Skills OR Sub-Agents.

It was understanding:

  • Skills = Capabilities (persistent knowledge, tools)
  • Sub-Agents = Tasks (workflows, validation)
  • Hybrid = Synergy (skills provide knowledge, agents do work)

Results:

  • ✅ 85% reduction in tokens (80,000 → 12,000)
  • ✅ 100% faster responses (instant vs 5-10 seconds)
  • ✅ Offline capable (cached docs)
  • ✅ Consistent answers (not affected by search ranking)
  • ✅ Automated maintenance (refresh-docs sub-agent)

The Key Insight:

“Skills extend what your AI can do. Sub-agents delegate what your AI should do. Use skills for knowledge, sub-agents for work, and hybrid for comprehensive solutions.”

If you’re building AI-powered workflows and facing similar challenges, I hope this architecture saves you as much time and tokens as it saved me.


About the Author

Bishnu Bista is a software engineer building AI-powered developer tools. He maintains an open-source knowledge base following the ACE (Agentic Context Engineering) framework, documenting learnings from 5+ projects on agent-based workflows.

Connect: GitHub | Twitter


Discussion: What’s your experience with Skills vs Sub-Agents? Have you found other patterns that work well? Share in the comments below!

Updates: This post will be updated as Skills pattern matures and new best practices emerge. Last updated: October 18, 2025.