When I started working with Claude Code on November 12th, I had a vision: build a professional LaTeX collaboration platform for academic researchers. What I didn’t expect was to ship a production-ready MVP in 7 days with 166,432 lines of code, 62 merged PRs, and 6 major development phases.
This isn’t a story about AI replacing developers. It’s about augmented development, where human judgment guides architectural decisions while AI handles the mechanical complexity of implementation.
Here’s what I learned about the challenges, the fun, and the unexpected moments that make coding with AI feel like pair programming with a tireless, detail-obsessed colleague.
Day 1: The Pivot That Changed Everything
The Problem: Lossy Conversion Hell
The project started as a fork of arxiv-collaboration-platform. It had a hybrid editor with both Visual (ProseMirror) and LaTeX modes. The idea seemed clever: users could write in a Google Docs-like interface while the system maintained LaTeX under the hood.
Reality check: LaTeX → ProseMirror → LaTeX conversion is lossy.
% Input
\newcommand{\mycommand}[2]{#1 \textbf{#2}}
% After round-trip through ProseMirror
% ❌ GONE. Custom commands stripped. Formatting lost.
I spent hours debugging the parser, trying to preserve edge cases. Claude and I discussed the architecture:
Me: “Can we make the parser handle custom commands?”
Claude: “We could add 200+ lines of custom parsing logic… but there’s a better question: Should we?”
That question led to docs/PIVOT_RATIONALE.md, a 271-line analysis of why fighting LaTeX is the wrong approach. The conclusion was brutal but correct:
“Academic researchers already know LaTeX. A visual editor adds complexity without solving their real problems: collaboration, import, and compilation speed.”
The Decision: Embrace LaTeX-First
November 12, 5:04 PM: We created PR #1 - “Remove ProseMirror and Visual Editor infrastructure” - which deleted:
- 1,800 lines of parser/serializer code
- 13 npm packages (prosemirror-*)
- 961 lines of LaTeX → JSON conversion logic
- 180 lines of visual decorations
Result: The codebase became 30% smaller. The architecture became 100% clearer.
★ Insight ─────────────────────────────────────
When building with AI, the hardest decisions aren't technical.
They're philosophical. Should we build this feature? AI can
implement either path perfectly. Your job is choosing the
*right* path.
─────────────────────────────────────────────────
Days 2-3: Rapid Iteration on Phases 1-4
Phase 1-3: Foundation in 24 Hours
With the pivot decided, we entered hyperdrive:
November 12 (Evening):
- Phase 1: Removed ProseMirror infrastructure ✅
- Phase 2: Simplified comment system to line-based anchors ✅
- Phase 3: Added KaTeX live preview for math rendering ✅
The Fun Part: Watching Claude write a 200-line CodeMirror plugin for math preview:
// apps/web/src/lib/codemirror/katex-plugin.ts
// Detects inline math ($...$), display math ($$...$$),
// and equation environments, renders with KaTeX
I provided the requirement: “Render LaTeX math inline without compilation.”
Claude implemented (a simplified sketch follows this list):
- Regex-based detection with overlap handling
- Error boundaries (`[Math Error: ...]` for invalid LaTeX)
- Performance optimization (viewport-based rendering)
- Custom theme with light blue highlights
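To make that concrete, here is a stripped-down sketch of a CodeMirror 6 + KaTeX plugin in the same spirit. This is not the project’s actual katex-plugin.ts (which is ~200 lines and also handles equation environments, overlap handling, and theming); the regex and names like `MathWidget` are illustrative assumptions.

```typescript
import { Decoration, DecorationSet, EditorView, ViewPlugin, ViewUpdate, WidgetType } from '@codemirror/view';
import type { Range } from '@codemirror/state';
import katex from 'katex';

// Widget that renders a LaTeX snippet with KaTeX, falling back to an error marker
class MathWidget extends WidgetType {
  constructor(readonly tex: string, readonly display: boolean) { super(); }
  eq(other: MathWidget) { return other.tex === this.tex && other.display === this.display; }
  toDOM() {
    const el = document.createElement('span');
    try {
      katex.render(this.tex, el, { displayMode: this.display, throwOnError: true });
    } catch {
      el.textContent = `[Math Error: ${this.tex}]`; // error boundary for invalid LaTeX
    }
    return el;
  }
}

function buildDecorations(view: EditorView): DecorationSet {
  const decorations: Range<Decoration>[] = [];
  // Only scan the visible ranges, so large documents stay responsive
  for (const { from, to } of view.visibleRanges) {
    const text = view.state.doc.sliceString(from, to);
    // Display math ($$...$$) is tried before inline math ($...$)
    const mathRe = /\$\$([^$]+)\$\$|\$([^$\n]+)\$/g;
    let match: RegExpExecArray | null;
    while ((match = mathRe.exec(text)) !== null) {
      const display = match[1] !== undefined;
      const tex = match[1] ?? match[2] ?? '';
      // Place the rendered preview right after the source math
      decorations.push(
        Decoration.widget({ widget: new MathWidget(tex, display), side: 1 })
          .range(from + match.index + match[0].length)
      );
    }
  }
  return Decoration.set(decorations, true);
}

// View plugin that recomputes decorations when the document or viewport changes
export const katexPreview = ViewPlugin.fromClass(
  class {
    decorations: DecorationSet;
    constructor(view: EditorView) { this.decorations = buildDecorations(view); }
    update(update: ViewUpdate) {
      if (update.docChanged || update.viewportChanged) {
        this.decorations = buildDecorations(update.view);
      }
    }
  },
  { decorations: (plugin) => plugin.decorations }
);
```

The ordering of the two math patterns (display before inline) is the same idea behind the priority-inversion fix in the testing conversation below.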
Testing conversation:
Me: “It renders $x^2$ but breaks on $$\int_0^1$$”
Claude: “Display math checked after inline. Priority inversion. Fixing…”
Me: “Perfect! Now both work.”
Phase 4: Document Import Architecture
The Challenge: Support DOCX, PDF, Markdown, LaTeX, and ZIP imports with quality scoring.
The Fun: Watching Claude architect a converter registry pattern:
# services/latex-compiler/app/converters/__init__.py
# Snippet: the convert_*_to_latex functions (and the Converter type alias) are
# defined elsewhere in this package; only the registry and dispatcher are shown.
CONVERTERS = {
    'docx': convert_docx_to_latex,
    'pdf': convert_pdf_to_latex,
    'markdown': convert_markdown_to_latex,
    'text': convert_text_to_latex
}

def get_converter(file_type: str) -> Converter:
    if file_type not in CONVERTERS:
        raise ValueError(f"Unsupported file type: {file_type}")
    return CONVERTERS[file_type]
Before: 45-line if-elif chain in main.py
After: 11-line registry dispatcher
★ Insight ─────────────────────────────────────
AI excels at recognizing patterns you've used before and
applying them consistently. The registry pattern appeared
once in documentation. Claude applied it everywhere relevant.
─────────────────────────────────────────────────
The Markdown Formatting Bug
November 19: A user reported: “Markdown import shows everything in one line!”
The Investigation:
# BEFORE (markdown_converter.py didn't exist)
return {"latex_source": source, "quality_score": 95} # Fake score!
# AFTER (Phase 6.0 - Unified Converter Architecture)
# 1. Apply 8 cleanup rules (artifacts, split words, URLs)
# 2. Enhanced Pandoc (13+ flags matching DOCX quality)
# 3. Realistic quality scoring (60-100 based on issues)
# 4. Meaningful warnings (math, tables, code blocks)
The Fix: Created markdown_converter.py (284 lines) with proper cleanup pipeline.
Result: Markdown quality went from “fake 95” to “realistic 60-100” with helpful warnings.
What I loved: Claude didn’t just fix the bug. It established a pattern for all future converters with comprehensive documentation (ADDING_CONVERTERS.md, 558 lines).
Days 4-5: Real-Time Collaboration
The Challenge: Google Docs-Style Editing Without Conflicts
Requirement: Multiple users editing the same LaTeX document simultaneously, no “your changes conflict” dialogs.
The Solution: Yjs CRDT (Conflict-Free Replicated Data Type)
The Twist: Presence-gated activation.
// Only activate Yjs when 2+ users are editing
const shouldEnableCollaboration =
  onlineUsers.length > 1 &&
  (editorView === 'source' || editorView === 'split');
Why? Solo users don’t need real-time infrastructure overhead. This optimization keeps the editor lightweight (no WebSocket broadcasting, no CRDT synchronization) until collaboration is actually needed.
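As a sketch of what that gating could look like in a hook (this is not the project’s actual code; the hook name and WebSocket endpoint are placeholders), the idea is simply to construct the Y.Doc and provider lazily and tear them down when collaboration stops:

```typescript
import { useEffect, useRef } from 'react';
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

// Hypothetical hook: only spins up Yjs when collaboration is actually needed
export function useGatedCollaboration(roomId: string, enabled: boolean) {
  const docRef = useRef<Y.Doc | null>(null);
  const providerRef = useRef<WebsocketProvider | null>(null);

  useEffect(() => {
    if (!enabled) return; // solo editing: no CRDT, no WebSocket

    const doc = new Y.Doc();
    // 'wss://collab.example.com' is a placeholder endpoint
    const provider = new WebsocketProvider('wss://collab.example.com', roomId, doc);
    docRef.current = doc;
    providerRef.current = provider;

    return () => {
      // Tear down as soon as the flag flips back to false (or the room changes)
      provider.destroy();
      doc.destroy();
      docRef.current = null;
      providerRef.current = null;
    };
  }, [enabled, roomId]);

  return { docRef, providerRef };
}

// Usage with the flag from above:
// const { docRef } = useGatedCollaboration(projectId, shouldEnableCollaboration);
```

The key design choice is that the CRDT machinery is created lazily and destroyed eagerly, so a solo author gets a plain editor.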
★ Insight ─────────────────────────────────────
AI can implement complex features like CRDTs, but humans
provide the insight: "Don't activate this unless needed."
That single constraint saved us from performance issues.
─────────────────────────────────────────────────
The ZIP Import Race Condition
November 19 (the day before finishing):
Bug Report: “I uploaded a ZIP file. Editor shows empty content.”
The Investigation (from docs/investigations/ZIP_IMPORT_RACE_CONDITION.md):
Timeline:
1. User opens project → Editor initialization starts (async)
2. User uploads ZIP → API extracts in ~500ms
3. Content reset runs BEFORE ytextRef.current is set
4. Effect returns early → Content never applied
5. User sees empty editor
Root Cause: A race between import completion and editor initialization - the content reset ran before ytextRef.current was set.
The Solution: Pending reset queue pattern.
// Before: Silent failure
useEffect(() => {
  if (!resetContent || !ytextRef.current) return; // ❌
  // ... apply reset
}, [resetContent]);

// After: Queue and retry
const pendingResetRef = useRef<string | null>(null);

useEffect(() => {
  if (!ytextRef.current) {
    pendingResetRef.current = resetContent; // Queue it
    return;
  }
  // Apply immediately or from queue
  const content = resetContent || pendingResetRef.current;
  if (content) {
    ytextRef.current.delete(0, ytextRef.current.length);
    ytextRef.current.insert(0, content);
    pendingResetRef.current = null;
  }
}, [resetContent, ytextRef.current]);
Testing Coverage: 8 scenarios (solo editing, collaborative, project switch, rapid imports, mid-session joins)
What I Learned: AI can write comprehensive tests if you describe the scenarios. I said “What edge cases should we test?” Claude listed 8 scenarios I hadn’t considered.
Days 6-7: Architecture for the Future
Phase 6: Microservices-Ready Refactoring
The Vision: As the platform scales, individual and organization features should be deployable independently.
The Challenge: Reorganize 1,000+ files without breaking anything.
The Strategy: Feature-based architecture with clear domain boundaries.
apps/web/src/features/
├── individual/ # Solo users (future: serverless)
├── organizations/ # Teams (future: dedicated servers)
└── shared/ # Common features (future: npm package)
The Process:
- Created directory structure
- Documented migration strategy (68 pages)
- Migrated auth as reference pattern
- Validated build (0 TypeScript errors)
The Fun Part: Claude generated README files for all 11 feature domains, each with:
- Purpose and components
- Boundary rules (what can/cannot import)
- Usage context (individual vs organization)
Example (from features/shared/collaboration/README.md):
# Real-Time Collaboration
## Purpose
Provides Yjs CRDT-based collaborative editing with presence
awareness and conflict-free synchronization.
## Boundary Rules
✅ Can import from: `shared/auth`, `shared/editor-core`
❌ Cannot import from: `individual/*`, `organizations/*`
## Performance
- Presence-gated activation (only when 2+ users)
- WebSocket connection pooling
- Automatic reconnection with exponential backoff
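The README documents these boundaries; whether they are enforced mechanically isn’t mentioned in the post, but to make the rules concrete, here is one hypothetical way they could be expressed with ESLint’s built-in no-restricted-imports rule (the `@/features/...` path aliases are assumptions):

```typescript
// eslint.config.ts (flat config) — hypothetical sketch, not the project's actual setup
export default [
  {
    // shared/* must stay independent of both deployment targets
    files: ['src/features/shared/**/*.{ts,tsx}'],
    rules: {
      'no-restricted-imports': [
        'error',
        {
          patterns: [
            {
              group: ['@/features/individual/*', '@/features/organizations/*'],
              message: 'shared/* cannot import from individual/* or organizations/*',
            },
          ],
        },
      ],
    },
  },
];
```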
★ Insight ─────────────────────────────────────
AI documentation is only as good as your examples. We
established one pattern (auth README), and Claude replicated
it perfectly across 10 more domains.
─────────────────────────────────────────────────
The Challenges: Where AI Needed Guidance
1. Architectural Decisions
AI can’t decide: “Should we use Yjs or Operational Transform?”
Human provides context: “We’re targeting <20 concurrent editors per document initially. Yjs is simpler and has Next.js integration.”
AI implements: 467 lines of Yjs integration with presence tracking.
2. Schema Mismatches
Bug: Dashboard fails with “Invalid input: expected string, received undefined”
AI’s initial diagnosis: “Check your validation schema”
Human insight: “The schema expects organization_id, but we’re not selecting it in the query”
Fix: One line:
- const PROJECT_LIST_COLUMNS = 'id, title, ..., owner_id'
+ const PROJECT_LIST_COLUMNS = 'id, title, ..., organization_id'
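For context, here is roughly the shape of that mismatch, with a hypothetical Zod schema standing in for the real validation (only the column-list change above comes from the actual fix):

```typescript
import { z } from 'zod';

// Hypothetical project schema: it requires organization_id to be a string...
const ProjectSchema = z.object({
  id: z.string(),
  title: z.string(),
  organization_id: z.string(),
});

// ...but the list query only selected owner_id, so parsing each row failed with
// "expected string, received undefined" for organization_id.
const PROJECT_LIST_COLUMNS = 'id, title, owner_id';

// After widening the column list to include organization_id, the parse succeeds:
// ProjectSchema.parse(row);
```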
Lesson: AI can find bugs, but humans understand why systems are designed certain ways.
3. Performance Trade-offs
Me: “Should we cache PDF compilations?”
Claude: “We could add Redis caching, but that adds complexity. What’s your latency target?”
Me: “Under 5 seconds for simple docs is fine for MVP.”
Claude: “Then defer caching until P95 latency >10s. YAGNI principle.”
Result: Simpler architecture, faster development.
The Fun: Moments That Made Me Smile
1. The Button Unification
We had two button components: button.tsx (Shadcn) and custom-button.tsx (legacy).
Claude’s analysis:
Found 21 files using custom-button:
- features/individual/dashboard/...
- features/shared/editor-core/...
Migration plan:
- primary → default
- text → ghost
- icon → size="icon" variant="ghost"
Result: Deleted custom-button.tsx, migrated 21 files, 0 visual regressions.
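To picture the mapping, here is roughly what one migrated icon-button usage ends up looking like (illustrative only; the Shadcn import path and surrounding component are assumptions):

```tsx
import { Trash2 } from 'lucide-react';
import { Button } from '@/components/ui/button';

// Before (legacy): <CustomButton variant="icon" onClick={onDelete}><Trash2 /></CustomButton>
// After (Shadcn): the `icon` variant maps to variant="ghost" + size="icon"
export function DeleteProjectButton({ onDelete }: { onDelete: () => void }) {
  return (
    <Button variant="ghost" size="icon" aria-label="Delete project" onClick={onDelete}>
      <Trash2 className="h-4 w-4" />
    </Button>
  );
}
```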
Why it was fun: Watching Claude systematically find and fix every usage, then validate with TypeScript.
2. The Converter Quality Score
Original markdown converter:
return {"quality_score": 95} # Totally fake!
After Phase 6:
from typing import List

def calculate_quality_score(warnings: List[str]) -> int:
    """
    Realistic scoring based on conversion issues:
    - No warnings: 90-100 (excellent)
    - 1-2 warnings: 70-89 (good)
    - 3-5 warnings: 50-69 (acceptable)
    - 6+ warnings: <50 (poor, manual review needed)
    """
    base_score = 100
    for warning in warnings:
        if 'math' in warning: base_score -= 15
        elif 'table' in warning: base_score -= 10
        elif 'image' in warning: base_score -= 5
        else: base_score -= 3
    return max(50, min(100, base_score))
Why it was fun: Watching Claude transform a fake metric into a useful quality indicator with clear reasoning.
3. The PR Validation Hook
We created a pre-PR validation hook (sketched below) that checks:
- Title format (`feat(phase-N): description`)
- Body structure (Summary, Changes, Testing)
- Commit message conventions
- Build validation
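I won’t paste the real hook here, but a stripped-down sketch of the metadata checks might look like this (the section names and allowed commit types are assumptions; the real hook also runs the build):

```typescript
// scripts/validate-pr.ts — hypothetical sketch of the metadata checks
const [, , title = '', body = ''] = process.argv;
const errors: string[] = [];

// Title convention: feat(phase-N): description (plus the usual conventional-commit types)
if (!/^(feat|fix|docs|refactor|chore)\(phase-\d+\): .+/.test(title)) {
  errors.push(`Title does not match "type(phase-N): description": "${title}"`);
}

// Body must contain the required sections
for (const section of ['## Summary', '## Changes', '## Testing']) {
  if (!body.includes(section)) errors.push(`Missing body section: ${section}`);
}

if (errors.length > 0) {
  console.error(errors.join('\n'));
  process.exit(1);
}
console.log('PR metadata looks good');
```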
Result: CodeRabbit’s automated reviews became more useful because our PRs were consistently formatted.
Example feedback:
“Excellent PR structure! The separation of concerns in Phase 6 is well-documented. Minor suggestion: Add error boundary to OrganizationDashboard.”
Why it was fun: Teaching AI to teach AI. We set standards, Claude enforced them, CodeRabbit reviewed them.
The Statistics: What We Built
By the Numbers
| Metric | Value |
|---|---|
| Duration | 7 days (Nov 12-19) |
| Lines of Code | +166,432 / -85,977 |
| Files Touched | 1,829 |
| PRs Merged | 62 |
| Phases Completed | 6 major (18 sub-phases) |
| Features Shipped | 47 (tracked in PLAN.md) |
Code Quality
- TypeScript Errors: 0 (enforced pre-commit)
- ESLint Errors: 0 (automated fixes)
- Build Time: <30 seconds
- Bundle Size: <500KB (main chunk)
- Test Coverage: Integration tests for critical flows
Features Delivered
Core Platform:
- ✅ LaTeX editor with autocomplete (100+ commands)
- ✅ HTML preview (80+ LaTeX commands parsed)
- ✅ PDF compilation with error parsing
- ✅ Real-time collaboration (Yjs CRDT)
- ✅ Presence avatars and collaborative cursors
Import/Export:
- ✅ Document import (DOCX, PDF, Markdown, LaTeX, ZIP)
- ✅ Overleaf project import (ZIP with asset extraction)
- ✅ Export (LaTeX + PDF with validation)
- ✅ Quality scoring and conversion warnings
Collaboration:
- ✅ Comments with mentions and threading
- ✅ Role-based permissions (owner/editor/reviewer)
- ✅ Invitation system (email + real-time notifications)
- ✅ Version history (save/restore with labels)
Organization Features:
- ✅ Multi-tenant architecture
- ✅ Kanban board for topic management
- ✅ Member management with roles
- ✅ Join requests for public orgs
AI Features:
- ✅ LaTeX error fixing (Google Gemini 2.0)
- ✅ Autocomplete suggestions
- ✅ Structure validation with auto-fix
Lessons Learned: A Framework for AI-Augmented Development
1. Humans Decide, AI Implements
Good:
- “We’re building LaTeX-first. Remove ProseMirror.” ✅
- “Use Yjs for <20 concurrent users.” ✅
- “Defer Redis caching until latency >10s.” ✅
Bad:
- “Should we use microservices?” (Too vague)
- “Make it faster.” (No metrics)
- “Add collaboration.” (No requirements)
2. Documentation is Your Multiplier
We created:
- `ARCHITECTURE.md` (620 lines) - System design
- `PLAN.md` (799 lines) - Phase-based roadmap
- `CONVENTIONS.md` (568 lines) - Code standards
- `CLAUDE.md` (269 lines) - Development workflow
Result: Claude referenced these docs constantly, making decisions aligned with our architecture without me repeating context.
3. Validate Early, Validate Often
Every commit ran:
bun run build && bun run lint && bun run typecheck
Why: Catch errors immediately, not in production.
AI benefit: Claude could fix issues in seconds because validation was automated.
4. Embrace Iteration
Phase 1 Plan: “Remove ProseMirror in 2 weeks”
Reality: “Removed in 1 day, then iterated on 5 more phases”
Lesson: AI accelerates iteration. What used to take weeks now takes days. Plan for rapid cycles, not waterfall.
5. Test the Scenarios You Didn’t Think Of
Me: “Test ZIP import works”
Claude: “8 scenarios to test:
- Solo editing - ZIP import
- Normal import (editor ready)
- Collaborative editing - no regression
- User joins mid-session
- Project switch - no contamination
- Multiple rapid imports
- Import during active collaboration
- Append mode - unaffected”
Me: “…I didn’t think of #4, #5, #6, #7.”
Lesson: AI can generate comprehensive test matrices if you ask “What could go wrong?”
Final Thoughts: The Human-AI Partnership
After 7 days and 166k lines of code, here’s what I know:
AI doesn’t replace developers. It amplifies them.
- I provided vision: “LaTeX-first for researchers”
- Claude provided execution: 62 PRs, 47 features, 0 TypeScript errors
- Together we debugged: Race conditions, schema mismatches, quality scoring
- Together we architected: Microservices boundaries, CRDT integration, converter patterns
The magic happens when you stop thinking of AI as a tool and start treating it as a thoughtful collaborator:
- It asks clarifying questions (“What’s your latency target?”)
- It challenges assumptions (“Should we make the parser handle this?”)
- It documents thoroughly (11 README files without being asked)
- It learns from patterns (registry pattern applied everywhere)
The best moments weren’t when Claude wrote perfect code on the first try (though that happened often). They were when we discovered solutions together:
- “What if we only activate Yjs when 2+ users?” (Presence-gated collaboration)
- “Can we queue pending resets?” (Race condition fix)
- “Should quality scores reflect reality?” (Converter scoring redesign)
The challenges taught me that AI augmentation requires new skills:
- Writing clear requirements (not just code)
- Explaining architectural context (not just tasks)
- Validating outputs systematically (not just visually)
- Thinking in patterns (not just implementations)
Would I build production software with AI again?
Absolutely. And I’d do it even faster next time.
Project Details
- Project: CoAuthor Papers - LaTeX Collaboration Platform
- Timeline: November 12-19, 2025 (7 days)
- Scale: 166k+ lines of code, 62 merged PRs, 6 major phases
- Tech Stack: Next.js, TypeScript, Supabase, Yjs, Python (LaTeX compiler)
- Repository: https://github.com/coauthorpapers/coauthorpapers
Key Documentation:
- `docs/PIVOT_RATIONALE.md` - Why we chose LaTeX-first
- `docs/ZIP_IMPORT_RACE_CONDITION.md` - Debugging real-time issues
- `docs/PHASE_6_REFACTORING.md` - Microservices architecture
- `services/latex-compiler/docs/ADDING_CONVERTERS.md` - Extensibility guide
Written by: Bishnu Bista
With: Claude Code (Sonnet 4.5)
Date: January 19, 2025
“The best code is the code you don’t have to write yourself.” 🤖
“The best architecture is the one you discover together.” 👨‍💻