When I started working with Claude Code on November 12th, I had a vision: build a professional LaTeX collaboration platform for academic researchers. What I didn’t expect was to ship a production-ready MVP in 7 days with 166,432 lines of code, 62 merged PRs, and 6 major development phases.
This isn’t a story about AI replacing developers. It’s about augmented development, where human judgment guides architectural decisions while AI handles the mechanical complexity of implementation.
Here’s what I learned about the challenges, the fun, and the unexpected moments that make coding with AI feel like pair programming with a tireless, detail-obsessed colleague.
Day 1: The Pivot That Changed Everything
The Problem: Lossy Conversion Hell
The project started as a fork of arxiv-collaboration-platform. It had a hybrid editor with both Visual (ProseMirror) and LaTeX modes. The idea seemed clever: users could write in a Google Docs-like interface while the system maintained LaTeX under the hood.
Reality check: LaTeX → ProseMirror → LaTeX conversion is lossy.
% Input
\newcommand{\mycommand}[2]{#1 \textbf{#2}}
% After round-trip through ProseMirror
% ❌ GONE. Custom commands stripped. Formatting lost.
I spent hours debugging the parser, trying to preserve edge cases. Claude and I discussed the architecture:
Me: “Can we make the parser handle custom commands?”
Claude: “We could add 200+ lines of custom parsing logic… but there’s a better question: Should we?”
That question led to docs/PIVOT_RATIONALE.md, a 271-line analysis of why fighting LaTeX is the wrong approach. The conclusion was brutal but correct:
“Academic researchers already know LaTeX. A visual editor adds complexity without solving their real problems: collaboration, import, and compilation speed.”
The Decision: Embrace LaTeX-First
November 12, 5:04 PM: We created PR #1 - “Remove ProseMirror and Visual Editor infrastructure” - which deleted:
- 1,800 lines of parser/serializer code
- 13 npm packages (prosemirror-*)
- 961 lines of LaTeX → JSON conversion logic
- 180 lines of visual decorations
Result: The codebase became 30% smaller. The architecture became 100% clearer.
★ Insight ─────────────────────────────────────
When building with AI, the hardest decisions aren't technical.
They're philosophical. Should we build this feature? AI can
implement either path perfectly. Your job is choosing the
*right* path.
─────────────────────────────────────────────────
Days 2-3: Rapid Iteration on Phases 1-4
Phase 1-3: Foundation in 24 Hours
With the pivot decided, we entered hyperdrive:
November 12 (Evening):
- Phase 1: Removed ProseMirror infrastructure ✅
- Phase 2: Simplified comment system to line-based anchors ✅
- Phase 3: Added KaTeX live preview for math rendering ✅
The Fun Part: Watching Claude write a 200-line CodeMirror plugin for math preview:
// apps/web/src/lib/codemirror/katex-plugin.ts
// Detects inline math ($...$), display math ($$...$$),
// and equation environments, renders with KaTeX
I provided the requirement: “Render LaTeX math inline without compilation.”
Claude implemented (a simplified sketch follows this list):
- Regex-based detection with overlap handling
- Error boundaries (`[Math Error: ...]` for invalid LaTeX)
- Performance optimization (viewport-based rendering)
- Custom theme with light blue highlights
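To make that concrete, here is a stripped-down sketch of a CodeMirror 6 + KaTeX plugin in the same spirit. This is not the project’s actual katex-plugin.ts (which is ~200 lines and also handles equation environments, overlap handling, and theming); the regex and names like `MathWidget` are illustrative assumptions.

```typescript
import { Decoration, DecorationSet, EditorView, ViewPlugin, ViewUpdate, WidgetType } from '@codemirror/view';
import type { Range } from '@codemirror/state';
import katex from 'katex';

// Widget that renders a LaTeX snippet with KaTeX, falling back to an error marker
class MathWidget extends WidgetType {
  constructor(readonly tex: string, readonly display: boolean) { super(); }
  eq(other: MathWidget) { return other.tex === this.tex && other.display === this.display; }
  toDOM() {
    const el = document.createElement('span');
    try {
      katex.render(this.tex, el, { displayMode: this.display, throwOnError: true });
    } catch {
      el.textContent = `[Math Error: ${this.tex}]`; // error boundary for invalid LaTeX
    }
    return el;
  }
}

function buildDecorations(view: EditorView): DecorationSet {
  const decorations: Range<Decoration>[] = [];
  // Only scan the visible ranges, so large documents stay responsive
  for (const { from, to } of view.visibleRanges) {
    const text = view.state.doc.sliceString(from, to);
    // Display math ($$...$$) is tried before inline math ($...$)
    const mathRe = /\$\$([^$]+)\$\$|\$([^$\n]+)\$/g;
    let match: RegExpExecArray | null;
    while ((match = mathRe.exec(text)) !== null) {
      const display = match[1] !== undefined;
      const tex = match[1] ?? match[2] ?? '';
      // Place the rendered preview right after the source math
      decorations.push(
        Decoration.widget({ widget: new MathWidget(tex, display), side: 1 })
          .range(from + match.index + match[0].length)
      );
    }
  }
  return Decoration.set(decorations, true);
}

// View plugin that recomputes decorations when the document or viewport changes
export const katexPreview = ViewPlugin.fromClass(
  class {
    decorations: DecorationSet;
    constructor(view: EditorView) { this.decorations = buildDecorations(view); }
    update(update: ViewUpdate) {
      if (update.docChanged || update.viewportChanged) {
        this.decorations = buildDecorations(update.view);
      }
    }
  },
  { decorations: (plugin) => plugin.decorations }
);
```

The ordering of the two math patterns (display before inline) is the same idea behind the priority-inversion fix in the testing conversation below.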
Testing conversation:
Me: “It renders $x^2$ but breaks on $$\int_0^1$$”
Claude: “Display math checked after inline. Priority inversion. Fixing…”
Me: “Perfect! Now both work.”
Phase 4: Document Import Architecture
The Challenge: Support DOCX, PDF, Markdown, LaTeX, and ZIP imports with quality scoring.
The Fun: Watching Claude architect a converter registry pattern:
# services/latex-compiler/app/converters/__init__.py
# Snippet: the convert_*_to_latex functions (and the Converter type alias) are
# defined elsewhere in this package; only the registry and dispatcher are shown.
CONVERTERS = {
    'docx': convert_docx_to_latex,
    'pdf': convert_pdf_to_latex,
    'markdown': convert_markdown_to_latex,
    'text': convert_text_to_latex
}

def get_converter(file_type: str) -> Converter:
    if file_type not in CONVERTERS:
        raise ValueError(f"Unsupported file type: {file_type}")
    return CONVERTERS[file_type]
Before: 45-line if-elif chain in main.py
After: 11-line registry dispatcher
★ Insight ─────────────────────────────────────
AI excels at recognizing patterns you've used before and
applying them consistently. The registry pattern appeared
once in documentation. Claude applied it everywhere relevant.
─────────────────────────────────────────────────
The Markdown Formatting Bug
November 19: A user reported: “Markdown import shows everything in one line!”
The Investigation:
# BEFORE (markdown_converter.py didn't exist)
return {"latex_source": source, "quality_score": 95} # Fake score!
# AFTER (Phase 6.0 - Unified Converter Architecture)
# 1. Apply 8 cleanup rules (artifacts, split words, URLs)
# 2. Enhanced Pandoc (13+ flags matching DOCX quality)
# 3. Realistic quality scoring (60-100 based on issues)
# 4. Meaningful warnings (math, tables, code blocks)
The Fix: Created markdown_converter.py (284 lines) with proper cleanup pipeline.
Result: Markdown quality went from “fake 95” to “realistic 60-100” with helpful warnings.
What I loved: Claude didn’t just fix the bug. It established a pattern for all future converters with comprehensive documentation (ADDING_CONVERTERS.md, 558 lines).
Days 4-5: Real-Time Collaboration
The Challenge: Google Docs-Style Editing Without Conflicts
Requirement: Multiple users editing the same LaTeX document simultaneously, no “your changes conflict” dialogs.
The Solution: Yjs CRDT (Conflict-Free Replicated Data Type)
The Twist: Presence-gated activation.
// Only activate Yjs when 2+ users are editing
const shouldEnableCollaboration =
  onlineUsers.length > 1 &&
  (editorView === 'source' || editorView === 'split');
Why? Solo users don’t need real-time infrastructure overhead. This optimization keeps the editor lightweight (no WebSocket broadcasting, no CRDT synchronization) until collaboration is actually needed.
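As a sketch of what that gating could look like in a hook (this is not the project’s actual code; the hook name and WebSocket endpoint are placeholders), the idea is simply to construct the Y.Doc and provider lazily and tear them down when collaboration stops:

```typescript
import { useEffect, useRef } from 'react';
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

// Hypothetical hook: only spins up Yjs when collaboration is actually needed
export function useGatedCollaboration(roomId: string, enabled: boolean) {
  const docRef = useRef<Y.Doc | null>(null);
  const providerRef = useRef<WebsocketProvider | null>(null);

  useEffect(() => {
    if (!enabled) return; // solo editing: no CRDT, no WebSocket

    const doc = new Y.Doc();
    // 'wss://collab.example.com' is a placeholder endpoint
    const provider = new WebsocketProvider('wss://collab.example.com', roomId, doc);
    docRef.current = doc;
    providerRef.current = provider;

    return () => {
      // Tear down as soon as the flag flips back to false (or the room changes)
      provider.destroy();
      doc.destroy();
      docRef.current = null;
      providerRef.current = null;
    };
  }, [enabled, roomId]);

  return { docRef, providerRef };
}

// Usage with the flag from above:
// const { docRef } = useGatedCollaboration(projectId, shouldEnableCollaboration);
```

The key design choice is that the CRDT machinery is created lazily and destroyed eagerly, so a solo author gets a plain editor.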
★ Insight ─────────────────────────────────────
AI can implement complex features like CRDTs, but humans
provide the insight: "Don't activate this unless needed."
That single constraint saved us from performance issues.
─────────────────────────────────────────────────
The ZIP Import Race Condition
November 19 (the day before finishing):
Bug Report: “I uploaded a ZIP file. Editor shows empty content.”
The Investigation (from docs/investigations/ZIP_IMPORT_RACE_CONDITION.md):
Timeline:
1. User opens project → Editor initialization starts (async)
2. User uploads ZIP → API extracts in ~500ms
3. Content reset runs BEFORE ytextRef.current is set
4. Effect returns early → Content never applied
5. User sees empty editor
Root Cause: A race between import completion and editor initialization - the content reset ran before ytextRef.current was set.
The Solution: Pending reset queue pattern.
// Before: Silent failure
useEffect(() => {
  if (!resetContent || !ytextRef.current) return; // ❌
  // ... apply reset
}, [resetContent]);

// After: Queue and retry
const pendingResetRef = useRef<string | null>(null);

useEffect(() => {
  if (!ytextRef.current) {
    pendingResetRef.current = resetContent; // Queue it
    return;
  }
  // Apply immediately or from queue
  const content = resetContent || pendingResetRef.current;
  if (content) {
    ytextRef.current.delete(0, ytextRef.current.length);
    ytextRef.current.insert(0, content);
    pendingResetRef.current = null;
  }
}, [resetContent, ytextRef.current]);
Testing Coverage: 8 scenarios (solo editing, collaborative, project switch, rapid imports, mid-session joins)
What I Learned: AI can write comprehensive tests if you describe the scenarios. I said “What edge cases should we test?” Claude listed 8 scenarios I hadn’t considered.
Days 6-7: Architecture for the Future
Phase 6: Microservices-Ready Refactoring
The Vision: As the platform scales, individual and organization features should be deployable independently.
The Challenge: Reorganize 1,000+ files without breaking anything.
The Strategy: Feature-based architecture with clear domain boundaries.
apps/web/src/features/
├── individual/ # Solo users (future: serverless)
├── organizations/ # Teams (future: dedicated servers)
└── shared/ # Common features (future: npm package)
The Process:
- Created directory structure
- Documented migration strategy (68 pages)
- Migrated auth as reference pattern
- Validated build (0 TypeScript errors)
The Fun Part: Claude generated README files for all 11 feature domains, each with:
- Purpose and components
- Boundary rules (what can/cannot import)
- Usage context (individual vs organization)
Example (from features/shared/collaboration/README.md):
# Real-Time Collaboration
## Purpose
Provides Yjs CRDT-based collaborative editing with presence
awareness and conflict-free synchronization.
## Boundary Rules
✅ Can import from: `shared/auth`, `shared/editor-core`
❌ Cannot import from: `individual/*`, `organizations/*`
## Performance
- Presence-gated activation (only when 2+ users)
- WebSocket connection pooling
- Automatic reconnection with exponential backoff
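The README documents these boundaries; whether they are enforced mechanically isn’t mentioned in the post, but to make the rules concrete, here is one hypothetical way they could be expressed with ESLint’s built-in no-restricted-imports rule (the `@/features/...` path aliases are assumptions):

```typescript
// eslint.config.ts (flat config) — hypothetical sketch, not the project's actual setup
export default [
  {
    // shared/* must stay independent of both deployment targets
    files: ['src/features/shared/**/*.{ts,tsx}'],
    rules: {
      'no-restricted-imports': [
        'error',
        {
          patterns: [
            {
              group: ['@/features/individual/*', '@/features/organizations/*'],
              message: 'shared/* cannot import from individual/* or organizations/*',
            },
          ],
        },
      ],
    },
  },
];
```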
★ Insight ─────────────────────────────────────
AI documentation is only as good as your examples. We
established one pattern (auth README), and Claude replicated
it perfectly across 10 more domains.
─────────────────────────────────────────────────
The Challenges: Where AI Needed Guidance
1. Architectural Decisions
AI can’t decide: “Should we use Yjs or Operational Transform?”
Human provides context: “We’re targeting <20 concurrent editors per document initially. Yjs is simpler and has Next.js integration.”
AI implements: 467 lines of Yjs integration with presence tracking.
2. Schema Mismatches
Bug: Dashboard fails with “Invalid input: expected string, received undefined”
AI’s initial diagnosis: “Check your validation schema”
Human insight: “The schema expects organization_id, but we’re not selecting it in the query”
Fix: One line:
- const PROJECT_LIST_COLUMNS = 'id, title, ..., owner_id'
+ const PROJECT_LIST_COLUMNS = 'id, title, ..., organization_id'
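For context, here is roughly the shape of that mismatch, with a hypothetical Zod schema standing in for the real validation (only the column-list change above comes from the actual fix):

```typescript
import { z } from 'zod';

// Hypothetical project schema: it requires organization_id to be a string...
const ProjectSchema = z.object({
  id: z.string(),
  title: z.string(),
  organization_id: z.string(),
});

// ...but the list query only selected owner_id, so parsing each row failed with
// "expected string, received undefined" for organization_id.
const PROJECT_LIST_COLUMNS = 'id, title, owner_id';

// After widening the column list to include organization_id, the parse succeeds:
// ProjectSchema.parse(row);
```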
Lesson: AI can find bugs, but humans understand why systems are designed certain ways.
3. Performance Trade-offs
Me: “Should we cache PDF compilations?”
Claude: “We could add Redis caching, but that adds complexity. What’s your latency target?”
Me: “Under 5 seconds for simple docs is fine for MVP.”
Claude: “Then defer caching until P95 latency >10s. YAGNI principle.”
Result: Simpler architecture, faster development.
The Fun: Moments That Made Me Smile
1. The Button Unification
We had two button components: button.tsx (Shadcn) and custom-button.tsx (legacy).
Claude’s analysis:
Found 21 files using custom-button:
- features/individual/dashboard/...
- features/shared/editor-core/...
Migration plan:
- primary → default
- text → ghost
- icon → size="icon" variant="ghost"
Result: Deleted custom-button.tsx, migrated 21 files, 0 visual regressions.
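To picture the mapping, here is roughly what one migrated icon-button usage ends up looking like (illustrative only; the Shadcn import path and surrounding component are assumptions):

```tsx
import { Trash2 } from 'lucide-react';
import { Button } from '@/components/ui/button';

// Before (legacy): <CustomButton variant="icon" onClick={onDelete}><Trash2 /></CustomButton>
// After (Shadcn): the `icon` variant maps to variant="ghost" + size="icon"
export function DeleteProjectButton({ onDelete }: { onDelete: () => void }) {
  return (
    <Button variant="ghost" size="icon" aria-label="Delete project" onClick={onDelete}>
      <Trash2 className="h-4 w-4" />
    </Button>
  );
}
```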
Why it was fun: Watching Claude systematically find and fix every usage, then validate with TypeScript.
2. The Converter Quality Score
Original markdown converter:
return {"quality_score": 95} # Totally fake!
After Phase 6:
from typing import List

def calculate_quality_score(warnings: List[str]) -> int:
    """
    Realistic scoring based on conversion issues:
    - No warnings: 90-100 (excellent)
    - 1-2 warnings: 70-89 (good)
    - 3-5 warnings: 50-69 (acceptable)
    - 6+ warnings: <50 (poor, manual review needed)
    """
    base_score = 100
    for warning in warnings:
        if 'math' in warning: base_score -= 15
        elif 'table' in warning: base_score -= 10
        elif 'image' in warning: base_score -= 5
        else: base_score -= 3
    return max(50, min(100, base_score))
Why it was fun: Watching Claude transform a fake metric into a useful quality indicator with clear reasoning.
3. The PR Validation Hook
We created a pre-PR validation hook (sketched below) that checks:
- Title format (`feat(phase-N): description`)
- Body structure (Summary, Changes, Testing)
- Commit message conventions
- Build validation
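I won’t paste the real hook here, but a stripped-down sketch of the metadata checks might look like this (the section names and allowed commit types are assumptions; the real hook also runs the build):

```typescript
// scripts/validate-pr.ts — hypothetical sketch of the metadata checks
const [, , title = '', body = ''] = process.argv;
const errors: string[] = [];

// Title convention: feat(phase-N): description (plus the usual conventional-commit types)
if (!/^(feat|fix|docs|refactor|chore)\(phase-\d+\): .+/.test(title)) {
  errors.push(`Title does not match "type(phase-N): description": "${title}"`);
}

// Body must contain the required sections
for (const section of ['## Summary', '## Changes', '## Testing']) {
  if (!body.includes(section)) errors.push(`Missing body section: ${section}`);
}

if (errors.length > 0) {
  console.error(errors.join('\n'));
  process.exit(1);
}
console.log('PR metadata looks good');
```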
Result: CodeRabbit’s automated reviews became more useful because our PRs were consistently formatted.
Example feedback:
“Excellent PR structure! The separation of concerns in Phase 6 is well-documented. Minor suggestion: Add error boundary to OrganizationDashboard.”
Why it was fun: Teaching AI to teach AI. We set standards, Claude enforced them, CodeRabbit reviewed them.
The Statistics: What We Built
By the Numbers
| Metric | Value |
|---|---|
| Duration | 7 days (Nov 12-19) |
| Lines of Code | +166,432 / -85,977 |
| Files Touched | 1,829 |
| PRs Merged | 62 |
| Phases Completed | 6 major (18 sub-phases) |
| Features Shipped | 47 (tracked in PLAN.md) |
Code Quality
- TypeScript Errors: 0 (enforced pre-commit)
- ESLint Errors: 0 (automated fixes)
- Build Time: <30 seconds
- Bundle Size: <500KB (main chunk)
- Test Coverage: Integration tests for critical flows
Features Delivered
Core Platform:
- ✅ LaTeX editor with autocomplete (100+ commands)
- ✅ HTML preview (80+ LaTeX commands parsed)
- ✅ PDF compilation with error parsing
- ✅ Real-time collaboration (Yjs CRDT)
- ✅ Presence avatars and collaborative cursors
Import/Export:
- ✅ Document import (DOCX, PDF, Markdown, LaTeX, ZIP)
- ✅ Overleaf project import (ZIP with asset extraction)
- ✅ Export (LaTeX + PDF with validation)
- ✅ Quality scoring and conversion warnings
Collaboration:
- ✅ Comments with mentions and threading
- ✅ Role-based permissions (owner/editor/reviewer)
- ✅ Invitation system (email + real-time notifications)
- ✅ Version history (save/restore with labels)
Organization Features:
- ✅ Multi-tenant architecture
- ✅ Kanban board for topic management
- ✅ Member management with roles
- ✅ Join requests for public orgs
AI Features:
- ✅ LaTeX error fixing (Google Gemini 2.0)
- ✅ Autocomplete suggestions
- ✅ Structure validation with auto-fix
Lessons Learned: A Framework for AI-Augmented Development
1. Humans Decide, AI Implements
Good:
- “We’re building LaTeX-first. Remove ProseMirror.” ✅
- “Use Yjs for <20 concurrent users.” ✅
- “Defer Redis caching until latency >10s.” ✅
Bad:
- “Should we use microservices?” (Too vague)
- “Make it faster.” (No metrics)
- “Add collaboration.” (No requirements)
2. Documentation is Your Multiplier
We created:
- `ARCHITECTURE.md` (620 lines) - System design
- `PLAN.md` (799 lines) - Phase-based roadmap
- `CONVENTIONS.md` (568 lines) - Code standards
- `CLAUDE.md` (269 lines) - Development workflow
Result: Claude referenced these docs constantly, making decisions aligned with our architecture without me repeating context.
3. Validate Early, Validate Often
Every commit ran:
bun run build && bun run lint && bun run typecheck
Why: Catch errors immediately, not in production.
AI benefit: Claude could fix issues in seconds because validation was automated.
4. Embrace Iteration
Phase 1 Plan: “Remove ProseMirror in 2 weeks”
Reality: “Removed in 1 day, then iterated on 5 more phases”
Lesson: AI accelerates iteration. What used to take weeks now takes days. Plan for rapid cycles, not waterfall.
5. Test the Scenarios You Didn’t Think Of
Me: “Test ZIP import works”
Claude: “8 scenarios to test:
- Solo editing - ZIP import
- Normal import (editor ready)
- Collaborative editing - no regression
- User joins mid-session
- Project switch - no contamination
- Multiple rapid imports
- Import during active collaboration
- Append mode - unaffected”
Me: “…I didn’t think of #4, #5, #6, #7.”
Lesson: AI can generate comprehensive test matrices if you ask “What could go wrong?”
Final Thoughts: The Human-AI Partnership
After 7 days and 166k lines of code, here’s what I know:
AI doesn’t replace developers. It amplifies them.
- I provided vision: “LaTeX-first for researchers”
- Claude provided execution: 62 PRs, 47 features, 0 TypeScript errors
- Together we debugged: Race conditions, schema mismatches, quality scoring
- Together we architected: Microservices boundaries, CRDT integration, converter patterns
The magic happens when you stop thinking of AI as a tool and start treating it as a thoughtful collaborator:
- It asks clarifying questions (“What’s your latency target?”)
- It challenges assumptions (“Should we make the parser handle this?”)
- It documents thoroughly (11 README files without being asked)
- It learns from patterns (registry pattern applied everywhere)
The best moments weren’t when Claude wrote perfect code on the first try (though that happened often). They were when we discovered solutions together:
- “What if we only activate Yjs when 2+ users?” (Presence-gated collaboration)
- “Can we queue pending resets?” (Race condition fix)
- “Should quality scores reflect reality?” (Converter scoring redesign)
The challenges taught me that AI augmentation requires new skills:
- Writing clear requirements (not just code)
- Explaining architectural context (not just tasks)
- Validating outputs systematically (not just visually)
- Thinking in patterns (not just implementations)
Would I build production software with AI again?
Absolutely. And I’d do it even faster next time.
Project Details
- Project: CoAuthor Papers - LaTeX Collaboration Platform
- Timeline: November 12-19, 2025 (7 days)
- Scale: 166k+ lines of code, 62 merged PRs, 6 major phases
- Tech Stack: Next.js, TypeScript, Supabase, Yjs, Python (LaTeX compiler)
- Repository: https://github.com/coauthorpapers/coauthorpapers
Key Documentation:
- `docs/PIVOT_RATIONALE.md` - Why we chose LaTeX-first
- `docs/ZIP_IMPORT_RACE_CONDITION.md` - Debugging real-time issues
- `docs/PHASE_6_REFACTORING.md` - Microservices architecture
- `services/latex-compiler/docs/ADDING_CONVERTERS.md` - Extensibility guide
Written by: Bishnu Bista
With: Claude Code (Sonnet 4.5)
Date: January 19, 2025
“The best code is the code you don’t have to write yourself.” 🤖
“The best architecture is the one you discover together.” 👨‍💻