If you're using AI to just write code, you're missing out. Two engineers at Every shipped six features, five bug fixes, and three infrastructure updates in one week, and they did it by designing AI-driven workflows rather than writing code by hand.
Two engineers at Every demonstrate how they ship like a team of 15 using Claude Code and AI agents. They've developed a compounding engineering workflow where each task makes the next easier—using AI for research, planning, and implementation while maintaining human oversight at critical decision points. The episode includes a detailed walkthrough of their Claude Code workflow, custom prompts that generate detailed implementation plans, and a tier ranking of all major AI coding assistants.
Kieran and Nityesh explain how they've transformed Cora's engineering workflow so that two engineers ship like a 15-person team. They introduce the concept of 'compounding engineering': each piece of work makes the next piece easier through systematic use of AI agents for research, planning, and implementation.
Detailed demonstration of Claude Code, Anthropic's terminal-based coding agent. Unlike traditional IDEs, it runs in the terminal with access to the entire computer, can execute commands, search the web, take screenshots, and integrate with GitHub. The team shows how they use it to query git history, generate product updates, and check project pipelines.
The team reveals their workflow for creating detailed GitHub issues using custom Claude Code commands. They built a prompt that generates other prompts—taking voice-to-text feature descriptions and automatically researching best practices, analyzing the codebase, and creating comprehensive implementation plans with minimal human input.
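Claude Code supports project-level slash commands defined as Markdown files in a `.claude/commands/` directory, where `$ARGUMENTS` is replaced by whatever the user types after the command. The episode doesn't publish Cora's actual prompts, so the file below is an invented illustration of the "prompt that generates other prompts" pattern, not their real command:

```markdown
<!-- .claude/commands/plan-feature.md — illustrative only; the team's real
     prompts are not shown in the episode. $ARGUMENTS is the documented
     placeholder for the text passed to the slash command. -->
Turn the following feature description into a detailed GitHub issue:

$ARGUMENTS

1. Search the codebase for related modules and existing conventions.
2. Research current best practices for this kind of feature.
3. Write an implementation plan: affected files, migration steps, test cases.
4. Draft the GitHub issue with the plan and any open questions a good PM
   would ask, and flag decisions that need human review.
```

A command file named this way would be invoked as a slash command matching its filename, which is how a voice-to-text feature description can flow straight into a researched implementation plan.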
Before Claude Opus 4 launched, the team spent two hours creating 20+ issues to prepare for the superior model. This strategic preparation allowed them to immediately leverage the new model's capabilities for maximum productivity, demonstrating forward-thinking workflow design.
Live demonstration of debugging a production issue where a form wasn't being sent. Claude Code analyzed git history, identified the problematic code removal from 14 days ago, created a fix, generated a PR, and wrote a migration script—all with minimal human intervention and 'zero energy cost.'
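The episode doesn't show the exact commands Claude Code ran, but the standard git tool for "when did this code disappear?" is the pickaxe search (`git log -S`), which finds every commit that added or removed a given string. The sketch below builds a throwaway repo to demonstrate it; all names and file contents are invented for the example:

```python
import os
import subprocess
import tempfile

def run(cmd, cwd):
    """Run a command in the repo and return its stdout."""
    return subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, check=True
    ).stdout

# Build a throwaway repo (hypothetical code; stands in for a real project).
repo = tempfile.mkdtemp()
run(["git", "init", "-q"], repo)
run(["git", "config", "user.email", "demo@example.com"], repo)
run(["git", "config", "user.name", "Demo"], repo)

path = os.path.join(repo, "form.py")
with open(path, "w") as f:
    f.write("def submit():\n    send_form()\n")
run(["git", "add", "."], repo)
run(["git", "commit", "-qm", "add form submit"], repo)

# A later "refactor" silently drops the send_form() call — the bug.
with open(path, "w") as f:
    f.write("def submit():\n    pass\n")
run(["git", "add", "."], repo)
run(["git", "commit", "-qm", "refactor form"], repo)

# Pickaxe search: every commit that added or removed the string,
# which pinpoints both the introduction and the removal of the call.
log = run(["git", "log", "-S", "send_form", "--oneline"], repo)
print(log)
```

Running this lists both commits, with the "refactor form" commit identifying exactly where the call was removed, which is the same kind of history query an agent can run autonomously.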
Description of the actual day-to-day workflow: running multiple Claude Code instances in parallel, using voice-to-text for all inputs, coding socially while on calls, and managing AI agents rather than writing code directly. The team hasn't touched Cursor or Windsurf in three weeks.
The team competes to see who can keep Claude Code running longest (Kieran's record: 25 minutes, Nityesh: 8 minutes). This demonstrates Opus 4's unprecedented autonomy—running complex, multi-step plans without intervention, a qualitative leap from previous agentic tools.
Nityesh shares the most important realization from 'High Output Management': catch errors at the earliest, lowest-cost stage. With AI's power to execute quickly, it's crucial to validate direction during planning before implementation, as mistakes compound rapidly with autonomous agents.
The team discusses how to make AI-generated planning documents less boring and more useful. Instead of traditional PRD format, they prompt for user stories, questions a good PM would ask, and concrete examples—making review more engaging while maintaining thoroughness.
Despite AI's capabilities, traditional software practices remain essential. The team emphasizes smoke tests, automated testing, and evals (tests for prompts). Kieran demonstrates having Claude Code run evals 10 times, identify failures, and iteratively improve prompts until they pass consistently.
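The "run evals 10 times, fix failures, repeat" loop can be sketched in a few lines. Everything here is a stand-in: `grade` would really call the model and score its output, and `CASES` would be real test cases, but the structure of repeated runs and collected failures is the point:

```python
def grade(prompt: str, case: dict) -> bool:
    # Stand-in grader: a real eval would send the prompt to the model
    # and score the response. Here we just check the prompt itself.
    return case["expected_keyword"] in prompt

# Hypothetical eval cases for an email-summarizing prompt.
CASES = [
    {"expected_keyword": "summary"},
    {"expected_keyword": "subject line"},
]

def run_eval(prompt: str, cases: list, runs: int = 10) -> list:
    """Run every case `runs` times, since prompt behavior can be
    nondeterministic, and collect (run_index, case) for each failure."""
    failures = []
    for run_idx in range(runs):
        for case in cases:
            if not grade(prompt, case):
                failures.append((run_idx, case))
    return failures

prompt_v1 = "Write a summary of the email."
prompt_v2 = "Write a summary of the email and a subject line."

print(len(run_eval(prompt_v1, CASES)))  # v1 fails the subject-line case every run
print(len(run_eval(prompt_v2, CASES)))  # v2 passes all runs
```

An agent closing this loop keeps editing the prompt and re-running `run_eval` until the failure list is empty across all runs, which is the iterative improvement Kieran demonstrates.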
Kieran ranks every major AI coding assistant based on extensive testing. Claude Code and AMP take S-tier, Cursor ranks A-tier, while Windsurf drops to C-tier for lacking Claude 4. The ranking reveals that model quality now matters more than IDE features, and different agents excel at different tasks.
Case study of bringing in an infrastructure expert for a 2-hour consultation. The team recorded the conversation, fed it to Claude to generate implementation issues, had the expert review, then used Claude Code to implement—compressing two weeks of work into hours while leveraging specialized expertise.
Best of the Pod: Claude Code - How Two Engineers Ship Like a Team of 15