AI & I

Cognition’s CEO on What Comes After Code

September 24, 2025•3202•11,663 words•Dan Shipper

Description

The future has a way of showing up early to some places. In software engineering, one of those places is Cognition—the startup that made headlines in early 2024 with Devin, the world’s first autonomou...

Summary

Scott Wu, CEO of Cognition (makers of Devin), discusses the transformation of software engineering through AI agents. He argues that AGI capabilities are already here in many practical senses, and explores how programming is shifting from hands-on coding to orchestrating autonomous agents. The conversation covers the technical architecture decisions behind Devin, the trade-offs between synchronous IDE experiences versus asynchronous agents, and why reinforcement learning environments are crucial for post-training coding models.

Jump to Topic

AGI is Already Here: Redefining Intelligence Benchmarks

Wu argues that by historical standards, AGI has effectively arrived—models can pass the Turing test, solve IMO/IOI problems, and interact with the real world. He challenges the notion that AGI requires automating 80% of knowledge work, pointing out that humans always focus on the remaining unautomated tasks, making it an impossible moving target.

•Models already pass Turing tests, win gold medals at IMO/IOI, and interact with real-world systems
•The '80% of knowledge work' definition is flawed because knowledge work constantly evolves as automation advances
•Historical perspective: tractors and basic automation have already eliminated more than 80% of the work people did 1,000 years ago
•AGI definitions will keep shifting as humans remain proud of whatever work remains unautomated

Alternative AGI Definition: Economic Autonomy and Always-On Agents

Discussion of a child development-based AGI definition: measuring how long AI can operate autonomously without supervision. The threshold is when it becomes economically profitable to never turn AI off—similar to how children gradually require less supervision until full independence.

•AGI defined by autonomous operation time: from tab-complete (seconds) to 10-15 minute agent runs to always-on
•Parallel to child development: infants need constant supervision, teenagers can be alone for days, adults are fully autonomous
•Economic profitability of continuous operation is the key threshold, not arbitrary capability percentages
•Current trajectory shows smooth lengthening of the 'leash' similar to human development patterns
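
One way to read the profitability threshold is as simple arithmetic: leaving an agent running pays off once the expected value of its work per hour exceeds the cost of compute plus whatever human supervision it still needs. The sketch below assumes that framing. The function name, its parameters, and the figures in the usage note are illustrative and do not come from the episode.

```python
def always_on_is_profitable(
    value_per_hour: float,
    compute_cost_per_hour: float,
    supervision_minutes_per_hour: float,
    supervisor_hourly_rate: float,
) -> bool:
    """Return True if running the agent continuously has a positive hourly margin.

    All inputs are hypothetical estimates; the model is deliberately simple.
    """
    # Human attention is the "leash": fewer supervised minutes means lower cost.
    supervision_cost = (supervision_minutes_per_hour / 60.0) * supervisor_hourly_rate
    return value_per_hour > compute_cost_per_hour + supervision_cost
```

Under these assumed numbers, an agent producing 20 units of value per hour at 5 units of compute cost is not worth leaving on if it needs 30 minutes of supervision per hour at a rate of 60, but becomes worth it once supervision drops to 6 minutes per hour.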

Building Cognition: From Exploration to Commitment

Wu shares his journey from running Lunchclub to founding Cognition, emphasizing the 'leave it all on the field' mentality. He describes how Cognition started as friends exploring AI coding ideas rather than deliberately building a company, and how the decision to commit came from wanting to avoid wondering 'what if.'

•Previous experience running Lunchclub for 5 years before starting Cognition
•Philosophy: better to try and fail than to never try and always wonder
•Started as exploration with friends who were 'just nerds' interested in AI coding
•Most important factor: working with people you really like, since you'll spend most of your time working
•Would still consider the journey worthwhile even if everything collapsed tomorrow

The Future of Programming: From Bricklayers to Architects

Wu draws parallels to calculator protests by teachers, arguing that AI won't eliminate programming but will shift it toward higher-level thinking. Engineers will focus on logical fundamentals, problem decomposition, and architecture rather than debugging Kubernetes or memorizing syntax.

•Historical parallel: teachers protested calculators, yet math education survived and thrived
•New engineering education path: deep logical understanding, problem breakdown, strategic trade-offs, architecture design
•Less focus on: debugging infrastructure, obscure libraries, esoteric syntax memorization
•Computer science fundamentals become MORE valuable, not less, as you're the one making strategic decisions
•Analogy: turning bricklayers into architects—the best architects are still deeply technical

Eliminating the Hazing Period: Direct Access to Interesting Work

Discussion of how AI removes the traditional 'hazing' period in professions where juniors spend years on boring work before graduating to interesting tasks. With AI handling boilerplate, new engineers can immediately work on meaningful problems—like an 'officer's school' approach.

•Traditional path: spend first few years on boring work (spreadsheets in banking, boilerplate in coding) before interesting tasks
•AI enables 'officer's school' model: go straight to learning interesting things
•With Devin/ChatGPT, you can build something meaningful on your first prompt instead of spending 6 months learning while loops
•The boring work still needs to be done—but now Devin does it instead of junior engineers
•Removes unintentional barriers to entry while maintaining quality through AI assistance

The Spectrum of AI Coding Tools: Sync to Async

Wu maps out the landscape of AI coding tools as a spectrum from synchronous (tab complete) to fully asynchronous (autonomous agents). He predicts this spectrum will exist for ~3 years before everything becomes dictation to agents, and explains how different tasks require different points on this spectrum.

•Spectrum ranges from tab complete (most synchronous) to full autonomous agents (most asynchronous)
•Claude Code sits in the middle: more synchronous than Devin, more agentic than tab complete
•Next 3 years will maintain this spectrum; beyond that, everything becomes agent dictation
•Different tasks require different form factors: brainstorming needs sync, implementation can be async
•The suite of tools should cover the full spectrum because use cases vary significantly

Why Claude Code Succeeded: Full Send to Agentic Engineering

Analysis of Claude Code's rapid adoption as the first mainstream tool to go 'full send' on agentic engineering. Wu argues the CLI form factor matters less than the paradigm shift of handing full control to AI, and discusses the tight coupling between model capabilities and correct interface design.

•Claude Code was first to fully commit to 'AI is you' paradigm rather than 'AI augments you'
•Anthropic put significant effort into the experience, not just the CLI form factor
•Running on the local machine enables bash commands, making it more extensible than cloud environments
•Tight dependency between model capability and correct interface: models of the 3.5 era couldn't support this
•Form factor matters less than the paradigm: Devin uses Slack, Claude Code uses a CLI, and both are valid 'software engineer' interfaces

Devin's Unique Position: Persistent Cloud Environments

Wu explains Devin's differentiation: a persistent agent with its own cloud computer that can be onboarded like a teammate. This enables unique capabilities like autonomous testing and learning company-specific workflows, though it requires more upfront onboarding than synchronous tools.

•Devin lives on its own persistent computer in the cloud, accessible via Slack like a teammate
•Can learn to test things and run all tests autonomously—capabilities other tools can't match
•Requires onboarding similar to hiring a software engineer, but then operates independently
•Windsurf acquisition provides faster time-to-value for users not ready for full async agents
•Natural transition path: start with Windsurf (sync), learn to work with cascades, graduate to Devin (async)

Post-Training Strategy: Custom RL Environments Over Base Models

Wu defends Cognition's decision to focus on post-training rather than pre-training base models. He argues that teaching models real-world software engineering nuances (like Datadog debugging workflows) fits naturally into post-training and represents their core competitive advantage.

•Cognition does extensive fine-tuning and RL but doesn't pre-train base models
•Post-training is ideal for teaching specific workflows: confidence prediction, Datadog log debugging, COBOL support
•Startup edge must be speed and focus—their DNA is understanding real-world software engineering nuances
•Building custom RL environments for hundreds of specific tasks (Grafana setup, version conflicts, etc.)
•The platonic ideal of RL: solve any benchmark—then the question becomes which benchmarks matter

RL Environment Design: Generalizability Through Real-World Interaction

Deep dive into how Cognition designs RL environments to be generalizable rather than brittle. The key is training agents to interact with the real world (search Google, read docs and logs, understand errors) rather than memorize specific package versions or configurations.

•Example: Grafana eval requires finding version conflicts, reading error logs, downgrading packages—real debugging workflow
•Reward function is simple: 'What does the dashboard say?'—can only answer if you successfully completed all steps
•Generalizability comes from teaching interaction patterns (Google, read docs, check logs) not memorization
•Humans figure things out by interacting with the real world, not pulling from memory—agents should too
•Curated environments provide tighter feedback signals than continual learning, making current RL more practical
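
The outcome-based reward described above can be sketched in a few lines. This is a minimal illustration, not Cognition's implementation: the `Episode` record, the `dashboard_reward` function, and the example dashboard value are hypothetical names and data introduced here.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical record of one agent run in a curated environment."""
    final_answer: str     # what the agent reports the dashboard says
    expected_answer: str  # ground truth, known only to the environment


def normalize(text: str) -> str:
    # Compare case-insensitively and collapse runs of whitespace.
    return " ".join(text.lower().split())


def dashboard_reward(episode: Episode) -> float:
    """Return 1.0 only if the agent read the correct value off the dashboard.

    The agent can only know the value if every earlier step (fixing version
    conflicts, reading logs, getting the service running) actually succeeded.
    """
    matches = normalize(episode.final_answer) == normalize(episode.expected_answer)
    return 1.0 if matches else 0.0
```

Because the reward checks only the final observable outcome, it does not prescribe which packages to downgrade or which logs to read, which is one way such an environment could avoid rewarding memorized fixes.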

The Missing Personal Agent: Beyond Coding Applications

Wu expresses surprise that mass-market personal agents haven't emerged yet despite capabilities being ready. He reveals Cognition uses Devin internally for Amazon ordering, suggesting the agent infrastructure exists but the right consumer form factor hasn't been built.

•Capabilities exist for personal agents (Operator, Devin) but mass consumer adoption hasn't happened
•Use cases beyond flight booking: dentist appointments, package delivery tracking, Amazon reordering, restaurant reservations
•Cognition orders all Amazon packages through Devin—proves the capability exists today
•Slack/Linear isn't the right form factor for consumer agents—someone needs to build the right interface
•Prediction: expects this to exist within 12 months given current capabilities
