From the frontlines of OpenAI's Codex and GPT-5 training teams, Bryan and Bill are building the future of AI-powered coding, where agents don't just autocomplete; they architect, refactor, and ship.
Bryan Fioca and Bill Chen from OpenAI's Codex and GPT-5 training teams discuss the launch of Codex Max, a long-running coding agent that can work for 24+ hours. They reveal how personality traits like communication, planning, and tool preferences are trained into models, why Codex is optimized for specific tools (it literally prefers 'rg' over 'grep'), and how the abstraction layer is moving from models to agents. The conversation covers multi-agent architectures, the importance of real-world evals, and a vision where coding agents become trusted teammates capable of handling complex refactors and integrations autonomously.
Deep dive into how OpenAI trains GPT-5 and Codex with specific behavioral characteristics beyond raw capability. The team focuses on personality traits like communication (keeping users informed), planning (strategy before execution), and checking work—essentially teaching models software engineering best practices as behaviors that can be measured and graded.
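A toy illustration of the idea that these behaviors can be "measured and graded": a rubric over an agent transcript that rewards narrating progress, planning before editing, and testing the changes. The step types, trait names, and weights below are hypothetical stand-ins, not OpenAI's actual training graders.

```python
from dataclasses import dataclass

@dataclass
class Step:
    kind: str   # "plan", "message", "edit", or "test"
    text: str

def grade_behaviors(trajectory: list[Step]) -> dict[str, float]:
    """Score a transcript on the three traits discussed in the episode."""
    kinds = [s.kind for s in trajectory]
    first = {k: kinds.index(k) for k in set(kinds)}
    return {
        # Communication: the agent narrated progress at least once.
        "communication": 1.0 if "message" in kinds else 0.0,
        # Planning: a plan appeared before the first code edit.
        "planning": 1.0 if "plan" in first and "edit" in first
                         and first["plan"] < first["edit"] else 0.0,
        # Checking work: a test run accompanied the edits.
        "checks_work": 1.0 if "edit" in kinds and "test" in kinds else 0.0,
    }

def reward(trajectory: list[Step]) -> float:
    # Illustrative weights; a real grader would be far richer.
    weights = {"communication": 0.3, "planning": 0.3, "checks_work": 0.4}
    scores = grade_behaviors(trajectory)
    return sum(weights[t] * scores[t] for t in weights)

run = [
    Step("plan", "1) reproduce the bug 2) patch parser 3) run tests"),
    Step("message", "Reproducing the failure before touching code."),
    Step("edit", "fix off-by-one in parser.py"),
    Step("test", "pytest: 42 passed"),
]
print(reward(run))  # 1.0: all three behaviors present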
Codex develops specific tool preferences during training, similar to human habits. The model performs better with 'rg' (ripgrep) than 'grep' because of training patterns. Partners discovered they can improve tool call performance by naming custom tools the same way as terminal tools Codex was trained on, revealing how models can be 'bent' in unexpected ways.
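As a sketch of that naming trick, a partner's custom code-search tool can be exposed under the name "rg" with ripgrep-shaped parameters so calls land close to the model's training distribution. The spec below uses the common JSON-schema function-calling shape; the exact tool-registration API depends on your framework.

```python
# Hypothetical tool spec: named "rg" with ripgrep-shaped arguments,
# rather than e.g. "search_codebase_v2" with a bespoke signature.
# Schema shape follows the common JSON-schema function-calling
# convention; wire-up details vary by framework.
code_search_tool = {
    "type": "function",
    "name": "rg",
    "description": "Search the repository, ripgrep-style.",
    "parameters": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex to search for."},
            "path": {"type": "string", "description": "File or directory to search."},
            "flags": {
                "type": "array",
                "items": {"type": "string"},
                "description": "ripgrep flags, e.g. ['-n', '-i'].",
            },
        },
        "required": ["pattern"],
    },
}
```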
Clarification of the difference between Codex (a frontier coding model optimized for its specific harness) and the GPT-5 mainline models (more general and steerable across tools). Codex comes with firm opinions on implementation, which some partners appreciate, while GPT-5 offers broader flexibility for custom integrations.
Major trend: the abstraction layer is moving from the model level to the agent level. Instead of optimizing for every model release, developers can now plug in complete agents like Codex into platforms. This enables agents to use other agents—for example, a chatbot spawning a Codex instance to write custom plugins or integrations on demand.
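A minimal sketch of that "agents using agents" pattern: an outer assistant shells out to a Codex instance for a coding subtask. This assumes the Codex CLI offers a non-interactive `codex exec` mode; verify the subcommand and flags against your installed version.

```python
import subprocess

def delegate_to_codex(task: str, repo: str) -> str:
    """Spawn a Codex agent as a sub-agent for a coding task.

    Sketch only: the outer chatbot shells out to the Codex CLI in
    non-interactive mode and returns its output. Assumes a
    `codex exec` subcommand exists in your installed CLI.
    """
    result = subprocess.run(
        ["codex", "exec", task],
        cwd=repo,               # run inside the target repository
        capture_output=True,
        text=True,
        timeout=3600,           # long-running tasks may need more
    )
    return result.stdout

# Example: a chatbot that writes its own plugin on demand.
# print(delegate_to_codex("Write a Slack plugin that posts build failures", "./my-app"))
```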
Introduction to Codex Max, which can run for 24+ hours (tested for multiple days), manages its own context window indefinitely, and is designed to spawn sub-agents for parallel work. The 'Max' name reflects both speed and maximization—it's faster at solving problems while also capable of extended runs.
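A sketch of the fan-out/fan-in shape that sub-agent spawning implies, using a plain thread pool; `run_subagent` is a hypothetical placeholder for however the harness actually launches a Codex Max sub-agent.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Placeholder: in a real harness this would launch a Codex instance
    # with its own context window and collect its final report.
    return f"done: {task}"

def parallel_refactor(tasks: list[str]) -> list[str]:
    # Coordinator splits a large refactor into independent pieces and
    # runs one sub-agent per piece, then gathers the results.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_subagent, tasks))

results = parallel_refactor([
    "migrate auth module to the new ORM",
    "migrate billing module to the new ORM",
    "update integration tests",
])
print(results)
```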
OpenAI's Applied Evals team focuses on capturing real-world use cases beyond academic benchmarks like SWE-bench. The approach treats models like PhD students who need job descriptions (prompts), mentorship, and performance reviews. Multi-turn evals are emerging as critical, with techniques like LLM-as-judge for entire trajectories and 'job interview evals' that test underspecified problem-solving.
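A hedged sketch of one of those techniques, LLM-as-judge over a whole trajectory, using the OpenAI Python SDK's chat completions API; the rubric, JSON shape, and judge model name are placeholders.

```python
from openai import OpenAI

client = OpenAI()

JUDGE_RUBRIC = """You are grading an agent's full multi-turn trajectory.
Score 1-5 on each axis and justify briefly:
- Did it ask clarifying questions when the task was underspecified?
- Did its plan survive contact with the codebase?
- Did it verify its own work before finishing?
Reply as JSON: {"clarifying": n, "planning": n, "verification": n}"""

def judge_trajectory(transcript: str) -> str:
    # One judge call over the entire trajectory, not per turn:
    # multi-turn behavior (recovering from a bad plan, asking for
    # missing requirements) only shows up at this granularity.
    response = client.chat.completions.create(
        model="gpt-5",  # placeholder; any strong judge model works
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```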
Coding agents are breaking out of pure software development into general personal automation. Before GUIs, all computer interaction was through terminals and code—coding agents are essentially computer-use agents for the terminal. Use cases include email management, file organization, video clip extraction, and any task that can be automated through CLI tools.
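For concreteness, the video-clip example reduces to the kind of one-liner a coding agent would write and run itself; this sketch wraps a standard ffmpeg stream-copy invocation, with the agent, not the user, normally authoring it.

```python
import subprocess

def extract_clip(src: str, start: str, duration: str, dst: str) -> None:
    # Pull a short clip out of a longer recording without re-encoding:
    # -ss before -i seeks quickly, -t bounds the duration, -c copy
    # copies the streams as-is. Standard ffmpeg flags.
    subprocess.run(
        ["ffmpeg", "-ss", start, "-i", src,
         "-t", duration, "-c", "copy", dst],
        check=True,
    )

extract_clip("talk.mp4", "00:12:30", "00:00:30", "clip.mp4")
```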
Looking ahead to 2026: coding agents will become vision-native to handle applications without APIs, enabling integration with legacy systems through UI automation. The ultimate goal is democratizing access to elite-level development capabilities—every team, from small dev shops to major firms, should have access to the kind of technical expertise currently only available at top-tier companies.
⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Bryan Fioca + Bill Chen, OpenAI