Olivier Godement, Head of Product for Enterprise at OpenAI, discusses the current state of AI adoption in enterprises, focusing on GPT-5.1 and Codex releases. He reveals that while complete job automation remains challenging, specific domains like coding, customer support, and life sciences are reaching tipping points. Companies like Amgen are using AI to compress drug development timelines from months to weeks through automated regulatory documentation. The conversation explores the critical importance of scaffolding, harnesses, and evaluation frameworks, with Olivier predicting that continuous learning capabilities and standardized agent architectures will be the next major unlock for enterprise AI adoption.
Discussion of the new GPT-5.1 and Codex models, focusing on how 5.1 addressed speed concerns while maintaining intelligence by compressing thinking tokens. Codex has achieved remarkable adoption internally at OpenAI, with engineers pushing 70% more PRs. The models represent iterative improvements based on user feedback from GPT-5.
Olivier identifies scientific research as a surprising breakthrough area, with researchers using LLMs to aggregate literature and accelerate hypothesis testing. A physicist used GPT-5 Pro to reproduce weeks of mathematical work in 30 minutes. Amgen is using AI to compress drug development timelines from months to weeks by automating regulatory documentation, representing a massive opportunity in pharma.
Response to Andrej Karpathy's comments about agents being a decade away. While complete job automation is hard, specific fields like coding are reaching automation tipping points. Success requires extensive scaffolding, harnesses, evaluation frameworks, and human-in-the-loop feedback systems. T-Mobile and other enterprises are achieving meaningful scale in customer support.
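The scaffolding described here can be pictured as a thin loop around the model: run each case, score the output, and route failures to a human. A minimal sketch, where `call_model` and the exact-match grader are hypothetical stand-ins for whatever model client and domain-specific checks a team actually uses:

```python
# Minimal evaluation harness with a human-in-the-loop gate.
# `call_model` and the grading rule are illustrative placeholders.

def call_model(prompt: str) -> str:
    # Placeholder: in practice this wraps an LLM API call.
    return "REFUND_APPROVED" if "refund" in prompt.lower() else "ESCALATE"

def grade(output: str, expected: str) -> float:
    # Simplest possible grader: exact match. Real harnesses often use
    # rubric-based or model-graded scoring instead.
    return 1.0 if output == expected else 0.0

def run_evals(cases, threshold=1.0):
    """Run every case; collect failing cases for human review."""
    needs_review = []
    passed = 0
    for prompt, expected in cases:
        output = call_model(prompt)
        if grade(output, expected) >= threshold:
            passed += 1
        else:
            needs_review.append((prompt, output, expected))
    return passed, needs_review

cases = [
    ("Customer asks for a refund on a late order", "REFUND_APPROVED"),
    ("Customer reports a billing dispute", "ESCALATE"),
]
passed, review_queue = run_evals(cases)
print(passed, len(review_queue))  # → 2 0
```

The point of the sketch is the structure, not the grader: the loop, the threshold, and the review queue are what turn a raw model into a deployable system.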
Discussion of when OpenAI works directly with enterprises versus enabling the ecosystem. The complexity and depth of enterprise problems is enormous - no single company can solve everything. OpenAI is building Apps in ChatGPT to enable third-party integrations, allowing startups to benefit from ChatGPT's adoption and memory while building specialized features.
Vision for ChatGPT becoming the first place enterprise workers check each morning. The Pulse feature has become transformative, preparing daily briefings with relevant emails, meetings, and papers. While ChatGPT won't replace every tool, it's becoming the central hub for productivity information and simple actions.
Olivier operates on three time horizons: current capabilities, near-term post-training improvements, and fundamental breakthroughs. Continuous learning is identified as the next critical frontier - enabling models to update weights based on inference-time human feedback, like hiring an intern who learns on the job rather than requiring everything to be documented upfront.
Core PMF categories remain coding, customer support, finance, and life sciences - gigantic markets whose total addressable market is hard to size because cheap software creation keeps expanding it. Strategy shifting from 'spray and pray' to doubling down on proven markets and going deeper. Customer support evolution from tier-one tickets to revenue generation through personalization.
Current scaffolding is highly bespoke across use cases, with teams trying various approaches (single/multiple agents, deterministic gates). No standard agent architecture has emerged yet, but OpenAI is working toward it. Code and coding capabilities are emerging as the most general-purpose capability, suggesting computer access via code execution will become standard.
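The 'deterministic gates' approach mentioned above can be sketched as a hard-coded check sitting between the model's proposed action and its execution - code, not the model, decides what runs. The tool names and policy here are invented for illustration:

```python
# Sketch of one agent step with a deterministic gate. The model may
# propose any tool call, but this fixed policy decides whether it
# executes, goes to a human, or is rejected. All names are illustrative.

ALLOWED_TOOLS = {"search_docs", "read_file"}
REQUIRES_APPROVAL = {"delete_record", "send_email"}

def gate(tool_name: str, args: dict) -> str:
    """Deterministic policy check, independent of model output quality."""
    if tool_name in ALLOWED_TOOLS:
        return "execute"
    if tool_name in REQUIRES_APPROVAL:
        return "ask_human"
    return "reject"

print(gate("search_docs", {}))  # → execute
print(gate("send_email", {}))   # → ask_human
print(gate("rm_rf", {}))        # → reject
```

Because the gate is plain code, its behavior is auditable and testable even when the model's behavior is not.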
OpenAI has reduced GPT-4 level query costs by 1-2 orders of magnitude in 2-3 years through model compression, better hardware, and GPU networking. High-stakes use cases like coding already have working economics, but many use cases (like personalized website content) are blocked by cost and latency. Every price cut reveals untapped demand larger than the revenue impact.
Reinforcement fine-tuning (RFT) not yet widely adopted - most enterprises still catching up to base model capabilities. Innovators at the frontier use RFT when blocked by base models. Example: accounting software achieved 20-30% improvement with a few dozen high-quality samples, crossing the threshold from 'not valuable' to 'valuable'. OpenAI released the first RFT API but it remains heavyweight and time-consuming.
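Conceptually, RFT optimizes the model against a grader that scores outputs on a small, high-quality sample set. The sketch below shows the shape of such a partial-credit grader; the accounting fields and scoring rule are invented, and OpenAI's actual RFT API defines its own grader formats:

```python
# Conceptual sketch of the reward signal reinforcement fine-tuning
# optimizes against: a grader scoring model output versus a reference.
# Fields and scoring are invented for illustration.

def grade_extraction(predicted: dict, reference: dict) -> float:
    """Partial-credit grader: fraction of reference fields matched."""
    if not reference:
        return 0.0
    correct = sum(1 for k, v in reference.items() if predicted.get(k) == v)
    return correct / len(reference)

reference = {"invoice_total": "1200.00", "tax_code": "VAT-20", "currency": "EUR"}
predicted = {"invoice_total": "1200.00", "tax_code": "VAT-20", "currency": "USD"}
score = grade_extraction(predicted, reference)
print(round(score, 2))  # → 0.67
```

A graded signal like this is why a few dozen samples can suffice: each sample contributes a dense score rather than a single right/wrong label.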
Three main factors drive model selection: capabilities/behavior, cost/latency, and 'vibes' (Twitter/influencer sentiment). Academic benchmarks provide limited value for specific use cases - industry-specific benchmarks like TauBench emerging. Best teams develop strong qualitative taste for model nuances, similar to expertise in writing or painting. Industry reinventing Gartner-style trust mechanisms.
Days of simple API parameter swaps are gone for non-trivial use cases. Model idiosyncrasies across providers are increasingly distinct - different instruction formats, tool signatures, context handling. Even sophisticated startups struggle with frequent updates. Enterprises want predictable release cadences with clear changelogs, similar to traditional software versioning.
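Because instruction formats and tool signatures diverge across providers, teams that switch models typically end up maintaining an adapter layer. A minimal sketch - both "provider" formats below are invented examples of the idiosyncrasies described above, not real API schemas:

```python
# Sketch of an adapter normalizing one internal message format into two
# hypothetical providers' chat formats. Neither format is a real schema.

def to_provider_a(messages):
    # Provider A: flat list of {"role", "content"} dicts.
    return [{"role": m["role"], "content": m["text"]} for m in messages]

def to_provider_b(messages):
    # Provider B: system prompt kept separate from the turn list.
    system = "\n".join(m["text"] for m in messages if m["role"] == "system")
    turns = [m["text"] for m in messages if m["role"] != "system"]
    return {"system": system, "turns": turns}

messages = [
    {"role": "system", "text": "You answer billing questions."},
    {"role": "user", "text": "Why was I charged twice?"},
]
a = to_provider_a(messages)
b = to_provider_b(messages)
print(len(a), b["system"])  # → 2 You answer billing questions.
```

The adapter isolates provider churn: when a model update changes a format, only one conversion function changes, not every call site.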
GPT-4o in May/June was the second major AGI breakthrough after ChatGPT, enabling tone and emotion understanding. However, voice hasn't passed the Turing test yet - interruptions and cadence still feel unnatural. Next frontier is achieving naturalness where users are equally comfortable with AI as with humans. Strong adoption in multilingual customer support where staffing every language is impossible.
Codex team exemplifies small, talented team singularly focused on use case. Current models excel at code generation, but software engineering involves much more: on-call, communication, scoping, architecture decisions, API duplication. Collaboration capabilities represent the next major unlock. Many enterprises still stuck on GitHub Copilot V1 due to security/compliance hurdles.
Increasingly difficult to separate model quality from harness quality - best agents have models trained for specific harnesses. OpenAI open-sourcing reference harnesses and tool definitions (like Codex) to enable ecosystem adoption. Industry evolving from pure model inference APIs to providing models + harnesses + reference UI designs as complete blueprints.
Most enterprises will buy harnesses/solutions rather than build, except for core business use cases. Building production-grade agents requires enormous effort. Critical success factors: clean data infrastructure, rigorous evaluation frameworks, and proper change management. Most enterprise knowledge exists in people's heads, not documentation, making eval creation an iterative people-finding process.
Sora API seeing strong adoption in ads/content generation and production studios. Production companies use it to quickly visualize concepts for brainstorming - showing 30-second visualizations of ideas accelerates creative collaboration. Video generation still in early innings due to cost and speed, but clear path to transforming creative workflows.
Scientific discovery and drug design identified as most underhyped opportunity. While software use cases feel natural to tech workers, accelerating scientific discovery is the substrate of all progress. Even 5% acceleration in discovery rate would have enormous compounding effects on economy and technology. Requires intersection of LLMs, lab infrastructure, data, and domain experts.
Ep 79: OpenAI's Head of Product on How the Best Teams Build, Ship and Scale AI Products