# How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning
In this episode, a16z GP Martin Casado sits down with Sherwin Wu, Head of Engineering for the OpenAI Platform, to break down how OpenAI organizes its platform across models, pricing, and infrastructure.
Sherwin Wu, Head of Engineering for OpenAI's Platform, discusses how OpenAI has evolved from pursuing a single general-purpose model to building a portfolio of specialized models and customization tools. The conversation covers OpenAI's unique dual strategy of running both ChatGPT (800M weekly users) and a developer API, the shift from prompt engineering to context engineering, and how fine-tuning APIs with reinforcement learning enable companies to leverage proprietary data. Wu also explains why models resist abstraction, making them 'anti-disintermediation technology,' and how deterministic agent workflows are proving more practical than fully autonomous AI for many enterprise use cases.
Discussion of OpenAI's unusual dual strategy of operating both ChatGPT (reaching 800M weekly users, 10% of global population) and a developer API that powers competitors. Wu explains how growth and mission alignment reduce internal tension, and introduces the concept of models as 'anti-disintermediation technology' that resist abstraction.
Wu describes the industry's dramatic shift from believing in a single AGI model to embracing model proliferation and specialization. He explains how different models excel at different tasks (GPT-5 for planning, Composer for fast iteration, Codex for coding) and why this diversity is actually beneficial for the ecosystem.
Deep dive into OpenAI's fine-tuning capabilities, particularly the new reinforcement fine-tuning (RFT) API that allows companies to leverage proprietary data to create world-class specialized models. Wu explains the evolution from basic supervised fine-tuning to RL-based customization and the potential for data-sharing arrangements.
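To make the API surface concrete, here is a minimal sketch of kicking off a reinforcement fine-tuning job with the OpenAI Python SDK. The training file, grader configuration, and model snapshot are illustrative placeholders, not details from the episode; consult OpenAI's fine-tuning documentation for the exact grader schema a given use case needs.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative: proprietary examples uploaded as JSONL ahead of time.
training_file = client.files.create(
    file=open("proprietary_tasks.jsonl", "rb"),  # placeholder dataset
    purpose="fine-tune",
)

# RFT scores sampled outputs with a grader instead of imitating labeled
# completions. This grader is a bare-bones exact-string check; real
# deployments typically use richer (e.g., model-based) graders.
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",  # assumed RFT-capable snapshot
    training_file=training_file.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.correct_answer}}",
                "operation": "eq",
            },
        },
    },
)
print(job.id, job.status)
```

The job runs asynchronously; once it finishes, the resulting fine-tuned model ID can be passed as the `model` parameter in ordinary inference calls.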
Wu explains how the industry's thinking has evolved from believing prompt engineering would become obsolete to recognizing context engineering as critical. The focus has shifted from crafting perfect prompts to designing what tools, data, and retrieval mechanisms models have access to.
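To illustrate the distinction: under context engineering, the prompt string stays simple, and the effort goes into choosing which retrieved data and tools accompany each call. A minimal sketch against the Chat Completions API; `retrieve_docs` and the `lookup_contract` tool are hypothetical stand-ins for whatever retrieval and tooling a real system would wire in.

```python
from openai import OpenAI

client = OpenAI()

def retrieve_docs(query: str) -> str:
    """Hypothetical retrieval step: return the top-k internal document
    snippets relevant to the query (vector store, keyword search, etc.)."""
    return "...top-k document snippets..."

question = "What is our refund policy for enterprise contracts?"

# The "engineering" is in the context: what the model can see (retrieved
# docs) and what it can do (declared tools), not in a clever prompt.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user",
         "content": f"Context:\n{retrieve_docs(question)}\n\nQuestion: {question}"},
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "lookup_contract",  # hypothetical tool
            "description": "Fetch a customer's contract terms by account ID.",
            "parameters": {
                "type": "object",
                "properties": {"account_id": {"type": "string"}},
                "required": ["account_id"],
            },
        },
    }],
)
print(response.choices[0].message)
```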
Discussion of OpenAI's pricing strategy, including why usage-based pricing is a 'one-way ratchet' that companies never abandon once adopted. Wu explains the cost-plus approach for API pricing and explores why outcome-based pricing may not be necessary when test-time compute already correlates with value.
Wu addresses OpenAI's open source strategy, explaining why releasing GPT-OSS doesn't create cannibalization risk and actually strengthens the ecosystem. He discusses the distinction between open weights and true open source, and why inference difficulty creates natural moats.
Wu explains OpenAI's Agent Builder product and the surprising discovery that most enterprise work requires deterministic, SOP-driven workflows rather than fully autonomous agents. He discusses two types of work: knowledge-based (like coding) versus procedural (like customer support), and why regulated industries need constrained model behavior.
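As a sketch of the deterministic, SOP-driven pattern Wu describes (an illustration of the pattern, not OpenAI's Agent Builder itself): the branching logic lives in ordinary code, and the model is invoked only at fixed, narrowly scoped steps. The downstream actions below are hypothetical placeholders.

```python
from openai import OpenAI

client = OpenAI()

def classify(ticket: str) -> str:
    """Constrained model step: the model only picks a category."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Classify the support ticket as exactly one of: refund, bug, other."},
            {"role": "user", "content": ticket},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

def handle_ticket(ticket: str) -> str:
    """Deterministic SOP: the procedure, not the model, decides what
    happens next, so every possible path is enumerable in advance."""
    category = classify(ticket)
    if category == "refund":
        return "escalate_to_billing"
    if category == "bug":
        return "file_engineering_ticket"
    return "route_to_human_agent"

print(handle_ticket("I was double-charged last month."))
```

The trade-off is flexibility for auditability, which is why this shape suits the regulated, procedural work Wu contrasts with open-ended knowledge work.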