Gemini 3 was a landmark frontier model launch in AI this year, but the story behind its performance isn't just about adding more compute. In this episode, I sit down with Sebastian Bourgeaud, the pre-training lead for Gemini 3 at Google DeepMind.
Sebastian Bourgeaud, pre-training lead for Gemini 3 at Google DeepMind, reveals that frontier model progress comes from compounding many improvements across pre-training, post-training, data, architecture, and infrastructure, not from single breakthroughs. The industry is shifting from an infinite-data regime to a data-limited paradigm, requiring new research approaches around data curation, synthetic data, and evaluation proxies. DeepMind's vertical integration and research-engineering culture enable teams of 150-200 people to coordinate complex system-building while maintaining research velocity through careful complexity management and eval development.
Gemini 3's leap wasn't from one big change but from combining many improvements across pre-training and post-training. The team is building a complete system, not just a model, with progress coming from turning many knobs rather than architectural breakthroughs. Despite this incremental approach, progress continues to exceed expectations from 2019-2020 scaling law predictions.
The boundary between research and engineering has blurred at DeepMind, with most work now being 'research engineering' on large-scale systems. AI tools are accelerating the execution parts of research (running experiments, analyzing data) while humans focus on hypothesis formation and experiment design. The industry shows both convergence (similar base technologies like transformers) and specialization (DeepMind's strength in multimodal/vision).
Sebastian's journey from multi-country European upbringing through Cambridge to DeepMind illustrates the importance of seizing opportunities. Starting as a research engineer in 2018 working on RL, he pivoted to representation learning on real-world data, then scaled up to Gopher (280B parameters), co-authored the Chinchilla scaling laws paper, and led RETRO's retrieval architecture research.
Successful research at scale requires 'research taste'—ensuring work integrates well with others, managing complexity budgets, and knowing when to trade peak performance for lower complexity. Teams must balance short-term critical path work with exploratory research, being 'allergic to complexity' while understanding that most research ideas fail and negative results don't mean something won't work.
Gemini 3 uses a transformer-based mixture-of-experts (MoE) architecture that dynamically routes computation across experts, decoupling parameter count from compute cost. Native multimodality means the same neural network processes text, images, audio, and video together—not separate models per modality. This adds complexity cost but enables superior multimodal understanding.
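The routing idea behind MoE can be sketched in a few lines of NumPy. This is an illustrative toy, not Gemini's actual implementation; all names, shapes, and the top-k softmax routing scheme here are assumptions chosen to show why parameter count decouples from per-token compute:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Minimal top-k mixture-of-experts layer (illustrative sketch).

    x:         (d,) token representation
    gate_w:    (d, n_experts) router weights
    expert_ws: list of (d, d) expert weight matrices

    Only the top-k experts run for each token, so active compute stays
    roughly constant even as the total expert (parameter) count grows.
    """
    logits = x @ gate_w                         # router score per expert
    topk = np.argsort(logits)[-k:]              # indices of the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                        # softmax over selected experts
    # Weighted sum of only the selected experts' outputs.
    return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws, k=2)
print(y.shape)  # (8,)
```

With k=2 of 4 experts active, only half the expert parameters are touched per token; scaling n_experts grows capacity without growing per-token FLOPs.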
Contrary to 2025 narratives, pre-training scaling laws continue working predictably, but the paradigm is shifting from infinite data to finite data regimes. Scale remains important but compounds with architecture and data innovations. This shift mirrors pre-LLM computer vision research on ImageNet, where data-limited techniques become relevant again. Synthetic data requires careful use to avoid model collapse.
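The Chinchilla result Sebastian co-authored gives a concrete feel for why data becomes the bottleneck. A widely cited rule of thumb from that work is roughly 20 training tokens per parameter for compute-optimal training; the snippet below uses that heuristic illustratively, not as an exact prescription:

```python
def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    """Rough compute-optimal token budget (Chinchilla ~20:1 rule of thumb)."""
    return n_params * tokens_per_param

# By this heuristic, a 280B-parameter model like Gopher would want
# about 5.6 trillion training tokens, which makes clear how quickly
# frontier-scale models exhaust the supply of high-quality data.
print(f"{chinchilla_optimal_tokens(280e9):.2e}")  # 5.60e+12
```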
Evaluation is extremely hard in pre-training and often underestimated. Evals must bridge two gaps: predicting large model performance from small model proxies, and predicting post-training capabilities from pre-trained models. External benchmarks quickly become contaminated, forcing teams to build internal held-out eval sets. Strong eval infrastructure has driven much of Gemini's measurable progress.
Deep Think and reasoning models generate extended thought processes, forming and testing hypotheses, invoking tools, and doing search before providing answers. This represents spending compute on the sequence-length dimension rather than just model depth. Agentic workflows like those in Antigravity require strong screen understanding from pre-training, while 'vibes' and model feel may come more from pre-training than commonly assumed.
Long context capabilities (from Gemini 1.5) enable agentic workflows and provide a form of continual learning as context expands. Future research focuses on extending context windows, making long context more efficient, and attention mechanism innovations. The shift to finite data regimes and serving cost optimization are becoming central concerns. Retrieval research from RETRO may become viable for leading models in coming years.
Future AI researchers need research-engineering skills spanning the full stack from TPUs to research—understanding system implications is a superpower. For startups, extrapolate current model progress trajectories to identify areas where models aren't improving much. The industry has matured from building specialized models for tasks that generalist models will soon handle. Sebastian sees no end in sight for compounding improvements driving continued progress.
DeepMind Gemini 3 Lead: What Comes After "Infinite Data"