Gemini 3 was a landmark frontier model launch in AI this year, but the story behind its performance isn't just about adding more compute. In this episode, I sit down with Sebastian Bourgeaud, the pre-training lead for Gemini 3 at Google DeepMind.
Sebastian Bourgeaud, pre-training lead for Gemini 3 at Google DeepMind, reveals that frontier model progress comes from compounding many improvements across pre-training, post-training, data, architecture, and infrastructure, not from single breakthroughs. The industry is shifting from an infinite-data regime to a data-limited paradigm, requiring new research approaches around data curation, synthetic data, and evaluation proxies. DeepMind's vertical integration and research-engineering culture enable teams of 150-200 people to coordinate complex system-building while maintaining research velocity through careful complexity management and eval development.
Gemini 3's leap wasn't from one big change but from combining many improvements across pre-training and post-training. The team is building a complete system, not just a model, with progress coming from turning many knobs rather than architectural breakthroughs. Despite this incremental approach, progress continues to exceed expectations from 2019-2020 scaling law predictions.
The boundary between research and engineering has blurred at DeepMind, with most work now being 'research engineering' on large-scale systems. AI tools are accelerating the execution parts of research (running experiments, analyzing data) while humans focus on hypothesis formation and experiment design. The industry shows both convergence (similar base technologies like transformers) and specialization (DeepMind's strength in multimodal/vision).
Sebastian's journey from multi-country European upbringing through Cambridge to DeepMind illustrates the importance of seizing opportunities. Starting as a research engineer in 2018 working on RL, he pivoted to representation learning on real-world data, then scaled up to Gopher (280B parameters), co-authored the Chinchilla scaling laws paper, and led RETRO's retrieval architecture research.
Successful research at scale requires 'research taste'—ensuring work integrates well with others, managing complexity budgets, and knowing when to trade peak performance for lower complexity. Teams must balance short-term critical path work with exploratory research, being 'allergic to complexity' while understanding that most research ideas fail and negative results don't mean something won't work.
Gemini 3 uses a transformer-based mixture-of-experts (MoE) architecture that dynamically routes computation across experts, decoupling parameter count from compute cost. Native multimodality means the same neural network processes text, images, audio, and video together—not separate models per modality. This adds complexity cost but enables superior multimodal understanding.
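The routing idea behind MoE can be sketched in a few lines of NumPy. This is an illustrative toy, not Gemini's actual implementation; all names, shapes, and the top-k softmax routing scheme here are assumptions chosen to show why parameter count decouples from per-token compute:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Minimal top-k mixture-of-experts layer (illustrative sketch).

    x:         (d,) token representation
    gate_w:    (d, n_experts) router weights
    expert_ws: list of (d, d) expert weight matrices

    Only the top-k experts run for each token, so active compute stays
    roughly constant even as the total expert (parameter) count grows.
    """
    logits = x @ gate_w                         # router score per expert
    topk = np.argsort(logits)[-k:]              # indices of the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                        # softmax over selected experts
    # Weighted sum of only the selected experts' outputs.
    return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws, k=2)
print(y.shape)  # (8,)
```

With k=2 of 4 experts active, only half the expert parameters are touched per token; scaling n_experts grows capacity without growing per-token FLOPs.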
Contrary to 2025 narratives, pre-training scaling laws continue working predictably, but the paradigm is shifting from infinite data to finite data regimes. Scale remains important but compounds with architecture and data innovations. This shift mirrors pre-LLM computer vision research on ImageNet, where data-limited techniques become relevant again. Synthetic data requires careful use to avoid model collapse.
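The Chinchilla result Sebastian co-authored gives a concrete feel for why data becomes the bottleneck. A widely cited rule of thumb from that work is roughly 20 training tokens per parameter for compute-optimal training; the snippet below uses that heuristic illustratively, not as an exact prescription:

```python
def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    """Rough compute-optimal token budget (Chinchilla ~20:1 rule of thumb)."""
    return n_params * tokens_per_param

# By this heuristic, a 280B-parameter model like Gopher would want
# about 5.6 trillion training tokens, which makes clear how quickly
# frontier-scale models exhaust the supply of high-quality data.
print(f"{chinchilla_optimal_tokens(280e9):.2e}")  # 5.60e+12
```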
Evaluation is extremely hard in pre-training and often underestimated. Evals must bridge two gaps: predicting large model performance from small model proxies, and predicting post-training capabilities from pre-trained models. External benchmarks quickly become contaminated, forcing teams to build internal held-out eval sets. Strong eval infrastructure has driven much of Gemini's measurable progress.
Deep Think and reasoning models generate extended thought processes, forming and testing hypotheses, invoking tools, and doing search before providing answers. This represents spending compute on the sequence-length dimension rather than just model depth. Agentic workflows like those in Antigravity require strong screen understanding from pre-training, while 'vibes' and model feel may come more from pre-training than commonly assumed.
Long context capabilities (from Gemini 1.5) enable agentic workflows and provide a form of continual learning as context expands. Future research focuses on extending context windows, making long context more efficient, and attention mechanism innovations. The shift to finite data regimes and serving cost optimization are becoming central concerns. Retrieval research from RETRO may become viable for leading models in coming years.
Future AI researchers need research-engineering skills spanning the full stack from TPUs to research—understanding system implications is a superpower. For startups, extrapolate current model progress trajectories to identify areas where models aren't improving much. The industry has matured from building specialized models for tasks that generalist models will soon handle. Sebastian sees no end in sight for compounding improvements driving continued progress.
DeepMind Gemini 3 Lead: What Comes After "Infinite Data"