Open Source AI Strikes Back — Inside Ai2’s OLMo 3 ‘Thinking’
In this special release episode, Matt sits down with Nathan Lambert and Luca Soldaini from Ai2 (the Allen Institute for AI) to break down one of the biggest open-source AI drops of the year: OLMo 3. A...
Nathan Lambert and Luca Soldaini from Ai2 discuss the release of OLMo 3, a fully open-source AI model family that includes base models, instruction-tuned models, and reasoning models. Unlike typical 'open weights' releases, Ai2 publishes the complete training data, recipes, intermediate checkpoints, and evaluation frameworks. The conversation provides an unprecedented technical deep-dive into the six-stage pipeline from pre-training through reinforcement learning, while addressing the geopolitical shift in open-source AI as Chinese models like Qwen and DeepSeek dominate the landscape amid uncertainty around the future of Meta's Llama.
Introduction to the OLMo 3 model family release, including 7B and 32B base models, thinking models, and instruct models. Discussion of Ai2's commitment to full openness (releasing not just weights but all data, intermediate checkpoints, recipes, and evaluation frameworks), in contrast with typical 'open weights' releases from other labs.
Analysis of how DeepSeek's January release catalyzed an explosion of Chinese open-source models (Qwen, Kimi, DeepSeek) while the future of Meta's Llama became uncertain. Discussion of the strategic vacuum in US open-source AI and emerging responses, including the ATOM Project and increased investment from players like NVIDIA and Reflection.
Explanation of what thinking models are: models trained to spend more compute at inference time through long chains of thought, yielding step-change improvements on math, coding, and agentic tasks. Discussion of why they're becoming the industry standard despite being less 'fun' to build than regular instruct models.
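To make the thinking mechanic concrete, here is a minimal Python sketch of how a reasoning model's output is typically post-processed, separating the chain-of-thought span from the final answer. The `<think>...</think>` delimiters are an assumption borrowed from a common convention (popularized by models like DeepSeek R1); OLMo 3's exact format may differ.

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate the chain-of-thought span from the final answer.

    Assumes the model wraps its reasoning in <think>...</think> tags,
    a common convention; OLMo 3's actual delimiters may differ.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()  # no trace found: treat it all as answer
    thought = match.group(1).strip()
    answer = completion[match.end():].strip()
    return thought, answer

completion = "<think>2 + 2: add the units... that gives 4.</think>The answer is 4."
thought, answer = split_reasoning(completion)
print(len(thought), "chars of reasoning ->", answer)
```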
Background on the Allen Institute for AI, founded by Paul Allen in 2014, and its evolution from science-focused AI to becoming a leader in open language models. Discussion of Ai2's grassroots initiative in November 2022 to build fully open models, securing initial compute from AMD, and the organization's ~100-person team structure.
Deep dive into pre-training methodology, including the constraint-driven approach of fixing the compute budget and duration (two months max), then optimizing data selection from a 30T-token pool down to 6T tokens. Discussion of intelligent token repetition, domain balancing, and the critical importance of avoiding loss spikes that would force training restarts.
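As an illustration of the budget-then-data framing, here is a toy sketch of domain balancing under a repetition cap: given per-domain token pools and target mixture weights, allocate a fixed 6T-token budget without repeating any domain too many times. All domain names, pool sizes, weights, and the 4-epoch cap are made up for illustration; Ai2's actual recipe is defined in their released data tooling.

```python
# Toy sketch of budget-constrained domain mixing with a repetition cap.
# Every constant below is illustrative, not Ai2's actual OLMo 3 mixture.

BUDGET = 6_000_000_000_000  # 6T training tokens
MAX_EPOCHS = 4.0            # assumed cap: never repeat a domain more than 4x

pool = {                    # unique tokens available per domain (made up)
    "web": 25_000_000_000_000,
    "code": 2_500_000_000_000,
    "math": 100_000_000_000,
    "papers": 1_200_000_000_000,
}
target_mix = {"web": 0.60, "code": 0.20, "math": 0.08, "papers": 0.12}

def plan_mixture(pool, target_mix, budget, max_epochs):
    """Allocate training tokens per domain, capping repetition and
    redistributing any shortfall to domains that still have headroom."""
    alloc = {}
    for name, frac in target_mix.items():
        want = frac * budget
        cap = pool[name] * max_epochs   # most we can use within the repeat cap
        alloc[name] = min(want, cap)
    shortfall = budget - sum(alloc.values())
    headroom = {n: pool[n] * max_epochs - alloc[n] for n in alloc}
    total = sum(headroom.values())
    if shortfall > 0 and total > 0:
        for n in alloc:  # hand the shortfall out proportionally to headroom
            alloc[n] += shortfall * headroom[n] / total
    return alloc

for name, tokens in plan_mixture(pool, target_mix, BUDGET, MAX_EPOCHS).items():
    print(f"{name:7s} {tokens / 1e12:5.2f}T tokens  ({tokens / pool[name]:.2f} epochs)")
```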
Explanation of mid-training (also called 'tail patching'), which adds capabilities the model didn't learn during pre-training, and the critical importance of long-context extension for reasoning models. Discussion of architectural decisions that matter more than data quality, and extending context windows from 4K to 65K+ tokens.
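The episode doesn't pin down the exact long-context mechanism, so as a hedged illustration here are the two most common RoPE-based ways to stretch a 4K window toward 64K: position interpolation and NTK-style base scaling. All constants are illustrative.

```python
# Two common ways to stretch a RoPE context window (position interpolation
# vs. NTK-style base scaling). Whether OLMo 3 uses either is not stated in
# these notes; all constants here are illustrative.

def rope_freqs(dim: int, base: float) -> list[float]:
    """Per-pair rotation frequencies for rotary position embeddings."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

DIM = 128
OLD_LEN, NEW_LEN = 4_096, 65_536
scale = NEW_LEN / OLD_LEN            # 16x extension

slowest = rope_freqs(DIM, 10_000.0)[-1]
pos = 60_000                         # a position far beyond the old window

# Option 1: position interpolation -- map position p to p / scale so every
# rotation angle stays inside the range seen during pre-training.
interp_angle = (pos / scale) * slowest

# Option 2: NTK-style scaling -- raise the base so the slowest dimensions
# stretch to cover the longer window while fast dimensions barely change.
ntk_base = 10_000.0 * scale ** (DIM / (DIM - 2))
ntk_angle = pos * rope_freqs(DIM, ntk_base)[-1]

print(f"lowest-frequency angle at pos {pos}:")
print(f"  interpolation: {interp_angle:.3f} rad")
print(f"  NTK scaling:   {ntk_angle:.3f} rad")
```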
Detailed explanation of supervised fine-tuning (SFT) for small reasoning models through distillation from larger teachers (DeepSeek R1, Qwen's QwQ). Discussion of generating 2.5M reasoning traces, the practicality of using Chinese models as teachers due to licensing and quality, and how this stage provides the foundation for 90% of model performance.
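A minimal sketch of the distillation loop this stage describes: sample teacher reasoning traces, keep the ones that verify, and fine-tune the student with next-token loss on the completion only. `teacher_generate`, `verify`, and `sft_train` are hypothetical stand-ins, not Ai2's actual tooling; the -100 label masking follows the common Hugging Face convention.

```python
# Hypothetical stand-ins: `teacher_generate` calls a large reasoning model
# (e.g., DeepSeek R1 or QwQ), `verify` checks the final answer, and
# `sft_train` runs ordinary supervised fine-tuning. None of these are
# Ai2's actual tooling.

def distill(prompts, teacher_generate, verify, sft_train, tries_per_prompt=4):
    """Collect verified teacher reasoning traces, then fine-tune on them."""
    dataset = []
    for prompt in prompts:
        for _ in range(tries_per_prompt):
            trace = teacher_generate(prompt)   # long chain-of-thought + answer
            if verify(prompt, trace):          # e.g., answer matches reference
                dataset.append({"prompt": prompt, "completion": trace})
                break                          # keep one good trace per prompt
    return sft_train(dataset)

def build_labels(prompt_ids, completion_ids, ignore_index=-100):
    """Mask prompt tokens so next-token loss falls only on the teacher's
    trace; -100 is the common Hugging Face ignore-index convention."""
    return [ignore_index] * len(prompt_ids) + list(completion_ids)
```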
Explanation of DPO (Direct Preference Optimization) as a simpler alternative to RLHF that works surprisingly well on reasoning models. Discussion of the 'delta learning hypothesis' (that the contrast between chosen and rejected examples matters more than their absolute quality) and the challenge of finding sufficient variance as open models improve.
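The DPO objective itself is compact enough to show directly. A minimal PyTorch sketch, assuming you already have summed log-probabilities of each response under the trained policy and a frozen reference model; `beta` is the usual KL-strength knob. Note how only the contrast between chosen and rejected margins enters the loss, which is exactly the intuition behind the delta learning hypothesis.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """-log sigmoid(beta * (chosen margin - rejected margin)).

    Each input is the summed log-probability of a full response under the
    policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage: only the *difference* between the two margins drives the loss,
# not how good either response is in absolute terms.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
print(loss.item())
```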
Deep dive into RLVR (Reinforcement Learning with Verifiable Rewards), which uses correctness-based rewards rather than human preference models. Discussion of the extreme infrastructure challenges of long-context RL, numerical stability issues between vLLM inference and the training framework, and why this stage is critical for future model development despite modest immediate gains.
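A minimal sketch of the verifiable-reward idea: score each sampled completion 1.0 if its final answer checks out and 0.0 otherwise, then convert group scores into advantages with a group-mean baseline (a GRPO-style choice; the episode doesn't specify OLMo 3's exact algorithm, so treat this as one common instantiation). The string-matching verifier is a deliberately crude stand-in for real answer parsing or unit tests.

```python
import statistics

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Binary correctness reward; real verifiers parse answers, run unit
    tests, or check math symbolically rather than matching strings."""
    return 1.0 if completion.strip().endswith(reference_answer) else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    """Group-mean baseline with std normalization, as in GRPO-style methods."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero if all equal
    return [(r - mean) / std for r in rewards]

# Toy usage: several sampled completions for one prompt, scored and normalized.
samples = ["... so the answer is 4", "... the answer is 5", "... it is 4"]
rewards = [verifiable_reward(s, "4") for s in samples]
print(rewards, group_advantages(rewards))
```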
Discussion of AGI timelines and the tension between rapid progress and increasing system complexity. Both researchers express belief in transformative AI by 2030 but reject discontinuous 'singularity' scenarios, arguing that physical constraints, complexity tax, and messy co-evolution will result in smooth but dramatic progress rather than sudden jumps.