AI models feel smarter than their real-world impact. They ace benchmarks, yet still struggle with reliability, strange bugs, and shallow generalization. Why is there such a gap between what they can do on evaluations and what they actually deliver?
Ilya Sutskever discusses the fundamental gap between AI benchmark performance and real-world impact, arguing we've entered a new 'age of research' after the scaling era. He explores why models generalize poorly compared to humans, the role of RL versus pretraining, and proposes that better generalization—not just more compute—is the key bottleneck. The conversation covers value functions, continual learning, emotions as evolutionary reward systems, and what safe deployment of superintelligence might require, including the possibility that humans may need to merge with AI to maintain meaningful participation in an AI-driven world.
Ilya examines the puzzling disconnect between models acing evaluations yet struggling with basic real-world tasks like fixing bugs without introducing new ones. He proposes two explanations: RL training makes models too narrowly focused, and companies inadvertently train on eval-inspired environments rather than diverse real-world scenarios, leading to reward hacking at the research level.
Discussion of why humans generalize so much better than models despite seeing far less data. Ilya distinguishes between skills where evolution provides priors (vision, locomotion) versus recent skills (coding, math) where humans still excel, suggesting a fundamental difference in learning mechanisms rather than just evolutionary advantages.
Ilya explores how emotions function as a hardcoded reward system through the case study of a brain-damaged patient who lost emotional processing and became unable to make even simple decisions. This raises questions about whether emotions represent an evolutionary solution to the value function problem that AI systems lack.
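For context on the "value function problem": in RL, a value function estimates how much future reward a state leads to, and is commonly learned with temporal-difference updates. The sketch below is a generic TD(0) illustration with invented states, rewards, and hyperparameters; it is not code from the episode, only a minimal picture of the object Ilya suggests emotions may approximate.

```python
from collections import defaultdict

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Nudge V(state) toward the bootstrapped target: reward + gamma * V(next_state)."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])

# Toy trajectory: only the final transition carries reward, but repeated TD
# updates propagate credit back to earlier states, which is the role an
# internal reward signal would play when choosing among actions.
V = defaultdict(float)  # value estimate per state, defaulting to 0.0
trajectory = [("s0", 0.0, "s1"), ("s1", 0.0, "s2"), ("s2", 1.0, "done")]
for _ in range(50):
    for state, reward, next_state in trajectory:
        td0_update(V, state, reward, next_state)
print(dict(V))  # values converge toward gamma-discounted estimates of the final reward
```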
Ilya argues that the 'age of scaling' (2020-2025) is giving way to a new 'age of research' where compute is abundant but ideas are scarce. The scaling paradigm provided a low-risk investment strategy, but now the question is finding more productive ways to use compute rather than just adding more of it.
Analysis of how RL and pretraining scale differently. While pretraining followed clear power laws, RL requires long rollouts that consume massive compute yet yield relatively little learning per rollout. The transition from pretraining to RL represents a shift from a well-understood scaling recipe to a more research-intensive paradigm.
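For context on the "clear power laws": scaling-law work typically models pretraining loss as a power law in compute. The form below is the standard fitted shape, with $L_{\infty}$, $a$, and $\alpha$ as generic constants rather than figures from the episode:

$$ L(C) \approx L_{\infty} + a\,C^{-\alpha} $$

Adding compute predictably shaves down the second term, which is what made scaling a low-risk bet; an RL rollout, by contrast, spends a long sequence of compute-heavy steps to obtain a single scalar reward, so the learning signal per unit of compute is far thinner.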
Ilya explains SSI's positioning: while they raised less than competitors, much of big companies' budgets goes to inference and product features. For research specifically, SSI has sufficient compute 'to prove, to convince ourselves and anyone else, that what we're doing is correct.' The focus is on different technical approaches to generalization.
Ilya argues that the terms 'AGI' and 'pretraining' have shaped everyone's thinking in ways that may have overshot the target. The term AGI emerged as a reaction to 'narrow AI,' and pretraining seemed to deliver it by improving everything uniformly. But humans aren't AGIs—they rely on continual learning and know less but more deeply.
Rather than a finished system that knows everything, Ilya proposes superintelligence as a 'super intelligent 15 year old that's very eager' who can learn any job quickly. This reframes the problem from building omniscient AGI to building superior learning algorithms that acquire skills through deployment and experience.
Discussion of two paths to superintelligence: the learning algorithm becoming superhuman at ML research itself (recursive self-improvement), or instances deployed across the economy learning all jobs simultaneously and merging knowledge. Both could lead to rapid capability gains, though the timeline and dynamics differ significantly.
Ilya's thinking has shifted toward placing more importance on incremental AI deployment, done in advance of the most powerful systems. The core problem is that future AI power is 'very difficult to imagine,' and showing people increasingly powerful systems is necessary for them, including AI researchers, to update their behaviors and safety approaches appropriately.
Ilya proposes that building AI robustly aligned to care about sentient life (not just humans) may be easier and more stable than other alignment targets. This is because the AI itself will be sentient, and mirror neurons/empathy suggest modeling others with the same circuits used for self-modeling is efficient and natural.
For long-term stability in a world with superintelligent AI, Ilya reluctantly proposes that humans may need to become 'part AI with some kind of Neuralink plus plus.' This would allow humans to fully understand what their AI agents are doing and remain genuine participants rather than passive beneficiaries writing 'keep it up' to reports.
Ilya describes his research taste as guided by an aesthetic: forming beliefs about how AI should be by thinking about how people are, 'but thinking correctly.' This means identifying what's fundamental (neurons, distributed representations, learning from experience) versus superficial (brain folds), and looking for beauty, simplicity, and elegance while rejecting ugliness.
Dwarkesh and Ilya Sutskever on What Comes After Scaling