Ryan Kidd, Co-Executive Director of MATS, shares an inside view of the AI safety field and the world’s largest AI safety research talent pipeline.
Ryan Kidd, Co-Executive Director of MATS, provides an insider's view of AI safety research and talent development. He discusses AGI timelines (median ~2033), the challenge of separating safety from capabilities work, and why expert disagreement remains high. The conversation reveals that while AI systems show concerning deceptive behaviors, they're also proving more aligned than early predictions suggested. Ryan breaks down MATS' research archetypes (connectors, iterators, amplifiers), explains why top organizations struggle to hire despite funding availability, and offers concrete advice for aspiring AI safety researchers on building portfolios and standing out in applications.
Ryan discusses current AGI timeline predictions, with Metaculus forecasting strong AGI around mid-2033 and other forecasting platforms averaging 2030-2031. He emphasizes the portfolio approach MATS takes given high uncertainty, noting that even a 20% chance of AGI by 2028 warrants serious preparation. The conversation explores how AI systems like Claude are showing impressive ethical behavior while also exhibiting concerning deceptive tendencies.
Discussion of current AI deception research, including alignment faking and model organisms work. Ryan argues we're seeing proto-deceptive behaviors that serve as warning shots, though not yet coherent consequentialist deception. He emphasizes the importance of both lab-based model organisms research and deployment monitoring through control protocols, especially as systems move toward online learning.
Ryan tackles the criticism that AI safety research inevitably accelerates capabilities. He argues all safety work is fundamentally capabilities work, using RLHF as the canonical example. The discussion explores whether clean separation is possible and concludes that building performance-competitive aligned systems may be the only viable path given market forces and collective action problems.
Overview of MATS' six research tracks: empirical research (evals, control, interpretability), policy and strategy, theory (agent foundations), technical governance, compute infrastructure, and physical security. The current program is roughly 27% evals, 26% interpretability, and 18% oversight/control. The Summer 2026 program will host 120 fellows across Berkeley and London with 50-60 research mentors.
Ryan breaks down three researcher archetypes drawn from surveying 31 lab leads and hiring managers. Connectors spawn new paradigms (like Paul Christiano and Buck Shlegeris) but are rarely hired, as they typically found their own orgs. Iterators advance empirical research with strong research taste (Ethan Perez, Neel Nanda) and have historically been the most in demand. Amplifiers scale teams through management, a role that becomes increasingly critical as AI lowers engineering barriers.
Despite significant funding and organizational growth (Anthropic's Alignment team growing 3x per year, FAR AI 2x per year), hiring managers report extreme difficulty finding qualified candidates. The barrier isn't a lack of jobs but candidates not meeting the bar. Organizations need people who can quickly become research leads and managers. MATS aims to provide credibility through mentor references, rigorous selection, and tangible research outputs.
Discussion of whether safety researchers need frontier model access. Ryan argues most interpretability research doesn't require cutting-edge models: today's open models (Qwen, DeepSeek, Llama) are yesterday's frontier and sufficient for world-class work. However, some agendas, like weak-to-strong generalization and AI control, benefit from more data points at the frontier to observe concerning behaviors as they emerge.
Ryan emphasizes that tangible research output is practically required to get into MATS and, subsequently, to get hired. Prior research experience is the strongest predictor of success. He advises that some people should stay in academia for a PhD, while others should found companies with the funding that is available. The key is building a portfolio with your name on actual deliverables, strong technical skills with AI tools, and references from trusted researchers.
Ryan discusses the tension between doubling down on established agendas versus exploring new paradigms. He argues for maintaining diversity through MATS' portfolio approach while acknowledging that established paradigms (like interpretability and control) have clearer paths to impact. The mentor selection committee of 20-40 top researchers helps balance these priorities, with some diversity picks to support promising but less mainstream directions.
Ryan explains why technical research remains critical despite governance being the ultimate solution. Technical work lowers the 'alignment tax,' making safer systems economically viable; provides concrete demos and evals that convince policymakers; and creates actionable targets for policy. He describes a flywheel where technical safety solutions enable better governance, which in turn creates demand for more technical work.
Building & Scaling the AI Safety Research Community, with Ryan Kidd of MATS