We Invented Momentum Because Math is Hard [Dr. Jeff Beck]
Dr. Jeff Beck, mathematician turned computational neuroscientist, joins us for a fascinating deep dive into why the future of AI might look less like ChatGPT and more like your own brain.
Dr. Jeff Beck argues that scaling transformers alone won't achieve human-like AI: we need brain-inspired, object-centered models grounded in physical reality. He presents a framework combining Bayesian inference with sparse, structured world models that can be trained on smaller datasets, enable continual learning, and support true systems engineering. Key innovations include a 'lots of little models' approach, physics discovery algorithms, and solving the sim-to-real gap by grounding in macroscopic physics rather than language.
Beck explains why he believes the brain operates as a Bayesian inference engine, citing behavioral experiments showing that humans combine multiple sensory cues near-optimally, weighting each by its reliability. The brain constantly processes information to maintain estimates of low-level sensory statistics, even for signals that never reach conscious perception.
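To make the cue-combination claim concrete, here is a minimal sketch (not from the episode; the function name and numbers are illustrative) of the precision-weighted fusion rule that those behavioral experiments test:

```python
import numpy as np

def combine_cues(means, variances):
    """Precision-weighted fusion of independent Gaussian cues.
    The Bayes-optimal estimate weights each cue by its inverse
    variance (precision)."""
    precisions = 1.0 / np.asarray(variances, dtype=float)
    weights = precisions / precisions.sum()
    fused_mean = np.dot(weights, means)
    fused_variance = 1.0 / precisions.sum()  # always <= the best single cue
    return fused_mean, fused_variance

# Hypothetical numbers: a reliable visual cue and a noisy haptic cue
# estimating an object's position (cm).
mean, var = combine_cues(means=[10.0, 14.0], variances=[1.0, 4.0])
print(mean, var)  # 10.8, 0.8 -- pulled toward the reliable cue
```

The signature of optimality the experiments look for is exactly that fused variance: it is lower than the variance of either cue alone.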
Beck uses momentum in physics as an example of how we choose mathematical frameworks for computational convenience rather than because they necessarily reflect reality. Causal models are preferred because they simplify calculations and point to effective intervention points.
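As a toy illustration of that convenience (textbook physics, not from the episode): conservation of momentum and kinetic energy lets us solve a 1D elastic collision in closed form, without ever modeling the forces during contact:

```python
def elastic_collision_1d(m1, v1, m2, v2):
    """Final velocities after a 1D elastic collision, derived purely
    from conservation of momentum and kinetic energy. The messy contact
    forces never have to be modeled -- that is the bookkeeping
    convenience 'momentum' buys us."""
    v1_final = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    v2_final = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return v1_final, v2_final

# Hypothetical example: a 2 kg cart at 3 m/s hits a 1 kg cart at rest.
print(elastic_collision_1d(2.0, 3.0, 1.0, 0.0))  # (1.0, 4.0)
```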
Discussion of why macroscopic causal relationships matter more than microscopic ones: they align with our affordances and our ability to act. Downward causation is justified when we've correctly identified a useful macroscopic variable that renders the microscopic details irrelevant.
Beck argues that automatic differentiation (autograd) was more important than the transformer architecture itself, turning AI development from careful mathematical construction into an engineering problem. This enabled rapid experimentation but shifted attention away from structured, brain-like models.
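As a sketch of why autograd mattered so much: a forward-mode automatic differentiator fits in a few lines, and once derivatives come for free, building models becomes assembly rather than calculus. This toy (dual numbers) is a stand-in for the reverse-mode autograd that real frameworks use:

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Forward-mode automatic differentiation via dual numbers:
    carry a value and its derivative through every operation."""
    val: float
    dot: float  # derivative with respect to the input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__

def derivative(f, x):
    """Exact derivative of f at x -- no symbolic math, no finite differences."""
    return f(Dual(x, 1.0)).dot

# f(x) = 3x^2 + 2x: derivative at x = 4 is 6*4 + 2 = 26.
print(derivative(lambda x: 3 * x * x + 2 * x, 4.0))  # 26.0
```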
Beck's core thesis: AI needs cognitively inspired models grounded in macroscopic physics, not language. The approach uses sparse, structured, object-centered models that enable systems engineering and creative problem-solving rather than mere pattern matching.
Critique of grounding models in language (as in the LangChain approach) versus grounding them in physical reality. Language is an unreliable representation of both the world and our thought processes; self-report is the least reliable form of experimental data.
Overview of techniques making Bayesian inference tractable at scale: normalizing flows, natural-gradient methods, and improved sampling. The active inference community has historically favored breadth (evangelism) over depth (solving hard problems).
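For a sense of the first of these, here is a minimal sketch of the change-of-variables idea behind normalizing flows, using a single affine layer (real flows stack many learned invertible layers); the parameters are illustrative:

```python
import numpy as np

class AffineFlow:
    """Minimal normalizing flow: an invertible affine map applied to a
    standard normal base distribution. The change-of-variables formula
    turns an otherwise intractable density query into bookkeeping."""
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift

    def forward(self, z):           # sample: base noise -> data space
        return self.scale * z + self.shift

    def log_prob(self, x):          # exact density of transformed samples
        z = (x - self.shift) / self.scale
        base_logp = -0.5 * (z**2 + np.log(2 * np.pi))
        return base_logp - np.log(np.abs(self.scale))  # |det Jacobian| term

flow = AffineFlow(scale=2.0, shift=1.0)      # hypothetical parameters
samples = flow.forward(np.random.randn(5))
print(flow.log_prob(samples))                # exact log-densities
```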
Revolutionary approach: instead of one giant model, train thousands of small object-specific models that can be composed. Train on houses and on parks separately, then combine; the book model learned in one context works in the other.
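A hypothetical sketch of how such composition might look (the interfaces and names are ours, not from the episode): each object carries its own small dynamics model, and a scene is just the set of models active in it:

```python
# Hypothetical sketch of the 'lots of little models' idea. Each object
# gets its own small dynamics model; a scene composes whichever object
# models are present, so the same book model works in any context.
class ObjectModel:
    def __init__(self, name, predict):
        self.name = name
        self.predict = predict  # state -> predicted next state

def compose(models):
    """A scene model is a composition of per-object models: each object
    updates its own slice of the state (interactions between objects
    are handled separately -- see the adjacency-matrix sketch below)."""
    def scene_step(state):
        return {m.name: m.predict(state[m.name]) for m in models}
    return scene_step

book = ObjectModel("book", lambda s: s)                  # books just sit there
door = ObjectModel("door", lambda s: min(s + 0.1, 1.0))  # doors swing open

house = compose([book, door])   # trained-on-houses context
park = compose([book])          # trained-on-parks context: same book model
print(house({"book": 0.0, "door": 0.3}), park({"book": 0.0}))
```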
Objects are defined by their interaction patterns, with multiple adjacency matrices representing different interaction types (e.g., distinct forces). Maintaining Bayesian uncertainty about these interactions enables continual learning when novel situations arise.
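One rough way this might be coded up (all names are illustrative): one adjacency matrix per interaction type, with Beta-Bernoulli counts expressing uncertainty about each edge, so that new evidence is just a cheap count update:

```python
import numpy as np

# Hypothetical sketch: one adjacency matrix per interaction type, with
# Beta-Bernoulli uncertainty over each possible edge.
objects = ["robot", "box", "cat"]
interaction_types = ["contact", "support"]
n = len(objects)

# Beta counts per interaction type: shape (type, i, j), uniform prior.
alpha = np.ones((len(interaction_types), n, n))
beta = np.ones((len(interaction_types), n, n))

def observe(kind, i, j, interacted):
    """Update belief about whether objects i and j interact via `kind`."""
    k = interaction_types.index(kind)
    if interacted:
        alpha[k, i, j] += 1
    else:
        beta[k, i, j] += 1

def edge_probability(kind, i, j):
    """Posterior mean probability that the interaction edge exists."""
    k = interaction_types.index(kind)
    return alpha[k, i, j] / (alpha[k, i, j] + beta[k, i, j])

observe("contact", 0, 2, True)            # robot touched the cat once
print(edge_probability("contact", 0, 2))  # ~0.67 -- still quite uncertain
```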
Concrete example of the system's advantages: a warehouse robot encounters a cat it has never seen before. A surprise signal fires, the robot queries a shared model bank, receives candidate models, tests each hypothesis against its observations, and incorporates the cat model that fits best. This demonstrates knowing what you don't know.
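A toy rendering of that loop, with all names, thresholds, and numbers purely illustrative: surprise measured as negative log-likelihood under the current models, a query to the model bank when it spikes, and adoption of the best-scoring hypothesis:

```python
import numpy as np

def surprise(observation, model):
    """Surprise as negative log-likelihood under a model."""
    return -model.log_likelihood(observation)

class GaussianModel:
    def __init__(self, name, mean, std):
        self.name, self.mean, self.std = name, mean, std
    def log_likelihood(self, x):
        return float(-0.5 * ((x - self.mean) / self.std) ** 2
                     - np.log(self.std * np.sqrt(2 * np.pi)))

known = [GaussianModel("box", mean=0.0, std=1.0)]     # what the robot knows
bank = [GaussianModel("cat", mean=5.0, std=1.0),      # shared model bank
        GaussianModel("dog", mean=8.0, std=1.0)]

obs = 5.2                                        # something box-unlike appears
if min(surprise(obs, m) for m in known) > 4.0:   # surprise threshold
    best = max(bank, key=lambda m: m.log_likelihood(obs))
    known.append(best)             # adopt the best-fitting hypothesis
    print("adopted:", best.name)   # -> adopted: cat
```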
Current robotics fails to transfer from simulation to reality because game engines prioritize visual plausibility over physical accuracy, and robot 'brains' lack world structure. Both accurate physics simulators and structured internal models are needed.
Critique of reward-based alignment: reward functions are arbitrary and lead to degenerate behavior. Humans align by discussing their beliefs, separating disagreements about beliefs from disagreements about values. Beck proposes using AIs as oracles, or solving alignment through explicit belief models that make this separation possible.
Discussion of cellular automata as Turing-complete systems with emergent properties. Beck focuses less on how emergence arises from simple rules and more on the mathematical properties of the resulting macroscopic objects, which aligns with human cognitive biases.
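For readers who want to experiment, a complete Rule 110 automaton (one of the Turing-complete cellular automata in question) fits in a few lines; the emergent macroscopic 'objects' are the persistent patterns in its output:

```python
# Rule 110: a one-dimensional cellular automaton proven Turing-complete.
# A few lines of local rules produce persistent macroscopic structures.
RULE = 110

def step(cells):
    """Apply Rule 110: each cell's next state is the rule bit indexed
    by its 3-cell neighborhood (with wraparound at the edges)."""
    n = len(cells)
    return [
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

cells = [0] * 31 + [1] + [0] * 31   # single live cell in the middle
for _ in range(20):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
```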
Program synthesis has promise, but current approaches lack datasets of well-written programs. Tony Zador's work on genetically encoding neural networks suggests a path forward: learn patterns across solutions to traverse architecture space sensibly.