We often think of Large Language Models (LLMs) as all-knowing, but as the team reveals, they still struggle with the logic of a second-grader. Why can’t ChatGPT reliably add large numbers? Why does it...
This episode explores fundamental limitations of current AI systems—like LLMs' inability to reliably perform basic arithmetic—and proposes Category Theory as a mathematical framework to transform deep learning from empirical "alchemy" into rigorous science. The discussion covers how categorical deep learning extends geometric deep learning to handle non-invertible computations (like algorithms that destroy information), introduces formal frameworks for weight tying and compositionality, and addresses the challenge of implementing discrete algorithmic operations (like carrying in addition) within continuous neural architectures.
Language models cannot reliably perform addition or multiplication despite executing billions of operations per token. They learn patterns that often work but fail on edge cases. The discussion reveals a fundamental misalignment between how these systems are trained and what we need them to do for reasoning and scientific applications.
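A minimal test-harness sketch of this failure mode, probing carry-heavy sums where pattern matching tends to break down. The `ask_model` function here is a hypothetical placeholder, not a real API:

```python
import random

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in a real API client."""
    raise NotImplementedError

def carry_heavy_pair(n_digits: int) -> tuple[int, int]:
    # A run of 9s forces a carry at every position, the regime where
    # pattern-matched addition is most likely to fail.
    a = int("9" * n_digits)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return a, b

def evaluate(n_digits: int = 15, trials: int = 20) -> float:
    correct = 0
    for _ in range(trials):
        a, b = carry_heavy_pair(n_digits)
        reply = ask_model(f"What is {a} + {b}? Answer with only the number.")
        correct += reply.strip() == str(a + b)
    return correct / trials
```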
Geometric deep learning builds neural networks that are equivariant to symmetry transformations (like translations, rotations, permutations). This approach dramatically reduces data requirements by encoding structural priors. Transformers are fundamentally permutation-equivariant models, which explains part of their success.
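A small numerical sketch of equivariance, using a DeepSets-style layer as a stand-in rather than any architecture from the episode: permuting the inputs and then applying the layer gives the same result as applying the layer and then permuting the outputs.

```python
import numpy as np

def equivariant_layer(X, W_self, W_pool):
    """Permutation-equivariant layer (DeepSets-style):
    each row is updated from itself plus a symmetric pooling over all rows."""
    pooled = X.mean(axis=0, keepdims=True)   # permutation-invariant summary
    return X @ W_self + pooled @ W_pool      # equivariant per-row update

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                  # 5 set elements, 3 features each
W_self, W_pool = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

perm = rng.permutation(5)
out_then_perm = equivariant_layer(X, W_self, W_pool)[perm]
perm_then_out = equivariant_layer(X[perm], W_self, W_pool)
assert np.allclose(out_then_perm, perm_then_out)   # f(P·X) == P·f(X)
```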
Group theory assumes all transformations are invertible, but many algorithms destroy information. Dijkstra's shortest-path algorithm, for example, can map different graphs to the same output, making the computation non-invertible. This fundamental limitation led the team to explore category theory as a more general framework.
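A quick illustration using a standard textbook Dijkstra (not anything from the paper): two different graphs collapse to identical distance outputs, so no inverse map can recover the input.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, weight), ...]} -> shortest distances from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Two different graphs: the a->c edge weight differs (2 vs 5)...
g1 = {"a": [("b", 1), ("c", 2)], "b": [("c", 1)], "c": []}
g2 = {"a": [("b", 1), ("c", 5)], "b": [("c", 1)], "c": []}

# ...yet the outputs coincide, so the input cannot be recovered: not invertible.
assert dijkstra(g1, "a") == dijkstra(g2, "a") == {"a": 0, "b": 1, "c": 2}
```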
Category theory generalizes algebra to handle situations where composition has constraints. The key insight: think of it as 'algebra with colors', where operations can compose only when their types match, just as non-square matrices can be multiplied only when their inner dimensions agree.
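A toy sketch of that 'colors' intuition, not the paper's formalism: composition is defined only when the types, here matrix dimensions, line up.

```python
import numpy as np

def compose(g, f):
    """Compose two 'typed' maps f: m -> n and g: n -> p, represented as matrices.
    Composition is defined only when the inner dimensions (colours) match."""
    if g.shape[1] != f.shape[0]:
        raise TypeError(f"cannot compose: {g.shape} after {f.shape}")
    return g @ f

f = np.ones((4, 3))   # a morphism 3 -> 4
g = np.ones((2, 4))   # a morphism 4 -> 2
h = compose(g, f)     # fine: a morphism 3 -> 2
# compose(f, g) would raise: the colours (4 vs 2) do not match.
```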
Category theory represents synthetic (vs analytic) mathematics—focusing on relationships and inference rules rather than internal construction. This structuralist approach eliminates irrelevant implementation details and reveals what actually matters for reasoning, similar to Euclidean geometry vs Cartesian coordinates.
Two-categories add another level: objects, morphisms between objects, and two-morphisms relating morphisms. This framework provides rigorous theory for weight sharing in neural networks, proving when weight tying preserves computational structure across diverse domains from RNNs to game theory.
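As a concrete instance of what the framework is meant to govern, here is a minimal, generic RNN unroll sketch (not the paper's construction): the same weight matrices are tied across every time step.

```python
import numpy as np

def rnn_unroll(xs, h0, W_h, W_x):
    """Unrolled RNN: the *same* W_h and W_x are applied at every step.
    This parameter sharing is the weight tying being formalized."""
    h = h0
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x)   # identical weights reused at each step
    return h

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(8, 8)), rng.normal(size=(8, 3))
xs = [rng.normal(size=3) for _ in range(10)]
h_final = rnn_unroll(xs, np.zeros(8), W_h, W_x)
```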
The paper focuses on semantics (how programs behave) rather than syntax (what you type). Lists are semantically 'foldable types' or monoids. Different syntaxes can describe identical semantics, and working semantically enables cleaner mathematical proofs and more general frameworks.
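A small sketch of the semantic view, assuming nothing beyond standard Python: a fold is parameterized by a monoid (an associative operation and its identity), and any syntax that supplies that structure will do.

```python
from functools import reduce

def fold(op, unit, xs):
    """Fold a list with a monoid: an associative operation `op` and its identity `unit`.
    The result depends only on this algebraic structure, not on how lists are written."""
    return reduce(op, xs, unit)

xs = [1, 2, 3, 4]
assert fold(lambda a, b: a + b, 0, xs) == 10                  # sum monoid
assert fold(lambda a, b: a * b, 1, xs) == 24                  # product monoid
assert fold(lambda a, b: a + b, "", ["fo", "ld"]) == "fold"   # concatenation monoid
```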
A fundamental oversight in GNN design: the concept of 'carry' from elementary arithmetic. Traditional GNNs send whole states, but carry operations require transmitting state changes. Implementing discrete operations like carrying in continuous gradient-based systems requires sophisticated geometric structures like the Hopf fibration in 4D space.
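For contrast with the continuous setting, here is the discrete schoolbook version of carrying, a plain sketch showing that what travels between digit positions is a change of state (the carry), not the whole state.

```python
def add_with_carry(a_digits, b_digits):
    """Schoolbook addition, least-significant digit first.
    What propagates between positions is not the whole state but a small
    *change* of state: the carry."""
    result, carry = [], 0
    for a, b in zip(a_digits, b_digits):
        total = a + b + carry
        result.append(total % 10)
        carry = total // 10          # the only message passed to the next position
    if carry:
        result.append(carry)
    return result

# 57 + 68 = 125, digits stored least-significant first
assert add_with_carry([7, 5], [8, 6]) == [5, 2, 1]
```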
Making deep learning perform real algorithms with Category Theory (Andrew Dudzik, Petar Veličković, Taco Cohen, Bruno Gavranović, Paul Lessard)