We often think of Large Language Models (LLMs) as all-knowing, but as the team reveals, they still struggle with the logic of a second-grader. Why can’t ChatGPT reliably add large numbers? Why does it...
This episode explores fundamental limitations of current AI systems—like LLMs' inability to reliably perform basic arithmetic—and proposes Category Theory as a mathematical framework to transform deep learning from empirical "alchemy" into rigorous science. The discussion covers how categorical deep learning extends geometric deep learning to handle non-invertible computations (like algorithms that destroy information), introduces formal frameworks for weight tying and compositionality, and addresses the challenge of implementing discrete algorithmic operations (like carrying in addition) within continuous neural architectures.
Language models cannot reliably perform addition or multiplication despite executing billions of operations per token. They learn patterns that often work but fail on edge cases. The discussion reveals a fundamental misalignment between how these systems are trained and what we need them to do for reasoning and scientific applications.
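A minimal test-harness sketch of this failure mode, probing carry-heavy sums where pattern matching tends to break down. The `ask_model` function here is a hypothetical placeholder, not a real API:

```python
import random

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; swap in a real API client."""
    raise NotImplementedError

def carry_heavy_pair(n_digits: int) -> tuple[int, int]:
    # A run of 9s forces a carry at every position, the regime where
    # pattern-matched addition is most likely to fail.
    a = int("9" * n_digits)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return a, b

def evaluate(n_digits: int = 15, trials: int = 20) -> float:
    correct = 0
    for _ in range(trials):
        a, b = carry_heavy_pair(n_digits)
        reply = ask_model(f"What is {a} + {b}? Answer with only the number.")
        correct += reply.strip() == str(a + b)
    return correct / trials
```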
Geometric deep learning builds neural networks that are equivariant to symmetry transformations (like translations, rotations, permutations). This approach dramatically reduces data requirements by encoding structural priors. Transformers are fundamentally permutation-equivariant models, which explains part of their success.
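A small numerical sketch of equivariance, using a DeepSets-style layer as a stand-in rather than any architecture from the episode: permuting the inputs and then applying the layer gives the same result as applying the layer and then permuting the outputs.

```python
import numpy as np

def equivariant_layer(X, W_self, W_pool):
    """Permutation-equivariant layer (DeepSets-style):
    each row is updated from itself plus a symmetric pooling over all rows."""
    pooled = X.mean(axis=0, keepdims=True)   # permutation-invariant summary
    return X @ W_self + pooled @ W_pool      # equivariant per-row update

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                  # 5 set elements, 3 features each
W_self, W_pool = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))

perm = rng.permutation(5)
out_then_perm = equivariant_layer(X, W_self, W_pool)[perm]
perm_then_out = equivariant_layer(X[perm], W_self, W_pool)
assert np.allclose(out_then_perm, perm_then_out)   # f(P·X) == P·f(X)
```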
Group theory assumes all transformations are invertible, but many algorithms destroy information. Dijkstra's shortest-path algorithm, for example, can map different graphs to the same output, making the computation non-invertible. This fundamental limitation led the team to explore category theory as a more general framework.
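A quick illustration using a standard textbook Dijkstra (not anything from the paper): two different graphs collapse to identical distance outputs, so no inverse map can recover the input.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, weight), ...]} -> shortest distances from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Two different graphs: the a->c edge weight differs (2 vs 5)...
g1 = {"a": [("b", 1), ("c", 2)], "b": [("c", 1)], "c": []}
g2 = {"a": [("b", 1), ("c", 5)], "b": [("c", 1)], "c": []}

# ...yet the outputs coincide, so the input cannot be recovered: not invertible.
assert dijkstra(g1, "a") == dijkstra(g2, "a") == {"a": 0, "b": 1, "c": 2}
```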
Category theory generalizes algebra to handle situations where composition has constraints. The key insight: think of it as 'algebra with colors', where operations can compose only when their types match, just as non-square matrices can be multiplied only when their inner dimensions agree.
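A toy sketch of that 'colors' intuition, not the paper's formalism: composition is defined only when the types, here matrix dimensions, line up.

```python
import numpy as np

def compose(g, f):
    """Compose two 'typed' maps f: m -> n and g: n -> p, represented as matrices.
    Composition is defined only when the inner dimensions (colours) match."""
    if g.shape[1] != f.shape[0]:
        raise TypeError(f"cannot compose: {g.shape} after {f.shape}")
    return g @ f

f = np.ones((4, 3))   # a morphism 3 -> 4
g = np.ones((2, 4))   # a morphism 4 -> 2
h = compose(g, f)     # fine: a morphism 3 -> 2
# compose(f, g) would raise: the colours (4 vs 2) do not match.
```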
Category theory represents synthetic (vs analytic) mathematics—focusing on relationships and inference rules rather than internal construction. This structuralist approach eliminates irrelevant implementation details and reveals what actually matters for reasoning, similar to Euclidean geometry vs Cartesian coordinates.
Two-categories add another level: objects, morphisms between objects, and two-morphisms relating morphisms. This framework provides rigorous theory for weight sharing in neural networks, proving when weight tying preserves computational structure across diverse domains from RNNs to game theory.
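As a concrete instance of what the framework is meant to govern, here is a minimal, generic RNN unroll sketch (not the paper's construction): the same weight matrices are tied across every time step.

```python
import numpy as np

def rnn_unroll(xs, h0, W_h, W_x):
    """Unrolled RNN: the *same* W_h and W_x are applied at every step.
    This parameter sharing is the weight tying being formalized."""
    h = h0
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x)   # identical weights reused at each step
    return h

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(8, 8)), rng.normal(size=(8, 3))
xs = [rng.normal(size=3) for _ in range(10)]
h_final = rnn_unroll(xs, np.zeros(8), W_h, W_x)
```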
The paper focuses on semantics (how programs behave) rather than syntax (what you type). Lists are semantically 'foldable types' or monoids. Different syntaxes can describe identical semantics, and working semantically enables cleaner mathematical proofs and more general frameworks.
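A small sketch of the semantic view, assuming nothing beyond standard Python: a fold is parameterized by a monoid (an associative operation and its identity), and any syntax that supplies that structure will do.

```python
from functools import reduce

def fold(op, unit, xs):
    """Fold a list with a monoid: an associative operation `op` and its identity `unit`.
    The result depends only on this algebraic structure, not on how lists are written."""
    return reduce(op, xs, unit)

xs = [1, 2, 3, 4]
assert fold(lambda a, b: a + b, 0, xs) == 10                  # sum monoid
assert fold(lambda a, b: a * b, 1, xs) == 24                  # product monoid
assert fold(lambda a, b: a + b, "", ["fo", "ld"]) == "fold"   # concatenation monoid
```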
A fundamental oversight in GNN design: the concept of 'carry' from elementary arithmetic. Traditional GNNs send whole states, but carry operations require transmitting state changes. Implementing discrete operations like carrying in continuous gradient-based systems requires sophisticated geometric structures like the Hopf fibration in 4D space.
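For contrast with the continuous setting, here is the discrete schoolbook version of carrying, a plain sketch showing that what travels between digit positions is a change of state (the carry), not the whole state.

```python
def add_with_carry(a_digits, b_digits):
    """Schoolbook addition, least-significant digit first.
    What propagates between positions is not the whole state but a small
    *change* of state: the carry."""
    result, carry = [], 0
    for a, b in zip(a_digits, b_digits):
        total = a + b + carry
        result.append(total % 10)
        carry = total // 10          # the only message passed to the next position
    if carry:
        result.append(carry)
    return result

# 57 + 68 = 125, digits stored least-significant first
assert add_with_carry([7, 5], [8, 6]) == [5, 2, 1]
```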
Making deep learning perform real algorithms with Category Theory (Andrew Dudzik, Petar Veličković, Taco Cohen, Bruno Gavranović, Paul Lessard)