"I Co-Invented the Transformer. Now I'm Replacing It." & Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]
Llion Jones (Transformer co-inventor) and Luke Darlow from Sakana AI discuss why the Transformer architecture may be trapping AI research in a local minimum, preventing the discovery of better reasoning systems. They introduce Continuous Thought Machines (CTM), a biologically inspired recurrent architecture with native adaptive compute that uses neuron synchronization and sequential internal reasoning. The paper received a NeurIPS 2025 spotlight and demonstrates superior calibration, natural adaptive computation, and novel problem-solving behaviors on tasks like maze solving, suggesting a path beyond current LLM limitations.
Jones argues the AI field is trapped in a Transformer local minimum similar to the RNN era. Before Transformers, endless RNN variants (LSTMs, GRUs) achieved incremental improvements (1.26→1.25 bits/char), but Transformers immediately jumped to 1.1, making all that research obsolete. Current Transformer tweaks may be similarly wasted effort when the next breakthrough arrives.
Sakana AI is built on Kenneth Stanley's 'Why Greatness Cannot Be Planned' philosophy - researchers should follow gradients of interestingness rather than fixed objectives. Jones emphasizes protecting researcher freedom from commercialization pressures, publication demands, and committee-driven agendas that narrow exploration and prevent breakthrough discoveries.
Jones uses the spiral classification task from the 'matrix exponentiation' paper to illustrate a more fundamental limitation. ReLU networks solve the spirals with piecewise-linear decision boundaries - technically correct, but without any real grasp of the underlying structure. A better architecture would represent a spiral as a spiral, enabling proper extrapolation and generalization rather than brute-force curve fitting.
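To make the contrast concrete, here is a small self-contained sketch (my own illustration, not code from the episode or the paper): a ReLU MLP fits the classic two-spirals task with many piecewise-linear facets, while hand-crafted polar features that "represent the spiral as a spiral" make the two arms separable by a plain linear model. All function and variable names below are mine.

```python
# Illustrative sketch: piecewise-linear ReLU fit vs. a spiral-aware representation.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def two_spirals(n=1000, noise=0.05):
    """Generate the classic two-spirals dataset (2n points, labels 0/1)."""
    t = rng.uniform(0.5, 3.0 * np.pi, size=n)              # arc-length parameter
    x = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
    x = np.concatenate([x, -x]) + rng.normal(scale=noise, size=(2 * n, 2))
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

X, y = two_spirals()

# 1) ReLU MLP on raw coordinates: fits the data with many piecewise-linear facets.
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)

# 2) Polar "spiral-aware" features: radius and angle.
#    For the ideal spiral r = t, so r - theta is roughly constant per arm (mod 2*pi),
#    and a plain linear model separates the two arms in these coordinates.
r = np.linalg.norm(X, axis=1)
theta = np.arctan2(X[:, 1], X[:, 0])
phase = np.mod(r - theta, 2 * np.pi)
Z = np.stack([np.cos(phase), np.sin(phase)], axis=1)
lin = LogisticRegression().fit(Z, y)

print("ReLU MLP train accuracy:  ", mlp.score(X, y))
print("Polar + linear train acc.:", lin.score(Z, y))
```

The point is not the accuracy numbers but where the inductive bias lives: the linear model succeeds only because the spiral structure was put into the representation, which is the kind of "representing spirals as spirals" Jones is arguing for.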
Luke Darlow explains CTM's three core novelties: (1) an internal sequential thought dimension for multi-step reasoning, (2) neuron-level models, where each neuron is a small model processing its own temporal history, and (3) synchronization-based representations that measure how pairs of neurons fire together over time, creating a much richer, roughly d²/2-dimensional representation space.
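As a concrete sketch of the third ingredient, the snippet below (a minimal simplification under my own assumptions, not Sakana's implementation) builds a synchronization representation from a history of post-activations over internal ticks; the real CTM additionally uses learned per-neuron models and decay over the history, which this omits.

```python
# Minimal sketch of a synchronization-based representation.
import numpy as np

def synchronization_representation(post_activations: np.ndarray) -> np.ndarray:
    """post_activations: array of shape (T, D) = (internal ticks, neurons).

    Returns the flattened upper triangle of the co-firing matrix, giving
    D*(D+1)/2 (~ d^2/2) features, where entry (i, j) is the time-averaged
    product of neuron i's and neuron j's activation histories.
    """
    T, D = post_activations.shape
    sync = post_activations.T @ post_activations / T   # (D, D) co-firing matrix
    iu = np.triu_indices(D)                            # symmetric, so keep i <= j
    return sync[iu]                                    # shape: (D*(D+1)//2,)

# Toy usage: 16 neurons unrolled for 10 internal ticks.
history = np.tanh(np.random.default_rng(0).normal(size=(10, 16)))
rep = synchronization_representation(history)
print(rep.shape)   # (136,) = 16*17/2 pairwise synchronization features
```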
CTM naturally exhibits adaptive computation without requiring carefully tuned penalty losses (unlike Alex Graves' ACT paper). The loss is evaluated at two internal ticks per example - the lowest-loss tick and the highest-certainty tick - so easy examples resolve in 1-2 steps while hard examples use the full thinking time. Remarkably, the model comes out nearly perfectly calibrated without post-hoc tricks.
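A hedged PyTorch sketch of how such a two-point loss might look (my reading of the description above, not the official code): per-tick losses and certainties are computed, and each example is trained at its lowest-loss tick and its most-certain tick.

```python
# Sketch of a tick-selection loss over an internal thought dimension.
import torch
import torch.nn.functional as F

def ctm_style_loss(per_tick_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """per_tick_logits: (T, B, C) logits at each internal tick; target: (B,) labels."""
    T, B, C = per_tick_logits.shape

    # Per-tick cross-entropy, shape (T, B).
    losses = torch.stack([F.cross_entropy(per_tick_logits[t], target, reduction="none")
                          for t in range(T)])

    # Certainty proxy: negative entropy of each tick's predictive distribution, shape (T, B).
    probs = per_tick_logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    certainty = -entropy

    # For each example, combine the lowest-loss tick and the most-certain tick.
    t_min_loss = losses.argmin(dim=0)        # (B,)
    t_max_cert = certainty.argmax(dim=0)     # (B,)
    batch_idx = torch.arange(B)
    loss = 0.5 * (losses[t_min_loss, batch_idx] + losses[t_max_cert, batch_idx])
    return loss.mean()

# Toy usage: 8 internal ticks, batch of 4, 10 classes.
logits = torch.randn(8, 4, 10, requires_grad=True)
print(ctm_style_loss(logits, torch.tensor([1, 0, 3, 7])))
```

Because no tick count is penalized directly, nothing forces the model to "pad" its thinking on easy inputs, which is the behavior described above.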
During maze training, CTM spontaneously developed sophisticated strategies: partway through training it would explore one path, realize it was wrong, then backtrack and try another. Under time constraints, it invented a 'leapfrogging' algorithm - jumping ahead to approximate positions and filling in the path backwards - a faster strategy that humans don't typically use.
CTM's sequential reasoning enables path-dependent understanding - how you arrived at a conclusion matters, not just the conclusion itself. This allows agents to explore trajectories, construct understanding that 'carves the world up by the joints,' and potentially avoid the shortcut learning and hallucination problems of feed-forward models.
Darlow is actively exploring how to apply CTM to language modeling, viewing language as a kind of maze full of ambiguity. Future directions include multi-agent systems with shared memory structures: agents solving mazes with limited visibility but a shared cultural memory, enabling collective intelligence and long-term memory capabilities.
Jones introduces Sudoku Bench, a reasoning benchmark built from variant Sudokus with handcrafted, unique constraints that require strong natural-language understanding and meta-reasoning. The benchmark includes thousands of hours of expert reasoning traces from the 'Cracking the Cryptic' YouTube channel. Current best models achieve only ~15% on the simplest puzzles; GPT-5 shows improvement but still fails on puzzles humans can solve.
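For intuition about what a handcrafted variant constraint means in practice, here is a purely hypothetical checker (not Sudoku Bench's actual evaluation code): it validates a predicted 9x9 grid against the standard Sudoku rules plus one example variant rule, a killer cage whose cells must sum to a target without repeating digits.

```python
# Hypothetical grading sketch for a variant-Sudoku answer grid.
from itertools import product

def valid_standard(grid):
    """grid: 9x9 list of lists with digits 1-9; checks rows, columns, and 3x3 boxes."""
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [set(grid[r][c]
                 for r, c in product(range(br, br + 3), range(bc, bc + 3)))
             for br in range(0, 9, 3) for bc in range(0, 9, 3)]
    full = set(range(1, 10))
    return all(s == full for s in rows + cols + boxes)

def valid_killer_cage(grid, cells, target):
    """Variant rule: digits in `cells` (list of (row, col)) sum to `target` with no repeats."""
    digits = [grid[r][c] for r, c in cells]
    return sum(digits) == target and len(set(digits)) == len(digits)

def score(grid, cages):
    """cages: list of (cells, target) pairs encoding the puzzle's handcrafted rules."""
    return valid_standard(grid) and all(valid_killer_cage(grid, cells, t) for cells, t in cages)
```

In the benchmark itself such rules are stated in natural language, so the model must first parse the rule and then reason with it; a checker like this only formalizes the grading step.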
"I Co-Invented the Transformer. Now I'm Replacing It." & Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]
Ask me anything about this podcast episode...
Try asking: