Adam Marblestone is CEO of Convergent Research. He’s had a very interesting past life: he was a research scientist at Google DeepMind on their neuroscience team and has worked on everything from brain...
The big million dollar question that I have, that I've been trying to get the answer to through all these interviews with the AI researchers, is: how does the brain do it? Right? Like, we're throwing way more data at these LLMs and they still have a small fraction of the total capabilities that a human does. So what's going on?
Yeah. I mean, this might be the quadrillion dollar question or something like that. You can make an argument that this is the most important question in science. I don't claim to know the answer. I also don't really think that the answer will necessarily come even from a lot of smart people thinking about it as much as they are.
My overall, like, meta-level take is that we have to empower the field of neuroscience, make neuroscience a more powerful field technologically and otherwise, to actually be able to crack a question like this. But maybe the way that we would think about this now with, like, modern AI, neural nets, deep learning, is that there are certain key components of it. There's the architecture. There are maybe hyperparameters of the architecture: how many layers do you have, or other properties of that architecture.
There is the learning algorithm itself. How do you train it, back prop, gradient descent, is it something else? There is, how is it initialized? So if we take the learning part of the system, it still may have some initialization of the weights. And then there are also cost functions.
There's like, what is it being trained to do? Yeah. What's the reward signal? What are the loss functions? Supervision signals.
My personal hunch within that framework is that the field has neglected the role of very specific loss functions, very specific cost functions. Machine learning tends toward mathematically simple loss functions: predict the next token, you know, cross entropy, these simple kind of computer-scientist loss functions. I think evolution may have built a lot of complexity into the loss functions. Actually, many different loss functions for different areas, turned on at different stages of development. A lot of Python code, basically, generating a specific curriculum for what different parts of the brain need to learn.
Because evolution has seen many times what was successful and unsuccessful, and evolution could encode the knowledge of the learning curriculum. So in the machine learning framework, maybe we can come back and talk about, yeah, where do the loss functions of the brain come from? Can different loss functions lead to different efficiency of learning?
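To make the "Python code generating a curriculum of loss functions" picture concrete, here is a minimal illustrative sketch. Every function name, region label, and number below is hypothetical, not something from the conversation or from any neuroscience source.

```python
# Illustrative sketch only: many region- and stage-specific loss functions
# rather than one global objective. All names and thresholds are hypothetical.
from dataclasses import dataclass
from typing import Callable

def face_salience_loss(pred: float, target: float) -> float:
    # hypothetical innate objective for an early visual area
    return (pred - target) ** 2

def self_motion_prediction_loss(pred: float, target: float) -> float:
    # hypothetical objective for predicting consequences of one's own movements
    return abs(pred - target)

def social_approval_loss(pred: float, target: float) -> float:
    # hypothetical objective hooked up to innate social signals
    return (pred - target) ** 2

@dataclass
class CurriculumPhase:
    region: str                                 # which part of the model this loss trains
    loss_fn: Callable[[float, float], float]
    active: Callable[[int], bool]               # developmental window when this loss is on

CURRICULUM = [
    CurriculumPhase("visual",  face_salience_loss,          lambda step: step < 10_000),
    CurriculumPhase("motor",   self_motion_prediction_loss, lambda step: step < 50_000),
    CurriculumPhase("frontal", social_approval_loss,        lambda step: step > 20_000),
]

def total_loss(step: int, outputs: dict, targets: dict) -> float:
    # Sum only the losses whose developmental window is currently open.
    return sum(
        phase.loss_fn(outputs[phase.region], targets[phase.region])
        for phase in CURRICULUM
        if phase.active(step)
    )

print(total_loss(step=30_000,
                 outputs={"visual": 0.2, "motor": 0.5, "frontal": 0.9},
                 targets={"visual": 0.0, "motor": 0.4, "frontal": 1.0}))
```

The point of the sketch is only that such a "curriculum" is cheap to specify in code relative to the size of the network it trains.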
You know, people will say, like, the cortex has got the universal human learning algorithm, the special sauce that humans have. What's up with that?
A huge question, and we don't know. I've seen models where, you know, the cortex has typically this six-layered structure, layers in a slightly different sense than the layers of a neural net. Any one location in the cortex has six physical layers of tissue as you go into the sheet. And then those areas connect to each other, and that's more like the layers of a network. I've seen versions of that where what you're trying to explain is actually just how it approximates backprop.
Yeah. And what is the cost function for that? What is the network being asked to do? If you are trying to say it's something like backprop, is it doing backprop on next-token prediction? Is it doing backprop on... Exactly.
Classifying images, or what is it doing? And no one knows, but I think one possibility is that it's just this incredibly general prediction engine. So any one area of cortex is just trying to predict... basically, can it learn to predict any subset of all the variables it sees from any other subset? So, like, omnidirectional inference or omnidirectional prediction. Whereas an LLM, you see everything in the context window, and then it computes a very particular... Yeah. Conditional probability, which is: given all the last thousands of things, what are the probabilities for the next token?
Yeah. But it would be weird for a large language model to say, you know, "the quick brown fox, blank, blank, the lazy dog," and fill in the middle. Yeah. Versus doing the next token, if it's just going forward. It can learn how to do that stuff at this emergent level of in-context learning, but natively, it's just predicting the next token. What if the cortex is natively made so that any area of cortex can predict any pattern in any subset of its inputs, given any other missing subset?
That is a little bit more like, quote, unquote, probabilistic AI. I think a lot of the things I'm saying, by the way, are extremely similar to what Yann LeCun would say. Yeah. He's really interested in these energy-based models and things like that. It's like the joint distribution of all the variables.
What is the likelihood or unlikelihood of just any combination of variables? And if I clamp some of them, I say, well, definitely these variables are in these states, then I can compute, with probabilistic sampling, for example: conditioned on these being set in this state, and these could be any arbitrary subset of variables in the model, can I predict what any other subset is gonna do, and sample from any other subset given clamping this subset? And I could choose a totally different subset and sample from that subset. So it's omnidirectional inference.
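Here is a toy sketch of that clamp-and-sample idea: a tiny Boltzmann-machine-style energy model over binary variables, where any subset can be clamped and Gibbs sampling fills in the rest. It is purely illustrative, not a claim about how cortex implements this.

```python
import numpy as np

# Minimal "omnidirectional inference" toy: symmetric couplings over binary units,
# clamp any subset to observed values, Gibbs-sample the remaining units.
rng = np.random.default_rng(0)
n = 8
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2                 # symmetric couplings (an energy-based model)
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.1, size=n)

def gibbs_sample(clamped: dict, steps: int = 500) -> np.ndarray:
    """Sample the free variables conditioned on any clamped subset."""
    x = rng.integers(0, 2, size=n).astype(float)
    for i, v in clamped.items():
        x[i] = v
    for _ in range(steps):
        for i in range(n):
            if i in clamped:
                continue                                   # observed units stay fixed
            p_on = 1.0 / (1.0 + np.exp(-(W[i] @ x + b[i])))
            x[i] = float(rng.random() < p_on)
    return x

# The same model answers "predict units 5..7 given 0..2" and the reverse:
print(gibbs_sample({0: 1, 1: 0, 2: 1}))
print(gibbs_sample({5: 1, 6: 1, 7: 0}))
```

The key property being illustrated is that nothing about the model commits in advance to which variables are inputs and which are outputs.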
And so there are some parts of cortex, association areas of cortex, that may, you know, predict vision from audition. Yeah. There might be areas that predict things that the more innate part of the brain is gonna do. Because remember, this whole thing is basically riding on top of a sort of lizard brain and lizard body, if you will. And that thing is a thing that's worth predicting too.
So you're not just predicting, do I see this or do I see that? But: is this muscle about to tense? Am I about to have a reflex where I laugh? You know, is my heart rate about to go up? Am I about to activate this instinctive behavior?
Based on my higher-level understanding, I can match "somebody has told me there's a spider on my back" to this lizard part that would activate if I was literally seeing a spider in front of me. And now you learn to associate the two, so that it triggers even just from hearing somebody say, there's a spider on your back.
Yeah. Well, let's come back to this. This is partly having to do with Steven Byrnes's theories, which I'm recently obsessed with.
But Yeah.
But on your podcast with Ilya, he said, look, I'm not aware of any good theory of how evolution encodes high-level desires or intentions. I think this is very connected to all of these questions about the loss functions and the cost functions that the brain would use, and it's a really profound question. Right? Like, let's say that I am embarrassed for saying the wrong thing on your podcast because I'm imagining that Yann LeCun is listening, and he says, that's not my theory.
You described energy-based models really badly. That's gonna activate in me innate embarrassment and shame, and I'm gonna wanna go hide and whatever. And so that's gonna activate these innate reflexes. And that's important because I might otherwise get killed by Yann LeCun's, you know, marauding army of other
Differentiatory researchers are coming for you, Adam.
And so it's important that I have that instinctual response. But, of course, evolution has never seen Yann LeCun or known about energy-based models or known what an important scientist or a podcast is. And so somehow, the brain has to encode this desire to not piss off really important people in the tribe, or something like this, in a very robust way, without knowing in advance all the things that the learning subsystem of the brain, the cortex and other parts, is gonna learn. The cortex is gonna learn this world model that's gonna include things like Yann LeCun and podcasts. And evolution has to make sure that those neurons, whatever the "Yann LeCun being upset with me" neurons are, get properly wired up to the shame response, or to this part of the reward function. And this is important.
Right? Because if we're gonna be able to seek status in the tribe or learn from knowledgeable people, as you said, or things like that, exchange knowledge and skills with friends, but not with enemies. I mean, we have to learn all this stuff. So it has to be able to robustly wire these learned features of the world, learned parts of the world model up to these innate reward functions, and then actually use that to then learn more. Right?
Because next time, I'm not gonna try to piss off Yann LeCun if he emails me about it. So we're gonna do further learning based on that. So in constructing the reward function, it has to use learned information. But evolution didn't know about Yann LeCun, so how can it do that? And so the basic idea that Steven Byrnes is proposing is that part of the cortex, or other areas like the amygdala that learn, what they're doing is modeling the steering subsystem. The steering subsystem is the part with these innately programmed responses and the innate programming of this series of reward functions, cost functions, bootstrapping functions that exist.
So there are parts of the amygdala, for example, that are able to monitor what those parts do and predict what those parts do. So how do you find the neurons that are important for social status? Well, you have some innate heuristics of social status, for example, or you have some innate heuristics of friendliness, that the steering subsystem can use. And the steering subsystem actually has its own sensory system, which is kinda crazy. So we think of, you know, vision as being something that the cortex does.
Mhmm. But there's also a steering-subsystem, subcortical visual system called the superior colliculus, with an innate ability to detect faces, for example, or threats. So there's a visual system that has innate heuristics, and the steering subsystem has its own responses. So there'll be part of the amygdala or part of the cortex that is learning to predict those responses. And so what are the neurons that matter in the cortex for social status or for friendship?
They're the ones that predict those innate heuristics for friendship. Right? So you train a predictor in the cortex, and you say, which neurons are part of the predictor? Those are the ones, and now you've actually managed to wire it up. Yeah.
This is fascinating. I feel like I still don't understand. I understand how the cortex could learn how this primitive part of the brain would respond, because obviously it has these labels: here's literally a picture of a spider, and this is bad. Like, be scared of this.
Right.
And then the cortex learns that this is bad because the innate part tells it that. But then it has to generalize to, okay, the spider's on my back. Yes. And somebody's telling me the spider's on your back. That's also bad.
Yes. But it never got supervision on that. Right. So how does it...
Well, it's because the learning subsystem is a powerful learning algorithm that is capable of generalization. So the steering subsystem, these are the innate responses. You're going to have some of these built into your steering subsystem, these lower brain areas, hypothalamus, brainstem, etcetera. And, again, they have their own primitive sensory systems. So there may be an innate response.
If I see something that's kind of moving fast toward my body that I didn't previously see was there and is kind of small and dark and high contrast, that might be an insect kind of skittering onto my body, I am going to, like, flinch. Right? And so there are these innate responses. And so there's gonna be some group of neurons, let's say, in the hypothalamus that is the I am flinching. Yep.
Or, "I just flinched." Right? The "I just flinched" neurons in the hypothalamus. So when you flinch, first of all, there's a negative contribution to the reward function; you didn't want that to happen, perhaps.
But that's a reward function that doesn't have any generalization in it, so I'm gonna avoid that exact situation of the thing skittering toward me. And maybe I'm gonna avoid some actions that lead to the thing skittering. So that's some generalization you can get; Steve calls that being downstream of the reward function. So I'm gonna avoid the situation where the spider was skittering toward me.
But you're also gonna do something else. So there's gonna be, like, a part of your amygdala, say, that is asking: a few hundred milliseconds or seconds earlier, could I have predicted that flinching response? It's going to be a group of neurons that is essentially a classifier of "am I about to flinch?" And I'm gonna have classifiers like that for every important steering-subsystem variable that evolution needs to take care of. Am I about to flinch?
Am I talking to a friend? Should I laugh now? Is the friend high status? Whatever variables the hypothalamus brainstem contain, am I about to taste salt? So that's gonna have all these variables.
And for each one, it's gonna have a predictor. It's gonna train that predictor. Now the predictor that it trains, that can have some generalization. And the reason it can have some generalization is because it just has a totally different input. So its input data might be things like the word spider.
Right? But the word spider can activate in all sorts of situations that lead to the word spider activating in your world model. So, you know, if you have a complex world model with really complex features, that inherently gives you some generalization. It's not just the thing skittering toward me. Even the word spider, or the concept of spider, is gonna cause that to trigger.
And this predictor can learn that. So whatever spider neurons are in my world model, which could even be a book about spiders or a room where there are spiders or whatever that is.
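Here is a hedged toy sketch of that "thought assessor" style predictor, in the spirit of what's described above: a small classifier learns to predict an innate steering-subsystem signal ("about to flinch") from abstract world-model features, and because the features are learned and abstract, the trained predictor generalizes to situations the innate circuit never sees. All features and data are invented for illustration.

```python
import numpy as np

# Toy: logistic regression predicting an innate flinch signal from learned
# world-model features. Hypothetical features: [skittering, spider_concept, friend].
rng = np.random.default_rng(1)

def world_model_features(s: dict) -> np.ndarray:
    return np.array([s["skittering"], s["spider_concept"], s["friend"]], float)

# Training data: the innate circuit only fires for literal skittering motion.
situations = [
    {"skittering": 1, "spider_concept": 1, "friend": 0},
    {"skittering": 1, "spider_concept": 1, "friend": 1},
    {"skittering": 0, "spider_concept": 0, "friend": 1},
    {"skittering": 0, "spider_concept": 0, "friend": 0},
]
flinched = np.array([1, 1, 0, 0], float)     # supervision comes from the steering subsystem

X = np.stack([world_model_features(s) for s in situations])
w = np.zeros(3)
for _ in range(2000):                        # plain gradient descent on logistic loss
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - flinched) / len(flinched)

# Generalization: someone *says* "spider on your back" -- no skittering motion seen,
# but the spider concept is active, so the predictor still raises the alarm.
said_spider = world_model_features({"skittering": 0, "spider_concept": 1, "friend": 1})
print(1 / (1 + np.exp(-said_spider @ w)))
```

The generalization comes entirely from the fact that the learned "spider concept" feature co-occurred with the innate signal during training.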
The amount of heebie-jeebies that this conversation is eliciting in the audience is, like...
So now I'm activating your steering subsystem. Your steering subsystem's spider hypothalamus subgroup of neurons, of skittering insects, is activating based on these very abstract concepts in the conversation.
Keep going. I'm gonna have to put in a trigger warning.
That's because you learned this. And the cortex inherently has the ability to generalize, because it's predicting based on these very abstract variables and all this integrated information that it has, whereas the steering subsystem can only use whatever the superior colliculus and a few other sensors spit out. So...
By the way, it's remarkable that the person who's made this connection between different pieces of neuroscience, Steven Byrnes, a former physicist... Yeah. ...has for the last few years been trying to synthesize this.
He's an AI safety researcher. He's just synthesizing. This comes back to the academic incentives. Right. And I think this is a little bit hard to say: what's the exact next experiment?
How am I gonna publish a paper on this? How am I gonna train my grad student to do this? It's very speculative. But there's a lot in the neuroscience literature, and Steve has been able to pull it together. And I think that Steve has an answer to Ilya's question, essentially, which is: how does the brain ultimately code for these higher-level desires and link them up to the more primitive rewards?
Yeah.
Very naive question. But why can't we achieve this omnidirectional inference by training the model to not just map from a token to the next token, but remove the masks in training, so it maps every token to every token? Or come up with more labels between video and audio and text so that it's forced to map each one to each other one?
I mean, that may be the way. It's not clear to me. Some people think that there's a different way that it does probabilistic inference, or a different learning algorithm that isn't backprop. There might be other ways of learning energy-based models or other things like that that you can imagine, that are involved in being able to do this and that the brain has. But I think there's a version of it where what the brain does is, like, crappy versions of backprop to learn to predict, you know, through a few layers.
And that, yeah, it's kinda like a multimodal foundation model. Right. Yeah. So maybe the cortex is just kind of like certain kinds of foundation models. You know, some LLMs are maybe just predicting the next token, but vision models maybe are trained in learning to fill in the blanks or reconstruct different pieces or combinations.
But I think that it does it in an extremely flexible way. So if you train a model to just fill in this blank at the center, okay, that's great. But if you didn't train it to fill in this other blank over to the left, then it doesn't know how to do that. It's not part of its repertoire of predictions that are amortized into the network. Whereas with a really powerful inference system, you could choose at test time which subset of variables it needs to infer and which ones are clamped.
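As a toy sketch of the contrast in this exchange: instead of a fixed mask (always the next token, always the centre blank), sample a different observed/hidden split every training step so one model learns to predict any subset of variables from any other. A crude linear model on synthetic correlated data, purely illustrative.

```python
import numpy as np

# Random-subset masked prediction, as opposed to a single fixed mask.
rng = np.random.default_rng(0)
d, lr = 16, 0.05
A = rng.normal(size=(d, 4))

def sample_x():
    return A @ rng.normal(size=4)                 # correlated variables worth predicting

W = np.zeros((d, d))
for step in range(5000):
    x = sample_x()
    observed = rng.random(d) < rng.uniform(0.2, 0.8)   # a fresh observed subset each step
    x_in = np.where(observed, x, 0.0)                  # hidden entries zeroed out
    pred = W @ x_in
    err = np.where(~observed, pred - x, 0.0)           # loss only on the hidden variables
    W -= lr * np.outer(err, x_in) / d

# At test time, any split works: clamp the first half, infer the second half...
x = sample_x()
x_in = np.concatenate([x[:8], np.zeros(8)])
print(np.abs((W @ x_in)[8:] - x[8:]).mean())
# ...or clamp the second half and infer the first.
x_in = np.concatenate([np.zeros(8), x[8:]])
print(np.abs((W @ x_in)[:8] - x[:8]).mean())
```

A linear map is obviously a stand-in for a real network; the point is just that the objective, not the architecture, is what makes the prediction direction flexible.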
Okay. Two sub-questions. One, it makes you wonder whether the thing that is lacking in artificial neural networks is less about the reward function and more about the encoder or the embedding. Like, maybe the issue is that you're not representing video and audio and text in the right latent abstraction such that they could intermingle and connect. Maybe this is also related to why LLMs seem bad at drawing connections between different ideas. Like, are the ideas represented at a level of generality at which you could notice different connections?
These questions all commingle. So if we don't know if it's doing backprop-like learning, and we don't know if it's doing energy-based models, and we don't know how these areas are even connected in the first place, it's very hard to really get to the ground truth of this. But, yeah, it's possible. I mean, I think people have done some work. My friend Joel Dapello actually did something some years ago where I think he took a model of V1, of specifically how the early visual cortex represents images, and put that as an input into a ConvNet, and that improved some things.
So it could be it could be, like, differences. The retina is also doing, you know, motion detection, and certain things are kind of getting filtered out. So there there may be some preprocessing of the sensory data. There may be some clever combinations of which modalities are predicting which or so on that that lead to better representation. There may be much more clever things than that.
Some people certainly do think that there are inductive biases built into the architecture that will shape the representations differently, or that there are clever things that you can do. So Astera, which is the same organization that employs Steven Byrnes, just launched this neuroscience project based on Doris Tsao's work, and she has some ideas about how you can build vision systems that basically require less training. They build into the assumptions of the design of the architecture that things like objects are bounded by surfaces, and that surfaces have certain types of shapes and relationships of how they include each other and stuff like that. So it may be possible to build more assumptions into the network. Evolution may have also put in some changes of architecture.
It's just that I think the cost functions and so on may also be a key thing that it does.
So Andy Jones has this amazing 2021 paper where he uses AlphaZero to show that you can trade off test time compute and training compute. And while that might seem obvious now, this was three years before people were talking about inference scaling. So this got me thinking: Is there an experiment you could run today, even if it's a toy experiment, which would help you anticipate the next scaling paradigm? One idea I had was to see if there was anything to multi agent scaling. Basically, if you have a fixed budget of training compute, are you gonna get the smartest agent by dumping all of it into training one single agent or by splitting that compute up amongst a bunch of models, resulting in a diversity of strategies that get to play off each other?
I didn't know how to turn this question into a concrete experiment, though, so I started brainstorming with Gemini 3 Pro in the Gemini app. Gemini helped me think through a bunch of different judgment calls. For example, how do you turn the training loop from self-play to this kind of coevolutionary league training? How do you initialize and then maintain diversity amongst different AlphaZero agents? How do you even split up the compute between these agents in the first place?
I found this clean implementation of AlphaGo Zero, which I then forked and opened up in Antigravity, which is Google's agent-first IDE. The code was originally written in 2017, and it was meant to be trained on a single GPU of that time. But I needed to train multiple whole separate populations of AlphaZero agents, so I needed to speed things up. I rented a beefcake of a GPU node, but I needed to refactor the whole implementation to take advantage of all this scale and parallelism. Gemini suggested two different ways to parallelize self-play: one which would involve higher GPU context switching, and the other would involve higher communication overhead.
I wasn't sure which one to pick, so I just asked Gemini. And not only did it get both of them working in minutes, but it autonomously created and then ran a benchmark to see which one was best. It would've taken me a week to implement either one of these options. Think about how many judgment calls a software engineer working on an actually complex project has to make. If they have to spend weeks architecting some optimization or feature before they can see whether it will work out, they will just get to test out so many fewer ideas.
Anyways, with all the stuff from Gemini, I actually ran the experiment and got some results. Now please keep in mind that I'm running this experiment on an anemic budget of compute, and it's very possible I made some mistakes in implementation. But it looks like there can be gains from splitting up a fixed budget of training compute amongst multiple agents rather than just dumping it all into one. Just to reiterate how surprising this is, the best agent in the population of 16 is getting one sixteenth the amount of training compute as the agent trained on self play alone, and yet it still outperforms the agent that is hogging all of the compute. The whole process of vibe coding this experiment with Gemini was really absorbing and fun.
It gave me the chance to actually understand how AlphaZero works and to understand the design space around decisions about the hyperparameters, how search is done, and how you do this kind of coevolutionary training, rather than getting bogged down in my very novice abilities as an engineer. Go to gemini.google.com to try it out. I wanna talk about this idea that you just glanced off of, which was amortized inference. And maybe I should try to explain what I think it means, because I think it's probably wrong and this will help you correct me.
It's been a few years for me too. So, okay.
Right now the way the models work is you have an input, and it maps it to an output. And this is amortizing the real process, which we think is what intelligence is: you have some prior over how the world could be, like, what are the causes that make the world the way that it is. And then when you see some observation, you should be like, okay, here's all the ways the world could be, and this cause explains what's happening best. Now, doing this calculation over every possible cause is computationally intractable.
So then you would just have to sample: oh, here's a potential cause. Does this explain this observation? No? Forget it, let's keep going. And then eventually you get the cause that explains the observation, and then this becomes your posterior.
That's actually pretty good, I think. Yeah. Bayesian inference, like, in general is this very intractable thing.
Right.
The algorithms that we have for doing that tend to require taking a lot of samples, Monte Carlo methods. Yeah. And taking samples takes time. I mean, this is like the original Boltzmann machines and stuff were using
techniques like this. And still, it's used with probabilistic programming and other types of methods often. And so, yeah, the Bayesian inference problem, which is basically the problem of perception, is: given some model of the world and given some data, how should I update the missing variables in my internal model?
And I guess the idea is that, mechanistically, the neural network is not starting with "here is my model of the world, and I'm gonna try to explain this data." But the hope is that instead of starting with, hey, does this cause explain this observation? No. Does this cause explain this observation? Yes.
What you do is just like observation.
What's the cause that the neural net thinks is the best one.
Observation to cause. So the feed forward, like, goes observation to cause.
Observation to cause.
To then the output that...
Yes. You don't have to evaluate all these energy values or whatever and sample around to make them higher and lower. You just say, approximately, that process would result in this being the top one or something like that. Yeah.
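Here is a toy contrast between the two regimes being discussed: explicit Bayesian inference by sampling versus an amortized map from observation straight to cause. The model (a Gaussian hidden cause plus noise) is chosen only because both answers can be checked against each other.

```python
import numpy as np

# Toy model: hidden cause z ~ N(0,1), observation x = z + noise (std 0.5).
rng = np.random.default_rng(0)

def posterior_by_sampling(x_obs, n=50_000, noise=0.5):
    """'Real' inference: propose causes from the prior, weight by how well they explain x_obs."""
    z = rng.normal(size=n)                              # samples from the prior
    w = np.exp(-0.5 * ((x_obs - z) / noise) ** 2)       # likelihood weights
    return (w * z).sum() / w.sum()                      # posterior-mean estimate

# Amortized inference: fit a direct map x -> E[z | x] once, then answer instantly.
z_train = rng.normal(size=10_000)
x_train = z_train + rng.normal(scale=0.5, size=10_000)
a = (x_train @ z_train) / (x_train @ x_train)           # least-squares "encoder"

x_obs = 1.3
print(posterior_by_sampling(x_obs))   # slow: fresh sampling for every query
print(a * x_obs)                      # fast: one forward pass, cost paid at training time
```

Both lines should print roughly the same number (about 1.04 here); the difference is purely where the computation happens, per query or up front.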
One way to think about it might be that test-time compute, inference-time compute, is actually doing this sampling again, because you literally read its chain of thought. It's actually doing this toy example we're talking about, where it's like, oh, can I solve this problem by doing X? No, I need a different approach. And this raises the question.
I mean, over time, is it the case that the capabilities which required inference-time compute to elicit get distilled into the model? So you're amortizing the thing which previously you needed to do these rollouts, these Monte Carlo rollouts, to figure out. And so in general, maybe there's this principle that digital minds, which can be copied, have different trade-offs that are relevant compared to biological minds, which cannot. And so in general, it should make sense to amortize more things, because you can literally copy the amortization, right? Copy the things that you have sort of built in.
Yeah.
And maybe this is a tangential question, but it might be interesting to speculate about the future: as these things become more intelligent and the way we train them becomes more economically rational, what will make sense to amortize into these minds which evolution did not think it was worth amortizing into biological minds, where you have to retrain everything?
Right. I mean, first of all, I think the probabilistic AI people would be like, of course you need test-time compute, because this inference problem is really hard, and the only ways we know how to do it involve lots of test-time compute. Otherwise, it's just a crappy approximation, and you'd have to use infinite data or something to make it work. So I think that some of the probabilistic people will be like, no.
It's inherently probabilistic, and amortizing it in this way just doesn't make sense. And they might then also point to the brain and say, okay, well, in the brain, the neurons are kind of stochastic, and they're sampling, and they're doing things. And so maybe the brain actually is doing more like the non-amortized inference, the real inference. But it's also kinda strange how perception can work in just milliseconds or whatever.
It doesn't seem like it uses that much sampling, so it's clearly also doing some kind of baking things into approximate forward passes or something like that. And yeah. So in the future, you know, I don't know. I mean, I think it's already a trend to some degree that things people were having to use test-time compute for are getting used to train the base model back. Right?
Yeah. Yeah. That now it can do it in one pass. Right. Yeah.
So, I mean, yeah, maybe evolution did or didn't do that. I think evolution still has to pass everything through the genome, right, to build the network. And the environment in which humans are living is very dynamic. Right? And so maybe, if we believe this is true, that there's a learning subsystem, per Steven Byrnes, and a steering subsystem, then the learning subsystem doesn't have a lot of preinitialization or pretraining.
It has a certain architecture, but then within a lifetime, it learns. So evolution didn't actually amortize that much into that network.
Right.
It amortized it instead into a set of innate behaviors and a set of these bootstrapping cost functions. Yeah. Or ways of building up very particular reward signals.
Yeah. Yeah. This framework helps explain this mystery that people have pointed out and I've asked a few guests about, which is: if you want to analogize evolution to pretraining, how do you explain the fact that so little information is conveyed through the genome? Three gigabytes or so is the size of the total human genome, and obviously a small fraction of that is actually relevant to coding for the brain.
Yeah.
Previously, people made this analogy that evolution has found the hyperparameters of the model: the numbers which tell you how many layers there should be, the architecture basically, right? Like how things should be wired together. But if a big part of the story that increases sample efficiency, aids learning, and generally makes systems more performant is the reward function, the loss function... Yeah. And if evolution found those loss functions which aid learning, then it actually kind of makes sense how you can build an intelligence with so little information. Because the reward function, hey...
You, like, write it in Python. Right? The reward function is, like, literally a line. Yes. And so you just have a thousand lines like this, and that doesn't take up that much space.
Yes. And it also gets to do this generalization thing I was describing when we were talking about the spider, right, where it learns that just the word spider triggers the spider reflex or whatever. It gets to exploit that too. Right? So it gets to build a reward function that actually has a bunch of generalization in it, just by specifying this innate spider stuff and the thought assessors, as Steve calls them, that do the learning.
So that's potentially a really compact solution to building up these more complex reward functions that you need. It doesn't have to anticipate everything about the future of the reward function, just anticipate what variables are relevant and what the heuristics are for finding those variables. And then, yeah, it has to have a very compact specification for the learning algorithm and basic architecture of the learning subsystem. And then it has to specify all this Python code of, like, all the stuff about the spiders, and all the stuff about friends, and all the stuff about your mother, and all the stuff about mating, and social groups, and joint eye contact.
It has to specify all that stuff. And so, is this really true? I think that there is some evidence for it. So Fei Chen and Evan Macosko and various other researchers have been doing these single-cell atlases. One of the things that scaling up neuroscience technology has done, again, this is kind of one of my obsessions, through the BRAIN Initiative, a big neuroscience funding program, is they've basically gone through different areas, especially of the mouse brain, and mapped where the different cell types are.
How many different types of cells are there in different areas of cortex? Are they the same across different areas? And then you look at these subcortical regions, which are more like the steering subsystem or reward-function-generating regions. How many different types of cells do they have, and which neuron types do they have? We don't know how they're all connected and exactly what they do or what the circuits are or what they mean, but you can just quantify how many different kinds of cells there are by sequencing the RNA.
And there are a lot more weird and diverse and bespoke cell types in the steering subsystem, basically, than there are in the learning subsystem. Like, the cortical cell types, it seems like there's enough to build a learning algorithm up there and specify some hyperparameters. And in the steering subsystem, there's, like, a gazillion, you know, thousands of really weird cells, which might be, like, the one for the spider flinch reflex and the one for "I'm about to taste salt" and the one...
So why would each reward function need a different cell type?
Well, so this is where you get innately wired circuits. Right? In the learning subsystem, you set up the initial architecture and you specify a learning algorithm; all the juice happens through plasticity of the synapses, changes of the synapses within that big network, but it's kind of a relatively repeating architecture in how it's initialized. The amount of Python code needed to make an eight-layer transformer is not that different from the amount needed to make a three-layer transformer. Right?
You're just replicating. Yeah. Whereas all this Python code for the reward function, you know, "if the superior colliculus sees something that's skittering and it lands, and you're feeling goosebumps on your skin or whatever, then trigger the spider reflex," that's just a bunch of bespoke, species-specific, situation-specific stuff. The cortex doesn't know about spiders; it just knows about layers and... Right.
And learning. The only way to write this reward function... Yeah. ...is to have a special cell type.
Yeah. Okay. Well, I think so. I think you either have to have special cell types, or you have to otherwise get special wiring rules, so that evolution can say this neuron needs to wire to this neuron without any learning.
And the way that is most likely to happen, I think, is that those cells express different receptors and proteins that say, okay, when this one comes in contact with this one, let's form a synapse. So it's genetic wiring. Yeah. And that needs cell types to do it.
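A cartoon of that "genetically specified wiring" idea, just to show how compact such a rulebook can be: cell types carry marker labels, and a small table says which marker pairs form synapses, with no learning involved. All cell-type and marker names are invented.

```python
# Hypothetical marker-matching wiring rules; purely illustrative.
WIRING_RULES = {
    ("SkitterDetector-marker", "FlinchCommand-marker"),   # e.g. colliculus -> flinch circuit
    ("SaltTaste-marker", "AppetiteSignal-marker"),
}

CELLS = {
    "colliculus_skitter_cell":   "SkitterDetector-marker",
    "hypothalamus_flinch_cell":  "FlinchCommand-marker",
    "tongue_salt_cell":          "SaltTaste-marker",
    "hypothalamus_appetite_cell": "AppetiteSignal-marker",
}

def innate_synapses():
    """Form a synapse wherever two cells' markers match a rule; no plasticity needed."""
    return [
        (pre, post)
        for pre, pre_marker in CELLS.items()
        for post, post_marker in CELLS.items()
        if (pre_marker, post_marker) in WIRING_RULES
    ]

print(innate_synapses())
```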
Yeah.
I'm sure this would make a lot more sense if I knew Neuroscience 101, but it seems like there's still a lot of complexity, or generality rather, in the steering subsystem. So the steering subsystem has its own visual system that's separate from the visual cortex.
Yeah.
Different features still need to plug into that vision system. So, like, the spider thing needs to plug into it, and also the love thing needs to plug into it, etcetera, etcetera.
Yes.
So it seems complicated.
I know, it's still complicated. That's all the more reason why a lot of the genomic real estate, in terms of these different cell types and so on, would go into wiring up the steering subsystem, prewiring it.
And can we tell how much of the genome is clearly doing that? I guess you could tell how many genes are relevant to producing the RNA, or the epigenetics, that manifest in different cell types in the brain, right?
Yeah, this is what the cell types help you get at. I don't think it's exactly like, oh, this percent of the genome is doing this. But you could say, okay, in all these steering-subsystem subtypes, how many different genes are involved in specifying which is which and how they wire, and how much genomic real estate do those genes take up, versus the ones that specify, say, visual cortex versus auditory cortex, where you're kinda just reusing the same genes to do the same thing twice. Whereas the spider reflex hooking up... yes.
You're right. They have to build their vision system, they have to build some auditory systems and touch systems and navigation-type systems. So, you know, even feeding into the hippocampus and stuff like that, there are head-direction cells. Even the fly brain has innate circuits... Yeah. ...that figure out its orientation and help it navigate in the world, and it uses vision to figure out the optical flow of how it's flying, and how its flight is related to the wind direction.
It has all this innate stuff that, in the mammal brain, we would lump into the steering subsystem. So there's a lot of work there. All the genes, basically, that go into specifying all the things a fly has to do, we're gonna have stuff like that too, just all in the steering subsystem.
But do we have some estimate of, like, here's how many nucleotides, how many megabases it takes to...
I don't know. I mean, you might be able to talk to biologists about this to some degree, because you can say, well, we just have a ton in common. I mean, we have a lot in common with yeast from a genes perspective. Yeast is still used as a model... Yeah. ...for some amount of drug development and stuff like that in biology.
And so much of the genome is just going toward having a cell at all: it can recycle waste, it can get energy, it can replicate. And then you see what we have in common with a mouse.
And so we do know at some level what the difference is between us and a chimpanzee or something, and that includes the social instincts and the more advanced differences in cortex and so on. It's a tiny number of genes that go into this additional step of making the eight-layer transformer instead of the six-layer transformer, or tweaking that reward function.
Yeah. This would help explain why the hominid brain exploded in size so fast. Tell me if this is correct, but under this story, social learning or some other thing increased the ability to learn from the environment, increased our sample efficiency, right? Instead of having to go and kill the boar yourself and figure out how to do that, you can just be like, the elder told me this is how you make a spear. And that then increases the incentive to have a bigger cortex, which can learn these things.
And that can be done with relatively few genes, because it's really replicating what the mouse already has and making more of it. It's maybe not exactly the same, and there may be tweaks, but from that perspective, you don't have to reinvent... Right. ...all this stuff. Right?
So how far back in the history of the evolution of the brain does the cortex go? Is the idea that the cortex figured out this omnidirectional inference thing long ago, that it's been a solved problem for a long time, and the big unlock with primates is that we got the reward functions which increased the returns to having omnidirectional inference? Or is the omnidirectional inference in the cortex also something that took a while to unlock?
I'm not sure that there's agreement about that. I think there might be specific questions about language, you know, whether there are tweaks, whether that's through some combination of auditory and memory regions. There may also be macro wiring, where you need to wire auditory regions into memory regions or something like that, and into some of these social instincts, to get
I see.
Language, for example, to happen. So there might be but that might be also a small number of gene changes
Yep.
To be able to say, oh, I just need something from my temporal lobe over here going over to the auditory cortex. Right? And there is some evidence for that: Broca's area, Wernicke's area, they're connected with the hippocampus and so on, and prefrontal cortex.
So there's maybe some small number of genes for enabling humans to really properly do language. That could be a big one. But yeah. I mean, is it that something changed about the cortex, and it became possible to do these things? Or was that potential already there, but there wasn't the incentive to expand that capability and then use it, wire it to these social instincts, and use it more?
Mhmm. I mean, I would lean somewhat toward the latter. I think a mouse has a lot of similarity in terms of cortex to a human. Right.
Although there's the Suzana Herculano-Houzel work showing that the number of neurons scales better with brain weight in primate brains than it does in rodent brains. Right? So, yeah, does that suggest that there actually was some improvement in the scalability of the cortex?
Maybe. Maybe. I'm not super deep on this. There may have been, yeah, changes in architecture, changes in the folding, changes in neuron properties and stuff that somehow slightly tweak this, but there's still a scaling.
That's right. That's right.
Either way. Right? And so I was not saying there isn't anything special about humans in the architecture of the learning subsystem at all. But, yeah, I think it's pretty widely thought that this has expanded, but then the question is, okay, how does that fit in with the steering-subsystem changes and the instincts that make use of this and allow you to bootstrap using this effectively?
But, yeah, just to say a few other things. Even the fly brain has some amount of this, for example, even very far back. I mean, I think you've read this great book, A Brief History of Intelligence. Right? I think this is a really good book. Lots of AI researchers think this is a really good book, it seems like.
Yeah. You have some amount of learning going back all the way to anything that has a brain, basically. You have something kind of like primitive reinforcement learning, at least, going back at least to, like, vertebrates. Like, imagine, like, a zebrafish. There's kind of these other branches.
Birds maybe kind of reinvented something cortex-like, but it doesn't have the six layers. Mhmm. But they have something a little bit cortex-like. So after reptiles, in some sense, birds and mammals both kind of made a somewhat cortex-like but differently organized thing. But even a fly brain has associative learning centers that actually do things that maybe look a little bit like this thought-assessor concept from Byrnes, where there's a specific dopamine signal to train specific subgroups of neurons in the fly mushroom body to associate different sensory information with: am I gonna get food now, or am I gonna get hurt now?
Yeah. Brief tangent. I remember reading in one blog post that Beren Millidge wrote that the parts of the cortex associated with audio and vision have scaled disproportionately between other primates and humans, whereas the parts associated, say, with odor have not. And I remember him saying something like, this is explained by that kind of data having worse scaling-law properties. But maybe he meant this too: another interpretation of what's actually happening there is that these social reward functions that are built into the steering subsystem needed to make more use of being able to see your elders and see what the visual cues are and hear what they're saying.
In order to make sense of these cues, which guide learning, you needed to activate... Yeah. ...the vision and audio more than...
I mean, there's all this stuff. I feel like it's come up in your shows before, actually, but, like, the design of the human eye, where you have the pupil and the white and everything: we are designed to be able to establish relationships based on joint eye contact. Maybe this came up in one of your episodes, I can't remember. But yeah. We have to bootstrap to the point where we can detect eye contact and where we can communicate by language.
Right? And that's what the first couple years of life are are trying to do. Yeah.
Okay. I wanna ask you about RL. So currently, the way these LLMs are trained, you know, if they solve the unit test or solve a math problem, that whole trajectory, every token in that trajectory, is upweighted. What's going on with humans? Are there different types of model-based versus model-free learning happening in different parts of the brain?
Yeah. I mean, this is another one of these things. Again, with all my answers to these questions, any specific thing I say is just kind of directional; we can kind of explore around this. I find this interesting. I feel like the literature points in these directions in some very broad way.
What I actually wanna do is go and map the entire mouse brain and figure this out comprehensively, and make neuroscience the ground-truth science. So I don't know, basically. But, yeah, first of all, I think with Ilya on the podcast, he was like, it's weird that you don't use value functions. Right. Right?
You use, like, the dumbest form of RL. Of course, these people are incredibly smart, and they're optimizing for how to do it on GPUs, and it's really incredible what they're achieving. But conceptually, it's a really dumb form of RL even compared to what was being done ten years ago. Right? Like, even the Atari game-playing stuff, right, was using Q-learning, which is basically a kind of temporal difference... Yep. ...learning.
Right? And temporal difference learning basically means you have some kind of value function: what action I choose now doesn't just tell me literally what happens immediately after this, it tells me what the long-run consequence of that is for my expected total reward or something like that. And so you would have value functions. The fact that we don't have value functions at all in the LLMs is, like, crazy. I mean, I think because Ilya said it, I can say it.
I know one one-hundredth of what he does about AI, but it's kinda crazy that this is working. Yeah. But, yeah, in terms of the brain: I think there are some parts of the brain that are thought to do something that's very much like model-free RL. That's parts of the basal ganglia, the striatum and basal ganglia. It is thought that they have a certain finite, relatively small action space.
And the types of actions they could take, first of all, might be, like, tell the spinal cord, or tell the brainstem and spinal cord, to do this motor action: yes, no. Or it might be more complicated cognitive-type actions, like tell the thalamus to allow this part of the cortex to talk to this other part, or release the memory that's in the hippocampus and start a new one, or something. Right? But there's some finite set of actions that kinda come out of the basal ganglia, and it's just a very simple RL.
So there are probably parts of our brain that are just doing very simple, naive RL algorithms. One layer on top of that: some of the major work in neuroscience, like Peter Dayan's work and a bunch of work that is part of why I think DeepMind did the temporal difference learning stuff in the first place, they were very interested in neuroscience, is a lot of evidence that dopamine is giving this reward prediction error signal, rather than just reward, yes or no, a gazillion time steps in the future. It's a prediction error, and that's consistent with learning these value functions. So there's that.
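For readers who want the reward-prediction-error idea spelled out, here is a minimal TD(0) sketch on a toy chain of states. The learning signal `delta`, the analogue of the dopamine signal just discussed, is not the raw reward but reward plus discounted predicted future value minus the current prediction. Illustrative only.

```python
import numpy as np

# TD(0) value learning on a 5-state chain with a reward only at the end.
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)                 # value function: long-run expected reward per state

for episode in range(2000):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0       # reward only at the final state
        delta = r + gamma * V[s_next] - V[s]             # reward prediction error
        V[s] += alpha * delta                            # update toward the bootstrapped target
        s = s_next

print(V)   # earlier states end up with discounted estimates of the far-off reward
```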
And then there's maybe higher-order stuff. So we have the cortex making this world model. Well, one of the things the cortex world model can contain is a model of when you do and don't get rewards. Right? Again, it's predicting what the steering subsystem will do.
It could be predicting what the basal ganglia will do. And so you have a model in your cortex that has more generalization and more concepts and all this stuff that says, okay, these types of plans, these types of actions will lead in these types of circumstances to reward. So I have a model of my reward. Some people also think that you can go the other way. And so this is part of the inference picture.
There's this idea of RL as inference. You could say, well, conditional on my having a high reward, sample a plan that I would have had to get there. That's inference of the plan part from the reward part. I'm clamping the reward as high and inferring... Yeah. ...the plan, sampling from plans that could lead to that.
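A toy sketch of that "clamp the reward high, infer the plan" idea: sample candidate plans from a prior, then resample them with weights proportional to exp(reward), which is one simple way of drawing plans conditioned on the outcome being good. The goal and "plan" here are made up.

```python
import numpy as np

# Planning by reward-weighted resampling of plans drawn from a prior.
rng = np.random.default_rng(0)
GOAL = 3.0

def reward(plan: np.ndarray) -> float:
    # good plans are action sequences whose effects add up to the goal
    return -abs(plan.sum() - GOAL)

prior_plans = rng.normal(size=(5000, 4))                    # plans sampled from a simple prior
weights = np.exp(np.array([reward(p) for p in prior_plans]))
weights /= weights.sum()

# "Conditioned on the reward being high", draw plans from the reweighted distribution:
idx = rng.choice(len(prior_plans), size=5, p=weights)
for plan in prior_plans[idx]:
    print(plan.round(2), "-> sums to", round(plan.sum(), 2))
```

The printed plans should cluster around action sequences that actually reach the goal, even though nothing was ever "optimized" in the usual sense.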
And so if you have this very general cortical thing, this general model-based system, and the model, among other things, includes plans and rewards, then you just get it for free, basically.
So, like, in neural network parlance, there's a value head associated with the omnidirectional inference that's happening in...
Yes. The Yeah. Or there's a value input. Yeah.
Oh, okay.
Yeah. And one of the almost-sensory variables it can predict is what reward it's gonna get.
Yeah. But by the way, speaking of this thing about amortizing things, yeah, obviously, value is like amortized rollouts
of looking up reward. Yeah. Something like that. Yeah. Yeah.
It's like a statistical average or prediction of it. Yeah.
Right. Tangential thought. You know, Joseph Henrich and others have this idea about the way human societies have learned to do things. Like, how do you figure out that this kind of bean, which almost always poisons you, is edible if you do this ten-step, incredibly complicated process, any one of which, if you fail at it, leaves the bean poisonous?
Uh-huh.
How do you figure out how to hunt the seal in this particular way, with this particular weapon, at this particular time of the year, etcetera? There's no way but just trying shit over generations. And it strikes me that this is actually very much like model-free RL happening at a civilizational level. Or not exactly.
I mean, Evolution is
the simplest algorithm in some sense. Right? And if we believe that all of this can come from evolution, like, the outer loop can be, like, extremely not foresighted and yeah.
Right. Yeah. That that that's interesting. Just like, hierarchies of evolution model for a culture evolution model for a
So what does that tell you? Maybe that simple algorithms can just get you anything if you do it enough or something.
Right. Right.
Yeah. Yeah. Don't know. So
But yeah. So you have, like, maybe this: evolution model-free, basal ganglia model-free, cortex model-based... Mhmm. ...culture model-free, potentially. I mean, there's, like, pay attention to your elders or whatever.
So maybe this group selection or whatever of these things is more model-free. Yeah. But now I think culture, well, it stores some of the model. Yeah. Right.
So let's say you want to train an agent to help you with something like processing loan applications. Training an agent to do this requires more than just giving the model access to the right tools, things like browsers and PDF readers and risk models. There's this level of tacit knowledge that you can only get by actually working in an industry. For example, certain loan applications will pass every single automated check despite being super risky. Every single individual part of the application might look safe, but experienced underwriters know to compare across documents to find subtle patterns that signal risk.
Labelbox has experts like this in whatever domain you're focused on, and they will set up highly realistic training environments that include whatever subtle nuances and watchouts you need to look out for. Beyond just building the environment itself, Labelbox provides all the scaffolding you need to capture training data for your agent. They give you the tools to grade agent performance, to capture the video of each session, and to reset the entire environment to a clean state between every episode. So whatever domain you're working in, Labelbox can help you train reliable, real-world agents. Learn more at labelbox.com/dwarkesh.
Stepping back: is it a disadvantage or an advantage for humans that we get to use biological hardware, in comparison to computers as they exist now? What I mean by this question is, if there's the algorithm, would the algorithm just qualitatively perform much worse or much better if inscribed in today's hardware? Here's what I mean. Obviously, the brain has had to make a bunch of trade-offs which are not relevant to computing hardware. It has to be much more energetically efficient.
Maybe as a result, it has to run at slower speeds so that it can get by with smaller voltage gaps. And so the brain runs at something like 200 hertz and has to run on 20 watts. On the other hand, with robotics, we've clearly experienced that fingers are way more nimble than the motors we can make so far. And so maybe there's something in the brain that is the equivalent of cognitive dexterity, maybe due to the fact that we can do unstructured sparsity, we can co-locate the memory and the compute. Yes.
Where does this all land? Are you like, fuck, we would be so much smarter if we didn't have to deal with these brains? Or are you like...
I mean, I think in the end we will get the best of both worlds... Right. ...somehow. Right? I think an obvious downside of the brain is it cannot be copied.
Yeah. You don't have external read-write access to every neuron and synapse. Whereas with a model, I can just edit something in the weight matrix... Right. ...in Python or whatever, and load that up and copy that, in principle. So the fact that the brain can't be copied and random-accessed is very annoying.
But otherwise, maybe it has a lot of advantages. It also tells you that you wanna somehow do the co-design of the algorithm and the hardware. Maybe that doesn't even change it that much from all of what we discussed, but you wanna somehow do this co-design. So, yeah, how do you do it with really slow, low-voltage switches? That's gonna be really important for the energy consumption. And co-locating memory and compute.
So, like, I think that hardware companies will probably just try to co-locate memory and compute. They will try to use lower voltages, allow some stochastic stuff. There are some people that think that all this probabilistic stuff we were talking about, oh, it's actually energy-based models and so on, is doing lots of sampling. It's not just amortizing everything. And that neurons are also very natural for that because they're naturally stochastic.
And so you don't have to do a random number generator and a bunch of Python code basically to generate a sample. The neuron just generates samples, and it can tune what the different probabilities are. Yeah. And so, like, learn those tunings. And so it could be that it's very codesigned with, like, some kind of inference method or something.
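A minimal sketch of the point about stochastic units, assuming a toy sigmoid-Bernoulli neuron (purely illustrative, not a claim about real neural circuits): the unit's firing probability is set by its input drive, so each spike is directly a sample, with no separate sampling pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_neuron(drive, n_samples=1000):
    """Toy stochastic unit: firing probability is a tunable function of the input drive."""
    p = 1.0 / (1.0 + np.exp(-drive))
    # Each "spike" is directly a sample from Bernoulli(p); no external sampler is needed.
    return (rng.random(n_samples) < p).astype(float)

print(stochastic_neuron(0.5).mean())  # empirical firing rate approximates sigmoid(0.5) ~ 0.62
```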
Yeah. It'd be hilarious. I mean, the message I'm taking away from this interview is, like, you know, all these people that folks make fun of on Twitter, you know, Yann LeCun and Beff Jezos and whatever. They're like
Who knows?
Nope. Like, what yeah. Maybe I don't know of it.
That is actually one read of it. You know, I haven't really worked on AI at all since LLMs, you know, took off. So I'm just, like, out of the loop. But I'm surprised, and I think it's amazing how the scaling is working and everything. But, yeah, I think Yann LeCun and Beff Jezos are kinda onto something about the probabilistic models, or at least possibly.
And in fact, that's what, you know, all the neuroscientists and all the AI people thought, like, until 2021 or something. Right?
So there's a bunch of cellular stuff happening in the brain that is not just about neuron-to-neuron synaptic connections. How much of that is functionally doing more work than the synapses themselves are doing, versus it's just a bunch of machinery that you have to have in order to make the synaptic thing work? With a digital mind, you can nudge the synapse, sorry, the parameter, extremely easily, but for a cell to modulate a synapse according to the gradient signal, it takes all of this crazy machinery. So is it actually doing more than something that takes extremely little code to do?
So I don't know, but I'm not a believer in the, like, radical, like, oh, actually, memory is not mostly synapses, or, like, learning is mostly genetic changes or something like that. I think it would just make a lot of sense, and I think you put it really well, for it to be more like the second thing you said. Like, let's say you wanna do weight normalization across all of the weights coming out of your neuron, right, or into your neuron. Well, you probably have to go, like, somehow tell the nucleus of the cell about this and then have that kind of send everything back out to the synapses or something.
Right? And so there's gonna be a lot of cellular changes. Right? Or let's say that, you know, you just had a lot of plasticity, and, like, you're part of this memory. And now that's got consolidated into the cortex or whatever, and now we wanna reuse you as, like, a new one that can learn again.
It's gonna be a ton of cellular changes. So there's gonna be tons of stuff happening in the cell, but algorithmically, it's not really adding something beyond these algorithms. Right? It's just implementing something that in a digital computer is very easy for us to go and just find the weights and change them. And it is a cell.
It just literally has to do all this with molecular machines itself Yeah. Without any central controller. Right? It's kind of incredible. There are some things that cells do, I think, that seem more convincing.
So in the cerebellum so one of the things the cerebellum has to do is, like, predict over time. Like, predict what is the time delay. You know, let let's say that, you know, I see a flash and then, you know, some number of milliseconds later, I'm gonna get, like, a puff of air in my eyelid or something. Right? The cerebellum can be very good at predicting what's the timing between the flash and the air puff so that now your eye will just, like, close automatically.
Like, the cerebellum is, like, involved in that type of reflex, like, learned reflex. And there are some cells in the cerebellum where it seems like the cell body is playing a role in storing that time constant, changing that time constant of delay, versus that all being somehow done with, like, I'm gonna make a longer ring of synapses to make that delay longer. It's like, no, the cell body will just, like, store that time delay for you. So there are some examples, but I'm not, out of the box, giving up on essentially this theory that what's happening is changes in connections between neurons.
Yeah. And that that's, like, the main algorithmic thing that's going on. Like, I think there's very good reason to still believe that it's that rather than some, like, crazy cellular stuff.
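To make the weight-normalization example concrete, here is a minimal sketch of the digital version, with an assumed toy weight matrix; the point is that random read-write access makes this a one-liner, whereas a cell has to implement the same operation with molecular machinery and no central controller.

```python
import numpy as np

# Hypothetical weight matrix: rows are postsynaptic neurons, columns are presynaptic inputs.
W = np.random.randn(128, 256)

# Per-neuron weight normalization: rescale each neuron's incoming weights to unit L2 norm.
# Digitally this is one line; biologically it would involve signaling the nucleus and
# redistributing resources across all of that neuron's synapses.
W /= np.linalg.norm(W, axis=1, keepdims=True)
```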
Yeah. Going back to this whole perspective of, like, our intelligence is not just this omnidirectional inference thing that builds a world model, but really this system that teaches us what to pay attention to, what the important salient factors to learn from are, etcetera. I want to see if there's some intuition we can derive from this about what different kinds of intelligences might be like. So it seems like AGI or superhuman intelligence should still have this, like, ability to learn a world model that's quite general. Yeah.
But then it might be incentivized to pay attention to different things that are relevant for, you know, the modern post-singularity environment. How different should we expect different intelligences to be, basically?
Yeah. I mean, I think one way of putting this question is like, is it actually possible to, like, make the paper clip maximizer or whatever? Right? If you try to make the paper clip maximizer, does that end up, like, just not being smart or something like that? Because the only reward function it had was, like, make paper clips.
Interesting. Yeah. Yeah.
If I channel Steve Byrnes more, I mean, I think he's very concerned that the sort of minimum viable set of things in the steering subsystem that you need to get something smart is way less than the minimum viable set of things you need for it to have human-like social instincts and ethics and stuff like that. So a lot of what you wanna know about the steering subsystem is actually the specifics of how you do alignment, essentially, or what human behavior and social instincts are, versus just what you need for capabilities. And we talked about it in a slightly different way because we were sort of saying, well, in order for humans to learn socially, they need to make eye contact and learn from others. But we already know from LLMs, right, that depending on your starting point, you can learn language without that stuff. Right?
And so yeah. And so I think that it probably is possible to make super powerful model-based RL, optimizing systems and stuff like that, that don't have most of what we have in the human brain's reward functions. And as a consequence, it might wanna maximize paper clips, and that's a concern. Yeah.
Right. But but you're pointing out that in order to make a competent paper clip maximizer Yeah. The kind of thing that can build the spaceships and learn the physics and whatever, it needs to have some drives which elicit learning, including, say, curiosity and exploration.
Yeah. Curiosity and interest in others, so interest in social interactions, curiosity. Yeah. But that's pretty minimal, I think, and that's true for humans.
Right. But it might be less true for, like, something that's already pretrained as an LLM or something. Right? And so so most of why we wanna know the steering subsystem, I think, if I'm channeling Steve, is alignment reasons. Yeah.
Right. How confident are we that we even have the right algorithmic conceptual vocabulary to think about what the brain is doing? And what I mean by this is, you know, there was one big contribution to AI from neuroscience, which was this idea of the neuron, like McCulloch and Pitts, you know, back in the 1940s, this original contribution. But then it seems like a lot of what we've learned afterwards about what high-level algorithm the brain is implementing, from backprop, or whether there's something analogous to backprop happening in the brain, to V1 doing something like CNNs, to TD learning and Bellman equations, actor-critic, whatever
Yeah.
It seems to go the other way: we come up with some idea, like, maybe we can make AI neural networks work this way. Yeah. And then we notice that something in the brain also works that way. Yes. So why not think there's more things like this where
Oh, maybe. Yeah. I think the reason I think we might be onto something is that, like, the AIs we're making based on these ideas are working surprisingly well. There's also a bunch of, like, just empirical stuff, like, convolutional neural nets and variants of convolutional neural nets. I'm not sure what the absolute latest is, but compared to other models in computational neuroscience of what the visual system is doing, they are just more predictive.
Right? So you can just score CNNs, even pretrained on cat pictures and stuff: what is the representational similarity that they have on some arbitrary other image compared to the brain activations, measured in different ways. Jim DiCarlo's lab has Brain-Score. And there does seem to be some relevance there, in the sense that neuroscientists don't necessarily have something better than that. So, yes, I mean, that's just kind of recapitulating what you're saying, that the best computational neuroscience theories we have seem to have been invented
largely as a result of AI models and finding things that work. And so, find that backprop works and then say, can we approximate backprop with cortical circuits or something? And there's kind of been things like that. Now some people totally disagree with this. Right?
So, like, György Buzsáki is a neuroscientist who has a book called The Brain from Inside Out, where he basically says, like, all our psychology concepts, like AI concepts, all this stuff is just, like, made-up stuff. What we actually have to do is, like, figure out what is the actual set of primitives that, like, the brain actually uses, and our vocabulary is not gonna be adequate to that. We've got to start with the brain and make new vocabulary rather than saying backprop and then trying to apply that to the brain or something like that. And, you know, he studies a lot of, like, oscillations and stuff in the brain as opposed to individual neurons and what they do. And, you know, I don't know.
I think that there's a case to be made for that. And from a kind of research program design perspective, I think one thing we should be trying to do is just, like, simulate a tiny worm or a tiny zebrafish, almost, like, as biophysically or as bottom-up as possible, like, get the connectome, the molecules, the activity, and, like, just study it as a physical dynamical system and look at what it does. But I don't know. I mean, it just feels like the AI is really good fodder for computational neuroscience. Those might actually be pretty good models.
We should look at that. So I both think that there should be a part of the research portfolio that is, like, totally bottom up and not trying to apply the vocabulary that we learn from AI onto these systems, and that there should be another big part that's kind of trying to reverse engineer it using that vocabulary or variants of that vocabulary, and that we should just be pursuing both. And my guess is that the reverse engineering one is actually gonna, like, kind of work-ish or something. Like, we do see things like TD learning, which, you know, Sutton also invented
Right.
Separately. Right?
That must be a crazy feeling, to just, like Yeah. See. You know? It's crazy. This, like, equation I wrote down is, like, in the Yeah.
Seems like the dopamine is like doing some of that.
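For readers unfamiliar with the comparison mentioned above, a hedged, minimal sketch of a representational-similarity style score between model features and brain responses; the data here are random stand-ins, and this is not Brain-Score's actual pipeline.

```python
import numpy as np

# Assumed shapes: model activations and recorded brain responses for the same N images.
rng = np.random.default_rng(0)
model_feats = rng.standard_normal((100, 512))   # N images x model units
brain_resp = rng.standard_normal((100, 80))     # N images x recorded neurons

def rdm(X):
    # Representational dissimilarity matrix: 1 - Pearson correlation between image pairs.
    return 1.0 - np.corrcoef(X)

# Compare the two RDMs (upper triangles) with a simple correlation score.
iu = np.triu_indices(100, k=1)
score = np.corrcoef(rdm(model_feats)[iu], rdm(brain_resp)[iu])[0, 1]
print(f"representational similarity score: {score:.3f}")
```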
Yeah. So let me ask you about this. You know, you guys are funding different groups that are trying to figure out what's up in the brain. If we had a perfect representation of the brain, however you want to define it, why think it would actually let us figure out the answer to these questions?
We have neural networks, which should be way more interpretable, not because we understand what's in the weight matrices, but because we have the weight matrices, these boxes with numbers in them. Right. And even then we can only tell very basic things. We can kind of see circuits for very basic pattern matching of following one token with another. Right.
I feel like we don't really have an explanation of why LLMs are intelligent just because they're
Yeah. Well, I would somewhat dispute that. I think we have some description of what the LLM is, like, fundamentally doing. And that description is that I have an architecture, and I have a learning rule, and I have hyperparameters, and I have initialization, and I have training data.
But those are things we learned Yeah. Because we built them, not because we interpreted them from seeing the weights.
We built them.
Which is, the thing to connect to them is, like, seeing the weights.
What I think we should do is we should describe the brain more in that language of things like architectures, learning rules, initializations, rather than trying to find the Golden Gate Bridge circuit and saying exactly how does this neuron... you know, that's gonna be some incredibly complicated learned pattern. Yeah. Konrad Kording and Tim Lillicrap have this paper from a while ago, maybe five years ago, called What Does It Mean to Understand a Neural Network? Or What Would It Mean to Understand a Neural Network? And what they say is, yeah, basically that.
You can imagine you train a neural network to compute the digits of pi or something. Well, it's like this crazy pattern. And then you also train that thing to predict the most complicated thing you can find: predict stock prices, basically predict really complex systems. Right? You know, computationally complete systems. I could train a neural network to do cellular automata or whatever crazy thing.
And it's like, we're never gonna be able to fully capture that with interpretability, I think. It's just gonna just be doing really complicated computations internally. But we can still say that the way it got that way is that it had an architecture, and we gave it this training data, and it had this loss function. And so I wanna describe the brain in the same way. And I think that this framework that I've been kind of laying out is like, we need to understand the cortex and how it embodies a learning algorithm.
I don't need to understand how it computes Golden Gate Bridge.
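As a toy illustration of that description level (not an example from the Kording and Lillicrap paper), here is a sketch where the learned weights end up opaque, but the architecture, data, and loss lines fully specify how the system got that way. The cellular-automaton task, network size, and training length are all arbitrary assumptions.

```python
import torch
import torch.nn as nn

def rule110_step(row):
    # One step of the Rule 110 cellular automaton on a binary row (boundaries held at 0).
    out = torch.zeros_like(row)
    for i in range(1, len(row) - 1):
        pattern = (int(row[i - 1]), int(row[i]), int(row[i + 1]))
        out[i] = 0 if pattern in [(1, 1, 1), (1, 0, 0), (0, 0, 0)] else 1
    return out

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))   # architecture
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    x = (torch.rand(16, 32) > 0.5).float()                              # training data
    y = torch.stack([rule110_step(r) for r in x])
    loss = nn.functional.binary_cross_entropy_with_logits(net(x), y)    # loss function
    opt.zero_grad(); loss.backward(); opt.step()
```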
Right? If if you can see all the neurons, if you have the connectome, why does that teach you what the learning algorithm is?
Well, I guess there are a couple different views of it. So it depends on these different parts of this portfolio. On the totally bottom-up, we-have-to-simulate-everything part of the portfolio, it kinda just doesn't. You have to make a simulation of the zebrafish brain or something.
And then you, like, see what are the, like, emergent dynamics in this, and you come up with new names and new concepts and all that. That's, like, the most extreme bottom-up neuroscience view. But even there, the connectome is, like, really important for doing that biophysical or bottom-up simulation. But on the other hand, you can say, well, what if we can actually apply some ideas from AI? We basically need to figure out, is it an energy-based model, or is it, you know, an amortized, you know, VAE-type model?
You know, is it doing backprop, or is it doing something else? Are learning rules local or global? I mean, if we have some repertoire of possible ideas about this, can we just think of the connectome as a huge number of additional constraints that will help to refine to ultimately have a consistent picture of that? I think about this for the the steering subsystem stuff too, just very basic things about it. How many different types of dopamine signal or of steering subsystem signal or thought assessor or so on?
How many different types of what broad categories are there? Even this very basic information that there's more cell types in the hypothalamus than there are in the cortex, that's new information about how much structure is built there versus somewhere else. Yeah. How many different dopamine neurons are there? Is the wiring between prefrontal and auditory the same as the wiring between prefrontal and visual?
You know, it's like the most basic things we don't know. And the problem is learning even the most basic things by a series of bespoke experiments takes an incredibly long time, whereas just learning all of that at once by getting a connectome is just, like, way more efficient.
What is the timeline on this? Because presumably the idea of this is to, well, first inform the development of AI. You want to be able to figure out how we get AIs to want to care about what other people think of their internal thought patterns. But interpretability researchers are making progress on this question just by inspecting, you know, normal neural networks. There must be some feature
There may be. You can do interp on LLMs that exist. Yeah. You can't do interp on a hypothetical model-based reinforcement algorithm like the brain that we will eventually converge to when we do AGI. Fair. Fair.
But yeah. What timelines on AI do you need for this research to be practical
and relevant to you? Yeah. It's fair to say it's not super practical and relevant if you're in, like, an AI 2027 scenario. Yeah. You know?
And so, like, what science I'm doing now is not gonna affect the science of, like, ten years from now, because what's gonna affect the science of ten years from now is the outcome of this, like, AI 2027 scenario. Yeah. Right? It kinda doesn't matter that much probably. If I have the connectome, maybe it it slightly tweaks certain things.
But I think there is a lot of reason to think maybe that we will get a lot out of this paradigm, but that the real thing, the, like, single event that is transformative for the entire future or something, that type of event, is still, like, you know, more than five years away or something.
Sorry. Is that because, like, we haven't captured omnidirectional inference? We haven't figured out the right ways to get a mind to pay attention to things in a way that makes
it I would take the entirety of your, like, collective podcast with everyone as, like, showing, like, the distribution of these things. Right? I don't know. Right? But I mean, what was Karpathy's timeline?
Right? You know, what's Demis' timeline? Right? So these these then not everybody has a three year timeline. And and so I think
But there's different reasons, and I'm curious
What are mine? I don't know. I'm just watching your podcast. I'm trying to understand the distribution. I don't have a super strong claim that LLMs can't do it.
But is the crux, like, the data efficiency, or is it the
I think part of it is just it is weirdly different than all this brain stuff.
Yeah. Yeah.
Yeah. And so intuitively, it's just weirdly different than all this brain stuff, and I'm kinda waiting for, like, the thing that starts to look more brain-like. I think if AlphaZero and model-based RL and all of these other things that were being worked on ten years ago had been giving us the GPT-5 type capabilities, then I would be like, oh, wow. We're both in the right paradigm and seeing the results. Right. A priori.
So my prior and my data are agreeing.
Yeah. Yeah.
Right? And now it's like, I don't know what exactly my data is saying. It looks pretty good, but my prior is sort of weird. So yeah. So I don't have a super strong opinion on it.
So I think there is a possibility that essentially all other scientific research that is being done is somehow obviated, but I don't put a huge amount of probability on that. I think my timelines might be more in that range. And if that's the case, I mean, I think there, yeah, there is probably a different world where we have connectomes on hard drives, and we have understanding of steering subsystem architecture. We've compared, you know, even the most basic properties of what are the reward functions, cost functions, architecture, etcetera, of a mouse versus a shrew versus a small primate, etcetera.
Is this practical in ten years?
I think it has to be a really big push.
Like, how how how much funding? How does it compare to where we are now?
It's like billion low billions dollar scale funding in a very concerted way, I would say.
And how much is is on it now?
Well, if I just talk about some of the specific things we have going with connectomics: E11 Bio is kind of our main thing on connectomics. They are basically trying to make the technology of connectomic brain mapping several orders of magnitude cheaper. The Wellcome Trust put out a report a year or two ago that basically said that to get one mouse brain, the first mouse brain connectome, would be, like, a several-billion-dollar project. E11's technology, and sort of the suite of efforts in the field, are trying to get a single mouse connectome down to, like, low tens of millions of dollars.
Okay? But that's a mammal brain. Right? Now a human brain is about a thousand times bigger. So a mouse brain you can get to $10, $20, $30 million with the technology.
You know, if you just naively scale that, okay, a human brain is now still billions of dollars to do just one human brain. Can you go beyond this? Can you get a human brain for, like, less than a billion? But I'm not sure you need every neuron in the human brain. I think we wanna, for example, do an entire mouse brain and a human steering subsystem and the entire brains of several different mammals with different social instincts.
And so I think that, with a bunch of technology push and a bunch of concerted, focused effort, real significant progress can be done in the kind of hundreds of millions to low billions.
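The back-of-envelope scaling just described, written out (the dollar figures are the approximate ones stated above, not precise estimates):

```python
# Naive cost scaling from mouse to human connectome, using the numbers from the conversation.
mouse_connectome_cost = 20e6        # ~$10-30M for a mouse with next-generation technology
human_to_mouse_scale = 1000         # a human brain is roughly a thousand times bigger
naive_human_cost = mouse_connectome_cost * human_to_mouse_scale
print(f"naive full human connectome: ~${naive_human_cost / 1e9:.0f}B")  # still billions of dollars
```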
What is the definition of a connectome? Presumably it's not a bottom-up biophysics model. Is it just that it can estimate the input-output behavior of a brain? Like, what is the level of abstraction?
So you can give different definitions. And one of the things that's cool: so the kind of standard approach to connectomics uses the electron microscope and very, very thin slices of brain tissue, and it's basically labeling the cell membranes; they're gonna scatter electrons a lot, and everything else is gonna scatter electrons less. But you don't see a lot of details of the molecules: which types of synapses, different synapses with different molecular combinations and properties. E11 and some other research in the field has switched to an optical microscope paradigm. With optical, the photons don't damage the tissue, so you can kind of wash it and look at fragile, gentle molecules.
So with the E11 approach, you can get a quote, unquote, molecularly annotated connectome. That's not just who is connected to whom by some kind of synapse, but what are the molecules that are present at the synapse, what type of cell is that. So, a molecularly annotated connectome. That's not exactly the same as having the synaptic weights. That's not exactly the same as being able to simulate the neurons and say, what's the functional consequence of having these molecules and connections? But you can also do some amount of activity mapping and try to correlate structure to function.
Yeah. So Interesting. Train an ML model to basically predict the activity from the connectome.
What are the lessons to be taken away from the Human Genome Project? Because one way you could look at it is that it was actually a mistake, and you shouldn't have spent whatever billions of dollars getting one genome mapped; rather, you should have just invested in technologies, which now allow us to map genomes for hundreds of dollars.
Yeah. Well, yeah. So George Church was my PhD adviser. And basically, yeah. I mean, what he's pointed out is that, yeah, it was $3 billion or something, roughly $1 per base pair, for the first genome.
And then the National Human Genome Research Institute basically structured the funding process right, and they got a bunch of companies competing to lower the cost. And then the cost dropped, like, a million-fold in ten years. And that's because they changed the paradigm from kind of macroscopic chemical techniques to these individual DNA molecules: make a little cluster of DNA molecules in the microscope, and each pixel of the camera would basically be looking at a different fragment of DNA in parallel. So you parallelize the thing by millions-fold, and that's what reduced the cost by millions-fold. And yeah.
So, I mean, essentially, with switching from electron microscopy to optical connectomics, and potentially even future types of connectomics technology, we think there should be a similar pattern. That's why E11, as a Focused Research Organization, started with technology development rather than starting with saying we're gonna do a human brain or something and just brute-force it. We said let's get the cost down with new technology. But then it's still a big thing.
Even with new next generation technology, you still need to spend hundreds of millions on data collection.
Yeah. Is this gonna be funded with philanthropy by governments, by investors?
This is very TBD and very much evolving in some sense as we speak. I'm hearing some rumors going around of connectomics-related companies potentially forming. But so far, E11 has been philanthropy. The National Science Foundation just put out this call for tech labs, which is basically somewhat FRO-inspired or related. I think you could have a tech lab for actually going and mapping the mouse brain with us, and that would be sort of philanthropy plus government, still in a nonprofit, kind of open source framework.
But can companies accelerate that? Can you credibly link connectomics to AI in the context of a company and get investment for that? It's like possible.
I mean, the cost of training these AIs is increasing so much if you could like tell some story of like, not only are we gonna figure out some safety thing.
Right.
But in fact, we will, once we do that, we'll also be able to tell you how AI works.
I mean, all these questions
You should, like, go to these AI labs and just be like, give me one one-hundredth of your projected budget in 2030.
I sort of tried a little bit, like like, seven or eight years ago, and there was not a lot of interest. Maybe now there there would be. But, yeah, I mean, I think all the things that we've been talking about, like, I think it's really fun to talk about, but it's ultimately speculation. What is the actual reason for the energy efficiency of the of the brain, for example? Right?
Is it doing real inference or amortized inference or something else? Like, this is all answerable by neuroscience. It's gonna be hard, but it's actually answerable. And so if you can do that for low billions of dollars or something, to really comprehensively solve that, it seems to me that in the grand scheme of trillions of dollars of GPUs and stuff, it actually makes sense to do that investment. But
And I think investors also, just, there's been many labs that have been launched in the last year where they're raising at valuations of billions. Yes. For things which are quite credible but are not, like, our ARR next quarter is gonna be whatever. It's like, we're gonna discover materials and dot dot dot. Right?
Yes. Yes. Moonshot startups, or billionaire-backed moonshot startups. I see it as kind of on a continuum with FROs. Yeah.
FROs are a way of channeling philanthropic support and ensuring that it's open source, public benefit, various other things that may be properties of a given FRO. But, yes, billionaire-backed startups, if they can target the right science, the exact right science. I think there's a lot of ways to do moonshot neuroscience companies that would never get you the connectome. You say, oh, we're gonna upload the brain or something, but never actually get the mouse connectome or something, the fundamental things that you need to ground-truth the science. There are lots of ways to have a moonshot company kinda go wrong and not do the actual science, but there also may be ways to have companies or big corporate labs get involved and actually do it correctly. Yeah.
This brings to mind an idea that you had in a lecture you gave five years ago about, yeah. Do you wanna explain behavior cloning on
Right. Yeah. I mean, actually, it's funny, because I think the first time I saw this idea, it actually might have been in a blog post by Gwern. Oh. There's always a Gwern blog post.
And
there are now academic research efforts and some amount of emerging company-type efforts to try to do this. So yeah. So normally, like, let's say I'm training an image classifier or something like that. I show it pictures of cats and dogs or whatever, and they have the label cat or dog, and I have a neural net that's supposed to predict the label cat or dog or something like that. That is a limited amount of information per label that you're putting in.
It's just cat or dog. What if I also had it predict what my neural activity pattern is when I see a cat or when I see a dog, and all the other things? If you add that as an auxiliary loss function or an auxiliary prediction task, does that sculpt the network to know the information that humans know about cats and dogs, and to represent it in a way that's consistent with how the brain represents it, the kind of representational dimensions or geometry of how the brain represents things, as opposed to just having these labels? Does that let it generalize better? Does that let it have just richer labeling?
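A minimal sketch of that auxiliary-loss idea, with assumed shapes, a made-up 0.1 weighting, and random tensors standing in for the images, labels, and recorded brain activity:

```python
import torch
import torch.nn as nn

# The label head is the usual classifier; the auxiliary head predicts a (hypothetical)
# vector of brain measurements recorded while a person viewed the same image.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU())
label_head = nn.Linear(256, 2)      # cat vs. dog
brain_head = nn.Linear(256, 128)    # predicted brain-signal vector

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 2, (8,))
brain = torch.randn(8, 128)         # stand-in for measured neural activity patterns

feats = backbone(images)
loss = nn.functional.cross_entropy(label_head(feats), labels) \
     + 0.1 * nn.functional.mse_loss(brain_head(feats), brain)   # auxiliary brain-prediction term
loss.backward()
```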
And of course, that sounds really challenging. It's very easy to generate lots and lots of labeled cat pictures; you know, Scale AI or whatever can do this. It is harder to generate lots and lots of brain activity patterns that correspond to things that you wanna train the AI to do. But again, this is just a technological limitation of neuroscience. If every iPhone was also a brain scanner, you know, you would not have this problem, and we would be training AI with the brain signals.
And it's just that the order in which technology developed is that we got GPUs before we got portable brain scanners or whatever. Right? And, that kind of thing.
What is the ML analog of what you're doing here? Because when you distill models, you're still looking at the final layer, like, the logprobs across all the outputs.
If you do distillation of one model into another, that is a certain thing that you're just trying to copy one model into another. Yeah. I think that we don't really have a perfect proposal to distill the brain. I think to distill the brain, you need a much more complex brain interface. Maybe you could also do that.
You could make surrogate models. Andreas Tolias and people like that are doing some amount of neural network surrogate models of brain activity data. Instead of having your visual cortex do the computation, just have the surrogate models. You're basically distilling your visual cortex into a neural network to some degree. That's the kind of distillation.
This is doing something a little different. This is basically just saying, I'm adding an auxiliary. Think of it as regularization, or I think of it as adding an auxiliary loss function. That's sort of smoothing out the prediction task to also always be consistent with how the brain represents it. As like
but but what exactly are
It might help you with adversarial examples, for example. Right?
Alright. So but you're predicting the internal state of the brain?
Yes. So in addition to predicting the label, the vector of labels, like yes cat, not dog, not boat, you know, a one-hot vector of yes, it's cat, instead of these gazillion other categories, let's say, in this simple example, you're also predicting a vector which is, like, all these brain signal measurements.
Right.
Yeah.
Interesting.
And so Gwern, anyway, had this long-ago blog post of, like, oh, this is like an intermediate thing. It's like, we talk about whole brain emulation. We talk about AGI. We talk about brain-computer interfaces. We should also be talking about this, like, brain-data-augmented thing that's trained on all your behavior but is also trained on, like, predicting some of your neural patterns.
Right. Yeah. And you're saying the learning system is already doing this for the steering system.
Yeah. And our learning subsystem also has predicting the steering subsystem as an auxiliary task. Yeah. Yeah. Yeah.
And that helps the steering subsystem, because now the steering subsystem can access that predictor and build a cool reward function using it.
Yes. Okay. Separately, you're on the board of Lean, which is this formal math language that mathematicians use to prove theorems and so forth. And obviously there's a bunch of conversation right now about math, about AI automating math. What's your take?
Yeah. Well, I think that there are parts of math that it seems like are pretty well on track Yeah. To automate. And that has to do with, like, so first of all, Lean had been developed for a number of years at Microsoft and other places. It has become one of the Convergent focused research organizations, to kind of drive more engineering and focus onto it.
So Lean is this programming language where, instead of expressing your math proof with pen and paper, you express it in this programming language, Lean. And then at the end, if you do it that way, it is a verifiable language, so you can basically click verify, and Lean will tell you whether the conclusions of your proof actually follow from your assumptions. So it checks whether the proof is correct automatically. By itself, this is useful for mathematicians collaborating and stuff like that. Like, if I'm some amateur mathematician and I want to add to a proof, Terry Tao is not gonna believe my results.
But if Lean says it's correct, it's just correct. So it makes it easy for collaboration to happen. But it also makes it easy for correctness of proofs to be an RL signal, very much in the RLVR way. You know, it's like a perfect... Math proving is now formalized math proving.
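For readers who haven't seen Lean, a toy Lean 4 theorem (chosen arbitrarily for illustration): the kernel either accepts the proof term or rejects it, which is what makes correctness such a clean signal.

```lean
-- A trivially small example: commutativity of addition on natural numbers.
-- `Nat.add_comm` is a lemma from Lean's standard library; the checker verifies
-- that this proof term really does establish the stated proposition.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```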
Formal means it's expressed in something like Lean and is verifiable, mechanically verifiable. That becomes a perfect RLVR task. Yeah. And I think that is going to just keep working. There's a couple-billion-dollar, at least billion-dollar-valuation company, Harmonic, based on this.
AlphaProof is based on this. A couple of other really interesting companies are emerging. I think that this problem of RLVR-ing the crap out of math proving is basically going to work, and we will be able to have things that search for proofs and find them, in the same way that we have AlphaGo or what have you that can search for ways of playing the game of Go, and with that verifiable signal it works. So has this solved math? There is still the part that has to do with conjecturing new interesting ideas. There's still the conceptual organization of math, of what is interesting, how do you come up with new theorem statements in the first place, or even the very high-level breakdown of what strategies you use to do proofs.
I think this will shift the burden so that humans don't have to do a lot of the mechanical parts of math: validating lemmas and proofs, and checking if the statement in this paper is exactly the same as in that paper, and stuff like that. That would just work. You know, if you really think we're gonna get all these things we've been talking about, real AGI, it would also be able to make conjectures. And, you know, Bengio has a more theoretical paper on this. There are probably a bunch of other papers emerging about it.
Like, is there a loss function for good explanations or good conjectures? That's a pretty profound question. Right? A really interesting math proof or statement might be one that compresses lots of information, that has lots of implications for lots of other theorems. Otherwise, you would have to prove those theorems using long, complex paths of inference.
Here, if you have this theorem, and this theorem is correct, you have short paths of inference to all the other ones. And it's a short, compact statement. So it's like a powerful explanation that explains all the rest of math. And part of what math is doing is making these compact things that explain the other things.
So they call it the moral complexity of the statement or something.
Yeah, of generating all the other statements given that you know this one or stuff like that. Or if you add this, how does it affect the complexity of the rest of the kind of network of proofs? So can you make a loss function that adds, oh, I want this proof to be a really highly powerful proof? I think some people are trying to work on that. So so maybe you can automate the creativity part.
If you had true AGI, it would do everything a human can do, so it would also do the things that the creative mathematicians do. But barring that, I think just RLVR-ing the crap out of proofs, well, I think that's going to be a really useful tool for mathematicians that can accelerate math a lot and change it a lot, but not necessarily immediately change everything about it. Will we get, you know, a mechanical proof of the Riemann hypothesis or things like that? Maybe.
I don't know. I don't know enough details of how hard these things are to search for, and I'm not sure anyone can fully predict that, just as we couldn't exactly predict when Go would be solved or something like that. And I think it's gonna have lots of really cool applied applications. So one of the things you wanna do is you want to have provably stable, secure, unhackable, etcetera, software. So you can write math proofs about software, and say this code, not only does it pass these unit tests, but I can mathematically prove that there's no way to hack it in these ways or no way to mess with the memory or this type of things that hackers use, or it has these properties.
You can use the same Lean and the same proving to do formally verified software. I think that's going to be a really powerful piece of cybersecurity that's relevant for all sorts of other AI-hacking-the-world stuff, and that, yeah, if you can prove the Riemann hypothesis, you're also gonna be able to prove insanely complex things about very complex software, and then you'll be able to ask the LLM: synthesize me software that I can prove is correct. Right?
Why haven't provable programming languages taken off as a result of LLMs? You would think that this
I think it's starting to. Yeah. I think it's starting to. I think that one challenge, and we are actually incubating a potential focused research organization on this, is the specification problem. So mathematicians kind of know what interesting theorems they want to formalize.
If I have some code let's say I have some code that involved in running the power grid or something that has some security properties. Well, what is the formal spec of those properties? The power grid engineers just made this thing, but they don't necessarily know how to lift the formal spec from that. And it's not necessarily easy to come up with the spec that is the spec that you want for your code. People aren't used to coming up with formal specs, and there are not a lot of tools for it.
So you also have, like, this kind of user-interface-plus-AI problem of, like, what security specs should I be specifying? Is this the spec that I wanted? So there's a spec problem, and it's just been really complex and hard. But it's only just in the last very short time that LLMs are able to generate, you know, verifiable proofs of things that are useful to mathematicians, and are starting to be able to do some amount of that for software verification, hardware verification. But I think if you project the trends over the next couple of years, it's possible that it just flips the tide for formal methods: basically, this whole field of formal methods or formal verification, provable software, which is kind of this weird, almost backwater, more theoretical part of programming languages and stuff, very academically flavored often. Although there was this DARPA program that made a provably secure, like, quadcopter helicopter and stuff like that.
Secure against, like, what? What is the property that is exactly proved? Not for that particular project, but just in general,
like Yeah. So
What, because, obviously, these things malfunction for all kinds
of reasons. Like You could say that what's going on in this part of the memory over here, which is supposed to be the part the user can access, can't in any way affect what's going on in the memory over there, or something like that. Right. Or yeah, things like that. Yeah.
Got it. Yeah. So there's there's two questions. One is how useful is this? Yeah.
And two is, like, how satisfying, as a mathematician, would it be? The fact that there's this application toward proving that software has certain properties, or hardware has certain properties, obviously, if that works, that would be very useful. But from a pure, are-we-going-to-figure-out-mathematics perspective: Right. Yeah. Is your sense that there's something about finding that one construction maps to another construction in a different domain, or finding that, oh, this lemma, if you reconfigure it, if you redefine this term, it still kind of satisfies what I meant by this term, but the counterexample that previously knocked it down no longer applies, that kind of dialectical thing that happens in mathematics.
Will this software like replace that?
Yeah. And like how much of the value of this sort of pure mathematics just comes from actually just coming up with entirely new ways of thinking about a problem? Yeah. Like mapping it to a totally different representation. And do yeah.
Do we have examples of
I don't know. I think of it maybe a little bit like when everybody had to write assembly code or something like that. Just, like, the amount of fun, cool startups that got created was, like, a lot less or something. Right? And so it was just, like, fewer people could do it.
Progress was more grinding and slow and lonely and so on. You had more false failures because you didn't get something about the assembly code right rather than the essential thing of like, was your concept right? Harder to collaborate and stuff like that. And so I think it will be really good. There is some worry that by not learning to do the mechanical parts of the proof that you fail to generate the intuitions that inform the more conceptual parts, the creative part.
Right?
Yeah. It's the same with assembly and Right.
Yeah. And so at what point does that apply to vibe coding? Are people not learning computer science? Right? Or actually, are they, like, vibe coding while they're also simultaneously looking at the LLM, like, explaining these abstract computer science concepts to them, and it's all just, like, happening faster?
Their feedback loop is faster, and they're learning way more abstract computer science and algorithm stuff because they're vibe coding. You know, I don't know. It's not obvious. That might be something about the user interface and all this human infrastructure around it. But I guess there's some worry that people don't learn the mechanics and therefore don't build, like, the grounded intuitions or something.
But my hunch is it's, like, super positive. Exactly how useful that will be on net, or how many overall math breakthroughs, or, like, math breakthroughs that we even care about, will happen, I don't know. I mean, one other thing that I think is cool is actually the accessibility question. It's like, okay.
That sounds a little bit corny. Okay. Yeah. More people can do math, but who cares? But I think there's actually lots of people that could have interesting ideas, like maybe the quantum theory of gravity or something.
Like, yeah, one of us will come up with the quantum theory of gravity instead of, like, a card-carrying physicist, in the same way that Steve Byrnes is, like, reading the neuroscience literature, and he, like, hasn't been in the neuroscience lab that much. But he's, like, able to synthesize across the neuroscience literature and be like, oh, learning subsystem, steering subsystem, does this all make sense? You know, it's kinda like he's an outsider neuroscientist in some ways. Can you have outsider, you know, string theorists or something, because the math is just done for them by the computer?
And does that lead to more innovation in the string theory? Right? Maybe yes.
Interesting. So Okay. So if this approach works and you're right that LLMs are not the final paradigm Uh-huh. And suppose it takes at least ten years to get the final paradigm.
Yeah.
In that world, there's this fun sci-fi premise. Terry Tao had a tweet today where he's like, these models are like automated cleverness, but not automated intelligence. And you can quibble with the definitions, but yeah, if you have automated cleverness and you have some way of filtering, which you could do if you can formalize and prove the things that the LLMs are saying, you could have this situation where quantity has a quality all of its own. And so what are the domains of the world which could be put into this provable symbolic representation? And furthermore, okay, so in the world where AGI is super far away, maybe it makes sense to, like, literally turn everything the LLMs ever do, or almost everything they do, into, like, provable statements.
And so LLMs can actually build on top of each other because everything they do is, like, provable. Maybe this is, like, just necessary because you have billions of intelligences running around, even if they are superintelligent. The only way the future AGI civilization can collaborate with each other is if they can prove each step. They're just, like, brute-force churning it out. This is what the Jupiter brains are doing.
It's a universal language. It's provable. And it's also provable from, like, are you trying to exploit me? Or are you sending me some
Yeah.
Some message that's actually trying to like sort of hack into my my brain effectively? Are you trying to socially influence me? Are you actually just like sending me just the information that I need and no more Right. Right for this? And yeah.
So Davidad, who's, like, this program director at ARIA now in the UK, I mean, he has this whole design of a kind of ARPA-style program, Safeguarded AI, that very heavily leverages provable safety properties. And can you apply proofs to, like, can you have a world model, but that world model is actually not specified just in neuron activations, but is specified in equations? Those might be very complex equations. But if you can just get insanely good at auto-proving these things with cleverness, auto-cleverness, can you have explicitly interpretable world models, as opposed to neural net world models, and move back basically to symbolic methods, just because you have an insane amount of ability to prove things?
Yeah. I mean, that's an interesting vision. I don't know whether that will be the vision that plays out in the next ten years, but I think it's really interesting to think about. Yeah. And even for math, I mean, I think Terry Tao is doing some amount of stuff where it's like, it's not about whether you can prove the individual theorems.
It's like, let's prove all the theorems en masse, and then it's like, study the properties of the aggregate set of proved theorems. Right? Which are the ones that got proved and which are the ones that didn't? Mhmm. Okay.
Well, that's like the landscape of all the theorems instead of one theorem at a time. Right?
I see. Speaking of symbolic representations, one question I was meaning to ask you is, how does the brain represent the world model? Obviously it's all in neurons, but I don't mean at the extremely functional level. I mean, conceptually: is it in something that's analogous to the hidden state of a neural network, or is it something that's closer to a symbolic language?
We don't know. I mean, I think there's there's some amount of study of this. I mean, there's there's these things like, you know, face patch neurons that represent certain parts of the face that geometrically combine in interesting ways. That's sort of with geometry and vision. Is that true for like other more abstract things?
There's this idea of cognitive maps. A lot of the stuff that a rodent hippocampus has to learn is place cells, and where is the rodent going to go next, and is it going to get a reward there? It's very geometric. And do we organize concepts with an abstract version of a spatial map? There's also the question of whether we can do true symbolic operations.
Can I have a register in my brain that copies a variable to another register regardless of what the content of that variable is? That's, like, this variable binding problem. And basically, I just don't know if we have that machinery, or if it's more like cost functions and architectures that make some of that approximately emerge, but that maybe would also emerge in a neural net. There's a bunch of interesting neuroscience research trying to study this, what the representations look like.
But what was your hunch?
Yeah. My hunch is it's gonna be a huge mess, and we should look at the architectures, the loss functions, and the learning rules, and then we shouldn't really I don't expect it to be pretty in there. Yeah.
We we should say this is not a symbolic language type
thing. Yeah. Probably. Probably it's not that symbolic. Yeah.
But but but other people think very differently. You know? Yeah.
Another random question, speaking of binding. Yeah. What is up with feeling like there's an experience? All the parts of your brain are modeling very different things and have different drives, yet it feels, at least presumably it feels, like there's an experience happening right now. Also that across time you feel like, what is Yeah.
I'm pretty much at a loss on this one. I don't know. I mean, Max Hodak has been giving talks about this recently. He's another really hardcore neuroscience person, neurotechnology person. And the thing I mentioned with Doriso maybe also sounds like it might touch on this question.
But, yeah, I think this I haven't I don't think anybody has any idea. It might even involve new physics. It's like, you know yeah.
Another question which I don't have an answer to yet. So continual learning: is that the product of something extremely fundamental at the level of the learning algorithm, where you could say, look, at least the way we do backprop in neural networks is that there's a training period and then you freeze the weights, and so you need active inference or some other learning rule in order to do continual learning? Or do you think it's more a matter of architecture, and how exactly memory is stored, and what kind of associative memory you have, basically?
Yeah. So, continual learning. I don't know. I think at the architectural level, there's probably some interesting stuff that the hippocampus is doing, and people have long thought this. What kinds of sequences is it storing?
How is it organizing and representing that? How is it replaying it back? What is it replaying back? Exactly how does that memory consolidation work in training the cortex using replays of memories from the hippocampus, or something like that? There's probably some of that stuff.
There might be multiple timescales of plasticity, or clever learning rules that can simultaneously be storing short-term information and also doing backprop with it. Neurons may be doing a couple of things, some fast-weight plasticity and some slower plasticity at the same time, or synapses that have many states. I don't know. I think that from a neuroscience perspective, I'm not sure that I've seen something that's super clear on what causes continual learning, except maybe to say that this systems-consolidation idea, of sort of the hippocampus consolidating into the cortex, is something some people think is a big piece of this, and we still don't fully understand the details.
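A hedged sketch of that systems-consolidation analogy, purely illustrative (the buffer, network size, and replay schedule are all made up): a fast episodic store is replayed in the background to slowly train a cortex-like network.

```python
import random
import torch
import torch.nn as nn

episodic_buffer = []                                  # fast store: just append experiences
cortex = nn.Linear(10, 1)                             # slow learner trained by replay
opt = torch.optim.SGD(cortex.parameters(), lr=1e-2)

def store(x, y):
    episodic_buffer.append((x, y))

def replay(n_steps=50, batch_size=8):
    # Replay random interleaved batches of stored experience into the slow learner.
    for _ in range(n_steps):
        batch = random.sample(episodic_buffer, min(batch_size, len(episodic_buffer)))
        xs = torch.stack([b[0] for b in batch])
        ys = torch.stack([b[1] for b in batch])
        loss = nn.functional.mse_loss(cortex(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()

for _ in range(100):
    store(torch.randn(10), torch.randn(1))
replay()
```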
Yeah. Speaking of fast weights, is there something in the brain which is the equivalent of this distinction between parameters and activations that we see in neural networks? Specifically, like, in transformers we have this idea that some of the activations are the key and value vectors of previous tokens that you build up over time. They're, like, the so-called fast weights: whenever you have a new token, you query it against these activations, but these are obviously not like all the other parameters in the network, which are part of the actual built-in weights. Is there some such distinction that's analogous?
I don't know. I mean, we definitely have weights and activations. Whether you can use the activations in these clever ways, different forms of actual attention, like attention in the brain, is that based on this? I think there are probably several different kinds of actual attention in the brain. I wanna pay attention to this area of visual cortex. Yeah.
I wanna pay attention to the content in other areas that is triggered by the content in this area. Yeah. Right? Attention that's just based on kind of reflexes and stuff like that. So I don't know.
I mean, I think that there's not just the cortex. There's also the thalamus. The thalamus is also involved in kind of somehow relaying or gating information. So there's cortical cortical connections. There's also some amount of connection between cortical areas that goes through the thalamus.
Is it possible that this is doing some sort of matching or kind of constraint satisfaction, or matching across keys over here and values over there? Is it possible that it can do stuff like that? Maybe. I don't know. This is all part of: what's the architecture of this corticothalamic, yeah, system?
I don't know how transformer-like it is or if there's anything analogous to, like, that attention. It'd be interesting to find out.
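For reference, a minimal sketch of the parameters-versus-activations distinction described above, with arbitrary sizes and untrained random weights standing in for a real transformer layer:

```python
import torch

d = 64
W_q = torch.randn(d, d)                 # parameters: fixed after training ("built-in weights")
W_k = torch.randn(d, d)
W_v = torch.randn(d, d)

cache_k, cache_v = [], []               # activations: the per-token KV cache ("fast weights")

def step(x):
    # Each new token is projected through the learned weight matrices...
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    cache_k.append(k); cache_v.append(v)
    K, V = torch.stack(cache_k), torch.stack(cache_v)
    # ...and then queried against the accumulated activations of previous tokens.
    attn = torch.softmax(q @ K.T / d ** 0.5, dim=-1)
    return attn @ V

for _ in range(5):
    out = step(torch.randn(d))
```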
We gotta give you a billion dollars so you could come on the podcast again, and then you can tell me exactly how the run rate
Mostly I just do data collection. It's, like, really, really, really unbiased data collection, so all the other people can figure out these questions.
Maybe the final question to go off on is, what was the most interesting thing you learned from the gap map? And maybe you wanna explain what the gap map is.
So the gap map so in the process of incubating and coming up with these focused research organizations, these sort of nonprofit startup like moonshots that we've been getting philanthropists and now government agencies to fund, we talked to a lot of scientists. And some of the scientists were just like, here's the next thing my graduate student will do. Here's what I find interesting, exploring these really interesting hypothesis spaces, like all the types of things we've been talking about. And some of them are like, here's this gap. I need this piece of infrastructure, which there's no combination of grad students in my lab or me loosely collaborating with other labs with traditional grants that could ever get me that.
I need an organized engineering team that builds the miniature equivalent of the Hubble Space Telescope. And if I can build that Hubble Space Telescope, then I will unblock all the other researchers in my field, or some path of technological progress, in the way that the Hubble Space Telescope lifted all the boats and improved the life of every astronomer, but wasn't really an astronomy discovery in itself. It was just that you had to put this giant mirror in space with a CCD camera and organize all the people and engineering to do that. So some of the things we talk to scientists about look like that. And the gap map is basically just a list of a lot of those things, and we call it a gap map.
I think it's actually more like a fundamental capabilities map: what are all these things, these mini Hubble Space Telescopes? And then we kinda organize that into gaps to help people understand it or search it.
And what was the most surprising thing you found?
So, I mean, I think I've talked about this before, but one thing is just, like, the overall size or shape of it. It's, like, a few hundred fundamental capabilities. So if each of those was a deep-tech-startup-sized project, each one of those a Series A, it's only a few billion dollars or something; it's not like a trillion dollars to solve these gaps, it's lower than that. Maybe we assumed that, and that's what we got. It's not really comprehensive; it's really just a way of summarizing a lot of conversations we've had with scientists.
I do think that in the aggregate process, things like Lean are actually surprising, because I did start from neuroscience and biology. It was very obvious that there are these omics: we need genomics, we also need connectomics. And, you know, we can engineer E. coli, but we also need to engineer the other cells.
And there are somewhat obvious pieces of biological infrastructure. I did not realize that math proving infrastructure was a thing, and that was kind of emergent from trying to do this. So I'm looking forward to seeing other things where solving it is not actually a hard intellectual problem; it's maybe kind of the equivalent of AI researchers just needing GPUs, and focus, and really good PyTorch code to start doing this.
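As a concrete taste of what "math proving infrastructure" means here: in Lean, a statement only compiles if the accompanying proof actually checks. The theorem below is a trivial stock example chosen just to illustrate the idea, not anything drawn from the conversation.

```lean
-- A tiny machine-checked statement in Lean 4: addition on natural
-- numbers is commutative, proved by appealing to a library lemma.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- If the proof term did not establish the stated claim, Lean would
-- reject the file; that mechanical checking is the "verifiable" part.
#check my_add_comm
```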
Like, what is the full diversity of fields in which that exists? And which are the fields that do or don't need that? Fields that have had gazillions of dollars of investment, do they still need some of those? Do they still have some of those gaps? Or is it only the more neglected fields?
We're even finding some interesting ones in actual astronomy, actual telescopes, that haven't been explored, maybe because once you get above a critical mass of project size, you have to have, like, a really big project, and that's a more bureaucratic process with the federal agencies. Yeah.
I guess you just kinda need scale in every single domain of science these days.
Yeah. I think you need scale in many of the domains of science, and that does not mean the low-scale work is not important. It does not mean that the kind of creativity, serendipity, etcetera, of each student pursuing a totally different direction or thesis that you see in universities is not also really key. But, yeah, I think some amount of scalable infrastructure is missing in essentially every area of science, even math, which is crazy, because mathematicians, I thought, just needed whiteboards.
Right.
Yeah. Right? But they actually need Lean. They actually need verifiable programming languages and stuff. Like, I didn't know that.
Yeah. Cool. Adam, this is super fun. Thanks for coming on. Thank you so much.
Where can people find your stuff?
My pleasure. The easiest way now: my adammarblestone.org website is currently down, I guess, but convergentresearch.org can link to a lot of the stuff we've been doing. Yeah.
And then you have a great blog, Longitudinal Science.
Yes. Longitudinal Science. Yes. On WordPress. Yeah.
Cool. Thank you so much. Pleasure.
Yeah. Hey, everybody. I hope you enjoyed that episode. If you did, the most helpful thing you can do is just share it with other people who you think might enjoy it.
It's also helpful if you leave a rating or a comment on whatever platform you're listening on. If you're interested in sponsoring the podcast, you can reach out at dwarkesh.com/advertise. Otherwise, I'll see you on the next one.