Dr. Jeff Beck, mathematician turned computational neuroscientist, joins us for a fascinating deep dive into why the future of AI might look less like ChatGPT and more like your own brain.
So my PhD is in mathematics, from Northwestern University. I studied pattern formation in complex systems, in particular combustion synthesis, which is all about burning things that don't ever enter the gaseous phase. Bayesian inference provides us with a normative approach to empirical inquiry and encapsulates the scientific method writ large. I just believe it's the right way to think about the empirical world. I remember I was at a talk many years ago by Zoubin Ghahramani, and he was explaining the Dirichlet process prior.
This is when the Chinese restaurant process and all that stuff was relatively new. And his explanation of it so resonated with me, in terms of, oh my gosh, this is the algorithm that summarizes how the scientific method actually works, right? You get some data. Then you get some new data, and you say, oh, how is it like the old data? And if it's similar enough, then you lump them together, and then you build theories, and you properly test hypotheses in that fashion.
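As a rough illustration of that lump-it-together intuition, here is a minimal sketch of the Chinese restaurant process prior over partitions (the concentration value and sizes are illustrative; a full Dirichlet process mixture would also weigh how well each cluster's model explains the new datapoint):

```python
import random

def crp_assignments(n_points: int, alpha: float = 1.0, seed: int = 0):
    """Chinese restaurant process: each point joins an existing cluster with
    probability proportional to its size, or starts a new one with weight alpha."""
    random.seed(seed)
    assignments, cluster_sizes = [], []
    for _ in range(n_points):
        weights = cluster_sizes + [alpha]        # existing "tables" plus a possible new one
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)              # new data unlike the old: a new hypothesis
        else:
            cluster_sizes[k] += 1                # similar enough: lump it with the old data
        assignments.append(k)
    return assignments, cluster_sizes

print(crp_assignments(20))
```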
That's the essence of the Bayesian approach: it's about explicit hypothesis testing and explicit models, in particular, models of the world conditioned on those hypotheses. I believe it is the only right way to think about how the world works; it encapsulates the structure of the scientific method. I mean, if I'm being perfectly honest, what actually convinced me the brain was Bayesian had a lot more to do with behavioral experiments done by other people. My principal focus was on, well, how does the brain actually do this? So I'm referring to experiments showing that humans and animals do optimal cue combination.
We're surprisingly efficient in terms of using the information that comes into our brains with regards to, again, these low-level sensorimotor tasks.
Interesting, so it's almost like we're so efficient that the only explanation that makes sense is that we must be doing Bayesian analysis.
More or less. I mean, it's a bit more precise than that. It's not just efficiency; the cue combination experiments, I think, are really compelling. The idea behind a cue combination experiment is that I give you two pieces of information about the same thing. And one piece of information is more reliable than the other. And the degree of reliability changes on a trial-by-trial basis.
So you never know a priori that, say, the visual cue as opposed to the auditory cue is going to be the more reliable one. And yet, nonetheless, when people combine those two pieces of information, they take into account the relative reliability on a trial-by-trial basis. And that means that they're optimal in a sense. Now we have to be super careful with our words. They're relatively optimal, because they're not actually using 100% of the information that the computer provided; you don't use 100% of the visual information presented on the screen.
There is some loss between the computer screen and your brain, but the system behaves as if it has optimally combined those two cues. It has taken into account uncertainty. This also resonates because it's how we really do think about the world: we take into account uncertainty all the time in our decisions. Right?
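A minimal sketch of the precision-weighted fusion being described, under the standard Gaussian cue-combination assumptions (the numbers are made up for illustration):

```python
def fuse(cue_a, var_a, cue_b, var_b):
    """Bayes-optimal combination of two noisy cues about the same quantity:
    each weight is proportional to reliability (inverse variance)."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    estimate = (w_a * cue_a + w_b * cue_b) / (w_a + w_b)
    variance = 1.0 / (w_a + w_b)   # the fused estimate is less uncertain than either cue
    return estimate, variance

# Trial 1: vision is sharp, audition is blurry, so the estimate leans visual.
print(fuse(cue_a=10.0, var_a=1.0, cue_b=14.0, var_b=4.0))
# Trial 2: the reliabilities flip, and so do the weights.
print(fuse(cue_a=10.0, var_a=4.0, cue_b=14.0, var_b=1.0))
```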
You know this. If you've driven in the fog, you're aware of this. 90% of what the brain does is decide what to ignore. Right? Because if we didn't, we'd be screwed.
Right? We receive an insane amount of information, most of which we don't even bother to process.
Is that definitely the case, though? Do you think that we could actually be processing more information than we know?
We are definitely processing more information than comes out in behavior. Yeah. And a lot of that is because we are continually learning. If you close your eyes for five years, your visual system decays. You lose fidelity.
It forgets. It requires constant input simply to maintain this understanding of the low-level statistics of the visual world. Without input, it degrades. So the question is: is that using all the information, or is it just using the low-level information? It's information that we don't directly perceive, but it's still definitely being used in a sense.
But what is it being used for? It's being used to track these low-level statistics that we sometimes need, but don't always need. And so this is why I say that when we say context matters, you can think of that in terms of being able to flexibly switch between tasks, which means having a lot of resources maintained and still in good working order, just in case we need them. Right? And this is why the self-supervised or unsupervised learning approaches that are ubiquitous for getting your LLMs to give you a reasonable prior over language are the sort of stuff that your brain is definitely doing.
So in a sense, it is using everything, but it's not really using all of the information that's present. Right? And that's, I think, the argument that I want to make. The idea of having to traffic in squishy people in order to make our systems go is not immediately appealing. Let's put it that way.
This episode is sponsored by Prolific.
Let's get a few quality examples in. Let's get the right humans in to get the right quality of human feedback. So with human data, or human feedback, we treat it as an infrastructure problem. We try to make it accessible.
We make it cheaper. We effectively democratize access to this data.
What do you think about these broad sort of metaphorical idealizations? You know, the big one is that the brain is a computer. Probably the more popular one
is that the brain is a prediction machine. It will always be the case that our explanation for how the brain works will be by analogy to the most sophisticated technology that we have. How's that for a non-answer, right? A couple thousand years ago, how did the brain work?
It was levers and pulleys, man. I mean, duh, don't be ridiculous. Why? Because at some point, in the Middle Ages, it became the humors, right? Because fluid dynamics, the kind of technology that took advantage of water power, was the most advanced technology that we had.
Now the most advanced technology is computers. So duh, that's exactly how the brain works.
Philosophers used to think that the universe was a machine. I put this to Chomsky as well, because he talks about the ghost in the machine. And the ghost is all of the bits in the machine that we don't understand. But do you think now that we can think of the universe as a machine?
I think that that is a very convenient way to think of the universe. Right? So when we model the universe as having like causal structure, right? Do we do so because it actually has causal structure or because that's a really convenient class of models with which to work? I think that it's, you know, it has causal structure, right?
But also it's a convenient class of models. A good example is large language models, right? So most, but not all, are autoregressive in terms of their predictions. All right, well, why? Why is it autoregressive? Oh, because it's mathematically convenient.
It's a compact way to take the past and make a prediction about the future. Does it mean that that's actually the way language works? No, I don't think it's actually the way language works, but it's a computationally convenient model. In physics we have hidden variables too; in fact, momentum is a good example. Why do we need momentum? We don't observe momentum directly.
You're just looking at videos. You know the position of the ball. Right? You want to infer the velocity, so you just take the difference between two adjacent positions and that gives you an estimate. But you don't ever directly observe the momentum. And this is in a mechanical setting.
So why did we choose momentum? Well, we chose momentum because that's the variable that, if we knew what it was, makes everything Markovian, right? Now there's a simple causal model that describes how the world works. We picked that particular hidden variable because it's what rendered the model causal. Does that mean that's how the universe works?
Or was that just a computationally convenient choice? I'm going to stay agnostic on that one. But I do like that it's a computationally convenient choice that ended up working out.
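A toy sketch of the momentum point: positions alone don't give you a Markovian model, but adding the inferred velocity as a hidden state does (the numbers here are arbitrary):

```python
import numpy as np

dt = 0.1
positions = np.array([0.0, 0.5, 1.0, 1.5])    # what we actually observe: a "video" of the ball

# Infer the hidden variable from two adjacent observations: v ~ (x_t - x_{t-1}) / dt
velocity = (positions[-1] - positions[-2]) / dt

# With (position, velocity) as the state, prediction needs only the current state.
state = np.array([positions[-1], velocity])
A = np.array([[1.0, dt],                       # x_{t+1} = x_t + v_t * dt
              [0.0, 1.0]])                     # v_{t+1} = v_t
print(A @ state)                               # [2.0, 5.0]: a forward-iterable, causal one-step prediction
```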
And just quickly riff on the benefits of having models that preference causal relationships.
So the nice thing is, when you have a causal relationship, it reduces the number of variables you have to worry about and track. That's the beauty of having a cause. It's the same argument as with momentum and Markov models. We chose to have that hidden variable because it's the thing that made the model simpler, right?
It made the calculations easy. Now we can just like go forward in time, just make predictions in a totally like iterative fashion. That's what makes causal models great. The other thing that makes causal models great is if you do ever intend to sort of act or behave, right? Then you still need to be able to predict the consequences of your action.
The more tightly linked your actions or your affordances are to the things that causally impact the world, the more effective those actions are with respect to your model, but hopefully also with respect to reality. And so we prefer causal models, in part because they are, relatively speaking, simpler to execute in simulation, but also because they point directly to: well, where should I intervene? Where should I go in? How should I choose my series of actions that will lead me to the desired conclusion or goal?
What's the difference between micro causation and macro causation?
I think the difference between micro and macro is a single letter.
So we could just model the light cone at the particle level. Oh, yeah. That's the way physicists see the world.
And we see the world in terms of populations and people and all these macroscopic things. We still reasonably do experiments, and we do interventions, and we do randomization. To truly identify a causal relationship, you have to
do an intervention. The classic example is lung cancer. I forget how long ago this was, but at one point there was this belief that alcoholism caused lung cancer, when actually it was because alcoholics were in poor health and smoked a lot more than the rest of the population, right? So you do need to do that kind of intervention to discover a causal relationship. However, the causal relationships that we care about are the ones that mesh with our affordances.
Identifying a microscopic causal relationship is super, that's great. But unless you have really tiny tweezers, it's not very helpful, right? What you need to do is you need to identify the causal relationships that are present in the domain in which you are capable of acting. We care about the causal relationships at the macroscopic level because that is where we live. We live at the macroscopic level.
Most of our actions are at the macroscopic scale. Now, one of the best things about humans is our ability to extend the domain of our affordances with technology, right? We have nuclear power because we acquired the ability to take tweezers to that scale and make these things happen, right? We figured out how to take advantage of causal relationships at that level, not because we have those abilities innately, but because we were able to create the tools that gave us access to that space. It all depends on the problem that you're trying to solve.
And the causal relationships that you always care about will be the ones that are related to the actions that you are capable of performing. Now that said, there's clearly a great advantage in understanding the microscopic causal relationships, if for no other reason than that it might lead to us discovering a way to expand our affordances into another aspect of
the microscopic domain. Is this just instrumental? It's a little bit like we say that agents have intentions and representations, and it's just a great way of understanding things. But for all intents and purposes, it's not actually how it works.
Well, I think that sentence ended on a rather definitive statement, with which I don't think I would agree. As for the rest of it: you're asking the scientific anti-realist if it's all instrumental. So yeah, it's all instrumental, right? The things that we care about are the things that, again, back to affordances, right? We need to understand causal relationships at the scale that we can manipulate, right?
That's what matters most, because that allows us to have effective actions in the world in which we actually live. To the extent that we care about other scales, it is because we simply wish to expand our domain of influence. The mind is quite
an interesting example. So let's say I wanted to move my hand, and my mind willed it so. It's top-down causation. Now, I can't act in the world of my mind. But it seems macroscopically intelligible.
We think about our minds. So maybe the mind is a special case.
I don't know. Well, the mind is a special case. I'll agree with that. I think of downward causation from, well, I guess from an instrumentalist perspective. It's like, I'm not saying downward causation is the thing.
I'm not saying downward causation is how it all works. I would take it more from the perspective that downward causation, if discovered, is what justifies your macroscopic assumption. So what do I mean by that? I mean, suppose I'm in the following situation: I've got a bunch of microscopic elements and they're all doing stuff, and I'd like to draw a circle around them and call that a macroscopic object. Now, I am justified in doing so if that particular description at the macroscopic level has the downward causation property, right?
It is a way of saying, oh, that circle you drew, that was a good circle, right? Because it summarized the behavior of the system as a whole in a way that rendered the microscopic behavior irrelevant for further consideration.
Yes. I can think of some situations where we do this. We might identify an aspect of culture or a meme, and we might say that is responsible for violence or something like that. You still have
to show that it has that property. Right? And I think intentionality is a tough one, right? Because it's a variable that has a lot of explanatory power, but it's not one that evolves. So when I think of a good macroscopic variable, it's one that I understand how it evolves over time.
That's what makes it a good macroscopic variable. I can just write down a simple equation relating pressure, volume, and temperature, right? And it says they are going to do this over time. And taking any little microscopic measurement becomes totally irrelevant. Right?
But what made it useful wasn't just that the microscopic measurements are irrelevant. It's that I had an equation that describes how it would have behaved, and that's also fairly accurate. So I have a nice, deterministic model at the macroscopic level, right? And so when you talk about intentionality, I think, yes, it can be used as an explanatory variable, but it's only good to the extent that we understand how that intentionality changes over time, right?
It's a long-term prediction. And this is why the jurisprudence example made me really uncomfortable, because what you're kind of doing is saying this is a bad person, right? And I don't know how we would necessarily identify that intentionality, except in a very indirect way. It's only good as a macroscopic variable if we can make predictions about how that variable changes over time, and we're not doing that; we're saying you're stuck with it, right? That's why it sort of makes me a little uncomfortable.
I did actually notice that the active inference community is quite a ragtag bunch; it's very diverse. So in a way, you see people rubbing up against each other that you normally wouldn't. And that can create arguments, I suppose.
Yeah. Well, I think this was Karl's influence. So what did Karl actually discover? He's got this link between information theory and statistical physics that in some way gives you this uniform mathematical framework that's widely applicable to a huge number of situations. A lot of how we think about the world is kind of baked into it.
And so it can be applied in a whole bunch of different areas. And Karl spent a lot of time basically evangelizing to various different parts of the scientific community. It's like, oh, look, you can apply this to epidemiology. You can apply this to the social sciences. You can apply this to physics.
He just sort of wrote a series. This is one of the reasons I think he's so prolific: he's basically written variations on the same paper, right? Just applied in different domains. And he did this intentionally, because he wanted to show that this is a nigh uniformly applicable mathematical framework.
And I think he's largely right about that. As a result, all these people from different communities who think about the world very differently have been pulled into his sphere. And it makes for some very entertaining conversations at the pub.
Yes. Even in our Discord server, we've got people thinking about it in terms of crypto, even in terms of Christianity, phenomenology, psychology. It's really interesting.
Yeah, that's the beauty of constructing a nearly uniformly applicable mathematical framework, right? This is one of the things I love about the community, in fact: we now have a relatively common language to discuss a huge variety of different things. Now, of course, that means we often end up talking at cross purposes, but that's half the fun, right?
So I often ask people in the business: what changed? Why did we have this massive explosion in AI development over the last several years? There are three common responses, and I agree with every single one of them: autograd, the transformer (though why the transformer matters is something I often disagree with people about), and the ability to scale things up in a manner that we haven't really seen before.
The reason I say the transformer comes with an asterisk is that a lot of the things people believe the transformer enabled, I think, really resulted more from scaling. The piece of evidence I like to cite is Mamba, which is a traditional state space model. It's basically a Kalman filter on steroids. They scaled it way up, and now Mistral has a very nice coding agent, and it works pretty darn well. They got a lot of the same functionality with a completely different architecture simply by virtue of scaling. So transformers get an asterisk. I think the biggest thing was autograd.
Autograd turned the development of artificial intelligence from something that was done by carefully constructing your neural networks, writing down your learning rules, and going through all that painful process, into an engineering problem. It made it possible to experiment with different architectures, different networks, different nonlinearities, different structures, different ways of getting memory in there. All this fun stuff allowed people to just start trying things out in a way that we couldn't before. And then we suddenly discovered: oh, it turns out backprop does work.
I mean, when I was a young man, backprop was considered a non-starter for two reasons, right? One is it's not brain-like, which is true, right? The brain does not use backprop. And the other one was vanishing gradients. Oh, you'll never solve the vanishing gradients problem.
And it's like, oh, it'll always be unstable. Yet, nonetheless, once we turned it into an engineering problem, playing around with tricks and hacks and certain kinds of nonlinearities, ReLUs and this and that, we discovered that, oh no, in fact there are ways around this. We just weren't going to discover them by playing with equations. We had to actually start experimenting. So we turned it into an engineering problem.
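A tiny illustration of that shift, using PyTorch autograd on a toy network with made-up shapes: you specify the architecture and the loss, and the gradients, which used to be hand-derived learning rules, come for free.

```python
import torch

x = torch.randn(32, 10)
y = torch.randn(32, 1)

W1 = torch.randn(10, 20, requires_grad=True)
W2 = torch.randn(20, 1, requires_grad=True)

pred = torch.relu(x @ W1) @ W2     # swap in a different nonlinearity or structure freely
loss = ((pred - y) ** 2).mean()
loss.backward()                    # reverse-mode autodiff fills in every gradient

print(W1.grad.shape, W2.grad.shape)
```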
As soon as it got turned into an engineering problem, that's what enabled the hyperscaling, which is what led to all of this, you know, these great developments over the last several years. What got lost in the mix though, was the notion that there's more to artificial intelligence than just like function approximation. We got really good function approximators. But that's not the only thing you need to develop like proper AI, right? You need models that are structured like the brain is structured.
You need models that are structured like how we conceive the world to be structured, certainly if you want to have models that think the way we think, and that got lost in the shuffle. And we're starting to see the limitations, faults, and flaws of these approaches, and starting to see them not living up to the hype. I don't know if you read it the other day, but at least according to the experts at the top of the best companies in the business, AGI is no longer a huge priority, and they're dialing back the rhetoric surrounding it. In part, I think, because they've begun to realize that function approximation alone isn't going to deliver; that was just hype. We do need to do something different.
We do need to start bringing in what we know about how the brain works, right? If we're ever going to get to something that is a human-like intelligence. And that was the starting point for us about a year or so ago: we said, yes, let's do the same thing for cognitive models. Let's take what we know about how the brain actually works, let's take what we know about how people actually think about the world in which they live, and start building an artificial intelligence that thinks like we do by incorporating these principles.
And this means basically creating a modeling and coding framework for building brain like models at scale. And that's like the critical element because obviously scaling was a big part of the solution. And right now, most of the work in the active inference space, as I'm sure you're aware, is not at scale. There's very little like active inference work that is active inference at scale. Most of the models are like relatively small toy grid world type models.
And part of the reason for that is that it is, in fact, difficult to scale Bayesian methods. Now, that has begun to change. We now have a lot of great mathematical tools and a lot of great frameworks for approximating Bayesian inference (you'll never do it exactly), which I believe is how the brain works, the Bayesian brain and all that. That allows us to build these kinds of structured models that are structured both after how the brain is structured and after how the world that we live in is actually structured.
Hence this notion that what we need to get to the next level (and I don't like the term AGI and don't intend to use it very often) is a framework that allows us to build the kinds of models that we know people actually use, and just make them bigger and more sophisticated. Hyperscaling Bayesian inference is part of it, but it's also about constructing models of the world as it actually works. The way the world actually works is what provides us with the structure of our own thinking, right? The atomic elements of thought, as I like to phrase it, are models of the physical world in which we live.
And the physical world in which we live is a world of macroscopic objects that have specific relations and interact in certain ways that we understand, right? Looking around the room for a good example, right? You sit on a chair, right? That's an example of a relationship. It holds you up and all that fun stuff.
That understanding of the physical world was necessary for us to have in order to survive. Dogs have it too, right? You might say it's language that makes us special and that this understanding isn't all that special. Well, it's actually quite special: that understanding of the world in which we live is where we get the models that form the atomic elements of our thoughts, out of which we have composed more sophisticated models that have allowed us to do all this great systems engineering and build this great technology that we've got. So that's what we want to do. We're focused on building cognitively inspired models that are based on our understanding of the way the world in which we live actually works, because we believe intelligence must be embodied; on building a framework for putting those models together and experimenting with them at scale; all in an approximately Bayesian way, because we believe that's how the brain works.
It's not just about putting your AI into a robot. It's about giving the robot a model of the world that is like our model of the world: a model that is object-centered, dynamic, and largely causal. Right? That's the big difference.
And I think that sparse, structured models are another key differentiating component. When you think about how a transformer and an LLM work: a transformer takes every word in the document and says, now, how does this word relate to every other word? And it does that many, many times. It's very much the same thing with your generative vision-language-action models. They operate in pixel space.
They are microscopic models. Now, do they have an implicit notion of the macroscopic? Yes, they must, because they work. But it's implicit. And it's not implemented with the kind of sparse structure that actually exists in the real world and in our conceptualization of it.
And that's the thing that we are saying: no, no, no, look, if we want an AI that thinks like us, then we are going to build models that are structured like the real world is structured. The real world has this sparse, causal, macroscopic structure, and so should our models. And the only way to do that is not just to put a robot in the real world, but to put a robot with a model that is structured in that fashion into the real world.
No one's using the xLSTM. Not many people are using Mamba, because why would you? All you need to do is just scale the transformer as much as possible. So many people really think you just magically get these things for free, right?
So I think you could argue that with enough data that's the right kind of data, one of these really big super scaled models will obtain an implicit representation of the world that is more or less correct. Now, having an implicit representation is great if your only goal is to just represent the world, if your only goal is to just predict what's going to happen. But it turns out people do something which is very different. People are creative. People can solve novel problems.
It's not just about mining old problems and figuring out where I can move some words around and get an answer that looks more or less right. We actually are capable of creating. We're capable of inventing new things. The way that we invent, I think, is exemplified by systems engineering. How does systems engineering work?
Well, I know how an airfoil works to create lift. I know how a jet engine works to create thrust. And I could take those two bits of information to invent something brand new, which is an airplane. That kind of systems engineering was predicated upon having this sort of model of the world that was relational. Right?
Here's the wing. I can put a jet on. I can, like, I don't know, you don't staple it on. I'm sure you use rivets or something. Right?
I know how to put things together. I know how to construct new relationships and new objects. An AI that is designed to do systems engineering will have an object-centered or system-centered understanding of the world, and it will know how all of the objects relate, so that it can start experimenting with different ways to combine them. Without that, the only thing you will ever be able to do is retool old solutions for new purposes.
Even that, I think, is a generous interpretation of what a purely predictive model is going to do. Right? So this is how I like to think about the principal advantage of taking this object-centered approach: it enables systems engineering.
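A hypothetical sketch of that compositional point: two small component models, a wing's lift and an engine's thrust, assembled into a new object, an airplane, which can then be queried without relearning anything. The classes, numbers, and the "sustains flight" check are all invented for illustration.

```python
from dataclasses import dataclass

RHO = 1.225   # air density at sea level, kg/m^3
G = 9.81      # gravitational acceleration, m/s^2

@dataclass
class Wing:
    area_m2: float
    lift_coeff: float
    def lift(self, airspeed: float) -> float:
        # Standard lift equation: L = 0.5 * rho * v^2 * S * C_L
        return 0.5 * RHO * airspeed**2 * self.area_m2 * self.lift_coeff

@dataclass
class Engine:
    max_thrust_n: float
    def thrust(self, throttle: float) -> float:
        return self.max_thrust_n * max(0.0, min(throttle, 1.0))

@dataclass
class Airplane:   # a brand-new object composed from known parts and known relations
    wing: Wing
    engine: Engine
    mass_kg: float
    def sustains_flight(self, airspeed: float) -> bool:
        return self.wing.lift(airspeed) >= self.mass_kg * G

plane = Airplane(Wing(area_m2=16.0, lift_coeff=1.2), Engine(max_thrust_n=3_000), mass_kg=1_000)
print(plane.sustains_flight(airspeed=50.0))   # True: the composed model predicts enough lift
```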
What is a grounded world model? So, I feel like that's a trick question.
I actually had this conversation with one of my friends and co-conspirators, Max, the other day. In some sense, every model is grounded. It's grounded in the data that it was given. Now, okay, so that's a true statement. It's like, okay, but that's not what we want.
When we use the phrase "grounded world model," we say that it's grounded in something, and that something is not just the data that it saw. So, for example, vision language models. A vision language model is a way of grounding the visual model in the linguistic space. And this is the approach that we're taking.
This is what LangChain does. It's all about taking models, and everything becomes a something-language model: vision-language, whatever. When you do that, what you're doing is grounding all of your models in a common linguistic space so that they can communicate with one another via language. Now, why did we choose language? Well, we chose language because, honestly, I think it's because we wanted models that we could talk to.
We wanted a model we could talk to; it was really all about making the interface convenient for us, which is great. That's totally something you want. But it raises the question: what's the right domain in which to ground your models? We also use the phrase "ground truth."
And of course, truth is the thing you made up and said was ground truth, right? So what's ground truth? What is the right domain in which to ground models in order to get them to think like we do? That's the relevant question. And so my view is that, again, if you want AI that thinks like we do, you need to have it grounded in the same domain in which we are grounded.
And we are grounded in this domain. This is why the embodied bit is such an important thing. We want models that are grounded in the physical world in which we evolved. The reason for this is that that is the world that provides us with these atomic elements of thought. A single cell lives in soup, right?
And whatever model it has of the world, to the extent that it has one, or behaves as if it has one, is a model of its environment, right? If it didn't understand the environment in which it lived to some extent, it wouldn't be able to continue to exist and function in that environment. So you can say that a cell has a model that's grounded in chemistry, the chemistry of the soup in which it lives; that model is a prerequisite for its survival. Now, when we talk about mammals and bigger animals, they live in the macroscopic world that includes other animals, right?
And all that stuff. So what's that model grounded in? Well, at the very least, we can say that a significant subset of whatever models we have are grounded in that world, right? And that world, we know, has properties that we can understand: it is object-centered, it's relational, it's all this stuff. And so the "grounded" bit is more about being properly grounded, grounded in the domain in which we are grounded, as a route to creating AI models that in fact think like we think, right?
That's the grounding we're particularly focused on. If you had to choose the domain in which to ground your models, what would you choose? I don't think language is the right one. Language is an incredibly poor description of both our thought processes and reality.
I tell the story all the time, right? Ask any cognitive scientist or psychologist who's done experimental work with humans. You put someone in a chair, you make them do some task, you carefully monitor their behavior, you look at what they did, and then that informs your theory of that behavior. If you do the experiment well, you have a very good model of how they made whatever decisions they made throughout the course of the experiment.
And then you go back and you ask them, why did you do what you did? And they give you an explanation. It sounds totally reasonable. It is also completely inconsistent with an accurate model of their behavior. Self-report is the least reliable form of data that one gets out of a cognitive or psychological experiment.
And so we don't want to rely on that. We don't want to ground our models in what we know is an unreliable representation, both of the world and of our thought processes, right? We want to ground them in something that's a good model of our world. And that's why we've chosen to focus on models that are grounded in the domain of macroscopic physics as opposed to language.
Can you speak a
little bit more to the limitations with current active inference? Active inference is a nearly uniformly applicable information-theoretic framework for describing objects and agents, right? It really is inspired by statistical physics and its links to information theory. And when you take those two mathematical structures and throw in a little Markov-blanket machinery so you can talk about macroscopic objects, you have a very generic, widely applicable mathematical framework that you can throw at many problems. And a lot of what has gone on in the active inference community over much of the last twenty years has been demonstrating that it's uniformly applicable.
So there's been a lot of breadth and not a lot of depth. And of course that's appropriate if you really want to make the argument that everyone should be using this: you show that, see, in this domain it works, on your toy examples. But the active inference community has this habit of showing, see, I can handle this psychological phenomenon, I can model this cognitive phenomenon, and look, it's a good post hoc description of this neural network's behavior, and things like that, right?
They've been showing that, but they've never really sat down and tried to tackle any really big, really hard problem, because the emphasis has been on evangelism. You couple that with the fact that there is this strong bias within the active inference community towards being as Bayesian as possible. And so of course they also shun the really hard problems, because Bayesian inference has been historically challenging to scale. There have been a lot of developments over the last few years that have come out of the machine learning community, mostly out of the Bayesian machine learning community, that have really made it possible to start scaling Bayesian inference in ways that we weren't able to before. And you couple that with a desire to stop the evangelizing and start solving really hard problems with these methods.
And you've got a way to prove that active inference really can live up to its promises.
Yeah, it was a similar thing with constraint satisfaction. In the 1970s there was the Lighthill report, and people said symbolic AI will never work; they wrote it off. Apparently it's just that there are all of these empirical methods that have been discovered in the last twenty years that make it massively more scalable and tractable. Is it the same thing here? Are there some specific techniques that have dramatically improved the tractability of active inference?
I would lump it all into the Bayesian inference category. There have been a number of developments over the last eight years or so that have made Bayesian inference significantly more tractable than it used to be. Some of it has to do with work in the Gaussian process space. My current favorite trick is normalizing flows, which are a great way of ensuring that you have access to sophisticated likelihoods that nonetheless result in tractable probability distributions. I've also been using natural gradient methods for a very long time; they allow you to massively speed up gradient-based inference, and in some situations completely eliminate the need for it and instead do coordinate descent, allowing you to take massive jumps in parameter space without losing the ability to do learning in a sophisticated modeling scenario.
I also like the fact that the natural gradient stuff has been getting some great acronyms recently: Bayesian online natural gradient, or BONG for short. These guys get me every time; I wish I were that clever, honestly. But there have been a lot of developments in that space as well. Additionally, there have been a lot of developments in rapid sampling methods, conditional sampling methods, constrained sampling methods, things like that, that have really improved matters. And one of the problems with the active inference community historically, which I think is now starting to change, has been a hesitance to use certain approximate methods. There's been this focus on straight-up, old-school message passing. As soon as you relax the desire to be as Bayesian as possible, it opens up a lot more possibilities for scaling this stuff up.
We're now talking about agents that are interacting with the world around them, and that still presumably needs a lot of data. So we've got
a couple of tricks. One of the nice things about taking an explicitly object-centered approach is that you don't have to train all of your models at once, and you don't have to train just one big model. This is my favorite trick. I think this is one of the things we're going to be seeing a lot more of in the near future.
So if you want to train a vision model to understand YouTube videos or something really complicated like that, you basically take one big model and you train it on a ton of data. You just keep training, keep training, keep training, and eventually it gains a sort of implicit, but only implicit, object-centered understanding. The other way to go is to train on objects in specific domains, with smaller datasets. Like, I'm only going to worry about the Zillow problem, the insides of people's houses, right? And that's going to have a much smaller set of objects that it has to learn an implicit distribution over.
And you can do this with one big neural network. There's a really great Gaussian splatting paper where they trained a massive neural network that is able to make predictions about what's going on inside people's houses, plus some nice language models. But obviously it has an understanding that's limited to a house and the objects that are inside a house. If you have an explicitly object-centered model, then you end up not just with one model that understands a house; you end up with a model that's actually thousands and thousands of little models, each of which explains a single object or object class within the house, right? So you've got, like, a book model.
Books all come in different shapes and colors, right? But there's just one book model. And the beauty of doing this (you have to be a little clever about how you structure the interactions between these things), if you're a little bit clever about how you describe the relationships between objects within this modeling framework, is that you gain the ability to train a model just on the insides of houses and a model just on parks and park benches, take the objects that were discovered in each space, put them into a combined environment that has objects of both kinds, and it still works. That's the advantage of taking an object-centered approach, or what I like to refer to as the "lots of little models" approach.
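A hypothetical sketch of that composition step (the object names, attributes, and the shared "supports" vocabulary are all invented for illustration): little models learned in separate domains sit in one library, and a new combined environment pulls in only the ones it needs.

```python
house_models = {          # little models learned from indoor data only
    "book":  {"typical_supports": ["shelf", "table"]},
    "shelf": {"typical_supports": ["floor"]},
}
park_models = {           # little models learned from outdoor data only
    "bench": {"typical_supports": ["ground"]},
    "tree":  {"typical_supports": ["ground"]},
}

library = {**house_models, **park_models}     # one big bank of little models

def instantiate_scene(object_names):
    """Pull in only the little models the current environment actually contains."""
    return {name: library[name] for name in object_names}

# A novel combined environment (reading a book on a park bench) still works,
# because the objects share a relational vocabulary rather than living inside
# a single monolithic "house" or "park" model.
print(instantiate_scene(["bench", "book"]))
```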
Some of these things are a little bit weird. Maybe one culture doesn't have the notion of time, and some cultures might see two objects as one. So is there a potential problem here, some ambiguity that we need to overcome? I'm not going
to say that there isn't a potential ambiguity problem we need to overcome. What I will say instead is that the additional constraint we're imposing is not just about objects, it's also about their relationships. Now think about physics. This is why the physics discovery stuff is such a big part of it, right? In physics, in particular Newtonian mechanics, let's pretend we're living in a world of rigid bodies, right?
So all I need to worry about is the weight and shape of things, and that defines a particular object type. But I also need to know how they interact. And in Newtonian mechanics, what we can do is take these objects, watch them bouncing off each other and doing all these sorts of things, and quickly infer that their interactions are all governed by a single language, which is the language of forces and force vectors, right?
That language of interaction is really what makes it work, right? Otherwise, we'd just have pictures of things. That's all we would have. What you're empirically discovering is a generalized notion of forces that describe the relationships between things. And the constraint that you place, in order to avoid the problem of things being too brittle,
is that they all have to use the same class of forces in order to interact. We're stuck with that. But being flexible about our definition of what a force is, and having the ability to discover new kinds of forces, not just literal force vectors, gives us the ability to generalize without becoming too brittle.
You're talking to this interaction dynamic. So there's a graph of interactions which might possibly represent affordances in the macroscopic domain. And by doing analysis on the interaction graph and simplifying the analysis as much as possible, you get a principled way to partition the world up.
That's right. And it's all about having interactions and interaction classes. So there's not just one adjacency matrix; there's one for every type of interaction that's possible. That's what gives you the additional flexibility.
The other thing that gives you additional flexibility is being a little bit Bayesian about things. It may very well have been that all of your observations of this object when it was in a house were really simple. It was all just: it sits on a shelf, right? And so what do you know? Well, what you know is that the object sits on a shelf, which is one kind of interaction, right?
That's just: there's a force pushing down, there's a force pushing up. You don't know anything about, say, the weight of it. But if you keep error bars about that, if you keep error bars about the other kinds of interactions that you have seen while staying agnostic about the specific details for this particular object, it gives you the flexibility to say, well, I'm going to put it in this environment and I can make some predictions about how it's going to behave. If I throw a bowling ball at it, I'm going to be making some assumptions about how it might behave. But once the bowling ball hits it, I might have to revise those assumptions.
This is the other critical element of the approach we're taking: you have to have some kind of continual learning. This is something that really doesn't exist in contemporary AI. You build your big model, you've spent millions of dollars training it, and then you're done. Right?
Yes. Someone else can come along and fine tune it a bit for a particular task. Right? Which is great. But at the end of the day, when you're at the deployment phase, you turn learning off.
Whereas in this approach we're saying, no, no, no: a critical aspect of the way we think about the world and the way we learn about the world is that it's continual and it's interactivist, right? And that needs to be true of the objects that we're discovering as well. We've learned classes of interactions, but just because we haven't seen a particular class of interaction previously doesn't mean we say it never happens. We still allow for that possibility and then do continual learning, with rapid updates, when we see something happen, when we see a new interaction. What makes that work is the fact that you specify that there is a certain set of kinds of interactions, some of which you've previously observed and some of which you still don't know about and might observe soon. And then you can update your posterior beliefs about whether or not that object interacts in that way.
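A minimal sketch of keeping those "error bars" over interaction classes as Beta pseudo-counts and doing the rapid conjugate update online (the class names and values are illustrative, not from any real system):

```python
interaction_classes = ["rests_on", "collides_with", "contains", "attaches_to"]

# Beta(alpha, beta) pseudo-counts per class: start weakly agnostic for a new object.
beliefs = {k: [1.0, 1.0] for k in interaction_classes}

def observe(interaction: str, occurred: bool) -> None:
    """Rapid conjugate update when an interaction of this class is (or isn't) seen."""
    a, b = beliefs[interaction]
    beliefs[interaction] = [a + occurred, b + (not occurred)]

def probability(interaction: str) -> float:
    """Posterior mean belief that this object participates in the interaction."""
    a, b = beliefs[interaction]
    return a / (a + b)

# The object has only ever been seen sitting on a shelf...
for _ in range(20):
    observe("rests_on", True)
# ...then a bowling ball hits it, and the collision belief is revised immediately.
observe("collides_with", True)

print({k: round(probability(k), 2) for k in interaction_classes})
```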
What would the architecture of such a system look like? I'm imagining it'll be distributed, right? So it'd have all these different agents. And then we have the consistency problem, because maybe this agent has empirically learned that these two things are a book, but the agent over there thinks just this one thing is a book. And then, how many objects are there?
Would it become intractable? Realistically, I don't know. What do you say?
So from a simulation perspective, the way this gets simulated is remarkably like the way a video game engine simulates the world, the only difference being this abstract notion of forces. So how does a video game represent the world? Well, you have all these assets, right? And each asset is basically a shape, maybe a texture or color or something. Like, a fork is an asset, right?
Or a little three-legged stool is an asset. It has a bunch of properties associated with its shape, color, mass, all of this stuff, and it has a set of interaction rules, which are like Newtonian force vectors. Then you've got other things like water and sand that have special macroscopic rules, because otherwise the compute would be insane, and stuff like that. So it's very similar to that, right? When you take this lots of little models approach, what you end up with is the moral equivalent of a giant list of video game assets.
And then when it comes to modeling a particular environment, the agent you're talking about, which has this lots-of-little-models model in its head, looks at the scene and says, oh, okay, I need to worry about these 10,000 little models right now. And that's it. I don't need the rest of them. Right?
And then it just operates in that space, running something that looks a lot like a video game simulation. It's that sparsity that makes the lots of little models approach work. You may have a million little models, but at any given time you only need a tiny fraction of them, and you just instantiate those.
The thought occurs, though, that in a game engine, all of these particles are in the engine. I can say, what are the forces between these two particles?
Yeah, it's called cheating.
Yeah, because when you deploy an agent in the real world, you can't just ask, well, what's the force vector between Jeff and the light? That's right. Yeah, you have
to learn those. If you take a video game engine as ground truth, are we capable of discovering the assets and their properties that were in that game engine?
So what would your input be? Would it just be the pixels?
Yeah, why not? Make it hard. It would be cheating to start out with something that already segments the image for you. If you can't solve the hard problem from the bottom up, then you haven't really solved it, so why do it?
If I understand correctly, a successful implementation of the technology you're talking about would be, let's start with a game engine. And we almost treat the AI like a black box. So it has input. I can move left. I can move right, pan up, down.
I can interact with objects. And then maybe there's some kind of a score function. I'm not sure. But it can learn inside the game engine, and it will build up this internal model library that represents things in the world in the game engine. And if it's learned a sparse, robust model library, you could, in principle, take the same learned model and apply it to a robot in the real world, and it would generalize.
That's the idea. And that's the problem we're trying to solve. One of the critical missing elements in the robotics space is that training models in simulated environments does not translate very well to real-world environments. This could be a result of the simulated environment being too impoverished. But it could also be a result of the artificial environment just not being a very accurate representation of the real world, and I think it's largely the latter.
That's coupled with the fact that the artificial agent's internal model is not structured like the world it's being trained to function in. Those are the biggest problems. So what do you need in order to address them? Well, you need a good model for the robot's brain that has the structure of the world in which it lives. The other thing is you need a mapping from real-world data to simulated data. And right now what we're typically using is a video game engine.
Video game engines are great. I know I certainly enjoy them on a ten-hour-a-week basis. The problem with them, though, is that they weren't built to have realistic physics. Most of them were designed to be plausible.
They were designed to look good to the user. Part of this is that there are a lot of tricks and hacks thrown in to deal with the fact that the equations of Newtonian mechanics are very stiff: when collisions happen, if you're just a little bit wrong, then weird, non-physically-realistic things can occur. So if you had the ability to construct an environment that had good enough physics to accurately represent the real world, and trained your robots in that domain, where they have these models in their heads so they're actually capable of learning the quote-unquote ground truth that you've implemented in the simulated world, then I believe they will generalize better to functioning in the real world. And this is absolutely critical, I think, for robotics going forward.
If for no other reason than that right now, like large language models and all these self-supervised models, the way we're currently training robots to put your groceries away and things like that is by training them to mimic human behavior. It's expert trajectory learning. They're not really learning the physics of their environment; they're learning to mimic human behavior without crushing the eggs, right? And if you want them to be able to generalize across domains, across tasks, you need to get rid of the reliance on expert trajectory learning. And that only happens when you move to something that is explicitly model-based, with a model that accurately represents the world in which they live.
Once you've got a core set of models that work in the world, is that the value of the AI?
Yeah, so once you have a core set, then you have the ability to deploy your agent out there in the real world, and it can handle situations that it couldn't previously handle. One of my co-conspirators likes to talk about the cat-in-a-warehouse problem. So what do we have? We've got an AI agent that has been trained to manage a warehouse, right? And so it understands things like forklifts and boxes and workers, hopefully, and it knows how to do its job.
And then one day something comes along that it's never seen before: it's a cat, right? Cats don't belong in a warehouse, but a cat comes along. And the model has no idea; it has never seen a cat before, because of the environment in which it was trained. This is one of the beauties of this approach.
So the cat comes into the warehouse and it's like, what the hell is this? It's screwing with my system. And because we're taking this free energy based approach, one of the critical elements is tracking surprise. So when a cat comes along and it doesn't know what a cat is, the surprise signal goes crazy, and then it says, okay, stop.
Right? Don't run over the cat. Right? Let's figure out what's going on. And what it can do is take a picture of the cat and fire it off to a server somewhere that has a huge bank of models and has been pre-trained on model selection to a small extent.
And it says, what the hell is this? And then the big bank of models says, oh, here are seven or eight things it could possibly be: different kinds of cats, maybe a dog thrown in, whatever. And then it ports those little models over to the warehouse model, does some proper hypothesis testing, watches the cat behave for a little bit... ah, it's a cat. It sends the other models back because it doesn't need them anymore, right?
It's figured out what this is, and now it's incorporated an understanding of the cat into the system. This is another beauty of taking an explicitly object-centered approach: it gives the model the ability to know what it doesn't know. That comes from the active inference component.
It knows what it doesn't know, and when it doesn't know something, it can go phone a friend. That's another way to describe it. The friend will respond by saying, oh, it's a cat. And then it can take that model and incorporate it into its warehouse model.
And now it understands that. There's a huge compute advantage to this. If we had started with one big model that already knew what a cat was, along with everything else, think how many parameters that would have. It'd be huge. This model is very frugal in the sense that it only needs to know two things:
what it needs to know about the environment in which it exists, and when it sees something it doesn't know, so that it can just go pull in what it's missing. So that's the idea: you have this massive bank of models, but when you instantiate it for a particular use case, you don't need them all. Yeah. Right?
You just need the ones that are relevant to that environment. But the models are continuously tracking surprise or uncertainty. And when one sees something it hasn't seen before, it's smart enough to say, I don't know what that is.
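As a rough illustration of that loop, here is a minimal sketch of surprise-triggered model selection. Everything in it is hypothetical rather than the actual system being described: the models are assumed to expose a `likelihood(obs)` method, `model_bank_query` stands in for the remote model bank, and the threshold is an invented number. The shape of the idea is just: track surprise, stop when it spikes, fetch candidate models, and keep whichever one the evidence favors.

```python
import numpy as np

# Hypothetical sketch of the surprise-triggered "phone a friend" loop.
SURPRISE_THRESHOLD = 10.0  # nats; an assumed, deployment-specific value

def surprise(observation, models):
    """Negative log evidence of the observation under the current model set."""
    evidence = sum(m.likelihood(observation) for m in models)
    return -np.log(evidence + 1e-12)

def handle_observation(observation, follow_up_observations,
                       local_models, model_bank_query):
    if surprise(observation, local_models) < SURPRISE_THRESHOLD:
        return local_models                      # nothing unusual; carry on

    # High surprise: stop, and ask the remote bank for candidate explanations
    # (e.g. a few kinds of cat, maybe a dog thrown in).
    candidates = model_bank_query(observation)

    # Proper hypothesis testing: accumulate log evidence for each candidate
    # over a short window of further observations, keep the winner, and
    # discard (send back) the rest.
    log_evidence = np.zeros(len(candidates))
    for obs in follow_up_observations:
        for i, m in enumerate(candidates):
            log_evidence[i] += np.log(m.likelihood(obs) + 1e-12)

    best = candidates[int(np.argmax(log_evidence))]
    return local_models + [best]
```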
How and when should deep learning be combined with this? My naive perception of Bayesian inference is that right now, if you have a photograph from a camera and it's 300 by 300 pixels or something, that would be a challenge for Bayesian inference. Could you just use a vision language transformer or something and use that as part of the Bayesian framework? Or could you even use deep learning models as a way of bootstrapping the knowledge acquisition in the Bayesian framework?
So the reason why I mentioned normalizing flows is because technically that's a deep learning tool. It just happens to be a deep learning tool that takes in an image and turns it into something that is easy to deal with from a probabilistic reasoning perspective. Are we going to use deep learning tools? Yes. The ones that are fit for purpose, for sure.
And that's a great example of one where we're taking sort of like, oh, well, why wouldn't we use this if it's compatible with our framework?
Many folks in the audience won't know what a normalizing flow is. Can you just give us a quick update on that?
Well, okay. We've got a pretty good handle on how diffusion models work these days, right? You take your image, you add a bunch of noise to it, make it Gaussian, and then you learn the inverse transformation. It's the same thing, right? What you're doing is learning a mapping from a probability distribution that is easy to deal with, like a Gaussian distribution.
And you're learning a mapping from that distribution onto the thing you actually are observing, the thing you care about. So in this case, it could be an image. In fact, I actually don't think we should call them diffusion models. It's a normalizing flow. The diffusion part should really be called a diffusion training protocol for a normalizing flow.
So to some extent, we will be using some of those tricks as well. You could say, yeah, we're going to use diffusion models, although it's going to make me roll my eyes to say that.
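For readers who want something concrete, here is a minimal sketch of the change-of-variables idea behind a normalizing flow, using a single invertible affine layer trained by maximum likelihood. This is only the bare mechanics of mapping a simple Gaussian base distribution to data; it is not the architecture or tooling Jeff's team actually uses, and the toy data is made up.

```python
import torch

# An invertible map f takes a simple base distribution (standard Gaussian) to
# the data distribution; log p(x) = log p_base(f^{-1}(x)) + log|det df^{-1}/dx|.

class AffineFlow(torch.nn.Module):
    """A single invertible affine layer: x = z * exp(log_scale) + shift."""
    def __init__(self, dim):
        super().__init__()
        self.log_scale = torch.nn.Parameter(torch.zeros(dim))
        self.shift = torch.nn.Parameter(torch.zeros(dim))

    def inverse(self, x):
        # Map data back to the base space; return z and log|det Jacobian|.
        z = (x - self.shift) * torch.exp(-self.log_scale)
        log_det = -self.log_scale.sum()
        return z, log_det

    def log_prob(self, x):
        z, log_det = self.inverse(x)
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(z).sum(-1) + log_det

# Training just maximizes log_prob on observed data (toy 2-D "images" here).
flow = AffineFlow(dim=2)
opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
x = torch.randn(256, 2) * 3.0 + 1.0
for _ in range(200):
    loss = -flow.log_prob(x).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```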
Jeff, what is your approach to alignment?
Well, typically I like to talk to people about their beliefs and values and figure out how it is that they came to form them, and then try to convince them to adopt my values. The beliefs that these systems have, the beliefs that our artificial systems have, are not the same as our beliefs. And the reward functions that we specify for these artificial agents are definitely not the same as our reward functions. Now there are a few exceptions, like Go or chess, right?
Any game where you either win or you lose, the reward function is obvious. But in general, in complicated situations, it's not so obvious what the reward function should actually be. You know, I know that there's this definite belief that reward is all you need. And there's some truth to that.
But the question is, well, where did your reward function come from? Now, from a philosophical perspective, there is no normative solution to the problem of reward function selection, barring divine intervention, as I always say. And that is just another fancy way of saying that your values and my values might be different, and it's really difficult to say whose are better, right? From a practical perspective, a situation that I like to point out is self-driving cars: obviously, you'd like to penalize your self-driving car if it drives over a squirrel.
Right? But if it had to choose between a squirrel or a cat, most people would want it to choose the squirrel. And the way you would do that in an RL model is you say minus 10 points for a squirrel, minus 50 for a cat. Where did those numbers come from? It's completely ambiguous, right?
They're relatively arbitrary, kind of made up. And so relying on arbitrarily selected reward functions seems like a terrible idea. We also know that things can go horribly wrong. And I know everyone's sick of this example.
But when you rely on reward, you're effectively making wishes from a malevolent genie. You run the risk of saying, hey, Skynet, end world hunger, and it says, no problem, kill all humans. If you don't specify your reward functions very carefully, you can get very degenerate behavior. So the goal of alignment in an RL setting would be to somehow get my reward function, or perhaps humanity's collective reward function, into the AI agent. This is really, really, really hard. It's really, really, really hard because measuring reward functions is really, really, really challenging.
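To make the arbitrariness concrete, here is a toy reward table of the kind the squirrel-and-cat example implies. The event names and the numbers are invented for illustration only; the point is that nothing in the RL formalism itself tells you what they should be.

```python
# Illustrative only: an arbitrary hand-written penalty table for a driving agent.
# The specific values are made up; change them and the agent's trade-offs change.
PENALTIES = {
    "squirrel": -10.0,
    "cat": -50.0,
    "pedestrian": -1e6,
    "curb_scrape": -1.0,
}

def reward(events):
    """Sum the penalties for everything the car hit on this trajectory."""
    return sum(PENALTIES.get(e, 0.0) for e in events)

# Change -50 to -9 and the squirrel-vs-cat choice flips; the formalism is
# indifferent, so the value judgment lives entirely in these numbers.
```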
The approach that we're taking is, we're asking, how do people actually do this? How do we as humans construct alignment? Well, the first thing we do is we try to figure out what other people's reward functions are. The problem of reward function identification is confounded by the fact that people have different beliefs. Action, which is what we can observe other people doing, is a combination of their beliefs and their reward function, or their values.
The problem, of course, is that you only observe people's actions. There's a difference of opinion about what to do, right? And so you want to figure out why. It could be because your beliefs are different, or it could be because your values are different. But it's ambiguous, you can't tell. Mathematically, it's not even possible to separate these two: belief and value are fundamentally conflated when all you observe is action or decision.
The way that we solve this problem as people is we talk about our beliefs. I ask you, well, why do you think this is the right action? And then you tell me, oh, well, it's because this fact, this fact, and this fact suggest that if I do this, then this will happen. And then I can say, ah, I see. So maybe the reason for the disagreement in our beliefs, or in our decision, is because you're not aware of this fact, and I'd forgotten about that fact.
And so what we do is say, well, let's incorporate all of these things together and see. And then you might still say, well, I still think we should do X, and I'm like, no, it's still definitely Y. And we continue this conversation until each of us has a very reasonable model of the belief formation mechanism that the other person has. At which point, the only remaining cause for disagreement is a disagreement about the reward function.
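Here is a small worked illustration of that conflation, with made-up numbers: two agents with different beliefs and different utilities pick the same action by maximizing expected utility, so observing the choice alone cannot tell you which combination of belief and value produced it.

```python
# Sketch of why belief and value are conflated when you only observe choices.
# Each agent picks the action maximizing E[U] = sum_s p(s) * U(a, s).

states = ["rain", "sun"]
actions = ["A", "B"]

def choose(belief, utility):
    """belief: p(state); utility: dict (action, state) -> value."""
    expected = {a: sum(belief[s] * utility[(a, s)] for s in states) for a in actions}
    return max(expected, key=expected.get)

# Agent 1: thinks rain is likely, mildly prefers A in rain.
belief1  = {"rain": 0.9, "sun": 0.1}
utility1 = {("A", "rain"): 1, ("A", "sun"): 0, ("B", "rain"): 0, ("B", "sun"): 2}

# Agent 2: thinks sun is likely, but values A in sun highly.
belief2  = {"rain": 0.1, "sun": 0.9}
utility2 = {("A", "rain"): 0, ("A", "sun"): 3, ("B", "rain"): 5, ("B", "sun"): 1}

print(choose(belief1, utility1), choose(belief2, utility2))  # both print "A"
```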
AI systems are completely illegible. And that's almost a good thing because if we actually understood how flawed they were, they would be banned.
Well, they're amoral. We have no idea how to put morality into them. The smart, safe thing to do is to remove decision making from their capabilities and simply use them as oracles or prediction engines, right? And then we can just say, hey, what would happen if I did X, Y, and Z? And then it tells you, well, this is the ultimate outcome.
And then we're like, oh, okay, well then maybe A, B, C were better choices, right? And things like that. That eliminates them from participating in actions, sorry, that prevents them from using their reward function, right? And you can get that just by like training them to just do good prediction. That's totally great.
But that doesn't give us the kind of automation that we really want, right? What we really want are decision, artificial agents that are decision makers that can act on our behalf. And so it's either gonna be human in the loop or it's going to be something like what I propose where we figure out how to like solve the alignment problem in that fashion.
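As a sketch of the "oracle, not actor" pattern mentioned a moment ago: the model only predicts the consequences of candidate actions, and the ranking of outcomes, the reward function, stays with a human or an explicitly specified policy. The `world_model.predict` interface here is hypothetical, not a real library call.

```python
# Hedged sketch of an oracle-only interface: the model decides nothing.

def consult_oracle(world_model, state, candidate_actions):
    """Ask the predictive model what would happen for each action."""
    return {a: world_model.predict(state, a) for a in candidate_actions}

def human_in_the_loop(world_model, state, candidate_actions, human_choice):
    predictions = consult_oracle(world_model, state, candidate_actions)
    # The reward function, i.e. the ranking of predicted outcomes, is applied
    # by the human (or an explicit policy), never by the model itself.
    return human_choice(predictions)
```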
But Jeff, you're an old school cognitive guy. So for someone like you, would you always think that, in the absence of explicit cognitive models, we would never be able to say that these things actually had beliefs or intentions? I think
that what allows us to currently say that they don't have beliefs or intentions actually stems a lot from our knowledge of how they actually work. Right? I, for example, have no problem concluding that you have beliefs and intentions, though it may very well be that that conclusion is drawn from the fact that I really don't know how you work. I have an intuitive feel for it. I assume you work the way I work.
I have beliefs and intentions. You know, that's my perspective of myself. And so I conclude the same about you. It's kind of like emergence. Emergence is such a funny concept, right? There's this whole branch of the emergence literature that defines an emergent phenomenon as anything that I didn't predict, right?
Which is a remarkably anthropocentric, and I would argue ignorance-based, definition of emergence. And I don't like it for those reasons. I think the converse of that is what's going on here. We know that these algorithms do not have the capability to do anything other than predict.
And so we don't believe they have intentions.
But something like strong emergence usually means like causal irreducibility.
Whatever definition of emergence you end up going with, it shouldn't be ignorance based. And that includes explanations of emergence that involve things like, well, the only way I could have discovered this was by simulating it, therefore it is an emergent phenomenon. I don't even like that.
I am more sympathetic to that one, but I prefer definitions of emergence that are more pragmatic. This is why I like downward causation as a fundamental feature of emergent behavior. Mostly because downward causation is not only a fairly rigorous definition of when you can say a phenomenon is emergent, it also comes with a practical tool. It tells you you don't need to model the microscopic phenomenon.
Last time we spoke about Lenia and the Game of Life, didn't we?
Yeah, I'm still playing with that, by the way. Are you? One of my favorite Lenia simulations, and this is not particle Lenia, this is traditional Lenia, is this: they have a field, and there are obstructions, squares and circles and things like that. And then they have these little creatures that are like amoeba-like swimmers, they've got fins in the back and everything, and they swim, and they'll hit one of these obstructions, which causes them to deform, and it kind of looks like, oh, it's gonna die, that's so sad, and then it reforms and becomes itself again. And so we thought of this as a really nice abstract environment in which to test some of the properties of the physics discovery algorithm.
Because one of the nice things about the approach we've taken is that as a little swimmer goes and hits something, it's possible that it loses its identity when it deforms into something new and then reforms into itself. And we wanted to see if the approach we've taken captures that. And it more or less does, right? It hits the obstruction, it changes its identity into an object of a different type, then reforms and comes out the other side, and regains its identity.
Fascinating.
Quick aside: we spoke last time about Alexander Mordvintsev, who had this convolutional cellular automata with the gecko, you know, the self-healing gecko. And he has now written a new paper with his friends at Google, and it's using logic gates. So it's like an emergentist logic gate thing that draws a Google logo. I haven't read it in detail, but it looks amazing. So definitely look at that.
And now you're taking your system and you're applying it in something like a game of life, basically. But you still expect it to work.
Yeah, well, so there are forces in Lenia, right? The rule that causes the pixels to change has a few properties. It's radially symmetric. Yep. Right?
And it can flip sign, but basically any radially symmetric rule can work. And so it has a polarity, and you can think of it, I mean, it is a force in a sense, right? And it's even a force that's kind of like real forces, like a weird kind of charged particle thing. And so I still think the approach that we're taking applies: it's basically just discovering what the effective forces are. We're not worried about the microscopic forces.
We don't care. Right, that's the whole point of a macroscopic physics. You know that there are microscopic forces that govern the behavior of the system as a whole, but what you're interested in are the things that make predictions on the scale you care about. And so what you're doing is discovering the effective rules that describe the interactions, not just between the pixels that make up the little floater or flyer, but the rules that govern its interactions with other floaters, or with physical objects like the obstructions they put in the domain, and things like that. And Keith really loves cellular automata because they are Turing complete. And they have this miraculous ability to arbitrarily expand their memory. So you can have a grid of a certain size, and you can just keep adding more memory, and you don't have to train the thing from scratch. And using some of the approaches we've just been talking about, you can actually learn the update rules with stochastic gradient descent.
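For concreteness, here is a minimal Lenia-style update step with a radially symmetric kernel and a growth mapping that can flip sign. The kernel shape and all parameter values are assumed, textbook-style choices; this is not the particular simulations being discussed, and certainly not the learned update rules.

```python
import numpy as np
from scipy.signal import convolve2d

def radial_kernel(radius=13):
    """Radially symmetric kernel: the 'force' depends only on distance."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r = np.sqrt(x**2 + y**2) / radius
    k = np.zeros_like(r)
    inside = (r > 0) & (r < 1)
    k[inside] = np.exp(4.0 - 1.0 / (r[inside] * (1.0 - r[inside])))  # smooth bump
    return k / k.sum()

def growth(u, mu=0.15, sigma=0.015):
    """Growth mapping: positive near mu, negative elsewhere (it flips sign)."""
    return 2.0 * np.exp(-((u - mu) ** 2) / (2 * sigma**2)) - 1.0

def step(world, kernel, dt=0.1):
    """One continuous Game-of-Life update: convolve, grow, clip to [0, 1]."""
    u = convolve2d(world, kernel, mode="same", boundary="wrap")
    return np.clip(world + dt * growth(u), 0.0, 1.0)

world = np.random.rand(128, 128)
K = radial_kernel()
for _ in range(100):
    world = step(world, K)
```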
So do you think in the future we might actually have an AI system which is running inside the cellular automata?
That is a very good question. So the snarky response is to say, don't we already? I mean, we got it running on a computer, and at the end of the day, a computer is just a whole bunch of logic gates. Isn't it already cellular automata?
Well, it's in the same class of algorithms. Yeah. But a cellular automaton seems to have this emergentist thing. What it does is not how it's programmed. And it feels like there's a trick: the way it's programmed is an order of magnitude less complicated than the thing it does.
So it feels like a magical bridge to do stuff which is more complicated than we could explicitly program or learn. I agree, but that also sounds a lot like a computer.
Now, I guess the difference is that with a computer, you program it, you tell it exactly what to do, you specify something. Whereas with a cellular automaton, if you're training it to do something in particular, for example, to produce a bunch of discrete objects that move in a certain direction, then you're allowed to tweak the rules that govern the local interactions until you get something that more or less does that. That's just programming in a sort of backhanded way. Yeah. But those systems are very interesting, because it is remarkable that really dumb, simple rules can lead to really interesting, sophisticated behavior.
The thing that I find interesting though, isn't the fact that like complicated stuff can result from like simple local rules. What I find interesting, what I'm more interested in rather is like, well, what are the properties of the resulting large scale objects? Right? How is that related to the small scale objects? What's the mathematical description of those big things?
The things that have emerged. I'm less interested in how they precisely emerge. This probably is because of my bias for taking a human cognitive approach. When most people look at the Game of Life, they think, oh, that's really cool, look at these pretty pictures and all these little creatures doing fun things.
They don't really care about the low level rules. Right? The thing that captures their imagination is the high level, the macroscopic level behavior of these things. Though it is cool that you can get them from simple rules.
Yes. Yes. No, as you say, we can program computers, but there's the legibility ceiling. We could do program synthesis. Doesn't work very well.
It will. I have confidence in that; it's not one of those things that I'm gonna outright poo-poo. Yes. It's a relatively new area, it's still pretty unexplored, and it has a lot of promise.
That's what I'm going to say. And to some extent, the approach that we're taking is compatible with program synthesis, right? We're taking this object centered description of the world. And the reason we're doing that is because we want to automate systems engineering. Well, what's systems engineering?
Oh, that's like taking this object and attaching it to this one, and attaching that to another, until you get something that does something really cool, right? Program synthesis is an abstract way of doing that: you start with one program, you attach it to another program, and so on and so forth.
There is this problem of just understanding the program. I mean, I'm going back to DreamCoder and I'm sure Kevin and Josh have put other ones out more recently. Some of the programs which are learned are just really complicated. They had examples of like, I think drawing towers and drawing graphs and stuff like that. And you just saw this huge confection of rules that are being composed together.
And it's great, it has many good properties that it's a program, but it doesn't really make sense to us.
Yeah, to a large extent, I suspect that there are ways around that, related to how your AI coding agent actually works. So for example, when they're doing this program synthesis, what they don't currently have access to is the kind of dataset that GitHub has access to. They don't have access to a whole bunch of really well written programs that do exactly what they were intended to do. There was a paper in Nature, and this was actually one of those situations where neuroscience is making interesting statements about machine learning, from Tony Zador.
What they had done is they'd taken a whole bunch of neural networks that did a variety of different things, and then they came up with a way of genetically encoding them. The idea is: I had a layer that did this, and then a layer that did that, and I'm going to compactly represent the weights in each layer and come up with a representation of that. And then I'm going to look at a whole bunch of different neural networks that solve a whole bunch of different problems and ask, are there any patterns present in these networks such that, when I have a new problem I'm interested in, I can take something that understands this genetic code, maybe mutate it a little, and sensibly traverse the space of possible neural networks until I find the best one. Program synthesis could in principle exploit the same trick. They just need the dataset to do it.

What are humans in a world where everything can be done by a robot?