Pedro Domingos, author of the bestselling book "The Master Algorithm," introduces his latest work: Tensor Logic - a new programming language he believes could become the fundamental language...
Tensor logic unifies not just symbolic AI and deep learning, it also unifies things like kernel machines and graphical models. I'm Pedro Domingos, I'm a professor of computer science at the University of Washington in Seattle and a long-time machine learning researcher. And my dream from my PhD onwards has always been to unify all the different paradigms of AI into a single one. My PhD unified two of them, and my best-known research unifies a couple of others. I wrote a book about precisely this goal and where we are towards it, which surprisingly turned into a big bestseller: The Master Algorithm.
And my latest work, which this podcast will talk about, is a new language called Tensor Logic that I would say for the first time brings this dream of a unified representation, a unified solution to AI, within reach. So if you wanna find out how we're gonna do that, watch this podcast.
You know, you can set the temperature of GPT to zero and it still hallucinates. And I can have a pure deductive system that hallucinates all kinds of things. So to me, like, those are separate problems.
No. Very good. So precisely the problem, or one of the problems, with GPT is that it hallucinates even when you set the temperature to zero. What the hell? Right?
I want to have a mode.
Yeah. Yeah. Yeah.
Right? Not I, but like every Fortune 500 company, if it's going to use AI, needs to have a mode where the logic of the business is just obeyed. The security isn't violated, the customer doesn't get lied to, etc. We've got to have that or AI at the end of the day will not take off, right? And transformers can't do that.
Tensor logic can do that precisely because in this reasoning-in-embedding-space mode that I just described, if you set the temperature to zero it does purely deductive reasoning, and by the way the temperature can be different for each rule. Tensor logic is just based on this, to me, gobsmacking observation that an einsum and a rule in logic programming are the same thing. There is this thing called predicate invention, which is discovering new predicates, discovering new relations that are not in the data but that explain it better. I would say that, you know, in some sense discovering representations like that is the key problem in AI, is the holy grail. What was Turing's achievement that we now take for granted?
Turing's achievement, for which he is deservedly famous, right, was to postulate this notion of a universal machine, which in his time was a completely counterintuitive notion. The amazing thing about computers is that they are universal machines. What do you mean, a machine that can do everything? The typewriter can type, you know, the sewing machine can sew. You're telling me there's a machine that can type with one hand and sew with the other?
What are you talking about? So like, this is the genius. Right? So first step, you want to have this property of having a machine that can do anything. What we're missing, to be able to do what the universe does and evolution does, is universal induction.
What is the Turing machine equivalent for induction for learning? That's what I'm after.
MLST is supported by CyberFund. Link in the description.
Hey, folks. I'm Omar, product and design lead at Google DeepMind. We just launched a revamped vibe coding experience in AI Studio that lets you mix and match AI capabilities to turn your ideas into reality faster than ever. Just describe your app, and Gemini will automatically wire up the right models and APIs for you. And if you need a spark, hit I'm feeling lucky, and we'll help you get started.
Head to ai.studio/build to create your first app. The idea of having to traffic in squishy people in order to make our systems go is not immediately appealing. Let's put it that way.
This episode is sponsored by Prolific.
Let's get a few quality examples in. Let's get the right humans in to get the right quality of human feedback in. So we're trying to make human data, or human feedback... we treat it as an infrastructure problem. We try to make it accessible.
We make it cheaper. We effectively democratize access to this data. I'm a long time fan of Machine Learning Street Talk, in fact I was a fan of it before it was big. Just like I was doing deep learning before it was big, so very close analogy. So you should definitely watch Machine Learning Street Talk, it's one of the best ways to not only learn about machine learning but find out about what's going on at a deeper level than you see everywhere and that is very important, so you should definitely subscribe to Machine Learning Street Talk.
Professor Pedro Domingos, it's amazing to have you back on MLST. I've lost count of how many times we've had you on the show now, so it's amazing to have you back. The main reason that we've invited you today is you've just released a brand new paper, a very exciting paper called Tensor Logic: The Language of AI. And fields, you said, take off when they find their language. So you gave the example of calculus in physics and, you know, Boolean logic when designing circuits.
What's the idea behind this paper?
Well, tensor logic in many ways is the goal that I've been working towards my entire professional life, because I really do strongly believe that a field cannot take off until it has really found its language. And tensor logic, I believe, is the first language that really has all the key properties that you need in the language of AI. For example, it has automated reasoning right out of the box, like, for example, Prolog has, right? The classic AI languages had a number of things that we just took for granted. Transparent and reliable reasoning you didn't even have to worry about; it was just already available, right?
At the same time, you don't have that in PyTorch at all, right? You have all these hacks to try and do reasoning on top of it. At the same time, the Lisps and the Prologs never had the auto-differentiation, the ability to learn. One of the beauties of the current moment in many ways is that, if you look at most papers, people barely talk about the learning because it's already implemented under the hood, so you want that as well, right? And scalability on GPUs is the other thing that things like PyTorch and TensorFlow and whatnot give you. There was no language before that had all of these; there's a number of others, but these maybe are some of the key ones.
So tensor logic is basically a language which as the name implies, is a marriage, a very deep unification, not just some superficial combination of the tensor algebra that deep networks are all built out of, and the logic programming that symbolic AI is built out of. There's only one construct in tensor logic and it's the tensor equation. You can do everything with tensor equations.
Are you saying that there's only one language of AI? Because certainly in some fields, like physics, you gave the example of calculus. I mean, almost all of physics involves quite a bit of calculus. There are other fields where actually there are multiple languages that play, you know, almost equal roles. So I'm wondering if you think that tensor logic is gonna be 85, 90-plus percent of the way that we should be talking about and thinking about AI, or will it be kind of a mixture of different languages?
That is a very good question. And in fact we know very well in computer science that there's no one programming language that is better for everything; there are just people who think there is. Right, everybody has their favorite language that they believe, you know, is the universal solvent, but it never really is. And we also know, for fundamental reasons, you know, going back to Shannon and whatnot, that there is no language that is the most pithy for anything you might want to say. Right?
Having said that, physics is a good example because calculus is so fundamental, you know. Feynman famously said that he thought in calculus. Right? And the thing that I found with tensor logic is that, you know, I don't know how much of AI it's going to be or how much it should be, but what I have found, in many ways to my surprise, is that in some ways tensor logic is more than just a programming language. It really, I think, captures the fundamentals of what you need in AI in a way that going in I didn't even think was possible.
All of tensor algebra can be reduced to this operation which, you know, going back to physics, is called the Einstein summation. Right? Einstein summation was something that was introduced by Einstein when he was working on relativity and got tired of writing summation signs. It was all about tensors, right? General relativity was all about tensors.
And he jokingly called it his great contribution to mathematics. But the bottom line is, and you know there's this great blog post by Tim Rocktäschel saying Einsum is all you need, and truly you can do all of deep learning with just einsum. All the matrix multiplications and tensor products, all of that are instances of einsum, on the one hand. On the other hand, in symbolic AI, it's all about rules.
Right? And tensor logic is just based on this, to me, gobsmacking observation that an einsum and a rule in logic programming are the same thing. They are actually the same thing; the only difference is that one is operating on real numbers and the other one is operating on Booleans, but that's just a different atomic data type. Right? And then on top of that, so to summarize: at this point I look at all the different things that I and others have done in AI, and I think it would be crazy to not do these things with tensor logic.
There may be better things coming after, you know, but at this point I would say tensor logic is probably better for what people are doing across the board. But hey, that's me, I may be a little biased.
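To make that observation concrete, here is a minimal numpy sketch (not the paper's implementation; the relation names and the tiny universe are invented for illustration) of how a Datalog-style rule behaves exactly like an einsum over Boolean tensors:

```python
import numpy as np

# Hedged sketch of the claimed einsum/rule correspondence.
# Rule: Uncle(x, z) <- Brother(x, y), Parent(y, z)
# Encode each relation as a Boolean matrix over a tiny universe {0, 1, 2, 3}.
n = 4
Brother = np.zeros((n, n), dtype=bool)
Parent = np.zeros((n, n), dtype=bool)
Brother[0, 1] = True   # 0 is a brother of 1
Parent[1, 2] = True    # 1 is a parent of 2

# The einsum sums out the shared variable y (a projection over a join);
# thresholding the count back to Boolean recovers the logical rule.
counts = np.einsum('xy,yz->xz', Brother.astype(int), Parent.astype(int))
Uncle = counts > 0

print(Uncle[0, 2])  # True: 0 is an uncle of 2
```

On real-valued tensors the very same einsum is a matrix product; on Booleans (plus the threshold) it is the logic-programming rule, which is the identification the language is built on.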
First of all, shout out to Tim Rocktäschel; I read that blog post from 2018 earlier that you were referring to. But I suppose the thought occurs that if it is mostly about einsum, you might make the argument: why do we need an abstraction when we already have a great abstraction in einsum? So folks now can use PyTorch and, you know, JAX. What exactly does your abstraction allow them to do that they can't do with PyTorch?
No. Very good. It does several things. So first of all, and this is going to be in increasing order of importance, the syntax of einsum in these languages, and there's also this package called einops, is incredibly clunky. So at a very basic level, tensor logic is just a much pithier, more compact, easier to write and understand way to write einsums.
And you know, physicists and mathematicians famously like to say that a good notation is half the battle. So this might not seem like a big deal, but my experience is that you can just think better and faster once you have this notation, instead of this funky procedure call with these indices and these arrows and these arguments. That's a nightmare, and the syntax of tensor logic is that you write an einsum like you would write a rule: a tensor equation with a tensor on the left-hand side and a join of tensors on the right-hand side. Right?
So this is one aspect. Another very important aspect, and one that I think could prove decisive, is that people don't use einsum much because it's not very efficient. Under the hood it's not as efficient as it could be; you know, this could be done so much better, right? I've done some programming with it, and even I wound up not using einsum because it's so slow and clunky. And all of that can be fixed.
Once you have this one abstraction of the tensor equation, and you implement it on CUDA for example, you can optimize the heck out of it, and einsum will finally be able to reach its potential, right? But actually none of these things is the most important part. The most important part is that einsum as we know it is only good for tensor algebra. Tensor logic is a language where the same construct does all the symbolic and all the numeric parts, and any mix and variation between them, including learning the symbolic part and whatnot. These are all things that in an einsum world just didn't exist, right?
You talk to the people in the einsum world, whether in AI or mathematics or physics, and they just had no idea that any of this had anything to do with reasoning. You look at all the ways that people are trying to do reasoning today and you just want to pull your hair out.
Let me ask a very concrete... you know, in some sense, I'm a simple man. I need, like, a very concrete example, because I completely agree with you that the symbols we use, the language we use, their simplicity is so fundamental to our ability to, like, reason at higher and higher levels. So let's take one example from your paper, which is: a logical OR of a bunch of values is equivalent to an einsum with a Heaviside function applied to it. Like, you give this example, right?
So to be precise, and maybe this is an important piece of context: the simplest form of logic programming is Datalog, right, which is the foundation of databases. Most SQL queries are variations on Datalog rules. And Datalog rules are composed of two things, joins and projections. This is like Databases 101. And what I have done is I have generalized join and projection to numeric values.
There's this thing which I defined called the tensor join and the tensor projection, which when the tensors are Boolean become the regular symbolic database ones. But the numeric versions have all these things as special cases, and by the way they're also more general than the einsum, right? So another benefit of this is that it actually goes beyond the einsum. An OR, right? How do you get an OR in Prolog or Datalog? By having, you know, multiple rules with the same head.
And then those rules are implicitly disjoined. Alright? So if I have A if B, C and A if D, E, then that means A if (B and C) or (D and E). And the same thing happens here. And you could also of course just put them all in the same equation, because it might be more convenient to say, well, A equals B times C plus D times E, right?
So doing an OR is a completely straightforward thing, but it's really not where the main action is. It's in the tensor joins and tensor projections.
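As an aside for readers, here is one way to spell out the join and projection being described, as a hedged numpy sketch (the function names `join` and `project` are ours, not the paper's): the join is a pointwise product on shared indices, the projection sums an index out, and together on two matrices they reproduce an einsum.

```python
import numpy as np

# Minimal sketch of "tensor join" and "tensor projection" as described:
# join = pointwise product on shared indices, projection = summing an index out.
# On Boolean (0/1) tensors these reduce to the database join and projection;
# on real tensors they generalize them.

def join(A, B):
    # A[x, y] joined with B[y, z] on the shared index y -> C[x, y, z]
    return A[:, :, None] * B[None, :, :]

def project(C, axis):
    # Sum out one index, e.g. the shared y
    return C.sum(axis=axis)

A = np.array([[0, 1], [1, 0]])
B = np.array([[1, 0], [0, 1]])

# join then project is exactly einsum('xy,yz->xz', A, B), i.e. A @ B
C = project(join(A, B), axis=1)
print(C)
```

With Boolean inputs and a final threshold back to {0, 1}, this is the Datalog join-and-project; with real inputs it is ordinary matrix multiplication.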
Thank you for laying all that out. Completely agreed. Makes sense. The very example I was giving is that an einsum over a particular index of Boolean values, with a Heaviside function then applied to it, which is just zero if the input is zero and one if it's greater than zero, is equivalent to a logical OR over that same index.
Oh yes, sorry, I understand your question. So again, it's more than that, it's a DNF. DNF is disjunctive normal form. Exactly. And so what happens with the einsum, so in numeric land, think of a dot product, right?
A dot product is just a sum of products, right, which in Boolean land would be a disjunction of conjunctions. If more than one conjunct is true you get a number that is greater than one, which is why you need to pass it through a step function to reduce all the values greater than one back to one, yes.
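The equivalence under discussion fits in a few lines of numpy; this is just the example from the conversation made executable, nothing from the paper itself:

```python
import numpy as np

# An einsum over an index of Boolean values, followed by a step (Heaviside)
# function, equals a logical OR over that index.
X = np.array([[0, 1, 1],
              [0, 0, 0]])                     # two rows of Boolean values

sums = np.einsum('ij->i', X)                  # sum out index j (counts the 1s)
step = np.heaviside(sums.astype(float), 0.0)  # 0 stays 0, any positive count -> 1

print(step)                                   # [1. 0.]
```

The first row has at least one 1, so its OR is 1; the second is all zeros, so its OR is 0, exactly what the step of the sum gives.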
But my question was more like: okay, I have these two different representations of the same operation. At the element level, an einsum over an index followed by a Heaviside is equivalent to an OR over all the Boolean values of that index, per element. So my question to you is: I give up one thing, which is instead of having a single symbol, which is kind of like an OR, I've now got two operations, you know, einsum and Heaviside. And there are many examples of that. Right?
Like, you can build every single circuit out of NAND gates. I think we discussed this once, actually. Yeah. Exactly. Or I can have other kinds of gates, and it's useful to have other kinds of gates.
So in your language, do you foresee people not having syntactic sugar like an OR operator, with the einsum and step function under the hood, or would they still retain those? It's just that the fundamental, most basic constructs of the language are tensor equations.
We can do everything with NAND, so why do we need high-level programming languages at all, right? The point is, there are two things that you want a language to be. First of all, you want it to be universal. For some things you don't, but in general, for AI, surely you want a universal language. You want something Turing-complete, and tensor logic is that. But then comes what is actually the most important and most difficult part.
You want something that is at the right level of abstraction for the things that you want to do. And NAND definitely is not. I can show this with a lot of examples I have in the paper; for example, you can code the transformer in a dozen tensor equations, as opposed to a vast mass of code. And then what happens when people have a language that suits their needs is that they just get used to it; they often wind up using it even for things that it wasn't the perfect thing for, but at that point it's what they're comfortable with. So my guess is, at the end of the day, people are just going to do everything in tensor logic, and they know in the back of their heads that yes, there are ORs going on here and you could think of them as ORs, but they just think of them as joins and projections in tensor equations.
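To give a flavor of "the transformer in a dozen tensor equations," here is a hedged sketch of just the attention core written einsum-style in numpy. These are not the paper's actual equations; the dimensions and weight names are invented, and single-head attention is shown for brevity:

```python
import numpy as np

# Attention as a few einsum-style tensor equations (illustrative sketch).
rng = np.random.default_rng(0)
d, n = 8, 5                          # embedding dim, sequence length
X = rng.normal(size=(n, d))          # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q = np.einsum('nd,de->ne', X, Wq)    # Q[n,e] = X[n,d] Wq[d,e]
K = np.einsum('nd,de->ne', X, Wk)
V = np.einsum('nd,de->ne', X, Wv)

S = np.einsum('ne,me->nm', Q, K) / np.sqrt(d)          # attention scores
A = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)   # softmax over rows
Y = np.einsum('nm,me->ne', A, V)     # attention output

print(Y.shape)
```

Each line is one tensor equation: a tensor on the left, a join of tensors with shared indices on the right, which is the rule-like form being described.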
Very good. And by the way, you can implement transformers and anything else in tensor logic. It's so easy, in fact, that I fed your paper into Claude Code, and I got it to implement the whole lot this afternoon. Maybe I'll publish that on GitHub if folks wanna have a look, but it's quite straightforward. But just to get the trajectory a little bit here, Pedro: you're famous for writing this Master Algorithm book.
And in that book, you spoke about all of these different tribes in machine learning, you know, like Bayesian folks and logic folks and kernel methods and neural networks and all of this. And I guess, do you see this as a step towards unifying these things together? Because now in tensor logic, you can actually create a composition of different modalities of AI and it just works. But this might seem a bit weird to people. I mean, can you explain what that might actually look like?
Absolutely. So in a way the master algorithm was laying out my agenda, right, was asking the question, what is the master algorithm? I did say at the outset, I'm not going to give you the master algorithm in this book, I'm just going to tell you where we are and why I think this is the central goal of AI. I would say that tensor logic is that answer. Tensor logic, we haven't talked about that yet, but tensor logic unifies not just symbolic AI and deep learning, it also unifies things like kernel machines and graphical models.
The things that graphical models are built out of, which you then compute probabilities with, are a direct fit. I didn't do this on purpose, it just fell out. You know, the factors that graphical models are made of, those are just tensors. And then the marginalizing summations and the pointwise products that probabilistic inference is made of, they are just tensor joins and projections on those tensors that represent the potentials, or in the case of Bayesian networks, the conditional distributions. So at this point we do have this very simple language where you can do the entire gamut of AI, which honestly I didn't think was going to be possible going in. I thought the answer would be much more complicated.
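That claim, inference as pointwise products plus marginalizing sums, can be checked on a toy Bayesian network; this three-variable chain is our own illustrative example, not one from the paper:

```python
import numpy as np

# Probabilistic inference = joins (pointwise products) and projections
# (marginalizing sums) over the factor tensors (here, CPTs).
# Chain Bayesian network A -> B -> C; marginalize out A and B in one einsum.
Pa = np.array([0.6, 0.4])                 # P(A)
Pb_a = np.array([[0.7, 0.3],              # P(B | A), rows indexed by A
                 [0.2, 0.8]])
Pc_b = np.array([[0.9, 0.1],              # P(C | B), rows indexed by B
                 [0.5, 0.5]])

Pc = np.einsum('a,ab,bc->c', Pa, Pb_a, Pc_b)   # sum_{a,b} P(a)P(b|a)P(c|b)
print(Pc)
```

The single einsum multiplies the three conditional-probability tensors on their shared indices and sums those indices out, which is exactly variable elimination for this chain.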
Now, is this the master algorithm? Tensor logic per se is not the master algorithm, because it's just a language. I would say that it's the scaffolding on top of which you can build the master algorithm. Now, tensor logic is not just a language; it's also the learning and reasoning facilities under the hood. For example, one of the best things about tensor logic is that the autograd is incredibly simple.
Because there's just one construct, the tensor equation, and the gradient of a tensor logic program is just another tensor logic program. So this is all there. The learning and the reasoning are all there. However, what I would say is that this is not the master algorithm per se, but it's what we need to produce it, and I intend to produce it in short order.
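The point that the gradient of a tensor equation is itself a tensor equation can be seen in miniature; this numpy sketch (our example, with an arbitrary sum-of-outputs loss) checks it against a finite difference:

```python
import numpy as np

# For Y[i] = einsum('ij,j->i', W, x) and loss L = sum(Y), the gradient
# of L with respect to W is itself an einsum: another tensor equation.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)

Y = np.einsum('ij,j->i', W, x)
dL_dY = np.ones_like(Y)                    # gradient of L = Y.sum()
dL_dW = np.einsum('i,j->ij', dL_dY, x)     # the backward pass is an einsum too

# Sanity check one entry against a finite-difference estimate.
eps = 1e-6
W2 = W.copy(); W2[1, 2] += eps
fd = (np.einsum('ij,j->i', W2, x).sum() - Y.sum()) / eps
print(abs(dL_dW[1, 2] - fd) < 1e-4)
```

Since every equation's gradient has this same form, differentiating a whole program of tensor equations yields another program of tensor equations.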
You've described a language, and certainly if components of that language are Turing-complete, that's a big vexed issue. We'll come back to that a little bit later. But because of computational equivalence, from an expressibility point of view we can describe anything in the universe. So we've got this framework. But to me, the challenge in AI is structure learning.
Right? So as well as being able to express stuff, it's being able to adapt to novelty and perhaps create, from building blocks that we already have, a new structure that allows us to do something useful in that domain. And I can't quite make that leap with your technology yet. So how do we do the meta thing, where we actually build the tensor logic constructions to represent the kind of world that we're seeing?
Oh, very good. So I actually go into that in the paper, though briefly; the paper is just an informal introduction to these ideas. Inductive logic programming is the field that deals with discovering rules from data. But it does this by things like greedy search or beam search, and it's a very large search space and it's extremely inefficient, right? Which is actually one of the things that killed it, even though it could do all these things that people in deep learning are now just painfully rediscovering. In tensor logic, and this is one of the best parts of it, the structure learning falls out of the gradient descent.
The gradient descent actually does structure learning. And then on top of that, and this is actually the best part as far as the learning is concerned, there is this thing called predicate invention, which is discovering new predicates, discovering new relations that are not in the data but that explain it better. I would say that in some sense discovering representations like that is the key problem in AI, is the holy grail. When you look at the world, right, you don't see pixels, you don't see photons hitting your retina; you see objects. The objects are invented predicates.
All the way up to science, right? Newton's genius was to introduce a new quantity, which is force, and then energy, and entropy, and so on, right? So in tensor logic that also just happens by gradient descent. It's hard to believe, but let me just give you a hint as to why this is the case. There's this other thing that is folded into tensor logic, which is tensor decompositions, right? And tensor decompositions are generalizations of matrix decompositions.
And if you think about matrix decompositions, to take that simple case: what a matrix decomposition does is take a matrix and decompose it into two new matrices that together are more compact but essentially reproduce the same data, right? And there's a generalization of that to tensors called the Tucker decomposition; there are others, but the Tucker one is the most relevant here. So to answer your question very directly: if you write in tensor logic a rule schema, with a data tensor on the left-hand side (and by the way, your entire data can just be reduced to one tensor, we can touch on that later), and you write a rule expressing that as a function of a few other tensors, then gradient descent, just as in matrix factorization, will discover the best values for those. And then if you want to, for example, discretize it, say I'm going to threshold this and make it Boolean again, you will see what concept it learned, or you can leave it in numeric form. So the learning is actually extraordinarily powerful.
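As a toy version of the matrix-factorization case just described (our own illustrative setup, with plain gradient descent rather than anything from the paper), factoring a Boolean relation matrix discovers a small latent index that plays the role of an invented predicate:

```python
import numpy as np

# Structure discovery by gradient descent: factor a Boolean relation M
# into U @ V with a small inner dimension k. The learned latent index
# groups the rows/columns, i.e. acts like an invented predicate.
rng = np.random.default_rng(0)
M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)   # two obvious latent groups

k = 2
U = rng.normal(scale=0.1, size=(4, k))
V = rng.normal(scale=0.1, size=(k, 4))

for _ in range(5000):                        # plain gradient descent on ||UV - M||^2
    R = U @ V - M                            # residual
    gU, gV = R @ V.T, U.T @ R
    U -= 0.05 * gU
    V -= 0.05 * gV

# Thresholding back to Boolean reveals the discovered structure.
print((U @ V > 0.5).astype(int))
```

Thresholding the reconstruction recovers M exactly here; leaving it numeric keeps the soft version, the same choice described above for tensor logic rules.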
I've always thought, and I think a lot of people in deep learning really believe this, that gradient descent can do amazing things provided you give it the right architecture to operate on. And in a way, what all these million papers are about is finding the right architecture for gradient descent to operate on. Of course transformers are a great leap forward, but I think tensor logic is an even greater leap forward.
How so? Because, for example, I can picture... suppose we wanna get rid of Python. So like, I'm over here in PyTorch and I've described all my layers in the clunky syntax. And instead, I'm like, no, now I have, you know, the tensor logic programming language from GitHub.
Let me go there. I'm still going to construct my layers, right? Because, for example, you do of course allow for nonlinearities, right?
So after every einsum, I can apply whatever kind of nonlinear function I want, ReLU or sigmoid or whatever else, right? That's still gonna be described in my program. It's like, I'm gonna have this shape einsum followed by this nonlinearity, which feeds into this shape, followed by... so I'm still gonna have to do that kind of structuring of the network, if you will, except now in tensor logic. And in my opinion, that's one of the biggest limitations right now: these are all just divined incantation structures that people have come up with. Like, let's put in a dropout layer here and this kind of layer there and there.
We don't actually allow the machines to learn the overall topological structure. We only allow them to find weights within that structure.
No, but okay, I understand your question, but tensor logic does allow that. You know, step one: you can encode a multilayer perceptron, the entire multilayer perceptron, and I do that in the paper, with a single tensor equation. All the layers, provided that they all use the same nonlinearity, can be encoded in one equation. Okay, that's number one. You can also have different equations for different layers, or typically sets of layers, however you please.
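The "whole MLP in one equation" idea can be sketched by adding a layer index to the weight tensor, so every layer is one and the same equation; this numpy version is our illustration (equal layer widths assumed for simplicity), not the paper's notation:

```python
import numpy as np

# One equation for the whole MLP: with a layer index l,
# X[l+1] = f(einsum('ij,j->i', W[l], X[l])) for every layer.
rng = np.random.default_rng(0)
layers, width = 3, 4
W = rng.normal(size=(layers, width, width))   # one weight tensor, indexed by layer
x = rng.normal(size=width)

relu = lambda z: np.maximum(z, 0.0)
for l in range(layers):                        # the single equation, iterated over l
    x = relu(np.einsum('ij,j->i', W[l], x))

print(x.shape)
```

The iteration over `l` is just the equation instantiated at each layer; nothing about the equation itself changes from layer to layer.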
But from the point of view of structure discovery, the thing to realize is that if you set up one of these very general equations that you can in tensor logic, then in some sense it covers a very broad class of architectures. Then what the learning does is discover the architecture within that space. Right? Which, if you think about it, at some level is what a neural network does. Compare an ordinary multilayer perceptron with a set of rules, right? In fact, there was a system called KBANN in the early days that did this, very clever.
It initialized a multilayer perceptron with a set of rules, because each neuron is a rule, right? But it's also more flexible, because now you can have weights, right? A single neuron can represent a conjunction, and therefore a layer can represent a disjunction, and so forth. So when you're learning weights in an ordinary neural network, you can actually see it as learning the structure of a set of rules. What tensor logic is doing is this at a more powerful level: that was just propositional, and now this is at the full level of generality of first-order logic.
But you can learn the structure, and then of course there's more than one way to do that, and you can also decide how black-and-white you want the structure to be, what you want to leave as weights, what you want to discretize. But the structure itself can be learned by taking a tensor equation. A tensor equation is a very general thing, right? When you learn the weights of those tensors, that materializes into a specific network structure.
Yeah, so I understand that. Let me bring this back to the folks who are familiar with PyTorch or traditional techniques. What you described is: yeah, I can just create a fully connected network with however many layers I want and then let SGD find all the weights. That doesn't work. Like, it doesn't work in practice, and it's not gonna work with tensor logic.
You know, it's just a different representation of the same fundamental problem, which is: there's too many degrees of freedom. It's not gonna learn anything useful. This is why so much alchemy goes into structuring constrained networks to have certain built-in inductive biases, right?
No, absolutely. So to take another example, you can also do an entire ConvNet in just one tensor equation. And the quintessential example of yes, full connectivity doesn't work, is the multilayer perceptron for vision, right? Which you replace with a ConvNet that actually has a local structure. That is also a tensor logic equation.
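For concreteness, a convolutional layer can indeed be written as one einsum over sliding windows; this numpy sketch is our illustration of that claim (single channel, no padding or stride, for brevity):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# A 2-D convolution layer as one einsum over sliding windows:
# the local structure shows up as shared window indices.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))                   # single-channel image
K = rng.normal(size=(4, 3, 3))                  # 4 filters of size 3x3

patches = sliding_window_view(img, (3, 3))      # shape (6, 6, 3, 3)
out = np.einsum('xyij,kij->kxy', patches, K)    # conv as join + projection

print(out.shape)
```

The einsum joins each patch with each filter on the window indices `i, j` and projects them out, so the whole layer really is one tensor equation.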
Now you're saying, well, how do you choose between the ConvNet and the MLP, right? Very good question, and there's a range of things you can do. You can actually, these days, start out with a very general structure, because GPUs and large server farms provide an amazing amount of power for something like this, right? So you can almost, I would say, brute-force that search, provided you have the data. I'm not actually recommending you do that, right? More interestingly, and this is actually one of the key benefits of tensor logic, you can write down what you believe are properties of the structure. Right now, when you for example program a network in Python, you have to commit. You say, here's the structure, and now the only thing that happens is the learning of the weights.
In tensor logic you don't have to do that. You can set up one of these very general structures and then say: let me give you a bunch of equations that are things I believe to be true about the structure but do not completely determine it. And those just work like priors, and indeed like soft priors, right? And then you can turn the temperature on these up or down, and say, you gotta obey this equation and that one, or, you know, you can override them. And then, in my experience, this is actually what is important: the gradient descent, instead of starting from a tabula rasa, has this kind of soft knowledge.
And then, most importantly, you, the developer or researcher... this is really the essence. Every deep learning researcher or data scientist knows this: you don't just set up the priors and then push the button and hope for the best, right? There's an iterative loop where you set up the structure, you learn, you get the results, and then you refine the structure. And what tensor logic does is make that loop much more efficient, because in your interpreter you just write one more equation or modify an existing equation, and also the entire stack of what you learn is much more interpretable than it was before. This is actually in some ways one of the most important properties of tensor logic: you can understand what's going on much better, in two ways. One is that the code is much more transparent than the whole pile of things you have sitting under a bunch of PyTorch procedure calls, but also the result of learning, at least if you do it in certain ways that I discuss in the paper, is transparent in a way that a transformer can never hope to be.
So we've covered some interesting topics on MLST before. I mean, of course, there's geometric deep learning, which is this idea that symmetries are fundamental. We've spoken with Andrew Wilson from NYU recently about soft inductive priors, and I've just spoken with Yi Ma about his CRATE series of architectures. And I guess the prevalent idea here is almost platonistic: that there are real natural patterns, and if we bias the model, as you were just alluding to, it will converge on really good representations that describe reality.
Now, the alternative view is that reality is constructive and gnarly and that won't work. But you were talking about your Tucker decomposition earlier, and that's this idea that, you know, we might have a large sparse matrix; we might want to densify it, we might want to factorize it, and the factorization will kind of pull out some of these natural orderings of the universe, perhaps.
And I guess I was thinking, isn't it a bit like a gzip algorithm? I mean, what if these factorizations are just semantically meaningless? How do you know that you've got a good one?
You know, great question, and you've touched on several things there. Let me start with geometric deep learning, right? I'm a big fan of this. In fact, I gave a keynote at the second ICLR on something that I called symmetry-based learning, which is in some sense a form of geometric deep learning. I really do think that the universe possesses these fundamental symmetries; actually, I don't just think that.
This is known, right? In physics, the standard model is basically a bunch of symmetries. And it's extraordinarily powerful that such simple things could be such universal regularities that you can then basically build everything else out of, right? And if you think about it, in machine learning the problem is: what is the learning bias that you should start from, right? Should you pull in a lot of knowledge?
Should you have a very, you know, very vague architecture? There's the no free lunch theorem, right, that says if you don't assume anything, you can't ever learn anything. The thing that's amazing about machine learning is that with very weak biases you can get very far, right? And I would submit that the most important of those weak biases, fundamentally, at the end of the day, are these symmetries. And tensor logic is, I think, precisely the perfect language for expressing those symmetries, as the physicists will tell you, right?
It's what they use: not the logical version, but the numeric version. Right? So I think we can discover those regularities. I have some suspicions as to what they might be, but I think we're not quite there yet. But I think once we have those regularities, in some sense they will play in AI the role that the standard model plays in physics. Right?
Now of course, as you say, there are people who say, oh, forget that, right? You know, going back to Marvin Minsky: there is no small set of AI laws or anything, it's just one damn thing after another, blah blah blah, right? Like, you're dreaming, right? And I respect that point of view, and we will find out empirically. But if I had to guess how this is going to play out, at the end of the day it's going to be like this.
The stuff that I'm talking about gives you, you know, the 80/20. It gets you 80% of the way. And then for the other 20%, you have to do a lot of these things, a lot of hacks, etc., etc. But something like tensor logic still makes it much easier and faster to do those hacks than if you didn't have it. So it actually gives you a benefit both in the 80% part and in the 20% part.
There are folks in complexity science, there's this guy called David Krakauer, and in his book, on the first page, the very first sentence is about the scientific and social implications of the differences between (a) closed, reversible, symmetry-dominated, predictable classical domains, which I think is what you're talking about, the kind of Roger Penrose-type world, and (b) open, self-organizing, dissipative, uncertain, adaptive domains. Now, I think the latter is where all the interesting stuff in the universe is. It's where life and intelligence and all the stuff we wanna model live.
And could it be the case that those things are not reducible in the way that you're arguing they are?
I'm glad you asked that question, because this really is the crux of the matter. Also, you're probably familiar, I know you're familiar because we've talked about it before, with Steve Wolfram's notion of computational irreducibility, right? Yes. And of course there's the whole notion, which we now understand very well, that many systems are chaotic and therefore inherently unpredictable, right?
And complex systems and all of that. But then where does that leave the whole notion that more is different, right? The very famous notion from Phil Anderson. Exactly, which I'm a very strong believer in. So doesn't that contradict what I just said?
Actually, no, right? I would say the following applies from physics all the way to AI, with biology in the middle: the universe is basically composed of two things, symmetries and spontaneous symmetry breakings. Right? God made the symmetries.
The symmetries are the laws. As far as we can tell, none of these systems at any level violates the laws. Right? Those symmetries are there. I mean, there's a lot to be said there, but essentially the great majority of people, with some exceptions, believe that the laws of physics apply to everything.
Like, my brain obeys the laws of physics, society obeys the laws of physics. The problem is that the laws of physics are useless at some point in understanding even biology, let alone psychology or sociology or AI. Why are they useless? Because we have inherited from the beginning of the universe a series of spontaneous symmetry breakings, right? And my brain is doing spontaneous symmetry breakings one after another, continuously.
And some of those then die out, right, or become irrelevant, or stay the same. But others balloon into very big things. And that's actually what evolution is: one of these things after another. And once you have that, the computational irreducibility problem is that, although in principle this is all predictable and reducible, in practice it isn't. Right?
But now here's the point: how do we handle that? Our brains know how to handle this in a way that AI doesn't. And the way they handle it is: you predict, you computationally reduce, everything you can to begin with. And I've actually talked with Steve at some length about this, and I'm much more optimistic about how much is reducible than he is. The thing is, the overall universe is not reducible, but it's full of these reducible pieces.
And in a way evolution is cumulative; our brain is an accumulation of these reducible pieces. So you do that: you want the machine learning to discover it, you want the inference to exploit it. But then, after that, you actually have no choice but to just keep gathering data and using that to inform your predictions, right? In a way, the physics goal of, give me the initial conditions and I'll predict everything, the Laplace's demon dream, it is a dream. But I think the thing that some of the complex systems people have not realized is that we don't have to do that. Ask any engineer, any aerospace engineer using a Kalman filter, or think of reinforcement learning, right?
It's like, you wanna have a sense of where you're going, but at every time step you recalibrate your predictions with the new data that comes in. So you actually only need to predict things well enough to control them, to make them predictable. Right? We humans are always controlling the world to make it more predictable, and this is what robots need to do as well. And this is sort of what I'm trying to support with a language like tensor logic.
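The predict-then-recalibrate loop described here is exactly what a Kalman filter does. A minimal 1-D sketch, with made-up constants purely for illustration (this is not from the paper and not tensor logic syntax):

```python
import numpy as np

# Toy 1-D tracking: constant-velocity truth, noisy position measurements.
rng = np.random.default_rng(0)
x_true, v = 0.0, 1.0        # true position and known velocity
est, var = 5.0, 1.0         # deliberately wrong initial estimate and its variance
q, r = 0.01, 0.5            # process and measurement noise variances (made up)

for _ in range(50):
    x_true += v                                  # the world moves on
    z = x_true + rng.normal(scale=np.sqrt(r))    # noisy new observation
    est, var = est + v, var + q                  # predict: project forward
    k = var / (var + r)                          # Kalman gain
    est += k * (z - est)                         # update: recalibrate with data
    var *= (1 - k)
```

Despite starting five units off, the estimate only needs to be "good enough" at each step, because every new measurement pulls it back toward the truth.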
I'm increasingly a believer in Hofstadter's concepts, right? That there are multiple levels of description. And even within a level of description, there may be multiple languages to describe things at that level. And I think part of the lesson is that not only do we observe a particular level, and sure, we try to reduce things and come up with theories at finer-grained levels, higher-resolution theories, whatever, but we also observe a certain layer and we're able, by whatever sort of miraculous mechanism, to almost pull a theory at that level out of thin air, to abduct it.
Like, here's thermodynamics. Somehow we came up with that. Right? And even if we learn theories at lower levels, or higher-resolution theories, most of the time you don't replace the older ones. Within its domain of operation, Newtonian mechanics is still extremely useful for lots of things that have to do with our scale.
Right? Our scale of activity. GR is useful at a different scale, quantum mechanics at yet another. So we retain all these languages. And I'm hearing that tensor logic is a great language for a certain layer of description and for the activities of AI, but you're not arguing that it's the language to replace all other layers, right?
Like, you still buy into the idea that other languages are...
I'm glad you asked that question. I am absolutely arguing that tensor logic is the language to use in all these layers, and let me give you some evidence towards that. Express relativity in tensor logic: it's tensors and differentials of tensors and whatnot, and tensor logic does that out of the box. Do the same thing with quantum mechanics, do the same thing with all these others, with all of the different pieces of AI that I know.
And why is that possible and why does Tensor Logic do that? Again I think this gets at a very deep fact about the universe, which you know, complex systems people and physicists have suspected as well, which is that the universe has this amazing property without which it would not be comprehensible, that you can have a lot of complexity at one level that then organizes itself into a new level at which now a different set of laws applies, right? In a way what we do with computers is do that by design. But here's the key: what you want is a language in which to express this process. The whole process by which multiple levels get created, by which multiple representations get created, including different representations at the same level.
For example, going back to Herb Simon, many people have believed that the essence of human intelligence is your ability to switch between representations as the problem dictates, and that as soon as you pick one representation you've stuck yourself in a box. At that level, tensor logic is a meta-representation: it's a way to construct representations. And, you know, take a large language model as a very salient example: what has a transformer learned when it looks at all that text? I would say a lot of its power comes from the fact that, as Seb Bubeck says, it has learned this soup of algorithms.
There's all these different pieces and different ways of doing things that it has gathered from different places, and it doesn't choose between them. It's the prompting and the fine-tuning and more that then pull out the parts that are better for one thing or another. So we absolutely have to do this in AI, and I think it also reflects a deeper truth about the universe. I think there are going to be laws of this; we're then describing laws of the universe, and I think tensor logic is at least my best attempt at having a language in which to do both this kind of AI and this type of scientific discovery.
I also believe, and I discuss this briefly in the paper, that tensor logic is not going to be just a good language for AI; it's going to be a good language for science in general, for several reasons. One of them is this, but another is that if you look at the difference between the equations on the page and the resulting program implementing them, often there's a lot of complication. In tensor logic, the tensor equation is an almost symbol-for-symbol translation of the equation on the page. So now you can do science on a different level. Also, if you look at scientific computing, it's usually these tensor operations with some logic wrapped around them.
Tensor logic does the tensor operations and the logic in one language, but more importantly, the logic now becomes learnable. You can now learn the logic as well.
Let me just challenge you on this, because, for example, in your paper, when you got to the RNN section, right? You know, tensor logic can represent RNNs, but then you hacked in star t. Like, oh, I need this little star t here. What's star t? Well, star t is a virtual index that doesn't create new memory.
That's not tensor logic. You hacked in star t because you needed that in order to express RNNs, right?
No, no, no, great question. So there are two very important things to distinguish here. One, which star t is not, but let me mention it first because the RNNs also illustrate it, is syntactic sugar, right? You always have syntactic sugar, because for example in an RNN you want to express x(t plus one), right? And I could do without it, you know, tensor logic is Turing complete, but I don't have the t plus one as a primitive.
This is a very simple piece of syntactic sugar to add, so why wouldn't I do that? Right? Again, there's an 80/20 rule about which of these constructs you want to have. But the star t is actually a completely different thing. The star t is there for computational efficiency purposes.
It's a hint about how to implement that tensor that saves a ton of memory. Right? And you know this notion of a leaky abstraction; famously in computer science, all abstractions are leaky. Tensor logic is no exception. For the most part, when you write tensor logic you don't have to worry about what goes on under the hood, but sometimes you want to.
And this is precisely one of those things. The idea of the star t is this: we don't have for loops anymore, right, which is great, forget all of that; but sometimes I don't want to be computing a new tensor, or even just a new vector, for every new thing that I do, because that would be a waste of memory. The star is just saying: you have one vector and you reuse it at every iteration. So you have the initial x zero, and then x one overwrites that, right? So this is a piece of the language. You can do everything without it, but it would be silly not to use it.
Alright. So let me push back on something, because you've mentioned it twice now: the Turing completeness. Your paper relies on Siegelmann's 1995 paper. She herself, now, decades later, has admitted that that thing is a total toy that has no practical relevance whatsoever. Okay?
Because it requires infinite-precision rational registers that encode in a fractal way, etcetera. And by the way, in her paper, all she demonstrated was that under these infinite assumptions she could build a particular RNN that was a universal Turing machine. The problem with you using that for your tensor logic is twofold. One, it restricts the field over which you can have your tensors. It must be one of these fields that has infinite precision.
So infinite-precision rationals or whatever. I can't use other fields, no modular arithmetic, which is actually what runs on GPUs, for example. And secondly, it would restrict the actual structure of the weights to her universal Turing machine. Therefore it wouldn't be a general-purpose tensor logic. Do you realize this problem?
No, no, no. So actually there is no problem there, let me tell you exactly why, right? And let's just do this in three steps. First of all, Turing completeness doesn't matter at all whatsoever, because the only difference between a Turing machine and a finite state machine is the infinite tape, and in the real world there is no infinite tape. So if you can implement
If Turing completeness doesn't matter, why do you keep mentioning it?
That's part two, that is part two, right? This is actually a very interesting set of questions, so let's get to another part of it. Turing completeness doesn't matter; what matters is that you want to be able to express any computation that you might want. That's what matters, right? You might choose a specific language for specific purposes, but for something like tensor logic, you want that generality.
You have that generality irrespective of Turing completeness. So this is part one; we can debate it, but let's set it aside for just a second now. But, you know, I don't get to change the way computer science is, and Turing completeness is a shorthand for universality. I just want to show people that tensor logic is universal. And now I have a proof that tensor logic is computationally universal that does not rely on the Siegelmann construction.
Right? I chose not to publish it in this paper because it would take too long. Right? The beauty of the Siegelmann route is that in one paragraph I can just say: look, the equation in the Siegelmann paper, you can implement it here, and we're done, right? There are so many ways to prove that things are Turing complete, so I completely agree with you and her that that construction is ridiculous, right?
It's silly, right? It has no practical significance, but the reason I use it is that it's just my way of telling people in one sentence that, and why, tensor logic is Turing complete, right? But the real action is...
Well, hey, if you're free to share, I'd love to see the other proof.
Oh, I can. I mean, actually, there's even more than one other type of proof possible. Let me tell you what one of them is. So here are three ways, not just one. There's the Siegelmann way, right? Another one is you have a finite control with access to an infinite external tape. That is a much more reasonable thing in my view, right? You have a memory, the memory is infinite, but all that the tensor logic has to do is know how to access that memory. Remember, a Turing machine is a finite control and an infinite tape, right?
If the tensor logic can realize the finite control, which obviously it can, and you give it an infinite tape, then we're done, right? And then, on that note, you can even do it the following way. For example, Dale Schuurmans has a great paper about this: people have come up with various very simple ways to set up a Turing-universal computer, and one of them is a set of rules that sets up that machine, without going into details. And that set of rules you can just write in tensor logic without even, you know, having to wake up from your sleep. So there you go.
I totally agree with you. And I often say to people: a Turing machine is just, and I really hate to use the word "just" because it doesn't do justice to Alan Turing and the genius of his creation, the theory of computation, right? But it's just a finite control with an unbounded read-write external memory. Totally on board with that.
Absolutely, tensor logic is a finite control, but then you need to add to it these operations to manipulate external memory, right? So it's kind of tensor logic plus some operations to deal with external read write memory, no?
I mean, so those operations are just read write, move left and move right. That's all there is.
I know, but that's an extension, I mean, at least in my view. I mean, I don't know if before you there was such a thing as tensor logic, I'm not sure. I know that a lot of people have talked about tensors for a decade or more, but it seems like some kind of extension to the typical usage. It's certainly an extension to the way tensors are used in GR. There's no read-write to external memory in that.
Of course, but that is why tensor logic is more than tensor mathematics, right? People who do tensors in mathematics just don't do this, right? But tensor logic does, because of the logic programming side, right? If tensor logic can do logic programming, then it can do everything that a computer can.
Have you specified fully like all the operators in tensor logic somewhere like on a website or something?
There's only three, right? There's tensor projection, there's tensor join, and there's univariate nonlinearities. And the nonlinearities are crucial, right? Linear algebra is linear; tensor algebra is multilinear.
Right?
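The "gobsmacking observation" from the introduction, that an einsum and a logic-programming rule are the same thing, can be sketched with these three operations in NumPy (illustrative facts and names, not tensor logic syntax): the join is a product over the shared index, the projection sums that index out, and a step nonlinearity maps the count of proofs back to a truth value.

```python
import numpy as np

# Relations as 0/1 tensors over a made-up domain of three people.
brother = np.zeros((3, 3))
parent = np.zeros((3, 3))
brother[0, 1] = 1           # person 0 is a brother of person 1
parent[1, 2] = 1            # person 1 is a parent of person 2

# Rule: Uncle(x, z) <- Brother(x, y), Parent(y, z).
# Join = product over the shared index y; projection = sum over y;
# the step nonlinearity turns "number of proofs" into a 0/1 truth value.
uncle = (np.einsum('xy,yz->xz', brother, parent) > 0).astype(float)
```

Replacing the hard step with a sigmoid at some temperature is what lets the same rule shade continuously from deduction into soft, learnable reasoning.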
Totally agree. Where do the memory operations fit in there? Are they projections, are they joins, are they?
Oh no, I mean, they're just projections or joins, right? Think of a trivial projection where you're not summing things, you only have one, right? That's what a write is, right? Actually, let's not even worry about tensor joins and projections; let's just think about propositional rules. If you want to implement propositional rules in tensor logic, all that you need is tensors with no indices, with zero indices, right? So all you're dealing with is scalars.
And the write, right, is just a rule that says: the target of the write is on the left-hand side, and what you want to write is on the right-hand side. Now, to get very concretely to the issue of an infinite memory, right? What is an infinite memory? An infinite memory is just an infinite vector, right? Indexed by the memory address.
That's all it is. Right? And so how do you write to this infinite memory in tensor logic? You just have the memory as your tensor on the left-hand side. It's so simple there's almost nothing to think about.
I'll have to work through
some examples. And sorry, just to finish that thought: how do you advance the tape? Well, you just increment the index. And how do you move it left? You decrement the index. It's done.
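The tape picture above can be sketched in a few lines of NumPy, with a hypothetical helper rather than tensor logic syntax: memory is a vector indexed by address, a write targets the tape at the head index, and running out of memory just means growing the vector and continuing.

```python
import numpy as np

# External memory as a vector indexed by address: "infinite" in principle,
# finite but growable in practice.
tape = np.zeros(4)
head = 0

def write(tape, head, value):
    # Out of memory? Give the machine more tape and hit continue:
    # no reprogramming, no retraining.
    while head >= len(tape):
        tape = np.concatenate([tape, np.zeros(len(tape))])
    tape[head] = value
    return tape

tape = write(tape, head, 1.0)
head += 1                           # move right: increment the index
tape = write(tape, head, 2.0)
head -= 1                           # move left: decrement the index
tape = write(tape, head + 5, 3.0)   # writing past the end just grows the tape
```

The finite control can be anything (here, plain Python; in the argument above, tensor logic rules); the unbounded memory lives entirely in this one indexed vector.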
Well, could we come up with a solid example? Because I don't think we've sufficiently described the star t notation. So, roughly as I understand it, rather than t becoming a dimension, it becomes a transition, so we don't need to model the full trajectory. But just to give an example: if I wanted to write a function to compute the nth digit of pi, or to approximate it, would I not need to fix the size of the tensors beforehand?
Right? So the way I understand it is these things have a fixed size, so how could it possibly solve unbounded problems?
No, very good. So to clarify, star t is not a function; it's a notation on an index. So for example, if I have a vector like x, right? Better example, a matrix m. If i and j each reach 100, this occupies 10,000 positions in memory, right? But if what I write is m with indices i and j star on the left-hand side of my tensor equation, then instead of being 100 by 100, it's just 100.
Because if I put the star on the j, what I'm saying is: run through the j, and for every j you overwrite the result. You can do this in either dimension, so pick whichever one. It just says: keep overwriting the results, right? So you lose your old one. Let me put it this way: m is then actually a vector, a vector where the only dimension is i.
J is actually just an iterator for a for loop. You see what I'm saying? And concretely, for example, in an RNN this is what you want, because x i is your vector and the j, let's call it t, right? With x i t, at every new step in time when the state evolves, you don't want to, I mean you could, but in general you just want to overwrite the old state with the new one, as in any state transition system. Okay?
Now, you know, does this make sense?
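The memory saving from the star index can be sketched in NumPy (illustrative, not tensor logic syntax): the same RNN computed once with the whole trajectory materialized as a matrix, and once with a single state vector that each step overwrites.

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(4, 4))   # recurrent weights (made up)
x0 = rng.normal(size=4)             # initial state
T = 100

# Without the star: X[i, t] materializes the full trajectory,
# 4 * (T + 1) numbers in memory.
X = np.zeros((4, T + 1))
X[:, 0] = x0
for t in range(T):
    X[:, t + 1] = np.tanh(W @ X[:, t])

# With the star on t: t is just a loop iterator, and each step overwrites
# the previous state, so only one length-4 vector is ever stored.
x = x0.copy()
for t in range(T):
    x = np.tanh(W @ x)
```

Both loops end in exactly the same final state; the star version simply declines to keep the history around.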
It does. But you're describing an accumulator, and do you lose something by losing the history? Because if you think about it, you're overwriting what went before with new information and you're just unrolling in time. Do you lose anything doing that?
Of course you lose it; if you don't want to overwrite it, then don't put the star in. But now, to answer your question about pi: how would I compute all the digits of pi? In infinite-Turing-machine land, right, I have a vector of the digits of pi that has a start but not an end.
Right? And what the computation in tensor logic does is compute every successive digit. So, we didn't talk about this, but how is inference done in tensor logic? Forward chaining or backward chaining; they are both generalizations of the corresponding operations in symbolic AI. If you applied forward chaining to a set of rules that computes the digits of pi, actually just one rule because it's very simple, what it will do in each iteration is fill in the next digit of pi, right?
Now if your vector is infinite this will go on forever as it should. If your vector is finite, well, at some point you run out of memory and you're satisfied with the number of digits, which is what we do with any real computer in the real world.
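Forward chaining to a fixpoint can be sketched with a simpler rule than digits of pi, for instance the transitive closure of a graph (a NumPy sketch, not tensor logic syntax): each pass fires the rule once, filling in newly derivable facts, and stops when nothing new appears.

```python
import numpy as np

# Edge relation on four nodes: a made-up chain 0 -> 1 -> 2 -> 3.
edge = np.zeros((4, 4))
edge[[0, 1, 2], [1, 2, 3]] = 1

# Rules:  Path(x, y) <- Edge(x, y)
#         Path(x, z) <- Path(x, y), Edge(y, z)
path = edge.copy()
while True:
    # One forward-chaining step: join on shared index y, project it out,
    # threshold back to 0/1, and union with the facts already derived.
    new = ((path + np.einsum('xy,yz->xz', path, edge)) > 0).astype(float)
    if np.array_equal(new, path):   # fixpoint: nothing new is derivable
        break
    path = new
```

With a finite tensor the loop terminates at the fixpoint; with unbounded memory the same iteration could, like the pi example, keep producing new facts forever.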
I don't wanna always get us bogged down in Turing issues. I think we should move on, but I think it'd be fun to talk about it more at another time, or just to work through some examples. I think I'll probably work through some examples.
But I think this was an interesting one. There's a strange attractor with Turing conversations, and normally it goes in the Schmidhuber direction, where, you know, the universe is finite and there's no difference between an FSA and a TM. And I felt that we actually had some information gain in this conversation.
Well, you know, so on that point, and this is a bit of an aside, it doesn't actually have anything to do with tensor logic, so I hope you don't mind me asking. But since we have a computer science professor here, I wanna just run something by you. So I always get this kind of pushback from people where I'll say, for example, you know, autoregressive transformers, and I mean classic autoregression, not extended or generalized autoregression, are not Turing complete.
Like, DeepMind admits this, and they write a paper showing how you can extend them to become Turing complete. So I'll say something like that, and somebody will be like, yeah, but, you know, if I can't do 100-digit multiplication with this context size, all I gotta do is just have more context and then I'll be able to do it. And I keep making the point, here's the crucial difference. And you brought this up beautifully when you said, look, a Turing machine is a finite control with an unbounded read-write memory. And here's the really cool thing about those Turing machines: they can run in a way where they're churning, churning, churning, and then they say, out of memory. And all you gotta do is just give them more memory and hit continue.
You don't have to reprogram them. You don't have to retrain them when you've like increased their context size. Right? That's the whole difference is that with a neural network, a traditional transformer, if you increase its context size, go back to the training board, you gotta retrain it. Right?
Because you've run out of memory. Is that a fair point that I'm making?
So this is actually extraordinarily simple, and to me it's incredibly frustrating that there's so much confusion about it, starting in computer science and theoretical computer science and now playing out in AI and transformer land. And it just boils down to this, right? You said earlier, and I violently agree, and correct me if I misinterpret it, but you said, like, Turing completeness is not important, but that shouldn't cause us to underrate Turing's achievement. Absolutely. What was Turing's achievement that we now take for granted?
Turing's achievement, for which he is deservedly famous, was to postulate this notion of a universal machine. The amazing thing about computers is this universality, which in his time was a completely counterintuitive notion. What do you mean, a machine that can do everything? The typewriter can type, the sewing machine can sew; you're telling me there's a machine that can type with one hand and sew with the other? What are you talking about? So, like, this is the genius, right?
So first step: you want to have this property of having a machine that can do anything. This is the foundation of computer science, of computers as a revolutionary technology, right? So, point one. But point two, and getting to the transformer part, right? Unfortunately, these confusions then build on each other and never get resolved. It's one of those symmetry breakings: we went down this road of defining things a certain way and worrying about infinity, and now we're stuck there, right?
NP-completeness is another example, but let's ignore that. So the real problem with transformers is the following. People say, oh, but if you only have this many blocks, then you can only do so many computations. The thing that, for example, inductive logic programming has, and that we want, is that you can learn things from very small examples, like children do in elementary school. You learn to do addition on tiny examples, but then, if needed, you can do addition on numbers of any length.
Of course your life is finite, you will never add infinite numbers, but that's not the point. Infinity is just a shorthand for something that's so large, it doesn't matter how large it is. And what I want in machine learning is precisely to be able to learn to handle problems, graphs, structures, knowledge bases, inference problems, whatever, of any size, from very small ones. That's the limitation that a lot of these transformers have. And that's the one that you want to fix, that they can't fix, and that tensor logic helps you fix.
Yeah. And just to cap off the discussion about Alan Turing, because I think he deserves us mentioning this: you mentioned that this was the real achievement, this universality. And I mean, it wasn't just that a machine could do typing and this and that; it was even within computation.
Right? In his time, people didn't know this. They're like, well, what if I have a machine that just has a separate read tape and a separate write tape? I don't know. Well, how about if we add two write tapes?
Does that make it more powerful? What if it's read-write? What if it's just a stack? What if it's lambda calculus? There were so many, a myriad of, you know, tag systems, blah, blah, blah.
All these different computational models, right? And nobody knew that they were all equivalent. And that was the real, you know, remarkable thing.
No, very true. And to be fair, you know, Turing wasn't the only one doing things like this. And precisely, now we know that all these things are equivalent and the extensions don't add power. But here's actually a really important point, right? The question that has been on my mind for decades is this: a Turing machine is a model of deduction. It's universal deduction.
What we're missing to be able to do what the universe does and evolution does is universal induction. What is the Turing machine equivalent for induction for learning? That's what I'm after, Right? That's what the master algorithm is. And I know it exists and again, just as you can have a million different versions of Turing machines that are all equivalent, you can have a million different versions of the master algorithm that are all equivalent and that's okay.
The point is that first we have to realize that there is one, we have to prove what it does, and then we can refine it with the syntactic sugars and whatnot, and that's all good. But the main point is having, you know, gotten the universal induction machine, which I think we are pretty close to.
But Pedro, I know the answer. It's Bayesian tensor logic. No. I'm just kidding.
No. If you're Bayesian, it is Bayesian tensor logic.
This is this is a good segue because we are talking about reasoning and deduction. And transformers, they don't really reason. Right? And I I think of them as a kind of collection of fractured bits of knowledge, maybe with a little bit of understanding two levels down, but we understand many levels down. And when we do reasoning, what we're doing is we are respecting all of the constraints of this epistemic understanding phylogeny thing that we have.
And that allows us to build new knowledge. Right? Because you can build new knowledge. You can create new things when you respect all of the understanding that you already have. And transformers don't do that.
But let's talk about how this works in tensor logic. So you have this temperature parameter. So for example, you could do something like deduction even in an embedding space. Right? And certainly with an MLP.
And this is where I was a bit confused, because I can appreciate that if we have a logical model, which is in the domain of certainty, we can do deduction, right? And then if we have something like an MLP and we learn the weights and we turn this temperature parameter up, so it's actually introducing some degree of randomness, why would that be anything like the kind of logical deductive reasoning we do? Would that not just do what neural networks do now, which is look for similarity in some embedding space, where the type of reasoning isn't actually semantically meaningful at all?
I would actually say that of all the things in the paper, this is the most exciting and important one: that you can do sound and transparent reasoning in embedding space with tensor logic. And how come? Right? Why is that possible?
And just to give the gist of it, here's the key. Let's go to kernel machines for just a second, and the Gram matrix, right? The similarity matrix. What is it? You're in feature space, and it's, for every pair of objects i, j, the dot product of their feature representations, right?
And now, if you embed all your objects, we already know how to do that: there's a matrix with the embedding vector for the jth object, whether it's a word or a token or anything else, right? Now I can take the dot product of the embeddings of two objects. And let's suppose they're all unit vectors, to keep things simple. And now, for the moment, let's say you're not even learning the embeddings yet. Let's say you just have random vectors.
Your embeddings are random. That's actually already useful for a lot of things, but of course it's not where the action is. And now there's the following very interesting property which is the dot product of a vector with itself is one. But the dot product of two random vectors in a high dimensional space is approximately zero. So your gram matrix, your similarity matrix will be approximately the identity matrix.
Okay? And now what happens if I have a tensor logic rule that operates in this way and it has something like a sigmoid nonlinearity, right? Then what's going to happen is that it's going to clean up that noise and turn the matrix into the identity. Right? And now I have all these rules that are just operating in a purely logical mode.
Right? It's Boolean tensors going in, meaning relations, and Boolean tensors going out, right? So that way you can do pure deduction in embedding space with these random embedding vectors. That's why this is something interesting.
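As a toy illustration of the rule-as-einsum idea (the relation names and the tiny family data here are ours, not from the paper), a Datalog-style rule over Boolean tensors is just an einsum followed by a threshold:

```python
import numpy as np

# Toy objects: 0=Alice, 1=Bob, 2=Carol. Parent[x, y] = 1 iff x is a parent of y.
Parent = np.zeros((3, 3))
Parent[0, 1] = 1.0   # Alice is Bob's parent
Parent[1, 2] = 1.0   # Bob is Carol's parent

# Datalog rule  Grandparent(x, z) <- Parent(x, y), Parent(y, z)
# as an einsum: join on the shared variable y, project it out,
# then threshold back to a Boolean tensor (pure deduction).
Grandparent = (np.einsum("xy,yz->xz", Parent, Parent) > 0).astype(float)

print(Grandparent)   # 1 only at [0, 2]: Alice is Carol's grandparent
```

The threshold plays the role of the nonlinearity mentioned above: Boolean tensors in, Boolean tensors out.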
But now, let's say you learn the embeddings, which of course is the whole point, right? When you learn the embeddings, what's going to happen, by trying to minimize the loss function, is that the embedding vectors of objects about which you tend to make the same inferences will get closer, right? Because if I'm saying something about one object and another object is similar to it, the way for gradient descent to minimize the loss is to increase their dot product. So you're going to wind up with a similarity matrix that has high values for objects that are quite similar, in the limit one on the diagonal, and low values for objects that are quite dissimilar. And now you can turn the temperature parameter, meaning the stiffness of the sigmoid: at one extreme, at zero temperature, you have a step function.
And the similarity matrix is discretized back to zero-one. So at the zero-temperature extreme you have pure deduction. You see where I'm going with this?
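The random-embedding observation is easy to check numerically. A sketch (the dimensions and the exact sigmoid parameterization are our assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1024, 5                                   # embedding dim, number of objects
E = rng.normal(size=(n, d))
E /= np.linalg.norm(E, axis=1, keepdims=True)    # unit embedding vectors

# Gram (similarity) matrix: each vector's dot product with itself is 1,
# and random high-dimensional vectors are nearly orthogonal, so the
# off-diagonal entries are close to 0.
G = E @ E.T

# A stiff sigmoid (low temperature) snaps the noisy matrix to the identity;
# in the zero-temperature limit it becomes a step function: pure deduction.
def sharpen(x, temperature=0.01):
    return 1.0 / (1.0 + np.exp(-(x - 0.5) / temperature))

print(np.round(sharpen(G)))   # identity matrix
```

Raising the temperature loosens the sigmoid, which is exactly the knob that later turns this into analogical reasoning.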
I do, but could I challenge it a tiny bit? When we train neural networks, we think reasoning is good when we are building, let's say we'll use the Lego analogy, so we're building these blocks, and the new understanding tree that we've created is a good one if it represents the world in an abstract, causal way. So I can see how you've framed this as deduction in the sense that you've got this Boolean operation and you can build from it. But what if you're building on a sand castle?
What if the component, let's say it's an MLP component, what if it just doesn't represent the way the world works?
No, no, so very good. So again, there's more than one thing you can do with Tensor Logic. One of them is you can just reimplement existing things like MLPs and transformers and whatnot. And if all that you did was reimplement them, it would have all their pros and cons, right? It's the same thing, just implemented much more elegantly, blah blah. What I'm talking about here, and talking about in that section of the paper, is doing something different.
It's not an MLP, it's not a transformer. It's actually doing these things of like, you embed objects, you embed relations in a way that follows from the objects, you embed the rules, you embed the reasoning, right? So this is a different process. What this different process allows you to do is that when you raise the temperature, you get to do analogical reasoning. You know, Douglas Hofstadter came up before; Douglas Hofstadter, I think, would like this because it's analogical. He has this whole 500-page book arguing that all of cognition is just analogy.
Right? And again, this is one of the schools of thought, like this is one of the tribes in the master algorithm is reasoning by analogy. You do reasoning by analogy because what happens is you generalize from from one object to an object that has a high dot product with it. So now now I get to borrow inferences from similar objects. And the higher the temperature the looser the inferences, the more analogical inference can be.
But for example, and again Douglas goes into this in some of his books, and any mathematician, like, you know, Terence Tao the other day, I just heard him say this, right, mathematicians reason by analogy. They notice similarities between things. But at the end of the day you need to have a proof. In tensor logic, this scheme, this particular scheme of reasoning in embedding space, is just simulated annealing. You start out with a high temperature, being very analogical, and then you lower it, and at the end of the day you have a proof.
It's a deductive proof that is guaranteed to be correct. But you couldn't have gotten to it because the search space is so large without the analogical part. Right?
Okay. I understand what you're saying. So you can generalize reasoning outside the domain of certainty. But the question I'm asking is: the reason why we have metaphor and analogy is that there's this incredible process of evolution and intelligence, and it's led to the coarse-graining of all of these concepts that we use in our language, and there's this rich, beautiful phylogeny that kind of represents the causal reality of what's happened. So why is statistical similarity the same thing as analogy?
Oh, it's not. So again, I skipped over some steps here. It isn't, right? So the most powerful type of analogy... kernel machines in some sense are the least powerful type of analogy. It's just, oh, here's a similarity, or nearest neighbor, right? I have a distance function.
That's not really where the action is. The action is in what is called structure mapping, right? Structure mapping was this thing proposed by Dedre Gentner, where you solve a problem by mapping its structure to the structure of problems that you know, right? And the canonical example is Niels Bohr's model of the atom, which he came up with by an analogy between an atom and the solar system. The nucleus is the sun, the electrons are the planets.
Turns out to be a bad analogy but it was crucial in the development of physics, right? And there's also this whole subfield of AI called case based reasoning, where I'm a help desk, you come up with a problem, and I don't try to solve it from scratch because I don't need to, that would be wasteful. I go to my database of similar cases and I find one, and then I tweak it. So structure mapping is an extraordinarily powerful thing, but it's this combination of similarity and compositionality which kernel machines per se don't have. But tensor logic does.
The point in tensor logic is that you do have all the power of kernel machines, but also all the compositionality of symbolic AI. So again, structure mapping just comes out of the box. You don't need to do anything more to add structure mapping, and all the power of analogical reasoning comes with it.
Yeah. Can I suggest a good analogy is Mad Libs? Do you think that's fair? It's like you've got the general structure there and you can plug parts into the blank spaces and you get, you know, a solution. Right?
That's one mode in which things can function, right? But the whole process of structure mapping or of case-based learning can actually be very rich. I can combine, for example, big pieces. But that's one example. Yeah.
Oh, yeah. Yeah. No, that's fair. I mean, yeah, it has this nice nested structure, you know, property. Since while we're on this topic, let me ask you about something that I was confused about in the paper.
So I don't understand your connection between hallucination and deduction or, you know, determinism, because in my mind I can set the temperature of GPT to zero and it still hallucinates. And I can have a poor deductive system that hallucinates all kinds of things. So to me, those are separate problems. Where have I misunderstood?
No. Very good. So precisely the problem, or one of the problems, with GPT is that it hallucinates even when you set the temperature to zero. What the hell? Right?
I want to have a mode. Yeah. Yeah. Right? Not I, but every Fortune 500 company, if it's going to use AI, needs to have a mode where the logic of the business is just obeyed. The security isn't violated, the customer doesn't get lied to, etc. We've got to have that or AI will not take off, right? And transformers can't do that. Tensor logic can do that, precisely because in this reasoning-in-embedding-space mode that I just described, if you set the temperature to zero it does purely deductive reasoning. And by the way, the temperature can be different for each rule.
And I think this is what almost all applications are going to have: there are some rules that are the mathematical truths or the logic that you must guarantee will not be violated; they are the laws, right? And those have zero temperature. And then there are all these others that are more qualitative reasoning, more accumulating evidence, maybe stuff that you mined from the web, and those will have higher temperature. And that temperature parameter, you know, can be learned in some rules and not others, right? So now you have this whole spectrum between the deductive and the more, you know, even fantasizing, truly hallucinating, at the far end of high temperature, right?
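A per-rule temperature can be sketched as nothing more than a rule-specific sigmoid stiffness applied to a rule's output (an illustrative parameterization, not necessarily the paper's exact one):

```python
import numpy as np

def apply_rule(scores, temperature):
    """Post-process a rule's raw einsum output with a rule-specific sigmoid.
    temperature == 0 gives a hard step (a 'law' that is never violated);
    larger temperatures give graded, evidence-like conclusions."""
    if temperature == 0:
        return (scores > 0.5).astype(float)
    return 1.0 / (1.0 + np.exp(-(scores - 0.5) / temperature))

scores = np.array([0.1, 0.49, 0.51, 0.9])
hard = apply_rule(scores, 0.0)   # deductive rule: crisp 0/1 conclusions
soft = apply_rule(scores, 0.5)   # e.g. a web-mined rule: degrees of belief
```

With a knowledge base mixing both kinds of rules, the hard ones stay deductive no matter what the soft ones do.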
But precisely the point that I'm making in the paper is that with LLMs, the best that you can get at zero temperature is still a lot of hallucinations. Then there are things like RAG, but all they do is retrieve, and even then you still hallucinate, right? Compare tensor logic in this mode with RAG: it doesn't just retrieve things, it computes the deductive closure of your knowledge, which is an exponentially more powerful thing to have. Right? And with zero hallucinations.
Well, it is if the model represents the world. You know, because what does hallucination mean? Or actually, what does slop mean? My definition of slop is when a creative artifact is produced by something that doesn't understand. So if I understand a domain deeply, that artifact looks incoherent to me because it was generated by a process that doesn't understand the world.
And isn't it even the same with tensor logic that, you know, deduction is great but if the model isn't a good one then wouldn't that just be a hallucination as well?
Absolutely. But let's make some distinctions here, right? The only claim I'm making here, because that's the only one you can make, is that tensor logic at zero temperature in this mode will give you the soundness properties that logic has; soundness in the technical sense. All that means is that you only reach conclusions that truly logically follow from the premises.
You don't say anything about whether the premises are valid or not. If the premises were hallucinated, so will the conclusions be, right? There's like, there's no magic there, right? But that is a very important property to have. Again, if I give to a transformer a bunch of, you know, true facts, it still hallucinates.
And that's what I can guarantee will not happen in tensor logic. Now, coming up with the true facts, well, that's a different part of the game. You can write them down, you can learn them, you can refine them; you never know for sure if you have the perfect model, and of course that's more the machine learning and knowledge acquisition part, right? So I do, I think, have a very important guarantee here of non-hallucination, but it's not a guarantee that the model you're working with came from the real world. That's a whole other neck of the woods.
Who's going to adopt this first? How are we going to bootstrap this as a community? How do you see this progressing?
Very good. So the last section in the paper discusses adoption, what needs to happen, and things like that. Let's suppose that everybody agrees that Tensor Logic is a beautiful, perfect language and what we need for AI. Just for that reason, that would not be enough to make it take off, sadly, right? Because, you know, people are still using COBOL these days, right?
I rest my case, right? So, legacy. There's this irony in computer science, or in information technology: it moves faster than anything else, but at the same time things never die, right? You can't kill them; you can't kill COBOL. And I really do believe, I like Python, I program in Python, it's very nice in many ways, better than Fortran for some things, etc., right? Even though it was never designed for AI, or NumPy if you will, but you get the point, right?
For AI it's just a terrible thing. But, like, I'm a Python programmer, and in general the reaction is like, okay, your tensor logic is nice, but I'm not going to rewrite all my code, right? Forget that, right? So what is going to make it happen? Well, we can look at what has made this happen in the past, right?
And it's several things. One is that, for example, look at how Java took off, right? Java took off at the time of the Internet because it was the language of networking; allegedly, you could debate that, but people wanted to do things that were very hard to do with things like C, right? And so Java took off, and we are in exactly
It was the language of embedded programs and web browsers, that was the only option we
Exactly, right. So, you know, there are big arguments about this, but they're not relevant to us here. The point I'm trying to make, and it's also very relevant, is: why did languages like Lisp and Prolog fall out of favor, even though they were better for AI than, you know, Fortran or C or whatever, or Java? It's that they were niche languages, and the network effects of the more widely used languages and all their aspects just completely overrode that, right? We understand that very well now; people didn't in the eighties. But we're in a different ballgame now: now the big technology, the center of everything, is AI, right?
If you have a better language for AI, that is the one that is gonna, you know, have the most users. And moreover, if you have a language that solves the big pains: to adopt a new language, or a new anything, a new app, it needs to solve some big pain, right? Is there a big pain that Tensor Logic solves? Well, hell yeah!
It potentially solves them, okay? All of this is subject to empirical verification, but it potentially solves hallucination. It solves opacity. Right? We're in this world right now where there are multi-billion-dollar corporations and systems that are driven by this black box.
And nobody... I've talked with CEOs of big tech companies who say, like, you know, I can't sleep at night because I don't know what this thing is going to do, and the people who trained it have left the company, and who knows, right? So if we can make a dent in that, people will converge to it very, very quickly. Also, I think when people have the experience of how easy it is to use tensor logic compared to the big pile of stuff that lies under PyTorch and whatnot, they will actually be very, very motivated to migrate very quickly. And then, you know, there are several things, like developing the open source community and vendor competition and whatnot. But there are a couple of other important things here, one of which is the following.
Tensor logic is ideally suited for AI education. It's one language which has very little, you know, extraneous stuff, and you can teach the entire gamut of AI very well and do the exercises. It'll be a language that the professors, the TAs, and the students will like. Right? And history shows, going back to things like Unix, that if you have something like that that takes off in computer science education, then people go to industry and say, I want to use this because it's what's good, it's what I like, and a generation later it's what everybody is using.
And one more thing is the following: the transition to tensor logic from Python doesn't have to happen all at once, right? You can, for example, and I already have actually, because it's very easy to do, you read the paper and you can do it in the next, whatever, thirty minutes, write a preprocessor that just converts tensor equations into Python. And all it does is a one-to-one mapping between the syntax of tensor logic and einsum, right? Making things efficient, as we discussed, is another matter, but from this point of view of developer uptake, all it does, and there's a long history of people doing this with different languages, is let you write tensor equations, and then it converts those equations into PyTorch or Python or just NumPy, let's say Python, and then you do everything else in Python that you did before.
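A preprocessor of the kind described might look roughly like this: a hypothetical minimal syntax for tensor equations compiled one-to-one into `np.einsum` calls (a sketch of the idea, not the actual tool):

```python
import re
import numpy as np

def compile_equation(eq):
    """Compile a tensor equation such as 'A[i,k] = B[i,j] C[j,k]' into the
    name of its left-hand side and a function that computes it with einsum."""
    lhs, rhs = eq.split("=")

    def parse(term):
        # Split a term like 'B[i,j]' into its tensor name and index string.
        name, idx = re.match(r"\s*(\w+)\[([\w,\s]*)\]", term).groups()
        return name, idx.replace(" ", "").replace(",", "")

    out_name, out_idx = parse(lhs)
    terms = [parse(t) for t in re.findall(r"\w+\[[\w,\s]*\]", rhs)]
    spec = ",".join(idx for _, idx in terms) + "->" + out_idx

    def run(tensors):
        return np.einsum(spec, *(tensors[name] for name, _ in terms))

    return out_name, run

# Matrix multiplication written as a tensor equation.
name, run = compile_equation("A[i,k] = B[i,j] C[j,k]")
A = run({"B": np.ones((2, 3)), "C": np.ones((3, 4))})
print(name, A.shape)   # A (2, 4)
```

Indices shared between right-hand-side terms are summed out, exactly the einsum convention, so nothing beyond this syntactic mapping is needed to embed such equations in ordinary Python code.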
You don't lose anything, you don't lose any existing code; it's just that a set of things, and in particular reasoning, have now become much easier than they were before. And then once you have this little foothold, it's like, oh, but I can do this, and now let me add that piece of syntactic sugar, and before you know it people are like, well, I don't need all that Python stuff anymore. I'd just rather live in the Tensor Logic world.
So you said about AI education, and Tensor Logic is a declarative language, which means it's the what, not the how. It's this incredible coarse-graining that screens off a lot of unnecessary detail. But is it unnecessary? I guess that's the question. Like, do you think that people learning about AI should know how the underlying things work?
And certainly folks working at Google they might need to do some domain specific optimizations for certain components of the machine behind the scenes. Do you think that we can screen off all that detail?
Great question, but actually let me start by correcting something. Tensor logic, like Prolog and Datalog, actually has both declarative and procedural semantics. You can look at a tensor equation, and that's actually the whole beauty of logic programming in some sense, as an equation. It's like Einstein's equations, a statement about the world. But you can also look at it and treat it as a function call.
The left hand side is the call and the right hand side is the body which is a bunch of other calls and the way to combine them. Right? So you can, and in fact, you know, most of the time, in my experience, that I've used Tensor Logic so far, I tend to use it in procedural mode, right? It's a set of equations, it's a bunch of statements like you would have in any imperative language, right? You know, very important to bear that in mind.
Now, to the heart of your question, which I think is very important. When you're teaching people something, I would actually say this is the tragedy of computer science education, from high school to intro courses to the most advanced things: you want to teach them the beauty of what you can do and the essence of the algorithms and so on, but then they, the students, spend all their time bogged down in all this crap. All these details where you get the semicolon wrong and the program doesn't work anymore, and they hate it. And they decide that computer science is not for them, or at best they waste 10 times more time than they should. Right?
So precisely the whole point of having the right abstraction is to avoid that, and I would say this is one of the best features of tensor logic, that it does that for AI. Now, you also say, correctly, that a lot of the time you need to go beyond that level of abstraction, for example from the point of view of efficiency and so on, right? But I would say, and again, we won't know until Tensor Logic is used widely and we'll see what happens, tensor logic is a language that at some level is like C, right? It's very low-level.
The beauty, in my mind, and again this gets back to multiple levels of abstraction, is that you can use it to say very high-level things. You can also use it to express the lowest-level possible computation, right? A tensor equation is something that you can map onto a GPU with almost no change, right? And then optimize the heck out of. In fact, you know, I've joked with folks at NVIDIA that CUDA is a nice moat, but Tensor Logic could be the end of that moat.
I sometimes feel like, and I'm not sure exactly how much money was spent on bigger and bigger transformers, you know, deeper and wider, more data, more parameters, but it's got to be a lot, like a trillion dollars or something like that. And I feel like sometimes we've spent a trillion dollars to learn, yet again, lessons that people could have learned if they'd taken certain, you know, basic courses in computer science. I'm wondering if you sometimes feel like that, and what lessons, if any, you think people should have known before spending a trillion dollars.
I violently agree with that. In fact, the paradox of the current moment in AI is that on the one hand, this is super exciting, right? This is what we've worked all our lives towards. It's like the dream is happening. I used to tell people, you know, when I went into grad school, that one day machine learning was going to take over the world, and they'd be like, what?
And I'm like, see, it is taking over the world, there, take that. So on a more serious note, transformers are a great leap forward, and anybody who's used a chatbot is like, wow, look at the things this can do. This is great. But at the same time, the sheer amount of wastefulness and stupidity and ignorance going on is just unbelievable. It's like, why are you reinventing the wheel?
For example, I've talked with people at, for example, OpenAI who do the reasoning. And many of them are very good people, so I'm not trying to pick on anybody, but it's like, oh, what is reasoning? We need to figure that out. And then they say a bunch of stuff that is completely wrong. And I'm thinking to myself, why don't you spend an afternoon reading a couple of chapters of Russell and Norvig and save a hundred billion dollars in wasted compute?
Please, just do that, right? And in a way, you know, part of what I'm trying to do with Tensor Logic is make things go in that direction, because the current direction is just too damn painful. And it's not just that it's painful; this is going to end badly, right? People are spending all...
Oh yeah.
In a way like, you know, spending all this money on data centers is not wasted because it's not like the fiber that went dark. Right? We in AI have an appetite for unlimited compute. Right? But they're spending all this money prematurely on stuff that isn't ready for that yet.
Right? The demand is probably not gonna be there. And we're gonna look back on that and go, like, wow, 99.9% of that compute was completely wasted, because of a lot of the reasons that we've been talking about, including, like, you didn't know how to do reasoning, so you brute-forced it, etcetera, etcetera. So, like, you know, we gotta change the direction of this ship.
It's like that well-known quote from, you know, Matt Damon in Good Will Hunting. Right? To paraphrase it: you wasted a trillion dollars on an education you could have got for a buck fifty in late fees at the library.
Exactly. Exactly.
Well, professor Pedro Domingos, it's an absolute honor to have you on the show. Thank you so much for joining us.
Thanks for having me.
Thank you.
Always a pleasure.
Pedro Domingos: Tensor Logic Unifies AI Paradigms