Join Logan Kilpatrick and Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect of Google, as they discuss Gemini 3 and the state of AI! Their conversation includes the reception of Gemin...
Gemini 3, we're sitting here. Reception seems super positive. The vibes of the model are good.
I'm very excited about the progress. I'm excited about the research.
We had actually pushed the frontier on a bunch of dimensions.
This is how we are gonna build AGI. We wanna do it the right way, and that's where we are putting all our minds, our innovations. It's not like it's this purely research effort that's off in a lab somewhere. It's a joint effort with us in the world.
This is a new world, right? There's a new technology that is defining a lot of what users expect.
We are, in some sense, co-building AGI with our customers.
So all of a sudden, you enable a lot more people to be builders.
Bring anything to life.
Bring anything to life. Right?
Yeah. I feel like the next six months are gonna be probably just as exciting as the last six months and the previous six months before that.
We are lucky to be living in this age. It's happening right now. It's very exciting.
Hey, everyone. Welcome back to Release Notes. My name is Logan Kilpatrick. I'm on the DeepMind team. Today, it's an honor to be joined by Koray Kavukcuoglu, who is the CTO of DeepMind and the new Chief AI Architect of Google.
Koray, thanks for being here. I'm excited to chat.
Me too. Yeah. Very excited. Thanks for inviting me.
Of course. Gemini 3. We're sitting here. We've launched the model. Reception seems super positive.
Like, I think we went out and we obviously had a hunch about how good the model was going to be. Leaderboards looked awesome, but putting it in the hands of users and getting it out is, like, a...
That's always the test, right? I mean, benchmarking is the first step, and then we have been doing tests with trusted testers, with peer reviews and everything. So you get a feeling that, yes, it's a good model, it's capable. It's not perfect, right?
But I think I'm quite pleased with the reception, really. People seem to like the model, and the kinds of things that we found interesting, they also found interesting. So that's good so far. This is good.
Yeah. We were talking yesterday, and the thread of the conversation was around appreciating this moment, that the progress isn't slowing down, which resonates with me. I was reflecting back to the last time I sat next to you: we were at I/O as we launched 2.5, and we were listening to Demis and Sergey talk about AI and all that stuff. I feel like the progress has not slowed down, which is really interesting. When we launched 2.5, it felt like a state-of-the-art model, and it felt like we had actually pushed the frontier on a bunch of dimensions, and I feel like 3.0 delivers that again.
Yeah. And I'm curious, as the conversation about whether scaling can continue goes on, what's your sense right now?
Yeah, I mean, look, I'm very excited about the progress. I'm excited about the research. When you are actually there in the research, there is a lot of excitement in all areas of this, right? Coming from data, pre-training, post-training, everywhere. We see a lot of excitement, a lot of progress, a lot of new ideas.
At the end of the day, this whole thing is really running on innovation, running on ideas, right? The more we do something that is impactful, that is in the real world, that people use, the more ideas you actually get, because your surface area increases and the kinds of signals you get increase. And I think the problems will get harder, the problems will get more varied, right? With that we will be challenged, and these kinds of challenges are good. I think that is the driver for going towards building intelligence as well. Right?
That's how it's going to happen. Sometimes if you look at one or two benchmarks you can see a squeeze, but I think that's normal, because benchmarks are defined at a time when something was a challenge. You define that benchmark, and then of course as the technology progresses, that benchmark stops being the frontier. It doesn't define the frontier anymore. And then what happens is you define a new benchmark. It's very normal in machine learning, right?
Benchmarks and model development always go hand in hand. You need the benchmarks to guide the model development, but you only know what the next frontier is when you get close to it, so that you can define the new benchmark.
Yeah, I feel this way. There were a couple of benchmarks, like HLE, that originally all the models were horrible on, doing like 1 or 2%, and I think the newest, with Deep Think, is now at 40-something percent, which is crazy. ARC-AGI-2, originally all the models could barely do any of it; it's now 40-plus. So it is interesting, and then it's also interesting to see, and I don't have the context on why, the benchmarks that are static, that do have a little bit of the test of time, if you will. I think they are probably close to saturated.
But GPQA Diamond, as an example, continues to stick around even though we're eking out 1%
or whatever. It's like there are really hard questions there.
Yeah, yeah.
I mean, those hard things we are still not able to do. Yeah. Right? And they still test something. But if you think about where we are with GPQA, it's not like, oh, you're at twenties and you need to go to nineties, right?
You're getting close, so the number of things that it defines as unsolved is, of course, decreasing. So at some point it's good to find new frontiers, new benchmarks. And defining benchmarks is really, really important, right? Because we think about benchmarks as the definition of progress, and that does not necessarily always align. Right?
There's progress, and then there's the benchmarks. In an ideal case they're 100% aligned, but it's never 100% aligned. To me the most important measure of progress is that we have our models in the real world, and scientists use them, students use them, lawyers use them, engineers use them, and people use them to do all sorts of things: writing, creative writing, emails, easy or hard. That spectrum is important, and different topics, different domains. If you can continue delivering larger value there, I think that's progress.
And these benchmarks help you quantify that.
Yeah. How do you think about, and maybe there's a particular example from 2.5 to 3, or we can choose whichever model version change you want: where are we hill climbing? In a world where there are a zillion benchmarks now and you could choose where you want to hill climb, how are you thinking about it, for Gemini broadly but also maybe the Pro model specifically?
Where do we try to hill climb?
I think there are several important areas, right? One of them is instruction following. Instruction following is where the model needs to be able to understand the request of the user and to be able to follow it. You don't want the model just answering whatever it thinks it should answer, so that instruction-following capability is important, and that's something we always work on. And then for us internationalisation is important. Google is very international and we want to reach everyone in the world, so that part is important.
And I feel like 3.0 Pro does, at least. I was talking to Tulsi this morning, and she was remarking about how incredible the model is for languages that historically we haven't been really good at, which is awesome to see.
So you have to continuously put the focus on some of these areas, right? They might not look like the frontier of knowledge, but they're really, really important, because you want to be able to interact with the users there, because, as I said, it's all about getting that signal from the users. And then if you come to the slightly more technical domains, function calls, tool calls, agentic actions, and code, these are really important. Function calls and tool calls are important because there's a whole different multiplier of intelligence that comes from there, both from the point of view of the models being able to naturally use all the tools and functions that we have created ourselves, and use them in their own reasoning, but also the model writing its own tools, right?
You can think of the models as, in a way, tools in themselves as well. So that one is a big thing. And obviously code is, too, not just because we are all software engineers, but because we know that with code you can build anything that happens on your laptop. And on your laptop it's not just software engineering that happens.
Bring anything to life.
Bring anything to life, right? Yeah. So a lot of what we do right now happens in the digital world, and code is the basis for integrating with pretty much anything that happens in your life. Not everything, but a lot of things. That's why these two things together, I think, make up a lot of reach for users as well.
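(A minimal sketch of the function-calling loop described above, using the google-genai Python SDK. The model id and the weather tool here are illustrative assumptions, not details from the conversation.)

```python
# Sketch: Gemini function calling with the google-genai SDK.
# Assumptions: GEMINI_API_KEY is set in the environment; the model id
# and the weather tool below are illustrative, not production details.
from google import genai
from google.genai import types

client = genai.Client()

# Declare a tool the model is allowed to call.
get_weather = types.FunctionDeclaration(
    name="get_weather",
    description="Return the current weather for a city.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"city": types.Schema(type=types.Type.STRING)},
        required=["city"],
    ),
)

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # illustrative model id
    contents="What's the weather in Istanbul right now?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[get_weather])],
    ),
)

# Rather than answering directly, the model can emit a structured call;
# application code executes it and feeds the result back for a final answer.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```

The loop is the multiplier being described: the model emits a structured call, your code runs it, and the result flows back into the model's reasoning.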
I give this example of vibe coding, right? I like it. Why? Because a lot of people are creative, they have ideas, and all of a sudden you make them productive. Going from creative to productive in a way that you can just write it down and then see the application in front of you. Most of the time it works, and when it works it's great, right? That loop, I think, is great. So all of a sudden you enable a lot more people to be builders, to build something. It's great.
I love it. Yeah. Thank you, this is the AI Studio pitch. I appreciate it. We'll clip this part out.
We'll put it out online. One of the interesting threads that you mentioned, and actually, as part of this Gemini 3 moment, we launched Google Antigravity, a new agentic coding platform: how much do you think about the importance of having this product scaffolding to hill climb on quality from a model perspective? Yeah. Tool calling and coding.
Yeah.
Yeah. To me, it's very, very important. Antigravity as a product itself, yes, it's exciting, but from a model perspective, if you think about it, it's double-sided. Let's talk first about the model perspective. From the model perspective, being able to have this integration with the end users, in this case software engineers, and learning from them directly to understand where the model needs to improve, is really critical for us. It is important in areas like the Gemini app for the same reason.
Right? Understanding users directly is very, very important. Antigravity is the same way; AI Studio is the same way. So having these products that we work really closely with, and understanding, learning, getting those user signals, I think is really massive. And Antigravity has been a very critical launch partner.
It hasn't been long since they joined, right, but in the last two, three weeks of our launch process, their feedback has been really, really instrumental. The same thing with AI Mode in Search, right? Even with AI Overviews, we get a lot of feedback from there. So to me this integration with the products, and getting that signal, is the main driver of our understanding. Of course we have the benchmarks, so we know how to push STEM, the sciences, the math, that kind of intelligence.
But it's really important that we actually understand the real-world use cases, because this has to be useful in the real world.
Yeah. In your new Chief AI Architect role, you're now responsible for making sure that we don't just have good models, but that the products actually take the models and implement them and build great product experiences across Google. Obviously, I think this is the right thing for users. Getting Gemini 3 into all the product surfaces on day one is an awesome accomplishment for Google, and hopefully it'll be even more product surfaces in the future.
How much additional complexity, from the DeepMind perspective, do you think it adds to try to do that? In some sense, life was simpler a year and a half ago. Sure.
But we are building intelligence, right? A lot of people ask me about having these two roles. I have these two titles, in a way, but they are very much the same thing. If we are going to build intelligence, we have to do it with the products, through the products, connecting with the users. With the Chief AI Architect role, what I'm trying to do is make sure that the products in Google have the best technology available to them.
We are not trying to do the products; we are not product people, we are technology developers, right? We develop the technology, we do the models. And of course, just as everyone is opinionated about everything, people are opinionated, but the most important thing for me is making the models, making the technology, available in the best way possible, and then working with the product teams to enable them to build the best products in this AI world. Because this is a new world. There's a new technology that is defining a lot of what users expect, how the products should behave, what information they should carry over, and all the new things that you can do with this new technology.
So to me it's about enabling that across Google, working with all the products. I think that's exciting from the product perspective and from the perspective of what users are getting, but also because, as I said, that's our main driver. It's really important for us to be able to feel that user need, to get that user signal. That's critical for us. So that's why I wanted to do it. This is how we are going to build AGI.
This is how we are going to build intelligence: with the products. That's how I think it's going to happen.
This is a great tweet for you to put out at some point, because I do think it's interesting. I share this perspective that we are, in some sense, co-building AGI with our customers, with the other product areas. It's not like it's this purely research effort that's off in a lab somewhere. It's a joint effort with us in the world.
And I think it is a very trusted, tested approach as well. It's a very engineering mindset that I think we are adopting more and more. And it's important to have an engineering mindset here, because when something is nicely engineered you know that it is robust, that it is safe to use. So we are doing something in the real world and we are adopting all the trusted, tested ideas of how to build things. And I think that's reflected in how we think about safety, how we think about security.
We try to think about it, again, from that engineering mindset: from the ground up, from the beginning, not as something that comes at the end. So when we are post-training models, when we are pre-training, when we are looking at our data, we always have this in mind. Everyone needs to think about this. Do we have a safety team? Obviously we have a safety team, and they are bringing in all the technology related to that.
We have a security team, and they're bringing in all their technology, but we also enable everyone in Gemini to be heavily part of that development process, taking this as a first principle. And those teams are themselves part of our post-training teams, right? So when we are developing these iterations, these release candidates, just as we look at GPQA, HLE, those kinds of benchmarks, we look at the safety and security measures as well.
I think that engineering mindset is very important.
Yeah, I completely agree with you. I think it also feels natural to Google, which is helpful, given how collaborative and how big an effort it is now to ship Gemini models out the door.
I mean, with Gemini 3, we were just reflecting on this. To me, one of the important things is that this model has been a very Team Google model.
We should look into the data. It might be one of the biggest; maybe some of the Apollo NASA programs had more people. But this massive, global Google effort across all of our teams to make it happen is crazy.
Every Gemini release takes people from this continent, from Europe, from Asia, all around the world. We have teams all around the world and they contribute, and not just GDM teams, right? All teams across Google. It's a huge cooperative effort, and we sim-shipped with AI Mode, sim-shipped with the Gemini app, right? These are not easy to do, because they were together with us during our development.
That's the only way that on day one we can all go out together, at the same time the model is ready, and we have been doing that. When we say across Google, it's not just the people actively building the model; all the product teams are doing their parts as well.
Yeah. I have a, maybe this isn't a controversial question, but: Gemini 3, we're sort of SOTA on many benchmarks, a lot of benchmarks. We're sim-shipping across the Google product surfaces and our partner ecosystem. The reception is very positive. The vibes of the model are good.
Knock on wood. If we fast forward to the next major Google model launch, are there things still on your list, things you wish we were doing? How does it get better than Gemini 3, or should we just enjoy the moment? I think we should do both.
We should enjoy the moment, because one day of enjoying the moment is a good thing. This is the launch day, and I think people are appreciating the model. I'd like the team to enjoy this moment as well. But at the same time, every area we look at, we also see gaps. Right? Is it perfect in writing?
No, it's not perfect in writing. Is it perfect in coding? It's not perfect in coding. Especially in the area of agentic actions and coding, I think there's a lot more room. That's one of the most exciting growth areas, and we need to identify where we can do more, and we'll do more, right?
I think we have come a long way. For maybe 90-95% of the people who will engage with coding in some way, whether they are software engineers or creative people who want to build something, I'd like to think that this model is the best thing they can use. Right?
But there are probably some cases where we still need to do better.
Yeah. I have another sort of pointed question about coding and tool use. If you look at the history of Gemini, we had a very multimodal focus for 1.0, and with 2.0 we started to make some of the agentic infrastructure work. Do you have a sense of why, and I'll make the caveat that the rate of progress looks really strong, but why has it been, like, a focus thing, that we haven't been state of the art in agentic tool use from the get-go?
Because in multimodal, for example, we have been: Gemini 1 was literally state of the art in multimodal, and we've held that for a long time.
Look, I don't think it was a deliberate thing. Honestly, if anything, when I reflect back, I tie it to the model development environment being closely tied to the real world. The more tied we are, the better we understand the real requirements. And in our Gemini journey we started from a point where, of course, AI research in Google has a huge history, right?
The number of amazing researchers that we have, and the amazing history of AI research that has been done in Google, is great. But Gemini is also a journey of moving from that research environment into, as we've been discussing, this engineering mindset, and into a space where we are really connected with the products. When I look at the team, I have to say I feel really proud, because this team is still majority formed by people, including me, who four or five years ago were writing papers, researching AI. And here we are at the frontier of that technology, developing it via products, with the users. It's a completely different mindset: we are building models every six months and doing updates every month, month and a half. It's an amazing shift. I think we walked through that shift.
Yeah.
I love that. Gemini 3 progress has been awesome. Another thread that was top of mind is how we're thinking about the gen media models, which historically, not that they haven't been a focus, they've always been interesting, but with Veo 3 and Veo 3.1 and the Nano Banana model, we've had so much success from a product externalization standpoint.
And I'm curious how you think about that in this pursuit of wanting to build AGI. Sometimes I can convince myself that a video model is not part of that story. I don't think that's true; in general, a model should understand the world and physics and all this other stuff. So I'm curious how you see all these things intertwining.
If you go back ten, fifteen years, generative models were mostly about images. Right? Because we could much better inspect what was going on, and this idea of understanding the world, understanding the physics, was the main driver of doing generative models with images and so on. Some of the exciting things we have done with generative models date back to ten years ago; actually, it feels like ten years but it's more like twenty. Right?
Twenty years ago we were still doing image models, right? That's why I was hesitating a little bit. During my PhD we were doing generative image models; everyone was doing those at that time. We walked through that; we had things called PixelCNNs, right? They were image generative models.
In a way, what happened was a big realization that text was actually the better domain for very fast progress. But I think it is very natural that the image models are coming back, and at GDM we have had really strong image, video, and audio models for a long time. That's what I'm trying to explain, maybe: bringing those together is natural. So where we are going right now, we have always talked about this multimodality, right?
And naturally, we have always talked about input-output multimodality, and that's where we are going. When you look at it, as the technology progresses, the architectures and the ideas between those two different domains have been merging. It used to be that these architectures were very different, right? But they are coming together quite a lot. So it's not like we are forcing something in; what is happening is that the technology is naturally converging.
And it is converging because everyone understands where to get more efficiency from, where the ideas are evolving, and we see a common path, and that common path is coming together well. So Nano Banana is one of those first moments, right, where you can iterate over images, you can talk to the model. Because what happens is that text models have a lot of world understanding from the text. And the image model has world understanding from a different perspective.
So when you merge those two, you get exciting things, because people feel that this model understands the nuances that they want to get through.
I have another question about the Nano Banana stuff. Do you think we should just have goofy names for all of our models? Do you think that would help?
Not really. Look, I mean, I think we didn't do it on purpose.
Gemini 3. If we hadn't named it Gemini 3, what would we have called it? Something ridiculous.
I don't know. I'm not good at names. I mean, it was Riftrunner, right? It was Riftrunner.
We actually use Gemini models, those are code names, we use Gemini models to come up with those code names too. And Nano Banana was not one of those; we didn't use Gemini for it, right?
There's a story about it. I think it's published somewhere. I mean, as long as these things are natural and organic, I'm happy, because for the teams who are building the models, it's good to have that connection. And then when we release them... I mean, that happened because we were testing the model under the code name, right?
On LM Arena, and people loved it. And, I don't know, I'd like to think that it was so organic that it just caught on. I'm not sure you can create a process to generate that.
I agree with you. That's my feeling.
If we have it, we should use it. If we don't have it, it's good to have standard names.
Yeah. We should talk about Nano Banana Pro, which is our new state-of-the-art image generation model built on top of Gemini 3 Pro. I think even as the team was finishing Nano Banana, they had early signal that doing this in a Pro capacity could get a lot more performance on a bunch of more nuanced use cases, like text rendering and world understanding and stuff like that. Anything top of mind for you? I know there's a lot of stuff going on.
I think this is probably where we see this alignment of different technologies coming into play, right? With Gemini models we have always said every model version is a family of models: we have Pro, Flash, Flash-Lite. Because at different sizes you have different compromises in terms of speed, accuracy, cost, those kinds of things.
As these things come together, of course we have the same experience on the image side as well. Yeah. So I think it's natural that the teams thought, okay, there's the 3.0 Pro architecture: can we tune this model to be generative for images, using everything we learned in the first version and increasing the size? And where we ended up is something a lot more capable, that understands really complex inputs. One of the most exciting use cases: you have a large set of really complex documents, and you can feed those in. We already rely on these models to ask questions.
And you can ask it to generate an infographic about that as well, and it works. Right? So this is where this natural input-output modality story just comes into play, and it's great.
Yeah, it feels like magic. Hopefully folks will have seen the examples by the time this video comes out, but it's just so cool seeing a bunch of the internal examples being shared around. It's crazy.
Yes, I agree. It's exciting when you see that all of a sudden, oh my god, yes, that's a huge amount of text and concepts and complicated things explained in one picture, in such a nice way. When you see those things, it's nice, right? You realize the model is capable.
And there's so much nuance to it too, which is really interesting. I have a parallel question to this. Probably around December 2024, Tulsi was promising that we were going to have these unified Gemini model checkpoints, and I think what you're describing is that we've gotten really close to that now, where historically it was...
Unified in terms of image generation? Oh, I see.
I see. Yeah. And I'm curious: I assume that's a goal, we want these things actually mainlined into the model, and there are natural things that stop that happening, and I'm curious if there's any context or sort of high-level...
Look, as I said, the technology, the architectures, they're aligning, so we see that happening. At regular intervals people try it, but it's a hypothesis, and you can't be ideology-based about this, right? The scientific method is the scientific method. We try things, we have a hypothesis, we see the results; sometimes it works, sometimes it doesn't, but that's the progression we go through. It's getting closer.
I'm pretty sure in the near future we are going to see things coming together, and I think gradually it's going to be more and more like one single model. But it will require a lot of innovation, right? It is hard, if you think about it. The output space is very critical for the models, because that's where your learning signal comes from. Right?
Right now our learning signal comes from code and text. That's most of the driver of that output space, and that's why you are getting good there. Now, for generating images, we are so tuned to quality in images that it is a hard thing to do: the pixel-perfectness is hard, and images also have to be conceptually coherent. Every pixel matters, both in its quality and in how it fits the general concept of the picture. It is harder to train something that does both. The way I look at this: to me, it's definitely possible.
It will be possible. It's just about finding the right innovations in the model to make it happen.
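(To make the output-space point concrete, here is one common schematic framing of a unified multimodal objective; an illustration, not a claim about Gemini's actual recipe. If images are discretized into tokens that share a sequence with text tokens, a single next-token loss supplies the learning signal for both modalities:

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right), \qquad x_t \in \mathcal{V}_{\text{text}} \cup \mathcal{V}_{\text{image}}$$

The difficulty described above is that, for image tokens, this uniform loss has to deliver pixel-level fidelity and global coherence of the whole picture at once.)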
Yeah. I love it. I'm excited. It'll hopefully make our serving situation easier too if we have a
single model checkpoint. It's impossible to say.
It's impossible. I agree with you. The interesting thread, as we sit here: DeepMind has a bunch of the world's best AI products, hopefully, vibe coding and AI Studio, the Gemini app, Antigravity, and what's happening across Google now. We have a great state-of-the-art model with Gemini 3.
We have Nano Banana. We have Veo. We have all these models that are at the frontier. The world looked very different ten years ago, or even fifteen years ago. And I'm curious about your personal journey to get to this point.
When we were talking yesterday, you mentioned something I had no idea about, and when I mentioned it to someone else, they also had no idea: you were the first deep learning researcher at DeepMind. Taking that thread to the place we're at now feels like a crazy jump, from a time when people weren't excited about this technology. I don't know how long ago you started at DeepMind, like ten years?
2012.
Thirteen years? Yeah. That's crazy. Thirteen years ago, people weren't excited about this technology, or I guess DeepMind was excited about this technology, and now it is literally powering all these products and is the main thing. I'm curious, as you reflect on that, what comes to mind?
Is it surprising, or was it obvious?
Well, I think this is the hopeful, positive-outcome scenario, right? The way I say it is: when I was doing my PhD, and I think it's the same for everyone doing their PhD, you believe that what you do is important or is going to be important. You're really interested in that topic, and you think it's going to make a big impact. I was in the same mindset; that's why I was really excited about DeepMind when Demis and Shane reached out and we talked.
I was really excited to learn that there was a place that was really focused on building intelligence, with deep learning at the core of it. Actually, my friend Karol Gregor and I were both in Yann's lab at NYU, and we joined DeepMind at the same time, just to be very specific. At those times it was very unusual to have a deep-learning-focused, AI-focused startup, even. So I think that was very visionary, and an amazing place to be.
It was really, really exciting. And then I started the deep learning team, and it grew. My approach to deep learning has always been that it's a mentality of how you approach problems, and the first principle is always learning-based. That's what DeepMind was about.
Everything is better with learning. It was an exciting journey, starting from where we were in those days, and then RL and agents and everything we have done along the way. You go into these things, at least this is how I think, I go into these things hoping that a positive outcome happens. But I reflect and I say that we are lucky, right? We are lucky to be living in this age, because a lot of people have worked on AI, or on topics they are really passionate about, thinking that this is their age and this is when it's going to pan out. And it's happening right now. We also have to realise that AI is happening right now not just because of machine learning and deep learning, but also because the hardware evolution has come to a certain state, and the internet and data have come to a certain state, right?
So a lot of things aligned together, and I feel lucky to actually be doing AI and to have been working up to this moment. When I reflect, that's how I feel: yes, they were all choices, we worked on AI, and I made specific choices to work on AI. But at the same time, I also feel very lucky that at this time we are in this position. It's very exciting.
Yeah, I agree with you. I love that. I'm curious, I was watching The Thinking Game video and learning more about that era; I wasn't around for AlphaFold, so the only context I have is reading about it and seeing people talk about it. And I'm curious, as you reflect, having lived through a bunch of that, how things are different today versus how they were before.
And I'll tee you up with one example, which you kind of alluded to off camera right before this, and this is not exactly your words: we've kind of figured out how to make these models and bring them to the world. That was sort of the essence of what you were getting at, which I agree with. I'm curious how that is similar or not to how things were for some of the previous iterations.
I think of it as how to organise, or the cultural traits of what is important to be successful, to turn hard scientific and technical problems into successful outcomes. I think we learned to do that through many of the projects we have done, starting from DQN, AlphaGo, AlphaZero, AlphaFold. All of these have been quite impactful. And along the way, we learned a lot about how to organise around a particular goal, a particular mission, how to organise as a larger team. I remember in the early days of DeepMind, we would work on a project with 25 people and we would write papers with 25 authors, and then everyone would say to us, 'surely 25 people didn't work on this.' I would say yes, they did.
Right? We would organise that way, and in the sciences and in research that wasn't common, right? And I think that knowledge, that mentality, is key. We evolved through that. I think that is really, really important.
At the same time, in the last two, three years, as we discussed, what we have merged this with is the idea that this is now more of an engineering mindset, where we have a main line of models that we are developing, and we learn how to do exploration on this main line, how to do exploration with these models. The good example where I see this, and every time I see it or think about it I feel quite happy, is our Deep Think models. Those are the models we go to the IMO competition with, to the ICPC competition with. And I think that's a really cool example, because we do the exploration and you pick these big targets. The IMO competition is really important, right? Really hard problems, and kudos to every student out there competing in those competitions; amazing stuff, really.
And when you put a model there, of course you have the urge to do something custom for it. What we try to do instead is use it as an opportunity to evolve what we have, or to come up with new ideas that are compatible with the models we have, because we believe in the generality of the technology. And that's how things like Deep Think happen: we come up with something and then we make it available for everyone. Right? So everyone can use a model that is actually the one used in the IMO competition.
Yeah, just to draw a corollary with what you said about the 25 people on the paper: I think the today version of that is the Gemini 3 contributors list that will come out, or is already out.
2,500. Conservatively.
And there's like 2,500 people, and I'm sure people are thinking there's no way that 2,500 people actually contributed. But they did. And it is fascinating to see how large-scale some of these problems are now. Really.
And I think it is important for us, and that's one of the great things about Google: there are so many people who are amazing experts in their areas. We benefit from that. Google has this full-stack approach. We benefit from that.
You have experts at every layer, from data centers to chips to networking to how to run these things at scale. And, again going back to this engineering mindset, it comes to a point where these things are not separable, right? When we design a model, we design it knowing what hardware it's going to run on, and we design the next hardware knowing where the models will probably go. This is beautiful. But coordinating it, yes, of course you have thousands of people working together and contributing, and I think we need to recognize that. That's a beautiful thing. That's great.
Yeah. It's not easy to pull off. One of the interesting threads goes back to this DeepMind legacy of doing all these different scientific approaches and trying to solve these really interesting problems, versus today, where we actually know that this technology works in a bunch of capacities and we truly just need to keep scaling it up; obviously there's innovation required to keep doing that. I'm curious how you think about DeepMind in today's era balancing pure scientific exploration versus just trying to scale up Gemini. And maybe we can use my favorite example of yours, Gemini Diffusion, as an example of that decision-making come to life in some capacity?
That is the most critical thing. Finding that balance is really important. Even now, when people ask me what is the biggest risk for Gemini, and of course I think about this a lot, the biggest risk for Gemini is running out of innovation.
Because I really don't believe that we have figured out the recipe and we're just going to execute from here. I don't believe that. Our goal is to build intelligence, and we're going to do that, of course, with the users, with the products, but the problems out there are very challenging. Our goal is still very challenging, and I don't feel like we have a figured-out recipe where it's just scaling up or executing. It is innovation that is going to enable it.
And innovation, you can think about it at different scales, or in directions tangential to what we have right now. Of course we have the Gemini models, and inside the Gemini project we explore a lot. We explore new architectures, new ideas, different ways of doing things. We have to do that, we continue to do that, and that's where a lot of the innovation comes from. But at the same time, Google DeepMind as a whole is doing a lot more exploration.
I think that is very critical for us. We have to do those things, because there might be some things for which the Gemini project itself is too constraining. So the best thing we can do is explore both in Google DeepMind and in Google Research. We explore all sorts of ideas, and we bring those ideas in, because at the end of the day Gemini is not the architecture.
Right? Gemini is the goal that you want to achieve. The goal is intelligence, and you want to achieve it with your products, enabling Google to really run on this AI engine. In a way, it doesn't matter what particular architecture it is. We have something currently, and we have ways of evolving through it, and we will evolve through it.
And the engine of that will be innovation. It will always be innovation. So finding that balance or finding opportunities of doing that in different ways, I think is very critical.
Yeah. I have a parallel question to that. At I/O, I sat down with Sergey, and I made the comment to him that when you bring all these people together to launch these models and drive this innovation, you feel the warmth of humanity as you do it, which is really interesting. And I reference this because I was sitting next to you, also listening to them, and I was feeling your warmth. I mean this very personally, because I think it translates into how DeepMind as a whole operates.
I think Demis has this as well: these deep scientific roots, but also people who are nice and friendly and kind. And there is something interesting there; I don't know how much people appreciate how much that culture matters and how it manifests. I'm curious, as you think about helping shape and run this, how does that manifest for you?
First of all, thank you very much. You're embarrassing me. But I think it is important. I believe in the team we have, and I believe in trusting people, giving people the opportunity; that team aspect is important. And this is something that, for my part, I can say I learned through working at DeepMind as well, because we were a small team, and of course you build that trust there, and then you have to maintain it as you grow. It is important to have this environment where people feel like, okay, we really care about solving the challenging technical and scientific problems that make an impact that matters in the real world.
And I think that is still what we are doing. Right? Gemini, as I said, is about that. Building intelligence is a highly technical, challenging scientific problem. We have to approach it that way.
We have to approach it with humility as well, right? We have to always question ourselves. Hopefully the team feels that too. That's why I always keep saying I'm really proud of the team, that they work together amazingly well. We were just talking upstairs at the microkitchen today at Takamine, and I said to them: yes it's tiring, yes it's hard, yes we are all exhausted, but this is what it is.
We don't have a perfect structure for this. Everyone is coming together, working together, supporting each other. It is hard, but what makes it fun and enjoyable, and what lets you tackle really hard problems, is to a big extent having the right team working together. The way I see it, the burden is more to be clear about the potential of the technology we have. I can't definitively say that twenty years from now it's the exact same LLM architecture.
I'm sure it won't be. Right? So I think pushing for new exploration is the right thing to do. As we talked about, GDM as a whole, together with Google Research, has to be doing this with the academic research communities. As a whole we have to push in many different directions.
I think that's perfectly fine. What is right, what is wrong: I don't think that's the important conversation. The capabilities, and the demonstrations of those capabilities in the real world, are the real thing that should speak for itself.
Yeah. I have one last question, and I'm curious to hear your reflection on this as well. For me personally, my first year and a half plus at Google felt like, and I actually really liked this, a Google underdog story to a certain extent, despite all the infrastructure advantage and all that.
When did you join?
April 2024. And also, for the AI Studio context, we were building this product and
Right. Oh, now I remember.
We had no users, or we had 30,000 users; we had no revenue; we were very early in the Gemini model life cycle. Fast forward to today, and it's obviously not like that. I was getting a bunch of pings over the last couple of days as this model has been rolling out, from folks across the ecosystem; I'm sure you got a bunch of these as well. I think they're finally realizing that this is happening. But I'm curious, from your perspective: I had belief, that's why I joined Google, that we were going to get to this point, but did you feel that underdog-ness too? And how do you think it will manifest for the team as we turn that corner?
I definitely did, even before that, because when it became apparent that LLMs are really powerful, I honestly felt like we were the frontier AI lab at DeepMind, but at the same time I felt like, okay, there's something that we haven't invested in as much as we should have as researchers, and that's a big learning for me as well. That's why I'm always very careful that we need to cast a wide net; that exploration is important. It's not about this architecture or that architecture. And I've been very open with the team since we started taking LLMs a lot more seriously, starting with the Gemini program about two and a half years ago.
I have been very honest with the team that we were nowhere near the state of the art here. We didn't know how to do a lot of things. There were a lot of things we knew how to do, but we were not at that level yet, and it was a catch-up, and it has been a catch-up for a long while. I feel like nowadays we are in that leadership group. I feel really good and positive about the pace we are operating at.
We're in a good rhythm. We have a good dynamic. But, yeah, we have been catching up. You have to be honest with yourself: when you are catching up, you are catching up.
You have to see what others are doing and learn what you can learn, but you have to innovate for yourself. And that's what we did, and that's why I feel it's a good underdog story in that sense: we innovated for ourselves and we found our own solutions, technology-wise, model-wise, process-wise, and in how we run. And it's unique to us; we run together with all of Google.
Look at what we are doing; it's a very different scale. Sometimes people say, 'oh, Google is big and it is hard.' I see that as something we can turn into our advantage, because there are unique things that we can do. I'm quite pleased with where we are, but we have to keep learning and innovating through it. That's a good way to achieve what we have achieved right now, and there's a lot more to do.
Right? I feel like we are sort of just catching up. We are just getting there. There are always comparisons, but our goal is to build intelligence. We want to do that.
We want to do it the right way. And that's where we are putting all our minds, all our innovation.
Yeah. I feel like the next six months are going to be probably just as exciting as the last six months and the previous six months before that. Thank you for taking the time to sit down. This was a ton of fun. I hope we get to sit down again before I/O next year, which feels like forever away, but it is going to sneak up on us.
I'm sure there are going to be meetings next week that are, like, I/O 2026 planning, to make everything happen. So thank you for taking the time. Congrats again to you, the DeepMind team, and everyone on the model research team for making Gemini 3, Nano Banana Pro, and everything else happen.
Yeah. Thank you very much. It's been amazing having this conversation. It's been an amazing journey as well, and I'm glad to share it with the whole team, and with you as well. It's great.
Thank you very much for inviting me.
We got a special little gift to congratulate you and the team for making this happen.
Oh, nice. Thank you very much. Very much on point.
The first 1500-point Elo model, right? 1501.
Yes, the first model. Very kind. Thank you very much.
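(For context on that number, a quick sketch of the standard Elo expected-score formula, general Elo math rather than anything specific to LM Arena's exact methodology:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

so a model rated 1501 would be expected to win roughly 64% of head-to-head comparisons against a model rated 1401, a 100-point gap.)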
Koray Kavukcuoglu: “This Is How We Are Going to Build AGI”