Join Nolan Fortman and Logan Kilpatrick for a conversation with Matan Grinberg, CEO of Factory AI, about the future of software development, the launch of Factory, the challenges of scaling up autonom...
I'm super excited to have this conversation. We haven't caught up in a long time. I'm happy to catch up. For folks who don't know what you're building, do you wanna give just like a really quick intro of what Factory is trying to build and the problem that y'all are solving?
Yeah. Absolutely. Great to see you again, and great to meet you, Nolan. So at Factory, our mission is to bring autonomy to software engineering. And what that means more concretely: we've built the home for agentic software development.
And what does that mean more concretely? Well, maybe stepping back talking a little bit higher level, I think everyone knows that software development is going to change dramatically over the next few years. But so far, a lot of the AI coding tools haven't really transformed behavior that much. In general, the tools we use look roughly the same with some new features here and there. And really, our take is that indeed, software development is going to change dramatically.
But in order to actualize that change, developer behavior needs to change. It needs to go from the typical software development lifecycle that we've seen over the last ten, fifteen years into one that's agent first, where the focus of where human developers spend their time shifts from coding to understanding, planning, and systems thinking: thinking about the constraints from the business, from the product, from the customer, and then translating that into clear plans and instructions for agentic systems. You delegate to them, they go about executing, you might iterate back and forth, you might jump in, but really you're stepping into that orchestrator role as the developer. And that's the platform that we're building, really from scratch. So not going iteratively from the existing interaction patterns like the IDE, but instead rethinking it from the ground up. And that is what the Factory platform is.
The autonomous systems that we have, that you delegate these tasks to, we call them droids. And so they can be task specific, so things like RCA (root-cause analysis) and incident response, as well as general coding, feature building, PRD creation. Yeah.
I love that. I feel like, one, the droids brand is sick. Like, I think you need, like, good droid swag. I feel like that goes really well. How do you think about this?
Like, what you just described to me sounds like some tasks that are done by software engineers today, but the pool of people who might be able to do those tasks seems like a much wider audience than traditional software engineering folks. Have you seen that already? You're, you know, a few days away from going GA and having everyone able to use the platform, and obviously you have people using Factory today. Is the audience of people who are using Factory today already sort of widened, with PMs and designers and such tasking droids to do work on their behalf? Or is that still a little bit more future facing?
Yeah, that's a great question. I mean, I think we've seen even over the past few years some early evidence of this, with low-code and no-code tools getting people who previously weren't developers interested in the space and excited. Something that we see is roles getting a little smeared in, like, EPD teams, where before you might have a PM or a technical PM or an engineering manager or an IC, and these are all seen as pretty distinct. But as AI gets better, and as you have tools that you can delegate tasks to, we kind of zoom back out, and people are elevated who have the skills that have made for the best builders and developers in the first place, which has always been systems thinking and thinking around constraints. And really, code has kind of been a thing that you have to do once you go through that systems thinking.
And once you go through understanding the constraints that you need to satisfy, since there was no AI, you had to go and then actually implement it. The reality we're heading towards is that once you have a clearly defined specification of what it is you wanna build, you no longer need to go in deep and actually implement it yourself. Now, it's not 100%, right? You still, as a human, need to be able to go in and debug and understand the actual implementation. But we're very rapidly getting to a place where maybe 1% of your day as a developer will actually be spent in the code.
And what that allows is, there might be PMs who never studied computer science and didn't take the time to become fluent in any particular language, but they're still able to think at that systems level. Tools like this just enable them without needing to gain fluency in the language. So there's definitely that smearing. We also see, you know, even some solo developers, or friends that we've given access to.
There are folks who have never developed in their life who are getting a ton of value. In the enterprise orgs that we work with, which really is our focus, the large enterprise, we'll see technical PMs, engineering managers, even C-suites who haven't coded in twenty years get involved, because it just lowers that barrier to entry. It allows anyone who has that mindset to jump in without some of the barriers that don't necessarily filter for systems thinking or engineering thinking, where instead it's just, you either know the language or you don't.
Question two, about the droids. Like, I'm trying to comprehend. So do I go on to Factory AI? Are there, like, a bunch of different droids I can pick from that already have kind of pre-built workflows? Am I starting from scratch?
Like, how do you actually go in and leverage the technology and build it to be beneficial to you?
Yeah, great question. So to start, when you jump into Factory, there are three that are going to be put up front and center, because these are the ones that, as we go GA, we've seen the most general-purpose applicability for. And that's the Code Droid, the RCA or incident droid, and the Knowledge Droid. The Code Droid is pretty self-explanatory. This is the thing that I think most people, when they first jump in just to see what it can do, will use: like, spin me up some zero-to-one application, or build me some feature on some existing code that I have.
That's kind of table stakes, just to see: alright, go generate me some code, run it. I wanna see it do the whole agentic loop, where, you know, it might make a mistake, it'll see the output from the CLI, and then iterate based on it. So that's the first thing most people do, just to watch the agent do things without being super involved. The incident-response one is when you get a little bit deeper: you're actually in some production setting and wanna save time, so instead of needing to go super deep into your Sentry logs or Datadog or whatever tools you might use, you see Factory go in and pull in that information from your code base, from your docs, understanding what the PRDs were in the first place that led to the code change that might have led to some incident.
And so that's kind of a little bit deeper, because it's actually typically in production. And then the Knowledge Droid is one that takes advantage of a really big focus for us, which is context of the entire engineering system. So similar to how, when you onboard a human engineer, you don't just give them access to your code base and then say, all right, go have fun. Generally, you'll also onboard them to tools like Slack or Jira, Linear, Google Drive, Notion. The first-principles idea for Factory is, if you want the autonomous systems to perform as well as a human, you need to give them access to similar information sources.
And so the Knowledge Droid shows off that ability to really go deep and understand the interconnectedness of all of this information, knowing that code changes don't just happen on their own; they're connected to Slack conversations or sprint plans from your Linear or Jira. Using that, developers, let's say in an onboarding scenario, can go in and get a quick understanding of how the organization is working and what they might wanna do for a given day or given project, that sort of thing.
Quick question off that, too. For the enterprise customers you're currently engaged with, are they usually leveraging all three of those droids? Are they pulling in just one? Like, how does that work? Is it a full tech stack that you're eventually trying to sell or get in front of everyone?
Yeah. So I mean, generally, the way we work with the enterprises, we'll typically start with one use case that's really top of mind. And normally, at the large enterprise, there is some beastly migration or version upgrade that they need to get done. In which case, it kind of depends on the size and scale. Sometimes they might use multiple droids.
Sometimes they'll use one specifically for, let's say, a Java 8 to Java 21 migration. But that's typically where they'll start, especially because, you know, it's outside the IDE, so it is a net-new behavior, and you kind of need a fire under the ass to actually motivate that change in behavior. And there's no fire under the ass like the necessity to do a migration in, let's say, four months. And then we come in and say, hey, if you use Factory, you can get it done in, like, two weeks.
That's kind of an easy way for them to say, alright, well, let's try it out. Let's see. Once they get that win, then they can go broader, using different droids and other aspects of their development. But generally, for the large enterprise, we'll start with one pretty lump sum task before we go broader for just general developer productivity.
Yeah. Yeah. Matan, the way you've described Factory, and as someone who hasn't yet gotten to play around with it, though, unless you've got an invite code...
Which we should change. Let me send you a link after this, but...
I love it. I'm excited to play around with this angle of the model and the system being more agentic. For folks who haven't seen it, we'll have a link somewhere, and you can see demos and stuff like that in the show notes.
But for folks who haven't seen this, like, how much more agentic is Factory today than, you know, insert your favorite AI product right now: Cursor, Windsurf, yada yada yada. Is it really much longer-running tasks that are happening, or is it sort of the familiar human-in-the-loop after thirty seconds or a minute, from a code-generation standpoint?
Yeah. That's a great question. I'm actually really glad you asked that, because I think something that we're really opinionated about is that you actually need both. You can't just have the pure agentic delegation, nor can you have the pure pair-programming back-and-forth. The reality is that even as models get better, there will be certain tasks that are within range of being delegated and autonomously solved.
And then there will be some that you, as the human developer, still want to be able to monitor and jump in on as necessary. And so a big thing for us is allowing the developer to dynamically, on the fly, adjust whether they're doing more collaboration or more delegation. So, to answer your question more directly: when they're in full delegation mode, typically what that looks like is, you know, they want to work on a feature within Factory, and they'll generally have a plan that they make first. Users of Factory generally learn pretty quickly.
You don't wanna just shoot from the hip, because you get what you put in. If you just say, hey, go do this, it'll make assumptions that you didn't explicitly state, which, again, goes back to the constraint-based thinking. If you say go do this, chances are there are constraints in your head that you didn't actually verbalize to the system. And so it'll give you something back, and then you'll be upset, because it violated those constraints that you didn't actually state.
So, generally, within Factory, they'll come up with this plan. They'll decide, maybe this part I want to do myself, or someone on my team has ownership there; this sub-step, I want to delegate. What that delegation could look like is Factory generating the code, actually executing the code for whatever feature, verifying that it passes certain tests, which, again, would be defined in this plan. If it doesn't pass those tests, it can take in those output logs.
Based on the output logs and understanding the failures, it can keep iterating until it hits the acceptance criteria and then proceeds from there. So if you want to see the full delegation magic, you as the developer control how many iterations it will try if the tests are failing, let's say. And if you want, you can go fully hands-off, just kind of give it the keys and say, okay, keep going until you satisfy this. And if you'd like, and you're seeing it's going astray, similar to how you would work with a human developer, a friend of yours, you might go in and say, hey, by the way, I think you're messing up on this axis.
Like, have you considered this? Or, maybe let me throw in some docs that you might not have considered, which would make you realize that you should use some other module, let's say, and that'll solve your problems.
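The delegate-and-iterate loop described above, generate code, run the tests, feed failure logs back in, retry up to a developer-set budget, can be sketched roughly like this. To be clear, this is an illustrative sketch, not Factory's actual API: `generate_patch` and `run_tests` are invented placeholders standing in for the agent's real tooling.

```python
# Hypothetical sketch of the delegation loop: generate, test, iterate on
# failure logs until the acceptance criteria pass or the budget runs out.

def generate_patch(task, feedback=None):
    # Placeholder agent call: propose code, optionally revising based on
    # the test-failure logs from the previous attempt.
    if feedback is None:
        return {"code": f"attempt for {task!r}", "attempt": 1}
    return {"code": f"revised for {task!r}", "attempt": feedback["attempt"] + 1}

def run_tests(patch):
    # Placeholder: run the project's test suite against the patch.
    # For this sketch, we pretend the second attempt passes.
    passed = patch["attempt"] >= 2
    return {
        "passed": passed,
        "logs": "" if passed else "1 test failed",
        "attempt": patch["attempt"],
    }

def delegate(task, max_iterations=5):
    """Iterate until the tests (acceptance criteria) pass or the
    developer-set iteration budget is exhausted."""
    feedback = None
    for _ in range(max_iterations):
        patch = generate_patch(task, feedback)
        result = run_tests(patch)
        if result["passed"]:
            return patch  # acceptance criteria met; hand back for review/PR
        feedback = result  # feed the failure logs into the next attempt
    raise RuntimeError(f"gave up on {task!r} after {max_iterations} iterations")
```

The `max_iterations` parameter is the "how many iterations will it try" knob mentioned above; setting it high is the "give it the keys" mode, while a human can still interject by changing the task description between runs.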
For the enterprise companies you're talking to, are you seeing a friction point with folks wanting to delegate? Like, do they still want that hands-on experience? Obviously, the larger the organization and the more sensitive the data, the trickier the situation. But I'm just curious how many people are like, alright, let's just see what happens.
Right? Let's give it the keys and see how it goes.
Yeah. Yeah. This is a great question. I think generally, in the piloting phase, people really want to see that the capability is there. But then when we get to the actual deployment, they want to know that they have clear governance and control over what things the agent can do without permission, and what things, even down to which CLI commands, need explicit user permission.
Also, having the ownership of any of these changes or commands be clear for the developer, so they can't just put their hands up and be like, something happened, I don't know, I don't know what went wrong. So in terms of the actual deployments, having blacklisted or whitelisted commands for auto-running, that's another thing that's really important. Another thing to emphasize here is that a lot of the tools that have some of these agentic capabilities typically run either locally...
And so that's the IDEs that have agentic capabilities. They might run CLI commands locally to test the code, that sort of thing. Or there are others that are more like ticket-to-PR-type agents, which typically spin up cloud instances, where they'll run the code in the cloud, verify that it works, and then submit a PR. What's cool about Factory is you can actually do both. So you can have parallel agents running locally and parallel agents running in the cloud.
The beauty there is, the ones in the cloud typically would be the ones that you're, in some sense, more confident about, because you might not need to go hands-on and carry on from wherever it left off; you're confident it'll end in a PR. Or ones where you're a little concerned about what commands it might run, and you wanna make sure it's running in some remote instance and not in your local environment. On the other hand, the agents that you might wanna run locally are ones where the task is super complicated, and you want it to go and kick the tires for a little bit before you go in yourself.
But this way, you don't need to pull down a branch to start working on it locally; you can just pick up where it left off. On the other hand, since it is running locally, you'd wanna monitor it a little more, because it could mess up your environment if you're not sure what's actually going on.
Quickly, off that as well. Sorry, Logan. I'm just curious, too, because, like, for the workflows of it: are you training it on a human, like, someone who's in that position doing it day in, day out? Like, here's how I want you to actually act. With those droids that are kind of pre-built, have you already brought that workflow into them?
Like, how do you actually make sure it's going and doing everything that your best or top developers are doing today?
Yeah. This is a great question. So this is what some might consider the less sexy part of this, because, you know, generally in the AI codegen space, everyone wants to do the fancy RL or fine-tuning and that sort of thing. I think for us, you can go a really long way just having explicit documentation of the best practices within the org. And this is something especially for the larger orgs, where they have a lot of codified standard operating procedures.
What we go about doing in that deployment is making sure that the droid is actually customized to that use case. And what's nice about that is, it's actually a lot cheaper than fine-tuning and just as effective. And if this exists in, like, a YAML file, let's say, or some doc within their code base, or even some config of theirs, they can also go and manually change it as they decide: oh, actually, our best practices, we want to move them in this direction. So that's been something that's really successful. And the important thing for scalability in these large orgs is making sure we don't have to go and do that every single time, but rather teach them to understand how these systems work.
And then they end up being the internal champions who can go and say, oh, hey, this team over here that wants to use a droid for this task, I know exactly how to set you guys up. And then they're good to go.
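A "codified standard operating procedures" file of the kind described above might look something like the following. This is purely illustrative: every key and value here is invented for the sake of the example, not Factory's actual configuration format.

```yaml
# Hypothetical droid-customization config; all field names are invented.
code_droid:
  best_practices_doc: docs/engineering-standards.md
  style:
    language: java
    formatter: google-java-format
  testing:
    command: mvn test
    max_fix_iterations: 5       # how many retries on failing tests
  permissions:
    auto_run_allowed:           # commands the droid may run unprompted
      - mvn test
      - git status
    require_approval:           # commands needing explicit user permission
      - git push
      - rm
```

The point of keeping it in a plain file like this, as described above, is that the org can edit its own best practices directly, without any retraining step.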
Yeah. Matan, you've spent the last year-plus building this product from an enterprise perspective, and now you're making it available to the general developer audience. And I have a bunch of questions I'm curious to ask here, like the model-progress story and all that's happened in the last year in conjunction with the stuff you've been doing from a product perspective. But how do you think about solving this problem first for enterprises? Because, alluding to Nolan's point, I feel like solving this for enterprises is so much more difficult.
I'm thinking about, like, internal code bases at large companies. There's a lot of human complexity, a lot of system complexity, versus, you know, the random developer who's just vibe coding something. And obviously you built this product in a world where the models were capable of sufficiently different things, so maybe the value creation was only enterprise. But I'm curious: is that roughly why enterprise first? Or, like, why?
Curious to get that story.
Yeah. Great question. So I think part of it is because the enterprise is where there's the most differentiated value from having a really good retrieval system and code-base understanding system. Because, considering the other case of, like, zero-to-one vibe coding, most of the alpha there in terms of quality is from the models, not the product or the UI. Generally, the product or the UI serves a good purpose, which is, you know, sometimes if you're working with someone who's never been a developer, you want to make it really welcoming and easy to use for them.
But in terms of what actually differentiates the quality of the output, generally that alpha lives within the models for those zero-to-one use cases. There are definitely exceptions to that. One exception worth calling out is that UI can actually extract more of the constraints the user is thinking about. If you have a better UI, you can pull that out of the user more easily, so your output is maybe not better quality, but better tuned to the constraints they implicitly had in their head, which a tool with a worse UI or a worse experience for that user wouldn't pull out of them.
And so then they might get something that matches their expectations less. But yeah, the reason for the enterprise focus is: obviously there's a lot of alpha in the models themselves, but alone, you can't just drop any of these models into one of those code bases and hope something good happens. There's just so much context, so much dead code, so much tribal knowledge stored within the thousands of people in these orgs that the models just won't have access to. They can try, and in more localized changes they can provide value, but it's generally pretty underserved, because there's also that whole problem of navigating the entire engineering system, or even understanding the role-based access control you need, where some repos only some people have access to, or Google Drive; there are just so many permutations of that.
And it's a little less sexy than, you know, dealing directly with models and generating cool apps from scratch, so I think people haven't touched it as much. But that's kind of the type of problem that we at Factory jump on.
Yeah. Do you think this will change over time at all? Like, I could imagine a lot of the primitives or scaffolding that you've built to make this work well for large enterprises would also potentially mean you could build a differentiated, top-of-funnel, vibe-coding type of product. And maybe that is what people will end up doing as part of the GA launch and you'll see more of that. But what's your expectation going into the launch? Will you see a bunch of people doing that?
Yeah. So I think that's a really good point. First of all, I think the UI that we have is really new and innovative, and I think people will really like it. I will say, I think the quality is not going to be some crazy difference in that zero-to-one use case, because in that zero-to-one use case, most tools perform mostly the same way.
Even most of the raw models, if you give them some query, will perform mostly the same way. I think the difference is, and a lot of people are aware of this, some of these no-code, zero-to-one app-building tools are kind of like taking out debt. So you can make a cool app at first. But if you start needing to make a lot of changes, or support a lot more throughput, or actually stand it up on its own, in cases where, let's say, you wanted to use this as a piece of software to build a business on top of, you end up having to pay that debt.
And really, most of the time, you end up needing to get a real engineer in there who can go and fix all the wires behind the scenes. I think that's where there's gonna be a big bifurcation in performance between us and some of the other tools out there that are catered towards that zero-to-one use case. So I'm very excited to see that play out, especially in the weeks following launch, when people get to that phase with some of the software they build.
Yeah. Nolan and I had a long conversation with Eric Simmons from Bolt about generally this exact problem, which is like, how do you, especially for them in the context of like having a much less technical user persona than I think it sounds like you have with Factory. The challenge is like people have great ideas and they take the first step, but like the current systems don't take you the next step of like, how do I do support? How do I do marketing? How do I actually even just deploy this thing and like turn it into something that people can use?
So I'm super excited. I really hope you hit this part out of the park, because I do think it's a requirement for this current era of everyone being able to create software to actually create value for the world. I think people need to be able to turn these into real products and services.
Yeah. I think that phase transition is something that we're really interested in, because that's also the phase transition where someone becomes, like, a prime Factory user in our perspective. And so we want to make sure we can cater to them before that phase transition, too. But really, seeing that transition happen is something I'm really excited about, because we haven't deployed to nearly enough, you know, small companies.
So really, really excited for that.
I was reading an article this morning, I believe it was Lovable. Basically, I think they brought it to a K-through-8 school. And the idea was, like, these kids are so much better at using Lovable than adults. Right? They have no fear.
Imaginations kinda running crazy. And I think we had another conversation, Logan, not too long ago about how, like, when things are so simple and easy, you almost kinda have a roadblock of, like, alright, what do I do? How do I use it? So, kinda going back to those droids, too: I guess your thesis around different use cases and how you originally thought they'd be used, is it pretty on target?
Or are you seeing use cases come out of some of these enterprise organizations where you're like, crap, I didn't even kinda think of that, and now, again, kind of taking that win and running with it? Like, I'd love to get your feedback on use cases and what people are seeing from a value perspective in general.
Yeah. Yeah. I mean, I think one of the biggest surprises: so we talked about PMs and technical PMs using Factory a lot in the enterprise. That was something we actually weren't really expecting, nor were we planning for it. We saw it with this one organization that we were working with that had about 200 engineers.
There's one day where, like, I obsessively refresh all of our metrics, because I'm OCD like that. And one day, their daily active users shot up by, like, 30. We had already deployed to every engineering team, so I was very confused about what was going on. So I sent a message to the CTO there.
And I'm like, hey, who are all these people? Did you guys just, like, acquire a company? Like, who are these people? And it turned out that the PMs kind of caught wind of this, and they all, you know, requested access. And I think what was surprising in the moment retrospectively makes a ton of sense: we've actually found that PMs take to Factory even more easily, because they don't have an ingrained software development lifecycle that they've been doing for fifteen years.
And so for the developers, we kind of have to earn the right. We have to show, hey, you know, you've had this workflow for the past fifteen years; here's why this is gonna make you more productive. Like, we need to meet them where they are a little bit, show them ways to come in, and slowly start adopting Factory. Whereas the PMs, it's not like they use the IDE, ever.
It's not like they have this rigid path that they've carved through their development. And so they'll jump in and figure it out, and the discoverability that they have for features that are niche is way higher. Because for them, it's like, here's this whole new platform. What does that do? What does that do?
I'm gonna try this. I'm gonna try that. And it's been cool. Especially at that size of, like, 200-ish engineers, it's almost like clockwork. We never particularly reach out to the PMs.
They end up catching wind, end up using it, and then we get a message from, like, the VP of Eng or the CTO being like, hey, by the way, our PMs are submitting PRs now, and they don't need to annoy our full-stack developers to go and change this thing on the front end or that thing. So that's been an exciting thing we've seen. And now, as part of the org-wide deployments that we do, we'll typically ping the head of PM and be like, hey, by the way, we've seen this work before; you guys might want to come try it.
I'd love that. As a PM, that sounds like a great success story. Do the droids have names?
Yes. So, I mean, generally the name is based on the task that it's for. So, like, the Code Droid.
They need cute names. I feel like, not even cute names, but I do think a nontrivial amount of the success of, like, Devin, as an example, is just that it has a real name. And people can just kind of associate this thing with, like, a coworker or whatever it is.
Yeah, so funny enough, the source of the name droid: originally, when we incorporated, we were actually the San Francisco Droid Company. We were advised by our lawyers that Lucasfilm is a little eager to claim that copyright, and so we're like, okay, let's change our name, and it became the San Francisco AI Factory. But so many times, like, towards the end of a POC, or if someone's sharing a good piece of feedback, they'll be like, "speak for the droids," or "these are the droids we're looking for," or something like that.
Generally, the developer persona really takes to that name. So for a while, we were like, yeah, we're gonna have to change the name eventually. But it might be sticking.
Yeah. But I'm just saying, like, give them cool personalities. I feel like that will honestly go kind of like what has happened with Claude, where people are associating with Claude because of the model personality. I think there's a bunch of efforts like that on the OpenAI models as well. It's been interesting that as the models become more like thought partners or execution partners, whatever it is, that personality piece matters a lot.
And I feel like you have a lot of degrees of freedom, like, controlling this UI surface, to be able to sort of land that message to them. Totally.
It especially comes up for the Review Droid, because a lot of people will have fun being like, oh, I'm getting roasted by the Review Droid. Can it be nicer, or can it be meaner? So yeah, that's definitely something that we're thinking about.
This is a good, like, content licensing play. I'm thinking about, like, old GPSs. Remember how you could pay extra money to have, like, I don't know if this was a real one, but Snoop Dogg as your GPS voice or something like that? I feel like this is kind of a similar vibe, where the personas could be really, really interesting, if you get the right, like, funny technical personas.
Get, like, Jeff Dean as your droid that's looking over your shoulder.
Yeah. Or the people from Silicon Valley, like, the ones who always do those fake videos. I forget who. Yeah, some of those people. That'd be super interesting.
Yeah. I love it. I have a quick question, and then, Nolan, sorry, I'll throw to you. Model progress from a coding perspective: how much has this mattered? Obviously that's been the dominating narrative of the last three months, just how much progress there's been with all the vibe coding stuff, with everything that's happening with text-to-app, and just generally models being good at coding.
To this thread of you refreshing metrics all day, have you seen usage of Factory, because of this, and I assume you're on whatever the latest models are from whichever provider, just going up and to the right because of how good the models have become?
Yes. But it's also a little bit of a double-edged sword. So on one hand, it's great, because as the models get better, we kind of get a free performance boost. But on the other hand, as developers change their behavior and become Factory users, they also get really used to the given model that they're using.
So a couple months ago, I think generally the standard was, like, Sonnet 3.5. That's really been shaken up in the last few months. But everyone kind of got used to it, and they understood how to interact with it, just a familiarity like you would have with a human that you work with. And then recently, with Sonnet 3.7, with Gemini 2.5, with o3, it's kind of thrown a little bit of a wrench, because even the way that these models use tools, you can see how they're RL'd a little bit differently.
And it's an interesting question for us: how much of that change should we be shock absorbers for, so that users of Factory don't need to adjust their behavior? And how much of that change do we actually transmit to them? Because maybe this is the direction that models are going, and they should adjust the way that they're interacting with them. This is super not obvious. I think it's a little subtle, but you have developers who really do change their behavior, like clockwork: if we change a default model, they'll be like, hey, it feels like it's behaving differently.
Like, why is it not as eager to do this thing or that thing? And it's super, super interesting.
Do you let developers or users, like, pick the model? I feel like it's so easy now from a product perspective to just outsource this problem to your users, because so many products have that now. Have you thought about doing that?
Yeah. So we have a model selector, so you can pick whichever, but we do have a default. And the default, again, for a while was 3.5. Everyone got used to it; they would try different models every so often.
Generally, like, if they needed more context, they'd switch to Gemini. Sometimes, if they were feeling fun, they'd switch to, like, o1. But generally, you know, that's kind of what that looked like. Now it feels like there's not a consensus among our users about which model they want to use the most. And part of that is because, if we are providing the ability for users to select a model, it's an interesting question: should we make it so that the user can prompt, or converse, in the same style regardless of model and expect the same output?
Or should they need to adjust the way that they interact based on the model, and, like, force that knowledge on them? I think that's a little difficult. But to get the best out of o3 versus 3.7 versus Gemini, you kinda do have to interact slightly differently. And I think that's something that's pretty top of mind for us. Right now, we've been defaulting to, like, we'll kind of absorb all of that, to make it as unified an experience for them as possible.
But also, as there are more and more models and more and more versions, and some people want this one because it's cheaper, this one because it's better at reasoning, the permutations explode a little bit. And so, yeah.
I don't know how much y'all do this type of, like, technical content, but personally, as a user and someone who likes reading interesting things about models, I would be super curious if you have, like, five examples in practice of where user behavior really has to start to diverge for the different models. Because I sort of intuitively 100% agree with you. Obviously the models are trained differently; they're on different data. You know, I would expect them to be different. But I even have a hard time articulating it.
Like, intuitively, you would also think they should generally be able to grok similar sorts of things. So if you're taking requests for good blog posts, I would love to read that one.
It especially gets complicated with, like, custom tools that you give it, because that's when you get even further out on the boundaries. Yeah, I'd love to jump into that.
Question two: how do you think product velocity will change as soon as developers don't have to touch every line of code? Because for me, it's hard to comprehend. Like, you have all these developers right now; they have basically this magic sauce. Does that mean corporations are just pumping out different features and different products here and there?
Like, how are you seeing that in practice, and kind of what is your vision on that moving forward, for both large and smaller companies?
Yeah. That's a great question. I think it raises dramatically what is table stakes for velocity in terms of shipping things, but it also puts so much more weight on good taste. Because before, having good taste still required a lot of work to act on. Now it doesn't.
Now, if you have the best ideas and the best taste, you can implement them relatively quickly. It's somewhat similar to, like, early websites. I can't remember who mentioned this to me, but I really love the analogy. Way back in the early two thousands, people would put in so much effort to make a really nice website. Now you can do the fanciest stuff on your website and it can still look horrible. And so, at the end of the day, what are the websites that you remember?
The ones that have great taste, great understanding of, like, the user that's coming in and how to guide them on whatever journey it is that you want them to go through when they get to your website. And I think it's somewhat analogous here with product: before, there might be a product that took 1,000 developers two years to build; now maybe one person could build it in six months. So what is table stakes now is, you know, the amount of engineering work you can get done. And it's kind of all the more important that, whatever you do end up putting your resources on, you have that core sense of taste. And I think that's actually really good for the world.
Because there are probably so many people with incredible taste who are kind of just latent within society, who maybe don't have the resources to build based on that taste. And the way models are going, and the tools that are coming out, kind of lowers a lot of those barriers. And so then hopefully a lot of the software, and eventually the hardware too, since this will obviously come to hardware as well, will have better taste. And I think that makes me happy.
I think that makes a lot of people happy even if they don't realize it.
Another quick one off this, Logan, I'm sorry. But from someone who's building, someone who is a founder, I'm curious to get your perspective on this too. Obviously, with this, you're able to turn things around a lot quicker. I think there's an emphasis on that feedback loop for different products. Like, hey.
What are our customers saying? What is the community saying? How can we change it? But also, especially with so many companies and products coming out now, especially around AI, I do think there is, like, a fine balance between, hey, do we just push something out there?
Let the community give us feedback and then try to go reiterate. Or, again, for me too, like, I look at all these different newsletters or blogs, I'll try out a tool, and I'll basically give it, like, five minutes. If I really don't like it, I probably will never look at it again.
There's so many different options. So how do you strike that balance between, hey, you know, we could be quick and agile as far as trying to get something fixed, but at the same time, we kind of only have one opportunity, maybe two, to try to hook these people in for the long term?
Yeah, that's a really good question. I see that as almost part of the brand. Like, as part of your brand, you can have the hey, we're quick and dirty when we ship something, there could be, you know, some stuff broken, but you're getting stuff hot off the press as soon as we develop it. And then there's the brand of, like, no, you know, we're gonna really digest things before we put something out. So on one hand, it's a branding choice.
I think on the other hand, it's also a choice of the external constraints that you choose to put on yourself. And this is part of why we have chosen to be enterprise focused. This space is obviously very new, and it moves really fast. If the payment cycle that determines churn is the thing that's gnawing at the back of your mind, that feeling of, if I don't come out with this feature, they're gonna switch to this other tool that does very similar things to me, then you basically have a monthly cadence by which you can make bets and make product decisions. Right?
Because a lot of these tools, they're super neck and neck, and it's like, well, this person did this, and we're gonna lose half our users if we don't do that in the next payment cycle, so we have to do that. Right? It's kind of a race that we wanted to avoid. And so by focusing on the enterprise, our kind of characteristic time period with which we can make bets goes from monthly to yearly. And especially in a space that moves this quickly, that avoids some pitfalls that might be trendy for one month, but not really relevant for a quarter, or for six months, or for a year.
Yeah, but it's a difficult question. I don't think there's an obvious answer.
That's a very interesting take. No, I think that makes sense.
I think that's spot on. I think this is one of the biggest challenges of the moment. It's my general sentiment, and I haven't been a part of any of the other platform shifts, so I don't know, but it's just the race dynamics. While the race is very clearly still early, you are incentivized, from a model perspective, a product perspective, whatever the thing is, to just not let your competitors have something that you don't, and that makes it hard. I feel this pressure in my life: it makes it hard to have a long-term view of what a great product looks like, or what the North Star looks like. So I'm happy that other people feel that pressure as well.
But, Matan, back to this thread of product velocity, and maybe we're still too early for this, but have you seen the pace of engineering or product development start to put pressure on other functions inside of organizations? Where the product team and the engineering team are moving so fast that it's like, how do we even market this, because it's changing so quickly and we're delivering on the roadmap? Have you seen that start to play out at all?
Yeah. So I think there's not a super strong signal yet to the extent where it reaches things like product marketing, but definitely in the relationship between engineering and product, for sure. The table stakes of what a PRD is have completely changed. The way people think about coming up with mocks, or coming up with proofs of concept, has really, really dramatically changed.
I think gone are the days of, like, a PM coming in with an idea just in some bullet points and some quotes from customers. Now it's like, go mock it up and show us, and it better be interactive, and we better be able to start giving it to actual customers and see what they think. There's just so much less latency between idea, getting validation, and eventually shipping it to production, that those cycles compress. I think we haven't seen it hit the other orgs yet, because there's still such a backlog of ideas within engineering and product. Like, generally, the story of engineering and product is, we have this many ideas and this many that we can actually do.
And so maybe we're increasing that a little bit. But even though now there are more things you can do, there's still that taste factor, which is another part of that funnel: okay, just because we want to do it and can do it, does that mean we should do it? And the answer is maybe not always yes. I think it also will take a while for especially large orgs to get used to the new shipping speed. Because if you're used to going at a certain pace, it's a little uncomfortable to suddenly ramp up the accelerator.
Yeah. You get a lot less sleep, unfortunately. One question that's always interesting to hear folks' perspective on, especially as you're so deep building with the models every day: if you could wave a magic wand and all of a sudden the models had a new capability, or they were better at one specific thing, what would that be from your perspective? To unlock either new customers that you can't serve right now, or to make the existing process better for your current customers. Is it model quality at coding, speed, etcetera, etcetera?
I'm curious what that would be.
Yep. I mean, I think speed. Especially the time to first token is just such a nice thing to have fast. Even if the quality is the same, if you just get that first token faster, it just feels better. It feels like you can move faster. So that's one thing.
I think in terms of performance, longer context is always great, assuming that the context is actually used well. This is less so now, but a while ago there was a lot of hype on just maximizing context windows as much as possible. But the reality is, okay, great, you can have a lot of context, but are you using it well?
So a large context that's actually used well, I think, is really important. But I do think that will eventually have diminishing returns, because if you have good enough planning capabilities, you can subdivide problems enough that you might not need ridiculously large context windows. To future-proof this, though, I'm sure we will have ridiculously large context windows, like billions and billions of tokens, etcetera. What was it? I think it was Bill Gates, or maybe it wasn't Bill Gates, but someone said something about, like, why would we ever need more than a megabyte of memory, or something like that.
I can't remember exactly what that was, but I definitely don't wanna fall into that trap. But, yeah, I think having longer context that is very consistent, and using all of that context, is kind of an easy win in the short term.
Kind of off the same thread Logan was going on. I'm always curious, you know, for people who are building: what were some of the bigger bottlenecks or challenges you faced when trying to get it off the ground? And then, off of that, what was the "oh, shit" moment where you were like, this actually is going to get some traction, and we could potentially be successful here? Kind of both sides of the coin.
Yeah, the biggest friction point by far was that we're not an IDE. And we've been very opinionated on that for a while. And if you're a startup coming in and saying, hey, the thing you've been doing for the last twenty years, don't do that, use this instead, that's kind of a big ask, especially for the larger enterprise. So that's been a big hurdle, one we've gotten really good at clearing, because the reality is you need to meet developers where they are and give them inlets into the platform where they can get that win and then realize, oh, wait, maybe I should go use this again for some of these other things.
So, importantly, we do play very nice with the IDEs. We can sync to your local files, so you can hand off at any point that you'd like. So that was one initial friction point. Another actually goes back a bit: originally, the droids were fully in the background.
Like, there was no platform where you would interact with them. Instead, you know, you'd create a ticket to solve some certain bug, and it would just go and submit the PR. The big challenge was, we deployed this to the enterprise, and we found that sometimes it was right, but generally only if it was a less important task. And for the important ones, it might get 90% of the way there, which is great. You know, we'd spent so much time making the agentic coding really good.
But if you have a PR that's 90% of the way there, it's kind of a pain, because then you just have to go and pull it back down and work on it yourself. Then it's like, why did I even do this in the first place? And so basically, around the end of 2024, we realized we needed to adjust course, because even though the engineering leaders liked this, the developers themselves did not love what we were doing. They wanted to have more control. So we made a big bet and shifted our entire focus for, like, three months onto building this platform from scratch, which was not something we had done in the entire year and a half that we'd existed at that time.
And then, really, in January, there was a company that we were working with that really brought to mind why we needed a platform. It was supposed to be a thirty-day pilot; it ended up extending to, like, ninety days. We kept extending it, and I was like, no,
I promise we're building this new platform; it's gonna make up for the issues that we had. And then in the last week, we had a check-in, and suddenly everything finally clicked. The pieces of friction were gone. They had that ability to adjust between collaboration and delegation.
And I remember almost crying on this call, because I was fully expecting that they were gonna churn, and that we'd maybe made a mistake doing that whole shift in focus. And basically they were like, yeah, look, even if I was the only developer using this tool at our company, I told our CTO that we should still procure it, because I would rather use this than any other tool I've used. And that was really the jumping-off point. Since then, we've deployed with dozens of enterprises, and it's been going really well. And I think sometimes we as humans, or at least me, I don't know if other people are like this, are really bad at understanding how things will compound over time, even from a product perspective. And so it was a year and a half of just trudging through, having some wins, but just not that obsession from developers.
And then, I actually remember, this one guy asked if we had changed the model that we were using, and there were no model releases around then. Suddenly the friction and the paper cuts disappear, and it just feels like there's a step-function change. And that was, I mean, the best feeling. Yeah.
That's an awesome story. I love that. Here are the last two questions that we like to ask everyone we chat with. First one: what does your personal tech stack look like?
Yeah. So I guess this varies day by day. At Factory, we use all the models, we use all the tools, partly because it's kind of our job to be super aware of what's going on. And we also have customers that have super diverse stacks, so for testing purposes we need to have Slack, Teams, RingCentral, Google Drive, Notion, OneDrive.
So there, if you look at my toolbar, it's just a huge mess of a ton of icons. Personally, in terms of the tool stack I use when I'm not spending time on Factory, which is rare, I tend to try not to use technology, period. So, like, paper books, that sort of thing. I think it's always nice to have a little bit of contrast there. But yeah, I think that's the overview. I wasn't sure if you wanted me to get particular on anything.
That's awesome. I think that's a great frame of reference and an awesome answer. The last one is: we're sitting here in 2025. What is one thing you hope happens in 2025, and what's something that you hope doesn't happen in 2025? And you can take it in whatever direction you want.
Oh man. Let's see. That is a really good question. Do you mind if I take a moment to think about that?
Take as long as you need. I've cringed in the past at us asking this question, but, not to set you up for a high bar, we consistently get really interesting and very varied answers. So whatever direction is helpful for you, I'm curious.
I guess the first thing that comes to mind, and I think this is relevant in particular for what we do and a lot of the messaging that other companies in this space have put out: I really hope that in 2025, people understand that developers are going to be more important. And it becomes clear to them, and we dispel this notion, that developers are going to disappear and all their jobs are going away. Instead, it's like, no, definitely study computer science, definitely learn how to code, definitely learn how to use these tools. There will be more need for developers. We will have better products, better software that has better taste.
And I just hope everyone else realizes that in 2025. Something that I hope does not happen in 2025: I guess the first thing that comes to mind is, I hope that the United States does not lose the lead in terms of building AI. I think something that has become clear over the last two years, let's say, is that obviously we're putting in a lot of money, but we're not necessarily diversifying our bets enough. And we've seen some labs that are much more constrained than US labs come up with innovation almost because of that constraint.
And I think we should make sure that not only do we put in the capital necessary to invest in these AI labs, but also into the maybe less trendy bets. I actually know this viscerally from theoretical physics, which is what I did before. In the US, in theoretical physics, and in particular string theory, there are basically two professors who are kind of heralded; everyone wants to do what they're doing. And there's a really, really bad herd mentality of, whatever they're working on, everyone else wants to work on. And I think it's just important to diversify the bets a little bit, because, obviously, what the two smartest people work on, they're doing it for a reason.
But chances are they're not thinking of everything. I think diversifying the bets on AI is pretty important.
I love that. That is a great place to end, Matan. It was wonderful to see you. It was great to catch up. Congrats on the launch.
Hopefully we'll have you on again a year from now, and we'll talk for hours about all the success of everything that's happening. Yeah. Thank you for spending the time with us.
Thank you so much for having me. Really, really great chatting with you guys. Thanks.