From investing through the modern data stack era (dbt, Fivetran, and the analytics explosion) to now investing at the frontier of AI infrastructure and applications at Amplify Partners, Sarah Catanzaro...
Okay. We're here with Sarah Catanzaro from Amplify. Welcome.
Thank you.
First time on the pod.
Happy to be here.
Too long.
I know. I know. We've known each other for so long. Yeah. Never made an appearance.
You also made the transition from data to AI, I guess. I don't know if you were always, like, as deep on AI, but, obviously, there's a lot of sympatico.
Yeah. I've always actually kind of oscillated between data and AI. Sure. Like, arguably, I started my career in, quote, unquote, AI. It was just more, like, symbolic systems back then.
But as you said, I think, like, they're they're so symbiotic. Like, it it's almost hard to divorce them. That's actually what brought me into data. I was like, I want to better understand what happens when I write a SQL query. So Yeah.
Let's briefly touch on data because, obviously, that's a lot of where you and I first met. dbt, Fivetran. That was so cool. I mean, yeah. How do you think about the end of the modern data stack?
Okay. So, like, a lot of people look at the dbt-Fivetran merger and, like, talk about the end of the modern data stack. And I think that is, like, a fundamentally wrong take. Both of these companies were growing, you know, very healthily. Both of these companies...
And you funded dbt?
We funded dbt. So, like, both of the companies were actually, like, beating their revenue targets. I think what you're more seeing is, you know, an IPO environment wherein companies are expected to have far more than, you know, like, $100 million in revenue. And so
What would you say the bar is now? 300?
No. Like, above 600. 600. Yeah. Yeah.
And the combined company is 400?
I believe that they'll actually be close to 600. I don't have the exact number.
But they're clearly just getting ready for IPO.
So, you know, basically, like, the merger was a way to accelerate that path to liquidity. As you might remember...
And they were the presumptive winners in their categories anyway. So
Exactly. You know, I think one of the things that has actually pleasantly surprised me, and this speaks to, again, the symbiotic relationship between, you know, data and AI: many of the big frontier labs are actually using both dbt and Fivetran. I recall talking to folks at Thinking Machines, like, within weeks of the company's formation, and dbt was already an important part of their stack. Certainly, like, training datasets need to be managed. We need insight into what users are doing on these platforms.
And in fact, like, the way in which you would analyze interactions with an agent or analyze interactions with an LLM is even more complicated. And so while, perhaps, like, the demand for analytics engineers, the demand for data scientists didn't explode in the way that some people thought, like, analytics engineers are not one third of personnel, that doesn't actually mean that the demand for the tools is not still, like, very prevalent.
But you got what you wanted. You wanted to democratize things. You got it.
Yeah. Yeah. I mean, I guess we democratized things by perhaps reducing the need for the people. I don't know whether or not that is a good thing. But, honestly, I do think that, like, the fact that it is easier than ever from a tooling standpoint for people to make data-driven decisions is probably a step in the right direction.
And I've actually become convinced that, like, while every company does need analytics engineers and does need data scientists, they probably don't need armies of them. And probably having, like, a moderately sized data and analytics team is a good thing.
Yep. So you touched on an interesting thing I wasn't planning to ask, but this is interesting. Where I come from, data was more or less synonymous with analytics. Yeah. But you're now saying that dbt and Fivetran are being used for training data.
Are there any notable differences in the workloads or the requirements?
Undoubtedly, there will be. I mean, I think one of the things that we saw with analytics that, you know, was surprising to some of the people in the data infrastructure space was that, like, the workloads were actually quite predictable. They were quite predictable because, like, many of them were actually not being generated by humans, but rather by deterministic systems. So, like, a lot of it was, you know, BI dashboards, you know, Tableau that is actually hitting your database, or maybe not Tableau, but, like, Looker or, you know, Hex or something like that. I think with, like, analyzing, curating, preparing datasets, it's a bit more ad hoc.
And so, undoubtedly, it will be less predictable. I don't know if that really changes the way that we approach developing data infrastructure. You know, some people are still quite interested in, like, things like learned indexes, learned optimizers, and it's a bit easier to build a learned optimizer if you have more predictable workloads. And so it could change the way that we approach things like that.
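The learned-index idea she mentions can be sketched in a few lines: replace a tree traversal with a model that predicts a key's position in a sorted array, then correct the prediction with a bounded local search. This is a toy linear version over invented data, not any production design from the literature.

```python
# Toy "learned index": fit key -> position, then binary-search a small window.
import bisect

def fit_linear(keys):
    """Least-squares line mapping key value to its index in the sorted list."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_i = (n - 1) / 2
    cov = sum((k - mean_k) * (i - mean_i) for i, k in enumerate(keys))
    var = sum((k - mean_k) ** 2 for k in keys)
    slope = cov / var if var else 0.0
    return slope, mean_i - slope * mean_k

def lookup(keys, model, key, max_err=8):
    """Predict a position, then fall back to binary search in a small window."""
    slope, intercept = model
    guess = min(max(round(slope * key + intercept), 0), len(keys) - 1)
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    return bisect.bisect_left(keys, key, lo, hi)

keys = list(range(0, 200, 2))   # perfectly linear toy key distribution
model = fit_linear(keys)        # slope 0.5, intercept 0.0 for this data
```

The predictability point is exactly why this works: the more regular the workload (here, the key distribution), the smaller the error window the model needs.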
Yeah. Data catalogs, do they become more important? Are they transformed?
Oh, man. Like, straight to the gut. So so that was something I got wrong.
I'm sorry. I don't know the background. What did you...
I just really believed that data catalogs were going to become an important part of, you know, the modern data stack.
And the players are Atlan. I think those are the best ones because she's Singaporean. So I...
Yeah. Yeah. There was...
data.world.
Yeah, data.world, Metaphor within our portfolio.
They've all struggled as a category?
They all have struggled a bit as a category. Many of them have been, you know, acquired subsequently, which suggests that, like, this was not, you know, perhaps a standalone category. As a data scientist, like, I spent so much time working on data catalogs. And so, you know, I kind of felt like this was the thing I wanted. Like, I didn't wanna have to, like, build the... yeah.
Or to the point also, like, with pre-training data, you have a lot more heterogeneous data all over the place. Yeah. And, like, you need to keep on top of it, and you need to make it discoverable, accessible, and all that. So why didn't it work?
So I think there were a couple of things. I think we have seen some consolidation in the, you know, modern data stack, particularly around, you know, some of the key components, whether it was, you know, Fivetran or dbt or, you know, Hex or, you know, Snowflake. Many of these products offered kind of, like, data cataloging capabilities as a feature. And I think for humans, that was good enough. Like, the data catalog that you had available in Snowflake was good enough.
The data cataloging capabilities available in dbt, like, those were good enough.
dbt, obviously, once they built the cloud, they were going to build it. Yeah. Yeah. What else do you do?
I mean, it's actually funny. In fact, my colleague Bar at Amplify was, like, the product lead on these kinds of, like, metadata services. I think it's still not obvious to me, but I think one opportunity that might have existed, or could have been realized, was the opportunity to build data catalogs not for humans but, you know, for machines. This would look a little bit more like, you know, metadata services.
Mhmm.
I don't just mean for agents, although I think, you know, that opportunity is arising more. But even, like, microservices and things like that. Okay. Yeah. So I do wonder at times, like, if we built data catalogs for the wrong people, and potentially even, you know, for the wrong use cases.
Like, I think a lot of data cataloging companies ended up focusing on, like, discoverability when perhaps, like, the real market opportunity was in governance.
Governance, very important. Any other comments just about what you know so far about the data stacks of the large labs? You know, I guess, obviously, a lot of data people who might be listening would want to sell into them.
Yeah. I mean, a couple of observations. One is that, you know, they are actually paying careful attention to their data stacks. I think they're thinking about, you know, problems ranging from, you know, data discoverability to data preparation to even things like the efficiency of data loading. Like, if you're unable to load data to a GPU efficiently, then the GPU is going to sit idle, and that's going to be a kind of, like...
A cost.
Yeah. Exactly.
But what solution solves that? I don't actually...
I mean, I get to, yes, exactly, plug my portfolio company. We have a portfolio company called Spiral that has developed a file format called Vortex, and they make data loading, like, super efficient.
Specifically to GPUs?
Specifically to GPUs. Okay. Yep. Yeah. Good to know.
One of the things that has surprised me, though, is actually that, like, so much data infrastructure has actually scaled quite elegantly to meet the AI use case. You would hope. You would. But, like, the scale of these AI companies, it's incredible. And so they...
It's not as big as ads.
Maybe. Maybe. Yeah. I think that could change, either, like, as agents actually become kind of, like, more prevalent and are interfacing with each other, and therefore, like, perhaps, like, the number of transactions explodes. I have a friend who works on transactional databases at OpenAI, and I was like, so you must be, like, building databases.
Like, this is, like, a paradigm shift in terms of, like, the scale that, like, databases are going to need to handle. And he's like, no. We use Rockset.
It's the one they acquired. Right?
Yes. Exactly. Yeah.
Very cool. Okay. Let's just talk about funding because, obviously, that's, like, a big theme this year. What comes to mind in terms of looking back at 2025? What stands out?
It was crazy.
Yeah. Can you give anonymized examples of, like, what does crazy look like?
Yeah. I mean, I think crazy looks like raising upwards of $100 million.
Seed?
Like, upwards of $100 million in a seed round where you have a long-term vision, but not a near-term roadmap. Yeah. This is something that I'm seeing happening not just occasionally, but quite frequently.
Yes.
And it definitely makes me anxious, because, firstly, like, when founders are asking me, you know, how much should I raise? I'm typically saying, like...
Three, like, five.
Well, like, what do you need to do? Like, what are your milestones for the next, you know, let's call it twelve to twenty-four months? What resources do you need in terms of, you know, headcount, compute, equipment to unlock those milestones? And then, like, maybe add, like, a 20% buffer or something like that. Yes. But doing that analysis requires you to, like, understand what you're going to build in the next zero to, let's call it, twenty-four months.
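The milestone-based sizing she describes (cost out the next twelve to twenty-four months, then add roughly a 20% buffer) is simple arithmetic. A hedged sketch, where every figure is invented for illustration and nothing comes from the episode:

```python
# Hedged sketch of milestone-based raise sizing: burn over the milestone
# window plus a ~20% buffer. All numbers below are illustrative assumptions.

def target_raise(headcount, cost_per_head, annual_compute, months=24, buffer=0.20):
    """Raise target = (people + compute) burn over the window, plus a buffer."""
    annual_burn = headcount * cost_per_head + annual_compute
    return annual_burn * (months / 12) * (1 + buffer)

# e.g. 15 people at ~$350k fully loaded, $2M/yr of compute, 24-month window
raise_usd = target_raise(headcount=15, cost_per_head=350_000,
                         annual_compute=2_000_000)
print(f"${raise_usd:,.0f}")  # prints $17,400,000
```

The contrast with a nine-figure seed is the point: a concrete plan backs into a number an order of magnitude smaller.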
But I've talked to some companies, and they're like, we're building a frontier lab for X. And, like, okay. Cool. Like, I get the long-term vision. There is an opportunity to, you know, make AI more secure, make AI more humane, make AI more data efficient, whatever it might be.
So, like, I'm bought into the long-term vision, and that, you know, for me as an investor is super important. Like, so let's talk about, like, what your team's going to work on in the next six months. They're like, maybe we might build a consumer app. Like, you know, we're... I like...
I know exactly the company you're talking about.
But, like, I wish I was talking about, like, one specific company. I'm actually talking about, like, several companies. Oh, no. And look, like, I'd be a hypocrite to say that, like, I've never done investments like that. But I've done investments like that when, like, I really know the people, and I'm like, they're gonna figure it out.
What is frightening about this funding environment is that you meet a founder. They're like, I'm raising, you know, $100 million. I'm raising, like, a billion dollars maybe at times. And you need to make a decision in seven days, and I can't tell you what I'm gonna do for the next six months.
Yep.
And so, like, you have no way of even gaining conviction that they're going to figure it out
Yep.
Because you only have, like, seven days to get to know them. I think what some of the founders are missing is, like, you only have seven days to get to know me. If you haven't figured it out, like, you probably want a partner who's going to be working closely with you to help you figure it out.
I mean, they're absolutely viewing it as transactional. Right? Like, you know, they don't care.
No. They care about, you know, the most money at the highest valuation. I mean, the crazy thing is that they don't even seem to care about dilution. It's just, like, the most money at the highest valuation.
Yeah. But, you know, it does send a signal that helps.
So, I mean, yes. I think it does right now send a signal.
Look, I'll tell you how it affects me, and I hate it. I hate it. Alright? Antithesis came out of stealth this week. Right?
And it's like, the only thing I know about them is they do something in AI testing, and Jane Street led a seed round of $100 million.
Wait. We invested in it too. I can tell you what they do, but they do the deterministic simulation testing.
The thing that is the lede is the money.
Yeah.
And then, okay. Well, who else uses it other than Jane Street? Like, what do you do that's innovative? Palantir. Okay.
Warp...
WarpStream. So yeah. But yeah. Okay.
Anyway, so maybe, and this is a bad example because they're actually legit, but, like, you know, there's a lot of similar examples where they just lead with the money, and, like, there's not much substantiation behind it. Maybe it's just bad storytelling, and that's why I, as a podcaster, get to talk to them. I just talked to General Intuition. And, like, once you spend some time with them, then you're like, oh, okay. This is why they raised $100 million. But, like, without that context, it's, like, really hard to understand anything.
Well, and, like, I think there are some companies that are raising, you know, $100 million or more because they need it. Like, a good example might be, like, Periodic. In addition to, you know...
Wet lab. Yeah. Yeah.
They need to build out a wet lab, and, like, designing a wet lab that can support high-throughput biology, which is absolutely critical to, you know, their goals, that's costly. So, like, I understand why they need that funding. But again, there are others where, like, they don't have these near-term milestones. I think the thing that is a little bit, you know, perturbing to me is that many of them are doing it because it makes it easier for them to hire. Because, you know, there are all of these candidates who, like, want to work at a company that is, like, a unicorn or a near unicorn.
They're pitching...
Because the alternative is to work at a big lab where, you know, it's, yeah, the prestige and the money is there.
Yeah. Well, or the alternative is, like, work at, like, an early-stage startup. But, like, there's something about, like, the big valuation that becomes enticing.
Yeah.
They're also kind of pitching candidates. They have a compelling equity pitch where they're like, okay, maybe you're getting, you know, less than, like, 0.1% of the company. But, like, given the valuation, the value of your equity is already, you know, like, $10 million or something like that.
And they also guarantee the dollar value on the equity.
You mean that, like, they'll offer them a loan to pay?
A buyback if, yeah, if you wanna sell it.
Yeah. But...
Because they have so much cash. Like
But the thing, though, is that, like, the valuation is a made-up number. Like, valuation, until a company exits, it is an entirely made-up number. So, like, I could just be like, you know what? The Latent Space pod, that is worth $5 billion. And we could agree.
Like, I as an investor could say, like, that is the price. And now the company is worth $5 billion. Like, do you think that, like, if you were to...
Yeah. It's not real. It's not actual until it's transacted at any volume.
And given the funding amounts that they're raising too, like, if they spend that and they, you know, get acquired for less than that amount...
Yeah.
Then, like, their teams are getting nothing. I wish people were kind of, like, more sensitive to this dynamic and thinking more about, like, what is the upside associated with the company. And, you know, more fundamentally, like, do I deeply believe in this vision? Because I think, like, joining companies because, like, they have a billion-dollar valuation, it's just, it's not the right way to choose a job.
I hear you. Okay. So, obviously, we could go on about that forever.
Oh, yeah.
There's also some stuff with, like, circular funding and all that stuff. But I do wanna be more relevant to engineers and researchers. Yeah. What are the themes that are really strong?
Right? So one thing I'll point out is world models
Oh, yeah.
Just in general are a really strong bet, I would say. So every NeurIPS, I go to this, like, group of researchers, and we take a vote on the top themes of the year. Everyone's extremely skeptical about world models. I think it's a trailing indicator, because LLMs have been so enormously successful. You're like, we don't need anything else.
I don't know if you have a take on world models or any other top theme of the year.
My, like, take on world models is that, like, we have not yet defined, like, what a world model is.
Oh, yeah. There are, like, three definitions right now.
Yeah. I think there's a lot of confusion about, like, what a world model is and therefore, you know, what it should be used for. We're already seeing, you know, plenty of, like, market potential for video models, including for things as, like, perhaps, like, Benola's, like, video editing. I think, you know, we're already seeing some applications of world models to things like autonomous driving and potentially even coding. But, again, it really hinges upon, like, how are you defining world models?
And I think one challenge that people have seen is that, like, world models perhaps designed for one specific use case might not generalize to others. So as an example of this, like, world models for, like, video game generation might not, like, generalize to, like, factory settings or...
Yeah.
Robotics. I use the word might, like, strategically because I think, like, it is potentially a research problem that might be figured out.
Yeah. So, yeah. That's part of the General Intuition podcast that we did.
Yeah.
So they had some evidence.
Yeah. Yeah. I think, like, it is possible. It's just we're not there yet today.
Yeah.
A theme that I've been spending a lot of time thinking about is memory management and continual learning. I work with a lot of companies.
Same startup I was thinking about.
Okay. I think I know what startup you're thinking about as well. But I actually, like, I see, like, a lot of market potential for memory management and continual learning. My interest in this is actually more driven by conversations with practitioners. Personalization is so important right now.
I think what we're seeing is that, like, a lot of AI application companies, they're growing really quickly, but they suffer from, you know, relatively low retention, relatively high churn. So, you know, if you're developing an app like Cursor, how do you ensure that your users don't, you know, switch over to, you know, Windsurf? Yes. Or, you know, Claude Code or Cognition or, you know, whatever else, when they release new features.
Yeah. Cursor rules isn't enough. Right? Like, it's it's like the shittiest form of memory. Yeah.
And, you know, and it's great. But, yeah, I agree with that, but also, like, I've publicly mused about this before where, like, memory is very poorly implemented today in a lot of surfaces. Like, even ChatGPT, I wouldn't say, like, people are particularly excited about. Okay. Alright. Yeah.
Yeah. You feel stronger about it than I do.
Yeah. Yeah. I mean, I wish ChatGPT had, you know, much better...
Yeah. It's like, theirs has mostly been the leading one. Mhmm. I don't know. And then I think, like, just in general, it makes product management harder, because what is the product?
It's a combination of UI plus memory. And, like, when you have a bug, is it the memory or is it something core? And as a user, especially if it's consumer, there's gonna be zero patience for any of this.
I agree. But that said, like, consumers seem to be, like, tolerating products with, like, no implementation of memory today. So I think, like... Early adopters. ...better is still probably better than, like, what exists now. Better is better than nothing, I guess.
Would you agree with the statement that, basically, let's say, a key theme of 2026 is this personalization? I would call it kind of, like, the consumerization of AI, in the same way that consumerization of enterprise was a trend, like, ten years ago.
Yeah. I mean, I think that is a good way of putting it too. Like, I don't, for what it's worth, think, like, this is just a, like, consumer or a prosumer phenomenon. If you are an enterprise that is adopting, again, like, Devin or Augment or something like that
Yeah.
You probably also want your models to kind of, like, learn the... So, like, it cannot learn... yeah.
Like, you start to, like... k-factor, I had to explain what that is to so many founders. And, you know, like, if you're in normal SaaS, this is what you obsess over. And to AI founders, they're like, what do you mean growth doesn't just show up? Like...
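For readers who, like those founders, haven't met k-factor: it's the viral coefficient from classic SaaS growth, roughly invites sent per user times the conversion rate of each invite, where k above 1 means each cohort recruits a larger one. A minimal illustration; every number here is invented:

```python
# k-factor (viral coefficient) sketch: k = invites per user * invite conversion.
# k > 1 means compounding growth; k < 1 means the loop fizzles out.

def k_factor(invites_per_user, invite_conversion):
    return invites_per_user * invite_conversion

def cohort_sizes(initial_users, k, cycles):
    """New users recruited in each viral cycle (illustrative, ignores churn)."""
    sizes, current = [], float(initial_users)
    for _ in range(cycles):
        current *= k
        sizes.append(round(current))
    return sizes

# 1000 seed users, 4 invites each, 30% accept: k = 1.2, so cohorts grow
growth = cohort_sizes(1000, k_factor(4, 0.3), 3)
```

With invented inputs of 4 invites and a 30% accept rate, k is 1.2 and each cohort is 20% larger than the last.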
Yeah. Yeah. I mean, it has, though. But I think, like, it has because, for a while, you know, AI has just felt magical. But, like, now we're getting more accustomed to the magic, and it's no longer enough.
And I think, you know, we need to revert to some of the, like, old tips and tricks for retaining people and, you know, bringing them in. Personalization is one of them. I always kind of intermingle, like, memory and continual learning, because I think, like, one interesting element of personalization is not just learning facts about you or your preferences, but, like, actually learning new skills from interactions with you and, you know, learning as the world changes. Like, there are new versions of languages and frameworks and, you know, other repos that are coming out all the time. The world is changing all the time.
Human intelligence is incredibly dynamic, and yet, like, artificial intelligence is just so static today. Yeah. But, like
So it must update weights, yeah, for you.
But that also means that, like, it's an interesting kind of, like, systems problem, because, like, if you must update weights, then, like, you know, weights become stateful. And today, like, inference is not stateful. So, you know, I think there are going to be, like, a lot of kind of fun, gnarly problems to figure out as we figure out things like personalization and continual learning.
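The stateful-weights problem she raises can be made concrete: if each user carries a weight delta, inference servers need machinery to load, cache, and evict those deltas around a shared base model. A toy sketch where scalar "weights" stand in for real tensors; the class and its API are invented for illustration, not any production system:

```python
# Toy per-user adapter serving: shared base weight plus a cached user delta,
# with LRU eviction standing in for GPU-memory management of real adapters.
from collections import OrderedDict

class AdapterCache:
    """Keep at most `capacity` per-user deltas resident, evicting the LRU one."""
    def __init__(self, capacity, store):
        self.capacity, self.store = capacity, store  # store: user -> delta
        self.resident = OrderedDict()

    def get(self, user):
        if user in self.resident:
            self.resident.move_to_end(user)          # mark most recently used
        else:
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)    # evict least recently used
            self.resident[user] = self.store.get(user, 0.0)  # "load" delta
        return self.resident[user]

def personalized_forward(base_weight, cache, user, x):
    """Inference with shared base weights plus the user's cached delta."""
    return (base_weight + cache.get(user)) * x

base = 1.0
cache = AdapterCache(capacity=2, store={"alice": 0.5})
y = personalized_forward(base, cache, "alice", x=2.0)  # (1.0 + 0.5) * 2.0
```

The gnarly part she alludes to lives in `get`: every eviction and reload is latency, and the base weights stay shared while the per-user state churns.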
That's also a fascinating infrastructure problem, because you have to load and unload and, you know, cache and all the good stuff. Yeah. Exactly. One more thing. I think we have time for one more take: RL environments.
Huge topic. Is it just a Docker container with some custom software loaded, logging stuff out? What are the good ones like, and what are the average ones like?
So I know I'm going on record on this, and, like, I'm actually okay to be wrong, but I think RL environments are just a fad. Oh my god. Oh, no.
They're all fake? I mean, like... okay. The thing that makes me take it seriously: the labs, I know, are paying seven to eight figures for RL environments, and they could build it in house. They're not.
And I don't understand why.
I mean, they were paying seven to eight figures for, like, piss-poor data annotation too.
Yeah.
So, like, and data labeling before that. Like, the labs have a lot of money. I think perhaps, like, RL environments could create some value in the short term. But to the point about, like, what makes a good RL environment, what makes a bad RL environment, I think the best RL environment is, you know, the real world. Why would I, you know, want to buy a DoorDash clone when, like, I can just use logs and traces from, you know, DoorDash itself? It doesn't mean that we don't need to spend...
parallel Yeah. Yeah.
I mean, I think, like, using the real world, using real apps as, like, the RL environment is in fact, like, the best thing. And this is what Cursor does. Like, they actually do use, you know, real user activity on their platform to significantly, like, improve both their coding agents as well as Tab. And I think that's one of the approaches that has, like, made the platform so compelling. It doesn't... like, you still need to figure out, like, the right rubrics.
You still need to figure out, like, the right set of tasks. So there are some aspects of RL environment design, you know, at least as we're talking about it today, that I think are going to remain incredibly relevant. But, like, just building a clone of an app, I think, is not that useful. Yeah.
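Her point that rubrics still matter even when the "environment" is real logs can be sketched as a reward function over traces. Everything below, including the rubric names and trace fields, is invented for illustration, not how any lab actually scores traces:

```python
# Sketch: turn a logged agent trace into a reward via rubric checks.
# Rubrics are just named predicates over the trace; reward is their pass rate.

def rubric_reward(trace, rubrics):
    """Average pass rate of rubric checks over one agent trace."""
    results = {name: bool(check(trace)) for name, check in rubrics.items()}
    return sum(results.values()) / len(results), results

rubrics = {
    "task_completed": lambda t: t.get("status") == "success",
    "no_unsafe_tool_calls": lambda t: t.get("unsafe_calls", 0) == 0,
    "under_step_budget": lambda t: len(t.get("steps", [])) <= 20,
}

trace = {"status": "success", "unsafe_calls": 0, "steps": ["open", "edit", "run"]}
reward, detail = rubric_reward(trace, rubrics)  # all three checks pass here
```

Swapping a cloned app for real logs changes where `trace` comes from, but the rubric and task-selection work she says remains relevant is exactly the `rubrics` dictionary.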
Yeah. Okay. Yeah. That is a hot take. We have maybe three minutes for any other stuff that you think about, just the state of startups in general, state of funding.
Yeah. So maybe I can talk about, like, just the archetype of startup that is, like, most exciting to me. Yes. Yeah.
Yeah. I love investing in, you know, infra tools, platforms, etcetera. And as we talked about with continual learning, I think, like, there will be opportunities for, like, new tools, platforms, and infra in the future. I've spent a lot of time thinking about, like, applications today. Yeah.
And specifically, like, the relationship between research and applications. An example of this is, like, I think there were a lot of advances in RAG. And the biggest beneficiaries of these advances were the application companies for whom, you know, retrieval was a critical unlock. So as an example of this, you know, like, Harvey, Hebbia
I knew you were gonna say Harvey.
Yeah. I mean, they have, like, really interesting RAG implementations. They have hired researchers, like, really good researchers, to kind of advance the state of the art, and that enables them to build a better product. I feel this way very much about, like, rule following and customer support. Rule following is, like, a hard research problem.
But if you solve rule following, then you unlock, you know, better customer support. And I think, you know, a lot of Sierra's success can be attributed to, like, their focus on this. So I've been thinking about, like, even for something like continual learning or memory, what is, like, the killer use case where you can either offer a dramatically better experience by having a good memory implementation, or you can do something that was just not possible before? I think you can also think about this in the inverse. And often, the best companies emerge in this way.
They're like, I'm trying to do this thing, but in order to actually do it, I need to solve this hard technical problem. That's kind of like the story of Runway. I don't think they would have built models if they didn't have to. But I love that combination of, like, we're delivering something that is, like, better for consumers, better for prosumers, better for users, but we're doing so by solving these, like, really gnarly research and engineering problems.
Yeah. I don't wanna... yeah. Got it. There's so much that I wanna sort of dig into there, but we are short on time. Thank you. Just, thank you in general.
I don't know if you have, like, a general call to startups or, like, a page somewhere that you wanna point people to.
Twitter. X. Whatever it's called. Yep. Yeah.
You can find me there. Yeah. Or in South Park with the one-eyed dog. I'm easy to spot.
Oh, okay. Yeah. Well, thank you so much for your time. I know you gotta go, but I appreciate it.
Of course. It was great seeing you, and thanks for having me. Yeah.
Thanks.
[Latent Space LIVE @ NeurIPS] State of AI Startups 2025 — with Sarah Catanzaro, Amplify Partners