Data Talks on the Rocks - NYC
Michael Driscoll: 00:00:04.888 I'm Mike Driscoll. I'm the founder of Rill Data. We are trying to build lakehouse native analytics on top of things like object stores. And we're primarily working in AdTech today, but we're trying to sell to many verticals. So on my left to start is Edo Liberty. Edo is the founder and CEO of Pinecone. Pinecone, well, I'll have you say a few words about that. I'm going to make personal notes, and then I'll let you both say the-- I'll let you all say the boring stuff about your actual businesses. Maybe not even so boring. So how I know Edo is he's actually a surfer, formerly a PhD researcher who worked at Amazon on things like SageMaker. And I was introduced to Edo when I was looking for a world expert in, I believe they're called-- are they approximate stochastic streaming algorithms? Is that approximately right?
Edo Liberty: 00:00:59.308 That's pretty good.
Michael Driscoll: 00:01:00.136 OK. And Edo pitched me on Pinecone many years ago, and I walked out of that lunch and the guy I was with, who is an MIT CS grad, turned to me and said, "I have no idea what that guy was talking about." [laughter]
Edo Liberty: 00:01:14.083 That was a very common outcome of most of my meetings for about two years.
Michael Driscoll: 00:01:17.843 So turns out he did really-- we didn't know what he was talking about, but Edo did and it's been working out well. I'll say a few more words. Next, we've got Erik Bernhardsson. Erik, hopefully I haven't butchered your name.
Erik Bernhardsson: 00:01:28.505 Any pronunciation.
Michael Driscoll: 00:01:34.151 Erik, I've known Erik online. I've been a follower of Erik's blog and his banger tweets or-- his zeets, I guess, for quite a while. And he's written a lot about cloud and what's going on in the world of cloud. Previously was at Spotify running an engineering team there, then CTO at Better. Now the founder and CEO of Modal Labs. Super hot, cool startup here doing cool stuff, which we'll say a few more words about. And then finally, Katrin Ribant, who I've known for, I think, 15 years. That's how old we are. We met in New York City when she was at her--
Katrin Ribant: 00:02:07.747 That's how old you are.
Michael Driscoll: 00:02:08.905 How old I am. Yeah. Sorry. [laughter] Gray hair. We met in New York City 15 years ago when I was working with Greenplum back in the day. She was running the analytics team at Havas Media. Went on to found Datorama, exited that to Salesforce for almost a unicorn amount. Very close to unicorn amount, but it was cash, so it was real money [laughter] instead of the fake unicorn valuations. And now she's the founder of Ask-Y. And we're going to ask her why she's starting that company, what it's about. So without further ado, I think I am going to have to, then, start with the first question. We're going to go from-- we're going to start with Katrin and go back this way, which is we'll start with that. Katrin, tell us, what is Ask-Y in the context of data, and AI, and analytics, and the mad landscape that has hundreds of logos on it? Why are you bold enough to start yet another analytic startup?
Katrin Ribant: 00:03:08.092 Well, you are partially to blame for that. Because when we talked a couple of years ago, two, three years ago, and I was kind of contemplating what to do with my life, you--
Michael Driscoll: 00:03:20.133 Kite surfing in Greece is what you were doing at that time, I think.
Katrin Ribant: 00:03:22.758 True, quite a bit of it. And you know it gets boring. Yeah, it does. I didn't believe that either. But there's a point where you need some--you need to do-- at least I need to do something. And I was like contemplating what-- sorry?
Edo Liberty: 00:03:39.370 Harder tricks.
Katrin Ribant: 00:03:40.465 Yeah, not enough.
Katrin Ribant: 00:03:43.502 There's an age, Mike's age, for example, [laughter] where you have to slow down with those things. So recognize the fact that some people are meant to build things, and that's what we need to do. And some people amongst those are chasers of white whales, aren't they? And my white whale is ETL for non-technical users. That's really what I'm passionate about. That's what I need to solve. I will fail at it my entire life. And that's okay. I will pursue it my entire life. And that's why Ask-Y is, from a personal point of view, that's why Ask-Y is there. From a business point of view, that landscape, or any other landscape of solutions, is super noisy, obviously. However, I have this mad belief that I can still do something for those business users who have problems that they can't solve, data that they can't harness. That just doesn't come fast enough, doesn't come in the shape that it needs to come to make decisions. It just isn't there. And I mean, we are living amazing times where you can do things with AI today that you really could not do a few years ago. And I would not pass on that.
Michael Driscoll: 00:05:00.744 Okay. Great answer. I'm going to follow up with a question to that. But I'm going to go to Erik and ask-- I'm going to add a layer, and I'm going to come back to Katrin on this, which is tell us what Modal is up to. Its raison d'être. But most importantly, I want to know what is your entry strategy into this vast logo landscape of data and analytics and infrastructure tools that Matt Turck and others have put us all into, yeah, as captives.
Erik Bernhardsson: 00:05:32.403 I always used to use this landscape as one of the first slides when I raised money. Because the point was like, there's so many fucking tools. But data teams are still struggling with basic stuff, like pushing things to the cloud, and scaling things, and scheduling things. So as I was thinking about building a new startup or building something, and my experience working with data teams and leading data teams, basically what I realized was that, what if I go down at the deepest layer and focus on that and build something that lets people deploy things on the cloud, scale things out, and do that with a great developer experience that makes people iterate quickly. And then we spent a couple of years building that, and no one wanted to buy it. And realized, what do we do? And then all this gen AI stuff started happening. And realized, for various reasons, we're actually really well-positioned because we had a lot of serverless primitives. We had GPU support. And that really started taking off. So that's now been really pulling us, which is super exciting. And I think there's so many different applications. And now we're starting to go back to our original vision, which was we wanted to almost build like a Kubernetes for data teams. But the biggest use case for us is various types of gen AI models, stable diffusion, controlling it, hosting those models, and scaling it out, and dealing with all the infrastructure. But also starting to see some users doing biotech or 3D rendering. And what Modal does is really just hosting all those things in the cloud, and in a way where you don't have to think about infrastructure.
Michael Driscoll: 00:06:58.427 Okay. That's great. I've got some hard-hitting follow-ups for you as well. But I'm going to just start with the hard-hitting question for Edo right away.
Edo Liberty: 00:07:06.804 Kite surfing?
Michael Driscoll: 00:07:07.612 Kite surfing, yes. Tell us more. How do you make kite surfing work? The question is, I think most of us probably here know of Pinecone. So I'm just going to skip to the question. You can say a bit more about Pinecone. I know you now have thousands of logos using Pinecone - I hope I'm not giving away any confidential information there - but thousands of folks who are using Pinecone. The question is, in the context of the big three clouds out there, in the context of Databricks, and the context of Snowflake, what is to stop-- how do you build a moat that keeps one of these other big players from-- and we just saw Timescale announced just today or yesterday that they built an embedded vector database. What is the moat for Pinecone? How do you maintain your lead ahead of some of these other behemoths or upstarts? You're welcome to say a few more words about Pinecone.
Edo Liberty: 00:08:04.775 Sure. Pinecone is a vector database. All the objects that come out of large language models and computer vision models are in this numeric form called a vector. And they need a specialized piece of infrastructure that, by the way, Erik built one of those inside Spotify as one of his-- in early days.
Erik Bernhardsson: 00:08:21.555 Annoy. Yeah.
Michael Driscoll: 00:08:22.809 Beautifully named open-source project.
Edo Liberty: 00:08:25.531 By the way, this is a scoop for you guys. This is very early days of Pinecone, Erik was already, I think, you're already contemplating moving, or you already left or something. And I pinged you at some point. I don't know what you were up to. And I thought maybe I could rope you into Pinecone in something. And you answered that you were starting a new thing. And then you reassured me, "Don't worry, it's not a vector database." [laughter]
Erik Bernhardsson: 00:08:54.789 So far.
Edo Liberty: 00:08:55.349 So far. All right. Whoo.
Michael Driscoll: 00:08:57.448 Not yet. [laughter]
Edo Liberty: 00:09:00.111 But anyway, just to show how deep this thing is, and Erik can attest to that, that functionality was very common at the hyperscalers. But just didn't really breach the surface for the common developer. And we just saw that and figured, "Hey, two things are going to hold true. First of all, this is going to become a lot more common with a lot more data, a lot more models. People are going to want to do more with the complex data. And the second thing, this is not going to get easier. People are going to spend less. They're going to want to do a bigger scale. And we have to really-- we really have to make some fundamental progress here." Right? And so we have a whole team of PhDs and principal engineers and it's high-performance computing folks and so on, just squeezing the lemon more, and more, and more, and more. And that's what we've done for four years. And so I worked at AWS. Right? And we've all worked in big companies. There's just so much you can solve with headcount. I mean, Amazon would want to go after us. Yeah, they'd put 50 headcounts on it. But okay, what's that going to do? I worked on launching SageMaker. We had more than 50 headcounts. It was still not a good product for a long time. Still not. You can double up on that. I mean, I'm sure you agree. [laughter]
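The core operation behind a vector database like Pinecone, or a library like Annoy, is nearest-neighbor search over embedding vectors. A minimal, brute-force sketch of that interface is below; the vectors are made-up toy embeddings, and a real system replaces this linear scan with an approximate index to hit the millisecond latencies Edo is describing.

```python
import math

def cosine_top_k(query, vectors, k=2):
    """Brute-force similarity search: score every stored vector
    against the query and return the indices of the top k."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    scored = sorted(((cos(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in scored[:k]]

# Toy 4-dimensional "embeddings" standing in for model outputs.
vectors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
query = [1.0, 0.05, 0.0, 0.0]
print(cosine_top_k(query, vectors, k=2))  # → [0, 1]
```

The hard part Edo's team works on is exactly what this sketch omits: the brute-force scan is O(n) per query, so production systems build approximate index structures that trade a little recall for orders of magnitude in speed.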
Michael Driscoll: 00:10:48.159 So okay.
Edo Liberty: 00:10:49.254 Maybe that's the reason why. But anyway, so that's my long answer, which is again, this is a group of enthusiasts and developers and engineers, and people who really care about this thing. I get excited about the fact that this problem is hard, and I get excited about doing the hard work. And building databases is hard. You have to be passionate about the bits, and the bytes, and the millisecond that you lose here, and the millisecond that you lose there. That takes a long time to build.
Michael Driscoll: 00:11:20.215 Right. Okay. Katrin, we're going to go to talking about open source and open versus closed business models. I think, ironically, as the moderator, I'm a huge fan of open-source software. And the last couple of companies I've been a part of have both used and created open-source tools. But I think it's fair to say that all three of you have closed-sourced products. And you've been quite successful with at least that approach. I guess I would ask you, Katrin, as you both thinking about Datorama, looking back at kind of distribution models, thinking about open source, and open versus closed standards and formats, how have you thought about that decision with your new company? How did you think about it in the past with Datorama? How do you intersect? Because that's such a big part of the data community is open source and open standards and protocols.
Katrin Ribant: 00:12:23.403 It is. I build applications, applications for business users. Right? And basically, applications that join between the data teams and the business teams. There's a part that is generally used by the data teams, at least in setup. And then obviously there's a part that is used by the business teams. And there's a whole join-in between the two within the application. Primarily, if you sell to the business, open source is probably not your winning strategy because they typically don't really care. So if you're going to decide that you want to have a sales motion that specifically goes into the data teams and you sort of take part of your application and you say, "Well, this is something that I think would benefit the data community, and benefit from the data community," I think it makes sense. However, I think it is something that is very hard to do post hoc. You do really need to have that idea from the onset. I didn't have the luxury of that when we built Datorama. I am thinking about it now because if you do something that's using generative AI for ETL, there is something to be said about a place for open source, a very defined place for open source for that. Haven't decided yet, though.
Michael Driscoll: 00:13:51.048 I'm going to have a two-part question for Erik, and maybe, Edo, a similar one for you, but more maybe for Erik. Which is, as someone who has created open-source tools like Luigi and A-N-N-O-Y, and someone who uses open-source tools internally and is benefiting from some of the proliferation of open models, open foundation models, which is driving probably some of the consumption of Modal's service. Is it hypocritical to be a beneficiary of all the open-source work, and yet, Modal itself is not actually contributing code back directly, or making your own service infrastructure open source? So question to you is how did you make that decision to keep Modal closed source? And how do you, I guess, reckon with engineers who say, isn't this just the right thing to do, Erik, to open source some of our work?
Erik Bernhardsson: 00:14:51.567 As maintainer of Luigi and Annoy, maintaining those projects for a long time, I don't think I was actually feeling money was the end goal, or like I am entitled to the profits from the companies using them. To me, I was just excited that people were using my product and that they were contributing back. And I had a use case internally. And I open sourced Luigi and Annoy and a few other things. And then other people started contributing back. And I was like, "Great. They're fixing some bugs that I would have otherwise spent time fixing. Why not?" So I don't know. I don't think it's necessarily hypocritical. In terms of why Modal is not open source, I don't know. 10 years ago, I think open source made a lot more sense. I think there's definitely been a shift in how we deploy software, where 10 years ago, you ran your own custom environments and your own hardware or whatever. And that was the only deployment model that existed. It was you take some software you find and deploy it into your own environment. But I don't think that's the case today. And I think there's been trailblazers, like Snowflake or whatever, proving that you can be an infrastructure vendor selling in the cloud in their own environment, running things. And so for that reason, I do feel like there's less of a strong push, or open source as a strategy makes less sense today. And part of the reason why at Modal we never thought about it was because of those reasons. I think there's so much more of a precedent for people being cool with running their code in someone else's environment, or using a multi-tenant system in the cloud, and using an infrastructure as a service provider. That doesn't mean open source is dead. I think there's a lot of other places where open source makes a lot of sense, especially for more middleware-type applications. But it's always a harder business model. It's always fundamentally a much harder business model.
Michael Driscoll: 00:16:46.761 So okay, I'm going to let you comment on Erik's statement around open source because I know you clearly-- I remember talking about this with you in the early days. One of your board members is also an investor in Snowflake, so that clearly influenced the decision to stay closed source. I want to let you comment on that decision for Pinecone, but then I want to ask your opinion around these open versus closed foundation models. And as someone who's obviously watching this space, how do you see it playing out? Who are the beneficiaries? Who are the winners and losers of this open versus closed foundation model, I guess, debate? But first, maybe a comment on the open-source business model and why you chose not to embrace it.
Edo Liberty: 00:17:31.801 So for me, it was a kind of fundamental question of what do developers want? What's the best thing we can do? Right? They want something that's simple, that's easy to use, that's free, that's accessible, that's cool. There's a lot of objectives that you want to hit. And the question is, what is the right vehicle? And I 100% agree with Erik that 10, 15, 20 years ago, open source was the only way to get there. Because there was really no other way to distribute your software in a way that it would actually get to the hands of a developer. Luckily for us, when we started Pinecone about four years ago, four and a half years ago, clouds, managed services, and so on were common enough as a consumption model that that ended up, in our opinion, being the better option. So you don't have to provision. You don't have to install anything. You don't have to figure out how to set up the Kubernetes environments and so on. Everything is running for you. Everything is ready to go.
Edo Liberty: 00:18:44.144 And at the same time, we decided, "Hey, because this is the objective, we have a very generous free tier that's just like free forever." We're not like bait and switching you on like, "Oh, use it for two weeks." And I'm like, "It's free. Run with it." And it's pretty generous. We have thousands of applications running on the free tier in production. Even though we tell them, "Don't go to production on the free tier. There's no support SLAs. We can't afford it." But they still do it. Great. I'm very happy with that. I mean, and we publish papers. We tell people about what we build. We're not secretive or defensive or not generous with our IP, or even literally our money. I mean, we run the free tier ourselves, and we actually pay for it. Right? So for me, the question was, how do you give-- how do you accelerate the developer experience, accelerate developers, and give them the best experience and the best value? And when we did the math, we figured open source is just not the right model for us. I mean, it's not going to help.
Michael Driscoll: 00:19:53.478 Okay. But the second question, then who are, in your view, winners and losers of this open versus closed foundation models kind of race, I would say, or controversy?
Edo Liberty: 00:20:04.838 I don't know how it's going to shake out. My two cents is it's not going to be decided by what's open and what's closed. I mean, that would be something they put on the marketing materials. Right? But if OpenAI's model was open source, would you use it more, or less, or just the same? Probably just the same. Or vice versa. Right? Have you actually looked at any of the code that generated the open-source ones? No. I mean, you just know the open source. I mean, how does that even help you? Right? So it's, in my opinion, not a huge issue. And the vast, vast, vast majority of companies that use open-source software never contribute a single character to the code base. And many of them don't even look at it. They just know it's open source, and so they're happy. So I mean, for me, again, I don't know how it's going to shake out, but I'm willing to bet that the openness versus closedness is not going to be the deciding factor.
Michael Driscoll: 00:21:09.895 I'm going to go back to Erik. What is your thought on that?
Erik Bernhardsson: 00:21:13.143 Maybe I have a different opinion. I actually think, despite what I said about open source maybe being less of a business model, I think actually I'm more of a fan of open source in this space, because it's more about open science. And I just think back at machine learning and AI.
Michael Driscoll: 00:21:25.437 Open science?
Erik Bernhardsson: 00:21:26.191 Open science. If I just think back about research for the last two, three decades in AI or whatever, it's always been driven by openness. The latest advances have always been on arXiv and GitHub. And I struggle to see how OpenAI is going to be able to maintain their lead, which is, right now, mostly just that they have experience running things at enormous scale. But I don't know. I think the cutting-edge models are going to change, and there's going to be less and less compute needed to train them. So I think that could shift the balance towards open source, again, in that space, or at least undermine the pricing power that OpenAI has. It might turn into a commodity more where there's less pricing power for them. So I don't know. That's my bet, but I could be wrong.
Michael Driscoll: 00:22:14.013 Okay. So believe it or not, my last question. And I'm going to open up for questions from the audience, but it's going to be a question for all three of you. Which is we talked, both Erik and Edo, you're building businesses that provide infrastructure for other folks that are building applications and products that might incorporate AI into them. Katrin, you're building a company that ostensibly is starting today that could actually leverage some of the infrastructure tools that Modal and Pinecone are creating.
Katrin Ribant: 00:22:42.962 I know. We should talk.
Erik Bernhardsson: 00:22:43.524 You should use Modal.
Michael Driscoll: 00:22:44.909 Hopefully, after tonight, it'll be your next credit card swipe online. But I guess the question to all three of you, but I'll start with Katrin is, how are you thinking about incorporating AI and LLMs into your products themselves? How is it changing the way-- I guess, two flavors of that question. One is the way that you build your products, the way that your software engineers and your product managers work, do their daily work. Second, how does it manifest in the product itself?
Katrin Ribant: 00:23:15.924 Should I add to that the way you chase VC money with it?
Michael Driscoll: 00:23:19.832 Yes, and the way you chase VC money. For sure. It starts there.
Katrin Ribant: 00:23:24.069 I really do think that it is a part of it. Right? Because times are tough currently, right, and if you don't have AI in your pitch, we all know it's not going to make things easier for you. So there is that aspect to be kept in mind. And I think there is a sort of pragmatic compromise to be kept between where you need to have AI for reasons that aren't necessarily concretely the best for your product or your users, and those other reasons for which you need to have AI. This said, I'll just talk about the use case I'm going after. I think that first of all, it's extremely tempting to put generative AI everywhere because it is just so much fun. It's amazing. It's fantastic. It's a revolution. I feel young again. It's great. And then once you start thinking a little bit about it and where it really is useful versus whether other types of technology are more useful because they are more specialized-- and for example, using AI for that is like, I don't know what the expression is in English, trying to kill a mouse with an elephant. And also not being very efficient at it because you're kind of just stomping and thinking like, "What is the next place where the mouse would be?" It's just really not working very well.
Katrin Ribant: 00:24:53.654 I think AI is very good for disambiguation. If you have a sort of a workflow in which you need to have inputs and those inputs are going to be ambiguous, and they're going to be incomplete, however you have context enough that you can sort of complete that, that, I think, is where AI is extremely useful. It is, however, very limited in its ability to take volumes of data, etc. It's really very limited. It's expensive. And it's really still quite clumsy to put that into an application, like a module with other systems. So doable will be done, but I think that's where I'm thinking of it.
Michael Driscoll: 00:25:43.343 Regular expression generation seems like one-- I never want to have to look up a RegEx again, and maybe AI can help with that.
Katrin Ribant: 00:25:53.134 That or a simple example, like what fields do you want? This, this, that. I know what analysis you're going to do. I know you will also need all of these other fields that you're not thinking about. Because for you, yeah, obviously you're going to need those. Just those sorts of things that look very small, but that work with ambiguity and with uncertainty. But because of the context, you can actually disambiguate that and help, enormously help, in a flow.
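The kind of context-driven disambiguation Katrin describes doesn't always need a large model. A hypothetical sketch of the field-name case: map a messy, user-typed field onto a canonical schema using plain string similarity. The schema, cutoff, and field names here are all illustrative assumptions, not anything from Ask-Y.

```python
import difflib

# Hypothetical canonical schema for a marketing-analytics ETL flow.
CANONICAL_FIELDS = ["impressions", "clicks", "spend", "conversions", "campaign_name"]

def disambiguate(user_field, candidates=CANONICAL_FIELDS, cutoff=0.6):
    """Map a messy, user-typed field name onto the closest canonical
    field, or None if nothing is plausibly close."""
    matches = difflib.get_close_matches(user_field.lower(), candidates,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(disambiguate("Impresions"))    # → impressions
print(disambiguate("campain name"))  # → campaign_name
```

Where generative AI earns its place is the step this sketch can't do: using the surrounding context (the analysis the user is building) to infer which fields they will also need but didn't ask for.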
Michael Driscoll: 00:26:23.396 Right. Erik, same question to you. How is Modal, how is maybe today, is it changing the way you develop software? And how might you or might not incorporate actual AI into Modal's products?
Erik Bernhardsson: 00:26:39.804 There's a few places where maybe we'll use machine learning to optimize scheduling or things like that. I don't know. I kind of don't like the question in a way. Because honestly, I'm extremely bullish on AI long-term. And I spent most of my career working on machine learning, whatever, building different things. But I think there's a lot of people out there that are like, how can we sprinkle magic AI dust over this thing? And there's a lot of--
Michael Driscoll: 00:27:07.002 Are those people named VCs? [laughter]
Erik Bernhardsson: 00:27:08.733 Well, not just VCs. I think, actually, a lot more so just regular stock analysts or whatever going to a shareholder meeting and saying, "What's your AI strategy?" And at the end, of course, if you're a big company today, you have to come up with an AI strategy. And you have to spend money. You have to show you invested $100 million in an AI strategy. And I kind of think that's going to distort a lot of stuff in the short to medium term. There's going to be a lot of shitty chatbots again that I don't want. And then there's going to be a pullback where they stop spending money. Ultimately, I'm very bullish on this stuff. It's too much of a tool looking for a problem right now. When I worked at Spotify, I used machine learning for the music recommendation system. But I started with the question, "How do we help people find more music?" And then I realized I had to do all this matrix factorization and build a vector database and all that stuff.
Katrin Ribant: 00:28:03.362 Thank you for all my playlists.
Michael Driscoll: 00:28:04.977 Yeah. It works.
Erik Bernhardsson: 00:28:06.768 And I think that's the kind of feeling, that's how you should think about this thing. Let's start with what are you actually trying to do? Chase, or whatever, Bank of America, spending a gazillion dollars on a chatbot is not an answer to their shitty mobile app. Right? [laughter] So I don't know. I'm very excited about the long-term stuff, but let's not stick AI on top of bad stuff.
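The matrix factorization Erik mentions can be sketched in miniature. This is a toy SGD factorization over a handful of made-up (user, item, rating) triples, nothing like Spotify's production system; the learned item vectors are exactly the kind of thing you would then load into a nearest-neighbor index such as Annoy.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=5000, lr=0.05, reg=0.01):
    """Minimal SGD matrix factorization: learn user and item vectors
    whose dot products approximate the observed ratings."""
    random.seed(0)  # deterministic toy run
    U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        u, i, r = random.choice(ratings)           # one observed rating at a time
        err = r - sum(U[u][f] * V[i][f] for f in range(k))
        for f in range(k):
            uf = U[u][f]                           # update with the pre-step value
            U[u][f] += lr * (err * V[i][f] - reg * uf)
            V[i][f] += lr * (err * uf - reg * V[i][f])
    return U, V

# (user, item, rating) triples: users 0 and 1 like item 0, user 2 likes item 1.
ratings = [(0, 0, 5), (1, 0, 4), (2, 1, 5), (0, 1, 1)]
U, V = factorize(ratings, n_users=3, n_items=2)
fitted = sum(U[0][f] * V[0][f] for f in range(2))
print(round(fitted, 1))  # close to the observed rating of 5
```

The point of Erik's anecdote holds here too: the factorization is the means, not the goal — it only exists because "help people find more music" demanded a compact representation of users and items.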
Michael Driscoll: 00:28:35.997 I'm going to ask a focused version of the question, then, for Edo, which is, how are you using these tools today for software development? Has it changed the way your engineers are writing code internally?
Edo Liberty: 00:28:48.139 100%.
Michael Driscoll: 00:28:48.735 Okay. So maybe say a few words about--
Edo Liberty: 00:28:50.731 I mandated the use of a copilot for coding. Not everybody has used it all the time, but everybody had to try it for a while before they decided whether they liked it. [laughter]
Michael Driscoll: 00:29:05.400 How many people liked it?
Edo Liberty: 00:29:07.715 Almost everyone.
Michael Driscoll: 00:29:08.737 Really?
Edo Liberty: 00:29:08.954 Yeah. Yeah. That's the way it is. I mean, they don't use it all the time, but they use it pretty extensively.
Michael Driscoll: 00:29:18.516 And then the second piece, is there any place where you could see it actually manifesting today in the actual product that Pinecone delivers, the service that you deliver to customers?
Edo Liberty: 00:29:29.795 Yeah. I mean, 100%. So I don't know if you know this, but there's a specialization in the AI space. Right? We used to think about these models that we basically thought about them as just neural nets, and we trained them with back propagation. And that was pretty much the only game in town. And they got bigger and bigger and bigger. And I think people are still thinking about those systems mainly as that, which is already not the case anymore. Right? And there's specializations. So they work alongside vector databases. They work alongside other specialized technologies. Some parts of those networks actually look slightly different, and sometimes actually use different hardware. And so this specialization is happening. And in some sense, the success of Pinecone has to do with the fact that the long-term memory and-- so people do what's called RAG, which is retrieval augmented generation. So they would put the data next to the model, and so they don't have to mix the two. So they don't have to retrain their models. And they use the data externally. And that somehow works really well.
Edo Liberty: 00:30:44.017 What we see now is that it's still the beginning of this journey. Right? And so we see somehow we still have to inform-- for example, it's going to sound very meta, but it's not. It actually kind of needs to be manifested in code, which is with vector databases, the index itself is in some sense a model of your data, which sounds weird. But it is. Right? In the same way that in your brain, the way you remember faces and the way you remember, I don't know, poetry is completely different. It's just organized differently. It's accessed differently. It's a different kind of data. And it needs different setup. Right? The same thing is true for vector databases. So we're seeing this. On the one hand, it's separating out because they need specialized hardware. On the other hand, they need to work really well together. And so they influence each other quite a lot. So I know it sounded meta, but it isn't. For me, it isn't.
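The RAG pattern Edo describes can be sketched end to end. Everything here is a stand-in: the "embedding" is a toy bag-of-words counter rather than a real model, and the "vector database" is a plain list rather than Pinecone. But the shape of the flow is the same one he outlines: embed the corpus, retrieve the closest documents at query time, and prepend them to the prompt so the model answers from data it was never trained on.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The "vector database": documents stored alongside their embeddings.
docs = [
    "Pinecone is a managed vector database",
    "Modal runs serverless functions in the cloud",
    "Annoy is an approximate nearest neighbor library",
]
index = [(d, embed(d)) for d in docs]

def rag_prompt(question, k=1):
    """Retrieve the k most relevant documents and prepend them as context."""
    qv = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"

print(rag_prompt("Which product is a managed vector database"))
```

This separation is Edo's point about long-term memory: the model never has to be retrained when the documents change, because the knowledge lives in the retrieval index, not in the weights.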
Michael Driscoll: 00:31:55.766 You're teaching a class at Princeton this semester. Is that right?
Edo Liberty: 00:31:58.710 That's right.
Michael Driscoll: 00:31:59.481 Okay. So those notes are all online. So you can look up Edo Liberty, Princeton dot edu. You'll find further ruminations on these.
Edo Liberty: 00:32:05.127 It's mostly math. I had to warn my students. It was like, "We're going to do math on a blackboard. It's called long-term memory for AI and vector databases, but it's going to be a bunch of proofs. So if you're into that, stick around."
Michael Driscoll: 00:32:20.311 Great to read that. After a few glasses of whiskey, it will make even more sense.
Edo Liberty: 00:32:24.030 Yeah. Exactly. [laughter]
Michael Driscoll: 00:32:25.240 Okay. We're going to move to questions. I'm going to ask-- is Sid in the audience here? Yeah, Sid. Can you respond to the invite with my phone number? So here's how we're going to do the questions. You guys are all going to get my phone number. You all are going to get my phone number. And I'm going to ask, if you want to ask a question, I'm going to have you text it to the phone number that you will receive on the invite email. The reason I do that is because sometimes at these Q&A things, you just get some bad questions, and you're forced to just take the bad questions. [laughter] So I like to vet questions and make sure that someone doesn't just pitch their startup and then say, "And what do you think about that?"
Guest: 00:33:05.781 I already had your number, so I texted you. [inaudible].
Michael Driscoll: 00:33:07.006 You can text me. All right. Great. Well, now, I did promise Lauren the first question though. But before I do that, while Lauren's preparing her bomb-throwing question from the back - Lauren's the best bomb thrower I know here - I'm going to ask a final thing, which is for each of you to name three startups. Hopefully they're not ones that you're investors in, but they can be. But three of the most exciting startups that you know of in the data, infra, and AI space that you're paying attention to right now. So I'm going to start with Katrin.
Katrin Ribant: 00:33:39.798 [inaudible].
Michael Driscoll: 00:33:40.863 You're going to pass. Two of them are already up here. You have to start. Three of the most interesting. You might only have two, but things that you're excited about that maybe we should all be paying attention to.
Erik Bernhardsson: 00:33:52.124 Wow, that's such a hard question.
Michael Driscoll: 00:33:54.598 It is.
Erik Bernhardsson: 00:33:54.521 Man, I'm on the spot. I'm going to have to think about it for a minute more. I'm serious.
Michael Driscoll: 00:33:59.674 Okay. I'm going to answer it first myself, then.
Erik Bernhardsson: 00:34:01.450 OK, you do it, then. You take it.
Michael Driscoll: 00:34:02.847 This is going to be a completely-- I'm just going to-- so I'm going to name just startups that are in the room here. So one is Drifting in Space. Paul and Taylor, if you haven't checked them out, they're building some really cool tools for serverless applications. The second is I'm going to-- Ciro, are you here? Okay, Ciro, is it Bauman?
Guest: 00:34:22.236 It's Bauplan.
Michael Driscoll: 00:34:22.837 Bauplan. B-A-U-P-L-A-N.
Guest: 00:34:26.796 We chose a German word because it's [inaudible]. [crosstalk].
Michael Driscoll: 00:34:31.334 So German words, they're so mellifluous. They roll off the tongue. Some Italians picking German words for their startups. But Bauplan. And if you haven't seen it, there's a paper that Ciro and his colleagues wrote called Building a Serverless Data Lake.
Guest: 00:34:49.484 Data Lake House.
Michael Driscoll: 00:34:49.538 Data Lake House from Spare Parts. It was in VLDB a couple of weeks ago. I found that to be fascinating. And my third startup that I'm just going to pick out of thin air here, and I should probably just be talking about ones that are in here. Okay, well, I mean, I'm just going to call it out. Great. David here from Estuary. Estuary is working on stream processing. Very cool startup here in New York City, still early days, has built some very cool stuff. And I have not caught up with you, Dave, in a long time. But you're definitely the beneficiary of us checking out what Estuary has been up to since we last talked. Okay, those are my three. Now I'll go down the line fast. I bought you guys some time. Katrin.
Katrin Ribant: 00:35:34.102 I'll do one. Can it be not data?
Michael Driscoll: 00:35:36.354 Which one?
Katrin Ribant: 00:35:37.088 Can it be not data?
Michael Driscoll: 00:35:39.395 What's it called?
Katrin Ribant: 00:35:40.226 It's called Easop. I'm Belgian. It's actually a Belgian startup.
Michael Driscoll: 00:35:43.926 Okay. Nice.
Katrin Ribant: 00:35:44.693 And they help companies like us manage equity plans in non-US locations.
Michael Driscoll: 00:35:51.023 Okay. Easop. Spelled with a Z or with a--
Katrin Ribant: 00:35:54.357 With an S.
Michael Driscoll: 00:35:54.746 Okay. S. Okay. Great. I can check that out. All right, Erik, you're on the spot.
Erik Bernhardsson: 00:35:59.835 I like what Hex is doing. I think they're doing a good job rethinking dashboards and analytics. That's the only one I could think of on the spot.
Michael Driscoll: 00:36:08.538 Yeah. Definitely reimagining dashboards and analytics is an awesomely hot space right now. So anyone who's in that space, we want to get that. Okay. I agree with that, for sure. Edo?
Edo Liberty: 00:36:19.152 I'm going to have to pass, man. It's hard. And yeah, it's both hard and I'm going to offend so many people. [laughter] I'm going to hard pass.
Michael Driscoll: 00:36:36.844 There's this company called OpenAI in San Francisco. You--
Edo Liberty: 00:36:39.292 They're doing well. Yeah, exactly. I hear good things.
Michael Driscoll: 00:36:43.659 I don't know if it's all just hype, but they seem to have a lot of customers.
Edo Liberty: 00:36:46.193 Yeah, yeah, yeah.
Michael Driscoll: 00:36:47.357 Okay, Lauren, bomb throw from the back. First question goes to you. And you have to direct it at one of the panelists. Oh, by the way, I'm going to give you-- I'm going to give you this, and then we're going to pass this around. That way we catch it on-- stand up. Here we go.
Guest: 00:37:01.495 I would ask, three years from now, how have vector databases not become completely commoditized? What are you doing from a go-to-market perspective, product perspective, to build that moat, defend that moat, and stake your claim as Pinecone, as the number one player and the go-to for that?
Michael Driscoll: 00:37:26.254 That's right at you, man.
Edo Liberty: 00:37:27.009 Sure. First of all, I don't know if you know this, but I kind of joked about this before. In the first two years of the existence of Pinecone, I would tell investors that I'm building a vector database, and they would just look at me foggy-eyed and confused. They had no freaking clue what I was talking about. And those days have now changed. I think there are like 50, I swear to God, like 50 companies that say that they are vector databases. Not that they also have vector database capabilities, but that they are dedicated to being vector databases. Right? So your question is spot on. Right? I mean, what do we do to keep earning our spot at the top of the podium? Right? And I can tell you as somebody who's dedicated his life, and at the very least the last decade or so, to building infrastructure, and who cares very deeply about these issues, that we invest very, very deeply, both in research, and engineering, and the system, and the reliability. And we spend a shit ton of energy just earning people's trust one day at a time. And I think that it's a slog. Right? It's a long journey, and you're in it because you love that kind of pain. We love that kind of pain. This is what I stay up at night thinking about. Right? And I hope we are earning people's trust. I think we are.
Edo Liberty: 00:39:12.993 I don't know if you've seen, we had-- what is it? I think 9, 10 months ago, we had a young engineer who wrote a script to delete old defunct data from free users that left the system months ago. Right? And long story short, that script went rogue and deleted more data than intended from free tier users. Now, we had no SLA and no guarantees for free tier users, but that was a huge problem for us. Right? We took full ownership of that, and we took care of everybody. We ended up actually collaborating deeply with the core engineering team at GCP to go recover that data. We literally went and worked with the Google folks to go figure it out, even though it was on the free tier, even for those who didn't pay us a dime. At least contract-wise, we didn't have to do that. But that was the trust that we wanted. And we put out a full report, down to the minute, of what happened and how we recovered. I think for us, that's how we win. We build the best platform and we earn trust one day at a time. And hopefully, people see that.
Michael Driscoll: 00:40:33.356 And you don't delete your users' data. And if you do, you--
Edo Liberty: 00:40:38.204 We recovered everything. By the way, this comes on the heels-- just so you understand what was happening on our side. Pinecone was only used by very savvy developers. Right? And so if you signed up for a free account, you got a CPU, and you got-- I forget what it was, like 4GB of RAM, and like a 20 or 30 GB SSD, and a bunch of networking. And you got a lot for free, basically. Right? And then AutoGPT happened. I don't know if you remember what AutoGPT was. It was this kind of random agent, the first really viral agent that went out. And we started having 10,000 signups a day. We started provisioning 10,000 CPUs a day for weeks. Right? We started hemorrhaging millions of dollars a month. We had to do something. [laughter]
Michael Driscoll: 00:41:32.471 Delete those users.
Edo Liberty: 00:41:33.849 We had to figure out something. And most of them were using the service for about five seconds. They'd just create an index and just leave. So anyway, so we can talk about how that-- you can read the report if you want.
Michael Driscoll: 00:41:47.814 All right, I've got a question that I vetted from the audience. I'm going to give this to Katrin. So you've got a lot of founders here in the room. You've also got some folks who are probably thinking about starting a company. And maybe they're at a safe, big FANG company right now. I'm going to ask, not for this company you're starting, but for Datorama, because I think anyone who's a second-time founder that's had a big exit, is kite surfing in Greece and getting bored, it's a different question. But for that previous leap of faith, how did you--
Katrin Ribant: 00:42:22.331 I was young.
Michael Driscoll: 00:42:23.070 When you were young and had a safe, comfortable existence, how did you make that leap? What was the fundamental thing driving you to do that?
Katrin Ribant: 00:42:34.248 I think, I mean, those variables are very personal to everyone. Right? There is a question of what drives you, just period, what drives you. I always wanted to build. Always. I don't know. You have that or you don't. And then, what is your comfort with risk? And in my opinion, the advice that I give to everybody who asks me that question is, I think you have to truly understand what is your degree of comfort with risk, and put yourself into a situation where you can take that risk and be okay if you fail. You have to be okay with the downside. You have to account for the fact that all of this can fail. And some rare people can put themselves into an extremely uncomfortable situation and survive that. I can't. So--
Katrin Ribant: 00:43:27.981 No. So to me, it's really about understanding what your limits are. Imagine everything goes wrong. What do you do on the other side of that? And put yourself in a situation where you're okay with that. And then if you don't jump at that point, it's just not for you.
Michael Driscoll: 00:43:49.083 Great answer. When you're younger, you're also a bit more comfortable with risk. Right?
Katrin Ribant: 00:43:53.821 Oh yeah. Do it while you're younger.
Michael Driscoll: 00:43:55.398 I strongly advocate jumping while you're young, because you probably won't do it by the time you get to later stages. Okay, this is a question from Ethan. I'm actually going to ask Ethan to ask it himself-- or I assume it's him. So okay, Ethan, I'm going to give you the mic so you can ask it. You made the cut. Nice question.
Guest: 00:44:15.053 I don't know if this works. [inaudible].
Michael Driscoll: 00:44:17.168 [inaudible]. You can hold on to it. It records, but it doesn't amplify.
Guest: 00:44:21.538 Oh, okay. Gotcha. This one's mostly for Erik. When I look at Modal, I understand it makes sense for hosting good open-source models like Stable Diffusion, but when it comes to text completion models, it feels like almost all of the risk of someone wanting to use it is dependent on open-source models eventually being able to outcompete something like an OpenAI or an Anthropic model. Because at least across the customers that we deploy for, for someone who's not super picky, they're fine with our public cloud deployment. I feel like we're almost always going with OpenAI's inference, or the 32K inference. For someone who's somewhat picky and needs a VPC, but they're still okay with the cloud, AWS Bedrock feels like a [inaudible] improvement guaranteeing an Anthropic model that still feels like it would outperform, if we tore our hair out and tried to tune something like Llama to what we do. And in the very last case, if God wills it, we sell to the DoD or some giant pharmaceutical company, and we need to deploy in their basement servers, at that point, the forced on-premise kind of just makes it so that we can't leverage the cloud deployment of the models on Modal. So am I missing something? Or is it the only case that it would make sense to use a non-AWS cloud provider to deploy a large language model for text stuff in the case that open-source models get better? Sorry, that was a really long, rambly question.
Erik Bernhardsson: 00:45:57.427 No, I think you're right. I think that we've seen we don't have a lot of people using Modal for language models. And part of the reason is just it's so API-dominant. It's all driven by APIs. And the best models today are proprietary, and extremely large, and very hard to host. The biggest category for us is people building their own custom models who don't want to deal with infrastructure. And not just building their own custom models, but they're also building a lot of infrastructure around them. So that's the segment we focus on. I tend to think that the segment slightly above us, the AI API, where you can just pick a popular model and call it, is going to have a harder time maintaining competitive advantage against each other. We're building a layer below, which is the infrastructure. You can run custom code on Modal. And that's a hard thing to build. And fundamentally, there are tens or hundreds of thousands of machine learning engineers out there training their own models, deploying their own models. And that's where we see our strong competitive advantage.
Michael Driscoll: 00:47:07.497 Okay. This is the last question, which is just a numeric question. Then we're going to go back to actually mingling and chatting, and having lots of questions and dialogue.
Edo Liberty: 00:47:19.395 Numeric question?
Michael Driscoll: 00:47:20.374 Numeric question. So this is a question with a numeric answer each of you can give.
Edo Liberty: 00:47:22.980 Oh, okay. You said that in one [crosstalk].
Michael Driscoll: 00:47:25.358 A floating point. That's right. A floating point number is accepted here, or an integer. Actually, I'm going to force an integer answer here. So the question is, how many years until AGI? Katrin? And you can say infinite, but an integer number of years until we have achieved artificial general intelligence, AGI.
Katrin Ribant: 00:47:54.468 Okay. Infinite.
Michael Driscoll: 00:47:57.373 Infinite.
Katrin Ribant: 00:47:57.909 I'm not answering that, so.
Erik Bernhardsson: 00:48:00.103 Can you define AGI?
Michael Driscoll: 00:48:02.253 I'm going to just say, ask GPT-4 what is AGI, and we'll just accept that as the appropriate answer.
Erik Bernhardsson: 00:48:07.927 [We'll be?] self-serving.
Michael Driscoll: 00:48:08.980 Yes, of course. Erik?
Erik Bernhardsson: 00:48:11.685 Yeah. I think 50 to 100 years. I think there's been so many waves of people getting excited. And I'm sure when pocket calculators came out, people were like, "Whoa, what are humans going to do? These guys can do math." So I don't know. I think there's so many cycles of getting excited and then pulling back. And it was like, actually, there's a lot of other stuff. So I don't know. I'm long-term very optimistic. I think we'll get there. But I think there's a lot of reasons to not buy into the sort of Sam Altman--
Michael Driscoll: 00:48:42.115 Fast takeoff?
Erik Bernhardsson: 00:48:42.907 Yeah.
Michael Driscoll: 00:48:43.598 Okay. Edo?
Edo Liberty: 00:48:47.442 I would be contrarian and say minus one.
Michael Driscoll: 00:48:52.467 We're already here?
Edo Liberty: 00:48:54.467 I think so. I mean, I think it's the initial step. Think about life on Earth. The first life was pretty basic. But it was a step function above non-living matter. Right? I think we've done that. Okay? I think we've figured out something fundamental. Think about AGI as the eventual conclusion of what we're doing now. What happened in the last year, if you ask academics and machine learning practitioners, and scientists, and so on, what happened in the last year, we thought was going to take 30 years. None of this we could foresee. Right? And so I think, in some sense, we are there in the sense that we've made the qualitative jump. Now it's about getting it better, faster, bigger, more accurate, stopping the hallucinations. We have a lot to figure out. But I think the binary jump has already been made.
Michael Driscoll: 00:50:17.455 Okay. All right. To be continued over drinks and conversation. But I want to thank our panelists. I want to thank all of you for being here tonight. [applause]