diff --git a/practicalai/practical-ai-274.md b/practicalai/practical-ai-274.md
index f42ba33c..5a8a25bf 100644
--- a/practicalai/practical-ai-274.md
+++ b/practicalai/practical-ai-274.md
@@ -16,7 +16,7 @@
And wouldn't it be nice to avoid that step, where you ask the question and you get an answer right away. So that's the ultimate destination where we're trying to get. Obviously, it's been tough to get there over the last decade or so; there has been a lot of work, but it never quite worked. There was this much higher level of hallucinations, much higher level of maybe not perfect synthesis of the information... You basically get a Frankenstein... So instead of a coherent and nice, easily parsable and readable answer, you basically just get extracted pieces of the information concatenated together, so not very pleasant. And it's funny that when we started -- so one of our angel investors was Jeff Dean; he requires no introduction. And he was saying Google actually always wanted to build something like this, but because they had such high expectations for accuracy, because millions and billions of users are using Google... And if you hallucinate 1% of the time, you're gonna get a lot of unhappy people. And so they were never able to -- because the models were not as strong as they are right now, they were never able to get to 99.9% accuracy. And that's why this work never panned out.

-\[00:06:20.00\] But something great happened in 2022. When we started our company, both myself and Aravind, my co-founder, we come from academia, so we've been doing a lot of research in language modeling, reinforcement learning and stuff like that... And he was actually at OpenAI at that time. We've been literally following the improvements of GPT models, GPT-2 and then GPT-3; that's where it actually got very interesting. And it became obvious there was going to be something there. And this was primarily the motivation for us to start a company.
+\[06:20\] But something great happened in 2022. When we started our company, both myself and Aravind, my co-founder, we come from academia, so we've been doing a lot of research in language modeling, reinforcement learning and stuff like that... And he was actually at OpenAI at that time. We've been literally following the improvements of GPT models, GPT-2 and then GPT-3; that's where it actually got very interesting. And it became obvious there was going to be something there. And this was primarily the motivation for us to start a company.

We wanted to build an answer engine from the get-go, but it was very ambitious. I remember we would go to the investors and say "Oh, we're going to build a search engine", and they're looking at you like you're crazy, which makes a lot of sense. They're just like "Oh, there's Google already." And they had a fair point. But we still weren't very discouraged by that. We knew there was something there, and we started prototyping.

@@ -32,7 +32,7 @@ And then obviously, ChatGPT happens a few weeks after, and we're like "Okay, so
And so citations were sort of first-class citizens in our product. And then when ChatGPT came out, one of the biggest points of feedback for them was "Okay, so I don't know if this is accurate information or if it's a hallucination. If it's not, how would I verify it?" And that's why we were like "Okay, so this seems like a good opportunity to release our product."
We literally in a matter of two days put up a website, connected it to the backend that we had, and just -- we obviously did not expect that people were going to use it and the usage was going to grow as much as it did...

-\[00:10:08.03\] But coming back to your original question, I think what happened - literally in a matter of days or a month... I mean, obviously, it follows many years of research, but it was a very clear step function in the quality of the generated answers. And you can literally -- if you sort of spend some time playing with it, you can clearly see that it now becomes very good. We also realized at that time "Okay, so models are only gonna get better. Things are only going to get faster and cheaper, so there's something. There's a lot of stuff to build here."
+\[10:08\] But coming back to your original question, I think what happened - literally in a matter of days or a month... I mean, obviously, it follows many years of research, but it was a very clear step function in the quality of the generated answers. And you can literally -- if you sort of spend some time playing with it, you can clearly see that it now becomes very good. We also realized at that time "Okay, so models are only gonna get better. Things are only going to get faster and cheaper, so there's something. There's a lot of stuff to build here."

**Chris Benson:** For those who are still learning about your organization and what you're offering, could you step back for a second and, if you were talking to Jeff Dean or another investor, kind of give them the elevator pitch about what you're doing specifically, and how it's differentiated from the GPTs and stuff out there... How do you define that? How do you describe yourself in terms of the specific opportunity that you're pursuing?

@@ -46,7 +46,7 @@ We wanted to do things very simple. Early on we identified "There are two things
So the main differentiator is just we care a lot about quality, so we minimize the chance of things being inaccurate or hallucinated, and we want to do it as fast as possible. And so that distinguishes us from Google, because Google doesn't, for example, generate the answers... Even though more recently they started doing this, which has kind of just validated our idea... And ChatGPT probably primarily focuses on different things. But I guess also more recently they've started doing web search as well.

-**Break**: \[00:13:18.23\]
+**Break**: \[13:18\]

**Daniel Whitenack:** So you mentioned a few things there... You mentioned web search, you mentioned retrieval, you mentioned the large language model... So at least in how I think about it - and maybe others categorize it differently - there's one element of information that you can get from an LLM, which is "I'm going to put in a prompt, and it's going to generate text, and that may contain some facts, or made up facts, or some text, but it may be informational." So there's some sort of knowledge that can be gained there. And then in a second case, there's a way to retrieve on the fly external data. So that could be from your company's documents, it could be from the web, whatever, and then inject that into prompts to the model, which kind of grounds it, and like you say, would give you a citation.

@@ -54,7 +54,7 @@ There's also more agentic approaches to this, where maybe there's multiple ways
**Denis Yarats:** The way I see it's going to unfold, I think the tools and the agentic behaviors - I think that's where it's going.
I think it's going to be the main bottleneck for this right now; it's just that models are not smart enough yet to take into account and sort of reason over all of the information that is out there... But I think it's going to be a main component. So there's going to be models -- they're very powerful already right now. They're trained on a lot of data; basically, the internet. They have a lot of internal information, internal knowledge. And they can already do a very good job of synthesizing information. There are certain things that they don't do well, and perhaps they're never going to do those things well. For example, computation, when you need to maybe run some code, or do some sophisticated math computations... LLM architectures like transformers are going to struggle at that.

-\[00:18:26.00\] Also, because those models are so big, sometimes it's going to be very expensive to then update them very frequently. So you need a way to ingest something, some new information that just happened and is still not part of the LLM weights. And this is where we're also specialized. Also some private documents, as you mentioned. Sometimes, if it's like an enterprise, you have some documents that obviously the model was not trained on, and you maybe want to reason about those documents. And there's all kinds of other tools that you can -- yeah, eventually there are going to be agents that are going to do actions, like maybe book a ticket, buy a ticket, or something like that.
+\[18:26\] Also, because those models are so big, sometimes it's going to be very expensive to then update them very frequently. So you need a way to ingest something, some new information that just happened and is still not part of the LLM weights. And this is where we're also specialized. Also some private documents, as you mentioned. Sometimes, if it's like an enterprise, you have some documents that obviously the model was not trained on, and you maybe want to reason about those documents. And there's all kinds of other tools that you can -- yeah, eventually there are going to be agents that are going to do actions, like maybe book a ticket, buy a ticket, or something like that.

So I think definitely where it's going, it's going to be a synergy; everything's gonna come together. We just need a top-level powerful model that's gonna reason about multiple things, and we'll have to have long context windows, maybe some memory as well... And then just utilize those tools as much as possible.

@@ -64,7 +64,7 @@ The other aspect that I'm very excited about them and we're working on is kind o
**Chris Benson:** As people are using an answer engine like yours more and more often going forward, and you alluded a moment ago to the fact that LLMs are not the be-all; there are things they don't do well, like mathematics and such, and a variety of other things I'm sure that we can all throw out there... But they're really powerful at what they do. But clearly, there is a place and a need for both the LLMs, these largest models that get all the -- they kind of soak up all the air in the news cycles, as well as many smaller models that are specialized, a mathematics model that you plug in... As we're looking at trying to use answer engines to retrieve information, and that information is increasingly multimodal in nature in terms of what you're asking, how do the architectures of those come together?
This is a space -- it's not the first time we've asked it here, but it's evolving so rapidly; you're no longer hosting a model, you're now hosting a whole collection, and they may be mixed with models that you're API-ing out to, and such... How does that look to you as you're building this company at this point?

-**Denis Yarats:** \[00:22:11.12\] It's going to always be a trade-off between -- if you have one powerful model, I mean, yes, it can do lots of general things super-well, but it's going to be slower, it's going to be more expensive... One of our key principles is just we want to do things very fast, so we can get answers as fast as possible. That means you have to design your orchestration system in a way where certain things will have to rely on customized models. And something that is much smaller, much faster, but it's a specialist model. So it's not a model that knows how to do everything, but it knows how to do one task. And basically the challenge here is how do you balance between these general models and these specialist models? And I think we've been doing this from the very beginning. So when you send a request to Perplexity, it's not just one model; there's at least ten different models trying to do lots of things with your request. It's all kinds of ranking models, a bunch of embeddings, all different classifiers, and stuff like that.
+**Denis Yarats:** \[22:11\] It's going to always be a trade-off between -- if you have one powerful model, I mean, yes, it can do lots of general things super-well, but it's going to be slower, it's going to be more expensive... One of our key principles is just we want to do things very fast, so we can get answers as fast as possible. That means you have to design your orchestration system in a way where certain things will have to rely on customized models. And something that is much smaller, much faster, but it's a specialist model. So it's not a model that knows how to do everything, but it knows how to do one task. And basically the challenge here is how do you balance between these general models and these specialist models? And I think we've been doing this from the very beginning. So when you send a request to Perplexity, it's not just one model; there's at least ten different models trying to do lots of things with your request. It's all kinds of ranking models, a bunch of embeddings, all different classifiers, and stuff like that.

And the other trade-off here is with a general model - let's see... One of the big things, and I think it was actually a very critical component of why a company like Perplexity became possible in the first place, is the speed of iteration. You literally can change the prompt, and you can just get a new product in a matter of hours. Imagine a couple of years ago, if you wanted to build something like Perplexity or another gen AI product, you'd have to collect data first, you'd have to train the model, launch the product, see if this product makes sense, does it have market fit or not? If it has market fit, then you'd start collecting data, and then you'd just keep improving. So what's been possible with GPT models and APIs - this just kind of flipped over. So you very quickly can build a product, see if there are any signs of life in this product, and then you start collecting data, which is I think honestly the most important thing. And once you collect data, you can distill it, you can build many other smaller models, and optimize the experience so you can make the models faster, you can specialize them.
I think this is the key, I think this is the -- honestly, it was one of the most fundamental changes in the development. And we took advantage of this theme early on, and are still using it...

@@ -74,9 +74,9 @@ So the key is just you don't want to go too overboard with those models, like ev
**Daniel Whitenack:** Yeah, I have a question maybe related to that, which I think is a pain point a lot of people are feeling, and I'm guessing your teams have felt, which you even mentioned... You can make a small change in your prompt or create a new prompt, and all of a sudden it's almost like you have a new product... Which is sort of amazing in one sense, and really frustrating in another sense... Because as you were just alluding to, it's like, oh, maybe I have these 17 different things chained together, and they all have prompts that I've worked really hard on... And then tomorrow LLaMA 17 comes out or something, and now it has a different character of behavior than the previous model that I was using... I'd love to use it, but now I have all of this -- it's almost like AI model debt that I've got in my system... Do you have any perspective on that, or anything that sort of has happened in your experience in this regard?

-**Denis Yarats:** \[00:26:21.23\] Yeah, this has clearly been a theme, and it's been happening quite often... And if I had to guess, that's going to continue happening. One thing we realized early on is, okay, so this is going to be the case; there is not going to be one model that rules them all. I mean, even though for some time it was GPT-4, but now we can see there's particularly like Anthropic, Gemini, LLaMA... There is going to be a future where there are several frontier models. Because of that, we decided, okay, so let's design our infrastructure and our system in such a way that it's going to be model-agnostic. That means there are ways where you can evaluate each component independently, there is a way where you can quickly change things up to adapt to a new model, and stuff like that. It took some time to get there, but I feel like it was a very correct decision for us. So for example, one of the advantages we have over, let's say, things like ChatGPT, or Claude, or one-model provider companies is just we can seamlessly integrate many different models. Our users can decide they want to use this model, or they want to use that model. Later on, as we progress, I think we can even decide based on the complexity of the query, or the type of the query, we can route to a particular model that does a better job for those types of queries, and minimize -- maybe for some of the queries that are super-simple, you don't need to run a very large model for that type of answer. So then you can optimize speed, and things like that. So I feel like you have to just make a system in a way where it's agnostic to the model.
+**Denis Yarats:** \[26:21\] Yeah, this has clearly been a theme, and it's been happening quite often... And if I had to guess, that's going to continue happening. One thing we realized early on is, okay, so this is going to be the case; there is not going to be one model that rules them all. I mean, even though for some time it was GPT-4, but now we can see there's particularly like Anthropic, Gemini, LLaMA... There is going to be a future where there are several frontier models. Because of that, we decided, okay, so let's design our infrastructure and our system in such a way that it's going to be model-agnostic. That means there are ways where you can evaluate each component independently, there is a way where you can quickly change things up to adapt to a new model, and stuff like that. It took some time to get there, but I feel like it was a very correct decision for us. So for example, one of the advantages we have over, let's say, things like ChatGPT, or Claude, or one-model provider companies is just we can seamlessly integrate many different models. Our users can decide they want to use this model, or they want to use that model. Later on, as we progress, I think we can even decide based on the complexity of the query, or the type of the query, we can route to a particular model that does a better job for those types of queries, and minimize -- maybe for some of the queries that are super-simple, you don't need to run a very large model for that type of answer. So then you can optimize speed, and things like that. So I feel like you have to just make a system in a way where it's agnostic to the model.

-**Break**: \[00:28:11.03\]
+**Break**: \[28:11\]

**Chris Benson:** So to follow up on what we were talking about before the break there, I know that you were talking about really building around model agnosticism, to be able to handle that... I couldn't help but wonder - occasionally, as we get a new model out, it breaks new ground on modality, being added in a whole new approach, that kind of thing... And so, as a business builder who is having to try to accommodate all these different models, when you have one that jumps out and has a completely new thing added in that was unexpected prior to the announcement, how do you guys pivot in the organization to accommodate that, and keep the agnosticism, and yet provide that extra functionality that's now available? How do you all tackle that problem?

@@ -88,7 +88,7 @@ So having the system that \[unintelligible 00:32:03.29\] update can support thos
**Daniel Whitenack:** While you were talking, I was thinking there's sort of one axis that you have to navigate here around model releases, and functionality, and modalities, all that stuff... There's sort of another maybe around UI and user experience. There were multiple people making the comment "Oh, well, the chat interface came out with ChatGPT", so everyone's sort of focused around the chat interface. Is that the best way to utilize this sort of technology in the long run? There's probably a lot of exploration that's still open around UI and user experience with this type of technology. And certainly chat is relevant, and we're using it a lot already... I'm wondering, from your perspective, especially as we see this functionality maybe more embedded in the physical world around us, whether that be in our glasses, with Meta glasses, or in kiosks in airports, or whatever those things are - what is your perspective on how important it is to explore new types of UI or user experience with this technology?

-**Denis Yarats:** \[00:33:55.28\] I believe and -- I mean, we were very confident early on that chat interfaces are a temporary thing. It's just too limiting. It has a lot of constraints. And that's why we didn't follow the usual road of all of the chatbots. They literally copied ChatGPT and put up a chat interface. We thought a little bit more about this, and we decided -- I feel ultimately, right now we're still in this early stage where people care about the model itself, so the model is the theme... But as this thing gets more advanced, as more people start using gen AI products, I feel like the main thing is going to be the product itself. What kind of things can the product do? Do you do it better than this? Do you have the best UI? Do you have the best UX? And that's why early on we were thinking about those things, and designed our product in a way that is the most suitable for the things that we wanted to do. If it's search, we knew that chat doesn't make sense for search. It's just like, that's not how people search for information. That was a very big factor, I think, in our success.
+**Denis Yarats:** \[33:55\] I believe and -- I mean, we were very confident early on that chat interfaces are a temporary thing. It's just too limiting. It has a lot of constraints. And that's why we didn't follow the usual road of all of the chatbots. They literally copied ChatGPT and put up a chat interface. We thought a little bit more about this, and we decided -- I feel ultimately, right now we're still in this early stage where people care about the model itself, so the model is the theme... But as this thing gets more advanced, as more people start using gen AI products, I feel like the main thing is going to be the product itself. What kind of things can the product do? Do you do it better than this? Do you have the best UI? Do you have the best UX? And that's why early on we were thinking about those things, and designed our product in a way that is the most suitable for the things that we wanted to do. If it's search, we knew that chat doesn't make sense for search. It's just like, that's not how people search for information. That was a very big factor, I think, in our success.

The other thing I guess we also -- even last year, we started prototyping and experimenting with this concept of generative UI... So something where the LLM can guide what UI elements you can generate... One of the things in a chat interface - if you want to ask a follow-up question, sometimes it doesn't make sense to ask it as a sentence. So maybe you want to show a checkbox, or a button, or whatever. Especially on mobile - everybody uses phones - it's just not very convenient to type, especially if you're on the run, so you'd rather press a button. That's why maybe speech, and I guess voice technology, is going to be one of the interesting modalities, for sure, for an interesting interface, because it has a lot of advantages. Obviously, it has lots of disadvantages too, but it's definitely going to be interesting. And I think going forward, as we go towards agentic behaviors and more things are going to become possible, I think it's definitely not going to be chat interfaces. It has to be something else.

@@ -96,7 +96,7 @@ The other thing I guess we also -- even last year, we started prototyping and ex
**Denis Yarats:** Definitely looking into this, I think, and we consider multiple options. There are certain things I think we can do ourselves; for certain other things we probably have to work with some other partners. But I truly share your experience, and I think it's -- yeah, if you sit in front of a computer, I think by far the best interface is the keyboard. I don't think you can do better than that. But yeah, if you're occupied with something else, if you're driving a car, maybe you're walking, there has to be something else. Even the phone is -- it's okay, maybe even taking notes; maybe you say a command or something like that, and you get the voice back - that's already something...
But it misses visual information. So you're gonna want to add that. So that means you probably have to have some sort of glasses on. I think it will definitely happen.

-\[00:38:20.25\] And we will try, for sure -- we spent a lot of time this year improving our mobile app to do voice-to-voice, and we invested a lot into voice generation. So for example, you can ask various questions; if you need something quickly, like you're walking and you're like "I want a quick lookup of information", so we support that.
+\[38:20\] And we will try, for sure -- we spent a lot of time this year improving our mobile app to do voice-to-voice, and we invested a lot into voice generation. So for example, you can ask various questions; if you need something quickly, like you're walking and you're like "I want a quick lookup of information", so we support that.

There is something for example for if you drive a car, we have this -- you can read up the stories, or discover from Perplexity, that's also AI-generated voice. So it's like you listen to a podcast, or... So that's super-important. Yeah, I think that the next step is vision, and so how do you get there.

@@ -106,7 +106,7 @@ There is something for example for if you drive a car, we have this -- you can r
**Daniel Whitenack:** It's like fighting malware.

-**Denis Yarats:** \[00:41:13.27\] Yeah, exactly. It's the same concept. It's definitely going to be an issue, for sure... But my hope and my belief is that the good guys, the good generators are going to be -- from machine learning fundamentals, discrimination is a much easier problem than generation. It's much easier to tell what is good and what is not than to generate it. And usually it seems like we've been more successful at detecting this stuff, and I don't see any reason why it's not going to continue.
+**Denis Yarats:** \[41:13\] Yeah, exactly. It's the same concept. It's definitely going to be an issue, for sure... But my hope and my belief is that the good guys, the good generators are going to be -- from machine learning fundamentals, discrimination is a much easier problem than generation. It's much easier to tell what is good and what is not than to generate it. And usually it seems like we've been more successful at detecting this stuff, and I don't see any reason why it's not going to continue.

**Chris Benson:** So this has been really fascinating. As we wind up here, and have you here for one more question, as we -- we've talked about the future and have been kind of talking about what our expectations might be, and how those might be fulfilled... Are there any other areas that we haven't addressed that you're interested in? And possibly, as part of that, any way of summarizing your own vision - without it being just answering questions that Daniel and I have thrown at you, but your own vision for what the future looks like to you, and what you want it to be, and what Perplexity is trying to realize - to kind of paint a picture of what we might see over whatever timeframe you want to address?