A guide to building product in a post-LLM world | Ryan Glasgow and Kevin Mandich from Sprig
Episode 104


Sprig is an AI-powered user insights platform that has raised over $88m. Today’s discussion features two key individuals in Sprig’s journey so far: Ryan Glasgow, Sprig’s CEO and founder; and Kevin Mandich, Sprig’s Head of Machine Learning. Before Sprig, Ryan was an early PM at GraphScience, Vurb, and Weeby (all of which were acquired), and Kevin was an ML Engineer at Incubit, and a Post-Doctoral Researcher at UC San Diego.



Timestamps

(02:50) Intro

(04:57) What attracted Kevin to Sprig

(05:53) Kevin's background before Sprig

(07:56) How Ryan gained conviction about Kevin

(09:55) Key technical challenges and how they solved them

(18:46) How to overcome AI skepticism

(21:47) The early difficulties of building an ML-enabled product

(25:06) Evaluating new models and knowing when to switch

(35:09) Using ChatGPT

(37:23) Product development in the pre vs. post-LLM world

(39:53) The impact of AI hype on Sprig's product development

(45:36) Balancing AI automation with user psychology

(48:47) Do recent LLMs reduce Sprig's competitive advantage?

(51:00) The importance of "selling the vision" to customers

(54:40) How Sprig structures teams

(57:25) How Sprig upskills all team members on AI

(1:00:25) 3 key tips for companies trying to navigate AI

(1:06:05) Major limitations with LLMs right now

(1:10:27) The future of AI and the future of Sprig

Brett Berson: I thought maybe a place to anchor the conversation is how you thought about applied ML in the context of founding Sprig about five years ago. Maybe Ryan, you could talk a little bit about the role it played in the very early version of the product, and ultimately how it overlaid with bringing Kevin in to lead the function at the ground floor.

Ryan Glasgow: This is very much building on the previous in-depth conversation, Brett, that we recorded perhaps two years ago about the early days of Sprig: the unmet need we were looking to solve for in the market, and how that unmet need came about. I think it's exciting now, and very relevant, to talk about the AI work that was happening in parallel to discovering that unmet need and building something that didn't exist in the market up until then.

In the explorations of that unmet need, what the customers were looking for was a way to collect qualitative product experience data from their users at scale. And only half of the job to be done was collecting that data; the other half was understanding it. In the process of manually building out reports for the early Sprig customers and sending over a deck of analysis, we found that the collection of the data was interesting and insightful, but it was actually the ability to quantify the survey data, and the qualitative data, that made it so revolutionary for the customers we were working with. We realized that this kind of open text analysis had not been done at the time and needed a new approach, so we started working with Kevin on a prototype.

And we saw enough progress to believe this was actually something that might be possible to build. That's when Kevin joined Sprig full-time, in Sprig's earliest days, as a founding team member.

Kevin Mandich: I can add to that. One of the things that really attracted me, in addition to the problem space, which I had experienced even as an engineer, was Ryan having lived this before: his ability to articulate the problem, and his idea for solving it. One thing that really spoke to me was the analysis of, for example, open text survey responses, or open text feedback in general.

That is probably the most data-rich feedback source there is, and historically it's been really hard to analyze. People have done word clouds, word counts, topic modeling, but none of those really capture the nuance of what people are saying.

They also don't account for the fact that people could be saying multiple things per response. Natural language is very messy. I saw all of this as a signal that there was really something special we could build here, and that there's a lot of untapped data out there.

Brett Berson: Before you joined, Kevin, maybe you could talk a little bit about some of the things you had done that transferred well into the role at Sprig.

Kevin Mandich: Yeah, definitely. As a quick background, I've worked as a machine learning engineer for pretty much all of my career; it's close to 10 years now. Prior to this, I had worked at a variety of different companies in the tech space: legal tech, an email security company trying to solve the spear-phishing problem.

I worked at a computer vision startup for a little while. And prior to Sprig, I actually didn't have any experience in the NLP space; that was brand new to me. But one of the things about machine learning and AI in this industry is that there's a lot of crossover between the different data domains.

If you're able to solve a computer vision problem pretty well, a lot of those skills are transferable to natural language, or to time series processing, or audio processing. I think Ryan realized and saw that as well. When we started looking at this data and doing a quick POC on the analysis, I was able to show pretty quickly that there was something there and that we could potentially build this out.

So in that sense, like I said, the skills were transferable, and we were able to use them to come up with something relatively quickly.

Ryan Glasgow: I'll just say one thing I think was interesting: the work Kevin built here at Sprig had never been done before in the broader category. There was no prior concept of this ever having been built in a production system. So for anyone looking at the field of AI, a lot of this work may have been done by now, but at the time it was all greenfield. It was less about finding someone who had actually done what we were looking to do, because it hadn't been done before, and more about finding someone who could solve very difficult problems.

Kevin was very much someone looking to solve something we hadn't solved at Sprig and hadn't seen anyone else solve. That's really what attracted him to the problem, but it was also important in bringing someone in: finding someone who was genuinely excited about solving something that hadn't been done, and who felt comfortable and confident tackling that type of problem.

Brett Berson: Ryan, how did you get to conviction on Kevin specifically, given how important the role was at that point in the company's life? It was also an area where you weren't a domain expert, and one that has historically attracted very research-oriented rather than production-oriented people, which has its own set of challenges when you're a few people getting a product off the ground, not an at-scale business with a research function. So I'd be curious: what was the mutual courting and evaluation process that ultimately got you to conviction that Kevin was the right fit for the role?

Ryan Glasgow: With Kevin, it was really digging into the AI work he had done, and understanding the type of work he had done that hadn't been done before in the field of AI. Email phishing was one he mentioned: detecting phishing at very large scale, something very new. In the reference checks and the interviews, I dug in to deeply understand how he was involved in bringing something new to the company he was at. I talked to other ML engineers as well as references, including Kevin's own references. The one theme we've seen with world-class ML engineers is that they are often some of the smartest people you'll know.

In Kevin's references, that was very clear. Multiple people, and I've never had this in reference checks before, told me he was unequivocally the smartest person they know. And usually these really, really smart people are attracted to really, really hard problems.

ML and data science involve, in my opinion, some of the more difficult, cutting-edge frontier problems being solved in the technology landscape. So making sure the slope is there, that someone is really smart, is critical, because the AI field is changing rapidly, particularly this year, every week.

You can't rely on someone's prior experience; you have to look at what you think they can do in the future. That's ultimately what's going to determine whether they're successful in the new role or not.

Brett Berson: So that's some good context setting. Before we spend a lot of time talking about how you've approached implementing the new wave of LLMs, maybe Kevin, you could share a little bit about when you joined: how you approached applied ML in the context of Sprig and the product experience in the early days.

Obviously not too much in the weeds from a technical depth perspective, but maybe talk about the product surface area and what you were doing, at a slightly higher level on the engineering side, to build those features, call it three or four years ago.

Kevin Mandich: Maybe I can focus on our first major application of ML at Sprig. At a high level, this is the ability to take a bunch of responses, feedback, survey responses, whatever it is, and categorize them or distill them down into actionable takeaways, because that's essentially what researchers do.

There are a lot of interesting facets to that problem. One is that it's a very subjective problem. You can take two expert user researchers, give them the same list of 500 responses, and tell them to distill them down into 10 actionable takeaways.

And those 10 will be completely different. Or even if they arrive at the same 10 takeaways, the responses that go into each of those buckets could be completely different. It depends on the person's history, their education, and what they're really trying to get out of the research.

That changes from person to person. So that's number one. A more succinct way of saying it is that there's really no universal source of truth for an analysis like that, which makes it difficult. In classic ML, you want an input and an output that are cut and dried, and you want to be able to say with certainty whether an output is correct or not.

With this application, that was very difficult to do. So a few things became necessary on the product and engineering side. One was a human-in-the-loop approach, to make sure that even if we didn't get the answers our customers expected or wanted, or answers that didn't align with their analysis, we'd be able to modify them.

That also included making the system as flexible as possible. As an example, at one point we implemented per-customer models: we would take our base model, fine-tune it on the data from a single customer, and make that their base model. From there, depending on how different we needed the results to be, we could also do per-survey models. So we had different levels of trained models for these different use cases, and that got us a long way toward per-customer and per-user customization.
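To make the per-customer idea concrete, here is a minimal, hypothetical sketch of that kind of fine-tuning step in Python with Hugging Face Transformers. The base model name, label counts, and data are all illustrative assumptions; this is not Sprig's actual pipeline, just the general pattern of specializing a shared base model on one customer's labeled responses and saving it as that customer's checkpoint.

```python
# Hypothetical per-customer fine-tuning sketch; model name, labels, and data
# are placeholders, not Sprig's actual stack.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

BASE_MODEL = "distilbert-base-uncased"  # stand-in for the shared base model

def fine_tune_for_customer(customer_id, texts, labels, num_labels):
    """Specialize the shared base model on one customer's labeled responses
    and save the result as that customer's own base model."""
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=num_labels)

    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
    dataset = [
        {"input_ids": enc["input_ids"][i],
         "attention_mask": enc["attention_mask"][i],
         "labels": torch.tensor(labels[i])}
        for i in range(len(texts))
    ]

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"models/{customer_id}",
                               num_train_epochs=3),
        train_dataset=dataset,
    )
    trainer.train()
    trainer.save_model(f"models/{customer_id}")  # per-customer checkpoint
    return model
```

A per-survey model would follow the same pattern, starting from the customer's checkpoint instead of the shared base.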

Those are some of the main challenges. Another is that if you're running a survey, you are continuously gathering data in real time, and our customers expect to see those results analyzed in real time. That makes for an interesting situation where you're getting a constant trickle of responses.

From an ML point of view, it's advantageous to analyze as much data in batch as possible; you tend to get more information to work with. So you have these two conflicting ideas, and we had to come up with a system that let us take advantage of all the information we had at any one point in time while continuously delivering results to our customers.
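One common way to reconcile that trickle-versus-batch tension is micro-batching: buffer incoming responses and re-analyze once the buffer is big enough or enough time has passed. The sketch below is a hypothetical illustration of that pattern, not Sprig's system; `analyze_batch`, the buffer size, and the wait time are all assumed placeholders.

```python
# Hypothetical micro-batching sketch for streaming survey responses.
import time

class MicroBatcher:
    def __init__(self, analyze_batch, max_size=50, max_wait_s=60.0):
        self.analyze_batch = analyze_batch  # callable: list[str] -> analysis
        self.max_size = max_size            # flush when buffer hits this size
        self.max_wait_s = max_wait_s        # ...or when this much time passes
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, response_text):
        """Called for each survey response as it trickles in."""
        self.buffer.append(response_text)
        if (len(self.buffer) >= self.max_size
                or time.monotonic() - self.last_flush >= self.max_wait_s):
            return self.flush()
        return None

    def flush(self):
        """Analyze everything accumulated so far, so each pass sees
        as much data at once as possible."""
        if not self.buffer:
            return None
        results = self.analyze_batch(self.buffer)
        self.buffer = []
        self.last_flush = time.monotonic()
        return results
```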

Ryan Glasgow: I do want to dig into that first point, because it's been really interesting and unique about AI and its applications. It reminds me of being a first-time manager. A first-time manager delegates a task to someone, and they do it differently; there's a very wide set of permutations for a task. The first-time manager might say the task was done incorrectly because it's different from how they would do it, while an experienced manager might say, you're doing this task differently than I would, and it's still correct. Our own customers often would say, this is wrong, when actually the AI just had a different way of arriving at the conclusion or the answer.

As we trend toward artificial general intelligence, the definition of correct, of whether something is valid or not, is becoming increasingly subjective. So when you're building technology that could have a wide range of permutations, how do you actually work with your customers on what the definition of correct is?

That's something we've gone back and forth on and really worked through. It's certainly not solved, and I think it's going to take the industry a while. But as Kevin was touching on, the question of how you incorporate customer subjectivity into building AI is really new, both for the people building these products and for the people using them.

Brett Berson: That's a really interesting way to frame it. So to pull on that a little bit: how did you answer that, or figure it out, in the first few years of Sprig?

Kevin Mandich: Let me back up a little bit and say that's actually not where we landed to begin with. At the very beginning, we tried to follow the mantra of keep it simple: try the simple thing first, and if that doesn't work, increase the complexity. We thought our very first attempt could work because at the time we had a relatively narrow customer base and a narrow data domain.

That attempt was basically a categorization approach. Of all the things you could ask somebody about their use of a product, maybe there are 10 big primary themes: ease of use, user interface, pricing, support, whatever. And within each of those you have some subcategories, so it's a hierarchical categorization.

We found pretty quickly, even for our narrow subset of initial customers, that this just didn't capture everything. Like I mentioned, language is very nuanced, but people's needs and thoughts about how they're using a product are also very nuanced.

Somebody could be saying they want cheaper pricing for the product, and somebody else could be saying they want team pricing instead of individual pricing. A really basic analysis would put those two together, but they're really talking about different things.

That's an example of why we decided to go with something a little more complex and nuanced.

Brett Berson: How did you think about solving, and maybe Ryan, you want to take this, that issue of the difference between "this is what I was expecting" versus "this is correct"? At a lot of companies those two things would be distinct. Or was that more like a 301 problem you were working on, not a 101 problem?

Ryan Glasgow: It was definitely a 101 problem. Our v1.0 was a human-in-the-loop solution built on open source models. The model would take a first pass, and it would be generally correct.

We would have people with internal tooling review all the output before we shipped it to our customers to view in the interface. We originally hired Amazon Mechanical Turk workers, thinking we'd save money by having three of them review every response. But we often saw that even when all three gave the same output, we would go to an expert researcher or a customer, or look at the data ourselves, and disagree.

So the AI might come up with one thing, our human-in-the-loop process another, and our customer a third. Three different inputs, and getting all three to agree was very challenging. So we brought on a world-class researcher, who was actually on the faculty at Yale, as one of our very early team members.

She was the fourth person to join Sprig, and her first role was reviewing. This is a very experienced, world-class researcher who had been studying quantitative research at Yale, coming on to be the person stamping, approving, and tuning the AI.

So we at least knew we had someone with real expertise. Even though it's one person per response rather than three Mechanical Turk workers, it's one person saying: we believe definitively, based on what we know, this is the proper way the AI should categorize this data.

She worked very closely with Kevin in the early days to build and fine-tune what correct was. With customers, though, there will always be their own preferences and nuances, and some level of gap. So in how we display the data, we make that clear and give them the ability to modify and correct it to their own 100% definition of correct, because it's very likely we'll never get to 100%; we'll get to 90, 95, 99. But we'll always make sure they can do that last mile of analysis on their own. That's something we have deliberately delegated, putting that responsibility on our customers instead of taking it on ourselves.

Brett Berson: In those early days, did you find there was specific nuance to building these ML-enabled features, from either a product development process or the way you interfaced with customers, that felt different from building, let's say, traditional features and functionality in software?

Ryan Glasgow: Yeah, one thing that really stood out was the skepticism. This was 2019 when we started selling, and there was skepticism about the AI we had built and about what was possible with AI in general. We met with so many teams, including some of our largest customers today, who had tried all the AI solutions around text analysis and NLP and said, we've tried them and they don't meet our needs. Or: they say they work great, but we've tried them and they don't work well. You still see a lot of this today. There are some very buzzy, hot gen-AI companies where the homepage looks exciting, and then you hear the user feedback on Twitter and it's not delivering on the promise.

That's something we, and Kevin especially, had been dealing with from day one. How we solved it was, a) we had this human-in-the-loop process with an expert user researcher reviewing all the data before it was delivered to customers. And b) in the early days we generally encouraged every customer to pilot Sprig, because it was never about listening to us and taking our word for it. It was always: how can we get Sprig into your hands with your data and show you how well this AI works?

We're not going to sell you, debate you, and tell you how well it works; that's never going to be good enough. The only way for you to be fully confident and trust that what we've built will actually work for you in production is to try it with your own data set and your own users. In the early days we almost required it. I can't think of a customer that didn't run a pilot with Sprig, because they always wanted to see it to believe it. They'd install Sprig in their product, get that data, and start to see the quality of text analysis coming out of the models we had developed, with the human-in-the-loop process as our commitment that this was something they could trust and make decisions on.

That's how we overcame the hurdle, and we had a 100% conversion rate from pilot to paying customer. So we felt really good about that promise and about taking on that responsibility ourselves. For anyone building with AI right now, the question is how to overcome the skepticism you're naturally going to get, even today, with where models are.

You have to build that into your product promise, whether on your marketing website or in your sales process. There continues to be a lot of skepticism, and there will be more as models take on more complex tasks, so you have to think about how to overcome it. Ultimately, letting users try it for themselves is the easiest way.

Kevin Mandich: I think we're probably close to the peak of the current hype cycle that was kicked off earlier this year with ChatGPT and everything.

Right now we're seeing issues with hallucinations in LLMs, and other interesting failure modes that are harder to catch than before. That's going to make things even more challenging and potentially make people even more skeptical. So it's definitely something you have to keep top of mind as you're building out a product like this.

Brett Berson: As you were building the first version of this ML-enabled product, what did you find was surprisingly difficult?

Kevin Mandich: The first one, which is probably more relevant for folks still hosting their own models, is that we basically had to build out a large amount of continuous training infrastructure. I mentioned before that we had models for different customers, different surveys, et cetera.

That came with a pretty steep learning curve, so figuring that part out was pretty challenging.

Ryan Glasgow: And we were scaling at that point. In 2020 we were getting more and more survey responses per month, and we were continuously chipping away at the long tail of model tuning. It was a little scary to see that we weren't scaling quite as fast as we wanted on the efficacy of the model and how much it was able to take on. In 2020, as Kevin hinted, there was this trough of sorrow around ML: will it ever get there, will it ever deliver on the promise? There was real skepticism in the industry about whether it would.

But, credit to Kevin, we continued to chip away every month, and the model got more and more efficient and effective at taking on the analysis from the internal research team. Over the past nine to twelve months, we've reached the point where it's actually all real time.

So there was that period of: are we going to get where we want to go? Can this scale to the level we need? We did get there, but in hindsight it was a risk that was not clear at the time. A lot of people were giving up on ML and AI, realizing they were relying too much on the human-in-the-loop process, and a lot of startups actually went under in 2019 and 2020 when they realized they were not going to get there.

Brett Berson: How do GPT-3, and then ChatGPT and this new wave of large language models, fit into the journey of Sprig thus far?

Ryan Glasgow: I'll start by giving the context for why I think having an ML engineer is really critical, because right now there's this thinking that any engineer can be an ML engineer; you just use the ChatGPT APIs. But when we started to think about shifting from Google's open source models to something from OpenAI, I do think it's critical to have at least one ML data scientist who is evaluating all the different models out there and really knows what the local maximum looks like. I would often ask Kevin, are we ready to switch to OpenAI's GPT models? I'd been asking him that question for three years, and he was constantly testing and evaluating when it made sense for us to switch over.

You don't really know what the absolute maximum looks like if you're not able to leverage the open source models that are available. And that's not something a typical full-stack software engineer can fully evaluate, because it's not where the model starts, it's where the model can get to.

That's the unknown piece that's really critical: how far can this model take us, and what will the result look like after significant investment? Kevin was able to test and evaluate when it made sense to switch over and reach a new absolute maximum in the efficacy of our models.

Brett Berson: Yeah, and Kevin, to build on Ryan's point, I would love to hear what your own process was for figuring out whether this new set of enabling technologies was right for Sprig, and at what point, in the early days of OpenAI's models or others.

Kevin Mandich: When we were trying to figure out how to use it, we basically did the same thing we were doing before when evaluating other open source models: we had a set of what we call source-of-truth data.

Surveys from customers, customer feedback, that we'd feed through the model to evaluate how good it was at performing its function. What really kicked off our decision to promote this model and use it in production was a discontinuous jump in efficacy.

It was significantly better than what we were using before, and we determined we'd be able to use it at that high level of efficacy on an ongoing basis, assuming we had ways of continuously testing it to make sure it was still able to deliver over time.
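In spirit, that kind of source-of-truth evaluation can be as simple as the following sketch: score every candidate model against the same expert-labeled examples and only promote a new one on a clear jump. The `golden` data, the `models` mapping, and the five-point promotion threshold are assumptions for illustration, not Sprig's actual harness.

```python
# Hypothetical source-of-truth evaluation harness.

def evaluate(model_fn, golden):
    """model_fn: callable mapping response text -> predicted category.
    golden: list of (text, expected_category) pairs labeled by experts."""
    correct = sum(1 for text, expected in golden if model_fn(text) == expected)
    return correct / len(golden)

def pick_model(models, golden, current_name, min_gain=0.05):
    """Promote a challenger only if it beats the incumbent by a clear margin
    (the 'discontinuous jump'); min_gain is an assumed threshold."""
    scores = {name: evaluate(fn, golden) for name, fn in models.items()}
    best = max(scores, key=scores.get)
    if best != current_name and scores[best] - scores[current_name] >= min_gain:
        return best, scores
    return current_name, scores
```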

Ryan Glasgow: We rolled that out and announced it in May: switching from the previous generation of Google's open source models to OpenAI's GPT-4. Now we're in the process of evaluating other models against GPT-4, primarily open source models like Llama, to see whether there's an opportunity to augment with other models, or whether it makes sense to consider another model entirely as a replacement.

For any startup, given the advancements in the LLM field, the current model you're using might be the right model today. But three months from now, twelve months from now, there's a good chance another model will surpass it for your specific use case, given how fast the landscape is evolving.

And I think that's the key question. Knowing internally which model to use and when to switch over, if not a competitive advantage, will be a key question for a lot of companies developing AI.

Brett Berson: How do you think about building infrastructure around these product features that gives you maximum flexibility to incorporate new models in the future? I assume your take on the world is that there's going to be all sorts of new, interesting technology coming out, and you don't want dependency risk on a specific LLM, for example. Does that mean you're approaching building products differently today than before, because of this desire to be able to use whatever the best is at any given point in time?

Ryan Glasgow: One of the key inputs I've been partnering with Kevin on is really understanding the bounds of each model at any given time. We often think about the jobs to be done that we want to solve for customers. We have to look at the different models available and ask: what can this model do for us today? Can we test the various use cases and applications we have in mind against these models?

It's very much like this: I have a one-year-old, and I'm not going to ask her to ride a bike today; it's not a task she would succeed at. But I can ask her to open and close her hand, and she's very successful at that. I remember in the early days a lot of people asked us to also ingest support tickets and other types of unstructured data into Sprig.

We knew that within the bounds of the model at the time, it would not succeed with unstructured customer support tickets. So instead we relied on a template gallery and encouraged customers to ask very specific survey questions about specific moments in time. For example: how was your experience using this feature for the first time?

You get a fairly narrow set of data back, and the model at the time, in 2018 through 2020, was very successful with a narrow set of structured data. That example is still very relevant today: you have to understand the bounds of what the model can do.

Is it a one-year-old or a ten-year-old at the specific task you want it to do? The various models all have their strengths and weaknesses. You can't tell any startup or any team, this model is right for you. You have to look at specific examples, test them, and understand, with your data type and the use cases you have in mind, what the bounds of the model are and how advanced it is at what you're looking to do.

That's where Kevin often evaluates the different models for us and reports back: here are the bounds of the four or five popular models people are using today, and here's how well they perform at the various tasks we have in mind, the jobs to be done and use cases we'd like to roll out and deliver for our customers.

So there's a lot of back and forth, less with the customer and more internally: thinking about where we want to take the product, understanding the limitations of the various models, and then digging in and investing more when we start to see early success at a task, the opening and closing of a hand, or maybe at some point riding a bike, and we think, hey, maybe this is feasible and we should explore it further.

Kevin Mandich: Yeah, on that note, one of the interesting points about the limitations we noticed early on is that the trajectory of ML at that point was up and to the right, at an accelerating pace. As Ryan mentioned, we made a pretty early bet that this would be able to do what we wanted it to do at scale and with minimal human intervention.

Part of it was following that trend and believing the trajectory of what is possible with these models could get us there in the future. We set things up in such a way that we'd be able to take advantage of that, and we relied on a lot of human intervention in the meantime until we got there.

But we are at the point now where it can do what we want it to do.

Brett Berson: How did you, or how do you, set up this sort of testing environment or approach where you can look at a different model and decide whether it could be useful for Sprig or not?

Kevin Mandich: Now it's a bit different. Prior to large LLMs becoming available, our testing framework was asking a model a series of specific questions: does this response represent somebody talking about pricing, yes or no? Is this person happy or sad? Things like that.

With the advent of chat completions, where even classification tasks now go through chat completions, we've had to shift to giving an input to an LLM, gathering the outputs, and then either semi-automatically or manually determining whether the output is what we're looking for.

One example of something we've been doing is feeding in data from different types of questions. Maybe question one is: how was your experience, one to five? Question two is open text: what else can we do to improve your experience? We've been successful at feeding the results from both of those questions into a model and asking it to, for example, draw correlations between the questions.

For example: of the people who gave a four or five star rating, what are some of the interesting themes in their open text responses? That's really hard to evaluate in an automated manner, at least right now, because it's a more complex task; the output is non-deterministic and freeform. For now, it really does require some manual evaluation.
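For a concrete picture of what a cross-question prompt like that might look like through the chat completions API, here is a hedged sketch using OpenAI's Python client. The prompt wording, the survey data, and the model choice are all invented for illustration; Sprig's actual prompts are not public.

```python
# Illustrative cross-question analysis via chat completions; the data and
# prompt are placeholders, not Sprig's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One record per respondent, pairing the 1-5 rating with the open text answer.
responses = [
    {"rating": 5, "open_text": "Love the dashboards, wish export were faster."},
    {"rating": 2, "open_text": "Setup was confusing and support was slow."},
    # ...more respondents...
]

prompt = (
    "Below are survey results. Q1 asked 'How was your experience, 1-5?' "
    "and Q2 asked 'What else can we do to improve your experience?'\n\n"
    + "\n".join(f"Q1={r['rating']} | Q2={r['open_text']}" for r in responses)
    + "\n\nFor respondents who answered 4 or 5 on Q1, list the main themes in "
    "their Q2 answers, then do the same for respondents who answered 1 or 2. "
    "Base every theme only on the responses above; do not invent content."
)

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # reduce run-to-run variation
)
print(completion.choices[0].message.content)
```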

Ryan Glasgow: I think that's a perfect example of what I was referring to. In 2019 and 2020, we wanted to feed an entire survey's results into the models we had at the time, and they were not able to come back with those correlations between question one and question two, or with much deeper insights across the entire set of survey data.

We kept testing: 2019, no. 2020, no. 2021, no. 2022, okay, we're now seeing some progress. And so something we're going to formally announce by the end of this month, in August, is what we call study-level AI, where the LLM is able to understand the entire study and draw insights and summarization from it.

You can ask questions about the entire study's results. So we're going from open text analysis at the question level, which we'll still have, to the entire survey, or what we call the study level, with the LLM making sense of the entire study. And now we're also in the process of evaluating different models for what we call product-level AI and insights.

There, the LLM understands all the surveys a customer is collecting, and all the correlations and meaning across those studies. It's been exciting to ride the advancements as we go, while also testing and evaluating where the boundaries are and when we can move forward with the value we're delivering to customers with AI.

Brett Berson: In the case of a study-level analysis or summary, what was the process like iterating to figure out whether the infrastructure was good enough to provide the type of product experience you want? Particularly because of the issue we talked about at the beginning of the conversation: there are a lot of subjective dynamics to things like summaries and insights.

Kevin Mandich: To a large extent we stick with the fundamentals when it comes to any sort of product or feature development. In this case, and actually in a lot of cases now, we do a couple of different things. The first is a really basic data proof of concept internally, where we take our inputs, run them through the LLM, get our outputs, review them as a team, and give our best estimate of how useful, accurate, and impactful it is to our end users.

We follow that up with very early testing with users, our own user research. Either we show it to somebody in the product through an early beta, or we interview somebody and get their opinion directly. It's as much iteration and evaluation as possible.

Ryan Glasgow: And I remember, Brett, to answer your question: late last year, and even earlier this year, we didn't have that testing infrastructure developed. We were using ChatGPT.

Brett Berson: Where you would just take a piece of data or a data asset, drop it in, and see what it could do?

Ryan Glasgow: Exactly. And you get the Plus version. We got everyone involved the Plus account, the upgraded version, to access the GPT-4 models. We had a specific example we were running different prompts on to see whether it could actually do that cross-question analysis. And we had some breakthroughs, those late-night, weekend breakthroughs when you're away from everyone, hammering on the keyboard, figuring out how to get it to do that cross-question analysis.

Brett Berson: And most of that work was oriented around prompt engineering?

Ryan Glasgow: A lot of it was prompt engineering. Can you actually get the model to look at both of these questions at once? In a lot of the early testing, the analysis only produced takeaways that each looked at a specific question at a time. So the first takeaway might be about question three.

The second takeaway might be about question two. So how can you get the model to look at a specific user who responded to all the questions and actually find correlations between their responses? That's really where we had to do the prompt engineering.

I think a lot of the value for AI companies now is in prompt engineering. In the early days it was: can you build the model, can you get it running, can you get it working in production? Now a lot of it is: can you work with these models, which we don't quite know how they work, even the people building them (Google has admitted they don't fully know how their model works), and actually get the output you're looking for? And the second test is: can you get that output in a highly reproducible way, where every time you run the prompt you get a result in the format the customer is looking for, with the accuracy they're looking for?

Not having hallucinations is probably what's considered accurate right now in the ML field.
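One way teams typically chase that reproducibility is to pin the prompt and temperature, ask for a fixed JSON shape, and validate every run before anything reaches a customer. The sketch below is an assumed illustration of that pattern; the schema, the question-ID hallucination check, and the retry policy are inventions for this example, not Sprig's pipeline.

```python
# Hypothetical output validation for a pinned LLM prompt.
import json

REQUIRED_KEYS = {"takeaway", "supporting_question_ids"}

def parse_and_validate(raw, known_question_ids):
    result = json.loads(raw)  # fails fast on malformed output
    for item in result:
        if set(item) != REQUIRED_KEYS:
            raise ValueError(f"unexpected keys: {set(item)}")
        # Reject takeaways citing questions that don't exist in the survey,
        # a cheap guard against one class of hallucination.
        if not set(item["supporting_question_ids"]) <= set(known_question_ids):
            raise ValueError("cited a question not present in the survey")
    return result

def run_with_retries(call_llm, known_question_ids, max_tries=3):
    """call_llm: zero-arg callable running the pinned prompt, returning text."""
    for _ in range(max_tries):
        try:
            return parse_and_validate(call_llm(), known_question_ids)
        except ValueError:
            continue  # re-ask rather than ship a malformed analysis
    raise RuntimeError("model failed to produce valid output")
```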

Brett Berson: Can you talk more, Ryan, about what your end-to-end product development process looks like now? In what ways would you say it's similar to the pre-LLM world, and in what ways might building a feature today look different?

Ryan Glasgow: The uniqueness we've found in product development specific to AI is this. We define best-in-class product development as a product manager and, at minimum, a designer starting from the very beginning of ideation, building a product spec and a design prototype together. We'd love to have engineers involved too, but due to time constraints and other goals they're not always able to join.

The unique part with AI is that you have to have an AI engineer involved from day one, because there's a question of the feasibility of what you're looking to build. A best-in-class product development process has an engineer involved from day one, but with AI it's actually a requirement. There have been plenty of times when someone on our product management or design team came up with something they wanted to do with AI, went to Kevin, and after a month of work we found out it wasn't feasible.

So we see more of an iterative process of testing: here's an idea we want to try; is this possible with the current model we're working with, or maybe a different model? Once we get the green light that there's something here and we think we can make it work, the product manager, designer, Kevin, and typically another engineer start building out the entire feature together in lockstep. That's the example we referred to earlier with the study-level insights. There was a lot of early exploration in ChatGPT, plus some internal OpenAI infrastructure we had built with Retool, where we can enter our own commands and get back output from OpenAI. That's how we do a lot of testing today. It requires everyone to be involved in the testing process, and a lot of the time the AI has actually taken us in different directions.

Maybe it's working better than we expected on something; in other cases it's working less well than we hoped, so we might simplify or work within the constraints we're given.

Brett Berson: Maybe continuing down this path of product development and prioritization in this world of large language models: how has the hype and euphoria in the market changed the way you think about any of these prioritization decisions?

Kevin Mandich: Yeah, I think we've noticed a couple of things. Like you mentioned, there's the obvious external pressure. With the advent of some of these models being released, the bar has been lowered quite a bit, and some of these analyses or things you want to do are more achievable.

From our point of view, that's prompted us, honestly, to work faster and to build better and faster. On the flip side, it's allowed us to build more than before. As one quick example, I spent several years tuning that very first feature we built.

A lot of effort went into that and not much elsewhere, because we really wanted to focus and make sure that was where we were spending our effort. But now that the floodgates are open, we can work on a lot more, which expands our ability to produce.

Ryan Glasgow: I would say, yeah, we did so much testing and hit so many walls of what was possible in the early years, and the models are now mature enough to actually deliver on a lot of those ideas. That's what's been really exciting for us; it's less about the competitive set now incorporating AI.

It's more that things like study-level AI and product-level AI have now been unlocked, where before they were not possible. At Sprig we see our value for customers as delivering specific recommendations to improve their product experiences. So much of the R&D in the early years went into collecting the data at large scale.

We kept the AI efforts quite lean because the possibilities were, candidly, quite limited, bounded by a single use case of text analysis at the question level. But our customers ultimately want us to tell them how to improve their product experience, and ultimately, can we actually push their product experience forward with them?

That's the exciting part of AI and where we see it going at Sprig. We can move beyond helping them summarize question-level analysis, to saying: here are all the challenges and customer issues we found across an entire survey. Then: here are all the issues with your product experience across your entire product. Then: here's the issue, and here's the recommendation to fix it. And then: can we partner with other vendors in the space to actually implement that fix for you, with your permission? The level of what's possible has significantly increased and evolved over the past nine months.

Our customers expect significantly more value delivered with all the recent advancements in AI. For us it's unlocked a huge level of advancement: this year we're shipping by far more AI announcements, features, and advancements than the company shipped in all previous years combined.

And it's that external excitement and energy shifting us to an AI-first company, where everyone at Sprig knows how to work with AI: every engineer knows how to incorporate AI into the product, product managers know how to build with AI, and designers know how to design with AI. Shifting to an AI-first company is really the main unlock for us as an organization: going from a company that has an extremely powerful AI feature to a company that has AI incorporated into everything it does and how it delivers value to customers.

Kevin Mandich: I think that's a good way of putting it. Not only has it accelerated our path toward our long-term vision, it's increased the scope of that long-term vision pretty significantly.

Brett Berson: What else can you share about what you're hearing from customers as this tidal wave has built up? Do you see a level of enthusiasm and excitement in the abstract that feels very different from the conversations you were having, call it 12 months ago? And are there strategy implications of that in any way?

Ryan Glasgow: I think we're seeing a level of excitement from customers, but they also have to be okay delegating some of the control they previously had. It's that balance between human and machine. The new study-level AI we're shipping right now summarizes the entire study, and you can ask it questions: how should I improve my product experience? A lot of people are starting to realize they need to be comfortable delegating something they previously felt was a decision they had to make, moving higher up in the order of decision-making, and seeing AI as a true copilot.

A lot of people are still trying to retain control of all the decisions, do all the analysis, or make their own decisions about what they want to do with this data. The question is how we move the users of Sprig, and the broader set of individuals working with AI, to focus on higher-level orders of work.

As a species, we're so much further along if we can trust AI and find ways to work with it that let us leverage our time more effectively. We're seeing a lot of people uncomfortable with that balance, figuring out how they fit in and how they work together with AI.

Brett Berson: One of the things that came to mind is the famous Betty Crocker story, where the original Betty Crocker product had the eggs and milk mixed in and didn't sell very well. But when they sold the mix without them and required a little bit of effort in mixing, it took off. Does that apply to the way you're thinking about these AI-enabled features, where you need the end customer to feel like they're at least dropping the eggs in and mixing it up, so they've done some level of work and feel some amount of ownership or involvement, versus the product just spitting the thing out, job done?

Do you think about building these products in that way?

Ryan Glasgow: I will say it's not something we do intentionally, but based on customer feedback and working with customers to develop these features, that's where we've landed. A recent example with study-level AI: we recommend which questions to ask, and the customer clicks a button on the question they want to ask.

So they still have that involvement. I won't call it a perception; they really are still in the cockpit of their own work, and that's critical. They don't feel like any part of their work is being replaced by AI. That's where you have to find the right balance. To use your example: in your own product, working with AI, what are the eggs? What's the piece the user brings to the table, where they keep that sense of control, authority, and purpose, while the AI helps them succeed far more in their role?

We've certainly seen it in how our positioning has evolved, in how we talk about the Open Text AI. In the very early days of Sprig there was some wording around "your automated user researcher," or "it does all the text analysis for you." That was actually very off-putting for both user researchers and product teams.

Today, a lot of the wording around Open Text AI, specifically at the question level, is that the AI does the heavy lifting for you and lets you review the results before you deliver them. We've noticed on sales calls as well that people actually feel better if we tell them it's not meant to do the entire job.

It's not supposed to give you the answer; it's supposed to give you everything up until right before the answer. And when you look at ChatGPT, I think part of why it's been so successful is that it's a standalone product that isn't integrated into anything else. If it were integrated into your Slack or your email or your work, I think people might feel like they're losing control.

All of a sudden it's in that application with them. Whereas if you have this kind of dark corner of the web to go to, one that gives you a result you tweak, modify, and then put forward as your own, you actually feel really good about it, because you got the work done faster, and it's potentially higher-quality work.

So for the AI work we've been developing, it's about helping customers get to where they want to go, but maybe not taking them all the way there.

Brett Berson: When you started to see some of the early breakthroughs in large language models, was there a period when you worried it could be a disruptive innovation for the company? That the core of what made the company special was this really hard work you had done on these ML features, and now it was much easier for anybody else to do these types of things?

And did that inform your strategy in any way?

Ryan Glasgow: We certainly expected it; it was a matter of when, not if, knowing that AI would at some point lower the barrier to building the technology, and keep lowering it over time. The key question we focus on is making sure we're always ahead of the competition in the field of AI.

That's the dimension we take a hard look at. We are seeing other companies in our space add, for example, survey- or study-level summarization. But while they're launching summarization of an entire survey, we're adding the ability to ask custom questions on top of summarization. And as they start to think about whatever is next for them, we're moving on to product-level insights. Some investors pushed back in the early days, saying at some point this will not be a competitive advantage for you. For us, we will always make sure we are the most innovative company in the space.

We just signed a really large contract with a very high-profile company that everyone listening to this podcast has used at some point, and they purchased Sprig because of the AI vision they saw for what we're going to be working on. We announced that AI vision publicly last month: what we want to do with AI.

As long as we continue to push the boundaries, and that's why I talked about slope and having someone on your team who is thinking about AI and ML, then for whatever we tackle next, I know Kevin will help us figure out whatever that is. I also know it will be something we've never done before at Sprig, and that has potentially never been done before in the broader field of technology. As long as we can solve those greenfield problems with AI and continue to deliver customer value, that's ultimately the race we're focused on, and winning.

Brett Berson: Maybe, Ryan, on that point, it's an interesting chance to talk a little bit about what sales conversations look like today. Do you spend more time trying to explain to customers what your worldview is, and where we are and where we're going in AI? Or does it tend to be much more problem-centric and solution-centric?

Ryan Glasgow: It is absolutely far more vision selling than it would've expected. And a lot of times we talk to customers and they ask, we ask them, why do you want us to build AI? And I, it's, you know, very common, faster horse, you know, type of response or it's, I don't know what I want 'cause I don't know what's possible and I don't even know what you could do with these models.

And so going back to the product development process, where you have to rely on what many would consider a best-in-class process, with an AI engineer, a product designer, and a product manager all working in lockstep from day one. I think when working with customers on AI, you also have to develop world-class techniques of not asking the customer what to build, but instead giving them a set of options to provide feedback on.

So a lot of the work that we do with AI is: we have five different features that we could build with AI, A, B, C, D, E; give us your feedback and responses to each one. Sometimes we'll ask, what do you want us to build? And again, they'll say, we don't know. So we'll show them what's possible.

They'll give us that feedback, but we always hear "if it works" or "if it works as you say," so there's always that caveat and that skepticism around what we can do. On the sales front, it's absolutely about taking what we hear, presenting those options, those five or ten options, hearing which ones are really resonating with customers, and then going to the sales side and saying: this is our AI vision for what we want to do with AI.

We've already validated that by presenting those options, getting that feedback, and seeing what resonates, and then projecting that on the sales side, getting those customers to buy into the idea that this is what they should be looking for in a solution with AI.

Brett Berson: When you're getting that customer feedback and you're teeing up five to ten specific product ideas or potential AI features, are you giving them points to assign importance? Is it qualitative? Is it quantitative? How are you getting them to rank those?

Kevin Mandich: A combination of both, I think. We have a couple of different feedback mechanisms. Like I mentioned: in-person interviewing and getting async information. We use our own product to get async feedback. We ask people, one to five, how useful was this? Open text: what can we do to improve this? Pretty much anything we can get our hands on.

Brett Berson: And Ryan, on the sales side, how does that kind of vision setting fit into your sales process?

Ryan Glasgow: We have a specific AI vision deck, and if you go to sprig.com/ai, we've publicly shared our vision for AI. So it's about validating that with current customers, and then on the sales side, really seeing what is standing out, and asking: what parts of this are most interesting and applicable to the company that you're at?

And hearing that feedback. We're seeing a lot of companies, very similar to digital transformation, going through an AI transformation, and a lot of them are saying: we have been tasked with finding and applying AI to the work that we do. This is an OKR, this is a company mandate from the CEO. And they're taking what we're saying to them and pitching it internally, saying: this is how we want to apply AI for our role, for how we develop products here at our company.

And so there's a lot of sales enablement in telling them this is how AI should be transforming how you develop products, and really setting that narrative for them to take with them.

Brett Berson: Switching gears slightly. You both mentioned this a little bit, but can you share more about how the org structure has changed, and maybe how product teams have changed, if it's been significant, over the past year?

Kevin Mandich: I can start with AI specifically and how we've developed it at the company. Over the past six months, we've modeled what we call our AI squad on some successful examples we've seen in industry, kind of like the vertically integrated approach. I think Ryan mentioned this, but it's a dedicated designer, dedicated product manager, and dedicated engineers who are responsible for full end-to-end delivery of AI-related features.

I think this is in contrast to some other examples, and something we've tried before, where you have a dedicated ML or AI team, ML scientists, ML engineers, research engineers, who receive tasks and return the results, and maybe a separate product team that needs some AI feature within what they're building.

They'll kind of contract it out, in a sense, to this separate team. I've read a lot about companies that have taken that approach, and from the times we've tried it, we haven't had as much success. I think mostly because there's sometimes an alignment issue; it's just harder to get that end-to-end release of a quality feature.

And so, like I mentioned, yeah, we've had a lot of success with this fully vertically integrated approach.

Ryan Glasgow: And we've certainly seen those challenges as well. A lot of the models today come with engineering challenges as well as data science challenges. And so having AI engineers embedded with traditional engineers has been the big breakthrough for us, working together to develop and build these features.

And not seeing AI as, like Kevin said, input-output, where the model spits out a response that the engineer then integrates. We're seeing it truly does need to be the AI data scientist embedded into a broader squad, a part of that squad, delivering on and building a feature together.

Brett Berson: And in the case of AI engineers, in the context of your company, is the profile and background relatively similar to the type of AI or ML engineer you would've hired in 2020, or has the spec changed for the person?

Kevin Mandich: A lot of similarities to who I would've hired in 2020. Like I mentioned before, a lot of the fundamentals remain the same; we need a lot of the same skill sets. But given our current switch to hosted LLMs, and away from taking Google's BERT and fine-tuning it ourselves, I think we would shift more towards what I guess we'd call a full-stack ML engineer: somebody who's able to help integrate this and actually implement parts of it in your backend.

Ryan Glasgow: Yeah, so more of that hybrid.

Brett Berson: How about how you share knowledge and upskill existing product and engineering folks? In your work, do you have rituals or things you've found to be impactful, other than this important idea of cross-functional pods where you have specialists on the team?

Kevin Mandich: Well, I know one change that Ryan masterminded a few months ago was giving everybody at the company who wanted it access to ChatGPT, especially people on the AI squad and other engineers, with the overall directive to try it out and see how you can use it in your everyday job.

And as people have been using it, I think everyone's getting ideas about how we can use it in our own product and with some of our features. I've gotten prompt engineering ideas from folks around the company, from pretty much every single department.

Things I wouldn't have thought of before, because people have different points of view, and that's actually been immensely helpful. So that's one great example of the proliferation of this tooling and that directive.

Ryan Glasgow: Yeah, on ChatGPT: at our all-hands, Kevin actually led a training on how to use it, and you'd be surprised how many people actually have not used it yet. I think a lot of us, if you're on Twitter, are getting the weekly updates on what's happening in AI; you've used ChatGPT, you've tried Bard, you've tried all the models.

But a lot of individuals are still learning how to use these models. They're so open-ended today. So Kevin ran through various examples: questions to ask someone you're interviewing, or for the sales team, understanding sales contracts.

And I think the more pervasive AI becomes within a company, the more people realize, to Kevin's point, how it can be utilized in the development of the company's own product. Another one has been GitHub Copilot: we pretty much required all the engineers to use it.

And so we're constantly thinking about all the ways, and all the vendors, we can utilize. With Intercom's Fin, we've been trying it out with our own customers, answering customer support tickets, and all of a sudden that gives everyone an AI hat they can put on and think about internally.

Kevin also ran a one-hour brown bag for our engineering team and really got everyone up to speed on how to use the OpenAI APIs and how to integrate AI into the features they're developing, because we have an AI squad now, but the next step for us is actually having every engineer know how to incorporate AI.

Every product manager, and really everyone on PDE, should know how to build with AI and have built it into one of their features by the end of the year. To get there: getting people comfortable and understanding AI in existing vendors today, then understanding how to use it and build with it in our own product.

And then ultimately, incorporating AI into a feature that someone has built here at Sprig.
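As a concrete reference point for the kind of brown-bag material Ryan describes, here is a minimal sketch of calling the OpenAI chat completions API from Python. The model name, prompts, and the summarize_responses helper are illustrative assumptions for this write-up, not Sprig's actual code.

```python
# Minimal sketch: summarizing open-text survey responses with the OpenAI
# chat completions API. Model choice and prompts are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_responses(responses: list[str]) -> str:
    """Ask the model to summarize a batch of open-text survey responses."""
    joined = "\n".join(f"- {r}" for r in responses)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would work here
        messages=[
            {"role": "system", "content": "You summarize user feedback."},
            {"role": "user", "content": f"Summarize these responses:\n{joined}"},
        ],
    )
    return completion.choices[0].message.content
```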

Brett Berson: Ryan, I'd be curious to get your perspective on how you think other CEOs, similar to you, might think about building products in this new world. I think there's some nuance, because obviously the founding of Sprig was oriented around AI and the ability to synthesize information. But I'm sure you spend a bunch of time with other CEOs that have a product in market with early traction or doing well.

So in some ways they're incumbents, but they're not Salesforce or AWS or something like that; they're scaling startups. And we have this new wave of tools basically at their disposal. Do you have any thoughts on advice or perspectives you'd give to other similar types of CEOs and founders to help them navigate this, and also maybe reduce the chance that they just chase shiny objects all the time, which I think is probably one of the number one critiques of what's happening right now?

Ryan Glasgow: To give recommendations and advice for others: first, I think a lot of CEOs right now are thinking about what their AI roadmap is, and they're probably expected to have something. If it's something that you're really serious about, I do think it's critical to bring in an experienced data scientist with an AI background, for all the reasons Kevin and I have been sharing about why Kevin's been so integral to our success: you do need an AI architect.

Historically, companies have had a CTO for the technology, thinking about the architecture of the technology and the stack and how they're going to build it. Going forward, I think many companies will bring on AI architects, possibly at the same level as or reporting directly to a CTO, to think about which models we're going to use, what the boundaries of AI are, and how we can build reproducible AI and think about this systematically to build a foundation for a company to be AI-first. Here at Sprig, as we shift to being an AI-first company, Kevin is tasked with setting our AI vision from a technology perspective, but also ensuring that our entire PDE team knows how to build AI features and functionality. It's a very large challenge, thinking about all these different dimensions while also helping guide and steer our roadmap. So if you're really serious about AI, which I imagine most companies are, having that AI architect to lead the way is, I think, the first step.

I think the second thing is to stay focused on the customer value that your customers are looking for. A lot of companies right now are coming up with solutions with AI instead of actually thinking about what existing problem AI can now solve that maybe couldn't be solved previously, and seeing AI as just a means to delivering customer value, not something you rely on to create a solution in search of a problem. A lot of AI launches and features we're seeing announced are interesting technological breakthroughs or new ways to apply AI, but it's not always clear what exactly the problem is that they're solving.

At the end of the day, the customer is still looking to go from point A to point B, and just like other technologies, AI is a way to get them there. The third thing is we are seeing the incumbents, Microsoft being one great example, actually leading the field of AI.

From my perspective, a lot of startups are having trouble breaking into industries because they don't own the core workflow, or they don't have the proprietary data to feed the AI model. You mentioned whether other companies launching text analysis is a competitive threat to us: we own the proprietary data that we collect with Sprig's own SDKs, directly from our customers' customers.

And we know that by owning, and continuing to collect, that proprietary data ourselves, it sets us up to be a vertically integrated solution that owns that workflow from end to end. We're seeing startups emerge that might own just the analysis, but the companies that are actually collecting or owning the data themselves could easily build in that analysis.

So we knew very early on that we wanted to focus on collecting our own data, and then, following the breakthroughs in AI, to start incorporating AI to analyze that data at large scale. Together, the collection and the analysis are ultimately the value that we're delivering to customers: helping them understand deeply how to improve their user experience.

So if you are a founder or someone working on AI, you really need to think about this: if you don't own that core workflow, or you don't own the data yourself, then it's very easy to be disrupted, and someone else could easily, and most likely will, build what you're delivering to customers.

Brett Berson: Maybe this would be a good place to wrap up, and on a similar theme, have you share a little more about how each of you thinks about the next 6, 12, 18, 24 months in this wave of LLMs, and maybe what are some of the things that you think are over-hyped and some that you think are under-hyped?

Not even the ten-year vision of what's happening in AI, but the near-term future: where do you think there's a lot of really exciting opportunity, where we're going to see some really big breakthroughs and a ton of end-customer value delivered? Versus the things that are a little more mirage-like, that everybody feels are going to happen in the next 12 or 24 months?

But given your understanding of the contours of the problem, it's actually probably gonna take three, five, or 10 years to sort of see that thing exist in the world.

Kevin Mandich: A couple of immediate ones come to mind. One thing we've been dealing with at Sprig is the ability of a model like an LLM to successfully answer a quantitative question, or do math, or even count to ten. Right now a lot of these models fall flat, and for an application like ours, and for a lot of applications out there, that's a pretty glaring omission. These models are fantastic at summarization; they're really good at generating text based on other text. But they're not good at basic math and things of that nature.

So if you're trying to do any sort of analysis, it really is a big omission right now.

Ryan Glasgow: And I'll just jump in there, because that's a big topic we didn't cover. It's been a huge challenge for us: simple comparisons that you would expect to be second or third grade math come back shockingly incorrect in the results. I won't go into it any further, but I just want to emphasize what Kevin was calling out there: it's been a major challenge to work around when talking about the bounds of AI.

That's something that right now is a wall we're hitting here at Sprig.

Kevin Mandich: Yeah, and it's not entirely unexpected. This is not what LLMs were created for: they were created to go from input to output with text. Numbers can be text, but it's kind of a different domain. But that's one thing.

Brett Berson: Are there certain implications of that, or does it highlight maybe a misunderstanding that people have today as it relates to the technology?

Kevin Mandich: I think it's both. It's definitely a misunderstanding. People have seen ChatGPT do amazing things. They'll ask it to write a recipe, to summarize an email, to write like a pirate, whatever, and it just knocks those out of the park. And then somebody asks it to do their taxes, or to add 12+18, and it returns four or something, and people say: whoa, what's going on? This thing is incredibly intelligent, in some cases superhumanly intelligent, but it can't do second grade math? So I think that really is just a fundamental misunderstanding, probably something that could have been publicized better when these things came out.

But anyway, that's kind of where we are.

Brett Berson: Do you think it means that these large language models are, quote, less intelligent than a lot of people think?

Kevin Mandich: I think "less capable" is maybe a better word. The way they're trained supports a handful of really important use cases, but just not everything out there. People have found ways of getting around this problem using LLMs, like, for example, asking your LLM to write Python code that will add 16+32.

In that case it's really good at it, but you don't necessarily want to have to go through that circuitous route to get your answer. Related to that, I think we're starting to see the down part of the hype cycle, just the limitations that became pretty obvious. Maybe that quantitative failure falls under a broader category of failures that these LLMs will experience.

Because despite not knowing the answer, they will give an answer very confidently, right? I think that's going to be a major challenge to overcome, because right now there's really not a good way to evaluate these at scale or in an automated way. You can't necessarily read through every single sentence an LLM generates when it's summarizing text for you. And I think that's going to hurt trust in a lot of these models and maybe slow down adoption. Because if you're really counting on these models to replace low-level intelligence, as it were, and you can't trust them to do some basic stuff or to always give you the right answer, then why would you use them?

Especially for applications where accuracy is very critical. So, to your original question, I think issues like those hallucinations and the lack of quantitative answering are probably the major things that get addressed over the next couple of years. I don't know what that looks like.

It might be an LLM that's part of a broader array of models that attack a single input. But since those are the biggest, most glaring failures, I think that's what we're going to see over the next 12 to 18 months.
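To make the workaround Kevin mentioned a moment ago concrete: rather than trusting the model's arithmetic, you ask it to emit code and run that code yourself. Here is a toy sketch under the same assumptions as the earlier example (the openai Python client, with an illustrative model and prompt):

```python
# Toy sketch of the code-generation workaround: ask the model to write a
# Python expression instead of doing the arithmetic itself, then run it.
# Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()


def add_via_generated_code(a: int, b: int) -> int:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Write a single Python expression that adds {a} and {b}. "
                "Reply with only the expression, no prose."
            ),
        }],
    )
    expression = completion.choices[0].message.content.strip()
    # eval() is fine for a toy demo; a real system would sandbox or
    # validate generated code before executing it.
    return eval(expression)


# Example: add_via_generated_code(16, 32) should return 48.
```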

Ryan Glasgow: I'll say right now it's a little bit of a teenager phase. There's a lot of maturing that needs to happen before we can put all this behind us.

I think the exciting step function that gave us a glimpse of the future is AutoGPT, where you can give it a very complex task.

You can say: plan a birthday party in Napa for 20 people with a budget of X dollars. And it'll take that broad, complex task, break it up into subtasks, and actually run the subtasks through one model or different models, and verify that each of those subtasks is being done correctly.

I think that concept is where the next wave, that kind of absolute maximum of AI breakthroughs, will come from. Even here at Sprig, we are using AI to solve very specific jobs to be done: you run a study, you want to understand the study results, you look at the open text analysis for a specific question.

We're also applying AI to help you summarize all the open text responses for a specific question. But what if you could actually apply the concepts of AutoGPT and say: write a product spec about how to improve the onboarding flow? The AI could then run the Sprig study for you: design the questions, launch it, and optimize the study so it's getting just the responses that you need.

When the results are done, it analyzes all the results for you and actually writes the product spec based on what it found. What is typically considered multiple jobs to be done becomes one very complex task that AI is able to break down into smaller subtasks and then complete, and you actually get that end result.
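A stripped-down sketch of the AutoGPT-style pattern Ryan is describing: one model call decomposes the broad goal into subtasks, and follow-up calls execute each subtask in turn. Everything here (the prompts, the numbered-list plan format, and the model) is an illustrative assumption, not AutoGPT's or Sprig's actual implementation.

```python
# Stripped-down sketch of an AutoGPT-style loop: decompose a complex goal
# into subtasks with one model call, then execute each subtask with
# follow-up calls. Prompts, plan format, and model are all assumptions.
from openai import OpenAI

client = OpenAI()


def ask(prompt: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


def run_complex_task(goal: str) -> list[str]:
    # Step 1: ask the model to break the broad goal into numbered subtasks.
    plan = ask(f"Break this goal into short numbered subtasks:\n{goal}")
    subtasks = []
    for line in plan.splitlines():
        line = line.strip()
        if line[:1].isdigit() and "." in line:
            subtasks.append(line.split(".", 1)[1].strip())
    # Step 2: execute each subtask, carrying the goal along as context.
    results = []
    for subtask in subtasks:
        results.append(ask(f"Goal: {goal}\nComplete this subtask: {subtask}"))
    return results


# Example: run_complex_task("Write a product spec to improve onboarding")
```

A real agent loop would also verify each subtask's output before moving on, which is the part Ryan highlights as the hard step.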

Brett Berson: You could even have a model then break it up into Jira tickets and so on and so forth.

Ryan Glasgow: Exactly. Yeah, it could be: implement this product spec. It could run it by your team, it could break it into Jira tasks, engineers work on it, engineers use Copilot to write it. Right now we're applying AI to very specific steps along these broader, more complex tasks.

But I think the next step is having AI couple those tasks together into complete workflows. One thing we've started to work on is our AI partnerships. There are a lot of companies right now working on prompts to build designs, or LLMs to ship code and change the code. Can we partner with companies in the generative design or generative development space, where Sprig is the input into an LLM that's creating a mockup or writing code, and actually complete end-to-end, very complex workflows of several jobs to be done, with handoffs between different agents, Sprig maybe being one more complex agent?

And so I think the eggs, though, are the open question: what is the piece that the user brings to the table to make that final cake? But Kevin, I'm curious about your thoughts on that.

Kevin Mandich: Yeah, what you mentioned with AutoGPT and what it's able to do now is a great window into what's coming in probably the near future. Using AutoGPT or something like LangChain is like taking a bunch of really powerful car parts and mashing them together to create a janky car that has a very powerful engine.

That's the first step. The next thing that comes is a more cohesive model that comes out of the factory able to do all the things you wanted it to do when you were gluing these pieces together. And once you have that, it's just continuing this march towards full automation across a bunch of different domains. For us, like you said, Ryan, maybe that means determining how to build a product spec for a user researcher. Maybe it's guiding them throughout the research cycle, saying: hey, this is what your users think, this is what you should ask them next.

And depending on the answer, this is what you should ask them after that. I think it's just going to bring us down a road of helping at different points of somebody's workday, I guess.

Brett Berson: Just to bring that to life a little more, Ryan, in that medium-term vision of the AutoGPT-inspired ability to break up tasks, have them done, have them brought back, and then have that loop be closed and a whole other set of tasks kicked off, in the context of the Sprig product: if you had to guess, is that six to twelve months, two years, three years, five years away?

When you actually look at the technology, at where all this enabling technology is, what's your best guess in terms of the time horizon for that vision to come to life?

Ryan Glasgow: The exciting thing for us, and you mentioned others being able to build text analysis, is that based on what we've seen today in the boundaries of the LLMs, and what we have tested with the models, we could actually build that now. We are seeing the sophistication and capability of the models to build that now. I think the roadblocks we've seen are more around reproducibility and reliability, and some of the other edge cases, like the quantitative analysis. The other piece is really the comfort of the user and the customer with those more complex tasks, and giving them that control.

I was actually speaking on a webinar and talked about this copilot idea, and someone chimed in with a comment that said "copilot, soon to be pilot." A little bit of a dark comment for the people in that webinar, because I think the question is: how do we make sure there is that sense of control, that we're all involved and co-creating with AI, and AI is not creating for us?

So the waves we're really riding right now are the reproducibility and comprehensiveness of the models, which I think for Kevin and the team is more of a tactical roadblock, or really a speed bump rather than a roadblock, and then the speed bump of getting the user comfortable with how far we can take what's possible.

But we're absolutely seeing it as just a function of time before we can deliver on that vision, which to me is so exciting: starting with just text analysis and now delivering on very complex workflows that we've always wanted to do but haven't had the LLMs to deliver on until now.

Brett Berson: A good place to end. Thank you two so much for coming on and talking about all the stuff you've been building at Sprig.

Kevin Mandich: Thank you Brett.

Ryan Glasgow: Awesome. Thanks for having us.