Building enterprise AI products with PolyAI

Hey readers, this is a new series in which I interview people at the forefront of building AI products for enterprises. Through these interviews, I hope to share the hard-won lessons these folks have learned from large-scale deployments.

Shawn Wen, CTO and co-founder (left) and Devidas Desai, SVP Product (right)

In this interview, I speak with PolyAI’s Shawn Wen (CTO and co-founder) and Devidas Desai (SVP of Product) about building enterprise AI products and how communications and AI have evolved with generative AI. PolyAI develops enterprise conversational assistants that engage in natural conversations with customers to resolve their issues. These assistants understand customers regardless of their phrasing or manner of speaking. While voice interaction has recently gained popularity due to GPT-4o, PolyAI has been at the forefront of this technology since 2017. The company recently raised a $50M Series C round led by Hedosophia and NVentures (NVIDIA’s venture arm), bringing its total funding to $120M from prominent investors including Khosla Ventures, Georgian, Point72 Ventures, and others.

Enterprise AI adoption requires balancing multiple stakeholders: While customer experience heads traditionally were the main decision-makers, generative AI brings security, IT, branding, and legal teams into the conversation. This complicates the sales process but also creates opportunities for education and addressing diverse concerns.
Enterprises often hold AI to higher standards than humans: As Devidas noted, AI assistants may be expected to stay rigidly on-topic in ways that human agents are not. This creates challenges in making AI seem natural while still meeting strict enterprise requirements.
Customization and control are critical: Some enterprises have extremely specific restrictions, like forbidding an AI from stating basic facts unrelated to their business. AI systems need to be highly configurable to meet these idiosyncratic needs.
Practical solutions trump theoretical ideals: Shawn emphasized the importance of being practical rather than trying to build the perfect technical solution. Time-to-market, packaging, marketing strategy, and sales execution are as important as the underlying technology.
Layered safeguards are necessary: PolyAI uses multiple layers of protection, from general content filters to project-specific customizations. This allows tailoring the AI’s behavior to different levels of enterprise risk tolerance.
Iterative testing with clients is crucial: Finding edge cases and potential issues requires extensive testing, both internally and with clients. This process accumulates knowledge over time that can be applied to future projects.
Transparency about limitations builds trust: Being open about the current state of generative AI technology and its limitations, while showing a clear roadmap for addressing concerns, helps enterprises feel more comfortable adopting these solutions.
Self-serve capabilities are becoming important: With generative AI, some enterprises want more control in maintaining or even building their own assistants. Providing tools for this can be a differentiator.

Kenn: Thank you both for joining me today. I’m excited to talk with you about PolyAI. To start, could you each share how you found your way to the company?

Shawn: I’m Shawn, co-founder and CTO of PolyAI. I was there at the very beginning with Nikola and Eddy when we co-founded the company in November 2017, about 7 years ago now. The three of us met at the University of Cambridge. I actually met Eddy during my undergrad, as we’re both Taiwanese, so our history goes back about 15 years. Eddy and I went to Cambridge to pursue our PhDs under the same supervisor, and Nikola happened to be in the same year as well.

We were actually the last batch of Steve Young’s students. Steve Young is a pioneer in speech recognition who created the HTK open source toolkit. It was initially sold to Microsoft in the late 1990s, then open-sourced, with Microsoft continuing to maintain it. It was a traditional n-gram based and hidden Markov model toolkit for speech recognition. Since then, Steve moved on to building dialogue systems and doing research in that area.

I think the reason we’re doing what we’re doing now is very much because of Steve, both on the research side and the entrepreneurship side. Steve actually started three companies and sold them to Microsoft, Apple, and Google respectively. So because of the research topics we worked on with him, and his entrepreneurship history, it pointed us towards wanting to do something together.

Before graduation, we all went to big tech companies – I went to Google, Nikola to Apple, and Eddy to Facebook. But we started to feel it was a good time to do something together. So we all came back to the UK and started the business here. That’s a brief history of the company’s founding.

Kenn: That is cool. I did not know that about the founding team’s history with Steve Young. Thank you for sharing that background. Devidas, how about you?

Devidas: My story isn’t as cool as Shawn’s, but I’ve been in the collaboration and communication space since I started in product management, and the field really fascinates me. Joining PolyAI allowed me to stay close to communication, but leverage conversational AI to improve customer and caller experiences. Of the companies I met with, it sounds cliché, but honestly the quality of people I met with helped me make the decision to join PolyAI. The strength of the technology team was just super impressive, which is what essentially brought me here. And it’s fantastic to be here.

Kenn: That’s great context. Shawn, you mentioned studying for your PhD. Machine learning was a very different world back then, from both an academic and industry perspective. How has the AI field changed from your perspective?

Shawn: It’s a very interesting question. I started to get into AI and machine learning in my third year of undergrad. I actually started in electrical engineering, then gradually shifted towards AI because I found it more interesting. I began with speech recognition, which involved a lot of signal processing. As I mentioned before, it was hidden Markov models and n-gram based models. These approaches are actually very Bayesian, and that continued to be the case when I went to Cambridge, which is traditionally a big Bayesian camp. You have people like Zoubin Ghahramani there doing a lot of Bayesian-based approaches, graphical models, Gaussian processes, and so on. It’s very mathematically heavy and intellectually interesting.

That kind of model was actually popular for quite some time, and deep learning was just something in the background because the compute wasn’t there. But many of the algorithms and optimization techniques we’re using these days were already there. I think the shift started around 2013-2014 in the speech recognition field. In those days, if you had an improvement of word error rate by 0.5%, you could publish a paper. But Microsoft trained a deep neural network model and achieved a 4% reduction in word error rate, which was huge because for several years no one had managed to achieve that. It blew everyone out of the water, a bit like OpenAI with ChatGPT these days, but people don’t remember that because it was more in the academic world.

Afterwards, you see a bunch of people starting to train bigger neural networks and applying them to different use cases – first ImageNet, then people moved on to different problems like NLP. Deep learning has progressed a lot since then, and a lot of the benefit is because the compute power caught up. People started to realize it’s very difficult to figure out exact, beautiful mathematical equations to solve real-world messy problems. So people started engineering these large machines and piling data into them, and magically, you get something really powerful.

I think the reason large language models surprised a lot of people is because previously, if you were training small language models, it was really about garbage in, garbage out. It wasn’t really that helpful. And magically, at some point, once the model reaches a certain scale and the data reaches a certain scale, it reaches that tipping point and boom, it just starts to work really well. I think this caught a lot of people by surprise.

Kenn: GPT-4o is a big leap from the early days of neural networks that you sketched out. How has GPT-4o affected PolyAI or your thinking about building products in this space?

Shawn: GPT-4o is super exciting, and this kind of multimodal model is at the forefront of technology innovation. I think it’s going to make things potentially even easier for everyone. We’re super excited about it. In fact, we’re actually in touch with OpenAI to make sure that we can get private access once they finally release it. It’s unfortunate that they recently announced they have to push it back for another month.

We’re very excited about it, but at the same time, I think getting these models right is probably quite tricky. That’s probably the reason they’re pushing back the timeline. Even for the initial announcement, they’re only going to release it to a few trusted partners to test it. If you think that hallucination in GPT’s text version is already dangerous, you can totally imagine that hallucination in the voice channel is a completely new world.

We haven’t actually gotten our hands on the system to test it yet. But we do see that for building consumer-facing products, which I think is what they’re trying to do, they will make huge progress. For enterprise applications, I don’t think you would see such quick adoption yet, just because the end-to-end modality makes it even harder to understand what’s actually going on. If GPT-4o suddenly starts screaming at you in the voice channel, you don’t exactly know what’s happening or what input is causing the trouble. I think that would make enterprises worry. So we’re super excited, but we’re also thinking cautiously about the best way to use it, because we know that a lot of our enterprise customers would have concerns.

Kenn: It’s a good distinction. As a consumer, I’m super excited about getting my hands on GPT-4o. Have you had prospective customers or customers reach out to you asking if you can build something like GPT-4o for them?

Shawn: We haven’t heard that yet. I think people still don’t quite understand what it means, especially the end-to-end modality of it, because they haven’t actually gained access to it yet. Once they gain access to it, I do imagine that people will start to ask about it. But there’s no way for them to try it yet. And from the documentation or the announcement alone, unless you’re a techie, which the majority of our clients aren’t because they’re operating contact centers, it’s hard to grasp what’s coming.

Kenn: It’s interesting that the models continue to progress at a very fast pace. How do you think about the rapid progress of these models? And how do you future-proof yourselves, or your tech stack? Because I’d imagine that multimodal capabilities will change your tech stack quite a bit. But also from a more strategic perspective, how do you think about the progress of these models and how that relates to the problems you’re solving for your customers?

Shawn: I think the models will continue to evolve at a very rapid speed, because more and more companies are jumping into this kind of large model training and multimodality. As a company, our philosophy is always that these are all new tools that we should incorporate into our product or offering. No single technology is right for everyone. I think it’s very important to find the right technology and tooling for the right customer.

Our goal has always been to continue to invest and innovate on our own technical stack, but at the same time, we should keep an open mind to incorporate any of the latest technology into our offering. Because at the end of the day, our goal is to make the best, human-like voice assistants that enterprises would be comfortable adopting and using, and that callers would be happy with as well. So to us, it’s really about how we can continue to incorporate these new technologies fast enough into our product offering.

Devidas: I would just add that from a product standpoint, we are building this as a true platform. For each of the components that power the experience, whether it’s the listening piece, the reasoning piece, the speaking piece, or the biasing piece, we can offer the best in breed. Some of those best-in-breed pieces could be PolyAI’s own proprietary tech. Combining them provides an amazing caller experience, because we are thought leaders in bringing these pieces together and in building proprietary tech on top of the available technology with the caller experience in mind.

From a product strategy standpoint, we know what we are good at. We know the areas where we have the upper hand, and we want to keep building on that proprietary tech and make it part of the platform to deliver amazing caller experiences.

Kenn: Devidas, you’ve been a veteran of the communications space. How has communications changed, either from a personal or business perspective? And how have developments in AI changed or shaped your view of communications from a product perspective?

Devidas: Those are both excellent questions. In general, I think the changes have been very sticky because the technological changes have actually also changed user behavior. From a communications and collaboration standpoint, previously, communications were very simple and single-channeled. We’re talking about either an email communication, an IM, an SMS, or a phone call, and they all had different jobs. That’s exactly what users were expecting. If I want to email you, I’m going to email you, and I would expect your email back. The lines were quite thick.

Then we started moving towards modalities coming closer, where you started having the concept of messages having attachments or mentioning people in your emails, and the lines started getting blurrier. This is where you bring multiple modalities together. Then we moved over to web conferencing, which led to UCaaS (Unified Communications as a Service) products, bringing messaging, video, and phone together. It almost doesn’t matter where the user is or what the other side is using – you have all of the modalities, and you can switch between them to talk to people. It’s about the speed of communication.

Now we’re having AI-driven communications. All of these changes are really impacting user behavior and changing user habits. People are getting more and more used to spending less time on communication channels and making them super efficient. Previously, users were expecting simple, basic communication channels. Then we started moving towards multi-channel communications. Then users started expecting 24/7 instant responses, which brings AI into the picture because you don’t necessarily want to power those communications with humans all the time.

Now we’re in this phase where people are expecting 24/7 instant responses that are personalized. You don’t necessarily want to go over the same details over and over again. When you reach out to a rep or an AI, you expect the other side to know who you are, what your preferences are, and just pick up the conversation where you left off.

It’s been fascinating to see how the tech has evolved based on users’ expectations and vice versa, and how some of these changes are very sticky. AI is really helping provide quality communications to users. Separately, it’s helping businesses allocate human capacity to the highest ROI customer profiles. The way we see it, AI is not replacing anything. It’s actually making the spend on customer service, customer interactions, communications, or collaboration in general more efficient. You’re getting a higher ROI because you’re now in a position to determine which conversations you want bots to handle versus humans, and how personalized you want those exchanges to be.

I’m pretty sure we’ll soon see a world where it really doesn’t matter as a user or caller who I’m speaking with, whether it’s an AI-powered bot or a human. As long as my experience is great and it does the job, I really don’t care whether I’m talking to a human, because I don’t want to spend too much time on a phone talking to a customer service professional anyway.

Kenn: That’s a good segue to finally talking about what PolyAI is at a high level and what problem you’re solving. Could you elaborate on that?

Shawn: PolyAI has been quite consistent in terms of what we’re building. During our PhDs, we were doing research on conversational systems, and the company is really an extension of that. We continue to build voice assistants, primarily over the voice channel. The same technology can be used for building text-based assistants as well, but we’ve been focusing on developing a really good voice assistant because putting all these technologies together requires a lot of focus.

Our primary market is enterprise contact centers. A lot of these communications happen without most people paying attention to them. But whenever you have an issue and you call a contact center, especially in the Western world, the quality of service is often not good enough. This is because of labor shortages, increasing labor costs, and the fact that there’s still a requirement to staff enough people in a contact center, but fewer people are willing to do that kind of job. Or if they do want to do it, it’s often a temporary job: you come in, train for 3 months, and 6 months later, you’re out.

There’s also the challenge of effective communication between callers and offshore contact centers. When you have an issue with your bank and you call, you might speak with someone from a different part of the world who may not share the same cultural context or communication style. This can sometimes lead to misunderstandings or difficulties in fully grasping the caller’s situation and concerns. As a result, many customers find it challenging to connect on a personal level during these interactions. This is one of the reasons why many companies that initially moved their contact centers offshore are now considering bringing them back to their home countries in response to customer feedback. However, the underlying issues persist: service quality is declining, there’s a shortage of people willing to take on these roles, and contact centers continue to face significant challenges.

We think technology can solve this kind of problem. In a contact center, voice calls actually need a dedicated agent listening to the phone call and speaking to the user on the phone. While chat or digital channels like emails are much easier because you can parallelize the work, voice is especially challenging for contact centers. But a lot of people still want to call when they have urgent issues or when they’ve tried to do something online but couldn’t.

PolyAI is trying to help solve that problem. We want to place AI agents into contact centers. The AI agent should be the best tier 1 agent that the enterprise can have, representing their brand. They need to communicate naturally and sound like what the brand wants them to sound like. We’re not intending to just automate the entire workflow because naturally, there are certain kinds of use cases that AI is not good enough for yet. Like a lot of emotional cases, some cases that require lots of empathy, and some use cases that require very complicated transactions. These are things where tier 2 human agents will have a very strong capability to help, and that’s where AI can step out and hand the call back to the contact center.

A lot of people have been thinking about AI taking people’s jobs. I think partially that’s true, but it’s also not true because there are already problems with contact centers – they just cannot staff enough people. And arguably, training AI is also not free. You actually need to supply it with a lot of data, and there should be a lot of humans in the loop as well. As new technology continues to evolve, there will always be new jobs being created, even as some jobs go away.

Kenn: That’s interesting. I grew up in the Philippines, and I know turnover rates in contact centers there are 50% every year.

Shawn: Even in the Philippines, yes. It just tells you how tough the whole business is.

Kenn: I think it would be helpful to sketch out who the different users and personas involved are. As an end user, I just talk to someone. But I know in the enterprise, you have corporations that may or may not outsource it to someone, and they may have their own requirements. Then you also have the end users. How would you sketch out the problems and how you’re solving for those different personas?

Devidas: Maybe I can talk about this a bit. I’ll answer this in two ways. There’s been a historical persona who we’ve been selling to, and then I feel like generative AI is definitely changing the landscape in terms of who’s involved with respect to the buying process and the maintenance of the conversational assistant.

Historically, and this is still the case, customer experience heads and heads of contact centers are still the primary people we sell to as far as enterprises are concerned. I feel like generative AI is changing that landscape a bit and is bringing in a few more personas that are involved in the buying process. Because with generative AI, you’re no longer just solving for customer experience, cost efficiency, or improvement of CSAT. With generative AI, you have to worry about how safe the solution is. So that brings security and IT into the mix.

With generative AI, you also want to make sure that, especially for a voice assistant, the voice is on brand and you are appropriately representing the brand that the assistant is taking calls for. So that brings in someone like a CMO or head of branding into the mix as well. It’s been a really interesting shift where we are seeing more people coming to the table with respect to making decisions, and it’s fascinating. Some of these interactions are very exciting in terms of us either educating them or answering good questions.

As Shawn touched upon, one additional persona we’re seeing come into the mix is the teams and jobs being created for every conversational assistant that is built, sold, or deployed for a particular brand. There are people responsible for monitoring the calls, making sure that the assistant is on point, and reviewing calls to see what could be improved. They either make those improvements themselves, or if they don’t want to be hands-on, they work with us to make those improvements. This echoes the point that Shawn was just making: it’s partially going to make some jobs redundant, but at the same time, it’s introducing new jobs and new skill sets that haven’t been in the industry for a while. And that’s a shift that we’re seeing which is here to stay.

From a persona standpoint, those are the ones that we’re seeing. One of the other trends that we see is that with generative AI, it is becoming a bit simpler and easier for customers to actively maintain their own assistants. In some cases, they can even build their own assistants, which was previously quite tough with intent-based models. We’re seeing a lot of demand for that as well, which is, “Hey, I want to self-serve. I want to self-care for my own assistant because I know my business the best.” And that’s totally valid. We are able to provide for that, and we are seeing that shift as well. So it’s been overall a fascinating experience. Generative AI has definitely changed the landscape with respect to conversational AI.

Kenn: That’s really interesting. I didn’t think about the other personas now getting involved in customer service agents as well. It seems like there’s been a significant shift.

Devidas: It’s still early days, but I think we’re seeing enough of it to know that this is going to be a very logical trend going forward.

Kenn: You touched on something interesting about voice assistants. Before, with any customer experience assistants, you had to script every workflow. But now, I don’t know how your team is building it, but I’m curious how much you can share. It seems like you’re leaving it more to the AI system, which needs to know more about the company and the policies, but you also have to prevent it from hallucinating. So there’s less and less control. Can you share how you’re building it and controlling it to make sure that it’s on brand, on policy, and aligned with the business?

Shawn: That’s a very good question. This has been something we’ve been working really hard on since generative AI and large language models came along. If you try to build these kinds of voice assistants, you can actually build something quite impressive quite quickly. But it doesn’t really do certain things well. For example, if you ask ChatGPT to make a restaurant reservation, it would actually pretend that it’s making an API call in the background and tell you it’s been done. So it hallucinates a lot.

I think GPT is pretty good at pattern matching. How we think about large language models is just to incorporate them as one of the building blocks of the voice assistant. It’s a super powerful building block. What it can do really well is pattern matching. So if you provide some information in the prompt and the task is to ask GPT to pattern match what users say to what is included in the prompt, it can usually do it quite well. And that’s usually how we handle FAQ questions drawn from a large knowledge base.
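
To illustrate the pattern-matching idea Shawn describes, here is a minimal Python sketch. It is not PolyAI’s actual implementation. It assumes the model is asked only to pick an entry number from FAQ content placed in the prompt, and that the answer given to the caller always comes from approved text. The FAQ entries and the function names build_faq_prompt and resolve_answer are hypothetical, and the call to the language model itself is left out.

```python
from typing import Optional

# Hypothetical FAQ entries; a real deployment would load these from the
# client's knowledge base.
FAQ = {
    "1": ("What are your opening hours?", "We are open 9am to 5pm, Monday to Friday."),
    "2": ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
}


def build_faq_prompt(user_utterance: str) -> str:
    """Put the approved FAQ questions in the prompt and ask only for a match."""
    lines = [
        "Match the caller's question to one of the entries below.",
        "Reply with the entry number only, or NONE if nothing matches.",
        "",
    ]
    for key, (question, _answer) in FAQ.items():
        lines.append(f"{key}. {question}")
    lines += ["", f"Caller: {user_utterance}"]
    return "\n".join(lines)


def resolve_answer(model_reply: str) -> Optional[str]:
    """Map the model's reply back to an approved answer; reject anything else."""
    entry = FAQ.get(model_reply.strip())
    return entry[1] if entry else None
```

In this sketch the model only selects an entry, so a reply that is not a known entry number (including an invented confirmation such as a fake booking) resolves to None and never reaches the caller.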

When you start to build more complicated flows that require several steps of transactions, then there will be some design elements involved. You also need to build some guardrails on top of it. What we do is basically build a programmatic approach overlaying on top of GPT. Think about it like this: if you’re just prompting GPT to do certain things, it’s equivalent to handing GPT 100 pages of contact center instructions and saying, “Hey, read it and then interact with the user directly.” If you were newly onboarded as a person, you’re not going to read all these manuals before you jump into a call, and you’re going to hallucinate yourself because you just want to survive in the conversation. That’s what GPT does when you don’t give it proper training or plans.

What we effectively do is say, “I’m not going to just give you 100 pages of these instructions and guidelines. I’m going to group them into different categories for you.” For example, if the user is asking about a payment issue, you go into this particular payment issue flow. You fetch all these instructions or manuals, and those manuals are also paginated for you. So that means if a user asks about this question, the first step is you need to ask about these 5 different questions, and then you flip to the next page, then you answer another 3, and so on. By paginating these instructions and giving it a step-by-step progression into a conversation, you’re kind of guiding GPT. It’s more of a copilot kind of design we’re doing here, rather than relying on GPT to just go on autopilot.
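
The “paginated manual” idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PolyAI’s system: the Page and Flow classes, the slot names, and the prompt wording are all hypothetical. The point it shows is that the model only ever sees the instructions for the current step, and the page turns only once that step’s required details have been collected.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    instructions: str          # guidance shown to the model for this step only
    required_slots: List[str]  # details to collect before turning the page


@dataclass
class Flow:
    name: str
    pages: List[Page]
    page_index: int = 0
    collected: Dict[str, str] = field(default_factory=dict)

    def current_prompt(self) -> str:
        """Return only the current page, not the whole manual."""
        page = self.pages[self.page_index]
        missing = [s for s in page.required_slots if s not in self.collected]
        needed = ", ".join(missing) or "nothing"
        return (f"[{self.name} step {self.page_index + 1}] "
                f"{page.instructions} Still needed: {needed}.")

    def record(self, slot: str, value: str) -> None:
        """Store a collected detail and advance once the page is complete."""
        self.collected[slot] = value
        page = self.pages[self.page_index]
        done = all(s in self.collected for s in page.required_slots)
        if done and self.page_index < len(self.pages) - 1:
            self.page_index += 1


# Hypothetical two-page payment flow.
payment_flow = Flow("payment_issue", [
    Page("Verify the caller.", ["account_number", "postcode"]),
    Page("Ask about the failed payment.", ["payment_date"]),
])
```

A surrounding program, rather than the model, decides which flow is active and when to advance, which is one way to read the “copilot rather than autopilot” design described here.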

Kenn: That’s really interesting. Let’s say I’m a head of customer experience and I go to PolyAI. What do prospective customers usually ask you about? How do you describe the product to them, and what does it take to get set up? And what concerns do you usually hear from customers, particularly as they think about deploying at enterprise scale?

Shawn: It’s not even just hallucination they’re worried about. Some of these enterprises are super strict in terms of what the bot can and cannot say. For example, there’s this telco company we work with that doesn’t even allow the model to say who their CEO is, even though it’s a customer service bot for that particular company. They also don’t allow the bot to say what the capital of France is. It’s not allowed to say Paris because it’s considered “out of scope.” So that’s the level of scrutiny you’ll probably run into with enterprises. It depends on the enterprise, but these are the extreme cases we’ve seen so far.

Devidas: Sometimes you can say that AI assistants are held to a higher bar than human assistants. For example, if I press a human four times to tell me what’s 2 plus 2, the human assistant could totally be like, “Alright, it’s 4. Now what can I help you with?” And that’s totally okay, but it probably wouldn’t be okay if an AI assistant did that. So sometimes the bar is much higher for AI compared to humans.

To extend Shawn’s point, some of this is quite synonymous with generative AI. When you talk about generative AI, you start thinking about safety and whether the assistant is on point. Typically, the three dimensions that we see our customers having concerns about are: Is the assistant providing accurate responses? Is the assistant detecting any threats if someone’s trying to change its behavior, and how is it managing that threat? And lastly, how is the AI treating my customer’s information? You end up sharing PII (Personally Identifiable Information) in these conversations in order to help the caller, so how is that being managed?

At PolyAI, we take these concerns very seriously. If you see our roadmap, there are clear items addressing each one of these concerns, and we’re actually on a good path with that.

Kenn: Interesting. To the extent you can share, how do you red team or test against these concerns? For example, making sure the voice assistant doesn’t say who the president of France is?

Shawn: That’s a problem that everyone will have to face. As we’ve gotten very deep into selling to enterprises, we’ve developed several layers of safeguards. The first layer is that we actually put in a content filter. This filter is basically the highest level, filtering out anything that is dangerous, biased, sexist, or otherwise harmful. The filter would be triggered if any such content is included, and the system or assistant would basically just refuse to answer or follow a particular configured behavior. Some of our clients want the call to be handed off back to a human agent in this scenario.

Then going downwards, you have these very specific requirements about what you shouldn’t share outside of what you’re supposed to do in the contact center scope. We have an automatic evaluation framework that allows our developers or designers to write stories about particular conversations. Every time we’re about to launch a new system, we run all those examples against the system to make sure that all the examples pass. That’s another way to safeguard it.
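
A story-based evaluation run of the kind described above might look roughly like the following Python sketch. It assumes each story is a caller utterance plus a list of phrases the assistant must never say; the Story class, run_stories, and the stub assistant are hypothetical stand-ins, and a real framework would presumably cover multi-turn conversations and richer checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Story:
    name: str
    caller_says: str
    must_not_contain: List[str]  # phrases the assistant must never say


def run_stories(assistant: Callable[[str], str], stories: List[Story]) -> List[str]:
    """Run every story against the assistant and return the names of failures."""
    failures = []
    for story in stories:
        reply = assistant(story.caller_says).lower()
        if any(phrase.lower() in reply for phrase in story.must_not_contain):
            failures.append(story.name)
    return failures


def stub_assistant(utterance: str) -> str:
    """Stand-in for the real system; always deflects to the supported scope."""
    return "I can only help with questions about your account."


# Hypothetical stories based on the out-of-scope examples in this interview.
STORIES = [
    Story("capital_of_france", "What is the capital of France?", ["Paris"]),
    Story("ceo_name", "Who is your CEO?", ["our CEO is"]),
]
```

A release gate would then require run_stories(...) to return an empty list before a new version goes live.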

One of the major challenges is how you actually find those examples of what you’re not supposed to do. That requires years of accumulated experience. It requires several projects where users do a lot of extensive testing. You need to give it to your clients to do a lot of extensive testing as well. And then you have to decide on a boundary where you put those examples in.

So we have different layers of safety guardrails. One is that hallucination and harmful information should be excluded. The second layer is that you are allowed to say freely what is not harmful. But then you have another layer that you are only allowed to answer questions related to the contact center. And then the most customized level is that you are only allowed to answer specifically for that particular project or account. We don’t have any clients at that level yet, but there are definitely some customizations we’ll have to do for some future clients, I imagine.
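
The layering described above can be pictured as increasing strictness levels applied on top of an always-on harm filter. The Python sketch below is only an illustration: the Strictness enum, the topic tags, and is_allowed are hypothetical, and it assumes some upstream classifier has already tagged the candidate response with topics.

```python
from enum import IntEnum
from typing import Set


class Strictness(IntEnum):
    NOT_HARMFUL = 1     # anything that passes the harm filter is allowed
    CONTACT_CENTER = 2  # only contact-center topics are allowed
    PROJECT_ONLY = 3    # only topics approved for this specific account


def is_allowed(topic_tags: Set[str], strictness: Strictness,
               project_topics: Set[str]) -> bool:
    """Apply the layers in order, from most general to most customized."""
    # Base layer: harmful content is always blocked, whatever the setting.
    if "harmful" in topic_tags:
        return False
    # Next layer: restrict to contact-center scope if configured.
    if strictness >= Strictness.CONTACT_CENTER and "contact_center" not in topic_tags:
        return False
    # Most customized layer: restrict to this project's approved topics.
    if strictness >= Strictness.PROJECT_ONLY and not (topic_tags & project_topics):
        return False
    return True
```

For example, a general-knowledge answer such as the capital of France would pass at NOT_HARMFUL but be blocked at CONTACT_CENTER, which mirrors the “out of scope” restriction mentioned earlier in the interview.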

Kenn: This has been great. We’ve touched on a lot of enterprise concerns. Is there anything else you think we haven’t discussed about what enterprises are looking for? How would you advise people trying to build AI products for the enterprise? What are some of the lessons you’ve learned in your journey building PolyAI?

Devidas: From a product standpoint, the trend we’re seeing from enterprises is that almost every enterprise champion we’ve interviewed wants to adopt generative AI, but they’re a bit nervous about it. That nervousness stems from a range of concerns. It ranges from “I believe in it, I want to get this done in my company. How can you help me convince the business that this is a good thing?” all the way to “How do you take care of safety? How do you take care of prompt injection? How do you take care of hate speech?”

Essentially, that’s exactly why I said a lot of this is education for our customers and just being very transparent about how we are solving for some of those problems. From a solutioning standpoint, it ranges from us addressing the concerns or giving our champions enough ammunition to feel good about the fact that this makes absolute sense, and they’re going to put their name behind it and push for it within the organization.

What is working well for us is just being open and transparent about how we are solving for some of these concerns. Customers understand that generative AI is fast-moving, there isn’t a foolproof solution yet. It’s about trying to stay ahead of the curve, trying to constantly evolve, and then showcase data on what the assistant is doing well, where it could improve, and then have tools in place to keep that improvement going on an ongoing basis. That’s how they feel pretty good, strong, and safe about adopting the solution.

Shawn: From the technical founder’s point of view, I think you just need to be super practical. Everyone wants to build the next Google or Facebook. Everyone wants to build a platform. But in reality, it’s probably not going to happen that way just because if it’s actually that much of a market, then the big players would jump in and they would already take the market. You’re never going to be able to win that battle because they have way too many resources.

I think we just need to be practical. Technical founders like myself sometimes are not very practical. You always want to solve the theoretical best problem, like what is the best revenue model, like a search engine. If you can actually build a business model that can print money, that would be the best. Everyone wants to build that, but only a couple of people manage to do it.

What we learned through the journey is that we just need to be practical about go-to-market. It’s not just about technology. It’s about the product and how you package it together. It’s about the marketing strategy. It’s about the sales execution as well. You just need to find the right way to sell the right product to the right people. And that requires a lot of learning and a lot of trying over a long period.

Kenn: That’s good advice. In San Francisco, I attend all these AI events, and it’s mostly about the technology and building prototypes around it. Less about the go-to-market side of it.

Kenn: Before we end the interview, congratulations on the huge $50M Series C round and $500M valuation. What’s next for PolyAI? What are each of you excited about for the future? And do you have any call to action for people reading this interview?

Devidas: I think we’re going to keep things simple. We want to keep our product strategy living and breathing. We’re going to spend a lot of time on product discovery, keeping our eyes on the competition, and accordingly evaluate our strategy on an ongoing basis. We’re going to keep the focus high. We’re going to use the funding round to address some of the gaps in the right skill set areas. But all in all, we’re going to keep it simple. We’re going to focus, have a clear product direction. We’re going to differentiate on our strengths, and we’ll continue to invest in that. We’re going to continue to believe in and increase our amazing team. We have a fantastic team that can make anything happen, and we’re going to continue to support them and keep addressing the gaps wherever necessary.

Shawn: For me, it’s about continuing to expand PolyAI’s reach. Even 6-7 years down the road, we’re still not taking enough calls yet. I just want the systems to take more calls, to be able to solve more real-world problems that callers and contact centers are facing. Because this kind of pain is real. Every time I actually phone into a doctor’s office, I hope there will be some sort of system that can actually just help me get through what I want, rather than waiting on the phone for 30 minutes until someone picks up, doesn’t understand what I want, and transfers me to someone else, so I wait for another 15 minutes, and so on. I think those kinds of experiences for humans in this new world should be in the past. People shouldn’t spend their time phoning their doctors or their banks. They should just have a system that can understand them and send them to the right place.

Kenn: That’s awesome. And with that, let’s wrap. Thank you both for your time. This was a great conversation.


By stp2y
