Gladia, an AI transcription and audio intelligence provider, has raised $16 million in funding.
The Paris, France-based company will use the funding to develop an end-to-end audio infrastructure – starting with a new real-time audio transcription and analytics engine – enabling voice-first platforms to deliver more value to their users across borders with cutting-edge AI.
It’s a challenge to rivals such as Otter.ai and Fireflies.ai, as well as other AI-based services that transcribe voice conversations to text. In an interview with VentureBeat, CEO Jean-Louis Quéguiner explained to me why he started the company.
“As you can hear from a beautiful French accent, I’m not an English speaker and I was extremely frustrated with the accents,” Quéguiner said. “That’s why I founded the company.”
I got a demo of the AI transcription, and it worked in real time as Quéguiner spoke English with his heavy French accent. I’m used to services like Otter getting a lot of words wrong in a transcription, but in the first page of results from Gladia, I saw no errors. He also showed how he could speak two different languages and the system could shift from one language to another as needed.
XAnge led the round, with participation by Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures, and Soma Capital.
Founded in 2022, Gladia has now raised a total of $20.3 million, with earlier seed investments headed by New Wave, Sequoia Capital (as part of the First Sequoia Arc program), Cocoa, and GFC. Gladia recently was selected to participate in the AWS generative AI accelerator program.
“Gladia represents the qualities we like to champion at XAnge: a bold, global tech team at the forefront of AI innovation, with a proven business model to unlock new opportunities across industries,” said Alexis du Peloux, partner at XAnge, in a statement. “In a fast-paced AI environment, Jean-Louis Quéguiner and his team have executed extremely well, and we are proud to back Gladia for the Series A.”
Given that most speech recognition models today are trained predominantly on English audio data and are therefore inherently biased, Gladia prioritized building the first real-time product that is truly multilingual.
The new fine-tuned engine delivers advanced real-time transcription in over 100 languages, along with enhanced support for accents and the unique ability to adapt to different languages on the fly.
Gladia’s new engine is unique in its ability to extract insights from a call, such as the caller’s sentiment, key information, and a conversation summary, in real time. This means it takes less than a second to generate both a transcript and insights from a call or meeting using Gladia.
New real-time AI transcription
Building an accurate, low-latency, and multilingual engine in-house is a complex and resource-intensive task. It requires extensive expertise in language understanding and real-time data handling, along with continuous optimization and maintenance. Real-time models require more computing power and may struggle to produce accurate output immediately due to limited context.
Gladia’s new product allows companies to bypass these challenges. The real-time speech-to-text engine boasts an industry-leading latency of under 300 milliseconds without compromising accuracy, regardless of the language, geography, or tech stack used.
“Companies are spending valuable time and resources trying to incorporate multiple AI functions into their existing platforms,” said Jonathan Soto, CTO of Gladia, in a statement. “Our single API is compatible with all existing tech stacks and protocols, including SIP, VoIP, FreeSwitch, and Asterisk. This allows us to easily integrate real-time transcription and analysis into our customers’ AI platforms, so they can focus on delivering the best services to their end users.”
What’s ahead
The company’s first async transcription and audio intelligence API launched in June 2023 and was based on a proprietary version of Whisper ASR.
It rapidly gained traction in the enterprise market, particularly with meeting recorders and note-taking assistants. The API has been adopted by more than 600 customers around the world, including Attention, Circleback, Method Financial, Recall, Sana, and VEED.IO, and has more than 70,000 users.
“Gladia’s technology allows companies in vertical markets that need cutting-edge real-time transcription, including sales enablement and contact center platform, to shift seamlessly from manual post-call processing to proactive, low-latency workflows,” Quéguiner said. “Whether it’s automated CRM enrichment or real-time guidance for support agents, Gladia is designed to help businesses operate smarter and more efficiently in record time, without requiring AI expertise in-house.”
Gladia will use the new capital to advance its R&D efforts and soon bring to market a one-stop AI toolkit for audio. It also plans to expand its product offering with additional à la carte models, including large language models (LLMs) and retrieval-augmented generation (RAG). With several design partners in the contact-center-as-a-service (CCaaS) segment, the company is currently piloting an agent-assist solution powered by Gladia’s real-time AI engine. Additionally, Gladia will continue to expand its talent base as it prepares for international expansion.
“We are multilingual, and we have something that is called ‘code switching,’ which makes it unique,” Quéguiner said. “You can start with the language and switch to another.”
He went on to show me that he could start a call in English and initiate the transcription. Then he spoke French words, and the model correctly transcribed them in French.
“Keep in mind that [others] are not real time right now, and this one is real time,” he said. “Usually, real time is a little bit less accurate. You can also have your own custom vocabulary in real time, which is pretty unusual, with us. We have the capability to extract some real-time insights.”
The service has an AI summarizer, and it will have new optional features in the coming months. Quéguiner said that his service can also get acronyms right and detect the switch to another language.
“The model we use is very similar to LLMs (large language models). It has no code decoder architecture, which is not the case for most of the models that you’ve seen with Fireflies, for instance,” Quéguiner said.
The market includes “meeting recorders,” Quéguiner said. The transcription results can feed real-time insights, which can help users such as sales leads close deals faster.
The company also works with call centers, giving them 30% faster time to completion on phone calls thanks to better accuracy. The company will charge a flat fee, such as per-hour pricing.