Building Industry-leading AI Models for Universal Speech Intelligence

We just followed the documentation online, and within a few hours, we were operational and started running a job. We never had any problems.
– Klemen Simonic, Founder/CEO

Soniox, founded in 2020 by experienced AI researchers, is the originator of unsupervised learning for speech recognition. In 2022, the company released its first product, a speech recognition AI with the highest level of accuracy for the eight major languages: German, Portuguese, Italian, French, Spanish, Chinese, Korean, and English. Each non-English model is bilingual. It understands its own language plus English, which better supports business use cases.

The Soniox team was well-versed in training custom AI models, to say the least: before working with Databricks, they had already trained a multilingual large language model (LLM), Soniox 7B. Yet they still turned to Databricks for support in training their next large multimodal LLM, Omnio. Omnio can use all the information available in an audio signal and represents a significant advancement in speech recognition. It is the first large AI model that processes speech and audio in a manner similar to how a human might. It can recognize and understand speech, identify separate speakers, and discern emotion and sentiment. It can even distinguish between background sounds and human-made sounds. To build this model, Soniox had to wrangle Internet-scale audio and text datasets.

After some online research, Soniox found its way to Databricks and Mosaic AI Training. Simonic explained, “We aren’t a typical Databricks customer; we have our own training loops and distributed training infrastructure. But when we started working with your team, it was clear that your tools were built for developers by developers. We love Mosaic AI training; it’s easy to use.” Although Soniox had used other infrastructure providers, they appreciated the compute availability and convenience of the Mosaic AI Training cluster.

Continued Simonic, “You can tell that whoever built Mosaic AI Training really understands how to launch and train jobs. We have tried other platforms, and your platform has been the easiest way to start any job. Your team built the right features the right way and made them easy to use.” As a startup founder, Simonic originally perceived Databricks to be an enterprise-focused company. He was pleasantly surprised to get personalized support from his account team. “It’s really important to listen to your customers, even if they are an early-stage startup.” He added, “When technical challenges arise, it can be hard for startups because they lack a big organization’s budget to support any failures.” The personal attention Simonic received from the Databricks team has given him confidence that any issues arising in future training runs can be worked through.

Although the Soniox team was initially drawn to the functionality of Mosaic AI Training, they appreciate that it is part of a broader GenAI ecosystem from Databricks that can support workloads from data ingestion to model serving. Looking ahead, Soniox plans to expand the capabilities of its speech-to-text and Omnio products so that it can transform users’ interaction with audio in use cases that range from transcription to audio summarization to voice interaction, supporting industries like healthcare, legal, customer care and beyond. Soniox initially began as a research project to investigate how to leverage unlabeled audio data. Today, its groundbreaking speech recognition AI unlocks new possibilities in human-machine interaction.



By stp2y
