ElevenLabs, the AI voice startup known for its voice cloning, text-to-speech and speech-to-speech models, has just added another tool to its product portfolio: an AI Voice Isolator.
Available on the ElevenLabs platform starting today, the offering allows creators to remove unwanted ambient noise and sounds from any piece of content they have, from a film to a podcast or YouTube video.
It comes mere days after the company launched its Reader app and is free to use (with some limits). However, the capability is not entirely new to the market. Many other creative solution providers, including Adobe, offer tools to enhance the quality of speech in content. What remains to be seen is how effective Voice Isolator is in comparison to them.
How will the AI Voice Isolator work?
When recording content like a film, podcast or interview, creators often run into the issue of background noise, where unwanted sounds interfere with the content (imagine random people talking, wind blowing or a vehicle passing on the road). These noises may go unnoticed during the shoot but can degrade the quality of the final output, mainly by drowning out the speaker's voice at times.
To solve this, many tend to use mics with ambient noise cancellation that remove the background noise during the recording phase itself. They do the job, but may not be accessible in many cases, especially to early-stage creators with limited resources. This is where AI-based tools like the new Voice Isolator from ElevenLabs come into play.
At the core, the product works in the post-production stage, where the user just has to upload the content they want to enhance. Once the file is uploaded, the underlying models process it, detect and remove the unwanted noise and extract clear dialogue as output.
ElevenLabs says the product extracts speech with a level of quality similar to that of content recorded in a studio. The company’s head of design Ammaar Reshi also shared a demo where the tool can be seen removing the noise of a leaf blower to extract crystal clear speech of the speaker.
We ran three tests to try out the real-world applicability of the Voice Isolator. In the first, we spoke three separate sentences, each disturbed by a different background noise. In the other two, we spoke three sentences against a mix of different noises occurring at random, irregular points.
In all the cases, the tool was able to process the audio in a matter of seconds. Most importantly, it removed the noises (from doors opening and closing and banging on the table to clapping and the moving of household items) in almost all cases and extracted clear speech without any kind of distortion. The only sounds it failed to recognize and remove were banging on the wall and finger snapping.
Sam Sklar, who handles growth at the company, also told us that the tool does not work on music vocals at this stage, though users can still try it on that use case and may have success with some songs.
Improvements likely on the way
While Voice Isolator's ability to remove irregularly occurring background noise certainly makes it stand out from most other tools, which only work with flat noises, there's still some room for improvement. Hopefully, as is typical with tools of this kind, ElevenLabs will further improve its performance over time.
It’s important to note here that the company has not shared much about the underlying models powering the tool or whether the recordings that go into it are used for training its models in any way. Sklar said he cannot share the specifics of what goes into model creation but emphasized the company has a form linked in its privacy policy where users can opt out of the use of personal data for training.
As of now, the company is providing Voice Isolator only through its platform. It plans to open API access in the coming weeks, although the exact timeline remains unclear. For users coming to the website or app to try out the tool, ElevenLabs is offering free access with certain usage limits.
“The Voice Isolator model costs 1000 characters per minute of audio. We have a free plan on our site that comes with 10k characters/month, so it’s possible to use it with 10 minutes of audio per month for free,” Sklar explained. This means users looking to remove background noise from larger audio files will have to switch to paid plans that start at $5/month, billed monthly.
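The arithmetic behind that quote can be sketched in a few lines of Python. This is only an illustration: the two constants come from Sklar's figures, while the function names and the assumption that usage is charged linearly per second (rounded up to a whole character) are ours and have not been confirmed by ElevenLabs.

```python
# Figures quoted by ElevenLabs in this article.
CHARS_PER_AUDIO_MINUTE = 1_000
FREE_PLAN_CHARS_PER_MONTH = 10_000


def chars_needed(audio_seconds: int) -> int:
    """Characters consumed by a clip of the given length.

    Assumption (not confirmed by ElevenLabs): cost scales linearly with
    duration and is rounded up to the next whole character.
    """
    return (audio_seconds * CHARS_PER_AUDIO_MINUTE + 59) // 60


def free_minutes_per_month() -> float:
    """Minutes of audio covered by the free plan's monthly allowance."""
    return FREE_PLAN_CHARS_PER_MONTH / CHARS_PER_AUDIO_MINUTE


def fits_free_plan(audio_seconds: int) -> bool:
    """True if a single clip fits within one month's free allowance."""
    return chars_needed(audio_seconds) <= FREE_PLAN_CHARS_PER_MONTH


if __name__ == "__main__":
    print(free_minutes_per_month())   # minutes of free audio per month
    print(chars_needed(90))           # cost of a 90-second clip
    print(fits_free_plan(15 * 60))    # does a 15-minute clip fit?
```

Under these assumptions, a 15-minute recording would need 15,000 characters, which is more than the free monthly allowance.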