‘Uncanny’: ChatGPT’s Advanced Voice Mode is blowing minds

It was criticized by Scarlett Johansson. It was delayed by more than a month. And now that it’s finally here, only a select few customers in an “alpha” group have access to the new ChatGPT Advanced Voice Mode from OpenAI, a more naturalistic, human-like audio conversational mode for the hit chatbot available through the official ChatGPT app for iOS and Android.

Yet already, just days after the first alpha testers got their hands on ChatGPT Advanced Voice Mode, people are posting examples of it engaging in fantastically expressive and impressive utterances, impersonating Looney Tunes characters and counting so fast it runs out of “breath” just like a human would.

Here are some of the more interesting examples we’ve come across, shared by initial alpha users on X, with the caveat that we ourselves don’t have access to the mode yet and so can’t verify their authenticity.

Language instruction and translation

Several users on X noted that popular language learning app Duolingo might be in trouble given that ChatGPT Advanced Voice Mode can perform interactive, “hands on” (or is that, “voice on”?) instruction custom tailored to an individual attempting to learn or practice another language.

Advanced Voice Mode is also powered by OpenAI’s new GPT-4o model, which is the company’s first natively multimodal large model, designed to handle vision and audio inputs and outputs without linking back to other specialized models for these media (unlike GPT-4, which relied on other domain-specific OpenAI models).

As such, Advanced Voice Mode can speak about what ChatGPT is able to see through the user’s phone camera if they grant the app access to it. In one example, McGill University mixed reality design instructor Manuel Sainsily posted how Advanced Voice Mode was able to use this capability to translate screens from a Japanese version of Pokémon Yellow on a Game Boy Advance SP.

Humanlike utterances

Cristiano Giardina, an Italian-American AI writer, has posted a number of examples of tests with the new ChatGPT Advanced Voice Mode, including one viral demo where he asks it to count up to 50 faster and faster. It dutifully does so, and even stops to catch its breath near the end.

ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50 (this blew my mind – it stopped to catch its breath like a human would) pic.twitter.com/oZMCPO5RPh
— Cristiano Giardina (@CrisGiardina) July 31, 2024

Giardina later followed up with a post on X noting that the transcript of that counting experiment didn’t showcase any breaths, indicating ChatGPT’s Advanced Voice Mode “has simply learned natural speaking patterns, which includes breathing pauses. Uncanny.”

Interestingly, the transcript has no interruptions or notations – the voice model has simply learned natural speaking patterns, which includes breathing pauses. Uncanny. pic.twitter.com/jFJWMC68mi
— Cristiano Giardina (@CrisGiardina) July 31, 2024

ChatGPT Advanced Voice Mode can also clear its throat and mimic applause, as seen in a video posted to YouTube.

Beatboxing

Startup founder Ethan Sutin posted a video to X showing how he was able to get ChatGPT Advanced Voice Mode to beatbox fluidly and convincingly like a human MC.

Audio storytelling and roleplaying

ChatGPT can also roleplay (the SFW kind) if the user asks it to “play along” and invents a fictitious scenario such as going back in time to Ancient Rome, as University of Pennsylvania Wharton School professor Ethan Mollick showed in a video posted to X:

ChatGPT, engage the Time Machine!
(A big difference from text is how voice manages to keep a playful vocal tone: cracking and laughing at its own jokes, as well as the vocal style changes, etc.) pic.twitter.com/TQUjDVJ3DC
— Ethan Mollick (@emollick) August 1, 2024

If the user just wants to listen, they can ask ChatGPT Advanced Voice Mode to tell a story, and it will do so complete with its own AI-generated sound effects, such as the thunder and footsteps in this example taken from Reddit and reposted on X:

‼️A Reddit user (“u/RozziTheCreator”) got a sneak peek of ChatGPT’s upgraded voice feature that’s way better and even generates background sound effects while narrating !
Take a listen pic.twitter.com/271x7vZ9o3
— Sambhav Gupta (@sambhavgupta6) June 27, 2024

It can also reproduce the sounds of an intercom voice:

Testing ChatGPT Advanced Voice Mode’s ability to create sounds.
It somewhat successfully sounds like an airline pilot on the intercom but, if pushed too far with the noise-making, it triggers refusals. pic.twitter.com/361k9Nwn5Z
— Cristiano Giardina (@CrisGiardina) July 31, 2024

Mimicking and reproducing distinct accents

Giardina showed how ChatGPT Advanced Voice Mode can be used to mimic a vast variety of regional British accents:

ChatGPT Advanced Voice Mode speaking a few different British accents:
– RP standard
– Cockney
– Northern Irish
– Southern Irish
– Welsh
– Scottish
– Scouse
– Geordie
– Brummie
– Yorkshire
(I had to prompt like that because the model tends to revert to a neutral accent) pic.twitter.com/TDfSIY7NRh
— Cristiano Giardina (@CrisGiardina) July 31, 2024

…as well as impersonate a soccer commentator across languages.

Sutin showed how it can attempt to reproduce different U.S. regional accents including Bostonian, Cajun, Minnesotan/Midwestern, and Southern Californian, though to my Midwestern ear the last of these sounded almost more Japanese American.

And it can imitate fictional characters, too…

Finally, Giardina showed that ChatGPT Advanced Voice Mode not only knows and understands the difference between how different fictional characters speak, but can imitate them as well.

The alpha test continues, and OpenAI has said it will roll the mode out to all paying ChatGPT Plus subscribers by the fall.

The real question is: what is this mode good for in a practical sense? Beyond fun and interesting demos and experiments, will it make ChatGPT more useful or appealing to a wider audience? Will it result in more audio-based scams? As the company expands access, we’re sure to find out.
