Hotshot launches new text-to-video AI generator


If you care at all about AI-generated video, you’ve probably already heard of the big names in the rapidly emerging sector: Runway ML with its Gen-3 Alpha Turbo model, OpenAI’s (still non-public) Sora, Luma’s Dream Machine, and Pika’s self-titled AI video generator.

Now you can add yet another name to that list: Hotshot. The startup, founded in 2023 by Aakash Sastry, John Mullan, and Duncan Crawbuck, today announced its new self-titled text-to-video AI generator model as a public “early preview.”

“For the first time in over a decade, it’s possible to build powerful and novel video applications for customers,” wrote Sastry in a post on the social network X. “This model is our foundation for building those experiences and this is only the beginning. We can’t wait to share more soon.”

You can use Hotshot now for free at its website, Hotshot.co, and the videos are free of watermarks, though the free tier is capped at two generations per day.

Hotshot’s origins

Hotshot burst onto the scene last year as a free, consumer-facing AI photo creation and editing app, but that project appears to have been discontinued in favor of the new text-to-video AI model.

Reached by VentureBeat via X Direct Message, Sastry noted that the trio had been building consumer apps for 11 years and is financially “backed by Lachy Groom, Alexis Ohanian, SV Angel, and more!”

How Hotshot was trained in 4 months by a team of just 4 engineers

In a paper describing how the small company built the model, the three co-founders plus newer team member Chaitu Aluru describe Hotshot as “a text-to-video model that generates up to 10 seconds of footage at 720p,” and say it was trained over the course of the last four months.

Previously, Hotshot trained an open-source model, Hotshot-XL, which generates one-second videos at 8 frames per second and has more than 20,000 monthly users.

It also trained a successor model, Hotshot Act-One, to make 3-second video clips, also at 8 frames per second. But the new, self-titled Hotshot model was the most ambitious one yet.

The paper explains that the team used 600 million clips and “thousands of GPUs” requiring “constant babysitting, and it sometimes even feels like they have a mind of their own,” later stating “[Nvidia] H100s fail regularly, particularly when you are pushing the hardware to the max in training a video model.”

“Managing this pipeline was a 24/7 job for one of our team members for an entire month,” the paper notes.

The paper also describes how the team members trained a new autoencoder “to compress videos both spatially and temporally,” allowing the videos to be reduced in size while still maintaining all the data about their contents from which a new AI model could be trained.
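
To give a rough sense of what compressing “both spatially and temporally” means in practice, here is a minimal sketch of the shape arithmetic only. It is not Hotshot’s autoencoder: the 24 frames-per-second rate, the 4x temporal factor, and the 8x spatial factor are all hypothetical values chosen for illustration, and the names `VideoShape`, `latent_shape`, and `compression_ratio` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    height: int
    width: int


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, so a partial final block still counts."""
    return -(-a // b)


def latent_shape(video: VideoShape, temporal_factor: int, spatial_factor: int) -> VideoShape:
    """Shape after downsampling time and each spatial axis by the given factors."""
    return VideoShape(
        frames=ceil_div(video.frames, temporal_factor),
        height=ceil_div(video.height, spatial_factor),
        width=ceil_div(video.width, spatial_factor),
    )


def compression_ratio(original: VideoShape, latent: VideoShape) -> float:
    """Ratio of grid positions before and after compression (channels ignored)."""
    before = original.frames * original.height * original.width
    after = latent.frames * latent.height * latent.width
    return before / after


# Assumption: a 10-second 720p clip at a hypothetical 24 frames per second.
clip = VideoShape(frames=10 * 24, height=720, width=1280)

# Assumption: hypothetical 4x temporal and 8x spatial downsampling factors.
latent = latent_shape(clip, temporal_factor=4, spatial_factor=8)

print(latent)                           # VideoShape(frames=60, height=90, width=160)
print(compression_ratio(clip, latent))  # 256.0
```

Under these assumed factors, the model being trained would see a grid 256 times smaller than the raw frames, which is the general reason a compressing autoencoder makes training on video more tractable.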

What Hotshot excels at

The new Hotshot text-to-video model is also highly adaptable, with potential extensions to longer video durations, higher resolutions, and the inclusion of additional modalities, such as audio.

On X, Sastry showed off examples of different styles Hotshot can produce including animations similar to a comic book or rotoscoped video.

In addition, on X, Sastry posted a thread explaining how he is particularly excited about the broader implications of this technology, predicting that AI-generated content could soon become a staple in digital media.

Within the next 12 months, Sastry anticipates that entire YouTube videos will be generated by AI, with creators having control over every aspect of the generation process, from text to video, and eventually audio.

Ultimately, he believes that Hotshot is currently the most advanced publicly available model of its kind.

VentureBeat tested the model and found mixed results: the prompt “unicorn riding through Paris” produced a fairly convincing video of a horse riding through the City of Light. Still, it showed strong potential. For now, its output is lower in quality, detail and resolution than that of some of the competition. And the more competition there is in AI video generation, the more options and better results users can hope for.



By stp2y
