Teaching AI to tell visually consistent stories




We understand stories differently than we understand moments. A moment can be striking or beautiful on its own – a sunset, a dancer’s leap, a smile. But stories work by building relationships between moments. Each scene has to flow naturally from the ones before it. Characters need to stay consistent. Actions need to have consequences that persist.

This difference between moments and stories points to one of the hardest problems in artificial intelligence. Current AI systems can generate remarkable individual video clips: faces speaking, people dancing, animals moving. But these systems fail when asked to generate anything longer. The character’s face subtly changes between scenes. The movements become jarring and unnatural. The story falls apart.

A lot of us assumed this was simply a matter of scale – that with bigger models and more training data, AI would naturally progress from generating moments to generating stories. But one of the top papers on AImodels.fyi today shows that closing the gap between moments and stories requires fundamental innovations in how AI systems work.




By stp2y
