Today, Meta announced Make-A-Video, an AI-powered video generator that can create novel video content from text or image prompts, similar to existing image synthesis tools like DALL-E and Stable Diffusion. It can also make variations of existing videos, though it’s not yet available for public use.
On Make-A-Video’s announcement page, Meta shows example videos generated from text, including “a young couple walking in heavy rain” and “a teddy bear painting a portrait.” It also showcases Make-A-Video’s ability to take a static source image and animate it. For example, a still photo of a sea turtle, once processed through the AI model, can appear to be swimming.
The key technology behind Make-A-Video—and why it has arrived sooner than some experts anticipated—is that it builds on existing work in text-to-image synthesis, the technique behind image generators like OpenAI's DALL-E. In July, Meta announced its own text-to-image AI model called Make-A-Scene.
Instead of training Make-A-Video on labeled video data (for example, captioned descriptions of the actions depicted), Meta took image synthesis data (still images paired with captions) and applied unlabeled video training data, so the model learns a sense of where a text or image prompt might exist in time and space. The model can then predict what comes after the image and display the scene in motion for a short period.
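To make that two-stage idea concrete, here is a minimal, purely illustrative PyTorch sketch. It is not Meta's actual architecture: the module names, dimensions, and next-frame objective are all assumptions for illustration. Stage 1 learns a text-to-frame mapping from captioned stills; stage 2 learns motion from unlabeled clips by predicting each next frame; at inference, a frame generated from text is rolled forward in time.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; real models operate on full image tensors.
EMBED_DIM = 64    # stand-in text-embedding size (assumption)
IMAGE_DIM = 256   # stand-in flattened-frame size (assumption)

class TextToImage(nn.Module):
    """Stage 1: map a caption embedding to a single still frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 128), nn.ReLU(), nn.Linear(128, IMAGE_DIM)
        )

    def forward(self, text_emb):
        return self.net(text_emb)

class TemporalModel(nn.Module):
    """Stage 2: predict the next frame from the current one.
    Trained on raw video alone, so no captions are required."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(IMAGE_DIM, IMAGE_DIM)

    def forward(self, frame, hidden):
        return self.rnn(frame, hidden)

# --- Stage 1: supervised on (caption embedding, still image) pairs ---
t2i = TextToImage()
opt1 = torch.optim.Adam(t2i.parameters(), lr=1e-3)
captions = torch.randn(32, EMBED_DIM)  # stand-in caption embeddings
stills = torch.randn(32, IMAGE_DIM)    # stand-in target frames
loss1 = nn.functional.mse_loss(t2i(captions), stills)
opt1.zero_grad(); loss1.backward(); opt1.step()

# --- Stage 2: self-supervised next-frame prediction on unlabeled video ---
temporal = TemporalModel()
opt2 = torch.optim.Adam(temporal.parameters(), lr=1e-3)
video = torch.randn(32, 8, IMAGE_DIM)  # batch of 8-frame clips, no labels
hidden = torch.zeros(32, IMAGE_DIM)
loss2 = torch.tensor(0.0)
for t in range(7):
    hidden = temporal(video[:, t], hidden)        # predict frame t+1
    loss2 = loss2 + nn.functional.mse_loss(hidden, video[:, t + 1])
opt2.zero_grad(); loss2.backward(); opt2.step()

# --- Inference: generate a first frame from text, then roll it forward ---
prompt = torch.randn(1, EMBED_DIM)  # stand-in embedding of a text prompt
clip = [t2i(prompt)]
hidden = torch.zeros(1, IMAGE_DIM)
for _ in range(7):
    hidden = temporal(clip[-1].detach(), hidden)
    clip.append(hidden)  # each step extends the "video" by one frame
```

The point of the split is visible in the training data each stage consumes: only stage 1 needs captions, while stage 2 learns temporal dynamics from raw, unlabeled footage, which is far cheaper to collect at scale.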