OpenAI took the first step with Sora, and now Microsoft is following suit: a research team from the Redmond-based company has developed VASA-1, a model for lifelike talking avatars whose particular advantage is real-time computation.
The input requirements for such a video are modest: a single photo of a person in biometric passport style is sufficient, plus an audio track, which can come from standard text-to-speech software. From this, VASA-1 generates "hyper-realistic" video, rendered offline at a resolution of 512 x 512 pixels and 45 frames per second. A consumer-grade Nvidia GeForce RTX 4090 handles the computation, with a reported latency of only 170 ms.
Users can adjust individual parameters within VASA-1, for example the gaze direction, the angle of the head tilt, or the pitch of the voice. Regarding the animation itself, the researchers note that previous AI models specialized primarily in lip-synchronized playback, whereas VASA-1 can also realistically animate head movements, facial expressions, and other fine details.
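To make the described inputs and controls concrete, here is a minimal, purely hypothetical Python sketch. Microsoft has published neither code nor an API for VASA-1 (as noted below), so every name here, including `AvatarControls`, `generate_talking_head`, and all parameters, is an assumption that merely mirrors the interface the article describes: one portrait photo plus an audio track in, video frames out, with a few adjustable control signals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch only: Microsoft has released no code or API for VASA-1,
# so all names and signatures below are assumptions for illustration.

@dataclass
class AvatarControls:
    """Optional control signals the article says users can adjust."""
    gaze_direction: Tuple[float, float] = (0.0, 0.0)  # horizontal/vertical gaze angles
    head_tilt_deg: float = 0.0                        # head tilt angle in degrees
    voice_pitch_shift: float = 0.0                    # pitch adjustment, e.g. in semitones

def generate_talking_head(
    photo_path: str,                            # single biometric-passport-style portrait
    audio_path: str,                            # speech track, e.g. from text-to-speech software
    controls: Optional[AvatarControls] = None,  # defaults to neutral settings
) -> List[bytes]:
    """Would return 512 x 512 video frames at 45 fps; the article reports
    roughly 170 ms latency on a consumer Nvidia GeForce RTX 4090."""
    raise NotImplementedError("VASA-1 is a research demo; no public release exists")
```

A call would then look like `generate_talking_head("portrait.jpg", "speech.wav", AvatarControls(head_tilt_deg=5.0))`; the stub deliberately raises, since the article stresses that no release is planned.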
However, the videos are not completely flawless, at least at second glance: some head movements twitch unnaturally, and individual mouth movements can reveal misshapen teeth. For now, VASA-1 remains a research project intended purely for demonstration purposes; no product or API release is planned. The research team emphasizes that it is aware of the potential for misuse of such AI models, but also stresses the "huge positive potential of the technology."
Source: Microsoft