Images and audio become talking video

OpenAI made the first move with Sora, and now Microsoft is following suit: a research team at the Redmond-based company has presented VASA-1, a model for lifelike avatars whose main advantage, according to the researchers, is that it can generate video in real time.

The requirements for creating such a video are comparatively modest. A single passport-style photo of a person is enough, plus an audio track, which can even come from conventional text-to-speech software. From these, VASA-1 creates "hyper-realistic video" at a resolution of 512 x 512 pixels and 45 frames per second in offline processing. The hardware is a standard consumer graphics card, the Nvidia GeForce RTX 4090, and in streaming mode the latency is said to be only 170 ms.

Users can adjust individual parameters in VASA-1, such as the gaze direction, the angle of the head tilt, or the pitch of the voice. As for the animation itself, the researchers say that previous AI models focused mainly on lip-synced playback, whereas VASA-1 can also realistically animate head movements, facial expressions, and other fine details.
