OpenAI took the first step with Sora, and now Microsoft is following suit: a research team from the Redmond-based company has developed VASA-1, a model for lifelike talking avatars whose particular advantage is real-time computation.
The input requirements for such a video are modest: a single photo of a person in biometric passport style is sufficient, plus an audio track, which can come from standard text-to-speech software. From this, VASA-1 generates "hyper-realistic" video, rendered offline at a resolution of 512 x 512 pixels and 45 frames per second. A consumer-grade Nvidia GeForce RTX 4090 handles the computation, with a reported latency of only 170 ms.
Users can adjust individual parameters within VASA-1, for example the gaze direction, the angle of the head tilt, or the pitch of the voice. Regarding the animation itself, the researchers note that previous AI models specialized primarily in lip-synchronized playback, whereas VASA-1 can also realistically animate head movements, facial expressions, and other fine details.
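To make the described inputs and controls concrete, here is a minimal, purely hypothetical Python sketch. Microsoft has published neither code nor an API for VASA-1 (as noted below), so every name here, including `AvatarControls`, `generate_talking_head`, and all parameters, is an assumption that merely mirrors the interface the article describes: one portrait photo plus an audio track in, video frames out, with a few adjustable control signals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch only: Microsoft has released no code or API for VASA-1,
# so all names and signatures below are assumptions for illustration.

@dataclass
class AvatarControls:
    """Optional control signals the article says users can adjust."""
    gaze_direction: Tuple[float, float] = (0.0, 0.0)  # horizontal/vertical gaze angles
    head_tilt_deg: float = 0.0                        # head tilt angle in degrees
    voice_pitch_shift: float = 0.0                    # pitch adjustment, e.g. in semitones

def generate_talking_head(
    photo_path: str,                            # single biometric-passport-style portrait
    audio_path: str,                            # speech track, e.g. from text-to-speech software
    controls: Optional[AvatarControls] = None,  # defaults to neutral settings
) -> List[bytes]:
    """Would return 512 x 512 video frames at 45 fps; the article reports
    roughly 170 ms latency on a consumer Nvidia GeForce RTX 4090."""
    raise NotImplementedError("VASA-1 is a research demo; no public release exists")
```

A call would then look like `generate_talking_head("portrait.jpg", "speech.wav", AvatarControls(head_tilt_deg=5.0))`; the stub deliberately raises, since the article stresses that no release is planned.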
However, the videos are not completely flawless, at least at second glance: some head movements twitch unnaturally, and individual mouth movements can reveal misshapen teeth. For now, VASA-1 remains a research project intended purely for demonstration purposes; no product or API release is planned. The research team emphasizes that it is aware of the potential for misuse of such AI models, but also stresses the "huge positive potential of the technology."
Source: Microsoft