May 30, 2024


Images and audio become talking video

OpenAI took the first step with Sora, and now Microsoft is following suit: as a research team from the Redmond company explains, it has developed VASA-1, a model for lifelike avatars whose main advantage is that it can compute them in real time.

The input requirements for such a video are modest. A single photo of a person in biometric passport style is sufficient, plus an audio track, which can come from ordinary text-to-speech software. From these, VASA-1 can create “hyper-realistic video,” which is processed offline at a resolution of 512 x 512 pixels and 45 frames per second. An off-the-shelf Nvidia GeForce RTX 4090 is used for this purpose, and the latency is said to be only 170 ms.

Users can set individual parameters within VASA-1. For example, the gaze direction, the angle of the head tilt, or the pitch of the sound can be adjusted. As for the animation itself, the researchers say that previous AI models primarily specialized in lip-synchronized playback, whereas VASA-1 can also realistically animate head movements, facial expressions and other fine details.
