April 30, 2024

Breakthrough for Scalable GenAI Models – The Technology Behind Sora by OpenAI – IT BOLTWISE® x Artificial Intelligence

MUNICH (IT BOLTWISE) – Capable of generating videos and interactive 3D environments on the fly, OpenAI's Sora is an impressive demonstration of the state of the art in generative AI (GenAI) – a true milestone.

OpenAI has made significant progress in GenAI with Sora, which is built on an AI model architecture known as a diffusion transformer. This technology, which also powers Stability AI's latest image generator, Stable Diffusion 3.0, is poised to transform the GenAI field by enabling it to scale beyond previous limits. Saining Xie, a professor of computer science at New York University, began the research project that produced the diffusion transformer in June 2022, together with William Peebles, at the time his student interning at Meta's AI research lab and now co-lead of Sora at OpenAI. Xie combined two concepts from machine learning – diffusion and transformers – to create the diffusion transformer.
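
To make that combination concrete, here is a minimal, hypothetical sketch in PyTorch (not OpenAI's or Meta's actual code; all class and parameter names are illustrative) of the core idea: a transformer, rather than a U-Net, is trained to predict the noise that was added to an image at a given diffusion timestep.

```python
import torch
import torch.nn as nn

class TinyDiffusionTransformer(nn.Module):
    """Toy stand-in for a diffusion transformer: an image is treated as a
    sequence of patch tokens, processed by transformer blocks, and projected
    back to a per-patch noise estimate."""
    def __init__(self, dim=128, num_timesteps=1000):
        super().__init__()
        self.patch_embed = nn.Linear(dim, dim)              # illustrative patch embedding
        self.time_embed = nn.Embedding(num_timesteps, dim)  # timestep conditioning
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.to_noise = nn.Linear(dim, dim)                 # project back to noise space

    def forward(self, noisy_patches, t):
        x = self.patch_embed(noisy_patches) + self.time_embed(t).unsqueeze(1)
        return self.to_noise(self.blocks(x))

# Standard denoising objective: corrupt clean patches with noise and train
# the transformer to recover that noise (mean squared error).
model = TinyDiffusionTransformer()
clean = torch.randn(8, 64, 128)        # a batch of "patchified" images
t = torch.randint(0, 1000, (8,))       # random diffusion timesteps
noise = torch.randn_like(clean)
loss = nn.functional.mse_loss(model(clean + noise, t), noise)
loss.backward()
```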

Modern AI-powered media generators, including OpenAI's DALL-E 3, rely on a process called diffusion to output images, videos, audio, music, 3D meshes and more. Diffusion models typically have a “backbone”, a kind of engine, called a U-Net. U-Nets are complex and can slow the diffusion process down considerably. Fortunately, transformers can replace U-Nets while improving efficiency and performance. The transformer is the architecture of choice for complex reasoning tasks and powers models such as GPT-4, Gemini and ChatGPT. Its defining feature is an “attention mechanism”: for each piece of input data (in the case of diffusion, image noise), it weighs the relevance of every other input and draws on them to generate the output (in this case, an estimate of the image noise).
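
As an illustration of that mechanism, here is a minimal scaled dot-product attention function in PyTorch; the names and shapes are illustrative, not any library's API.

```python
import torch
import torch.nn.functional as F

def attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention over a sequence of input tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                    # queries, keys, values
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # pairwise relevance
    weights = F.softmax(scores, dim=-1)                    # importance of every other input
    return weights @ v                                     # weighted combination -> output

tokens = torch.randn(16, 32)   # e.g. 16 noisy image patches, 32 features each
w_q, w_k, w_v = (torch.randn(32, 32) for _ in range(3))
out = attention(tokens, w_q, w_k, w_v)                     # same shape as the input
```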

For a more detailed look at the early stages of Sora and its innovative approach, please refer to our previous article A Look at Sora: OpenAI's Revolutionary Artificial Intelligence for Realistic Video Production.

Introducing transformers into the diffusion process represents a major leap in scalability and effectiveness, especially for models like Sora, which are trained on large-scale video datasets and use vast numbers of model parameters to demonstrate what transformers can do at scale. Diffusion transformers are intended to be a drop-in replacement for existing diffusion models, regardless of whether those models generate images, video, audio or any other form of media. The current way of training diffusion transformers may introduce some inefficiencies and performance penalties, but this can be addressed in the long term.
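
The “drop-in replacement” point can be sketched as follows: because both a U-Net and a diffusion transformer map a noisy sample and a timestep to a noise estimate, the surrounding training loop does not need to change. This is a hypothetical sketch; the backbone class names are invented for illustration.

```python
import torch
import torch.nn as nn

def training_step(backbone, clean_batch, num_timesteps=1000):
    """One diffusion training step, agnostic to the backbone architecture."""
    t = torch.randint(0, num_timesteps, (clean_batch.shape[0],))
    noise = torch.randn_like(clean_batch)
    predicted = backbone(clean_batch + noise, t)  # U-Net or transformer, same call
    return nn.functional.mse_loss(predicted, noise)

# Swapping backbones is then a one-line change (both classes hypothetical):
#   loss = training_step(UNetBackbone(), batch)
#   loss = training_step(DiffusionTransformerBackbone(), batch)
```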

The main message is simple: forget U-Nets and switch to transformers, because they are faster, perform better and scale further. The longer-term vision is to integrate content understanding and content creation within the framework of diffusion transformers. Today these are like two different worlds – one of understanding and one of creation. That integration requires a standardization of the underlying infrastructure, and transformers are an ideal candidate for the job.

If Sora and Stable Diffusion 3.0 are a preview of what we can expect from diffusion transformers, we are in for an exciting time.

The GenAI revolution: diffusion transformers as the key technology behind OpenAI's Sora (Image: DALL-E, IT BOLTWISE)

Notice: Parts of this text may have been generated using artificial intelligence.


Please send any additions and information to the editorial team via email to de-info[at]it-boltwise.de