OpenAI's Video Generator Sora Is Breathtaking, Yet Terrifying
Last edited Thu Feb 15, 2024, 10:00 PM - Edit history (1)
Source: Gizmodo
OpenAI introduced Sora, its premier text-to-video generator, on Thursday with beautiful, shockingly realistic videos showcasing the AI model's capabilities. Sora is now available to a small number of researchers and creatives who will test the model before a broader public release, which could spell disaster for the film industry and our collective deepfake problem.
"Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," said OpenAI in a blog post. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."
-snip-
Sora is OpenAI's first venture into AI video generation, joining the company's AI-powered text and image generators, ChatGPT and DALL-E. It's unique because it's less a creative tool and more a "data-driven physics engine," as Senior Nvidia Researcher Dr. Jim Fan pointed out. Sora doesn't just generate an image; it determines the physics of objects in their environment and renders a video based on those calculations.
-snip-
The videos produced by Sora are undeniably incredible. They would have taken a real film crew or animators hours to produce. Sora will likely disrupt the film industry in the same way that ChatGPT and AI image generators have shocked the editorial and design worlds. It's a technology that is both remarkable and frightening in terms of job security for video creators.
-snip-
Read more: https://gizmodo.com/openai-video-generator-sora-is-breathtaking-terrifying-1851261593
Videos at the link.
But there are dozens more videos at OpenAI's page about Sora - https://openai.com/sora - which also says this:
In addition to us developing new techniques to prepare for deployment, we're leveraging the existing safety methods that we built for our products that use DALL·E 3, which are applicable to Sora as well.
For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We've also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it's shown to the user.
We'll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That's why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.
Editing to mention that the last of the several video slide shows on that page, the one above the Research techniques section, includes this as the 5th of its 10 videos:
Link to tweet
Full prompt, which won't show up if you just view the tweet on DU instead of clicking on Show more:
EDITING again because what I've been reading this evening suggests more and more that OpenAI's training dataset for Sora was probably illegal. See reply 3 below.
highplainsdem
(62,137 posts)

OpenAI, the company behind the ChatGPT chatbot and the still-image generator DALL-E, is among the many companies racing to improve this kind of instant video generator, including start-ups like Runway and tech giants like Google and Meta, the owner of Facebook and Instagram. The technology could speed the work of seasoned moviemakers, while replacing less experienced digital artists entirely.

It could also become a quick and inexpensive way of creating online disinformation, making it even harder to tell what's real on the internet.

"I am absolutely terrified that this kind of thing will sway a narrowly contested election," said Oren Etzioni, a professor at the University of Washington who specializes in artificial intelligence. He is also the founder of True Media, a nonprofit working to identify disinformation online in political campaigns.
highplainsdem
(62,137 posts)

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models like OpenAI's GPT-4 and Google DeepMind's Gemini. But videos are not made of words. Instead, the researchers had to find a way to cut videos into chunks that could be treated as if they were. The approach they came up with was to dice videos up across both space and time. "It's like if you were to have a stack of all the video frames and you cut little cubes from it," says Brooks.

The transformer inside Sora can then process these chunks of video data in much the same way that the transformer inside a large language model processes words in a block of text. The researchers say that this let them train Sora on many more types of video than other text-to-video models, including different resolutions, durations, aspect ratios, and orientations. "It really helps the model," says Brooks. "That is something that we're not aware of any existing work on."

"From a technical perspective it seems like a very significant leap forward," says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. "But there are two sides to the coin," he says. "The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse."
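For anyone curious what "dicing videos up across both space and time" looks like concretely, here is a minimal sketch. OpenAI has not released Sora's code, so the patch sizes, array layout, and function name below are illustrative assumptions, not the actual implementation; the point is just how a video becomes a sequence of flattened "spacetime patch" tokens a transformer can process.

```python
import numpy as np

def video_to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """Cut a video array of shape (frames, height, width, channels)
    into spacetime "cubes" of shape (pt, ph, pw, channels), then
    flatten each cube into a 1-D token vector.

    Patch sizes here are arbitrary illustrative choices, not Sora's.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    # Split each axis into (number of patches, patch size).
    patches = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the three patch-grid axes to the front, patch contents last.
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: one row per spacetime patch, like one token per word.
    tokens = patches.reshape(-1, pt * ph * pw * C)
    return tokens  # shape: (num_patches, patch_dim)

# A 16-frame 64x64 RGB clip yields (16/4) * (64/16) * (64/16) = 64 tokens.
clip = np.zeros((16, 64, 64, 3), dtype=np.float32)
tokens = video_to_spacetime_patches(clip)
print(tokens.shape)  # (64, 3072)
```

Because the patching works for any frame count and resolution that divide evenly, the same token pipeline can ingest clips of different durations and aspect ratios, which is the training flexibility Brooks describes above.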
highplainsdem
(62,137 posts)Yavin4
(37,182 posts)

At this point, but it will get cheaper over time.
highplainsdem
(62,137 posts)limbicnuminousity
(1,416 posts)

Could see some mental health applications combining AI with virtual reality.