OpenAI's Video Generator Sora Is Breathtaking, Yet Terrifying
Last edited Thu Feb 15, 2024, 10:00 PM - Edit history (1)
Source: Gizmodo
OpenAI introduced Sora, its premier text-to-video generator, on Thursday with beautiful, shockingly realistic videos showcasing the AI model's capabilities. Sora is now available to a small number of researchers and creatives who will test the model before a broader public release, which could spell disaster for the film industry and our collective deepfake problem.
"Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," said OpenAI in a blog post. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."
-snip-
Sora is OpenAI's first venture into AI video generation, joining the company's AI-powered text and image generators, ChatGPT and DALL-E. It's unique because it's less a creative tool and more a "data-driven physics engine," as Senior Nvidia Researcher Dr. Jim Fan pointed out. Sora doesn't just generate an image; it determines the physics of objects in their environment and renders a video based on those calculations.
-snip-
The videos produced by Sora are undeniably incredible. They would have taken a real film crew or animators hours to produce. Sora will likely disrupt the film industry in the same way that ChatGPT and AI image generators have shocked the editorial and design worlds. It's a technology that is both remarkable and frightening in terms of job security for video creators.
-snip-
Read more: https://gizmodo.com/openai-video-generator-sora-is-breathtaking-terrifying-1851261593
Videos at the link.
But there are dozens more videos at OpenAI's page about Sora - https://openai.com/sora - which also says this:
In addition to us developing new techniques to prepare for deployment, we're leveraging the existing safety methods that we built for our products that use DALL·E 3, which are applicable to Sora as well.
For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We've also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it's shown to the user.
We'll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That's why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.
Editing to mention that the last of the several video slide shows on that page, the one above the Research techniques section, includes this as the 5th of its 10 videos:
Link to tweet
Full prompt, which won't show up if you just view the tweet on DU instead of clicking on Show more:
EDITING again because what I've been reading this evening suggests more and more that OpenAI's training dataset for Sora was probably illegal. See reply 3 below.
highplainsdem
(62,137 posts)

OpenAI, the company behind the ChatGPT chatbot and the still-image generator DALL-E, is among the many companies racing to improve this kind of instant video generator, including start-ups like Runway and tech giants like Google and Meta, the owner of Facebook and Instagram. The technology could speed the work of seasoned moviemakers, while replacing less experienced digital artists entirely.

It could also become a quick and inexpensive way of creating online disinformation, making it even harder to tell what's real on the internet.

"I am absolutely terrified that this kind of thing will sway a narrowly contested election," said Oren Etzioni, a professor at the University of Washington who specializes in artificial intelligence. He is also the founder of True Media, a nonprofit working to identify disinformation online in political campaigns.
highplainsdem
(62,137 posts)

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models like OpenAI's GPT-4 and Google DeepMind's Gemini. But videos are not made of words. Instead, the researchers had to find a way to cut videos into chunks that could be treated as if they were. The approach they came up with was to dice videos up across both space and time. "It's like if you were to have a stack of all the video frames and you cut little cubes from it," says Brooks.

The transformer inside Sora can then process these chunks of video data in much the same way that the transformer inside a large language model processes words in a block of text. The researchers say that this let them train Sora on many more types of video than other text-to-video models, including different resolutions, durations, aspect ratios, and orientations. "It really helps the model," says Brooks. "That is something that we're not aware of any existing work on."

"From a technical perspective it seems like a very significant leap forward," says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. "But there are two sides to the coin," he says. "The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse."
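For anyone curious what "dicing videos up across both space and time" looks like concretely, here is a minimal sketch. OpenAI has not released Sora's code, so the patch sizes, array layout, and function name below are illustrative assumptions, not the actual implementation; the point is just how a video becomes a sequence of flattened "spacetime patch" tokens a transformer can process.

```python
import numpy as np

def video_to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """Cut a video array of shape (frames, height, width, channels)
    into spacetime "cubes" of shape (pt, ph, pw, channels), then
    flatten each cube into a 1-D token vector.

    Patch sizes here are arbitrary illustrative choices, not Sora's.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    # Split each axis into (number of patches, patch size).
    patches = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the three patch-grid axes to the front, patch contents last.
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: one row per spacetime patch, like one token per word.
    tokens = patches.reshape(-1, pt * ph * pw * C)
    return tokens  # shape: (num_patches, patch_dim)

# A 16-frame 64x64 RGB clip yields (16/4) * (64/16) * (64/16) = 64 tokens.
clip = np.zeros((16, 64, 64, 3), dtype=np.float32)
tokens = video_to_spacetime_patches(clip)
print(tokens.shape)  # (64, 3072)
```

Because the patching works for any frame count and resolution that divide evenly, the same token pipeline can ingest clips of different durations and aspect ratios, which is the training flexibility Brooks describes above.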
highplainsdem
(62,137 posts)Yavin4
(37,182 posts)

At this point, but it will get cheaper over time.
highplainsdem
(62,137 posts)limbicnuminousity
(1,416 posts)

Could see some mental health applications combining AI with virtual reality.