https://static.simonwillison.net/static/2024/pelicans-on-bic...
Of the four, two were a pelican riding a bicycle. One was a pelican just running along the road, one was a pelican perched on a stationary bicycle, and one had the pelican wearing a weird sort of pelican bicycle helmet.
All four were better than what I got from Sora: https://simonwillison.net/2024/Dec/9/sora/
My company (Nim) is hosting the Hunyuan model, so here's a quick test (first attempt) at "pelican riding a bicycle" via Hunyuan on Nim: https://nim.video/explore/OGs4EM3MIpW8
I think it's as good as, if not better than, Sora / Veo
What does it produce for “A pelican riding a bicycle along a coastal path overlooking a harbor”?
Or, what do Sora and Veo produce for your verbose prompt?
The pelican is doing some weird flying motion, motion blur hides a lack of detail, the bicycle is moving fast so the background is blurred, etc. I would even say Sora is better, because I like the slow motion and detail, but it did do something very non-physical.
Veo is clearly the best in this example. It has high detail but also feels the most physically grounded among the examples.
If you'd like to replicate this, the sign-up process was very easy and I was able to run a single generation attempt. Maybe later, when I want to generate video for real, I'll use prompt enhancement. Without it, the video appears to lose any notion of direction. Most image-generation models I'm aware of do prompt enhancement; I've seen it on Grok+Flow/Aurora and ChatGPT+DALL-E.
Prompt: A pelican riding a bicycle along a coastal path overlooking a harbor
Seed: 15185546
Resolution: 720×480

Turning content blockers off does not make a difference.
oh, shit!
> Prompt: The sun rises slowly behind a perfectly plated breakfast scene. Thick, golden maple syrup pours in slow motion over a stack of fluffy pancakes, each one releasing a soft, warm steam cloud. A close-up of crispy bacon sizzles, sending tiny embers of golden grease into the air. [...]
In the video, the bacon is unceremoniously slapped onto the pancakes, while the prompt sounds like it was intended to be a separate shot, with the bacon still in the pan? Or, alternatively, everything described in the prompt should have been on the table at the same time?
So, yet again: AI produces impressive results, but it rarely does exactly what you wanted it to do...
But I'm also seeing some genuinely creative uses of generative video - stuff I could argue has some genuine creative validity. I am loath to dismiss an entire technique because it is mostly used to create garbage.
We'll have to figure out how to solve the slop problem - it was already an issue before AI, so maybe this is just hastening the inevitable.
>at resolutions up to 4K, and extended to minutes in length.
https://blog.google/technology/google-labs/video-image-gener...
Anyways, I strongly suspect that the funny meme content that seems to be the practical use case of these video generators won't be possible on either Veo or Sora, because of copyright, PC concerns, famous people, or other 'safety'-related reasons.
I was so excited to see Sora out - only to see it has most of the same problems. And Kling seems to do better in a lot of benchmarks.
I can't quite make sense of it: what OpenAI were showing when they first launched Sora was so amazing. Was it cherry-picked? Or was it using loads more compute than what they've released?
It does pop up. Look at where his hand is relative to the jar when he grabs it vs. when he stops lifting it. The hand and the jar are both moving, but the jar never physically attaches to the grab.
This feels like a bit of a comeback as Veo 2 (subjectively) appears to be a step up from what Sora is currently able to achieve.
Some of the videos look incredibly believable though.
These videos will be, and maybe already are, too realistic.
Our society is not prepared for this kind of reality-"bending" media. These hyperrealistic videos will be the reason for hate and murder. Evil actors will use them to influence elections on a global scale, create cults around virtual characters, and deny the rules of physics and human reason. And yet there is no way for a person to instantly detect that they are watching a generated video. Maybe there is now, but in a year it will be indistinguishable from a real recording.
I'm thinking of simple cryptographic signing of a file, rather than embedding watermarks into the content, but that's another option.
I don't think it will solve the fake video onslaught, but it could help.
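For what it's worth, the signing itself is the easy part. Here's a minimal sketch using Python's cryptography package, with a hypothetical filename; the genuinely hard part (proving the public key really belongs to a trusted camera or publisher) isn't solved here:

    # Sign a video file with Ed25519 and verify it later.
    # Assumes `pip install cryptography`; "clip.mp4" is hypothetical.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    with open("clip.mp4", "rb") as f:
        data = f.read()

    signature = private_key.sign(data)  # 64-byte detached signature

    try:
        public_key.verify(signature, data)  # raises if the file was altered
        print("signature valid")
    except InvalidSignature:
        print("file modified or signed by a different key")

Distributing the signature alongside the file, and building a PKI that ties keys to real manufacturers, is where provenance schemes like C2PA spend most of their effort.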
Cute hack showing that it's kinda useless unless the user-facing UX does a better job of actually verifying whether the certificate represents the manufacturer of the sensor (the dude just uses a self-signed cert with "Leica Camera AG" as the name). Clearly cryptography literacy is lagging behind... https://hackaday.com/2023/11/30/falsified-photos-fooling-ado...
The only thing this changes is not needing to pay human beings for work.
I feel this kind of hypervigilance will be mentally exhausting, and not being able to trust your primary senses will have untold psychological effects
And no big tech company would run the ads you're suggesting, because they only make money when people use the systems that deliver the untrustworthy content.
I think we will need the same healthy media diet.
In theory that should matter to something like Open(Closed)Ai. But who knows.
This quote suggests not: "maintaining complete consistency throughout complex scenes or those with complex motion, remains a challenge."
> a thief threatens a man with a gun, demanding his money, then fires the gun (etc add details)
> the thief runs away, while his victim slowly collapses on the sidewalk (etc same details)
Would you get the same characters, wearing identical clothing, with the same lighting and identical background details? You need all these elements to be the same; that's what filmmakers call "continuity". I doubt that Veo or any of the generators would actually produce continuity.
Not much. Low quality over-saturated advertising? Short films made by untalented lazy filmmakers?
When text prompts are the only source, creativity is absent. No craft, no art. Audiences won't gravitate towards fake crap that oozes out of AI vending machines, unrefined, artistically uncontrolled.
Imagine visiting a restaurant because you heard the chef is good. You enjoy your meal but later discover the chef has a "food generator" where he prompts the food into existence. Would you go back to that restaurant?
There's one exception. Video-to-video and image-to-video, where your own original artwork, photos, drawings and videos are the source of the generated output. Even then, it's like outsourcing production to an unpredictable third party. Good luck getting lighting and details exactly right.
I see the role of this AI gen stuff as background filler, such as populating set details or distant environments via green screen.
That's an obvious yes from me. I liked it, and not only that, but I can reasonably assume it will be consistently good in the future, something lots of places can't do.
> Veo sample duration is 8s, VideoGen’s sample duration is 10s, and other models' durations are 5s. We show the full video duration to raters.
Could the positive result for Veo 2 mean the raters like longer videos? Why not trim Veo 2's output to 5s for a better controlled test?
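A controlled trim would be trivial; a sketch calling ffmpeg from Python (filenames are hypothetical):

    # Trim a Veo 2 clip to its first 5 seconds so all models are
    # compared at the same duration. Re-encodes rather than using
    # "-c copy", which can only cut cleanly on keyframes.
    import subprocess

    subprocess.run(
        ["ffmpeg", "-ss", "0", "-i", "veo2_sample.mp4",
         "-t", "5", "veo2_sample_5s.mp4"],
        check=True,
    )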
I'm not surprised Google hasn't opened this to the public yet; there's still a huge amount of volunteer red-teaming to be done by the public on other services like hailuoai.video.
P.S. The skate tricks in the final video are delightfully insane.
Closed models aren't going to matter in the long run. Hunyuan and LTX both run on consumer hardware and produce videos similar in quality to Sora Turbo, yet you can train them and prompt them on anything. They fit into the open source ecosystem which makes building plugins and controls super easy.
Video is going to play out in a way that resembles images. Stable Diffusion- and Flux-like players will win. There might be room for one or two Midjourney-type players, but by and large the most activity happens in the open ecosystem.
Are there other versions than the official?
> An NVIDIA GPU with CUDA support is required.
> Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
https://github.com/Tencent/HunyuanVideo
> I am getting CUDA out of memory on an Nvidia L4 with 24 GB of VRAM, even after using the bfloat16 optimization.
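If it helps anyone, the community diffusers port exposes the usual memory levers (CPU offload, VAE tiling). A rough sketch; the repo id, resolution, and whether this actually fits in 24 GB are assumptions on my part, not tested claims:

    # Rough sketch: shrink HunyuanVideo's footprint via the diffusers
    # port. Assumes `pip install diffusers transformers accelerate`;
    # the repo id and settings below are assumptions, not tested claims.
    import torch
    from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
    from diffusers.utils import export_to_video

    model_id = "hunyuanvideo-community/HunyuanVideo"  # assumed community port
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id, transformer=transformer, torch_dtype=torch.float16
    )
    pipe.vae.enable_tiling()         # decode latents in tiles, not all at once
    pipe.enable_model_cpu_offload()  # keep only the active module on the GPU

    frames = pipe(
        prompt="a pelican riding a bicycle",
        height=320, width=512,       # small output to stay within VRAM
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(frames, "pelican.mp4", fps=15)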
With the YouTube corpus at their disposal, I don't see how anyone can beat Google for AI video generation.
Google is like a tanker that's still steering to align with the course people expect of it; what they don't recognize is that it will soon be on that course, capable of rolling over everything that comes in its way.
If OpenAI claims they're close to having AGI, Google most likely already has it and is doing its shenanigans with the US government under the radar, while Microsoft plays the cool guy and Amazon is still trying to get its act together.
That, or they have a secret super human intelligence under wraps at the pentagon.
OpenAI might be well-capitalized, but they're (1) bleeding money, (2) lacking a clear path to profitability, and (3) competing head-to-head with a behemoth that can profitably provide a similar offering at literally 10-20x cheaper.
Google might be slow out of the blocks, but it's not like they've been sitting on their hands for the past decade.
https://arstechnica.com/information-technology/2023/03/yes-v...
We're not even done with 2024.
Just imagine what's waiting for us in 2025.
SD Cards?
Because there are literally thousands of avenues to explore and we've only just begun with the lowest of low hanging fruit.
To really do well on this task, the model basically has to understand physics, and human anatomy, and all sorts of cultural things. So you're forcing the model to learn all these things about the world, but it's relatively easy to train because you can just collect a lot of videos and show the model parts of them -- you know what the next frame is, but the model doesn't.
Along the way, this also creates a video generation model - but you can think of this as more of a nice side effect rather than the ultimate goal.
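The training objective really is that simple in outline. A toy sketch in PyTorch; everything here is illustrative (real systems predict in a compressed latent space with diffusion objectives, not raw-pixel MSE):

    # Toy next-frame-prediction loop: show the model frame t, grade it
    # against frame t+1. The tiny conv net and random "clips" are
    # stand-ins for a real architecture and a real video corpus.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for step in range(100):
        clip = torch.rand(8, 2, 3, 64, 64)        # (batch, time, C, H, W)
        current, target = clip[:, 0], clip[:, 1]  # we know the next frame...
        pred = model(current)                     # ...the model has to guess it
        loss = nn.functional.mse_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()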
All these models have just “seen” enough videos of all those things to build a probability distribution to predict the next step.
This is not bad, nor does it make the model inherently dumb; a major component of human intelligence is built on similar strategies. I couldn't tell you which grammatical rules are broken in a text, or which physical rules in a photograph, but I can tell something is wrong using the same methods.
Inference can take it far with large enough data sets, but sooner or later, without reasoning, you will hit a ceiling.
This is true for humans as well: plenty of people go far in life with just memorization and replication, and do a lot of jobs fairly competently, but not in everything.
Reasoning is essential for higher-order functions, and transformers are not the path to that.
Think 5-10 years into the future; this is a stepping stone
but more than anything it's useful as a stepping stone to more full-featured video generation that can maintain characters and story across multiple scenes. it seems clear that at some point tools like this will be able to generate full videos, not just shots.
Now, it may not be the best fit for those yet due to its limitations, but you've gotta walk before you can run: compare Stable Diffusion 1.x to FLUX.1 with ControlNet to see where quality and controllability could head in the future.
https://www.reddit.com/r/aivideo/comments/1hbnyi2/comment/m1...
Another more serious music video also made entirely by one person. https://www.youtube.com/watch?v=pdqcnRGzH5c Don't know how long it took though.
my templates are all waiting for looping stock videos to be added in the background
you have no idea how cool I am with the lack of copyright protections afforded to these videos I will generate, I'm making my money other ways
- gold everywhere is excessive: more Rococo (1730s-1760s) than Renaissance (roughly 1300-1600), which was a lot more restrained
- mirror way too big and clear. Renaissance mirrors were small polished metal or darker imperfect glass
- candelabras too ornate and numerous for the Renaissance. Multi-tier candleholders are more Baroque (1600-1750), and the candles look suspiciously perfect, as opposed to period-appropriate uneven tallow or beeswax
- white paper too pristine (parchment or vellum would be expected), pen holders hilariously modern, gold-plated(??) desk surface is absurd
- woman's clothing looks too recent (Victorian?); sleeves and hair are theatrical
- hard to tell, but background dudes are lurking in what look like theatrical costumes rather than anything historically accurate
The prompt for the figure running through glowing threads seems to contain a lot of detail that doesn't show up in the video.
In the first example (the close-up of the DJ), the last line about her captivating presence and the power of music is, I guess, meant to give the video a "vibe" (as opposed to prescriptively describing it). I wonder how the result changes if you leave it out?
Cynically I think that it's a leading statement there for the reader rather than the model. Like now that you mention it, her presence _is_ captivating! Wow!
From now on, examples of image or video generation models showing off how great they are should be stickman drawings or stickman videos. As far as I know, no model has been able to do that properly yet. If a model can do it well, it will be a huge breakthrough.
Another point to consider is that if my generative video system isn't good at maintaining world consistency, then doing a slow-motion video gives the illusion of a long video while being able to maintain a smaller "world context".
I tried that, though, and it has the same issue they all seem to have for me: if you are familiar with the face but the person is not really famous, the features in the video are never close enough to recognize them as the same person.
50 cents per video. Far more when accounting for the cherry-pick rate.
...and that's when I realized how much cherry-picking there is in these "demos". These demos are about deceiving you into thinking the model is much better than it actually is.
This rewards not making the models available: people end up comparing their own extrapolation from demo clips against other models' actual outputs, which can trick them into thinking Google is winning the game.
Google won.
Of course, it's orders of magnitude cheaper than making a video or an animation yourself.
Namely, that it takes so few neurons to get a picture into our heads.
I guess end-of-the-world scenarios may lead us to create that superintelligence with a gigantic, ultra-performant artificial "brain".
Humanity has its ways of objecting to accelerationism.
Actually, typically human objection only slows it down and often it becomes a fringe movement, while the masses continue to consume the lowest common denominator. Take the revival of the flip phone, typewriter, etc. Sadly, technology marches on and life gets worse.
TikTok is one of the easiest platforms to create for, and look at how much human attention it has sucked up.
The attention/dopamine magnet is accelerating its transformation into a gravitational singularity for human minds.
"The Human Security System is structured by delusion. What's being protected there is not some real thing that is mankind, it's the structure of illusory identity. Just as at the more micro level it's not that humans as an organism are being threatened by robots, it's rather that your self-comprehension as an organism becomes something that can't be maintained beyond a certain threshold of ambient networked intelligence." [0]
See also my research project on the core thesis of Accelerationism: that capitalism is AI. [1]
[0] https://syntheticzero.net/2017/06/19/the-only-thing-i-would-...
Thanks for sharing that video and post!
One way to think about this stuff is to imagine that you are 14 and starting to create videos, art, music, etc in order to build a platform online. Maybe you dream of having 7 channels at the same time for your sundry hobbies and building audiences.
For that 14 year old, these tools are available everywhere by default and are a step function above what the prior generation had. If you imagine these tools improving even faster in usability and capability than prior generations' tools did …
If you are of a certain age you'll remember how we were harangued endlessly about "remix culture" and how mp3s were enabling us to steal creativity without making an effort at being creative ourselves, about how photobashing in Photoshop (pirated cracked version anyway) was not real art, etc.
And yet, halfway through the linked video, the speaker, who has misgivings, was laughing out loud at the inventiveness of the generated replies and I was reminded that someone once said that one true IQ test is the ability to make other humans laugh.
Inventive is one way of putting it, but I think he was laughing at how bizarre or out-of-character the responses would be if he used them. Like the AI suggesting that he post "it is indeed a beverage that would make you have a hard time finding a toilet bowl that can hold all of that liquid" as if those were his own words.
If this is "just another tool" then my question is: does the output of someone who has used this tool for one thousand hours display a meaningful difference in quality to someone who just picked it up?
I have not seen any evidence that it does.
Another idea: what the pro-generative-AI crowd doesn't seem to understand is that good art is not about _execution_, it's about _making deliberate choices_. While a master painter or guitarist may indeed pull off incredible technical feats, their execution is not the art in and of itself; it widens the range of choices they can make. The more generative AI steps into the role of making these choices, the more useless, ironically, it becomes.
And lastly: I've never met anyone who has spent significant time creating art react to generative AI as anything more than a toy.
Maybe it's just me who couldn't find it (the website barely works at all on FF iOS)...
> VideoFX isn't available in your country yet.
Think about it: almost everyone I know rarely clicks on ads or buys from ads anymore. On the other hand, a lot of people, including myself, look into buying something advertised implicitly or explicitly by content creators we follow. Say, a router recommended by LinusTechTips. A lot of brands have started moving their ad spending to influencers too.
Google doesn't have a lot of control over these influencers. But if they can get good video generation models, they can control this ad space too, without a human in the loop.
1) AI is a massive wave right now and everyone's afraid that they're going to miss it, and that it will change the world. They're not obviously wrong!
2) AI is showing real results in some places. Maybe a lot of us are numb to what gen AI can do by now, but the fact that it can generate the videos in this post is actually astounding! 10 years ago it would have been borderline unbelievable. Of course they want to keep investing in that.
This is a typical tech echo chamber. There is a significant number of people who make direct purchases through ads.
> But if they can get good video generation models, they can control this ad space too, without a human in the loop.
This looks like it's based on a misguided assumption. Format might have a significant impact on reach, but the deciding factor is trust in the reviewer. The video format itself does not guarantee a decent CTR/CVR. It's true that ad companies find this space lucrative, but they're smart enough to acknowledge this complexity.
Even if it's not, TV ads, newspaper ads, magazine ads, billboards, etc. get exactly 0 clickthroughs, and yet people still bought (and continue to buy) them. Why do we act like impressions are hunky-dory for every other medium, but worthless for web ads?
I remember saying this to a google VP fifteen years ago. Somehow people are still clicking on ads today.
Most people have claimed not to be influenced by ads since long before networked computers were a major medium for delivering them.