0 points · jowday · 11mo ago · 0 comments

Why are you going all in on world models instead of basing everything on top of a 3D engine that could be manipulated / rendered with separate models? If a world model was truly managing to model a manifold of a 3D scene, it should be pretty easy to extract a mesh or SDF from it and drop that into an engine where you could then impose more concrete rules or sanity check the output of the model. Then you could actually model player movement inside of the 3D engine instead of trying to train the world model to accept any kind of player input you might want to do now or in the future.

Additionally, curious about what exactly the difference between the new mode of storytelling you’re describing and something like a crpg or visual novel is - is your hope that you can just bake absolutely everything into the world model instead of having to implement systems for dialogue/camera controls/rendering/everything else that’s difficult about working with a 3D engine?

olivercameron · 11mo ago

Great questions!

> Why are you going all in on world models instead of basing everything on top of a 3D engine that could be manipulated / rendered with separate models?

I absolutely think there's going to be super cool startups that accelerate film and game dev as it is today, inside existing 3D engines. Those workflows could be made much faster with generative models.

That said, our belief is that model-imagined experiences are going to become a totally new form of storytelling, and that these experiences might not be free to be as weird and wacky as they could because of heuristics or limitations in existing 3D engines. This is our focus, and why the model is video-in and video-out.

Plus, you've got the very large challenge of learning a rich, high-quality 3D representation from a very small pool of 3D data. The volume of 3D data is just so small, compared to the volumes generative models really need to begin to shine.

> Additionally, curious about what exactly the difference between the new mode of storytelling you’re describing and something like a crpg or visual novel

To be clear, we don't yet know what shape these new experiences will take. I'm hoping we can avoid an awkward initial phase where these experiences resemble traditional game mechanics too much (although we have much to learn from them), and just fast-forward to enabling totally new experiences that just aren't feasible with existing technologies and budgets. Let's see!

> is your hope that you can just bake absolutely everything into the world model instead of having to implement systems for dialogue/camera controls/rendering/everything else that’s difficult about working with a 3D engine?

Yes, exactly. The model just learns better this way (instead of breaking it down into discrete components) and I think the end experience will be weirder and more wonderful for it.

jowday (OP) · 11mo ago

> Plus, you've got the very large challenge of learning a rich, high-quality 3D representation from a very small pool of 3D data. The volume of 3D data is just so small, compared to the volumes generative models really need to begin to shine.

Isn’t the entire aim of world models (at least, in this particular case) to learn a very high quality 3D representation from 2D video data? My point is that if you manage to train a navigable world model for a particular location, that model has managed to fit a very high quality 3D representation of that location. There’s lots of research on NeRFs that demonstrates how you can extract these 3D scenes as meshes once a model has managed to fit them. (NeRFs are another great example of learning a high quality 3D representation from sparse 2D data.)
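
To make the extraction idea concrete, here is a minimal sketch of its first step: sample a fitted field on a regular grid and find the cells the surface passes through. Real pipelines typically run marching cubes over a grid like this to produce triangles; this sketch only does the sign-change detection. Everything in it is an assumption for illustration: `sphere_sdf` is a hypothetical stand-in for whatever field a trained model exposes, and `surface_cells` is a made-up helper name.

```python
import math


def sphere_sdf(x, y, z, radius=1.0):
    """Hypothetical stand-in for a learned field: signed distance to a sphere."""
    return math.sqrt(x * x + y * y + z * z) - radius


def surface_cells(sdf, lo=-1.5, hi=1.5, n=16):
    """Return (i, j, k) indices of grid cells whose corners straddle sdf == 0."""
    step = (hi - lo) / n
    # Sample the field once at every grid vertex.
    values = {
        (i, j, k): sdf(lo + i * step, lo + j * step, lo + k * step)
        for i in range(n + 1)
        for j in range(n + 1)
        for k in range(n + 1)
    }
    cells = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                corners = [
                    values[(i + di, j + dj, k + dk)]
                    for di in (0, 1)
                    for dj in (0, 1)
                    for dk in (0, 1)
                ]
                # Mixed inside (< 0) and outside (>= 0) corners: the surface crosses this cell.
                if min(corners) < 0.0 <= max(corners):
                    cells.append((i, j, k))
    return cells


cells = surface_cells(sphere_sdf)
print(f"{len(cells)} of {16 ** 3} cells contain the surface")
```

A mesher would then place vertices along the edges of each returned cell. Whether a video-in, video-out world model exposes any field that can be queried like this is exactly the open question here.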

> That said, our belief is that model-imagined experiences are going to become a totally new form of storytelling, and that these experiences might not be free to be as weird and wacky as they could because of heuristics or limitations in existing 3D engines. This is our focus, and why the model is video-in and video-out.

There’s a lot of focus in the material on your site about the models learning physics by training on real-world video. Wouldn’t that imply that you’re trying to converge on a physically accurate world model? I imagine that would make weirdness and wackiness rather difficult.

> To be clear, we don't yet know what shape these new experiences will take. I'm hoping we can avoid an awkward initial phase where these experiences resemble traditional game mechanics too much (although we have much to learn from them), and just fast-forward to enabling totally new experiences that just aren't feasible with existing technologies and budgets. Let's see!

I see! Do you have any ideas about the kinds of experiences that you would want to see or experience personally? For me it’s hard to imagine anything that substantially deviates from navigating and interacting with a 3D engine, especially given it seems like you want your world models to converge to be physically realistic. Maybe you could prompt it to warp to another scene?

godelski · 11mo ago

  > wouldn’t that imply that you’re trying to converge on a physically accurate world model?

I'm not the CEO or associated with them at all, but yes, this is what most of these "world model" researchers are aiming for. As a researcher myself, I do not think this is the way to develop a world model, and I'm fairly certain that it cannot be done through observations alone. I explain more in my response to the CEO[0]. This is a common issue in many areas of ML experimentation, and you simply cannot rely on benchmarks to get you to AGI. Scaling parameters and data only goes so far. If you're seeing slowing advancements, it is likely due to overreliance on benchmarks and underreliance on what the benchmarks are intended to measure. But this is a much longer conversation (I think I made a long comment about it recently; I can dig it up).

[0] https://news.ycombinator.com/item?id=44147777
