HybridNeRF: Efficient Neural Rendering (opens in new tab)

(haithemturki.com)

165 pointstzmlab1y ago49 comments

49 comments

One of the paper authors here - happy to answer any questions about the work or chat about neural rendering in general!

Thanks for your work!

From my experience, NERF works great, but depends on highly accurate camera location information. Unless the VR device has this baked in, one must run a Colmap-style or SFM-style process to generate those camera extrinsics. Is there anything special HybridNeRF does around this?

modeless1y ago

The method in this paper relies on precomputed camera poses as input, but there have been tons of papers published on the topic of eliminating this requirement. Here are a few: https://dust3r.europe.naverlabs.com/ https://arxiv.org/abs/2102.07064 https://arxiv.org/abs/2312.08760v1 https://x.com/_akhaliq/status/1734803566802407901

1 more reply

refibrillator1y ago

Congrats on the paper! Any chance the code will be released?

Also I’d be curious to hear, what are you excited about in terms of future research ideas?

Personally I’m excited by the trend of eliminating the need for traditional SfM preprocessing (sparse point clouds via colmap, camera pose estimation, etc).

turkihaithem1y ago

Thank you! The code is unlikely to be released (it's built upon Meta-internal codebases that I no longer have access to post-internship), at least not in the form that we specifically used at submission time. The last time I caught up with the team someone was expressing interest in releasing some broadly useful rendering code, but I really can't speak on their behalf so no guarantees.

IMHO it's a really exciting time to be in the neural rendering / 3D vision space - the field is moving quickly and there's interesting work across all dimensions. My personal interests lean towards large-scale 3D reconstruction, and to that effect eliminating the need for traditional SfM/COLMAP preprocessing would be great. There's a lot of relevant recent work (https://dust3r.europe.naverlabs.com/, https://cameronosmith.github.io/flowmap/, https://vggsfm.github.io/, etc), but scaling these methods beyond several dozen images remains a challenge. I’m also really excited about using learned priors that can improve NeRF quality in underobserved regions (https://reconfusion.github.io). IMO using these priors will be super important to enabling dynamic 4D reconstruction (since it’s otherwise unfeasible to directly observe every space-time point in a scene). Finally, making NeRF environments more interactive (as other posts have described) would unlock many use cases especially in simulation (ie: for autonomous driving). This is kind of tricky for implicit representations (like the original NeRF and this work), but there have been some really cool papers in the 3D Gaussian space (https://xpandora.github.io/PhysGaussian/) that are exciting.

1 more reply

choppaface1y ago

Re: the 4FPS example, if one renders a VR180 view as an equirectangular image and lets the headset handle the 6DOF head movement, then 4FPS is plenty. Especially if there’s one render per-eye. 110% if there are no objects within a meter of the ego camera.

So your motivating problem does not exist.

More FPS is better, and yes we all do want to find a hybrid of NeRF and splats that works well, but then you should emphasize your theoretical and experimental contributions. Flatly claiming 4FPS doesn’t work is specious to most readers. Even Deva knows this is being too aggressive for a paper like this.

dr_dshiv1y ago

Wow, it looks beautiful.

Can regular phones capture the data required? How to get into this, as a hobbyist? I’m interested in the possibilities of scanning coral reefs and other ecological settings.

turkihaithem1y ago

One of the datasets we evaluated against in our paper uses a bespoke capture rig (https://github.com/facebookresearch/EyefulTower?tab=readme-o...) but you can definitely train very respectable NeRFs using a phone camera. In my experience it's less about camera resolution and more about getting a good capture - many NeRF methods assume that the scene is static, so minimizing things like lighting changes and transient shadows can make a big difference. If you're interested in getting your feet wet, I highly recommend Nerfstudio (https://docs.nerf.studio)!

ttul1y ago

Does anyone else look forward to a game that lets you transform your house or neighbor into a playable level with destructible objects? How far are we from recognizing the “car” and making it drivable, or the “tree” and making it choppable?

totalview1y ago

I work in the rendering and gaming industry and also run a 3D scanning company. I have similarly wished for this capability, especially the destructability part. What you speak of is still pretty far off for several reasons:

-No Collision/poor collision on NERFs and GS: to have a proper interactive world, you usually need accurate character collision so that your character or vehicle can move along the floor/ground (as opposed to falling thru it) run into walls, go through door frames, etc. NERFs suffer from the same issues as photogrammetry in that they need “structure from motion” (COLMAP or similar) to give them a mesh or 3-D output that can be meshed for collision to register off of. The mesh from reality capture is noisy, and is not simple geometry. Think millions of triangles from a laser scanner or camera for “flat” ground that a video game would use 100 triangles for.

-Scanning: there’s no scanner available that provides both good 3-D information and good photo realistic textures at a price people will want to pay. Scanning every square inch of playable space in even a modest sized house is a pain, and people will look behind the television, underneath the furniture and everywhere else that most of these scanning videos and demos never go. There are a lot of ugly angles that these videos omit where a player would go.

-Post Processing: of you scan your house or any other real space, you will have poor lighting unless you took the time to do your own custom lighting and color setup. That will all need to be corrected in post process so that you can dynamically light your environment. Lighting is one of the most next generation things that people associate with games and you will be fighting prebaked shadows throughout the entire house or area that you have scanned. You don’t get away from this with NERFs or gaussian splats, because those scenes also have prebaked lighting in them that is static.

Object Destruction and Physics: I Love the game teardown, and if you want to see what it’s like to actually bust up and destroy structures that have been physically scanned, there is a plug-in to import reality capture models directly into the game with a little bit of modding. That said, teardown is voxel based, and is one of the most advanced engines that has been built to do such a thing. I have seen nothing else capable of doing cool looking destruction of any object, scanned or 3D modeled, without a large studio effort and a ton of optimization.

modeless1y ago

I think collision detection is solvable. And the scanning process should be no harder than 3D modeling to the same quality level. Probably much easier, honestly. Modeling is labor intensive. I'm not sure why you say "there’s no scanner available that provides both good 3-D information and good photo realistic textures" because these new techniques don't use "scanners", all you need is regular cameras. The 3D information is inferred.

Lighting is the big issue, IMO. As soon as you want any kind of interactivity besides moving the camera you need dynamic lighting. The problem is you're going to have to mix the captured absolutely perfect real-world lighting with extremely approximate real-time computed lighting (which will be much worse than offline-rendered path tracing, which still wouldn't match real-world quality). It's going to look awful. At least, until someone figures out a revolutionary neural relighting system. We are pretty far from that today.

Scale is another issue. Two issues, really, rendering and storage. There's already a lot of research into scaling up rendering to large and detailed scenes, but I wouldn't say it's solved yet. And once you have rendering, storage will be the next issue. These scans will be massive and we'll need some very effective compression to be able to distribute large scenes to users.

3 more replies

knicholes1y ago

Maybe a quick, cheap NeRF with some object recognition, 3D object generation and replacement, so at least you have a sink where there is a sink and a couch where you have a couch, even though it might look differently.

smusamashah1y ago

Is there a teardown mod that uses reality captured models? Or is there any video even? I have played the game once, destruction was awesome. I want to see how it looks like the way you said.

andybak1y ago

I think you're generalising from exacting use cases to ones that might be much more tolerant of imperfection.

naet1y ago

My parents had a floor plan of our house drawn up for some reason, and when I was in late middle school I found it and modeled the house in the hammer editor so my friends and I could play Counter Strike source in there.

It wasn't very well done but I figured out how to make the basic walls and building, add stairs, add some windows, grab some pre existing props like simple couches beds and a TV, and it was pretty recognizable. After adding a couple ladders to the outside so you could climb in the windows or on the roof the map was super fun just as a map, and doubly so since I could do things like hide in my own bedroom closet and recognize the rooms.

Took some work since I didn't know how to do anything but totally worth it. I feel like there has to be a much more accessible level editor in some game out there today, not sure what it would be though.

I thought my school had great architecture for another map but someone rightfully convinced me that would be a very bad idea to add to a shooting game. So I never made any others besides the house.

orbital-decay1y ago

An interactive game is much more than just rendering. You need object separation, animation, collision, worldspace, logic, and your requirement of destructibility takes it to a completely different level.

NeRF is not that, it's just a way to represent and render volumetric objects. It's like 10% of what makes a game. Eventually, in theory, it might be possible to make NeRFs or another similar representation animated, interactive, or even entirely drivable by an end-to-end model. But the current state is so far from it that it isn't worth speculating about.

What you want is doable with classic tools already.

w-m1y ago

Recognizing what is a car in a 3D NeRF/Gaussian Splatting scene can be done. Also research from CVPR: https://www.garfield.studio/

TeMPOraL1y ago

I dreamed of that since being a kid, so for nearly three decades now. It's been entirely possible even then - it was just a matter of using enough elbow grease. The problem is, the world is full of shiny happy people ready to call you a terrorist, assert their architectural copyright, or bring in the "creepiness factor", to shut down anyone who tries this.

jsheard1y ago

There's also just the fact that 1:1 reproductions of real-world places rarely make for good video game environments. Gameplay has to inform the layout and set dressing, and how you perceive space in games requires liberties to be taken to keep interiors from feeling weirdly cramped (any kid who had the idea to measure their house and build it in Quake or CS found this out the hard way).

The main exception I can think of is in racing simulators, it's already common for the developers of those to drive LiDAR cars around real-world tracks and use that data to build a 1:1 replica for their game. NeRF might be a natural extension of that if they can figure out a way to combine it with dynamic lighting and weather conditions.

smokel1y ago

Having destructible objects is in no way possible on contemporary hardware, unless you simplify the physics to the extreme. Perhaps I'm misunderstanding your statement?

Recognising objects for what they are has only recently become somewhat possible. Separating them in a 3D scan is still pretty much impossible.

2 more replies

antihero1y ago

I mean depending on your risk aversion we’re not that far, nor have ever been.

555551y ago

What's the state of the art right now that can be run on my laptop from a set of photos? I want to play with NERFs, starting by generating one from a bunch of photos of my apartment, so I can then fly around the space virtually.

sorenjan1y ago

Probably Nerf studio

https://docs.nerf.studio/

lxe1y ago

Absolute noob question that I'm having a hard time understading:

In practice, why NeRF instead of Gaussian Splatting? I have very limited exposure to either, but a very cursory search on the subject yields a "it depends on the context" answer. What exact context?

GistNoesis1y ago

There are two aspects in the difference between NeRF and Gaussian Splatting :

- The first aspect concern how they solve the light rendering equation :

NeRF has more potential for rendering physical quality but is slower.

NeRF use raycasting. Gaussian Splatting project and draw gaussians directly in screen space.

Each have various rendering artefacts. One distinction is in handling light reflections. When you use raycasting, you can bounce your ray on mirror surfaces. Where as gaussian splatting, like alice in wonderland creates a symmetric world on the other side of the mirror (and when the mirror surface is curved, it's hopeless).

Although many NeRF don't implement reflections as a simplification, they can handle them almost natively.

Alternatively, NeRF is a volumetric representation, whereas Gaussian Splatting has surfaces baked in : Gaussian Splats are rendered in order front to back. This mean that when you have two thin objects one behind the other, like the two sides of a book, Gaussian splatting will be able to render the front and hide the back whereas NeRF will merge front and back because volumetric element are transparent. (Though in NeRF with spherical harmonics the Radiance Field direction will allow to cull back from front based on the viewing angle).

- The second aspect of NeRF vs Gaussian Splatting, is the choice of representation :

NeRF usually use a neural network to store the scene in a compressed form. Whereas Gaussian Splatting is more explicit and uncompressed, the scene is represented in a sort of "point cloud" fashion. This mean that if your scene has potential for compression, like repetitive textures or objects, then the NeRF will make use of it and hallucinate what's missing. Whereas gaussian splat will show holes.

Of course like this article is about, you can hybridize them.

xrd1y ago

This is a beautiful explanation, thanks so much!

choppaface1y ago

NeRF does glass, fog, reflections, and some furs better than gsplat. Gsplat does normal surfaces as well or better than NeRF, and gsplat also provides explicit geometry (a point cloud). NeRF models the entire image rendering function while gsplats only model typical surfaces.

This work (and others e.g. https://creiser.github.io/binary_opacity_grid/ ) attempt to blend the raytracing aspect of NeRF with the explicit surface aspect of gsplats.

One key non-research problem is that gsplats can render on mobile devices / headsets using vanilla WebGL APIs. But approaches like this paper require CUDA (and also apparently a top-shelf desktop GPU). If Apple and others (mostly Apple has been sandbagging tho) provided better support for WebGPU or an alternative API then NeRF research would be dramatically more impactful versus gsplats. The popularity of gsplats is largely due to its accessibility.

zlenyk1y ago

It's just a completely different paradigm of rendering and it's not clear which one will be dominant in the future. Gaussian splats are usually dependent on initialisation from point cloud, which makes whole process much more compliacated.

ofou1y ago

I'd spent hours navigating Street Views with this

tmilard1y ago

Everyone I believe... - Light data for Rendering and - fast 3D Reconstruction ======> Big winner.

So many laboratories and software dev have given a shot at this. None have yet won.

Success often lies in small (but important ) details...

j / k navigate · click thread line to collapse

49 comments

turkihaithem1y ago

One of the paper authors here - happy to answer any questions about the work or chat about neural rendering in general!

aerodog1y ago

Thanks for your work!

modeless1y ago

1 more reply

refibrillator1y ago

Congrats on the paper! Any chance the code will be released?

Also I’d be curious to hear, what are you excited about in terms of future research ideas?

Personally I’m excited by the trend of eliminating the need for traditional SfM preprocessing (sparse point clouds via colmap, camera pose estimation, etc).

turkihaithem1y ago

1 more reply

choppaface1y ago

So your motivating problem does not exist.

dr_dshiv1y ago

Wow, it looks beautiful.

Can regular phones capture the data required? How to get into this, as a hobbyist? I’m interested in the possibilities of scanning coral reefs and other ecological settings.

turkihaithem1y ago

ttul1y ago

totalview1y ago

modeless1y ago

3 more replies

knicholes1y ago

smusamashah1y ago

Is there a teardown mod that uses reality captured models? Or is there any video even? I have played the game once, destruction was awesome. I want to see how it looks like the way you said.

andybak1y ago

I think you're generalising from exacting use cases to ones that might be much more tolerant of imperfection.

naet1y ago

I thought my school had great architecture for another map but someone rightfully convinced me that would be a very bad idea to add to a shooting game. So I never made any others besides the house.

orbital-decay1y ago

What you want is doable with classic tools already.

w-m1y ago

Recognizing what is a car in a 3D NeRF/Gaussian Splatting scene can be done. Also research from CVPR: https://www.garfield.studio/

TeMPOraL1y ago

jsheard1y ago

smokel1y ago

Having destructible objects is in no way possible on contemporary hardware, unless you simplify the physics to the extreme. Perhaps I'm misunderstanding your statement?

Recognising objects for what they are has only recently become somewhat possible. Separating them in a 3D scan is still pretty much impossible.

2 more replies

antihero1y ago

I mean depending on your risk aversion we’re not that far, nor have ever been.

555551y ago

sorenjan1y ago

Probably Nerf studio

https://docs.nerf.studio/

lxe1y ago

Absolute noob question that I'm having a hard time understading:

In practice, why NeRF instead of Gaussian Splatting? I have very limited exposure to either, but a very cursory search on the subject yields a "it depends on the context" answer. What exact context?

GistNoesis1y ago

There are two aspects in the difference between NeRF and Gaussian Splatting :

- The first aspect concern how they solve the light rendering equation :

NeRF has more potential for rendering physical quality but is slower.

NeRF use raycasting. Gaussian Splatting project and draw gaussians directly in screen space.

Although many NeRF don't implement reflections as a simplification, they can handle them almost natively.

- The second aspect of NeRF vs Gaussian Splatting, is the choice of representation :

Of course like this article is about, you can hybridize them.

xrd1y ago

This is a beautiful explanation, thanks so much!

choppaface1y ago

This work (and others e.g. https://creiser.github.io/binary_opacity_grid/ ) attempt to blend the raytracing aspect of NeRF with the explicit surface aspect of gsplats.

zlenyk1y ago

ofou1y ago

I'd spent hours navigating Street Views with this

tmilard1y ago

Everyone I believe... - Light data for Rendering and - fast 3D Reconstruction ======> Big winner.

So many laboratories and software dev have given a shot at this. None have yet won.

Success often lies in small (but important ) details...

j / k navigate · click thread line to collapse