PlenOctrees For Real-time Rendering of Neural Radiance Fields (NeRFs) (alexyu.net)
175 points by rsp1984 on March 26, 2021 | hide | past | favorite | 31 comments


There's another real-time NeRF viewer that just came out: nerf.live. For a comparison between the two, here's the chair scene in each:

http://alexyu.net/plenoctrees/demo/?load=https://storage.goo...

https://phog.github.io/snerg/viewer/index.html?dir=https://s...


I much prefer SNeRG's subtitle, Baking Neural Radiance Fields for Real-Time View Synthesis. These techniques are cool, but it's very misleading to call them ‘Real-time Rendering of Neural Radiance Fields’. No, they're real-time rendering of data structures sampled from a NeRF.

This matters largely because there is more to NeRFs than just the first paper; e.g. there are Mip-NeRF (https://jonbarron.info/mipnerf/) and D-NeRF (https://nerfies.github.io/). A quantized bunch of samples loses the power of neural implicits.


On your second link I get: Error: Unsupported renderer: ANGLE (Intel, Mesa Intel(R) UHD Graphics (CML GT2), OpenGL 4.6 core). Are you running with hardware acceleration enabled?


Worked for me on Firefox 84.0 without any special config and just mainboard graphics.


Oooh exciting! NeRF is a very interesting technique, but last I heard rendering was taking 20 seconds per frame. This looks great. It says it can do 150 fps by translating the NeRF to a more renderable format, which makes sense.

See an overview of NeRF here:

https://youtu.be/LRAqeM8EjOo


Try 3 days per frame if you want anything in the frame to move/change...


Dang!


Can someone explain what NeRF is and what’s going on here?

Do they produce geometry/polygon data as an output of the neural network?


Conventional methods of rendering 3D objects and spaces rely on specifying geometry and material properties in some format. You then simulate a viewpoint from that information using physically based calculations.

A NeRF takes over both the role of the file format and part of the rendering in the form of a neural network. You feed in a world coordinate and a viewing direction, and you get an RGB tuple and a density out of it. If you interrogate the NeRF enough, you can render any traditional 2D or 3D image out of it by combining all the datapoints.
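To make that concrete, here's a minimal sketch of the interface and the volume-rendering step. The `nerf` function below is a made-up stand-in (a soft colored sphere) for the trained network; the ray-marching loop is the standard front-to-back alpha compositing that NeRF uses:

```python
import numpy as np

def nerf(position, view_dir):
    """Stand-in for the trained network: maps a 3D point and a viewing
    direction to (rgb, density). This toy field is a soft sphere at the
    origin, colored by position — purely illustrative."""
    density = 5.0 * np.exp(-np.dot(position, position))
    rgb = np.clip(position * 0.5 + 0.5, 0.0, 1.0)
    return rgb, density

def render_ray(origin, direction, t_near=0.0, t_far=4.0, n_samples=64):
    """Numerical volume rendering: march along the ray, query the field
    at each sample, and alpha-composite front to back."""
    ts = np.linspace(t_near, t_far, n_samples)
    dt = ts[1] - ts[0]
    color = np.zeros(3)
    transmittance = 1.0  # how much light still gets through to the camera
    for t in ts:
        rgb, sigma = nerf(origin + t * direction, direction)
        alpha = 1.0 - np.exp(-sigma * dt)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color

pixel = render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
```

Every pixel of every frame requires one such ray march, each with dozens of network queries, which is why naive NeRF rendering is so slow.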

One theoretical benefit is that a NeRF is a continuous function, so the resolution is only limited by the capacity of the neural network. Another cool thing is that a NeRF is trained on pictures (with info about where they were taken from), so if you train a NeRF successfully in high-res, it’s like scanning an object. A major practical challenge is that it is (was?) pretty frickin’ slow to work with. I wrote a more elaborate comment about it on the previous NeRF improvement post [1]. There I closed with:

> It would be amazing to have NeRF-based graphics engines that can make up spaces out of layers of NeRFs, all probed in real-time.

Here they’ve taken a major step in that direction by speeding up the rendering 3000X.

[1]: https://news.ycombinator.com/item?id=25300283


This technique isn't actually speeding up the NeRF rendering algorithm.

It bakes the NeRF back to a semi-discrete representation (Octree of Spherical Harmonics voxels) which can render near-identical results at interactive speeds.

The baked data is much larger than the original NeRF model (2 GB vs 5 MB), but it can be downsampled to 30-100 MB with little loss in quality.
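A sketch of what one baked leaf looks like, assuming the essentials: each octree leaf stores a density plus spherical-harmonic coefficients per color channel, so view-dependent color becomes a dot product instead of a network evaluation. All names here are made up for illustration, and a real PlenOctree uses higher SH degrees:

```python
import numpy as np

def sh_basis(d):
    """Real spherical-harmonic basis up to degree 1 (4 terms) for a
    unit direction d; PlenOctrees use more terms, same idea."""
    x, y, z = d
    return np.array([0.28209479,        # l = 0
                     0.48860251 * y,    # l = 1, m = -1
                     0.48860251 * z,    # l = 1, m =  0
                     0.48860251 * x])   # l = 1, m =  1

class Leaf:
    """One octree leaf: a scalar density plus SH coefficients per
    color channel (a toy stand-in for the baked structure)."""
    def __init__(self, density, sh_coeffs):
        self.density = density        # sigma, view-independent
        self.sh_coeffs = sh_coeffs    # shape (3, 4): rgb x basis terms

    def color(self, view_dir):
        # View-dependent color = coefficients dotted with the SH basis;
        # no network query needed at render time.
        d = view_dir / np.linalg.norm(view_dir)
        raw = self.sh_coeffs @ sh_basis(d)
        return 1.0 / (1.0 + np.exp(-raw))  # sigmoid keeps rgb in [0, 1]

leaf = Leaf(density=2.0, sh_coeffs=np.zeros((3, 4)))
rgb = leaf.color(np.array([0.0, 0.0, 1.0]))
```

The ray march is the same as before, but each sample is now a tree lookup plus a tiny dot product, which is where the speedup comes from.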


So if I understand right, for the real-time version rather than querying the NeRF to compute the frame pixels on the fly, they instead use the NeRF to pre-generate 3D Voxel data representing the scene which can then be rendered in real time using more traditional voxel rendering?


Yes and No.

This preserves the exact lighting equation that the NeRF learned, while traditional voxel rendering is limited to traditional lighting equations.

You would have a hard time voxelizing a NeRF, because you can't extract a traditional lighting equation out of it.


I think this is related to Hinton's work on capsules, which I believe are a more reprojectable primitive. Maybe you could coax a voxel representation out of one.


The point about spherical harmonics hits home. You could sample the different harmonics with probabilistic scattering to construct a probability distribution for a signed-distance-function render, and reuse half of the very solid existing render pipeline.


Imagine you have pictures of some real-life object from many angles, and want to turn it into a virtual 3D object (for a web shop, or for use in a computer game, or whatever). Traditional methods try to reconstruct the geometry, but have a couple of problems in more complex situations. NeRFs are a quite new approach that tries to accomplish the same task by training a neural net that essentially lets you query the color of each point in space (also accounting for the direction you are viewing from). This works great (and can work from surprisingly few source images), but it's slow, and rendering the result was even slower.

This paper introduces a way to render NeRFs at reasonable speed. Still not stellar, but quick enough to make NeRFs useful for many use cases.
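The training objective behind that explanation is simple to sketch: render each training ray with the current model and penalize the squared error against the pixel observed in the photo. The function names below are made up for illustration; `render_fn` stands in for a differentiable volume renderer:

```python
import numpy as np

def photometric_loss(render_fn, rays, target_rgbs):
    """Mean squared error between rendered and observed pixel colors.
    rays: list of (origin, direction); target_rgbs: matching pixels."""
    total = 0.0
    for (o, d), target in zip(rays, target_rgbs):
        predicted = render_fn(o, d)  # volume-render this training ray
        total += float(np.sum((predicted - target) ** 2))
    return total / len(rays)

# A degenerate "model" that always predicts mid-grey:
constant = lambda o, d: np.full(3, 0.5)
loss = photometric_loss(constant,
                        [(np.zeros(3), np.array([0.0, 0.0, 1.0]))],
                        [np.full(3, 0.5)])  # 0.0 for a perfect fit
```

In practice this loss is minimized with gradient descent through the renderer for hundreds of thousands of rays, which is why training takes so long.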



I wonder how long it takes to build a model. Is it comparable to typical neural network training times, e.g. hours to days? And how much memory does it typically require?


The linked video shows some information on that https://youtu.be/obrmH1T5mfI?t=288

And in the paper: https://arxiv.org/pdf/2103.14024.pdf

> As training time poses another hurdle for adopting NeRFs in practice (taking 1-2 days to fully converge), we also showed that our PlenOctrees can accelerate effective training time for our NeRF-SH.


If they were to use DeepSpeed/ZeRO-3, could we expect training time to be significantly reduced?


The first step of NeRF is camera registration. People seem to use COLMAP (for real data, not synthesized data), but I always get very bad results, and I don't know why.

I have tried Meshroom too; the result is equally bad.


The "NeRF--" paper [1] attempts to tackle that with a directly comparable method. You might also be interested in this [2] slightly different approach that focuses on a more explicit decomposition.

[1] https://nerfmm.active.vision/

[2] https://lioryariv.github.io/idr/


Try https://github.com/AIBluefisher/DAGSfM ; its graph-based approach is much more robust to common repetitive and only-partially-similar image content.


Is your camera at an equal distance to the object in all photographs? NeRF has trouble with variation in the distance between the object and camera (which Mip-NeRF [0] solves).

[0]: https://youtu.be/EpH175PY1A0


I'm seeing NeRFs everywhere at the moment. Improvements seem to be accelerating.


I wonder if the days of "digital" content are numbered (from a purely "speculative fiction" point of view).

As someone who consumes plenty of science fiction, I could easily imagine a sci-fi story that breaks information down into some epochs...

epoch 1, "analog": the incoming signal is the raw representation of the data, like grooves on a vinyl record or AM/FM radio

epoch 2, "digital": data is encoded in binary format, and possibly compressed

epoch 3 "???": you don't store the data itself, you store a neural network backed function that can reproduce the data in many different ways based on inputs. I.e. you don't store a video of a chair, you store a neural-net that can approximate the chair from any angle with any light source, PLUS you also store a "camera track" that can be used as a pre-defined input to get a preset "video". But at any time, the user can "unlink" the track and operate the "camera" as they choose.


3 is part of what we really mean by AI. We know humans do this...


This could have direct use in https://www.thingiverse.com/ or even Amazon and clothing e-commerce sites.


What is the proposed application of NeRFs? It seems implausible to me that they will offer better practical performance than other rendering techniques for something like video games.


Radiance fields are basically photographs that you can move around in (within a limited box). They are especially effective in VR. But, they are huge files. NeRF is effectively AI magic compression for radiance fields.


Automatic, visually accurate 3D visualization of real-world objects from a few images.


How many different pictures/points of view are needed to produce a "high-quality" NeRF?



