Computer Graphics

Neural Rendering: NeRF

2020: NeRF is published. 2023: 3D Gaussian Splatting delivers real-time rendering. 2024: Apple Vision Pro ships this technology in a consumer product. In four years, research went from interesting idea to billion-pocket reality. Neural rendering is reshaping how films are shot, how buildings are designed, how virtual tours are run, and how AR/VR devices work.

  • **Google Maps Immersive View** builds 3D city views from aerial and street photos via NeRF-style reconstruction, so users can fly around buildings without visiting them
  • **Apple Vision Pro Spatial Photos** converts ordinary photos into a 3D immersive experience using on-device neural depth estimation and view synthesis
  • **Netflix Matrix Resurrections** used NeRF-based technology for bullet-time effects without installing 100+ physical cameras
  • **Matterport** digitizes real estate into 3D for sales: 11M+ scenes, NeRF-powered 3D tours so buyers can fly through an apartment before visiting

Neural Radiance Fields (NeRF)

**NeRF (Mildenhall et al., 2020)** represents a 3D scene as a neural network: an MLP takes a 3D coordinate (x,y,z) and a viewing direction (θ,φ) and returns density σ (whether an object is present) and color (r,g,b). To render a new view, each ray through a pixel is sampled along its depth, the MLP is queried at each point, and a volume rendering integral combines the results. 100 photographs produce photorealistic synthesis from any viewpoint.

Positional encoding is the key trick. An MLP fed raw (x,y,z) inputs cannot capture high-frequency detail (sharp edges, fine textures) because neural networks suffer from spectral bias. PE transforms coordinates into a basis of sinusoids: gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^L pi p), cos(2^L pi p)). After PE, the network sees the high-frequency structure.

NeRF trains for 1 to 2 days on a single A100 GPU. Why so slow for such a small network (8-layer MLP)?

3D Gaussian Splatting

**3D Gaussian Splatting (3DGS, 2023)** is a leap forward in neural rendering speed. Instead of an MLP, the scene is represented as millions of 3D Gaussians, small colored clouds. Each Gaussian carries a position (3D), a covariance matrix (shape and orientation), an opacity, and spherical harmonics coefficients for view-dependent color. Rendering: project the Gaussians to the screen (2D splats), sort by depth, and alpha-blend. The result runs at 30 to 120 fps on an RTX 3090.

3DGS trains in 30 to 60 minutes versus 1 to 2 days for NeRF. Storage: a 3DGS file for a room is around 300 MB versus 5 MB for NeRF weights. The tradeoff: 3DGS storage is larger, but rendering is dramatically faster. A WebGL implementation of 3DGS (splat.xyz) lets users browse Gaussian splats directly in the browser without a GPU. Reality Capture, COLMAP, and 3DGS form a full reconstruction pipeline from photographs.

3DGS renders 100 to 1000 times faster than NeRF. Why is rendering faster when storing millions of 3D Gaussians instead of one MLP?

Novel View Synthesis

**Novel View Synthesis (NVS)** generates a photograph of a scene from an arbitrary camera position based on a set of input images. NeRF and 3DGS are the leading modern NVS methods. Applications include virtual tours, 3D reconstruction without LiDAR, and cinematic effects such as bullet time without hundreds of cameras.

Instant-NGP (NVIDIA, 2022) cut NeRF training from 1 to 2 days to about 5 minutes via multiresolution hash encoding. Instead of positional encoding, it uses a lookup table in a hash grid across multiple resolutions. Parameters: GPU-side hash tables updated by backprop. A student can build a NeRF from a video over a coffee break. Implementation: github.com/NVlabs/instant-ngp.

A NeRF trained on 100 photos of a room renders crisply at training viewpoints but blurry at novel viewpoints between them. What does this issue indicate?

Real-Time Neural Rendering

**The gap between quality and speed** in neural rendering keeps narrowing. 3DGS delivered real-time rendering (30 to 120 fps) but remains limited to static scenes with heavy storage. Active research directions: Dynamic NeRF (deformable scenes), Compact 3DGS (compression via vector quantization), Relightable NeRF (lighting changes), and NeRF for AR/VR.

Apple Vision Pro (2024) uses NeRF-style rendering for Spatial Photos: a stereo pair is transformed into an immersive 3D experience through neural depth estimation and view synthesis. Google Immersive View in Maps builds 3D views from a combination of aerial imagery and NeRF, letting users fly around buildings from any angle.

NeRF and 3DGS create true 3D geometry of objects, like a 3D scanner

NeRF and 3DGS produce representations for view synthesis. They are excellent for rendering from any angle but do not give explicit meshes or precise geometry for CAD or manufacturing

A NeRF density field and 3DGS Gaussians are appearance models, not geometry models. Mesh extraction from a NeRF (via marching cubes) gives a rough result. For precise geometry, use LiDAR or structured-light 3D scanning. For realistic rendering, use NeRF or 3DGS

3DGS is ideal for static scenes. Dynamic video (an actor moving) needs a different approach. Why does 3DGS scale poorly to video?

Key ideas

  • **NeRF:** MLP (x,y,z,theta,phi) -> (sigma, rgb); volume rendering via ray marching; positional encoding captures high-frequency detail; 1 to 2 days of training
  • **3D Gaussian Splatting:** millions of 3D Gaussians; tile-based rasterization; 30 to 120 fps; 1 hour of training; 300 MB storage
  • **Novel View Synthesis:** COLMAP for camera poses; photorealistic synthesis from any viewpoint; evaluated via PSNR/SSIM/LPIPS
  • **Real-Time:** 3DGS in production for static scenes; dynamic scenes are still an open problem; Apple and Google have integrated this into shipping products

Related topics

Neural rendering overlaps with several areas:

  • Differentiable Rendering — NeRF is an example of a differentiable renderer: gradient descent optimizes MLP weights to minimize photometric loss
  • Self-Supervised Learning — NeRF is self-supervised. It trains on images alone without 3D ground truth. DINO features power Semantic NeRF for segmentation
  • Graphics Engine Architecture — Integrating 3DGS into a game engine requires a dedicated rendering pass and memory management for millions of Gaussians

Вопросы для размышления

  • NeRF has no explicit knowledge of 3D structure. The MLP learns it implicitly from photographs. How does this differ from epipolar geometry and traditional stereo reconstruction? When is NeRF better and when is classical MVS (Multi-View Stereo) the right tool?
  • 3DGS stores a scene as millions of Gaussians, neither mesh nor voxels nor points. How can compression (VQ, entropy coding) shrink 300 MB to 30 MB without major quality loss, and what tradeoff does this create?
  • Editing NeRF scenes is an open problem. Removing a chair to see what is behind it requires the NeRF to guess the background. How could inpainting through diffusion models be combined with NeRF for semantic scene editing?

Связанные уроки

  • la-06-transformations
Neural Rendering: NeRF

0

1

Sign In