Computer Graphics
Deferred Rendering
In 2007, Crysis showed scenes with 300 dynamic lights and demanded GeForce 8800 cards in SLI to reach 30 fps. In 2020, Doom Eternal renders 2,000 lights at 60 fps on a mid-range RTX 2060. In the 13 years between, rendering pipeline architecture went through a quiet revolution: the G-Buffer, tiled shading, and clustered shading changed how engines compute lighting on the GPU.
- **Doom Eternal** - clustered forward+ with 2,000+ dynamic lights per scene at 60 fps on RTX 2060
- **Unreal Engine 5 Lumen** - hybrid: deferred for opaque + forward for transparent + ray tracing for select reflections
- **Detroit: Become Human** - clustered shading for indoor scenes with dozens of light sources
G-Buffer: deferring geometry
Crysis (2007) had scenes with 300 dynamic lights and 5,000 objects. Classic forward rendering shades every object against every light: 5,000 * 300 = 1.5M object-light shading passes per frame. On a GeForce 8800 that meant 5 fps. **Deferred rendering** flipped the pipeline: first write each visible surface's geometric properties into a **G-Buffer**, then compute lighting once per pixel, so lighting cost no longer depends on object count.
**G-Buffer (Geometry Buffer)** is a set of full-screen render targets (RTs) into which the geometry pass writes world position, normal, albedo, metallic/roughness, and emissive. After the geometry pass the scene geometry is no longer needed: lighting calculations read the G-Buffer as one large input image. A typical 1920x1080 G-Buffer with 32-bit float channels: 4-5 RTs * 4 channels * 4 bytes/channel = 64-80 bytes per pixel, or roughly 130-165 MB (~150 MB) of VRAM.
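The size estimate is plain arithmetic and easy to reproduce. The sketch below assumes uncompressed targets with 4 channels each and ignores the depth buffer; the function name and parameters are illustrative, not taken from any engine.

```python
def gbuffer_bytes(width, height, render_targets,
                  channels=4, bytes_per_channel=4, samples=1):
    """Bytes needed for `render_targets` full-screen targets.

    Assumptions: no compression, no depth buffer, every target has the
    same channel layout. `samples` models MSAA, where each target is
    stored once per sample.
    """
    return (width * height * render_targets
            * channels * bytes_per_channel * samples)


MB = 1_000_000  # decimal megabytes, to match the figures in the text

for rts in (4, 5):
    size = gbuffer_bytes(1920, 1080, rts)
    print(f"{rts} RTs, 32-bit float channels: {size / MB:.1f} MB")

# The same four targets with 8-bit channels (an RGBA8-style layout)
packed = gbuffer_bytes(1920, 1080, 4, bytes_per_channel=1)
print(f"4 RTs, 8-bit channels: {packed / MB:.1f} MB")
```

Passing `samples=4` multiplies the result by four, which is the reason MSAA is so expensive in a deferred pipeline.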
**Packing tricks**: a 150 MB G-Buffer is expensive in bandwidth. Modern engines (Unreal, Frostbite) pack the data: normals go into 2 channels via octahedron encoding, world position is not stored but reconstructed from depth and the inverse projection, and metallic and roughness share the R and G channels of one target.
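Two of these tricks fit in a few lines each. The following is a minimal CPU-side sketch, not any engine's actual code: it assumes an OpenGL-style perspective projection (camera looks down -z, NDC depth in [-1, 1]), and all function names are hypothetical. The reconstruction returns a view-space position; multiplying by the inverse view matrix (omitted here) would take it to world space.

```python
import math


def sign_not_zero(v):
    """Sign of v, treating 0 as positive so the fold stays invertible."""
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(n):
    """Map a unit normal (x, y, z) to two values in [-1, 1]."""
    x, y, z = n
    l1 = abs(x) + abs(y) + abs(z)   # project onto the octahedron |x|+|y|+|z| = 1
    x, y = x / l1, y / l1
    if z < 0.0:                      # fold the lower hemisphere outward
        x, y = ((1.0 - abs(y)) * sign_not_zero(x),
                (1.0 - abs(x)) * sign_not_zero(y))
    return (x, y)


def oct_decode(e):
    """Inverse of oct_encode: two values back to a unit normal."""
    ex, ey = e
    z = 1.0 - abs(ex) - abs(ey)
    if z < 0.0:                      # undo the fold
        ex, ey = ((1.0 - abs(ey)) * sign_not_zero(ex),
                  (1.0 - abs(ex)) * sign_not_zero(ey))
    length = math.sqrt(ex * ex + ey * ey + z * z)
    return (ex / length, ey / length, z / length)


def ndc_depth(view_z, near, far):
    """NDC depth in [-1, 1] for a view-space z (negative in front of the camera)."""
    clip_z = (far + near) / (near - far) * view_z + 2.0 * far * near / (near - far)
    return clip_z / -view_z


def reconstruct_view_position(x_ndc, y_ndc, z_ndc, fov_y, aspect, near, far):
    """Invert the perspective projection for one pixel: NDC -> view space."""
    f = 1.0 / math.tan(fov_y / 2.0)
    view_z = 2.0 * far * near / ((far - near) * z_ndc - (far + near))
    w = -view_z                      # clip-space w of the original point
    return (x_ndc * w * aspect / f, y_ndc * w / f, view_z)
```

A real G-Buffer would additionally quantize the two encoded values to the target's bit depth, which is where the (small) precision loss comes from.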
Why does the G-Buffer remove the 'every object for every light' problem?
Deferred vs Forward
Deferred solves the many-lights problem but has trade-offs. **Transparency** is a fundamental issue: a G-Buffer stores a single surface per pixel, while glass must also show the geometry behind it. The usual solution is a hybrid pipeline: opaque geometry goes through deferred, transparent geometry through a separate forward pass. **MSAA** is costly in deferred because every G-Buffer target has to be stored per sample (4x MSAA means 4x the G-Buffer memory), so it is often replaced by TAA or FXAA.
**Forward+** (forward with light culling) is the modern alternative: classic forward, but with pre-culling of lights into tiles/clusters. In a scene with 100 lights, each object's shader iterates over only 8-12 nearby lights, not all 100. Complexity drops from O(N*L) to O(N*L_local), with no G-Buffer overhead and no transparency issues.
**Material variety**: deferred limits shader diversity - every material must fit a fixed G-Buffer (BRDF parameters). Forward allows fully custom per-object shaders.
A team builds a racing simulator: 50 cars (uniform PBR), 200+ lamp lights, reflections. Which pipeline?
Tiled Shading
The lighting pass in deferred is still expensive: with 1,000 lights every pixel iterates over all 1,000. Most lights are local (point, spotlight) and touch only a small screen region. **Tiled shading** divides the screen into 16x16 or 32x32-pixel tiles; each tile pre-computes a list of lights intersecting its region. Then the lighting pass iterates over only the local list.
Algorithm: (1) A compute shader processes tiles in parallel. (2) Each tile builds its own frustum (a sub-frustum of the view frustum). (3) Each light's bounding volume (an AABB) is tested against that frustum. (4) The indices of intersecting lights are written into a shader storage buffer (SSBO). When shading a pixel, the lighting shader looks up its tile's light list and iterates only over those lights.
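The culling step can be sketched on the CPU. This is a deliberately simplified model, not the GPU implementation: it assumes each light has already been projected to a screen-space bounding circle `(cx, cy, radius)` in pixels, so the per-tile frustum test collapses to a 2D rectangle-vs-circle test, and a plain Python list stands in for the SSBO. All names are hypothetical.

```python
def build_tile_light_lists(width, height, tile_size, lights):
    """Return (tiles_x, lists); lists[i] holds the light indices for tile i.

    lights: list of (cx, cy, radius) screen-space bounding circles in pixels
    (an assumption of this sketch; a real pass tests 3D bounds vs a frustum).
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    lists = [[] for _ in range(tiles_x * tiles_y)]

    for ty in range(tiles_y):            # on the GPU: one workgroup per tile
        for tx in range(tiles_x):
            x0, y0 = tx * tile_size, ty * tile_size
            x1 = min(x0 + tile_size, width)
            y1 = min(y0 + tile_size, height)
            for index, (cx, cy, radius) in enumerate(lights):
                # Closest point of the tile rectangle to the circle centre
                nx = min(max(cx, x0), x1)
                ny = min(max(cy, y0), y1)
                if (cx - nx) ** 2 + (cy - ny) ** 2 <= radius * radius:
                    lists[ty * tiles_x + tx].append(index)
    return tiles_x, lists


def lights_for_pixel(px, py, tile_size, tiles_x, lists):
    """What the lighting shader does first: fetch its tile's light list."""
    return lists[(py // tile_size) * tiles_x + (px // tile_size)]


# A 64x32 screen with 16x16 tiles and two small lights
tiles_x, lists = build_tile_light_lists(64, 32, 16, [(8, 8, 4), (32, 16, 5)])
print(lights_for_pixel(8, 8, 16, tiles_x, lists))    # light list for the top-left tile
print(lights_for_pixel(60, 30, 16, tiles_x, lists))  # light list for the bottom-right tile
```

The test is conservative: a light whose circle merely touches a tile's edge is included, which costs a few wasted iterations but never drops a light that contributes.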
**Tile size trade-off**: small tiles (8x8) mean fewer lights per tile but more culling overhead. Large tiles (64x64) mean fast culling but more false-positive lights per pixel. Typical balance is 16x16.
What is the key advantage of tiled shading over plain deferred lighting?
Clustered Shading
Tiled shading works in 2D: each tile is a column through the view frustum from the near plane to the far plane. If a tile contains both near and far geometry, say foliage on a nearby street and a distant building, every pixel in it shares one light list, including lights that only reach the other depth range. **Clustered shading** adds a third dimension: it splits the frustum into a 3D grid of clusters (e.g., 16x16x32), where each cluster is the part of a tile within one depth range. A 'torch in a corridor' light lands only in nearby clusters; a light far from the camera lands only in the distant ones.
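Clustered shading powers Doom 2016, Doom Eternal, and Detroit: Become Human, and is considered state of the art for real-time rendering with many lights. Depth slices are usually spaced logarithmically in view-space depth: under perspective, thin slices near the camera and thick slices far away keep clusters roughly cube-shaped. Light lists are typically stored as a sparse 3D grid plus index tables: each occupied cluster holds an offset and a count into a shared table of light indices.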
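Logarithmic slicing comes down to one formula: slice k covers view depths from `near * (far/near)^(k/S)` to `near * (far/near)^((k+1)/S)` for S slices. The sketch below inverts that to find a pixel's cluster. It assumes positive view-space depth and a flat `x + y*tiles_x + z*tiles_x*tiles_y` grid layout; both the layout and the function names are illustrative choices, not a specific engine's.

```python
import math


def depth_slice(view_depth, near, far, num_slices):
    """Index of the logarithmic depth slice containing view_depth (> 0)."""
    if view_depth <= near:
        return 0
    if view_depth >= far:
        return num_slices - 1
    t = math.log(view_depth / near) / math.log(far / near)  # 0..1 in log space
    return min(int(t * num_slices), num_slices - 1)


def slice_bounds(k, near, far, num_slices):
    """Near and far view depth of slice k: each slice spans the same depth ratio."""
    ratio = far / near
    return (near * ratio ** (k / num_slices),
            near * ratio ** ((k + 1) / num_slices))


def cluster_index(px, py, view_depth, tile_size, tiles_x, tiles_y,
                  near, far, num_slices):
    """Flat index of the cluster containing a pixel at the given depth."""
    tx, ty = px // tile_size, py // tile_size
    tz = depth_slice(view_depth, near, far, num_slices)
    return (tz * tiles_y + ty) * tiles_x + tx


# With near=1 and far=1024 split into 10 slices, every slice doubles in depth
for k in range(3):
    print(k, slice_bounds(k, 1.0, 1024.0, 10))
```

Two pixels in the same screen tile but at depths 1.5 and 100 get different `tz` values here, and therefore different light lists, which is exactly what the 2D tiled scheme cannot express.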
**Clustered light culling + ray tracing** is the modern hybrid architecture (Doom Eternal on forward+, Unreal Engine 5 Lumen on deferred): clustered light lists for most lights, ray tracing for a select few high-impact ones (sun, key plot sources).
Myth: deferred rendering is always faster than forward; modern engines use only deferred.
Reality: deferred wins with many lights and uniform materials; forward+ with clustered culling wins with diverse materials and transparency; real engines use a hybrid.
Each technique is optimal for its own workload profile. Doom Eternal's clustered forward+ renderer avoids G-Buffer memory and bandwidth overhead, which is why it can outperform deferred engines on comparable scenes.
Why does clustered shading need a third depth dimension?
Key Ideas
- **G-Buffer** splits the geometry pass from the lighting pass, turning O(N * L) into O(N + pixels * L) - the many-lights solution
- **Deferred vs Forward+** is not a winner-takes-all but a trade-off: deferred suits PBR with many lights, forward+ suits diverse materials and transparency
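- **Tiled shading** splits the screen into 16x16 tiles with local light lists, so each pixel iterates only over the lights overlapping its tile - for example, roughly 100x fewer light evaluations when about 10 of 1,000 lights touch a tile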
- **Clustered shading** adds 3D depth: close and distant pixels in one tile receive different light lists
- Modern engines are hybrid: clustered light culling throughout, deferred for opaque PBR geometry, forward+ for transparent and custom materials, ray tracing for select effects
Related Topics
Back to Doom Eternal: clustered shading is computational geometry in action (a 3D grid of clusters culled against light spheres), applied to real-time lighting. This connects to:
- PBR (Physically Based Rendering) — The G-Buffer stores PBR parameters (metallic, roughness, normal); deferred pipelines map perfectly to PBR materials
- Computational Geometry: interviews — Light culling is a classic intersection test from computational geometry (frustum vs sphere, AABB vs sphere); clustered shading uses the same primitives
Questions for Reflection
- Real-time ray tracing became viable with RTX 20xx (2018). Will it replace deferred/clustered shading in 5-10 years, or remain a hybrid component for a long while?
- Mobile GPUs (Adreno, Mali) have tile-based architecture at the hardware level. Does this make tiled shading less valuable on mobile - or, conversely, more natural?
- Modern engines spend a sizable budget on the G-Buffer (150+ MB VRAM, bandwidth). Where is the reasonable line: when does the G-Buffer cost more than the problem it solves?
Related Lessons
- cg-08 — G-buffer is the output of the geometry stage in the Deferred Rendering pipeline
- cg-13 — RTX and Deferred are two strategies for complex scenes with opposite trade-offs
- cg-15 — Post-effects sample from the Deferred G-buffer
- arch-15-gpu-architecture — Bandwidth and VRAM are Deferred's bottleneck - GPU memory architecture matters
- arch-09-cache