Computer Graphics

GPU Graphics Pipeline

A modern GPU can process on the order of 10-50 billion pixels per second. In the roughly 16.7 milliseconds of a single 60 fps frame, the pipeline transforms millions of vertices, rasterizes millions of triangles, and runs the fragment shader for up to hundreds of millions of fragments, all in parallel on thousands of cores. Understanding this pipeline means understanding why realistic graphics are possible in real time.

**Game engines (Unreal, Unity):** all rendering is a sequence of draw calls, each pushing geometry through the described pipeline; G-buffer deferred rendering enables thousands of light sources.
**WebGL and WebGPU:** browser games and 3D visualizations use the same pipeline via JavaScript; Three.js abstracts the shaders, but under the hood the stages are identical.
**Machine learning on GPU (CUDA):** GPUs were originally designed for the graphics pipeline; the same massively parallel arithmetic that shades vertices and fragments maps naturally onto the matrix operations in neural networks.

Vertex Shader and MVP Transform

A 3D scene is stored in object coordinates: the vertices of a cube are defined relative to the cube's center. To render the scene on screen, three matrix transforms are applied in sequence: Model (object to world), View (world to camera), Projection (camera to clip space). The product MVP = Projection × View × Model is usually computed once per object on the CPU and then applied to every vertex in the vertex shader.
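The chain of transforms above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`mat_mul`, `translation`, `perspective`, `vertex_shader`) are hypothetical, matrices are row-major nested lists, and the projection follows the OpenGL convention (camera looks down -Z, NDC z in [-1, 1]).

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection (assumed convention)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]

model = translation(0, 0, -5)       # object placed 5 units in front of the camera
view = translation(0, 0, 0)         # camera at the origin: identity view matrix
projection = perspective(90, 1.0, 1.0, 10.0)

# Computed once per object, then reused for every vertex.
mvp = mat_mul(projection, mat_mul(view, model))

def vertex_shader(position):
    """Return the clip-space position (the equivalent of gl_Position)."""
    x, y, z = position
    return mat_vec(mvp, [x, y, z, 1.0])

clip = vertex_shader((1.0, 1.0, 0.0))
ndc = [c / clip[3] for c in clip[:3]]   # division by w
```

Note the multiplication order: with column vectors, the matrix written last (Model) is applied to the vertex first.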

After the vertex shader comes Primitive Assembly: the GPU assembles vertices into triangles using the index buffer (EBO in OpenGL terms). An optional Geometry Shader can then generate new primitives. Next comes Clipping: triangles extending outside the view frustum are clipped in clip space (conceptually, with a Sutherland-Hodgman-style algorithm). Dividing by w then yields Normalized Device Coordinates (NDC): the cube [-1,1]³ in OpenGL, while Direct3D and Vulkan use [0,1] for z.

**Normal matrix:** directly transforming normals by the model matrix gives incorrect results under non-uniform scaling. The correct matrix for normals: transpose(inverse(mat3(model))). This is a classic mistake among shader beginners.
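A small numeric check makes the problem visible. The sketch below uses hypothetical helper names and a model matrix that scales x by 2; it transforms a tangent and a normal that start out perpendicular and compares the two ways of transforming the normal.

```python
def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (fails for singular matrices)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def transpose3(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def mul3(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

model3 = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]   # non-uniform scale: x stretched by 2
tangent = [1, 1, 0]                          # direction lying in the surface
normal = [1, -1, 0]                          # perpendicular to the tangent

tangent_world = mul3(model3, tangent)
naive_normal = mul3(model3, normal)                   # wrong under this scale
normal_matrix = transpose3(inverse3(model3))
correct_normal = mul3(normal_matrix, normal)          # stays perpendicular

naive_dot = dot(tangent_world, naive_normal)
correct_dot = dot(tangent_world, correct_normal)
```

The naive result is no longer perpendicular to the surface (its dot product with the tangent is nonzero), while the inverse-transpose result is. In a real shader the transformed normal would also be renormalized before use.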

The vertex shader applies the MVP matrix and outputs gl_Position in clip space. What transform does the GPU perform automatically after the shader, before rasterization?

Rasterization and Barycentric Coordinates

Rasterization is the conversion of a geometric triangle into a set of pixels (fragments). Conceptually, the GPU asks of each pixel: does its center fall inside the triangle? The answer comes from three half-plane tests (edge functions), one per triangle edge. In practice only pixels near the triangle are tested, and modern GPUs perform the test for 2×2 pixel blocks in parallel (quad parallelism).
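The coverage test can be sketched as follows. This is a simplified software illustration, not how hardware is organized: it assumes counter-clockwise vertex order, scans only the triangle's bounding box, and omits the top-left fill rule that real rasterizers use for pixels lying exactly on an edge. The function names are hypothetical.

```python
import math

def edge(a, b, p):
    """Edge function: positive when p lies to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside a CCW triangle."""
    min_x = max(math.floor(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(math.ceil(max(v0[0], v1[0], v2[0])), width)
    min_y = max(math.floor(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(math.ceil(max(v0[1], v1[1], v2[1])), height)

    covered = []
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            center = (x + 0.5, y + 0.5)
            # Inside when the center is on the inner side of all three edges.
            if (edge(v0, v1, center) >= 0 and
                    edge(v1, v2, center) >= 0 and
                    edge(v2, v0, center) >= 0):
                covered.append((x, y))
    return covered

pixels = rasterize((0.0, 0.0), (4.0, 0.0), (0.0, 3.0), width=8, height=8)
```

Dividing the three edge-function values by their sum gives the barycentric coordinates of the pixel center, which is why the same computation serves both coverage and attribute interpolation.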

Perspective-correct interpolation is critical for textures: simple linear interpolation in screen space distorts texture coordinates at sharp perspective angles (a checkerboard floor looks 'compressed'). The GPU performs perspective-correct interpolation automatically.
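The difference between the two interpolation schemes can be shown numerically. In this minimal sketch (the function name and sample values are illustrative), `bary` holds screen-space barycentric weights and `ws` holds the clip-space w of each vertex; the perspective-correct version interpolates attribute/w and 1/w linearly and then divides.

```python
def interpolate(bary, values, ws, perspective_correct=True):
    """Interpolate a per-vertex attribute at a point given by screen-space barycentrics."""
    if not perspective_correct:
        # Plain screen-space interpolation: ignores depth entirely.
        return sum(b * v for b, v in zip(bary, values))
    # Interpolate value/w and 1/w linearly, then divide.
    numerator = sum(b * v / w for b, v, w in zip(bary, values, ws))
    denominator = sum(b / w for b, w in zip(bary, ws))
    return numerator / denominator

bary = (0.5, 0.5, 0.0)       # halfway between vertex 0 and vertex 1 on screen
u_values = (0.0, 1.0, 0.0)   # texture coordinate u at each vertex
ws = (1.0, 3.0, 3.0)         # vertex 1 is three times farther from the camera

u_linear = interpolate(bary, u_values, ws, perspective_correct=False)
u_correct = interpolate(bary, u_values, ws, perspective_correct=True)
```

Here the screen-space midpoint corresponds to u = 0.25 on the surface rather than 0.5, because the far half of the edge occupies less screen space than the near half.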

**Quad parallelism and performance:** the GPU shades fragments in 2×2 blocks. If a triangle covers only one pixel in a block, all 4 shader invocations are launched anyway, because the derivatives dFdx/dFdy needed for mip-mapping are computed from neighboring pixels in the quad. Very small triangles are therefore inefficient: up to 75% of shader invocations can be wasted (overshading).
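The 75% figure follows directly from counting. The sketch below is a simplified model with a hypothetical function name, assuming quads are aligned to even pixel coordinates: it counts how many invocations are launched for a set of covered pixels and what fraction of them are wasted.

```python
def quad_shading_cost(covered_pixels):
    """Return (invocations, wasted_fraction) for a set of covered (x, y) pixels."""
    covered = set(covered_pixels)
    # Every 2x2 quad touched by at least one covered pixel is shaded in full.
    quads = {(x // 2, y // 2) for x, y in covered}
    invocations = 4 * len(quads)
    wasted = (invocations - len(covered)) / invocations
    return invocations, wasted

tiny_triangle = [(5, 5)]                            # covers a single pixel
full_quad = [(0, 0), (1, 0), (0, 1), (1, 1)]        # covers one whole quad

tiny_cost = quad_shading_cost(tiny_triangle)
full_cost = quad_shading_cost(full_quad)
```

A one-pixel triangle launches 4 invocations for 1 useful result, which is the 75% worst case; a triangle that fills whole quads wastes nothing.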

During rasterization, attributes (UV, normals) are interpolated barycentrically. Why does simple linear interpolation in screen space give incorrect results for textures?

Fragment Shader and PBR Lighting

The fragment shader computes the color of each pixel. It receives interpolated attributes (normal, UV, position) and computes lighting. Physically Based Rendering (PBR), the standard in modern games and engines (Unreal, Unity HDRP), models the interaction of light with surfaces in a physically plausible way.

PBR is based on the rendering equation (Kajiya 1986): Lo(p, ωo) = Le(p, ωo) + ∫Ω fr(p, ωi, ωo) · Li(p, ωi) · (n·ωi) dωi, where the integral runs over the hemisphere Ω of incoming directions above the surface. The BRDF fr describes how the surface scatters light. The Cook-Torrance specular BRDF is D · F · G / (4 · NdotV · NdotL), where D is the microfacet distribution (GGX), F is the Fresnel term (Schlick approximation), and G is the geometry attenuation term (Smith).
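The specular term can be written out as scalar functions. This is a sketch under assumptions that the text does not fix: alpha = roughness² as the GGX parameter, and the Schlick-GGX form of the Smith term with k = (roughness + 1)² / 8, a remapping commonly used for direct lighting. Function names are illustrative, and a single scalar `f0` stands in for an RGB reflectance.

```python
import math

def distribution_ggx(n_dot_h, roughness):
    """D: GGX / Trowbridge-Reitz microfacet distribution (alpha = roughness^2 assumed)."""
    a2 = (roughness * roughness) ** 2
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)

def fresnel_schlick(v_dot_h, f0):
    """F: Schlick approximation of Fresnel reflectance."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def geometry_smith(n_dot_v, n_dot_l, roughness):
    """G: Smith term built from two Schlick-GGX factors (k remapping is an assumption)."""
    k = (roughness + 1.0) ** 2 / 8.0
    g_view = n_dot_v / (n_dot_v * (1.0 - k) + k)
    g_light = n_dot_l / (n_dot_l * (1.0 - k) + k)
    return g_view * g_light

def cook_torrance_specular(n_dot_v, n_dot_l, n_dot_h, v_dot_h, roughness, f0):
    """Specular BRDF value: D * F * G / (4 * NdotV * NdotL)."""
    d = distribution_ggx(n_dot_h, roughness)
    f = fresnel_schlick(v_dot_h, f0)
    g = geometry_smith(n_dot_v, n_dot_l, roughness)
    # Small epsilon guards against division by zero at grazing angles.
    return d * f * g / max(4.0 * n_dot_v * n_dot_l, 1e-6)

# Light, view and normal all aligned; fully rough dielectric with F0 = 0.04.
spec = cook_torrance_specular(1.0, 1.0, 1.0, 1.0, roughness=1.0, f0=0.04)
```

In a shader the dot products would be clamped to [0, 1] and computed from the normal, view, light and half vectors; here they are passed in directly to keep the sketch short.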

**Texturing pipeline:** before sampling a texture, the GPU computes the LOD (level of detail) from the derivatives dFdx(uv) and dFdy(uv), which is why the quad parallelism described in the previous section is needed. Mip-mapping selects the appropriate detail level and interpolates between adjacent levels (trilinear filtering) to reduce aliasing.
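The LOD selection can be approximated as below. This is a simplified isotropic model with hypothetical function names; actual hardware formulas vary and also account for LOD bias and anisotropic filtering. The UV derivatives are scaled to texel units, and the level is the base-2 logarithm of the larger footprint.

```python
import math

def mip_lod(duv_dx, duv_dy, tex_width, tex_height):
    """Approximate mip level from screen-space UV derivatives (isotropic model)."""
    # Length of each derivative measured in texels of the base level.
    footprint_x = math.hypot(duv_dx[0] * tex_width, duv_dx[1] * tex_height)
    footprint_y = math.hypot(duv_dy[0] * tex_width, duv_dy[1] * tex_height)
    rho = max(footprint_x, footprint_y)
    if rho <= 1.0:
        return 0.0                      # magnification: use the base level
    max_level = math.log2(max(tex_width, tex_height))
    return min(math.log2(rho), max_level)

def trilinear_split(lod):
    """Return (lower_level, blend): sample lower and lower + 1, mix by blend."""
    lower = math.floor(lod)
    return lower, lod - lower

# One screen pixel spans 4 texels of a 256x256 texture in each direction.
lod = mip_lod((4 / 256, 0.0), (0.0, 4 / 256), 256, 256)
lower, blend = trilinear_split(1.5)
```

A footprint of 4 texels per pixel selects level 2, where each texel covers a 4×4 block of the base texture.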

In a PBR material, roughness = 0 and metallic = 1 (polished metal). How will the surface appear?

Output Merger: z-buffer, Stencil, Alpha Blending

The Output Merger is the final stage of the GPU pipeline: it determines which fragments make it into the framebuffer and how. Three tests are performed in order: Scissor Test (rectangular region), Stencil Test (mask from stencil buffer), Depth Test (z-buffer). Fragments that pass all tests participate in Alpha Blending.
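The order of the tests can be illustrated with a toy single-pixel model. Everything here is an assumption made for the sketch: the dictionary layout, the function name, a stencil function of EQUAL, a depth function of LESS with depth writes enabled, and 'over' blending for fragments that survive.

```python
def output_merger(frag, pixel, scissor, stencil_ref):
    """Run scissor, stencil and depth tests on one fragment; blend it if it survives.

    frag:  dict with keys x, y, depth, rgb, alpha
    pixel: dict with keys depth, stencil, rgb (current framebuffer contents)
    Returns True if the fragment was written to the pixel.
    """
    x0, y0, x1, y1 = scissor
    if not (x0 <= frag["x"] < x1 and y0 <= frag["y"] < y1):
        return False                         # scissor test failed
    if pixel["stencil"] != stencil_ref:
        return False                         # stencil test (EQUAL) failed
    if not frag["depth"] < pixel["depth"]:
        return False                         # depth test (LESS) failed
    a = frag["alpha"]
    pixel["rgb"] = tuple(a * s + (1.0 - a) * d
                         for s, d in zip(frag["rgb"], pixel["rgb"]))
    pixel["depth"] = frag["depth"]
    return True

pixel = {"depth": 1.0, "stencil": 1, "rgb": (0.0, 0.0, 0.0)}
scissor = (0, 0, 4, 4)

near = {"x": 2, "y": 2, "depth": 0.5, "rgb": (1.0, 0.0, 0.0), "alpha": 1.0}
far = {"x": 2, "y": 2, "depth": 0.7, "rgb": (0.0, 1.0, 0.0), "alpha": 1.0}
outside = {"x": 9, "y": 2, "depth": 0.1, "rgb": (0.0, 0.0, 1.0), "alpha": 1.0}

wrote_near = output_merger(near, pixel, scissor, stencil_ref=1)
wrote_far = output_merger(far, pixel, scissor, stencil_ref=1)
wrote_outside = output_merger(outside, pixel, scissor, stencil_ref=1)
```

Each test is a cheap early exit, so a fragment rejected by the scissor rectangle never touches the stencil or depth buffers.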

Transparent objects require back-to-front sorting (painter's algorithm) before rendering, since alpha blending is non-commutative: the blending order affects the result. To avoid sorting, some production renderers use Order-Independent Transparency (OIT) techniques such as Depth Peeling or Weighted Blended OIT (WBOIT).
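The order dependence is easy to verify with the 'over' operator for non-premultiplied colors. In this minimal sketch (the function name and colors are illustrative), two half-transparent layers are blended onto a black background in both orders.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Porter-Duff 'over' for non-premultiplied color: src * a + dst * (1 - a)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)

# Red drawn first (farther away), blue drawn on top of it.
red_then_blue = blend_over(blue, 0.5, blend_over(red, 0.5, background))
# Same two layers submitted in the opposite order.
blue_then_red = blend_over(red, 0.5, blend_over(blue, 0.5, background))
```

The layer drawn last contributes the most to the final color, so the two results differ; only the order that matches the actual depth arrangement is correct.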

**Render targets and G-buffer (Deferred Rendering):** instead of a single framebuffer, the GPU can render simultaneously to multiple textures (MRT, Multiple Render Targets). Deferred shading renders geometry into a G-buffer (albedo, normals, depth); lighting is then applied in a separate pass, which makes it possible to handle thousands of light sources efficiently.

Common misconception: the GPU runs the fragment shader for every fragment, and only afterwards does the z-test discard the hidden ones, so the shader always runs wastefully for hidden pixels.

Early-Z: modern GPUs perform the z-test BEFORE the fragment shader whenever possible, discarding fragments that are hidden. The fragment shader is launched only for potentially visible pixels. Exception: if the shader writes to gl_FragDepth or calls discard, Early-Z is typically disabled.
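The saving can be illustrated by counting shader runs for a single pixel. The sketch below is a deliberately simplified model with a hypothetical function name, assuming a depth function of LESS and a depth buffer cleared to 1.0; it compares early and late depth testing for the same fragments.

```python
def count_shader_invocations(depths, early_z):
    """Count fragment shader runs for one pixel; fragments arrive in list order."""
    depth_buffer = 1.0
    invocations = 0
    for depth in depths:
        if early_z:
            if depth >= depth_buffer:
                continue              # rejected before the shader runs
            invocations += 1
            depth_buffer = depth
        else:
            invocations += 1          # shader runs first, depth test follows
            if depth < depth_buffer:
                depth_buffer = depth
    return invocations

front_to_back = [0.2, 0.5, 0.8]
back_to_front = [0.8, 0.5, 0.2]

late = count_shader_invocations(front_to_back, early_z=False)
early_sorted = count_shader_invocations(front_to_back, early_z=True)
early_unsorted = count_shader_invocations(back_to_front, early_z=True)
```

In this model Early-Z only helps when a nearer fragment has already been written, which is why opaque geometry is often submitted roughly front to back or preceded by a depth-only pass.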

Early-Z is a key performance optimization. Without it, a scene with many hidden surfaces would spend much of its GPU time shading invisible pixels. This is exactly why discard in shaders is avoided where possible.

When rendering transparent glass on top of an opaque wall with alpha blending (Porter-Duff 'over'), in what order should objects be drawn?

Key Ideas

**Vertex shader:** MVP transform object→world→camera→clip space; normals require a separate matrix, transpose(inverse(mat3(model))).
**Rasterization:** edge functions + barycentric coordinates for interpolation; perspective-correct interpolation is mandatory for textures; Early-Z discards hidden fragments before the shader.
**Fragment shader:** Cook-Torrance PBR BRDF (D×F×G); metallic determines F0, roughness widens the distribution; mip-map LOD via derivatives dFdx/dFdy.
**Output Merger:** z-buffer for hidden surfaces, alpha blending (Porter-Duff 'over') for transparency, stencil for masks; transparent objects require back-to-front sorting.

Questions for Reflection

Early-Z is disabled if the fragment shader uses discard or writes to gl_FragDepth. Why is this architectural limitation unavoidable - why can the GPU not 'predict' the result of discard in advance?
Deferred shading moves lighting to a separate pass that reads from the G-buffer. MSAA (multisample anti-aliasing) works poorly with deferred rendering. Why is geometric information (normals, depth) incompatible with averaging over MSAA samples?
A PBR material is described by a set of textures (albedo, metallic, roughness, normal map). What physical constraints does energy conservation place on the BRDF - and why do some custom shaders without PBR look 'unrealistic'?

Related Lessons

arch-04-cpu