Computer Graphics

Coordinate Spaces

October 2011. Battlefield 3 - the first AAA shooter on Frostbite 2. Artists complain: characters deform when scaled. The bug took a week to track down. The culprit: normals transformed by the wrong matrix. They were multiplied by the Model Matrix when they needed its inverse-transpose. Five spaces, four matrices, one wrong choice - and the scene falls apart.

  • **Vertex Shader on the GPU:** the first stage of the graphics pipeline. Its only required task is to transform a vertex from Model Space to Clip Space via MVP
  • **Camera systems in games:** FPS camera, orbit camera, cinematic camera - all just different ways to build the View Matrix
  • **VR rendering:** two eyes = two View Matrices + two Projection Matrices with different offsets. Doubling the vertex pipeline

Catmull, SGI, and the birth of the graphics pipeline

In 1974 Edwin Catmull laid out the concept of sequential vertex transformation through coordinate spaces in his PhD dissertation. SGI (Silicon Graphics) hardwired the pipeline into their IRIS workstations in 1982. In 1992 OpenGL standardized the sequence of spaces: Model -> World -> View -> Clip -> NDC (the fixed-function API folded the first two steps into a single modelview matrix). Every modern GPU is a direct descendant of that architecture.

Prerequisites

  • Linear Algebra for Graphics

Model Space (local coordinates)

A 3D artist sculpts a character. Head at the center, arms to the sides, legs below. All vertex coordinates are defined **relative to the model's center**. That is **Model Space** - each object's own coordinate system. The shape's identity document.

**Model Space** (Object Space, Local Space) - the object's coordinate system. Origin (0,0,0) usually sits at the geometric center or at the object's base. Every vertex of the 3D model lives in model space.

Separating shape (model space) from placement (model matrix) is an expressive abstraction. The same cube drops into the scene 100 times with different positions, rotations, and scales - only the model matrix changes, the vertices stay identical. The foundation of GPU instancing in Unreal and Unity.

| Property | Model Space | Why it matters |
|---|---|---|
| Center (origin) | Center of the object | Rotation happens around the origin |
| Scale | "Natural" size | S = (1,1,1) = original size |
| Shared vertices | One buffer for all instances | GPU instancing - thousands of trees from one model |
| Bounding box | In model space | For collision detection before transformation |

100 trees in a scene use the same 3D model. What differs for each tree?

World Space (global coordinates)

Every scene object lives in **World Space** - a single global coordinate system. House at (10, 0, 5), tree at (15, 0, 8), character at (12, 0, 6). World Space is the scene's world map. Physics, AI, and lighting all run here.

**World Space** - the scene's global coordinate system. A vertex jumps from Model Space to World Space through the Model Matrix: world_pos = ModelMatrix × model_pos. Lighting, physics, and AI compute in World Space.
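
The formula world_pos = ModelMatrix × model_pos can be sketched in a few lines. This is a minimal illustration, not engine code: it assumes column vectors, row-major nested lists, rotation around Y only, and made-up tree positions. The helper names (`model_matrix`, `transform`, and so on) are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply a 4x4 matrix to a column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def rotation_y(angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]

def scale(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0], [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0], [0.0, 0.0, 0.0, 1.0]]

def model_matrix(position, yaw_rad, scale_factors):
    # TRS: the vertex is scaled first, then rotated, then translated (M = T * R * S).
    return mat_mul(translation(*position),
                   mat_mul(rotation_y(yaw_rad), scale(*scale_factors)))

# One shared vertex in Model Space: the top of a tree (w = 1 marks a position).
tree_top = [0.0, 2.0, 0.0, 1.0]

# Two instances of the same model: only the Model Matrix differs (values are made up).
tree_a = model_matrix((15.0, 0.0, 8.0), 0.0, (1.0, 1.0, 1.0))
tree_b = model_matrix((-4.0, 0.0, 3.0), math.pi / 2, (1.0, 1.5, 1.0))

world_a = transform(tree_a, tree_top)  # tree A's top in World Space
world_b = transform(tree_b, tree_top)  # tree B's top, taller because of the Y scale
```

The vertex data never changes between the two trees. Each instance contributes only its own 4x4 matrix, which is the idea behind instancing.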

In World Space a light source and a surface live in the same coordinate system. That matters: dot(normal, light_dir) is meaningless if the normal sits in the object's Model Space while light_dir is a global World Space vector. Both vectors must be expressed in a common space.
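
A small sketch of what bringing a normal into World Space involves. It assumes a Model Matrix that is a pure non-uniform scale, so its inverse-transpose is simply 1/s per axis. The scale factors, the slope, and the light direction are made-up values for illustration.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical non-uniform scale: the object is stretched 4x along X.
sx, sy, sz = 4.0, 1.0, 1.0

# A 45-degree slope in Model Space: the tangent lies in the surface,
# the normal is perpendicular to it.
tangent = [1.0, -1.0, 0.0]
normal = normalize([1.0, 1.0, 0.0])

# Directions that lie in the surface are transformed by the Model Matrix itself.
world_tangent = [tangent[0] * sx, tangent[1] * sy, tangent[2] * sz]

# Wrong: the normal is pushed through the Model Matrix.
wrong_normal = normalize([normal[0] * sx, normal[1] * sy, normal[2] * sz])

# Right: the normal goes through the inverse-transpose (1/s per axis for a pure scale).
right_normal = normalize([normal[0] / sx, normal[1] / sy, normal[2] / sz])

# Lambert term, with both vectors now in World Space.
light_dir = normalize([0.0, 1.0, 0.0])  # direction toward the light
brightness = max(0.0, dot(right_normal, light_dir))

# Zero means "still perpendicular to the surface".
wrong_error = dot(world_tangent, wrong_normal)
right_error = dot(world_tangent, right_normal)
```

Only `right_normal` stays perpendicular to the scaled surface. With `wrong_normal` the lighting tilts as soon as the scale is non-uniform.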

| Operation in World Space | Purpose | Example |
|---|---|---|
| Distance between objects | Physics, AI | Enemy within 10-meter radius? |
| Direction to light source | Lighting | dot(normal, light_dir) = brightness |
| Raycasting | Selecting objects with mouse | Ray from camera through pixel into world |
| Bounding volume checks | Broad phase collision | AABB overlap in world space |

Lighting is typically computed in World Space. Why not in Model Space?

View Space and the LookAt Matrix

The camera stands in the world looking in a direction. **View Space** is the world through the camera's eyes: camera at (0,0,0), looking along -Z (OpenGL). The whole world transforms into camera coordinates. The camera does not fly to the scene - the scene flies to the camera.

**View Matrix** (Camera Matrix) transforms coordinates from World Space into View Space. Built via **LookAt**: specify camera position (eye), the look-at point (target), and the up vector. View Matrix = (Camera's Model Matrix)^(-1).

**Why not stay in World Space?** Projection - the next step - assumes the camera sits at the origin looking along the Z axis. The View Matrix "moves the whole world" so the camera ends up at the origin. Mathematically that is inversion: camera_to_world^(-1) = world_to_camera.
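
One common way to build that inverse directly, without inverting a matrix, is the classic LookAt construction. The sketch below assumes the OpenGL convention (right-handed, camera looks along -Z, column vectors). The function names are illustrative and do not come from any specific library.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def look_at(eye, target, up):
    """Build a world-to-view matrix (row-major nested lists)."""
    forward = normalize(sub(target, eye))   # where the camera looks
    right = normalize(cross(forward, up))   # camera's +X axis
    true_up = cross(right, forward)         # camera's +Y axis, orthogonal to both
    # Rows hold the camera axes (the transposed, i.e. inverted, rotation);
    # the last column moves the eye to the origin.
    return [
        [right[0], right[1], right[2], -dot(right, eye)],
        [true_up[0], true_up[1], true_up[2], -dot(true_up, eye)],
        [-forward[0], -forward[1], -forward[2], dot(forward, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

view = look_at(eye=[0.0, 5.0, 10.0], target=[0.0, 0.0, 0.0], up=[0.0, 1.0, 0.0])

eye_in_view = transform(view, [0.0, 5.0, 10.0, 1.0])    # the camera lands at the origin
target_in_view = transform(view, [0.0, 0.0, 0.0, 1.0])  # the target sits on the -Z axis
```

The construction breaks down when `up` is parallel to the viewing direction, because the cross product is then zero. Real camera code usually guards against that case.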

| LookAt parameter | What it defines | Typical value |
|---|---|---|
| eye | Camera position in the world | (0, 5, 10) - behind and above |
| target | Where to look | (0, 0, 0) - center of scene |
| up | Where is up | (0, 1, 0) - Y is up |
| Result | 4x4 View Matrix | Inverse of the camera's transformation matrix |

The View Matrix is the inverse of the camera's transformation matrix. Why inverse?

Projection: Perspective vs Orthographic

The world is three-dimensional. The screen is two-dimensional. **Projection** flattens 3D into 2D. Two flavors: **perspective** (as the eye sees - distant things shrink) and **orthographic** (no perspective - engineering drawings). Games use perspective. CAD, 2D games, and UI use orthographic.

**Perspective Projection** - mimics human vision. Defined by FOV (field of view), aspect ratio, near plane, far plane. Distant objects shrink. **Orthographic** - parallel projection, size stays constant with distance.

**Frustum** (truncated pyramid) - the volume of space visible to the camera in perspective projection. Anything outside the frustum gets **clipped** - the GPU spends no rasterization or pixel-shading work on geometry that cannot be seen. Clipping happens in Clip Space, before the perspective divide.
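
The clip test itself is a handful of comparisons against w. The sketch below assumes the OpenGL convention -w <= x, y, z <= w and checks a single point. Real hardware clips whole primitives and generates new vertices where a triangle crosses the boundary, which this sketch does not attempt.

```python
def inside_frustum(clip):
    """Return True if a Clip Space point (x, y, z, w) lies inside the view volume.

    The test runs before dividing by w, so points behind the camera
    (w <= 0) are rejected without ever dividing by zero or a negative w.
    """
    x, y, z, w = clip
    return w > 0 and all(-w <= c <= w for c in (x, y, z))

visible = inside_frustum((3.0, 2.0, -8.0, 10.0))      # every component within [-10, 10]
off_screen = inside_frustum((12.0, 0.0, 0.0, 10.0))   # x exceeds w: right of the frustum
behind = inside_frustum((3.0, 2.0, -8.0, -10.0))      # negative w: behind the camera
```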

| Parameter | Perspective | Orthographic |
|---|---|---|
| FOV (field of view) | Defines width of view | Not used |
| Near plane | Minimum render distance | Minimum distance |
| Far plane | Maximum distance | Maximum distance |
| Depth effect | Distant objects shrink | No depth effect |
| Last row of matrix | [0, 0, -1, 0] | [0, 0, 0, 1] |
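
The two matrices compared above can be written out as follows. This is a sketch in the OpenGL convention (view direction -Z, NDC z in [-1, 1]), matching the classic gluPerspective and glOrtho formulas. DirectX and Vulkan use a different z mapping. The function names and the sample point are illustrative.

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """Perspective projection, OpenGL convention, row-major nested lists."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # last row copies -z into w
    ]

def orthographic(left, right, bottom, top, near, far):
    """Orthographic projection, OpenGL convention, row-major nested lists."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],   # last row leaves w = 1
    ]

def transform(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# A point 10 units in front of the camera, in View Space.
point_view = [1.0, 1.0, -10.0, 1.0]

persp_clip = transform(perspective(60.0, 16.0 / 9.0, 0.1, 100.0), point_view)
ortho_clip = transform(orthographic(-10.0, 10.0, -10.0, 10.0, 0.1, 100.0), point_view)

# persp_clip[3] equals the distance to the camera, so the later divide shrinks far objects.
# ortho_clip[3] stays 1, so the divide changes nothing and size is constant.
```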

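**Never push the near plane close to 0.** The Z-buffer has finite precision, and perspective projection spends most of it right behind the near plane. A near plane close to 0 leaves so little precision for the rest of the scene that distant objects merge (Z-fighting). Recommended: near >= 0.1 for games.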
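
A rough numeric illustration of that warning. It assumes an OpenGL-style projection and a 24-bit fixed-point depth buffer. Floating-point and reversed-Z depth buffers behave differently. The distances and plane values are made up.

```python
def depth_value(distance, near, far, bits=24):
    """Quantized depth stored for a point `distance` units in front of the camera.

    Assumes OpenGL-style NDC z in [-1, 1] mapped linearly onto an
    unsigned fixed-point depth buffer with `bits` bits.
    """
    ndc_z = (far + near) / (far - near) - 2.0 * far * near / ((far - near) * distance)
    depth01 = (ndc_z + 1.0) / 2.0
    return round(depth01 * (2 ** bits - 1))

far = 1000.0

# Two walls one meter apart, 500 m from the camera.
tiny_near = (depth_value(500.0, 0.001, far), depth_value(501.0, 0.001, far))
sane_near = (depth_value(500.0, 0.1, far), depth_value(501.0, 0.1, far))

# With near = 0.001 both walls quantize to the same depth value, so they Z-fight.
# With near = 0.1 they land on different values and can be told apart.
walls_fight = tiny_near[0] == tiny_near[1]
walls_separate = sane_near[0] != sane_near[1]
```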

The perspective projection matrix has -1 at position [3][2]. What is it for?

NDC: Normalized Device Coordinates

After projection, a vertex sits in **Clip Space** with coordinates (x, y, z, w). The GPU runs the **perspective divide**: divides x, y, z by w. Out comes **NDC** (Normalized Device Coordinates). Every visible coordinate lands in [-1, 1]. Anything outside is clipped.

**NDC** (Normalized Device Coordinates) - normalized space after the perspective divide. In OpenGL: x ∈ [-1, 1], y ∈ [-1, 1], z ∈ [-1, 1]. In DirectX/Vulkan: z ∈ [0, 1]. Everything outside is clipped.
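
The perspective divide is a single division per component. A minimal sketch follows. The clip coordinates are made-up values chosen so the results are exact.

```python
def perspective_divide(clip):
    """Convert Clip Space (x, y, z, w) to NDC (x/w, y/w, z/w)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

ndc = perspective_divide((4.0, -2.0, 6.0, 8.0))  # (0.5, -0.25, 0.75)
```

All three results fall inside [-1, 1], so this vertex is visible under the OpenGL convention.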

Final step: **Viewport Transform** - from NDC [-1,1] to screen pixel coordinates [0, width] × [0, height]. Formulas: screen_x = (ndc_x + 1) / 2 * width, screen_y = (ndc_y + 1) / 2 * height.
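
The viewport formulas translate directly into code. The `flip_y` parameter is an addition that is not part of the formulas above. It covers window systems whose pixel origin is the top-left corner rather than the bottom-left.

```python
def ndc_to_screen(ndc_x, ndc_y, width, height, flip_y=False):
    """Map NDC x, y in [-1, 1] to pixel coordinates in [0, width] x [0, height]."""
    screen_x = (ndc_x + 1.0) / 2.0 * width
    screen_y = (ndc_y + 1.0) / 2.0 * height
    if flip_y:
        # Assumption: origin at the top-left, y grows downward.
        screen_y = height - screen_y
    return screen_x, screen_y

center = ndc_to_screen(0.0, 0.0, 1920, 1080)        # (960.0, 540.0)
corner = ndc_to_screen(-1.0, -1.0, 1920, 1080)      # (0.0, 0.0)
flipped = ndc_to_screen(0.5, 0.5, 800, 600, True)   # (600.0, 150.0)
```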

| Transition | Transition matrix | Coordinate range | What happens |
|---|---|---|---|
| Model -> World | Model Matrix (TRS) | Arbitrary | Placement in scene |
| World -> View | View Matrix (LookAt) | Arbitrary | Camera -> origin |
| View -> Clip | Projection Matrix | Arbitrary (with w) | Perspective |
| Clip -> NDC | Perspective divide (÷w) | [-1, 1]^3 | Normalization |
| NDC -> Screen | Viewport transform | [0,W] × [0,H] | Screen pixels |

From local model coordinates to NDC - 5 spaces and 4 transitions (three matrices plus the perspective divide), followed by the viewport transform to screen pixels. MVP = P × V × M is computed once per object, and the GPU slams it onto millions of vertices in parallel. That is the vertex shader - the first stage of the graphics pipeline.
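
The whole chain fits in one short sketch. It assumes the OpenGL conventions and uses deliberately simple, made-up matrices: the model is only translated, and the camera sits at (0, 0, 5) looking along -Z, so the View Matrix is just the opposite translation. In a real renderer the per-vertex part runs in the vertex shader.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, near, far):
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

model = translation(1.0, 0.0, 0.0)    # object placed at x = 1 in the world
view = translation(0.0, 0.0, -5.0)    # inverse of "camera at (0, 0, 5)"
proj = perspective(90.0, 1.0, 1.0, 9.0)

# Computed once per object: the vertex is hit by M first, then V, then P.
mvp = mat_mul(proj, mat_mul(view, model))

vertex = [0.0, 0.0, 0.0, 1.0]                    # Model Space
clip = transform(mvp, vertex)                    # Clip Space
step_by_step = transform(proj, transform(view, transform(model, vertex)))

ndc = [c / clip[3] for c in clip[:3]]            # perspective divide
screen = ((ndc[0] + 1.0) / 2.0 * 800, (ndc[1] + 1.0) / 2.0 * 600)  # 800x600 viewport
```

`clip` and `step_by_step` agree up to rounding. The difference is cost: one matrix-vector product per vertex instead of three.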

Misconception: perspective projection is simply dividing x and y by z

The full perspective projection matrix includes FOV (field of view), aspect ratio, near plane, and far plane. Division by z is only part of the process. The matrix scales x and y for FOV/aspect, transforms z for the correct Z-buffer distribution, and copies -z into w for the perspective divide.
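
Written out for the OpenGL convention, the result of the matrix multiplication followed by the divide looks like this. The symbols are assumptions introduced here: (x, y, z) is a View Space point with z < 0 in front of the camera, a is the aspect ratio, and z_n and z_f are the near and far distances.

```latex
f = \cot\!\left(\tfrac{\mathrm{FOV}_y}{2}\right), \qquad w_{clip} = -z

x_{ndc} = \frac{f}{a} \cdot \frac{x}{-z}, \qquad
y_{ndc} = f \cdot \frac{y}{-z}, \qquad
z_{ndc} = \frac{z_f + z_n}{z_f - z_n} + \frac{2\, z_f z_n}{(z_f - z_n)\, z}
```

The division by depth is still there, but x and y are first scaled by f and a, and z is remapped so that z = -z_n gives -1 and z = -z_f gives +1.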

Simple x/z does not account for: 1) FOV - how wide the camera sees, 2) aspect ratio - the screen is not square, 3) near/far - depth clipping and proper Z-buffer precision distribution are needed. The projection matrix solves all of this in a single multiplication.

A vertex after projection has clip coordinates (3, 2, -8, -10). What are the NDC coordinates?

Key Ideas

  • **Model Space** - local object coordinates. Model Matrix (TRS) places it in the world
  • **World Space** - the shared scene. View Matrix (built with LookAt, the inverse of the camera's transform) brings the world into camera coordinates
  • **Projection** (perspective/orthographic) maps the visible volume (the frustum) into Clip Space. Perspective divide (÷w) makes distant objects shrink
  • **NDC** [-1,1]^3 -> Viewport Transform -> screen pixels. Full pipeline: MVP = P × V × M

Related Topics

Coordinate spaces are the foundation for understanding the full rendering pipeline:

  • Linear Algebra for Graphics — 4x4 matrices, homogeneous coordinates, TRS - the mathematical foundation of all transformations
  • Rasterization — After NDC->Screen, vertices go to the rasterizer which fills the pixels between them
  • Z-Buffer — The NDC z-coordinate is used to determine which object is closer to the camera (depth testing)

Questions for Reflection

  • Why is MVP = P × V × M pre-computed once rather than applying three matrices sequentially to each vertex?
  • In VR rendering, two cameras (left and right eye) have different View Matrices. Which parts of the pipeline can be reused?
  • What happens to the perspective if FOV is increased to 170 degrees? Decreased to 5 degrees?

Related Lessons

  • cg-02 — 4x4 matrices and homogeneous coordinates are the math behind every transition
  • cg-04 — NDC z-coordinate feeds directly into the depth buffer for Z-fighting prevention
  • cg-01 — The rasterizer receives vertices already in Screen Space
  • geo-01 — Affine transformations - formal foundation of TRS matrices
  • arvr-01 — VR stereo: two frustums, two View Matrices, doubled vertex pipeline
  • cg-05 — Shading requires normals in the correct coordinate space
  • la-06-transformations