Computer Graphics

Coordinate Spaces

October 2011. Battlefield 3 - the first AAA shooter on Frostbite 2. Artists complain: characters deform when scaled. The bug took a week to track down. The culprit: normals were being transformed by the wrong matrix. Under non-uniform scale, normals need the inverse-transpose of the Model Matrix, not the Model Matrix itself. Five spaces, four transitions, one wrong choice - and the scene falls apart.

**Vertex Shader on the GPU:** the first stage of the graphics pipeline. Its only required task is to transform a vertex from Model Space to Clip Space via MVP
**Camera systems in games:** FPS camera, orbit camera, cinematic camera - all just different ways to build the View Matrix
**VR rendering:** two eyes = two View Matrices + two Projection Matrices with different offsets. Doubling the vertex pipeline

Catmull, SGI, and the birth of the graphics pipeline

In 1974 Edwin Catmull laid out the concept of sequential vertex transformation through coordinate spaces in his PhD dissertation. In the early 1980s SGI (Silicon Graphics) built the pipeline into the hardware of its IRIS workstations. In 1992 OpenGL 1.0 standardized the sequence object -> eye -> clip -> NDC -> window coordinates; its combined modelview matrix folds the Model -> World -> View steps used in this lesson into one. Every modern GPU is a direct descendant of that architecture.

Prerequisites

Linear Algebra for Graphics

Model Space (local coordinates)

A 3D artist sculpts a character. Head at the center, arms to the sides, legs below. All vertex coordinates are defined **relative to the model's center**. That is **Model Space** - each object's own coordinate system. The shape's identity document.

**Model Space** (Object Space, Local Space) - the object's coordinate system. Origin (0,0,0) usually sits at the geometric center or at the object's base. Every vertex of the 3D model lives in model space.

Separating shape (model space) from placement (model matrix) is a powerful abstraction. The same cube drops into the scene 100 times with different positions, rotations, and scales - only the model matrix changes, the vertices stay identical. This is the foundation of GPU instancing in engines such as Unreal and Unity.

Property	Model Space	Why it matters
Center (origin)	Center of the object	Rotation happens around the origin
Scale	"Natural" size	S = (1,1,1) = original size
Shared vertices	One buffer for all instances	GPU instancing - thousands of trees from one model
Bounding box	In model space	For collision detection before transformation

100 trees in a scene use the same 3D model. What differs for each tree?

World Space (global coordinates)

Every scene object lives in **World Space** - a single global coordinate system. House at (10, 0, 5), tree at (15, 0, 8), character at (12, 0, 6). World Space is the scene's world map. Physics, AI, and lighting all run here.

**World Space** - the scene's global coordinate system. A vertex jumps from Model Space to World Space through the Model Matrix: world_pos = ModelMatrix × model_pos. Lighting, physics, and AI compute in World Space.
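The relation world_pos = ModelMatrix × model_pos can be sketched in a few lines of plain Python. This is a minimal illustration, not engine code: the helper names (`translation`, `rotation_y`, `scaling`, `model_matrix`) and the tree coordinates are assumptions made up for the example. Matrices are lists of rows and vectors are treated as columns.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_y(angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def model_matrix(position, yaw_rad, scale):
    # T * R * S: with column vectors the scale is applied first,
    # then the rotation, then the translation.
    return mat_mul(translation(*position),
                   mat_mul(rotation_y(yaw_rad), scaling(*scale)))

# One shared vertex in Model Space (homogeneous, w = 1): the top of a tree.
tree_top = [0.0, 2.0, 0.0, 1.0]

# Two instances of the same mesh: only the model matrix differs.
tree_a = model_matrix((15.0, 0.0, 8.0), 0.0, (1.0, 1.0, 1.0))
tree_b = model_matrix((-4.0, 0.0, 3.0), math.pi / 2, (1.0, 1.5, 1.0))

world_a = mat_vec(tree_a, tree_top)  # tree top of instance A in World Space
world_b = mat_vec(tree_b, tree_top)  # instance B is 1.5x taller and elsewhere
```

The vertex data is never copied or modified; each instance contributes only its own 4x4 matrix, which is the idea behind instancing.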

In World Space a light source and a surface live in the same coordinate system. Fundamental: dot(normal, light_dir) makes no sense if the normal sits in the object's Model Space while light_dir is a global World Space vector. A common space is mandatory.
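Moving a normal into World Space has one trap: under non-uniform scale, multiplying the normal by the model matrix tilts it off the surface. The sketch below, with made-up numbers and hypothetical helper names, compares the model matrix against its inverse-transpose on a 45-degree surface. Only the upper-left 3x3 part of the model matrix is used, because normals are directions and ignore translation.

```python
def mat3_vec(m, v):
    """Multiply a 3x3 matrix by a 3-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose3(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def inverse3(m):
    """Invert a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Upper-left 3x3 of a model matrix that stretches the object 2x along X.
model3 = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A surface tilted at 45 degrees: the tangent lies in it, the normal is perpendicular.
tangent = [1.0, 1.0, 0.0]
normal = [1.0, -1.0, 0.0]

world_tangent = mat3_vec(model3, tangent)

# Wrong: the normal goes through the same matrix as positions.
wrong_normal = mat3_vec(model3, normal)

# Right: the normal goes through the inverse-transpose (the "normal matrix").
normal_matrix = transpose3(inverse3(model3))
right_normal = mat3_vec(normal_matrix, normal)

wrong_dot = dot(world_tangent, wrong_normal)  # non-zero: no longer perpendicular
right_dot = dot(world_tangent, right_normal)  # zero: still perpendicular
```

The corrected normal is perpendicular to the surface but no longer unit length, so it still has to be normalized before it goes into dot(normal, light_dir).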

Operation in World Space	Purpose	Example
Distance between objects	Physics, AI	Enemy within 10-meter radius?
Direction to light source	Lighting	dot(normal, light_dir) = brightness
Raycasting	Selecting objects with mouse	Ray from camera through pixel into world
Bounding volume checks	Broad phase collision	AABB overlap in world space

Lighting is typically computed in World Space. Why not in Model Space?

View Space and the LookAt Matrix

The camera stands in the world looking in a direction. **View Space** is the world through the camera's eyes: camera at (0,0,0), looking along -Z (OpenGL). The whole world transforms into camera coordinates. The camera does not fly to the scene - the scene flies to the camera.

**View Matrix** (Camera Matrix) transforms coordinates from World Space into View Space. Built via **LookAt**: specify camera position (eye), the look-at point (target), and the up vector. View Matrix = (Camera's Model Matrix)^(-1).

**Why not stay in World Space?** Projection - the next step - assumes the camera sits at the origin looking along the Z axis. The View Matrix "moves the whole world" so the camera ends up at the origin. Mathematically that is inversion: camera_to_world^(-1) = world_to_camera.
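A LookAt matrix can be built directly from eye, target, and up, without inverting anything numerically. The sketch below follows the usual right-handed OpenGL convention (camera looks along -Z); the function and helper names are assumptions for the example, not a specific library's API.

```python
import math

def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [v[0] / length, v[1] / length, v[2] / length]

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def look_at(eye, target, up):
    forward = normalize(sub(target, eye))   # direction the camera looks
    right = normalize(cross(forward, up))   # camera's +X axis
    true_up = cross(right, forward)         # camera's +Y axis, orthogonal to both
    # Rows are the camera axes (a transposed, i.e. inverted, rotation);
    # the last column moves the eye to the origin. The camera looks along -Z,
    # so the third row is -forward.
    return [
        [right[0], right[1], right[2], -dot(right, eye)],
        [true_up[0], true_up[1], true_up[2], -dot(true_up, eye)],
        [-forward[0], -forward[1], -forward[2], dot(forward, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]

eye = [0.0, 5.0, 10.0]
target = [0.0, 0.0, 0.0]
view = look_at(eye, target, [0.0, 1.0, 0.0])

eye_in_view = mat_vec(view, eye + [1.0])        # the camera itself
target_in_view = mat_vec(view, target + [1.0])  # the point being looked at
distance = math.sqrt(dot(sub(target, eye), sub(target, eye)))
```

In View Space the camera lands at the origin and the target lands on the -Z axis, at a depth equal to its distance from the eye. The sketch assumes `up` is not parallel to the viewing direction; otherwise the cross product degenerates.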

LookAt parameter	What it defines	Typical value
eye	Camera position in the world	(0, 5, 10) - behind and above
target	Where to look	(0, 0, 0) - center of scene
up	Where is up	(0, 1, 0) - Y is up
Result	4x4 View Matrix	Inverse of the camera's transformation matrix (position and orientation)

The View Matrix is the inverse of the camera's transformation matrix. Why inverse?

Projection: Perspective vs Orthographic

The world is three-dimensional. The screen is two-dimensional. **Projection** flattens 3D into 2D. Two flavors: **perspective** (as the eye sees - distant things shrink) and **orthographic** (no perspective - engineering drawings). 3D games typically use perspective. CAD, 2D games, and UI use orthographic.

**Perspective Projection** - mimics human vision. Defined by FOV (field of view), aspect ratio, near plane, far plane. Distant objects shrink. **Orthographic** - parallel projection, size stays constant with distance.

**Frustum** (truncated pyramid) - the volume of space visible to the camera in perspective projection. Anything outside the frustum gets **clipped**, so the GPU spends no rasterization work on geometry the camera cannot see. Clipping happens in Clip Space, before the perspective divide.

Parameter	Perspective	Orthographic
FOV (field of view)	Defines width of view	Not used
Near plane	Minimum render distance	Minimum distance
Far plane	Maximum distance	Maximum distance
Depth effect	Distant objects shrink	No depth effect
Last row of matrix	[0, 0, -1, 0]	[0, 0, 0, 1]
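Both projection matrices can be written out explicitly. The sketch below uses the classic OpenGL conventions (right-handed View Space, NDC z in [-1, 1]); the function names and the sample parameters are assumptions for the example. DirectX and Vulkan use a different depth range, so their matrices differ in the third row.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def perspective(fov_y_deg, aspect, near, far):
    """Symmetric perspective frustum, OpenGL convention."""
    cot = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [cot / aspect, 0.0, 0.0, 0.0],
        [0.0, cot, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],   # last row copies -z into w
    ]

def orthographic(left, right, bottom, top, near, far):
    """Parallel projection of a box, OpenGL convention."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],    # last row leaves w = 1
    ]

persp = perspective(60.0, 16 / 9, 0.1, 100.0)
ortho = orthographic(-8.0, 8.0, -4.5, 4.5, 0.1, 100.0)

# A point 5 units in front of the camera (View Space looks along -Z).
point = [1.0, 1.0, -5.0, 1.0]
persp_clip = mat_vec(persp, point)  # w becomes 5: the divide will shrink x and y
ortho_clip = mat_vec(ortho, point)  # w stays 1: the divide changes nothing
```

The only structural difference is the last row: the perspective matrix makes w depend on depth, while the orthographic one keeps it constant, which is why size does not change with distance.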

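**Never push the near plane close to 0.** The Z-buffer has finite precision, and perspective depth is non-linear: most of the precision is spent right behind the near plane. A near plane near 0 leaves so little precision for the rest of the scene that distant surfaces collapse to the same depth value and flicker (Z-fighting). Recommended: near >= 0.1 for games.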
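The effect of the near plane can be estimated numerically. The sketch below assumes the OpenGL depth mapping (NDC z in [-1, 1]) and a 24-bit fixed-point depth buffer. Both are assumptions for illustration, as are the distances; real hardware and formats vary. It asks whether two surfaces 1 unit apart, 500 units from the camera, still get different depth values.

```python
def ndc_depth(distance, near, far):
    """NDC z in [-1, 1] for a point `distance` units in front of the camera."""
    z_view = -distance                       # View Space looks along -Z
    clip_z = (far + near) / (near - far) * z_view + 2.0 * far * near / (near - far)
    clip_w = -z_view
    return clip_z / clip_w

# Smallest depth difference a 24-bit buffer can store over the range [-1, 1].
DEPTH_STEP = 2.0 / (2 ** 24)

def depth_gap(near, far=1000.0, d1=500.0, d2=501.0):
    """Difference in NDC depth between two surfaces at distances d1 and d2."""
    return ndc_depth(d2, near, far) - ndc_depth(d1, near, far)

gap_safe = depth_gap(near=0.1)    # larger than one depth step: distinguishable
gap_tiny = depth_gap(near=0.001)  # smaller than one depth step: Z-fighting

safe_is_distinguishable = gap_safe > DEPTH_STEP
tiny_is_distinguishable = gap_tiny > DEPTH_STEP
```

With near = 0.1 the two surfaces are several depth steps apart; with near = 0.001 their gap drops below a single step, so the buffer can no longer tell which one is in front.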

The perspective projection matrix has -1 at position [3][2] (zero-based: last row, third column). What is it for?

NDC: Normalized Device Coordinates

After projection, a vertex sits in **Clip Space** with coordinates (x, y, z, w). The GPU clips against -w <= x, y, z <= w (OpenGL convention), then runs the **perspective divide**: it divides x, y, z by w. Out comes **NDC** (Normalized Device Coordinates), where every surviving coordinate lands in [-1, 1].

**NDC** (Normalized Device Coordinates) - normalized space after the perspective divide. In OpenGL: x ∈ [-1, 1], y ∈ [-1, 1], z ∈ [-1, 1]. In DirectX/Vulkan: z ∈ [0, 1]. Everything outside is clipped.
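The clip test and the perspective divide are small enough to write out. The sketch below assumes the OpenGL convention (all three coordinates compared against ±w) and tests whole vertices only; a real GPU clips triangles and generates new vertices along the frustum boundary. The function names and sample values are made up for the example.

```python
def inside_clip_volume(clip):
    """True if a clip-space vertex satisfies -w <= x, y, z <= w."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))

def perspective_divide(clip):
    """Clip space (x, y, z, w) -> NDC (x/w, y/w, z/w)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

visible = (2.0, -1.0, 3.0, 4.0)
off_screen = (5.0, 0.0, 0.0, 4.0)      # x > w: to the right of the frustum
behind_camera = (1.0, 1.0, 1.0, -4.0)  # negative w: behind the camera

visible_ok = inside_clip_volume(visible)
off_screen_ok = inside_clip_volume(off_screen)
behind_ok = inside_clip_volume(behind_camera)

ndc = perspective_divide(visible)       # every component ends up in [-1, 1]
```

Testing against w before dividing is what lets the GPU reject vertices with w <= 0 without ever dividing by zero or by a negative number.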

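Final step: **Viewport Transform** - from NDC [-1,1] to screen pixel coordinates [0, width] × [0, height]. Formulas: screen_x = (ndc_x + 1) / 2 * width, screen_y = (ndc_y + 1) / 2 * height. These assume the screen origin is at the bottom-left (the OpenGL default); with a top-left origin, y flips: screen_y = (1 - ndc_y) / 2 * height.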
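The viewport formulas translate directly into code. In the sketch below the `origin_top_left` flag is an assumption added for illustration: it flips y for window systems and APIs whose pixel origin is the top-left corner, while the default matches the bottom-left origin that the formulas above imply.

```python
def viewport_transform(ndc_x, ndc_y, width, height, origin_top_left=False):
    """Map NDC x, y in [-1, 1] to pixel coordinates in [0, width] x [0, height]."""
    screen_x = (ndc_x + 1.0) / 2.0 * width
    if origin_top_left:
        screen_y = (1.0 - ndc_y) / 2.0 * height   # y grows downward
    else:
        screen_y = (ndc_y + 1.0) / 2.0 * height   # y grows upward
    return screen_x, screen_y

center = viewport_transform(0.0, 0.0, 1920, 1080)
corner_gl = viewport_transform(-1.0, 1.0, 1920, 1080)
corner_flipped = viewport_transform(-1.0, 1.0, 1920, 1080, origin_top_left=True)
```

The NDC center always lands in the middle of the screen; the NDC corner (-1, 1) lands at the top-left pixel in both conventions, but its y coordinate is 1080 with a bottom-left origin and 0 with a top-left one.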

Transition	Operation	Coordinate range	What happens
Model -> World	Model Matrix (TRS)	Arbitrary	Placement in scene
World -> View	View Matrix (LookAt)	Arbitrary	Camera -> origin
View -> Clip	Projection Matrix	Arbitrary (with w)	Perspective
Clip -> NDC	Perspective divide (÷w)	[-1, 1]^3	Normalization
NDC -> Screen	Viewport transform	[0,W] × [0,H]	Screen pixels

From local model coordinates to NDC - 5 spaces, 4 transitions (three matrices plus the perspective divide) - and the viewport transform then maps NDC to screen pixels. MVP = P × V × M is computed once per object, and the GPU applies it to millions of vertices in parallel. That is the vertex shader - the first stage of the graphics pipeline.
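Because matrix multiplication is associative, one combined MVP matrix gives the same clip coordinates as applying M, V, and P one after another. The sketch below checks this on three vertices. The scene setup is an assumption chosen to keep the numbers simple: a camera at (0, 1, 5) with no rotation, and a model translated to (2, 0, -3).

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, near, far):
    """Symmetric perspective frustum, OpenGL convention."""
    cot = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [cot / aspect, 0.0, 0.0, 0.0],
        [0.0, cot, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

model = translation(2.0, 0.0, -3.0)    # object placed in the world
view = translation(0.0, -1.0, -5.0)    # inverse of a camera at (0, 1, 5)
proj = perspective(60.0, 16 / 9, 0.1, 100.0)

# Computed once per object...
mvp = mat_mul(proj, mat_mul(view, model))

# ...then applied to every vertex with a single matrix-vector product.
vertices = [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
one_multiply = [mat_vec(mvp, v) for v in vertices]

# The same result the slow way: three products per vertex.
three_multiplies = [mat_vec(proj, mat_vec(view, mat_vec(model, v)))
                    for v in vertices]
```

The model origin ends up 8 units in front of the camera, and its clip-space w is exactly that distance. The saving is two matrix-vector products per vertex, which adds up over millions of vertices.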

Common misconception: perspective projection is simply dividing x and y by z

The full perspective projection matrix includes FOV (field of view), aspect ratio, near plane, and far plane. Division by z is only part of the process. The matrix scales x and y for FOV/aspect, transforms z for the correct Z-buffer distribution, and copies -z into w for the perspective divide.

Simple x/z does not account for: 1) FOV - how wide the camera sees, 2) aspect ratio - the screen is not square, 3) near/far - depth clipping and proper Z-buffer precision distribution are needed. The projection matrix solves all of this in a single multiplication.
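For reference, one common form of the matrix makes these three jobs visible. It assumes the OpenGL convention (camera looks along -Z, NDC z in [-1, 1]) and a symmetric frustum; the symbols a (aspect ratio), n (near), f (far), and c are introduced here for the illustration.

```latex
P =
\begin{pmatrix}
\dfrac{c}{a} & 0 & 0 & 0 \\
0 & c & 0 & 0 \\
0 & 0 & \dfrac{f + n}{n - f} & \dfrac{2 f n}{n - f} \\
0 & 0 & -1 & 0
\end{pmatrix},
\qquad
c = \frac{1}{\tan(\mathrm{FOV}_y / 2)}

P \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\dfrac{c}{a}\,x \\ c\,y \\ \dfrac{(f + n)\,z + 2 f n}{n - f} \\ -z
\end{pmatrix}
\;\xrightarrow{\;\div w\;}\;
\left(
\frac{c\,x}{a\,(-z)},\;
\frac{c\,y}{-z},\;
\frac{(f + n)\,z + 2 f n}{(n - f)(-z)}
\right)
```

The first two rows carry FOV and aspect ratio, the third row remaps depth so that z = -n maps to -1 and z = -f maps to +1, and the last row is what turns the later divide by w into a divide by -z.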

A vertex after projection has clip coordinates (3, 2, -8, -10). What are the NDC coordinates?

Key Ideas

**Model Space** - local object coordinates. Model Matrix (TRS) places it in the world
**World Space** - the shared scene. View Matrix (built by LookAt, the inverse of the camera's transform) transforms the world into camera coordinates
**Projection** (perspective/orthographic) maps the visible volume (a frustum for perspective) into Clip Space. Perspective divide (÷w) makes distant objects shrink
**NDC** [-1,1]^3 -> Viewport Transform -> screen pixels. Full pipeline: MVP = P × V × M

Questions for Reflection

Why is MVP = P × V × M pre-computed once per object rather than applying three matrices sequentially to each vertex?
In VR rendering, two cameras (left and right eye) have different View Matrices. Which parts of the pipeline can be reused?
What happens to the perspective if FOV is increased to 170 degrees? Decreased to 5 degrees?

Related Lessons

cg-02 — 4x4 matrices and homogeneous coordinates are the math behind every transition
cg-04 — NDC z-coordinate feeds directly into the depth buffer for Z-fighting prevention
cg-01 — The rasterizer receives vertices already in Screen Space
geo-01 — Affine transformations - formal foundation of TRS matrices
arvr-01 — VR stereo: two frustums, two View Matrices, doubled vertex pipeline
cg-05 — Shading requires normals in the correct coordinate space
la-06-transformations

