Computer Graphics
Linear Algebra for Graphics
Lesson Objectives
- Compute dot and cross products and understand their geometric meaning
- Build rotation, scale and translation matrices
- Understand why TRS order is critical and read matrix chains right to left
- Work with homogeneous coordinates and 4x4 matrices
Prerequisites
Pixar's 'Toy Story' (1995) was the first feature-length CGI film: roughly 800,000 machine-hours of rendering on a farm of 117 Sun SPARCstations. Today, Unreal Engine 5's Nanite renders scenes built from billions of source polygons in real time. Thirty years of progress, one foundation: the same linear algebra, just running on a GPU.
- **Unity/Unreal Engine:** every object's Transform is a TRS 4x4 matrix. The GPU applies it to all mesh vertices simultaneously
- **NVIDIA RTX 4090:** about 82.6 TFLOPS of peak FP32 throughput, enough in principle for trillions of 4x4 matrix-vector multiplications every second
- **Robotics:** kinematics of a Boston Dynamics arm is a chain of 4x4 matrices, each joint is rotation + translation
- **AR/VR (Apple Vision Pro):** real-time head tracking drives the view matrix. Motion-to-photon latency has to stay very low (a commonly cited target is under about 20 ms), or the image visibly lags behind head motion
- **NeRF / Gaussian Splatting:** 3D scene representations encode camera poses as transformation matrices - the same 4x4
Ivan Sutherland and the birth of computer graphics
In 1963, MIT graduate student Ivan Sutherland defended his thesis and demonstrated **Sketchpad** - the world's first interactive graphical program. First GUI, first direct manipulation, first use of transformation matrices for 2D objects. Sutherland sketched what lives in every GPU today: object hierarchy plus transformation matrices. He received the Turing Award in 1988.
Vectors: dot and cross product
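A **vector** in 3D is a triple (x, y, z). Two key products: the **dot product** a·b = |a||b|cos(theta) is a scalar that reveals the angle between the vectors; the **cross product** a x b is a vector perpendicular to both a and b (the normal of the plane they span), with length |a||b|sin(theta) and direction given by the right-hand rule.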
| Operation | Result | Use in graphics |
|---|---|---|
| a · b (dot) | Scalar | Diffuse term of Phong shading (cos of angle), frustum culling, FOV checks |
| a x b (cross) | Vector perpendicular to a and b | Triangle normals, backface culling |
| |a| (length) | Scalar | Distance between points, normalization |
| normalize(a) | Unit vector | Directions: camera, light source, shaders |
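As a minimal sketch, the four operations in the table can be written in plain Python. The function names (`dot`, `cross`, `length`, `normalize`) are illustrative choices, not the API of any particular engine or library.

```python
import math

def dot(a, b):
    """Dot product: sum of component-wise products (a scalar)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors: a vector perpendicular to both."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def length(a):
    """Euclidean length |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Unit vector with the same direction as a (a must be non-zero)."""
    n = length(a)
    return tuple(x / n for x in a)

a = (1.0, 0.0, 0.0)
b = (1.0, 1.0, 0.0)

# Recover the angle from a . b = |a||b|cos(theta).
theta = math.acos(dot(a, b) / (length(a) * length(b)))

print(math.degrees(theta))  # approximately 45 degrees
print(cross(a, b))          # (0.0, 0.0, 1.0): perpendicular to the XY plane
print(length(normalize(b))) # approximately 1
```

Production code would normally rely on a vector library or on shader built-ins instead of hand-written helpers; the point here is only to make the formulas concrete.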
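**Dot product** answers "how aligned are two vectors" (> 0 - acute angle, = 0 - perpendicular, < 0 - obtuse angle). The diffuse term of the Phong model reduces to: brightness = max(0, dot(normal, light)), where normal and light are unit vectors and light points from the surface toward the light source. **Cross product** gives surface orientation: the cross product of two triangle edges is the triangle's normal, and without normals there is no shading.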
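The diffuse formula and the role of the cross product can be sketched together. The helpers `triangle_normal`, `diffuse`, and `is_front_facing` are hypothetical names introduced for this example, and the triangle is assumed to use counter-clockwise winding.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def triangle_normal(p0, p1, p2):
    """Unit normal from the cross product of two triangle edges."""
    return normalize(cross(sub(p1, p0), sub(p2, p0)))

def diffuse(normal, to_light):
    """Diffuse brightness: max(0, n . l) with both vectors normalized."""
    return max(0.0, dot(normalize(normal), normalize(to_light)))

def is_front_facing(normal, to_camera):
    """Back-face test: the face is visible if its normal points toward the camera."""
    return dot(normal, to_camera) > 0.0

# Triangle in the XY plane, counter-clockwise when viewed from +Z.
n = triangle_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

print(n)                                 # (0.0, 0.0, 1.0)
print(diffuse(n, (0, 0, 1)))             # 1.0: light hits the surface head-on
print(diffuse(n, (0, 1, 1)))             # about 0.707: light at 45 degrees
print(diffuse(n, (0, 0, -1)))            # 0.0: light is behind the surface
print(is_front_facing(n, (0, 0, 1)))     # True
print(is_front_facing(n, (0, 0, -1)))    # False
```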
The dot product of two unit vectors equals 0. What does this mean geometrically?
Matrices and Their Multiplication
A vector describes a point or direction. A **matrix** describes a **transformation** - how to change a vector: rotate it, scale it, project it. Multiplying a matrix by a vector applies the transformation. Every vertex in a 3D scene passes through several such multiplications every frame.
**Matrix** NxM - a table of numbers. In graphics: 3x3 for linear transformations, 4x4 for affine (with translation). Multiplying A(m x n) by v(n x 1) produces u(m x 1). The key insight: matrices can be **multiplied together** - a chain of transformations collapses into one.
**Composition of transformations** is matrix multiplication. If M1 is a rotation and M2 is a scale, then M2 @ M1 means: first rotate, then scale, because for a column vector v the product (M2 @ M1) v equals M2 (M1 v). One matrix replaces the entire chain. This is not just convenience: it lets the GPU apply one precomputed matrix to thousands of vertices instead of repeating the whole chain for each vertex.
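A small sketch of this composition, assuming column vectors and matrices stored as lists of rows. The helper names (`mat_mul`, `mat_vec`, `rotation_z`, `scaling`) are made up for the example; with NumPy the same products would be written with the `@` operator.

```python
import math

def mat_mul(A, B):
    """Matrix product A @ B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(M, v):
    """Matrix-vector product M @ v, treating v as a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rotation_z(angle):
    """3x3 rotation about the Z axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    """3x3 scale matrix."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, sz]]

M1 = rotation_z(math.pi / 2)   # rotate 90 degrees about Z
M2 = scaling(2.0, 1.0, 1.0)    # stretch along X
M = mat_mul(M2, M1)            # one matrix: first rotate, then scale

v = [1.0, 0.0, 0.0]
step_by_step = mat_vec(M2, mat_vec(M1, v))  # apply the chain one step at a time
combined = mat_vec(M, v)                    # apply the precomputed matrix once

print(step_by_step)  # approximately [0, 1, 0]
print(combined)      # same result, up to floating-point rounding
```

The vector (1, 0, 0) is rotated onto the Y axis first, so the X-axis stretch that follows leaves it unchanged.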
**Matrix multiplication is NOT commutative!** In general, A x B != B x A. Rotate-then-Scale gives a different result than Scale-then-Rotate. Getting this order wrong is a classic source of bugs in engine and shader code.
| Property | True? | Meaning for graphics |
|---|---|---|
| A x B = B x A (commutativity) | No! | Order of transformations is critical |
| (A x B) x C = A x (B x C) (associativity) | Yes | Precompute M = A x B x C in advance |
| I x A = A (identity) | Yes | I - no transformation (identity) |
| A x A^-1 = I (inverse) | Yes (if it exists) | View matrix = inverse(camera TRS) |
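The four properties in the table can be checked numerically. This is a sketch with hypothetical helpers; it uses the fact that the inverse of a rotation matrix is its transpose, and compares with a small tolerance because of floating-point rounding.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B, eps=1e-12):
    """True if two matrices agree element by element within eps."""
    return all(abs(a - b) <= eps
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, sz]]

I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R = rotation_z(math.radians(90))
S = scaling(2.0, 1.0, 1.0)
C = rotation_z(math.radians(30))

commutes = close(mat_mul(R, S), mat_mul(S, R))
associates = close(mat_mul(mat_mul(R, S), C), mat_mul(R, mat_mul(S, C)))
identity_ok = close(mat_mul(I, R), R)
inverse_ok = close(mat_mul(R, transpose(R)), I)  # R^-1 == R^T for rotations

print(commutes)     # False: R x S != S x R
print(associates)   # True:  (R x S) x C == R x (S x C)
print(identity_ok)  # True:  I x R == R
print(inverse_ok)   # True:  R x R^-1 == I
```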
Matrix M = R x S (R - rotation, S - scale). When applying M x v, what happens FIRST?
Basic Transformations: TRS
Three fundamental 3D transformations: **Translation** (moving), **Rotation** (rotating), **Scale** (scaling). In Unity, Unreal, and Blender, every Transform component is TRS. Every GameObject, every mesh, every camera is described by this triple.
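**TRS** - the standard order: M = Translation x Rotation x Scale. First scale (S), then rotate (R), then translate (T). Read right to left. This is not merely a convention: scaling first stretches the object along its own local axes (so a non-uniform scale does not shear the rotated object), and translating last means neither the rotation nor the scale alters the translation offset.
But **translation** cannot be expressed with a 3x3 matrix. Adding an offset (x + tx, y + ty, z + tz) is not a linear operation: any 3x3 matrix maps the origin to the origin, while a translation moves it. Handling translation requires the switch to **homogeneous coordinates** and 4x4 matrices.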
| Transformation | 3x3 matrix? | 4x4 matrix? | Property |
|---|---|---|---|
| Scale | Yes | Yes | Linear |
| Rotation | Yes | Yes | Linear, preserves lengths |
| Translation | No! | Yes | Affine (not linear) |
| Perspective projection | No! | Yes | Non-linear (division by w) |
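Why translation is listed as affine rather than linear can be checked directly: a linear map f satisfies f(a + b) = f(a) + f(b) and keeps the origin fixed. The sketch below uses illustrative function names and an arbitrary offset of (3, 0, 0).

```python
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale_by_2(v):
    # Linear: equivalent to multiplying by the 3x3 matrix diag(2, 2, 2).
    return tuple(2 * x for x in v)

def translate(v, t=(3, 0, 0)):
    # Affine, not linear: adds a constant offset t.
    return add(v, t)

a = (1, 2, 3)
b = (4, 5, 6)
origin = (0, 0, 0)

scale_is_additive = scale_by_2(add(a, b)) == add(scale_by_2(a), scale_by_2(b))
translate_is_additive = translate(add(a, b)) == add(translate(a), translate(b))

print(scale_is_additive)      # True
print(translate_is_additive)  # False: the offset t is added twice on the right
print(scale_by_2(origin))     # (0, 0, 0): the origin stays put
print(translate(origin))      # (3, 0, 0): the origin moves
```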
An object is scaled by 2 (S), rotated 90 degrees (R), translated by (3,0,0) (T). How should the matrix be written?
Homogeneous Coordinates and 4x4 Matrices
Scale and rotation are linear operations (v' = M x v). Translation is not (v' = v + t). How can everything be unified into a single matrix? **Homogeneous coordinates**: a fourth coordinate **w** is added, and translation becomes a matrix multiplication on 4x4.
**Homogeneous coordinates** - representing a 3D point (x, y, z) as (x, y, z, 1) in 4D. w = 1 for points, w = 0 for directions (vectors). A 4x4 matrix encodes translation, rotation, scale, and projection in a single multiplication - this is exactly what a vertex shader does.
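A hedged sketch of what such a 4x4 chain looks like, assuming column vectors and the T x R x S order. The builder names (`translation`, `rotation_z`, `scaling`) and the sample values are chosen for illustration only.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def translation(tx, ty, tz):
    """4x4 translation: the offset lives in the last column."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(angle):
    """4x4 rotation about Z: the 3x3 rotation sits in the upper-left block."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    """4x4 scale: factors on the diagonal, w left untouched."""
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

T = translation(3.0, 0.0, 0.0)
R = rotation_z(math.pi / 2)
S = scaling(2.0, 2.0, 2.0)
M = mat_mul(T, mat_mul(R, S))   # M = T x R x S

p = [1.0, 0.0, 0.0, 1.0]        # the point (1, 0, 0) with w = 1
result = mat_vec(M, p)

# scale -> (2, 0, 0), rotate 90 degrees about Z -> (0, 2, 0), translate -> (3, 2, 0)
print(result)                   # approximately [3, 2, 0, 1]
```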
The w = 0 trick for vectors: translation does not affect directions. Physically sound: a surface normal pointing "right" stays pointing "right" regardless of where the object moves. Shaders rely on this for lighting calculations.
| w coordinate | Type | Translation affects? | Example |
|---|---|---|---|
| w = 1 | Point (position) | Yes | Mesh vertex, camera position |
| w = 0 | Direction (vector) | No | Normal vector, light direction |
| w != 0, 1 | After projection | - | Perspective divide: (x/w, y/w, z/w) |
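The three rows of the table can be illustrated with a short sketch. The matrix `P` below is a deliberately simplified toy projection that just copies z into w; it is an assumption made for this example and not the projection matrix of OpenGL, Direct3D, or any engine.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective_divide(clip):
    """Divide x, y, z by w (w must be non-zero)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

T = translation(5.0, 0.0, 0.0)

point = [1.0, 2.0, 3.0, 1.0]       # w = 1: a position
direction = [1.0, 0.0, 0.0, 0.0]   # w = 0: a direction

moved_point = mat_vec(T, point)
moved_direction = mat_vec(T, direction)
print(moved_point)       # [6.0, 2.0, 3.0, 1.0]: translated
print(moved_direction)   # [1.0, 0.0, 0.0, 0.0]: unchanged

# Toy projection (assumption): the last row copies z into w.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

near = perspective_divide(mat_vec(P, [1.0, 1.0, 2.0, 1.0]))
far = perspective_divide(mat_vec(P, [1.0, 1.0, 4.0, 1.0]))
print(near)  # (0.5, 0.5, 1.0)
print(far)   # (0.25, 0.25, 1.0): same x and y, twice as far, half the size
```

With w = 0 the last column of T is multiplied by zero, which is exactly why the translation drops out for directions.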
Every frame of every 3D game involves millions of 4x4 matrix multiplications on 4D vectors. The RTX 4090 delivers 82.6 teraflops. Homogeneous coordinates unify the entire pipeline (TRS + view + projection) into a chain of matrix multiplications. Without this mathematics there is no Cyberpunk, no Avatar, no real-time 3D.
Common misconception: the order of transformations (TRS vs SRT) does not matter - the result is the same.
In fact, order is critically important! TRS means: first Scale, then Rotate, then Translate (reading right to left). SRT: first Translate, then Rotate, then Scale. The results are completely different, because in SRT the rotation and the scale are also applied to the translation offset.
Matrix multiplication is non-commutative: A x B != B x A. With SRT, scaling is applied after translation, distorting the object's position. With TRS, scaling is first, translation is last - and it is not distorted.
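The difference is easy to see by tracking where the object's own origin ends up under each order. This is a sketch with hypothetical helper names and arbitrary sample values (scale by 2, rotate 90 degrees about Z, translate by (3, 0, 0)).

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

T = translation(3.0, 0.0, 0.0)
R = rotation_z(math.pi / 2)
S = scaling(2.0, 2.0, 2.0)

TRS = mat_mul(T, mat_mul(R, S))   # scale, then rotate, then translate
SRT = mat_mul(S, mat_mul(R, T))   # translate, then rotate, then scale

local_origin = [0.0, 0.0, 0.0, 1.0]
trs_origin = mat_vec(TRS, local_origin)
srt_origin = mat_vec(SRT, local_origin)

print(trs_origin)  # approximately [3, 0, 0, 1]: exactly the requested translation
print(srt_origin)  # approximately [0, 6, 0, 1]: the offset was rotated and doubled
```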
Direction vector (1, 0, 0) is stored as (1, 0, 0, 0). What happens when multiplied by translation matrix T?
Key Ideas
- **Dot product** = cos(angle) times lengths. Foundation of Phong lighting, backface culling, FOV
- **Cross product** = perpendicular vector. Foundation of normals, triangle orientation
- **4x4 matrices** in homogeneous coordinates combine TRS + projection into a single multiplication
- **TRS order is critical:** M = T x R x S, read right to left: Scale, then Rotate, then Translate. In general, A x B != B x A
- **w = 1** for points (translation applies), **w = 0** for directions (translation ignored)
- **Perspective divide:** dividing by w creates the perspective effect - distant objects appear smaller
Related Topics
Linear algebra is the language of all 3D graphics:
- Rasterization and Pixels — Transformed vertices in screen space are passed to the rasterizer
- Coordinate Spaces — TRS, View, Projection matrices are the transitions model->world->view->clip
- Computational Geometry — Cross product and orientation - shared foundation with comp. geometry
Questions for Reflection
- Why are GPUs optimized for 4x4 matrix multiplication rather than arbitrary matrix sizes?
- Quaternions are often used instead of matrices for rotation. What advantages do they offer?
- How would rotation around an arbitrary point (not the origin) be implemented using TRS matrices?
Related Lessons
- cg-01 — Rasterization and screen space introduced in the previous lesson
- cg-03 — Coordinate spaces model/world/view/clip follow directly from TRS and 4x4
- cgeom-01 — Cross product and orientation test - shared foundation with computational geometry
- la-06-transformations — Full theory of linear maps, change-of-basis and eigendecomposition
- arch-15-gpu-architecture — GPUs are optimized precisely for 4x4 matrix multiplication on 4D vectors
- la-05-matrices-intro — Core matrix operations if a refresher is needed