Geometry

Geometric Transformations

Lesson Objectives

Encode rotation, scale, reflection, and shear as two-by-two matrices
Use homogeneous coordinates so translation joins the matrix product
Compose maps right to left through matrix products
Tell affine and projective transforms apart by what they preserve
Spot the same primitives in NeRF, SLAM, ARKit, and torchvision augmentation

Every Vulkan or Metal frame walks each vertex through a four-by-four matrix. An RTX 4090 lands roughly one hundred million of these per second per pixel-pass before anything reaches the screen. NeRF and Tesla Autopilot run the inverse problem: rebuild a three-D scene by inverting the same matrices.

**CSS and SVG transforms (W3C, 2012):** matrix(a,b,c,d,tx,ty) is one affine three-by-three on every animated UI element
**GPU pipeline (OpenGL, Vulkan, Metal):** model, view, projection - three four-by-fours per vertex
**Tesla Autopilot and ARKit:** SLAM stitches the world from a stream of camera extrinsics, millions of matrix products per second
**torchvision.transforms.RandomAffine:** random affine warp applied to each sample, a staple of CV training pipelines
**Spatial Transformer Networks (Jaderberg, 2015):** the CNN itself learns the affine matrix to apply

Prerequisites

Coordinate geometry and plane vectors
Matrix multiplication and its core properties

Felix Klein's Erlangen Programme

In 1872 Felix Klein, on taking up his professorship at the University of Erlangen, published a programme built on a single organising idea: a geometry is the study of properties left invariant by a group of transformations. Euclidean geometry is the geometry of the rigid-motion group. Affine geometry sits one level up, dropping length and angle but keeping parallelism. Projective geometry sits higher still, keeping only collinearity. The Erlangen Programme is why a graphics engineer in 2026 still works the same hierarchy - isometry, similarity, affine, projective - that Klein laid out a hundred and fifty-four years ago.

Basic Transformations as Matrices

In a Vulkan or Metal frame, each vertex passes through three four-by-four matrices in a fixed order: model, then view, then projection, all before a single fragment hits the screen. The same machinery backs every GPU-composited UI, and the math at the core is one operation: matrix times vector.

Translation, rotation, reflection, and scaling collapse into a single primitive: multiply by a matrix (translation joins once homogeneous coordinates are in play). Stack as many transforms as needed, fold them into one product up front, and the per-vertex cost stays constant. That is exactly why the model-view-projection pipeline is three matrix products in strict order, not a chain of ad-hoc calls. CUDA warps handle thirty-two vertices in lockstep, SIMD lanes on the CPU do four or eight, and the formula on every lane is the same.

Transform	Two-by-two matrix	Parameters
Scale	[[sx, 0], [0, sy]]	sx, sy = scale factors
Rotation by θ	[[cosθ, -sinθ], [sinθ, cosθ]]	θ = angle, counter-clockwise
Reflect across X-axis	[[1, 0], [0, -1]]	Flips the sign of y; the X-axis is the mirror line
Shear	[[1, sh], [0, 1]]	sh = shear factor along X

OpenGL, Vulkan, and Metal apply matrices of exactly this kind: the vertex shader multiplies each vertex by a uniform transform matrix. On NVIDIA GPUs a warp runs thirty-two threads in lockstep on the same formula, and Apple's Metal Performance Shaders provide tuned matrix-multiplication kernels out of the box.
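The four matrices in the table can be tried out directly. The following is a minimal sketch in plain Python; the helper names (`apply`, `scale`, `rotation`, `shear_x`, `REFLECT_X`) and the sample point are illustrative choices, not part of any graphics API.

```python
import math

def apply(m, p):
    """Multiply a 2x2 matrix (nested lists) by a 2D point (x, y)."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)

def scale(sx, sy):
    return [[sx, 0.0], [0.0, sy]]

def rotation(theta):
    # Counter-clockwise rotation by theta radians.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# The X-axis is the mirror line, so the sign of y flips.
REFLECT_X = [[1.0, 0.0], [0.0, -1.0]]

def shear_x(sh):
    # Shifts x in proportion to y; y is unchanged.
    return [[1.0, sh], [0.0, 1.0]]

p = (1.0, 2.0)
scaled = apply(scale(2.0, 3.0), p)          # (2.0, 6.0)
rotated = apply(rotation(math.pi / 2), p)   # approximately (-2.0, 1.0)
reflected = apply(REFLECT_X, p)             # (1.0, -2.0)
sheared = apply(shear_x(0.5), p)            # (2.0, 2.0)
```

The rotation result is only approximately exact because `math.cos(math.pi / 2)` is a tiny non-zero float.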

The ninety-degree counter-clockwise rotation matrix sends point (3, 0) to:

Homogeneous Coordinates

Translation refuses to fit a two-by-two matrix. The map f(x) = x + t is not linear: f(0) is t, not zero. Homogeneous coordinates fix this by adding a third slot. Every two-D transform becomes a three-by-three matrix, every three-D transform becomes a four-by-four, and translation joins rotation, scale, and shear inside a single multiplication. That is why ARKit, Tesla Autopilot, and the Chromium GPU compositor all push four-by-four matrices around even on flat input: one shape of operation, zero special cases.

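**Two-D homogeneous form:** point (x, y) becomes (x, y, 1)
**Translation by (tx, ty):** [[1, 0, tx], [0, 1, ty], [0, 0, 1]]
**Scale, then rotate, then translate (T · R · S):** [[sx·cosθ, -sy·sinθ, tx], [sx·sinθ, sy·cosθ, ty], [0, 0, 1]]. A general two-D affine matrix also allows shear, for six free parameters in the top two rows.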
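As a rough check of the formulas above, the sketch below builds translation, rotation, and scale as three-by-three matrices and multiplies them in the order T · R · S. All function names and the sample values are assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Product a · b of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transform(m, point):
    """Apply a 3x3 affine matrix to a 2D point via (x, y, 1)."""
    h = (point[0], point[1], 1.0)
    out = [sum(m[i][k] * h[k] for k in range(3)) for i in range(3)]
    return (out[0], out[1])  # bottom row is (0, 0, 1), so w stays 1

def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

# Translation is now an ordinary matrix product.
moved = transform(translation(5.0, -1.0), (1.0, 2.0))  # (6.0, 1.0)

# Scale first, then rotate, then translate: T · R · S.
theta, sx, sy, tx, ty = math.pi / 2, 2.0, 3.0, 5.0, -1.0
trs = matmul(translation(tx, ty), matmul(rotation(theta), scale(sx, sy)))
# trs matches [[sx·cosθ, -sy·sinθ, tx], [sx·sinθ, sy·cosθ, ty], [0, 0, 1]]
# up to floating-point rounding.
```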

CSS `transform: matrix(a,b,c,d,tx,ty)` is exactly the homogeneous matrix [[a,c,tx],[b,d,ty],[0,0,1]]. When a React component animates its transform, the browser's GPU compositor (in Chromium and Safari alike) applies an updated matrix of this form each frame, alongside those of other layers.
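The argument order of `matrix()` is column by column, which is easy to get wrong. A small sketch of the mapping, with hypothetical helper names `css_matrix` and `apply`:

```python
def css_matrix(a, b, c, d, tx, ty):
    """3x3 homogeneous matrix for CSS matrix(a, b, c, d, tx, ty)."""
    # CSS lists the 2x2 part column by column: (a, b) is the first column.
    return [[a, c, tx],
            [b, d, ty],
            [0, 0, 1]]

def apply(m, point):
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# matrix(1, 0, 0, 1, 30, 40) is a pure translation by (30, 40).
shifted = apply(css_matrix(1, 0, 0, 1, 30, 40), (2, 3))   # (32, 43)

# matrix(0, 1, -1, 0, 0, 0) sends (1, 0) to (0, 1). On screen, where the
# CSS y-axis points down, this reads as a clockwise quarter turn.
turned = apply(css_matrix(0, 1, -1, 0, 0, 0), (1, 0))     # (0, 1)
```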

There is a second prize. Homogeneous coordinates are the native language of projective geometry. Points at infinity get the form (x, y, 0), so parallel lines meet at one ideal point on the horizon. SLAM in computer vision and pose estimation in ARKit lean on this directly: a single four-by-four encodes camera rotation, translation, and perspective at once - no glue code, no special branches.

Misconception: Homogeneous coordinates are just a trick: pad the vector with a 1

Reality: They are the move into projective space, where affine and projective maps both become ordinary linear operators

The ability to write translation as a matrix is a side effect of deeper structure. Projective points sit modulo scale: (x,y,w) ~ (kx,ky,kw). That equivalence class is what makes perspective, camera homographies, and points at infinity speak one language.
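The scale equivalence and the role of w = 0 can be seen numerically. In this sketch the matrix `H` is a made-up projective map, chosen only so that its bottom row differs from (0, 0, 1); `dehomogenize` and `apply` are illustrative helpers.

```python
def dehomogenize(p):
    """Map (x, y, w) back to the plane; None for a point at infinity."""
    x, y, w = p
    if w == 0:
        return None
    return (x / w, y / w)

def apply(h, p):
    """Multiply a 3x3 matrix by a homogeneous point (x, y, w)."""
    return tuple(sum(h[i][k] * p[k] for k in range(3)) for i in range(3))

# (x, y, w) and (kx, ky, kw) name the same point.
same_a = dehomogenize((2.0, 4.0, 1.0))    # (2.0, 4.0)
same_b = dehomogenize((6.0, 12.0, 3.0))   # (2.0, 4.0)

# Ideal point shared by every line parallel to the X-axis.
direction = (1.0, 0.0, 0.0)

# Hypothetical projective matrix: the bottom row is not (0, 0, 1).
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0]]

# The ideal point lands on a finite vanishing point.
vanishing = dehomogenize(apply(H, direction))   # (2.0, 0.0)
```

An affine matrix keeps w = 0 for ideal points, so parallel lines stay parallel; only a non-trivial bottom row can pull a point at infinity into the finite plane.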

Why bring homogeneous coordinates into two-D geometry?

Composition of Transforms

The payoff of the matrix view is composition. The product of N matrices is itself one matrix - the combined transform. Multiply the N matrices together once up front, ship the result, then apply it to every point. Robotics has run on this since 1955: Denavit-Hartenberg parameters describe each joint of a manipulator as one four-by-four, and the kinematic chain of an arm is the product of four to six of them. That product is the pose of the gripper relative to the base.

**Order is load-bearing.** Matrix multiplication does not commute: rotate-then-translate is not the same as translate-then-rotate.

M_total = M_last · ... · M_2 · M_1

A point transforms as p' = M_total · p

Read right to left: the point hits M_1 first, then M_2, and so on out to the leftmost factor.
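One way to fold a list of steps into M_total is sketched below. The `compose` helper is hypothetical; it takes the steps in application order (M_1 first) and multiplies each new step on the left, so the result matches the right-to-left reading.

```python
from functools import reduce

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product a · b of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    """Apply a 3x3 affine matrix to a 2D point via (x, y, 1)."""
    h = (point[0], point[1], 1.0)
    return tuple(sum(m[i][k] * h[k] for k in range(3)) for i in range(2))

def compose(steps):
    """Fold steps given in application order into M_n · ... · M_1."""
    # Each new step multiplies on the LEFT of the accumulated product.
    return reduce(lambda total, m: matmul(m, total), steps, IDENTITY)

scale2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]  # M_1
shift = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # M_2

m_total = compose([scale2, shift])                       # shift · scale2
one_shot = apply(m_total, (1.0, 1.0))                    # (3.0, 2.0)
step_by_step = apply(shift, apply(scale2, (1.0, 1.0)))   # (3.0, 2.0)
```

Applying `m_total` once gives the same point as applying the steps one at a time, which is the whole reason to precompute the product.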

Computer vision hides the same pattern inside camera extrinsics: the matrix [R|t] carries a point from world frame into camera frame. SLAM, as used in Tesla Autopilot and ARKit, optimises exactly these products, minimising re-projection error across many frames with solvers such as g2o and Ceres.
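A minimal sketch of the extrinsic map, assuming the common convention p_cam = R · p_world + t packed into a four-by-four. The pose values below are invented for illustration and do not come from any real camera.

```python
def extrinsic(R, t):
    """Build the 4x4 matrix [[R, t], [0, 0, 0, 1]] from 3x3 R and 3-vector t."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply(m, p):
    """Apply a 4x4 rigid transform to a 3D point via (x, y, z, 1)."""
    h = list(p) + [1.0]
    return tuple(sum(m[i][k] * h[k] for k in range(4)) for i in range(3))

# Hypothetical pose: a 90-degree rotation about Z plus a shift along Z.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]

p_world = (1.0, 2.0, 3.0)
p_cam = apply(extrinsic(R, t), p_world)   # (-2.0, 1.0, 8.0)
```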

Misconception: Order in a matrix product is just notation - rearrange as convenient

Reality: Order is geometry: M_a · M_b means b first, then a. Swapping changes the result everywhere except in narrow commuting cases

A ninety-degree rotation around the origin and a translation by (10, 0) is the textbook example: the two orderings land on different points. A robot that scrambles the order of its DH chain misses the part on the conveyor.
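The textbook example can be checked with integer matrices. The sample point (1, 0) and the helper names are illustrative choices.

```python
def matmul(a, b):
    """Product a · b of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    """Apply a 3x3 affine matrix to a 2D point via (x, y, 1)."""
    h = (point[0], point[1], 1)
    return tuple(sum(m[i][k] * h[k] for k in range(3)) for i in range(2))

ROT90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]    # 90 degrees counter-clockwise
SHIFT = [[1, 0, 10], [0, 1, 0], [0, 0, 1]]    # translate by (10, 0)

p = (1, 0)
# Rightmost factor acts first.
rotate_then_translate = apply(matmul(SHIFT, ROT90), p)   # (10, 1)
translate_then_rotate = apply(matmul(ROT90, SHIFT), p)   # (0, 11)
```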

Scaling a sprite around its centre (cx, cy) needs the order:
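One standard way to work this out is to move the centre to the origin, scale there, and move back, which reads right to left as T(cx, cy) · S · T(-cx, -cy). The sketch below uses a hypothetical `scale_about` helper and made-up sprite coordinates.

```python
def matmul(a, b):
    """Product a · b of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, point):
    """Apply a 3x3 affine matrix to a 2D point via (x, y, 1)."""
    h = (point[0], point[1], 1.0)
    return tuple(sum(m[i][k] * h[k] for k in range(3)) for i in range(2))

def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def scale_about(cx, cy, sx, sy):
    # Rightmost factor acts first: centre to origin, scale, move back.
    return matmul(translation(cx, cy),
                  matmul(scale(sx, sy), translation(-cx, -cy)))

m = scale_about(4.0, 3.0, 2.0, 2.0)
centre = apply(m, (4.0, 3.0))   # (4.0, 3.0): the centre stays put
corner = apply(m, (5.0, 3.0))   # (6.0, 3.0): distance from centre doubles
```

Scaling without the two translations would instead push the whole sprite away from the origin.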

Affine vs Projective Transforms

Affine maps keep parallel lines parallel and preserve length ratios along any one direction. Projective maps drop parallelism and keep only collinearity: a straight line stays straight, but parallel rails meet at a vanishing point on the horizon. Projective matrices sit at the heart of NeRF (Neural Radiance Fields, 2020): the network takes photos of a scene with known camera matrices, casts a ray back through each pixel, and reconstructs a three-D radiance field a ray at a time.

Class	Matrix	Preserves	Example
Isometry	Rotation + translation	Distances, angles	Rigid-body physics
Similarity	+ scale	Angles, length ratios	Map zoom
Affine	+ shear	Parallelism, area ratios	CSS transform, 2D sprites
Projective	Full 3x3, eight DOF	Collinearity	Camera homography, AR
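The "Preserves" column can be tested numerically. In the sketch below, `AFFINE` and `PROJECTIVE` are arbitrary example matrices (assumptions, not taken from any library); the check uses a two-D cross product, which is zero when two directions are parallel.

```python
def apply(h, point):
    """Apply a 3x3 matrix to a 2D point, then divide by w."""
    x, y = point
    xh, yh, w = (h[i][0] * x + h[i][1] * y + h[i][2] for i in range(3))
    return (xh / w, yh / w)

def cross(p, q, r, s):
    """2D cross product of directions p->q and r->s; zero means parallel."""
    return (q[0] - p[0]) * (s[1] - r[1]) - (q[1] - p[1]) * (s[0] - r[0])

AFFINE = [[2.0, 1.0, 3.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]]
PROJECTIVE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]

# Two parallel lines, both with direction (1, 1).
line_a = [(0.0, 0.0), (1.0, 1.0)]
line_b = [(0.0, 1.0), (1.0, 2.0)]

def parallel_after(h):
    a0, a1 = (apply(h, p) for p in line_a)
    b0, b1 = (apply(h, p) for p in line_b)
    return abs(cross(a0, a1, b0, b1)) < 1e-9

def collinear_after(h):
    p, q, r = (apply(h, pt) for pt in [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
    return abs(cross(p, q, p, r)) < 1e-9

affine_keeps_parallel = parallel_after(AFFINE)           # True
projective_keeps_parallel = parallel_after(PROJECTIVE)   # False
projective_keeps_lines = collinear_after(PROJECTIVE)     # True
```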

Spatial Transformer Networks (Jaderberg, 2015) bake this matrix into the architecture of a convolutional network: the model itself learns the affine parameters and warps the input before classification. The same affine warp, with random rather than learned parameters, ships drop-in as torchvision.transforms.RandomAffine, a common augmentation step in computer-vision training pipelines.

Which transform is NOT affine?

Key Ideas

**Rotation, scale, reflection, and shear** fit two-by-two matrices; translation needs homogeneous coordinates
**Homogeneous form:** (x, y) becomes (x, y, 1); the three-by-three matrix carries translation
**Composition:** M_total = M_n · ... · M_1, applied right to left
**Affine** preserves parallelism; **projective** preserves only collinearity

Questions for Reflection

Why does order of transformations matter? Sketch a case where rotate-then-translate lands a point in a different place than translate-then-rotate.
How is the inverse of an affine transform computed, and what does that inverse mean geometrically?
Why does WebGL keep four-by-four matrices on the wire even when the scene is purely two-D?

Related Lessons

la-07-matrix-multiply — Composition is matrix product, right to left
la-13-linear-maps — Affine maps generalise linear operators
geo-12 — Homography ships full eight-DOF projective matrix
ml-29-cnn — Spatial transformer networks plug affine layer in
cv-05 — Camera extrinsics and SLAM run on these four-by-fours
la-01-vectors-intro
