Linear Algebra
Gallery of Transformations: matrices in action
Every frame in a video game is produced by applying dozens of matrix transformations to millions of vertices. Rotation, scaling, perspective projection - all are matrix multiplications. A GPU does this in parallel for every vertex in microseconds.
- 3D graphics: Model-View-Projection - three matrices that determine where every vertex lands on the screen
- Robotics: forward kinematics of a robot arm is a chain of rotation and translation matrices
- Computer vision: coordinate transforms between camera reference frames
- Medical imaging: aligning MRI volumes requires 3D affine transformations
- Animation: skeletal animation applies a matrix to each bone
**Every `nn.Linear(in, out)` in PyTorch is a matrix multiplication (plus a bias).** A large language model such as GPT-4 performs such multiplications hundreds of times for every single token. When torchvision.transforms rotates an image before training - that is a rotation matrix. When a game engine renderer maps a 3D scene to the screen - that is three matrix multiplications (Model, View, Projection). Matrices don't just store numbers - they transform space. This lesson covers concrete matrices for concrete operations, with code that actually runs.
**What this lesson is actually about**: not a list of formulas to memorize, but the understanding that *any* linear transformation is a matrix, and vice versa. After this lesson, `torchvision.transforms.RandomRotation(30)` or `cv2.warpAffine` stop being black boxes.
How to read a transformation matrix
A 2×2 transformation matrix is fully determined by **where it sends the basis vectors** i = (1,0) and j = (0,1). The first column of the matrix is the image of i, the second is the image of j:
Matrix A = [a, b; c, d] (rows separated by semicolons):

- i = (1, 0) -> A·i = (a, c), the first column
- j = (0, 1) -> A·j = (b, d), the second column
- Any vector v = (x, y) = x·i + y·j, so A·v = x·(a, c) + y·(b, d) = (ax + by, cx + dy)

This always holds, for any matrix in any dimension: knowing where the basis vectors go tells everything about the transformation.
**Reading ML code**: `nn.Linear(4, 3)` stores a weight matrix W of shape (3, 4). During the forward pass, `out = W @ x + b` (PyTorch evaluates it as `x @ W.T + b`): x is mapped from a 4-dimensional space to a 3-dimensional one. The columns of W are the images of the unit vectors of the input space, before the bias is added.
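The "columns are images of basis vectors" rule can be checked numerically. The following is a minimal NumPy sketch; the matrix `A` and the vector `v` are arbitrary values chosen for illustration, not taken from the lesson.

```python
import numpy as np

# Hypothetical example matrix; any 2x2 matrix behaves the same way.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

i = np.array([1.0, 0.0])
j = np.array([0.0, 1.0])

# The images of the basis vectors are exactly the columns of A.
image_i = A @ i   # first column of A
image_j = A @ j   # second column of A

# Any vector is a combination of i and j, so its image is the same
# combination of the two columns.
v = np.array([4.0, -2.0])
direct = A @ v
from_columns = v[0] * image_i + v[1] * image_j

print(image_i, image_j)
print(direct, from_columns)
```

Both printed results on the last line agree, which is the whole point: once the two columns are known, the image of every other vector is determined.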
1. Scaling
A diagonal matrix with values s on the diagonal stretches or compresses space along each axis. Each number is an independent "slider" for its own axis:
**Uniform** (same factor along all axes, k = 2): [2, 0; 0, 2]

- i = (1, 0) -> (2, 0)
- j = (0, 1) -> (0, 2)
- all lengths are doubled

**Non-uniform** (different factor per axis): [sₓ, 0; 0, sᵧ] stretches X by sₓ and Y by sᵧ.

**In CNN layers**: average pooling over a 2×2 window multiplies the four flattened inputs by the row vector 0.25 · [1, 1, 1, 1]. Strictly, this is an averaging (projection-like) map rather than a scaling, but the idea is the same: fixed weights applied linearly.

**PyTorch**: `nn.Upsample(scale_factor=2)` scales the spatial grid of a feature map by 2 along each axis.
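A short NumPy sketch of the same ideas follows. The helper `scaling` and the sample values are illustrative assumptions. The last part treats average pooling of a single 2×2 window as a dot product with weights of 0.25, which is a simplification of what a pooling layer does across a whole feature map.

```python
import numpy as np

def scaling(sx, sy):
    """Return the 2x2 matrix that scales X by sx and Y by sy."""
    return np.array([[sx, 0.0],
                     [0.0, sy]])

# Corners of the unit square, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

S = scaling(2.0, 0.5)
scaled = S @ square                 # each column is a scaled corner
area_factor = np.linalg.det(S)      # product of the diagonal: 2.0 * 0.5

# Average pooling of one 2x2 window as a linear map on the flattened window.
window = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
pool_weights = 0.25 * np.ones(4)
pooled = pool_weights @ window.ravel()   # mean of the four values

print(scaled)
print(area_factor, pooled)
```

Stretching X by 2 while compressing Y by 0.5 changes the shape but leaves the area factor at 1, because the determinant of a diagonal matrix is the product of its diagonal entries.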
2. Rotation
For a counterclockwise rotation by angle θ, basis vector i = (1,0) must map to (cos θ, sin θ), and j = (0,1) must map to (-sin θ, cos θ). Placing these two images in the columns gives the matrix:
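R(θ) = [cos θ, -sin θ; sin θ, cos θ]

Examples:

- 90°: R = [0, -1; 1, 0], so (1,0) -> (0,1) and (0,1) -> (-1,0)
- 180°: R = [-1, 0; 0, -1], so (1,0) -> (-1,0) and (0,1) -> (0,-1)
- 45°: R = [0.707, -0.707; 0.707, 0.707], since cos 45° = sin 45° ≈ 0.707

The formula is the same in Swift, C++, or Python. Rotations in torchvision and OpenCV are built from this same matrix, combined with a translation so that the image turns about a chosen center. Because the image y-axis points down, the sign conventions can look flipped.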
**Data augmentation in ML**: torchvision.transforms.RandomRotation, RandomHorizontalFlip - all of these are matrix operations on pixel coordinates. Random rotations of ±30 degrees, resampled every time an image is loaded, make a network such as ResNet more robust to object orientation. Without this, the network often generalizes poorly to rotated examples.
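The rotation matrix is easy to build and verify by hand. This is a minimal NumPy sketch; the helper name `rotation` is an assumption for illustration, and it rotates counterclockwise in a standard y-up coordinate system rather than in image coordinates.

```python
import numpy as np

def rotation(theta_deg):
    """2x2 counterclockwise rotation matrix for an angle in degrees."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s],
                     [s,  c]])

R90 = rotation(90)
i_image = R90 @ np.array([1.0, 0.0])   # approximately (0, 1)
j_image = R90 @ np.array([0.0, 1.0])   # approximately (-1, 0)

R30 = rotation(30)
is_orthogonal = np.allclose(R30.T @ R30, np.eye(2))   # lengths and angles kept
det_R30 = np.linalg.det(R30)                          # area and orientation kept

# Rotating by 30 and then by 60 degrees equals one rotation by 90 degrees.
combined = rotation(60) @ rotation(30)

print(np.round(i_image, 3), np.round(j_image, 3))
print(is_orthogonal, round(det_R30, 6))
```

The check `R.T @ R = I` is a quick test that a matrix preserves lengths. Together with a determinant of 1, it identifies a pure rotation.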
3. Reflection
Reflection mirrors space across a line. It is not a rotation within the plane: it reverses orientation, so its determinant is -1. The matrices are simple: just flip the sign of the target coordinate. `RandomHorizontalFlip` in PyTorch corresponds to the reflection matrix across the Y-axis:
| Reflection | Matrix | Effect |
|---|---|---|
| Across Y-axis (left-right) | [-1, 0; 0, 1] | x -> -x, mirror |
| Across X-axis (top-bottom) | [1, 0; 0, -1] | y -> -y, vertical flip |
| Across diagonal y=x | [0, 1; 1, 0] | x <-> y, coordinate swap |
| Through the origin | [-1, 0; 0, -1] | Same as rotation by 180 degrees (in 2D, det = +1) |
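**RandomHorizontalFlip in PyTorch** is equivalent to applying matrix [-1, 0; 0, 1] to the coordinates of every pixel, measured from the image's vertical center line (in practice it simply reverses the column order). It is one of the simplest and cheapest augmentations: it roughly doubles the effective dataset size for objects whose label survives mirroring (cats, cars, faces).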
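To make the link between the reflection matrix and an image flip concrete, the sketch below flips a tiny array two ways: by reversing its columns, and by applying [-1, 0; 0, 1] to pixel coordinates centered on the vertical midline. The 2×3 "image" is made-up data, and the explicit loop is for clarity only. It is not how torchvision implements the flip.

```python
import numpy as np

flip_y = np.array([[-1.0, 0.0],
                   [ 0.0, 1.0]])    # reflection across the Y-axis: x -> -x

# A tiny made-up 2x3 "image".
img = np.array([[1, 2, 3],
                [4, 5, 6]])

# Usual shortcut: reversing the column order mirrors left-right.
flipped = img[:, ::-1]

# The same flip through coordinates: center x, reflect, shift back.
h, w = img.shape
cx = (w - 1) / 2.0
out = np.empty_like(img)
for row in range(h):
    for col in range(w):
        x_new, _ = flip_y @ np.array([col - cx, float(row)])
        out[row, int(round(x_new + cx))] = img[row, col]

det_flip = np.linalg.det(flip_y)    # -1: orientation is reversed
twice = flip_y @ flip_y             # reflecting twice restores the original

print(out)
print(det_flip)
```

Both routes produce the same array, and applying the reflection twice gives the identity matrix.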
4. Shear
**Shear** (skewing, slanting): one basis vector stays fixed while the other "slides" parallel to it. A square becomes a parallelogram:
**Horizontal shear** (k = 0.7): [1, 0.7; 0, 1]

- i = (1,0) -> (1,0), unchanged
- j = (0,1) -> (0.7,1), shifted right
- Effect: the tip of j slides k units to the right (tilt to the right)

**Vertical shear**: [1, 0; k, 1]

- i = (1,0) -> (1,k), shifted upward
- j = (0,1) -> (0,1), unchanged

The determinant of a shear is 1, so area is unchanged.

**In fonts**: a synthesized italic (oblique) is a horizontal shear of the letters to the right, for example [1, 0.2; 0, 1].

**torchvision augmentation**: `transforms.RandomAffine(degrees=0, shear=15)` applies a random shear of up to 15 degrees, which is useful when training on slanted text (OCR).
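The claim that a shear changes shape but not area can be verified directly. In this sketch the helper `polygon_area` (the shoelace formula) is an illustrative addition. The shear factor k = 0.7 is the one used in the example above.

```python
import numpy as np

k = 0.7
H = np.array([[1.0, k],
              [0.0, 1.0]])          # horizontal shear

# Corners of the unit square in order, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
parallelogram = H @ square

def polygon_area(pts):
    """Shoelace formula; pts has shape (2, n) with vertices in order."""
    x, y = pts
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

area_before = polygon_area(square)
area_after = polygon_area(parallelogram)
det_H = np.linalg.det(H)

print(parallelogram)
print(area_before, area_after, det_H)
```

The bottom edge of the square stays in place while the top edge slides 0.7 units to the right, and the area remains 1.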
5. Projection
Projection is a special case: the space **collapses** into a lower dimension. The determinant of such a matrix is zero - information is lost irreversibly. In ML, projection is everywhere - and it is always a deliberate loss:
**Projection onto the X-axis**: [1, 0; 0, 0] maps (x, y) -> (x, 0). The y-component is destroyed; det = 0, so no inverse exists.

**Projection onto the Y-axis**: [0, 0; 0, 1] maps (x, y) -> (0, y). The x-component is destroyed; det = 0.

**Projection onto the line y = x**: [0.5, 0.5; 0.5, 0.5] maps (x, y) -> ((x+y)/2, (x+y)/2), onto the diagonal; det = 0.

**In ML**:

- PCA projection onto k principal components = matrix Vk · Vkᵀ
- Dropout = random zeroing mask (like a projection onto coordinate axes, but stochastic)
- Attention softmax maps scores into the probability simplex; it is nonlinear, so it is a projection only in a loose sense, not a matrix
**PCA is an orthogonal projection**: centered data X is mapped to coordinates in the principal component subspace via X_low = X · V. This is a matrix multiplication - a linear transformation. The columns of V are the top-k eigenvectors of the covariance matrix - they form an orthonormal basis for the new space. Multiplying back, X · V · Vᵀ, gives the projected points in the original coordinates.
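Two properties define a projection in practice: applying it twice changes nothing, and its determinant is zero. The sketch below checks both for the line y = x, then builds a small PCA projection from the eigenvectors of a covariance matrix. The data is synthetic and seeded, and the choice of 2 components is arbitrary. This is a bare-bones illustration, not a replacement for a library PCA.

```python
import numpy as np

# Projection onto the line y = x.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
v = np.array([3.0, 1.0])
once = P @ v                  # lands on the diagonal
twice = P @ once              # projecting again changes nothing
det_P = np.linalg.det(P)      # 0: the map cannot be inverted

# PCA sketch on synthetic 3D data with very different spreads per axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)                      # center the data first
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
Vk = eigvecs[:, ::-1][:, :2]                 # top-2 components, shape (3, 2)

X_low = Xc @ Vk               # coordinates in the new basis, shape (200, 2)
P_pca = Vk @ Vk.T             # 3x3 projection matrix onto the subspace
X_proj = Xc @ P_pca           # projected points, still in 3D coordinates

print(once, twice, det_P)
print(X_low.shape, np.linalg.matrix_rank(P_pca))
```

`X_low` holds the compressed representation, while `X_proj` shows where those points sit in the original space. The rank of `P_pca` equals the number of components kept.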
6. Affine transformations: translation via homogeneous coordinates
Translation (shifting the origin) is **not** a linear transformation: a linear map always sends the origin to itself, so translation cannot be written as a 2×2 matrix. But by switching to **homogeneous coordinates** (appending a 1), translation becomes a 3×3 matrix. This is the standard in computer graphics and robotics:
**Homogeneous coordinates**: (x, y) -> (x, y, 1)

**Translation** by vector (tx, ty):

[1, 0, tx; 0, 1, ty; 0, 0, 1] · (x, y, 1) = (x + tx, y + ty, 1)

**Rotation + translation** (affine transformation), which rotates and then translates:

[cos θ, -sin θ, tx; sin θ, cos θ, ty; 0, 0, 1]

**Important**: matrix multiplication composes multiple transformations in sequence. This is exactly how Model-View-Projection works in OpenGL, Vulkan, Three.js: a chain of 4×4 matrices.
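A small NumPy sketch shows how the extra coordinate turns translation into a matrix product. The helper names `translation` and `rotation_h` are assumptions made for this example, and the numbers are arbitrary.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous matrix that shifts a 2D point by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation_h(theta_deg):
    """3x3 homogeneous matrix for a counterclockwise 2D rotation."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])            # the point (1, 0) with a 1 appended

moved = translation(2.0, 3.0) @ p        # (1, 0) shifted by (2, 3)

# Rotate by 90 degrees, then translate: the rightmost matrix acts first.
M = translation(2.0, 3.0) @ rotation_h(90)
q = M @ p                                # (1, 0) -> (0, 1) -> (2, 4)

print(moved)
print(np.round(M, 3))
print(np.round(q, 3))
```

The combined matrix `M` has the rotation in its upper-left 2×2 block and the translation in its last column, which matches the affine form given above.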
Composing transformations: matrix multiplication
Applying transformations in sequence is matrix multiplication. Scale first, then rotate = matrix R · S, because the matrix nearest the vector acts first. The order matters - matrix multiplication is **non-commutative**.
**The order of matrix multiplication is critical**: in general, R @ S != S @ R. In robotics and 3D graphics, a wrong transformation order is one of the most common bugs. In PyTorch: `transforms.Compose([transforms.RandomRotation(30), transforms.RandomHorizontalFlip()])` applies operations in list order, left to right (rotation first, then flip) - the reverse of how the same chain reads as a matrix product, F @ R.
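The effect of order is easiest to see on one vector. This sketch uses a 90-degree rotation and a non-uniform scale chosen for illustration. The final check shows a case where order happens not to matter: uniform scaling commutes with rotation.

```python
import numpy as np

t = np.deg2rad(90)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])   # rotate 90 degrees counterclockwise
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                # stretch X by 2

v = np.array([1.0, 0.0])

# R @ S applies S first: (1, 0) -> (2, 0) -> (0, 2)
scale_then_rotate = (R @ S) @ v
# S @ R applies R first: (1, 0) -> (0, 1) -> (0, 1)
rotate_then_scale = (S @ R) @ v

# Uniform scaling is a multiple of the identity, so it commutes with R.
U = 2.0 * np.eye(2)
uniform_commutes = np.allclose(R @ U, U @ R)

print(np.round(scale_then_rotate, 3), np.round(rotate_then_scale, 3))
print(uniform_commutes)
```

The same two operations give a vector of length 2 in one order and length 1 in the other, which is why transformation order is worth checking first when a rendered object or robot frame ends up in the wrong place.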
Linear layer in a neural network = matrix multiplication
The most important application of linear transformations in ML is **every Linear layer** in a neural network. `nn.Linear(in_features=4, out_features=3)` stores a weight matrix W of shape (3, 4) and a bias b of shape (3,). Forward pass: out = W·x + b (for a batch of row vectors, PyTorch evaluates this as `x @ W.T + b`).
**Transformer block**: each attention head computes Q = W_Q·X, K = W_K·X, V = W_V·X - three separate matrix multiplications. Then come the feed-forward layers with W1 and W2. GPT-3 (175B parameters) has 96 layers with 96 heads each; the architecture of GPT-4 has not been published. Either way, that is literally thousands of matrix multiplications per forward pass.
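The forward pass of a Linear layer can be reproduced in a few lines of NumPy. This is a sketch of the arithmetic only, with random weights and a hypothetical helper named `linear`. It does not use PyTorch and leaves out everything else `nn.Linear` does (parameter registration, autograd, initialization).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))     # same shape nn.Linear(4, 3) would store
b = rng.normal(size=3)

def linear(x, W, b):
    """Affine map W·x + b for one vector or a batch of row vectors."""
    return x @ W.T + b

x = rng.normal(size=4)
out = linear(x, W, b)                   # shape (3,)

batch = rng.normal(size=(5, 4))
out_batch = linear(batch, W, b)         # shape (5, 3), one row per sample

# Two stacked linear layers with no activation collapse into a single one.
W2 = rng.normal(size=(2, 3))
b2 = rng.normal(size=2)
stacked = linear(linear(x, W, b), W2, b2)
W_merged = W2 @ W                       # shape (2, 4)
b_merged = W2 @ b + b2
merged = linear(x, W_merged, b_merged)

print(out.shape, out_batch.shape)
print(np.allclose(stacked, merged))
```

The last check is the reason activations exist: without a nonlinearity between them, any depth of Linear layers is equivalent to one matrix and one bias.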
Summary table: what each transformation preserves
| Transformation | 2D Matrix | det | Preserves |
|---|---|---|---|
| Rotation | [cos θ, -sin θ; sin θ, cos θ] | 1 | Lengths, angles, area, orientation |
| Reflection | [-1, 0; 0, 1] | -1 | Lengths, angles, area (reverses orientation) |
| Uniform scale by k | [k, 0; 0, k] | k² | Angles, proportions |
| Shear | [1, k; 0, 1] | 1 | Area, lines parallel to the shear axis |
| Projection onto X | [1, 0; 0, 0] | 0 | Only the x-coordinate (y is lost forever) |
**det is the key metric**: |det| shows by how much area changed. det = 0 means information was lost (no inverse). det < 0 means orientation flipped (left-handed system).
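The determinant column of the table can be reproduced numerically. In this sketch the angle (30 degrees) and the factor k = 3 are arbitrary choices, and `preserves_lengths` is a hypothetical helper that tests whether a matrix is orthogonal.

```python
import numpy as np

t = np.deg2rad(30)
k = 3.0

matrices = {
    "rotation":      np.array([[np.cos(t), -np.sin(t)],
                               [np.sin(t),  np.cos(t)]]),
    "reflection":    np.array([[-1.0, 0.0], [0.0, 1.0]]),
    "uniform scale": np.array([[k, 0.0], [0.0, k]]),
    "shear":         np.array([[1.0, k], [0.0, 1.0]]),
    "projection":    np.array([[1.0, 0.0], [0.0, 0.0]]),
}

def preserves_lengths(M):
    """True when M is orthogonal, i.e. it keeps all lengths and angles."""
    return np.allclose(M.T @ M, np.eye(2))

dets = {name: float(np.linalg.det(M)) for name, M in matrices.items()}

for name, M in matrices.items():
    print(f"{name:14s} det = {dets[name]:6.2f}  "
          f"preserves lengths: {preserves_lengths(M)}")
```

Only rotation and reflection pass the length test. Shear keeps area (det = 1) but still distorts lengths and angles, so the determinant alone does not tell the whole story.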
Linear transformations in real systems
From formula to production
| Component | Role | Details |
|---|---|---|
| PyTorch nn.Linear / nn.Conv2d | W·x + b - affine map with learned weights | The linear, attention, and convolution layers in GPT, ResNet, BERT all reduce to matrix multiplications |
| torchvision.transforms / OpenCV warpAffine | Affine matrices for image augmentation | RandomRotation, RandomHorizontalFlip, RandomAffine - data augmentation during training |
| Three.js / Unity / Unreal Engine | Model-View-Projection chain of 4×4 matrices | Every 3D frame - three matrix multiplications for every vertex |
| ROS / Boston Dynamics / Tesla Autopilot | Coordinate frame transformations (TF frames) | Robot arm position = chain of affine matrices, one per joint |
| PCA / SVD | Projection onto principal components | Orthogonal matrix V as basis of the new space |
Practice: rotation matrix
Interview questions
How can a transformation matrix be checked for whether it preserves area?
- det(A) is the area scaling factor (volume in 3D)
- |det| = 1: area preserved (rotation, reflection, shear)
- |det| = k²: area changes by k² (uniform scale by k)
- det = 0: area collapses to zero, matrix not invertible (projection)
- det < 0: orientation reversed (mirror)
Why is nn.Linear a linear transformation, and what does this mean for backpropagation?
- Linearity: f(ax + by) = a·f(x) + b·f(y) holds for W·x + b only when b = 0; otherwise the layer is affine
- Gradient of the loss with respect to W: dL/dW = (dL/dout) · xᵀ, an outer product
- Gradient with respect to the input: dL/dx = Wᵀ · (dL/dout), so the backward pass is again a matrix multiplication
- Activations (ReLU, GELU) add nonlinearity - without them a stack of Linear layers collapses to one
Why does the order of matrix multiplications matter when composing transformations?
- Matrix multiplication is NOT commutative: A @ B != B @ A in general
- R @ S applies S first (the right matrix acts on the vector first)
- Scale then rotate = R @ S, rotate then scale = S @ R - different results
- In 3D graphics the order is: first Model (place the object in the world), then View (camera), then Projection (screen), written as P @ V @ M
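For the backpropagation question above, the outer-product formula for the weight gradient can be checked against finite differences. This is a sketch under stated assumptions: a single input vector, a squared-error loss chosen purely for illustration, and random values for W, b, x and the target.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)
target = rng.normal(size=3)

def loss(W_candidate):
    """Squared-error loss of the layer output against a fixed target."""
    out = W_candidate @ x + b
    return 0.5 * np.sum((out - target) ** 2)

# Analytic gradients for this loss.
out = W @ x + b
grad_out = out - target             # dL/dout, shape (3,)
grad_W = np.outer(grad_out, x)      # dL/dW = (dL/dout) · xᵀ, shape (3, 4)
grad_x = W.T @ grad_out             # dL/dx, shape (4,)

# Numerical gradient by central differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for r in range(W.shape[0]):
    for c in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[r, c] += eps
        W_minus[r, c] -= eps
        numeric[r, c] = (loss(W_plus) - loss(W_minus)) / (2 * eps)

print(grad_W.shape, grad_x.shape)
print(np.allclose(grad_W, numeric, atol=1e-5))
```

The gradient has the same shape as W, and each entry is the product of one output error and one input value. That is exactly what an outer product computes.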
Takeaways from this lesson
- **Matrix = transformation**: the columns are images of basis vectors - everything follows from this
- **Rotation**: [cos θ, -sin θ; sin θ, cos θ] - lengths and area preserved, det = 1
- **Projection**: det = 0, information lost forever - the basis of PCA, dropout, average pooling
- **Composing**: R @ S means S is applied first - order is critical
- **nn.Linear = matrix multiplication**: W·x + b - every Linear layer is one matrix plus a bias
- **Data augmentation**: RandomRotation, RandomHorizontalFlip - rotation and reflection matrices applied to pixel coordinates
- **Affine transformations**: translation + rotation + scale in homogeneous coordinates - the standard in 3D graphics and robotics
What comes next
Transformations are the visual foundation for deeper concepts
- Inverse matrix: undoing a transformation, and when it is possible to return to the original space
- Eigenvectors: directions that the transformation does not rotate - only stretches
- SVD decomposition: any matrix decomposes into rotation (or reflection) * scale * rotation (or reflection) - three transformations
- Gaussian elimination: algorithm for finding the inverse transformation via elementary operations