Linear Algebra

Gallery of Transformations: matrices in action

Every frame in a video game is produced by applying dozens of matrix transformations to millions of vertices. Rotation, scaling, perspective projection - all are matrix multiplications. A GPU does this in parallel for every vertex in microseconds.

3D graphics: Model-View-Projection - three matrices determining where every vertex lands on the screen
Robotics: forward kinematics of a robot arm is a chain of rotation and translation matrices
Computer vision: coordinate transforms between camera reference frames
Medical imaging: aligning MRI volumes requires 3D affine transformations
Animation: skeletal animation applies a matrix to each bone

**Every `nn.Linear(in, out)` in PyTorch is a matrix multiplication (plus a bias).** A large language model such as GPT performs hundreds of them for every single token. When `torchvision.transforms` rotates an image before training, it applies a rotation matrix to the pixel coordinates. When a game engine renderer maps a 3D scene to the screen, it multiplies three matrices (Model, View, Projection). Matrices don't just store numbers - they transform space. This lesson covers concrete matrices for concrete operations, with code that actually runs.

**What this lesson is actually about**: not a list of formulas to memorize, but the understanding that *any* linear transformation is a matrix, and vice versa. After this lesson, `torchvision.transforms.RandomRotation(30)` or `cv2.warpAffine` stop being black boxes.

How to read a transformation matrix

A 2×2 transformation matrix is fully determined by **where it sends the basis vectors** i = (1,0) and j = (0,1). The first column of the matrix is the image of i, the second is the image of j:

Matrix A = [a, b; c, d]

i = (1, 0) -> A·i = (a, c), the first column
j = (0, 1) -> A·j = (b, d), the second column

Any vector v = (x, y) = x·i + y·j, so A·v = x·(a, c) + y·(b, d) = (ax + by, cx + dy).

This always holds, for any matrix in any dimension: knowing where the basis vectors go tells everything about the transformation.
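The column rule can be checked numerically. The following is a minimal NumPy sketch; the matrix `A` and the vector `v` are arbitrary values chosen for illustration, not taken from the lesson.

```python
import numpy as np

# An arbitrary 2x2 matrix chosen for illustration.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

i = np.array([1.0, 0.0])
j = np.array([0.0, 1.0])

image_of_i = A @ i  # equals the first column of A
image_of_j = A @ j  # equals the second column of A

# Any vector is x*i + y*j, so its image is x*A(i) + y*A(j).
x, y = 4.0, -2.0
v = np.array([x, y])
via_matrix = A @ v
via_columns = x * image_of_i + y * image_of_j

print("A·i =", image_of_i, " A·j =", image_of_j)
print("A·v =", via_matrix, " from columns:", via_columns)
```

Both ways of computing the image of `v` agree, which is exactly the statement that the columns determine the whole transformation.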

**Reading ML code**: `nn.Linear(4, 3)` stores a weight matrix W of shape (3, 4). During the forward pass, `out = W @ x + b`: x is mapped from a 4-dimensional space to a 3-dimensional one. The columns of W are the images of the unit vectors of the input space (before the bias is added).

1. Scaling

A diagonal matrix with values s on the diagonal stretches or compresses space along each axis. Each number is an independent "slider" for its own axis:

Uniform (the same factor along all axes, k = 2): [2, 0; 0, 2]

i = (1, 0) -> (2, 0)
j = (0, 1) -> (0, 2)
All lengths are doubled.

Non-uniform (a different factor per axis): [sₓ, 0; 0, sᵧ] stretches along X by sₓ and along Y by sᵧ.

In CNN layers: 2×2 average pooling is also a linear map, but it is an averaging rather than a scaling: each output is 0.25·(p1 + p2 + p3 + p4), that is, the 1×4 matrix 0.25·[1, 1, 1, 1] applied to the four pixels of a window.

In PyTorch: `nn.Upsample(scale_factor=2)` scales the spatial grid of a feature map by a factor of 2.
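As a small illustration of the "slider per axis" idea, here is a hedged NumPy sketch that scales the corners of the unit square. The helper name `scaling` and the factors 2.0 and 0.5 are assumptions made for this example.

```python
import numpy as np

def scaling(sx, sy):
    """2D scaling matrix: factor sx along x, factor sy along y."""
    return np.diag([sx, sy])

# Corners of the unit square, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

S = scaling(2.0, 0.5)
scaled = S @ square               # x doubled, y halved
area_factor = np.linalg.det(S)    # product of the diagonal entries

print(scaled)
print("area factor:", area_factor)
```

The determinant of a diagonal matrix is the product of its diagonal entries, so stretching x by 2 while squeezing y by 0.5 leaves the area unchanged.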

2. Rotation

The rotation matrix for angle θ follows from the condition that basis vector i = (1,0) must map to (cos θ, sin θ). Vector j = (0,1) maps to (-sin θ, cos θ). The matrix is constructed automatically from these two conditions:

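R(θ) = [cos θ, -sin θ; sin θ, cos θ]

Examples:
90°: R = [0, -1; 1, 0], so (1, 0) -> (0, 1) and (0, 1) -> (-1, 0)
180°: R = [-1, 0; 0, -1], so (1, 0) -> (-1, 0) and (0, 1) -> (0, -1)
45°: R = [0.707, -0.707; 0.707, 0.707], because cos 45° = sin 45° ≈ 0.707

The formula is the same in Swift, C++, or Python. Rotations in torchvision and OpenCV are built on this matrix, combined with a translation so that the image rotates about its center; because the image y-axis points down, the signs of the sin θ entries can differ from the textbook form.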
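The rotation matrix above can be built and checked in a few lines. This is a minimal NumPy sketch; the helper name `rotation` is an assumption made for this example.

```python
import math
import numpy as np

def rotation(theta_deg):
    """Counter-clockwise 2D rotation matrix for an angle given in degrees."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return np.array([[c, -s],
                     [s,  c]])

R90 = rotation(90)
R45 = rotation(45)

rotated_i = R90 @ np.array([1.0, 0.0])  # approximately (0, 1)

# A rotation preserves lengths and angles: R^T R = I and det R = 1.
is_orthogonal = np.allclose(R45.T @ R45, np.eye(2))
det_R45 = np.linalg.det(R45)

# Composing two rotations adds their angles.
angles_add = np.allclose(rotation(30) @ rotation(15), R45)

print(np.round(rotated_i, 6), is_orthogonal, round(det_R45, 6), angles_add)
```

Results are compared with `np.allclose` because `cos(90°)` evaluates to a tiny nonzero number in floating point rather than exactly 0.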

**Data augmentation in ML**: `torchvision.transforms.RandomRotation` and `RandomHorizontalFlip` are matrix operations on pixel coordinates. Random rotations of ±30 degrees, resampled every time an image is loaded, help a network such as ResNet become robust to object orientation. Without them, the network may generalize poorly to rotated examples.

3. Reflection

A reflection mirrors space across a line. Unlike a rotation, it reverses orientation, so its determinant is -1. The matrices are simple: just flip the sign of the target coordinate. `RandomHorizontalFlip` in PyTorch corresponds to the reflection matrix across the Y-axis:

Reflection	Matrix	Effect
Across Y-axis (left-right)	[-1, 0; 0, 1]	x -> -x, mirror
Across X-axis (top-bottom)	[1, 0; 0, -1]	y -> -y, vertical flip
Across diagonal y=x	[0, 1; 1, 0]	x <-> y, coordinate swap
Through the origin	[-1, 0; 0, -1]	Same as rotation by 180 degrees

**RandomHorizontalFlip in PyTorch** is equivalent to multiplying the coordinates of every pixel by the matrix [-1, 0; 0, 1], with x measured from the vertical centerline of the image. It is one of the simplest and cheapest augmentations: for objects whose mirror image is equally plausible (cats, cars, faces), it effectively doubles the dataset.
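To make the link between the reflection matrix and an image flip concrete, here is a hedged NumPy sketch on a tiny 2×3 array. It assumes pixel coordinates (x, y) = (column, row) with the origin at the top-left, so the reflection is followed by a shift of `width - 1` to land back on the pixel grid; it illustrates the idea and is not how torchvision implements the flip internally.

```python
import numpy as np

flip_y_axis = np.array([[-1.0, 0.0],
                        [ 0.0, 1.0]])  # x -> -x, y unchanged

# A tiny 2x3 "image" chosen for illustration.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
height, width = img.shape

flipped = np.empty_like(img)
for row in range(height):
    for col in range(width):
        # Reflect the coordinate (x, y) = (col, row), then shift by width - 1
        # so the mirrored pixel lands back inside the grid.
        x_new, y_new = flip_y_axis @ np.array([col, row]) + np.array([width - 1, 0])
        flipped[int(round(y_new)), int(round(x_new))] = img[row, col]

det_flip = np.linalg.det(flip_y_axis)  # -1: orientation is reversed

print(flipped)
print(np.array_equal(flipped, np.fliplr(img)), det_flip)
```

The per-pixel loop is only for clarity; in practice `np.fliplr(img)` or slicing with `img[:, ::-1]` produces the same array.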

4. Shear

**Shear** (skewing, slanting): one basis vector stays fixed while the other "slides" along the axis. A square becomes a parallelogram:

Horizontal shear (k = 0.7): [1, 0.7; 0, 1]

i = (1, 0) -> (1, 0), unchanged
j = (0, 1) -> (0.7, 1), shifted right
Effect: j slides k units to the right (a tilt to the right).

Vertical shear: [1, 0; k, 1]

i = (1, 0) -> (1, k), shifted upward
j = (0, 1) -> (0, 1), unchanged

The determinant of a shear is 1, so area is unchanged.

In fonts: a synthesized (oblique) italic is a horizontal shear of the letters to the right, for example [1, 0.2; 0, 1].

Torchvision augmentation: `transforms.RandomAffine(degrees=0, shear=15)`, used for example when training on slanted text (OCR).
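The parallelogram can be computed directly. This is a minimal NumPy sketch using the k = 0.7 horizontal shear from the text; the helper name `shear_x` is an assumption made for this example.

```python
import numpy as np

def shear_x(k):
    """Horizontal shear: x -> x + k*y, y unchanged."""
    return np.array([[1.0, k],
                     [0.0, 1.0]])

# Corners of the unit square, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

H = shear_x(0.7)
sheared = H @ square         # bottom edge stays put, top edge slides right by 0.7
det_H = np.linalg.det(H)     # 1: the parallelogram has the same area as the square

print(sheared)
print("det:", det_H)
```

The bottom corners (y = 0) do not move, while the top corners (y = 1) shift right by k, which is why the square leans without changing its base or height.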

5. Projection

Projection is a special case: the space **collapses** into a lower dimension. The determinant of such a matrix is zero - information is lost irreversibly. In ML, projection is everywhere - and it is always a deliberate loss:

Projection onto the X-axis: [1, 0; 0, 0]
(x, y) -> (x, 0); the y-component is destroyed, det = 0, no inverse exists.

Projection onto the Y-axis: [0, 0; 0, 1]
(x, y) -> (0, y); the x-component is destroyed, det = 0.

Projection onto the line y = x: [0.5, 0.5; 0.5, 0.5]
(x, y) -> ((x + y)/2, (x + y)/2), onto the diagonal; det = 0.

In ML:
PCA projection onto k principal components = matrix Vk · Vkᵀ
Dropout = a random zeroing mask (like a projection, but stochastic)
Attention softmax maps scores into the probability simplex; it is nonlinear, so it is a projection only in a loose sense and not a matrix

**PCA is an orthogonal projection**: centered data X is projected onto the principal component subspace via X_low = X · V, which gives the coordinates in the new basis; X · V · Vᵀ gives the same points back in the original space. This is a matrix multiplication - a linear transformation. The columns of V are eigenvectors of the covariance matrix; they form an orthonormal basis for the new space.
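Both ideas, projection onto a line and PCA as projection onto a subspace, can be sketched with NumPy. The data below is synthetic and generated only for illustration, and the PCA is done by hand with `np.linalg.eigh` rather than with any particular library's PCA class.

```python
import numpy as np

# Projection onto the line y = x: P = u u^T for the unit vector u along the line.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
P = np.outer(u, u)                    # approximately [[0.5, 0.5], [0.5, 0.5]]

v = np.array([3.0, 1.0])
projected = P @ v                     # ((3 + 1)/2, (3 + 1)/2)
idempotent = np.allclose(P @ P, P)    # projecting twice changes nothing
det_P = np.linalg.det(P)              # 0 up to rounding: no inverse

# PCA sketch on synthetic 3D data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)                   # PCA assumes centered data
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
Vk = eigvecs[:, ::-1][:, :2]              # top-2 principal directions, shape (3, 2)

X_low = Xc @ Vk                           # coordinates in the new basis, shape (200, 2)
X_proj = X_low @ Vk.T                     # projected points back in 3D, shape (200, 3)

print(projected, idempotent)
print(X_low.shape, X_proj.shape)
```

Here `Vk @ Vk.T` plays the same role for the principal subspace that `P` plays for the line y = x: applying it twice gives the same result as applying it once.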

6. Affine transformations: translation via homogeneous coordinates

Translation (shifting the origin) is **not** a linear transformation - it cannot be written as a 2×2 matrix. But by switching to **homogeneous coordinates** (appending a 1), translation becomes a 3×3 matrix. This is the standard in computer graphics and robotics:

Homogeneous coordinates: (x, y) -> (x, y, 1)

Translation by the vector (tx, ty):
[1, 0, tx; 0, 1, ty; 0, 0, 1] · (x, y, 1) = (x + tx, y + ty, 1)

Rotation + translation (an affine transformation that rotates and then translates):
[cos θ, -sin θ, tx; sin θ, cos θ, ty; 0, 0, 1]

Important: matrix multiplication composes multiple transformations in sequence. This is exactly how Model-View-Projection works in OpenGL, Vulkan, and Three.js: a chain of 4×4 matrices.
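The homogeneous-coordinate trick can be tried out in 2D with 3×3 matrices. This is a minimal NumPy sketch; the helper names `translation` and `rotation_h` and the numeric values are assumptions made for this example.

```python
import math
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous matrix that shifts a 2D point by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation_h(theta_deg):
    """3x3 homogeneous matrix for a counter-clockwise 2D rotation."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])            # the point (1, 0) with a 1 appended

moved = translation(2.0, 3.0) @ p        # (1 + 2, 0 + 3, 1)

# Rotate by 90 degrees, then translate: the right-most matrix acts first.
M = translation(2.0, 3.0) @ rotation_h(90)
result = M @ p                           # (1, 0) -> (0, 1) -> (2, 4)

print(moved, np.round(result, 6))
print(np.round(M, 6))
```

The combined matrix `M` has the rotation in its upper-left 2×2 block and the translation in its last column, matching the rotation + translation form shown above.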

Composing transformations: matrix multiplication

Applying transformations in sequence is matrix multiplication. Scale first, then rotate = matrix (R · S). The order matters - matrix multiplication is **non-commutative**:

**The order of matrix multiplication is critical**: in general, R @ S != S @ R. In robotics and 3D graphics, a wrong transformation order is one of the most common bugs. In PyTorch, `transforms.Compose([transforms.RandomRotation(30), transforms.RandomHorizontalFlip()])` applies operations in list order, left to right (rotation first, then flip), whereas a matrix product acts right to left.
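A two-line experiment shows why the order matters. This is a minimal NumPy sketch with a 90° rotation and a stretch along x; both matrices are chosen for illustration.

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotation by 90 degrees
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])    # stretch along x by a factor of 2

v = np.array([1.0, 0.0])

# The right-most matrix acts on the vector first.
scale_then_rotate = R @ S @ v   # (1, 0) -> (2, 0) -> (0, 2)
rotate_then_scale = S @ R @ v   # (1, 0) -> (0, 1) -> (0, 1)

print(scale_then_rotate, rotate_then_scale)
print("commute:", np.allclose(R @ S, S @ R))
```

The same vector ends up at (0, 2) in one order and at (0, 1) in the other, so swapping two matrices in a transformation chain generally changes the result.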

Linear layer in a neural network = matrix multiplication

The most important application of linear transformations in ML is **every Linear layer** in a neural network. `nn.Linear(in_features=4, out_features=3)` stores a weight matrix W of shape (3, 4) and a bias b of shape (3,). Forward pass: out = W·x + b.

**GPT-style Transformer block**: each attention head computes Q = W_Q·X, K = W_K·X, V = W_V·X - three separate matrix multiplications. Then come the feed-forward layers with W1 and W2. In GPT-3 (175B), with 96 layers and 96 attention heads per layer, that is literally thousands of matrix multiplications per forward pass.
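The forward pass out = W·x + b can be imitated without PyTorch. The sketch below uses NumPy with random weights standing in for a trained layer; the shapes follow the `nn.Linear(4, 3)` example, and the batch form `X @ W.T + b` mirrors how a layer of this kind is usually applied to rows of samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the parameters of a layer with 4 inputs and 3 outputs.
W = rng.normal(size=(3, 4))   # weight matrix, shape (out_features, in_features)
b = rng.normal(size=3)        # bias, shape (out_features,)

def linear(x):
    """Affine map of a single input vector: W x + b."""
    return W @ x + b

x = rng.normal(size=4)
out = linear(x)               # shape (3,)

# A batch stores one sample per row, so the same map is written X W^T + b.
X = rng.normal(size=(5, 4))
batch_out = X @ W.T + b       # shape (5, 3)

# Without the bias, the k-th column of W is the image of the k-th unit vector.
e0 = np.eye(4)[0]
first_column = W @ e0

print(out.shape, batch_out.shape)
```

The single-vector and batch forms compute the same thing; the transpose appears only because samples are laid out as rows instead of columns.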

Summary table: what each transformation preserves

Transformation	2D Matrix	det	Preserves
Rotation	[cos theta, -sin theta; sin theta, cos theta]	1	Lengths, angles, area, orientation
Reflection	[-1, 0; 0, 1]	-1	Lengths, angles, area (reverses orientation)
Uniform scale by k	[k, 0; 0, k]	k squared	Angles, proportions
Shear	[1, k; 0, 1]	1	Area, lines parallel to the shear axis
Projection onto X	[1, 0; 0, 0]	0	x-coordinates only (y is lost forever)

**det is the key metric**: |det| shows by how much area changed. det = 0 means information was lost (no inverse). det < 0 means orientation flipped (left-handed system).
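The claim that the determinant measures the change in area can be verified by transforming the unit square and measuring the result with the shoelace formula. This is a hedged NumPy sketch; the helper `polygon_area` and the specific values k = 3 and k = 0.7 are assumptions made for this example.

```python
import numpy as np

def polygon_area(points):
    """Signed area by the shoelace formula; points has shape (2, n).

    Counter-clockwise vertex order gives a positive value.
    """
    x, y = points
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

# Unit square, corners in counter-clockwise order, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

examples = {
    "rotation by 90 deg": [[0, -1], [1, 0]],
    "reflection":         [[-1, 0], [0, 1]],
    "uniform scale by 3": [[3, 0], [0, 3]],
    "shear, k = 0.7":     [[1, 0.7], [0, 1]],
    "projection onto X":  [[1, 0], [0, 0]],
}

results = {}
for name, entries in examples.items():
    M = np.array(entries, dtype=float)
    det = np.linalg.det(M)
    signed_area = polygon_area(M @ square)  # the unit square has area 1
    results[name] = (det, signed_area)
    print(f"{name:20s} det = {det:6.2f}   signed area = {signed_area:6.2f}")
```

In every row the signed area of the transformed square matches the determinant, including the negative sign for the reflection and the zero for the projection.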

Linear transformations in real systems

From formula to production

Component	Role	Details
PyTorch nn.Linear / nn.Conv2d	W·x + b - affine map with learned weights	The core of the layers in GPT, ResNet, BERT is a matrix multiplication
torchvision.transforms / OpenCV warpAffine	Affine matrices for image augmentation	RandomRotation, RandomFlip, RandomAffine - data augmentation during training
Three.js / Unity / Unreal Engine	Model-View-Projection chain of 4×4 matrices	Every 3D frame - three matrix multiplications for every vertex
ROS / Boston Dynamics / Tesla Autopilot	Coordinate frame transformations (TF frames)	Robot arm position = chain of affine matrices, one per joint
PCA / SVD	Projection onto principal components	Orthogonal matrix V as basis of the new space

Practice: rotation matrix

Interview questions

How can a transformation matrix be checked for whether it preserves area?

- |det(A)| is the area scaling factor (volume in 3D)
- |det| = 1: area preserved (rotation, reflection, shear)
- |det| = k²: area changes by k² (uniform scale by k)
- det = 0: area collapses to zero, matrix not invertible (projection)
- det < 0: orientation reversed (mirror)

Why is nn.Linear a linear transformation, and what does this mean for backpropagation?

- Linearity: L(ax + by) = aL(x) + bL(y). For W·x + b this holds only at b = 0; otherwise the map is affine
- Gradient of the loss with respect to W: dL/dW = (dL/dout) · xᵀ, an outer product
- Gradient with respect to the input: dL/dx = Wᵀ · (dL/dout), so backprop through the layer is itself a matrix multiplication
- Activations (ReLU, GELU) add nonlinearity; without them a stack of Linear layers collapses to one
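Two of the points above, the collapse of stacked Linear layers and the outer-product gradient, can be checked numerically. This is a hedged NumPy sketch with random weights; the loss 0.5·||out||² is an assumption chosen because its gradient with respect to the output is simply `out`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers without an activation between them (random stand-in weights).
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x + b1) + b2

# The same map written as a single layer.
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

def loss(weights):
    """Illustrative loss: 0.5 * ||weights @ x + b||^2."""
    return 0.5 * np.sum((weights @ x + b) ** 2)

grad_out = one_layer               # dL/dout for this particular loss
grad_W = np.outer(grad_out, x)     # dL/dW = (dL/dout) x^T
grad_x = W.T @ grad_out            # dL/dx = W^T (dL/dout)

# Finite-difference check of a single entry of dL/dW.
eps = 1e-6
W_shifted = W.copy()
W_shifted[1, 2] += eps
numeric = (loss(W_shifted) - loss(W)) / eps

print(np.allclose(two_layers, one_layer))
print(grad_W[1, 2], numeric)
```

Inserting a nonlinearity such as ReLU between the two layers breaks the first equality, which is why activations are needed for depth to add expressive power.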

Why does the order of matrix multiplications matter when composing transformations?

- Matrix multiplication is NOT commutative: A @ B != B @ A in general
- R @ S applies S first (the right matrix acts on the vector first)
- Scale then rotate = R @ S, rotate then scale = S @ R; the results differ
- In 3D graphics the order is: first Model (rotate object), then View (camera), then Projection (screen)

Takeaways from this lesson

**Matrix = transformation**: the columns are images of basis vectors - everything follows from this
**Rotation**: [cos theta, -sin theta; sin theta, cos theta] - lengths and area preserved, det = 1
**Projection**: det = 0, information lost forever - the basis of PCA, dropout, average pooling
**Composing**: R @ S means S is applied first - order is critical
**nn.Linear = matrix multiplication**: W·x + b - every Linear layer is one matrix plus a bias
**Data augmentation**: RandomRotation, RandomFlip - rotation and reflection matrices applied to pixel coordinates
**Affine transformations**: translation + rotation + scale in homogeneous coordinates - the standard in 3D graphics and robotics

What comes next

Transformations are the visual foundation for deeper concepts

Inverse matrix — Undoing a transformation: when it is possible to return to the original space
Eigenvectors — Directions that the transformation does not rotate - only stretches
SVD decomposition — Any matrix decomposes into rotation * scale * rotation - three transformations
Gaussian elimination — Algorithm for finding the inverse transformation via elementary operations
