Linear Algebra

Spectral Graph Theory

Google PageRank from 1998 finds the leading eigenvector of a 50-billion-page matrix. GNNs predicting protein interactions in AlphaFold2 multiply by the normalized graph Laplacian. A graph is a matrix; its spectrum is an X-ray of structure.

  • PageRank: the eigenvector of a transition matrix - power iteration on 50B pages
  • Graph Convolutional Network: layer H^(l+1) = sigma(D^(-1/2) tilde A D^(-1/2) H^(l) W) - multiply by the normalized A
  • Spectral clustering: the Fiedler vector partitions a graph - used in scRNA-seq cell biology
  • AlphaFold2: a residue interaction graph processed by attention over the graph
  • Fraud detection: anomalous nodes in a transaction graph via spectral analysis

**Google changed the web by finding the leading eigenvector of a 50-billion-page matrix.** Not the longest page, not the most visited - the eigenvector of the hyperlink matrix. PageRank 1998 is pure spectral graph theory. Today the same math underpins GNNs (Graph Neural Networks), which predict protein interactions, traffic on Google Maps, and fraudulent transactions.

**What this lesson is about**: a graph is a matrix. Its eigenvectors encode graph structure: clusters, bottlenecks, central nodes. Spectral theory is the linear-algebra X-ray of a graph.

Laplacian matrix: L = D - A

For a graph G = (V, E) we build three matrices. **Adjacency matrix A**: A[i][j] = 1 if (i,j) is an edge. **Degree matrix D**: diagonal, D[i][i] = degree of vertex i. **Laplacian L = D - A** - the difference of the two. Key property: for any vector x we have **x^T L x = sum(x_i - x_j)^2** over all edges - the sum of squared differences of values at the endpoints.

**Number of zero eigenvalues = number of connected components**. For a connected graph lambda_0 = 0 is unique. For a graph of two disconnected parts lambda_0 = lambda_1 = 0. That is an algorithm for testing connectivity through the spectrum.

0

1

Sign In

Fiedler vector: graph cut via algebra

**The second smallest eigenvalue lambda_1** is the algebraic connectivity (the Fiedler number). It is strictly positive if and only if the graph is connected. The corresponding eigenvector **v_1** (the Fiedler vector) carries the full information about a natural partition of the graph: vertices with positive entries and vertices with negative entries form two natural clusters.

A small lambda_1 means the graph has a **bottleneck** - a small cut splitting it into large pieces. Cheeger's theorem ties lambda_1 to the isoperimetric number h(G): lambda_1/2 <= h(G) <= sqrt(2 * lambda_1). It is the linear-algebra version of the question 'how easy is it to split this graph'.

Spectral clustering: k-means in eigenvector space

Spectral clustering finds k clusters where k-means cannot, in cases of non-linear, interleaved clusters (two rings, moons, spirals). The algorithm: take the k smallest nonzero eigenvectors of the normalized Laplacian L_sym = D^(-1/2) L D^(-1/2), stack them in a matrix U, apply k-means to the rows of U. Each vertex is now a point in R^k, where geometry reflects connectivity.

GCN: convolution as multiplication by the normalized Laplacian

Graph Convolutional Networks (Kipf & Welling, 2016) compute new features at every vertex as a weighted sum of neighbor features. The operation is **H_new = A_hat H W**, with A_hat = D^(-1/2) (A + I) D^(-1/2) - the normalized adjacency matrix with self-loops added. Each GCN layer is one matrix multiplication plus a nonlinearity. Depth equals the receptive-field radius on the graph.

**GCN is normalized averaging**. A_hat H takes the average of neighbor features at every vertex (accounting for degrees). It is the graph analog of BatchNorm: each vertex receives context from its local neighborhood. Stacking L layers reaches the L-step neighborhood.

PageRank: eigenvector of the transition matrix

PageRank is the stationary distribution of a random walk on the hyperlink graph. The transition matrix P: P[j][i] = A[j][i] / deg(i) - columns are stochastic. The PageRank vector pi is the **left eigenvector of P for lambda = 1**: pi = P pi (or pi^T P = pi^T). In practice it is computed by **power iteration** with teleport: pi_{t+1} = alpha * P * pi_t + (1-alpha)/n. The teleport (typically alpha = 0.85) guarantees a unique stationary distribution.

Practice: spectral clustering

Graph as a matrix: the Laplacian and its spectrum

A graph $G = (V, E)$ is given by an adjacency matrix $A$ ($A_{ij}=1$ if $(i,j) \in E$) and a degree matrix $D = \text{diag}(d_1, \ldots, d_n)$. The graph Laplacian: $L = D - A$ (or $\mathcal{L} = D^{-1/2}LD^{-1/2}$ for the normalized version). The eigenvectors of $L$ encode the graph's structure.

**Key spectral properties of $L$:** - $L$ is symmetric and positive semidefinite ($x^\top Lx = \sum_{(i,j)\in E}(x_i - x_j)^2 \geq 0$) - $\lambda_1 = 0$ always; eigenvector $(1/\sqrt{n}, \ldots, 1/\sqrt{n})$ - **Connectivity**: $\lambda_2 > 0$ if and only if the graph is connected - **Fiedler vector** $v_2$ (the $\lambda_2$-vector) splits the graph into two clusters

Spectral clustering: two clusters from one SVD

sklearn in three lines

Similarity graph (Gaussian kernel of distances). Compute $L$, take the Fiedler vector $v_2$, partition by sign: $C_1 = \{i: v_2[i] > 0\}$, $C_2 = \{i: v_2[i] \leq 0\}$. For $k$ clusters: the first $k$ eigenvectors of $L$, then k-means in that space. Used in image processing (normalized cuts) and genomics (cell clusters in scRNA-seq).

The second Laplacian eigenvalue $\lambda_2$ is called the algebraic connectivity of the graph. What does $\lambda_2 = 0$ mean?

The multiplicity of the eigenvalue $\lambda = 0$ of $L$ equals the number of connected components. If $\lambda_2 = 0$, the graph has at least two components. If $\lambda_2 > 0$, the graph is connected, and $\lambda_2$ measures how well connected (bottleneck strength).

PageRank and Graph Neural Networks through the spectrum

PageRank is the leading eigenvector of the transition matrix $M$: $r = Mr$ ($\lambda = 1$). With a damping factor: $r = dMr + (1-d)/n \cdot \mathbf{1}$. Power iteration converges in $O(\log(1/\varepsilon) / \log(1/d))$ steps. The spectral gap $1 - d\lambda_2(M)$ controls the convergence rate.

**Graph Convolutional Networks (GCN)**: a GCN layer multiplies the features $H$ by the normalized Laplacian: $H^{(l+1)} = \sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H^{(l)} W^{(l)})$, with $\tilde{A} = A + I$. That is a weighted average of neighbor features. The matrix $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ is a filter in the graph's spectral space (the low-frequency components).

Predicting protein interactions with a GNN

STRING database, 2023

Protein-protein graph: 20K vertices (proteins), 1M edges (interactions). A GCN aggregates neighbor features (sequence, structure) through $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$. Task: predict the function of an unknown protein from its graph neighbors. AlphaFold2 uses a related architecture for structure prediction.

In a Graph Convolutional Network (GCN) a layer aggregates information from neighbors via $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$. What does this operation do geometrically?

$(\hat{A}H)_i = \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{d_i d_j}} H_j$ is a weighted sum of neighbor features normalized by degree. Adding $I$ to $A$ includes the node itself in the aggregation. It is the analog of convolution, on a graph.

x^T L x = x^T (D - A) x = x^T D x - x^T A x = sum_i d_i x_i^2 - sum_{(i,j) in E} 2 x_i x_j = sum_{(i,j) in E} (x_i - x_j)^2 Interpretation: - x - a signal on graph vertices (temperature, opinion, activation) - x^T L x = 'smoothness' of the signal: small if neighbors are similar - L - symmetric PSD matrix: lambda_i >= 0 for all i - lambda_0 = 0 always (the all-ones vector is in the kernel: row sums = 0)

  • **PageRank**: left eigenvector of the transition matrix
  • **GCN (Kipf & Welling 2016)**: H_new = A_hat H W (smoothing on a graph)
  • **Spectral clustering**: k-means in Fiedler-vector space
  • **Graph Fourier Transform**: basis of Laplacian eigenvectors
  • **Community detection**: multi-cut spectral clustering

Упражнения

  1. Why is lambda_0 of the Laplacian always zero? — Every row of L sums to 0: row sum = deg(i) - deg(i) = 0; Hence L * 1 = 0, so the all-ones vector 1 = (1, 1, ..., 1) is an eigenvector for lambda = 0; The number of zero eigenvalues equals the number of connected components (a theorem); For a disconnected graph with k parts the first k eigenvalues are zero
  2. How does a GCN differ from an ordinary MLP in linear-algebra terms? — MLP: y = W * x - the weight matrix W is independent of the data structure; GCN: y = A_hat * H * W - adds a multiplication by A_hat (the graph structure); A_hat averages features over neighbors: each vertex sees its neighborhood; This is an inductive bias: vertices similar by connections should be similar by features

Graph as a matrix, structure as a spectrum

  • Laplacian L = D - A; symmetric PSD; x^T L x = sum(x_i - x_j)^2 over edges
  • lambda_0 = 0 always; number of zeros = number of components; lambda_1 > 0 iff the graph is connected
  • Fiedler vector v_1: sign of a component = cluster; small lambda_1 = bottleneck
  • Spectral clustering: k eigenvectors of L_sym -> k-means in R^k
  • GCN: H_new = ReLU(A_hat H W); A_hat = D^(-1/2)(A+I)D^(-1/2); L layers = L-step neighborhood
  • PageRank: pi = alpha P pi + (1-alpha)/n; power iteration; teleport ensures ergodicity
  • Graph Fourier Transform: eigenvectors of L form an orthogonal basis of signals on the graph

Related topics

The final section closes the course with advanced interview problems that combine the whole curriculum: SVD, eigenvalues, conditioning, numerical stability.

  • Final interview problems — Spectral methods are a common subject in deep-understanding linear-algebra questions
  • Linear algebra in deep learning — GNNs use the normalized Laplacian as an aggregation operator - a direct link
  • Eigenvectors and eigenvalues — Spectral graph theory is built entirely on eigenvectors of symmetric matrices

Связанные уроки

  • alg-12-bfs
Spectral Graph Theory