Discrete Mathematics
Graph Theory Fundamentals
Graphs are not a textbook abstraction. Git history is a DAG of commits. npm is a dependency graph. The internet is a graph of web pages. Social networks, maps, compilers, databases: all of these reduce to graph problems.
- **npm/pip:** when installing a package, the manager topologically sorts the dependency graph to install everything in the correct order
- **Google Maps:** BFS or Dijkstra on a weighted road graph builds the optimal route
- **Git:** commit history is a DAG, and `git log --graph` visualizes it
- **LinkedIn:** 'degrees of separation' is BFS on the professional connection graph
Предварительные знания
What a Graph Is
A graph G = (V, E) is a pair: a set of vertices V and a set of edges E, where each edge connects two vertices. Graphs are a universal language for describing relationships: road networks, module dependencies, social connections, and anything that can be modeled as "things and links between them."
**Undirected graph:** edge (u, v) = (v, u), the relationship is symmetric (friendship on a social network). **Directed graph (digraph):** edge (u → v) ≠ (v → u), the relationship is one-way (package imports, subscriptions, hyperlinks).
| Graph type | Edges | Example |
|---|---|---|
| Undirected | (u, v) = (v, u) | Road network |
| Directed | (u → v) ≠ (v → u) | Package dependencies |
| Weighted | edge carries weight w | Distances between cities |
| Multigraph | multiple edges between a pair | Flight routes between airports |
A graph can be stored in three ways: **adjacency matrix** (O(V²) space, O(1) edge lookup), **adjacency list** (O(V+E) space, efficient for sparse graphs), or **edge list** (compact, convenient for algorithms like Kruskal's). The right choice depends on the graph's density.
Real-world graphs are almost always **sparse** (E << V²): each social network user has hundreds of friends, not billions. Adjacency list is the default choice.
A Python module import system (A imports B, B imports C) needs to be modeled as a graph. Which type fits best?
Vertex Degree and Adjacency
The degree of a vertex deg(v) is the number of edges incident to v. In directed graphs, distinguish in-degree (incoming edges) from out-degree (outgoing edges). Degree is a basic characteristic: high degree means a hub in the network.
**Handshaking lemma:** the sum of all vertex degrees equals twice the number of edges: Σ deg(v) = 2|E|. Consequence: the number of vertices with odd degree is always even.
The adjacency matrix A[i][j] = 1 if there is an edge between vertices i and j, 0 otherwise. For weighted graphs, A[i][j] = edge weight. The k-th power of A gives the number of paths of length k between all pairs of vertices, a powerful analytical tool.
Google's PageRank is based on in-degree: a page is 'important' if edges to it come from other important pages. This generalizes simple degree counting into a recursive importance measure.
An undirected graph has 6 vertices and 9 edges. What is the sum of all vertex degrees?
Traversals: DFS and BFS
Graph traversal means systematically visiting all reachable vertices. Two fundamental methods: **DFS** (depth-first search) goes as deep as possible before backtracking. **BFS** (breadth-first search) explores all vertices at the current distance before moving further.
| Criterion | DFS | BFS |
|---|---|---|
| Data structure | Stack (recursion) | Queue (deque) |
| Shortest path | No | Yes (by edge count) |
| Cycle detection | Yes | Yes |
| Topological sort | Yes | No |
| Time complexity | O(V + E) | O(V + E) |
DFS on a large graph can cause **stack overflow**: recursion depth reaches V. Use iterative DFS with an explicit stack for production code on graphs with millions of vertices.
BFS is the natural choice for 'find the nearest' problems: shortest path in a maze, minimum hops between network nodes, degree of separation on a social graph. DFS is better for topological sorting, cycle detection, and exhaustive path enumeration.
The task is to find the shortest path between two users in a social network (minimum intermediate connections). Which algorithm fits?
Connected Components
A graph is **connected** if there is a path between every pair of vertices. If the graph is disconnected, it splits into **connected components**: maximal connected subgraphs. Finding components is the first step in analyzing any real-world graph.
**Strong connectivity** in directed graphs: a strongly connected component (SCC) is a maximal set of vertices where every vertex is reachable from every other. Kosaraju's algorithm finds all SCCs in O(V+E) using two DFS passes.
Union-Find (Disjoint Set Union, DSU) is an efficient data structure for tracking components dynamically. It supports 'merge two components' and 'are these two vertices in the same component?' in amortized O(α(V)) ≈ O(1) with path compression and union by rank.
When analyzing legacy codebases, connected components of the dependency graph reveal islands: isolated modules with no connections to the rest. Multiple components often indicate dead code or unused libraries.
A graph with 8 vertices has edges: {1-2, 2-3, 4-5, 6-7, 7-8}. Vertex 8 is not mentioned otherwise. How many connected components does it have?
Key Ideas
- **G = (V, E):** a graph is a set of vertices and edges; directed or undirected depending on the problem
- **Adjacency list:** the standard representation for sparse graphs, O(V+E) space
- **Handshaking lemma:** Σ deg(v) = 2|E|, a fundamental identity about vertex degrees
- **BFS:** shortest path by edge count, explores level by level
- **DFS:** topological sort, cycle detection, exhaustive path search
- **Connected components:** find isolated islands in O(V+E) via DFS or BFS
Related Topics
Graph theory is the foundation for more advanced algorithms and data structures.
- Trees and Their Properties — A tree is a connected acyclic graph, a special case of a general graph
- Planarity and Graph Coloring — Structural properties of graphs embedded in the plane and vertex coloring
- Network Flows — Maximum flow in directed weighted graphs builds directly on traversal ideas
Вопросы для размышления
- Which representation (matrix or adjacency list) is more efficient for a graph with 10,000 vertices and 15,000 edges, and why?
- How is a cycle detected in a directed dependency graph, and what does its presence imply in practice?
- If BFS and DFS have the same complexity O(V+E), why does the choice between them still matter?