Algorithms
Heap Sort: O(n log n) Guaranteed
Цели урока
- Understand Heap structure and heapify operation
- Explain why build-heap is O(n) and not O(n log n)
- Implement Heap Sort
- Understand the trade-off: guarantee vs practical speed
Предварительные знания
- Heap (data structure)
- O(n log n) sorts
Quick Sort is fast but can degrade to O(n²). Merge Sort is always O(n log n) but needs O(n) memory. Heap Sort is the only sort that simultaneously guarantees O(n log n) and works in-place. So why does almost nobody use it directly?
- **C++ std::sort (IntroSort)** - switches to Heap Sort on deep Quick Sort recursion
- **Embedded systems** - predictable timing + minimal memory, no surprises
- **Priority Queue** - the data structure underlying Heap Sort
- **Interviews** - a classic that tests heap understanding
History of Heap Sort
Heap Sort was invented by J. W. J. Williams in 1964, along with the Heap data structure itself. That same year, Robert Floyd (yes, of Floyd-Warshall fame) improved the algorithm by proposing build-heap in O(n) instead of O(n log n). This made the algorithm practically useful.
Reminder: What Is a Heap
A corporate org chart: the CEO at the top, reports below, their reports below that. A **Heap** is exactly this kind of structure, with one rule: every 'manager' is greater than or equal to their 'subordinates'.
- **Max-Heap:** parent ≥ children → root = **maximum** of the whole tree
- **Min-Heap:** parent ≤ children → root = **minimum** of the whole tree
A heap is stored in a plain array - no pointers needed. Parent and child positions are computed with simple formulas:
**Key operations:** insert O(log n), extract-max O(log n), build-heap O(n). Heap Sort uses exactly build-heap and extract-max.
In a max-heap, the element at index 3 has a parent at index:
Heapify: Restoring the Heap Property
A new element arrives at the root without going through any hiring process - less qualified than its subordinates. **Heapify** fixes this: it 'sinks' the rule-breaker downward until it lands in the right spot.
**Heapify (sift-down)** takes a node and assumes both subtrees are already valid heaps. It compares the node with its children and, if the heap property is violated, swaps with the largest child and repeats deeper.
**Heapify complexity:** O(log n) - in the worst case, the element sinks from root to leaf (tree height = log n).
Heapify is called for node 50 in a tree where both children < 50. How many swaps?
Build Heap in O(n)
How to turn an arbitrary array into a heap? The naive idea - insert elements one by one at O(log n) each, total O(n log n). But there's a better way: **build a heap in O(n)** by applying heapify bottom-up.
Leaves (the bottom half of the array) are already heaps of one element - nothing to do. We start from the last non-leaf and work up to the root, calling heapify at each node:
**Why O(n) and not O(n log n)?** Intuition: most nodes are leaves or near-leaves. They sit low in the tree, so heapify on them finishes quickly. The expensive operations (heapify from the root) are applied to just a handful of nodes.
**Intuition:** think of a pyramid. The bottom row is the widest, but there's nothing to do there (leaves). The top node needs the most work, but there's only one of it. Together, this adds up to O(n).
In an array of 15 elements, how many nodes are NOT leaves?
Heap Sort Algorithm
Knowing build-heap and heapify, the sorting algorithm is simple: the root of a max-heap always holds the maximum. So to sort - just extract the maximum n times and place it at the end of the array.
- **Build max-heap** from the array - O(n)
- **n-1 times:** swap root (max) with last element, shrink heap size, heapify root - O(log n)
After build-heap, where is the maximum element?
Analysis: Strengths and Weaknesses
Heap Sort has a unique combination of properties: guaranteed O(n log n) in all cases **and** O(1) extra memory. Neither Quick Sort nor Merge Sort can claim both at the same time.
| Characteristic | Heap Sort | Quick Sort | Merge Sort |
|---|---|---|---|
| Best case | O(n log n) | O(n log n) | O(n log n) |
| Worst case | O(n log n) | O(n²) ❌ | O(n log n) |
| Space | O(1) ✓ | O(log n) | O(n) ❌ |
| Stability | No | No | Yes |
So why is Heap Sort almost never used directly? The problem is **cache locality**. Quick Sort works with adjacent elements (great for the cache), while Heap Sort constantly jumps between parent and child nodes all across the array.
**IntroSort** (used by C++ `std::sort`) is a smart hybrid: it starts as Quick Sort, but if recursion goes too deep (a sign of a bad pivot), it switches to Heap Sort. This gives Quick Sort speed in practice + O(n log n) guarantee in the worst case.
Heap Sort is slower than Quick Sort in practice because of:
Heap Sort
- Build max-heap from array - O(n), using bottom-up heapify
- Extract max n times: swap root with tail + heapify - O(n log n)
- Guaranteed O(n log n) in all cases
- O(1) extra memory - better than Merge Sort
- Slower in practice than Quick Sort due to cache misses
- Used in IntroSort (C++ sort) as a fallback
Related Topics
Heap is a powerful data structure with many applications beyond sorting.
- Heap: data structure — Foundation of Heap Sort
- Priority Queue — Main application of Heap
- Quick Sort — Faster in practice, but no guarantees
Вопросы для размышления
- In what scenarios is heap sort preferable to quicksort despite its worse constant factor?
- How does the max-heap property enable efficient algorithms for finding the k largest elements?
- Why does in-place sorting matter in memory-constrained systems?
Связанные уроки
- ds-14-heaps-intro — Heap Sort implements build-heap + heapify - without understanding the heap structure, the algorithm makes no sense
- ds-15-heap-applications — Priority Queue is the main application of the heap, and Heap Sort demonstrates that you can extract elements in order
- alg-06-quick-sort — Quick Sort is faster in practice thanks to cache locality, Heap Sort guarantees O(n log n) - a classic trade-off between speed and predictability
- alg-05-merge-sort — Merge Sort also guarantees O(n log n), but requires O(n) memory - Heap Sort wins under tight memory constraints
- alg-14-dijkstra — Dijkstra's algorithm uses a min-heap to efficiently extract the closest vertex - the same operation used in Heap Sort