Algorithms

LCS: Longest Common Subsequence

Цели урока

  • Understand the difference between subsequence and substring
  • Derive the recursive LCS formula
  • Implement the O(mn) DP solution
  • Optimize memory to O(n)
  • Reconstruct the actual LCS, not just its length

Предварительные знания

  • Dynamic Programming
  • Recursion
  • Dynamic Programming

Every time you do git diff, an LCS-based algorithm is working. Bioinformaticians use LCS to compare genomes. This is one of the most practical DP algorithms.

  • git diff, patch - version comparison
  • BLAST - similar gene search
  • Plagiarism detection - text comparison
  • IDE - smart diff and merge

History of LCS

The LCS problem was formulated in the 1970s. Hirschberg's algorithm (1975) allows reconstructing LCS in O(n) space. Myers' algorithm (1986) is used in git for efficient diff.

What is a subsequence?

**Subsequence** - characters in the same order, but not necessarily contiguous. **Substring** - contiguous characters.

**Don't confuse!** LCS ≠ Longest Common Substring. Substring requires contiguity, Subsequence does not.

Is "ACE" a subsequence of "ABCDE"?

Recursive solution

**Recursive formula:** compare the last characters.

**O(2^(m+n))** - exponential! For strings of length 20 - millions of calls.

X="AC", Y="BC". Last characters match (C=C). What next?

DP solution: table

**Store subproblem results in a table.** dp[i][j] = LCS for X[0..i-1] and Y[0..j-1].

**Reconstructing the actual subsequence:**

X = "ABC", Y = "AC". LCS length?

Space optimization

**Computing dp[i] only needs dp[i-1].** Store two rows instead of the full table.

**Trade-off:** with memory optimization, we lose the ability to reconstruct the actual LCS (only the length).

Why is it enough to store only two rows of the table?

Applications of LCS

**LCS is a fundamental algorithm with many applications.**

  • **diff/patch** - file comparison (git diff)
  • **Bioinformatics** - DNA/protein alignment
  • **Plagiarism** - detecting similar texts
  • **Autocomplete** - spell checking
  • **Edit Distance** - minimum changes (next lesson)

**Connection to Edit Distance:**

**LeetCode:** #1143 (LCS), #583 (Delete Operation for Two Strings), #712 (Minimum ASCII Delete Sum)

How does git diff determine changed lines?

LCS

  • Subsequence preserves order, but not contiguity
  • Recursion: match → +1 and both back; no match → max of two branches
  • DP table: O(mn) time and space
  • Optimization: O(n) space, but lose reconstruction
  • Foundation for diff, bioinformatics, edit distance

Related Topics

LCS is part of the family of sequence problems.

  • Edit Distance — Extension with replacements
  • LIS — Similar DP structure
  • DP Basics — Base paradigm

Связанные уроки

  • alg-21-dp — LCS is a canonical 2D dynamic programming table
  • alg-31-edit-distance — LCS leads directly into edit distance
  • alg-29-lis — LIS reduces to LCS of an array and its sorted copy
  • dm-22 — Sequence alignment in bioinformatics is an LCS variant
  • ml-36-seq2seq
LCS: Longest Common Subsequence

0

1

Sign In