Formal Languages
Deterministic Finite Automata (DFA)
Type a password: each keystroke puts an asterisk on the screen, and after the right number of characters the system decides: "let in" or "deny". That is a finite automaton, a minimal model that reads input symbol by symbol and makes a decision. Behind this simplicity lies the mechanism that powers compiler lexers, network filters, and regular expression engines.
- **Lexical analysis** - compilers and interpreters such as GCC and Python split source code into tokens (numbers, keywords, operators) with scanners that work like DFAs
- **Network filters** - Snort, Suricata scan traffic with DFA engines at gigabit-per-second speeds
- **Hardware controllers** - traffic lights, elevators, and vending machines implement their logic using finite automata
Prerequisites
Automaton as a Graph
**Finite automata appear everywhere.** A subway turnstile is a finite automaton. A coffee vending machine is a finite automaton. A traffic light is one too.
Each has **states** (locked / unlocked) and **transitions** (insert card → unlocked, pass through → locked). The key idea: the automaton is in EXACTLY ONE state at any moment and changes it according to a precise rule.
**States** are graph nodes (circles). **Transitions** are edges (labeled arrows). One node is marked as the **start state** (→), some as **accepting states** (double circle).
In formal language terms: the automaton **reads a string** symbol by symbol. It starts in the start state and makes a transition on each symbol. If after reading the ENTIRE string it ends up in an accepting state - the string is **accepted**. Otherwise - **rejected**.
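Verify with the automaton that accepts strings ending in '1' (start state q0, accepting state q1) on the string "101": start at q0 → read '1' → q1 → read '0' → q0 → read '1' → q1. The run ends in q1 (accepting), so the string is accepted!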
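The reading procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the state names q0 and q1 come from the text, while the dictionary layout and the function name `accepts` are assumptions.

```python
# Transition table for "strings over {0, 1} that end in '1'".
# State names q0/q1 follow the text; the dict layout is an assumption.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
START = "q0"
ACCEPTING = {"q1"}


def accepts(word: str) -> bool:
    """Read the word symbol by symbol and report whether the DFA accepts it."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]  # exactly one move per symbol
    return state in ACCEPTING  # the decision is made only after the whole word


print(accepts("101"))  # True: q0 -> q1 -> q0 -> q1
```

The verdict depends only on the state reached after the last symbol; intermediate visits to q1 do not count.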
String "110" is fed to a DFA that accepts strings ending in '1'. What is the result?
Formal Definition of DFA
The intuition is there - time to nail down the definition. **A DFA is a 5-tuple** (a tuple of five elements), where each element describes one facet of the automaton.
**DFA = (Q, Σ, δ, q₀, F)** - a deterministic finite automaton is fully specified by five components.
| Symbol | Name | Description | Example |
|---|---|---|---|
| Q | Set of states | Finite set of all states | {q0, q1, q2} |
| Σ | Alphabet | Finite set of input symbols | {0, 1} |
| δ | Transition function | δ: Q × Σ → Q - where to go from a state on a symbol | δ(q0, 1) = q1 |
| q₀ | Initial state | A single state from Q where we begin | q0 |
| F | Set of accepting states | F ⊆ Q - the subset of "good" states | {q1} |
**The key word is deterministic.** For every (state, symbol) pair there is EXACTLY ONE transition. Not zero, not two - one. The automaton never "hesitates" about where to go.
Let's write out the complete transition table for the automaton "strings ending in 1":
| State | Symbol 0 | Symbol 1 |
|---|---|---|
| → q0 | q0 | q1 |
| * q1 | q0 | q1 |
In the transition table, **→** marks the start state and `*` marks an accepting state. Each cell contains exactly one state - that is determinism.
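One possible way to write the five components down in Python, using the "ends in 1" table above. The variable names and the helper `is_valid_dfa` are assumptions made for illustration; the check simply restates the conditions of the definition (δ is total, stays inside Q, q₀ ∈ Q, F ⊆ Q).

```python
from itertools import product

# The five components of the "ends in 1" automaton as plain Python values.
Q = {"q0", "q1"}
SIGMA = {"0", "1"}
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
Q0 = "q0"
F = {"q1"}


def is_valid_dfa(states, alphabet, delta, start, accepting) -> bool:
    """Check the structural conditions of the 5-tuple definition."""
    # Total: every (state, symbol) pair has an entry.
    total = all((q, a) in delta for q, a in product(states, alphabet))
    # No entries for unknown states or symbols.
    no_extra = all(q in states and a in alphabet for q, a in delta)
    # Every transition lands inside Q.
    closed = all(target in states for target in delta.values())
    return total and no_extra and closed and start in states and accepting <= states


print(is_valid_dfa(Q, SIGMA, DELTA, Q0, F))  # True
```

A dictionary keyed by (state, symbol) can hold only one target per key, so "not two" is guaranteed by the data structure; the check only has to rule out "zero".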
What does "deterministic" mean in the name DFA?
Building DFA: Three Examples
Theory without practice is dead. Let's build three DFAs of increasing complexity - from counting symbols to binary arithmetic.
Example 1: Even Number of Zeros
**Problem:** accept all strings from {0, 1}* where the number of '0' symbols is even (0, 2, 4, ...). The empty string also qualifies (0 is even).
**Idea:** we need to track the parity of the zero count. That's exactly two states: "currently even" and "currently odd". The symbol '1' does not change the counter.
| State | Symbol 0 | Symbol 1 |
|---|---|---|
| → * EVEN | ODD | EVEN |
| ODD | EVEN | ODD |
Example 2: Strings Ending in "01"
**Problem:** accept all strings from {0, 1}* that end with the substring "01".
**Idea:** we need to remember how much of the pattern "01" the input currently ends with. Three states: nothing matched (q0), last symbol was '0' (q1), the input ends in "01" (q2, the only accepting state).
Check: string "1001". q0→1→q0→0→q1→0→q1→1→q2 ✓ Accepted! String "100": q0→1→q0→0→q1→0→q1 - not in an accepting state ✗
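The two checks can be replayed in code. The transition table below is a reconstruction from the state descriptions (an assumption, since the lesson gives the states but not the table), and `path` is a hypothetical helper name.

```python
# Reconstructed transition table for "strings ending in 01"; state names
# follow the text: q0 = nothing matched, q1 = last symbol '0', q2 = ends in "01".
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}


def path(word: str, start: str = "q0") -> list[str]:
    """Return the list of states visited while reading the word."""
    states = [start]
    for symbol in word:
        states.append(DELTA[(states[-1], symbol)])
    return states


for word in ("1001", "100"):
    visited = path(word)
    verdict = "accepted" if visited[-1] == "q2" else "rejected"
    print(word, " -> ".join(visited), verdict)
# 1001 q0 -> q0 -> q1 -> q1 -> q2 accepted
# 100 q0 -> q0 -> q1 -> q1 rejected
```

Note that q2 goes back to q1 on '0', not to q0: a '0' right after "01" may itself be the start of a new "01".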
Example 3: Binary Numbers Divisible by 3
**Problem:** a string from {0, 1}* represents a binary number. Accept if the number is divisible by 3.
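**Key idea:** track the remainder when dividing by 3. Bits arrive most significant first, so reading a new bit doubles the number and adds the bit: `n' = 2n + bit`. Remainder: `r' = (2r + bit) mod 3`. State r0 (remainder 0) is both the start state and the only accepting state.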
| State (remainder) | Bit 0: (2r+0) mod 3 | Bit 1: (2r+1) mod 3 |
|---|---|---|
| r0 (remainder 0) | r0 - (0·2+0)%3=0 | r1 - (0·2+1)%3=1 |
| r1 (remainder 1) | r2 - (1·2+0)%3=2 | r0 - (1·2+1)%3=0 |
| r2 (remainder 2) | r1 - (2·2+0)%3=1 | r2 - (2·2+1)%3=2 |
This technique works for division by ANY number n: n states (remainders 0..n-1) are enough, with transitions given by the formula (2r + bit) mod n. A DFA for divisibility by 5? Five states!
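The formula translates almost directly into code. The sketch below builds the remainder table for any n and runs it on a binary string; the function names `divisibility_dfa` and `divisible` are illustrative assumptions, and bits are read most significant first.

```python
def divisibility_dfa(n: int) -> dict[tuple[int, str], int]:
    """Build the transition table r' = (2r + bit) mod n for remainders 0..n-1."""
    return {(r, bit): (2 * r + int(bit)) % n for r in range(n) for bit in "01"}


def divisible(binary: str, n: int) -> bool:
    """Run the remainder automaton on a binary string, most significant bit first."""
    delta = divisibility_dfa(n)
    remainder = 0  # start state: the empty prefix has value 0
    for bit in binary:
        remainder = delta[(remainder, bit)]
    return remainder == 0  # accepting state: remainder 0


print(divisible("110", 3))  # True: 110 in binary is 6
print(divisible("111", 3))  # False: 111 in binary is 7
print(len({r for r, _ in divisibility_dfa(5)}))  # 5 states for n = 5
```

The automaton never stores the number itself, only its remainder, which is why arbitrarily long inputs fit into a fixed number of states.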
How many states does a DFA need to check whether a binary number is divisible by 7?
DFA Implementation in Python
The formal model is great, but we are programmers. Let's implement DFA as a Python class and watch the automaton process a string **step by step**.
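A minimal sketch of such a class, assuming the five components are stored as plain Python sets and a dictionary. The field and method names (`states`, `delta`, `step`, `accepts`) are illustrative choices, not a fixed API.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA as the 5-tuple (Q, Sigma, delta, q0, F); field names are illustrative."""
    states: set
    alphabet: set
    delta: dict      # maps (state, symbol) -> next state
    start: str
    accepting: set

    def step(self, state, symbol):
        """Apply the transition function once."""
        if symbol not in self.alphabet:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        return self.delta[(state, symbol)]

    def accepts(self, word) -> bool:
        """Return True if reading the whole word ends in an accepting state."""
        state = self.start
        for symbol in word:
            state = self.step(state, symbol)
        return state in self.accepting


# Usage: the "strings ending in 1" automaton from the formal definition.
ends_in_one = DFA(
    states={"q0", "q1"},
    alphabet={"0", "1"},
    delta={("q0", "0"): "q0", ("q0", "1"): "q1",
           ("q1", "0"): "q0", ("q1", "1"): "q1"},
    start="q0",
    accepting={"q1"},
)
print(ends_in_one.accepts("101"))  # True
```

Keeping `step` separate from `accepts` mirrors the distinction between δ (one symbol) and the run over a whole string.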
Let's create our automaton for an even number of zeros and test it:
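A possible version of that test, using the EVEN/ODD table from Example 1. So that the snippet runs on its own, it carries a pared-down `DFA` class; the class shape and the variable name `even_zeros` are assumptions.

```python
class DFA:
    """Compact stand-in for a DFA class (the exact API is an assumption)."""

    def __init__(self, delta, start, accepting):
        self.delta, self.start, self.accepting = delta, start, accepting

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


# Transition table copied from Example 1: '0' flips parity, '1' keeps it.
even_zeros = DFA(
    delta={
        ("EVEN", "0"): "ODD", ("EVEN", "1"): "EVEN",
        ("ODD", "0"): "EVEN", ("ODD", "1"): "ODD",
    },
    start="EVEN",
    accepting={"EVEN"},
)

for word in ["", "0", "00", "1010", "1000"]:
    print(repr(word), even_zeros.accepts(word))
# '' True
# '0' False
# '00' True
# '1010' True
# '1000' False
```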
The most useful part - **tracing**. Watch what happens inside the automaton:
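One way to make the run visible is to print every transition as it happens. The sketch below is written as a free function so it runs without any class; the name `trace` and the output format are assumptions.

```python
def trace(delta, start, accepting, word):
    """Print every transition taken on `word` and return the final verdict."""
    state = start
    print(f"start in {state}")
    for position, symbol in enumerate(word, start=1):
        next_state = delta[(state, symbol)]
        print(f"step {position}: {state} --{symbol}--> {next_state}")
        state = next_state
    accepted = state in accepting
    print(f"end in {state}: {'accepted' if accepted else 'rejected'}")
    return accepted


# The "even number of zeros" table from Example 1.
EVEN_ZEROS = {
    ("EVEN", "0"): "ODD", ("EVEN", "1"): "EVEN",
    ("ODD", "0"): "EVEN", ("ODD", "1"): "ODD",
}
trace(EVEN_ZEROS, "EVEN", {"EVEN"}, "1001")
# start in EVEN
# step 1: EVEN --1--> EVEN
# step 2: EVEN --0--> ODD
# step 3: ODD --0--> EVEN
# step 4: EVEN --1--> EVEN
# end in EVEN: accepted
```

The trace has exactly one line per input symbol, which is the one-transition-per-symbol behavior in concrete form.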
**DFA runtime is O(n)**, where n is the length of the input string. One pass, one transition per symbol, no backtracking. This makes DFA an ideal model for streaming scanning (compiler lexers, network filters).
McCulloch and Pitts: Neurons as Automata
In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts proposed a model of neural networks that effectively described finite automata. Their work inspired Stephen Kleene, who in 1956 formalized the connection between automata and regular expressions - one of the fundamental theorems of computer science.
The McCulloch-Pitts model became the ancestor of both finite automata and neural networks - the two pillars of modern Computer Science.
A DFA must have a transition for every alphabet symbol from every state, otherwise it is not a DFA.
Formally, yes: the function δ must be total. In practice, a missing transition is read as a move to an implicit dead state, from which no accepting state can be reached.
The dead state is often omitted from diagrams for compactness, but it is implied. This does not violate determinism: the transition exists, it just leads to a state that can never accept.
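Making the implicit dead state explicit is a mechanical step. The sketch below fills every missing entry of a partial transition table; the function name `complete`, the state label "DEAD", and the small example automaton for the single string "ab" are assumptions made for illustration.

```python
def complete(delta, states, alphabet, dead="DEAD"):
    """Return a total transition table: every missing move goes to a dead state."""
    total = dict(delta)
    for state in list(states) + [dead]:
        for symbol in alphabet:
            # Existing entries are kept; the dead state loops on itself.
            total.setdefault((state, symbol), dead)
    return total


# Partial table for the language {"ab"}: only the "useful" arrows are drawn.
partial = {("s0", "a"): "s1", ("s1", "b"): "s2"}
full = complete(partial, states=["s0", "s1", "s2"], alphabet="ab")

print(len(partial), len(full))  # 2 8
print(full[("s0", "b")])        # DEAD
print(full[("DEAD", "a")])      # DEAD
```

After completion the table has one entry for each of the 4 states and 2 symbols, so δ is total, and the accepted language is unchanged because the dead state is not accepting.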
What is the time complexity of a DFA for a string of length n?
Key Ideas
- **DFA = (Q, Σ, δ, q₀, F)** - five components fully specify the automaton
- **Determinism** - for every (state, symbol) pair exactly one transition, no ambiguity
- **O(n) runtime** - DFA reads the string in one pass, one transition per symbol
- **States encode "memory"** - DFA remembers only which state it is in, not the computation history
Connection to Other Topics
DFA is the central object of formal language theory, connected to a whole set of topics:
- Regular expressions — Every regular expression can be converted to an equivalent DFA (Kleene's theorem)
- NFA - nondeterministic automata — Next step: NFA allows multiple transitions but recognizes the same languages
- DFA Minimization — For every regular language there exists a unique minimal DFA
Questions for Reflection
- DFA remembers only the current state - a finite amount of memory. What problems are fundamentally IMPOSSIBLE to solve with bounded memory? (Hint: try building a DFA for the language aⁿbⁿ)
- If a DFA for divisibility by n requires n states, how many states are needed to check divisibility by n AND by m simultaneously?
- A compiler uses DFA for lexical analysis. Why DFA specifically, and not a more powerful model?
Related Lessons
- fl-05-regex — Regular expressions describe the same class as DFAs
- fl-07-nfa — NFA is a relaxed form of DFA - contrast highlights determinism
- fl-08-nfa-to-dfa — Subset construction converts NFA to equivalent DFA
- fl-10-dfa-minimization — Minimization reduces DFA to canonical smallest form
- comp-06-lexer-basics — Lexer implementation is a DFA running on character input