Calculus
The Concept of a Limit
Lesson objectives
- Understand the intuitive meaning of a function limit
- Master the rigorous ε-δ definition of a limit
- Distinguish one-sided limits
- Compute limits at infinity
- Recognize types of indeterminate forms
Prerequisites
- Sequences and their limits
- The concept of a function and its graph
- Absolute value $|x|$
Here is a provocation: the limit of $f$ at point $a$ has nothing to do with the value $f(a)$. Nothing at all. The function may be undefined at that point - and the limit can still exist. It may equal 100 at the point - while the limit equals 5. This is not an error or a paradox. It is a tool Cauchy built in 1821 to describe what happens *near* a point, not *at* it. That is exactly what makes the derivative well defined as the limit of $(f(x+h) - f(x))/h$ as $h \to 0$ - without ever dividing by zero.
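- **PyTorch autograd**: `autograd` computes $\frac{\partial L}{\partial w}$ by applying the chain rule to known derivative formulas, and every one of those formulas is defined as the limit of a difference quotient as $h \to 0$. Without limits, backpropagation is just magic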
- **Asymptotic complexity O(n)**: saying an algorithm runs in $O(n^2)$ is a statement about the ratio $T(n)/n^2$ staying bounded as $n \to \infty$. Without limit-style reasoning, big-O notation has no formal meaning
- **Continuous compounding**: $e^r = \lim_{n \to \infty}(1 + r/n)^n$. Every financial model with continuous time uses this limit
- **Instantaneous velocity**: a GPS tracker computes speed as $\Delta x / \Delta t$ over a small interval. The true instantaneous velocity is the limit as $\Delta t \to 0$
- **Softmax temperature**: as $T \to 0^+$, softmax approaches a one-hot argmax (assuming a unique largest logit); as $T \to \infty$, it approaches the uniform distribution. These are limits of a function over a parameter
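The softmax temperature limits from the list above can be observed numerically. The sketch below is a plain-Python illustration, not a library implementation: the `softmax` helper and the logit values are assumptions made up for this example.

```python
import math

def softmax(logits, temperature):
    """Softmax of logits / temperature (hypothetical helper for this sketch)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative values with a unique maximum at index 0

# Large T flattens the distribution; small T concentrates it on the largest logit.
for T in (100.0, 1.0, 0.01):
    probs = softmax(logits, T)
    print(f"T={T:g}", [round(p, 4) for p in probs])
```

Neither limit is ever reached at a finite positive $T$; the printed rows only show the tendency.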
From Zeno's paradoxes to the rigor of Weierstrass
The ancient Greeks already understood limits intuitively - Zeno's paradoxes about the arrow and about Achilles and the tortoise are essentially about infinite approximation. But a rigorous definition appeared only in the 19th century. **Augustin-Louis Cauchy** (1821) was the first to give a verbal definition of a limit. **Karl Weierstrass** (1861) brought it to the modern ε-δ form. This rigor was necessary: before it, mathematicians argued about whether the sum $1 - 1 + 1 - 1 + \dots$ equals zero or one!
Intuition: a limit is a tendency, not a value
There is a function $f$. The interest is in its behavior near the point $x = a$. Not at $a$ - near it. Take $x$ closer and closer to $a$ (but not equal to $a$). Where does $f(x)$ tend?
Written as: $\lim_{x \to a} f(x) = L$ - "the limit of the function $f(x)$ as $x$ approaches $a$ equals $L$". The value $f(a)$ is ignored. It may not even exist.
**Key difference from sequences**: the limit of a sequence is a tendency as $n \to \infty$. The limit of a function is a tendency as $x \to a$, and $a$ can be any number. A function can "behave" near a point differently from how it is defined at that point.
The limit is independent of the value at the point
A function with a hole at a point
$f(x) = \frac{x^2 - 1}{x - 1}$ is undefined at $x = 1$ (division by zero). But simplifying:
$\frac{x^2-1}{x-1} = \frac{(x-1)(x+1)}{x-1} = x + 1$ for $x \neq 1$
So $\lim_{x \to 1} \frac{x^2-1}{x-1} = \lim_{x \to 1}(x+1) = 2$
The limit exists and equals 2, even though the function is undefined at the point 1.
**PyTorch parallel**: the difference quotient $(f(x+h) - f(x))/h$ behind every derivative has the same kind of hole at $h = 0$ - the derivative is the limit of the quotient there, not its value.
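The hole at $x = 1$ can be probed numerically. This is a minimal sketch with illustrative step sizes; it shows the values tending to 2 from both sides while evaluation at the point itself fails.

```python
def f(x):
    # Same formula as in the text; at x = 1 the division is 0/0.
    return (x**2 - 1) / (x - 1)

# Evaluate on both sides of 1, moving closer each time.
for h in (1e-1, 1e-3, 1e-6):
    print(f"h={h:g}  f(1-h)={f(1 - h):.6f}  f(1+h)={f(1 + h):.6f}")

# At the point itself Python refuses to divide by zero.
try:
    f(1.0)
except ZeroDivisionError:
    print("f(1) is undefined")
```

A table of values like this suggests the limit but does not prove it; the algebraic cancellation above is what justifies the answer.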
Function $g(x) = x + 3$ for $x \neq 2$, and $g(2) = 100$. What is $\lim_{x \to 2} g(x)$?
The limit is determined by the behavior of the function **near** the point, not **at** it. Near $x = 2$ the function equals $x + 3$, so the limit equals $2 + 3 = 5$. The value $g(2) = 100$ is a value, not a limit.
Formal definition: the ε-δ language
"Close" is a feeling, not mathematics. Weierstrass replaced the feeling with a two-player game. That is the ε-δ definition.
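Formally, $\lim_{x \to a} f(x) = L$ means: $\forall \varepsilon > 0 \; \exists \delta > 0: 0 < |x - a| < \delta \Rightarrow |f(x) - L| < \varepsilon$.
This is a **game**: the Skeptic sets a precision $\varepsilon$ - how close $f(x)$ must be to $L$. The Defender responds with a neighborhood $\delta$ - where to take $x$ from. If the Defender can respond to **any** $\varepsilon$ - the limit exists and equals $L$.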
**Why an ML engineer needs this**: numerical differentiation via finite differences $\frac{f(x+h) - f(x)}{h}$ works precisely because this limit exists. Choosing $h$ plays the role of choosing $\delta$; the approximation error plays the role of $\varepsilon$. The ε-δ definition is not an academic artifact - it is the formal foundation of gradient checking in PyTorch.
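The finite-difference idea can be sketched in plain Python, without PyTorch. The helper name `forward_difference`, the test function $\sin$ and the point $x_0 = 1$ are assumptions chosen for illustration; the analytic derivative $\cos$ serves as the reference value.

```python
import math

def forward_difference(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h for a fixed nonzero h."""
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)  # analytic derivative of sin at x0

# Shrinking h plays the role of shrinking delta; the error plays the role of epsilon.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = forward_difference(math.sin, x0, h)
    print(f"h={h:g}  approx={approx:.8f}  error={abs(approx - exact):.2e}")
```

In floating-point arithmetic the error does not keep shrinking forever: for very small $h$, rounding in the subtraction $f(x+h) - f(x)$ starts to dominate, which is why practical gradient checks use a moderate step rather than the smallest possible one.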
**Misconception**: $\lim_{x \to a} f(x) = f(a)$ always
**Reality**: the limit may exist even if $f(a)$ is undefined or differs from the limit
The limit describes the behavior of a function NEAR the point, ignoring the point itself. The function can be discontinuous, have a hole, or be completely undefined at the point - the limit can still exist. That is why defining the derivative as the limit of a difference quotient is valid even though the quotient itself is undefined at $h = 0$.
In the ε-δ definition, what does the condition $0 < |x - a|$ (strict inequality) mean?
The condition $0 < |x - a|$ excludes the point $a$ itself. The limit describes the behavior of the function NEAR the point, so the value at the point itself is irrelevant - and may not even exist.
One-sided limits
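A function can behave differently when approaching from the **left** and from the **right**. This is not exotic - it is exactly what happens with the slope of ReLU at zero. Write $L^- = \lim_{x \to a^-} f(x)$ for the left-hand limit (taken over $x < a$) and $L^+ = \lim_{x \to a^+} f(x)$ for the right-hand limit (taken over $x > a$).
The two-sided limit $\lim_{x \to a} f(x)$ exists if and only if both one-sided limits exist and $L^- = L^+$.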
ReLU at zero
One-sided limits in ML
$\text{ReLU}(x) = \max(0, x) = \begin{cases} 0, & x \leq 0 \\ x, & x > 0 \end{cases}$
Approaching zero:
- **From the left**: $\lim_{x \to 0^-} \text{ReLU}(x) = 0$
- **From the right**: $\lim_{x \to 0^+} \text{ReLU}(x) = 0$
Both limits are equal to zero, so the two-sided limit **exists and equals 0**.
But the derivative at zero? From the left: $(\text{ReLU})' = 0$. From the right: $(\text{ReLU})' = 1$. The one-sided derivatives **do not match** - that is why ReLU is not differentiable at zero. This is a separate limit.
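The mismatch between the one-sided derivatives of ReLU can be seen by evaluating the difference quotient with negative and positive steps. This is a plain-Python sketch; `relu` and `one_sided_quotient` are hypothetical helpers written for this example, not PyTorch functions.

```python
def relu(x):
    return max(0.0, x)

def one_sided_quotient(f, a, h):
    """Difference quotient at a with signed step h (h < 0 approaches from the left)."""
    return (f(a + h) - f(a)) / h

for h in (1e-1, 1e-3, 1e-6):
    left = one_sided_quotient(relu, 0.0, -h)   # equals 0 (Python may print -0.0)
    right = one_sided_quotient(relu, 0.0, h)   # equals 1
    print(f"h={h:g}  left={left}  right={right}")
```

The function values on both sides shrink to 0 together, but the quotients stay at 0 and 1 no matter how small the step is, so the limit that would define the derivative at zero does not exist.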
The sign function
Classic example of different one-sided limits
$\text{sign}(x) = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ +1, & x > 0 \end{cases}$
Approaching zero:
- **From the left**: $\lim_{x \to 0^-} \text{sign}(x) = -1$
- **From the right**: $\lim_{x \to 0^+} \text{sign}(x) = +1$
Since $-1 \neq +1$, the two-sided limit $\lim_{x \to 0} \text{sign}(x)$ **does not exist**.
For the function $f(x) = \lfloor x \rfloor$ (floor function), what is $\lim_{x \to 2^-} f(x)$?
Approaching 2 from the left (e.g. $x = 1.9, 1.99, 1.999, \dots$), the floor is always 1, so $\lim_{x \to 2^-} \lfloor x \rfloor = 1$. From the right (e.g. $x = 2.001$) it would be 2, so the two-sided limit does not exist.
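Both the sign function and the floor function can be sampled from each side with one small helper. The names `sign` and `approach` and the step sizes are assumptions made for this sketch.

```python
import math

def sign(x):
    # -1 for negative x, 0 at zero, +1 for positive x
    return (x > 0) - (x < 0)

def approach(f, a, side, steps=(1e-1, 1e-3, 1e-6)):
    """Values of f at points approaching a from one side ('left' or 'right')."""
    direction = -1.0 if side == "left" else 1.0
    return [f(a + direction * h) for h in steps]

print("floor near 2, left: ", approach(math.floor, 2, "left"))
print("floor near 2, right:", approach(math.floor, 2, "right"))
print("sign near 0, left:  ", approach(sign, 0, "left"))
print("sign near 0, right: ", approach(sign, 0, "right"))
```

When the two rows for a function settle on different numbers, the one-sided limits differ and the two-sided limit does not exist.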
Limits at infinity
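What happens to a function as $x$ grows without bound? This is a question about **asymptotic behavior** - what ML engineers study when analyzing algorithm complexity and model behavior on large data. Formally, $\lim_{x \to \infty} f(x) = L$ means: $\forall \varepsilon > 0 \; \exists M: x > M \Rightarrow |f(x) - L| < \varepsilon$.
Instead of a $\delta$-neighborhood of a point, we take "far away" values $x > M$. If the function stabilizes near $L$ - the limit exists. In big-O notation: $f(n) = O(g(n))$ means the ratio $|f(n)|/g(n)$ stays bounded for all large $n$. If the limit of $f(n)/g(n)$ exists and is finite, then $f(n) = O(g(n))$; if it is also non-zero, then $f(n) = \Theta(g(n))$.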
**Misconception**: $\infty$ is a number that admits arithmetic
**Reality**: $\infty$ is a symbol for "unbounded growth", not a number
When we write $\lim f(x) = \infty$, we mean: the limit does NOT exist as a finite number; the function grows without bound. $\infty$ does not obey ordinary arithmetic: expressions such as $\infty - \infty$ and $\infty / \infty$ have no definite value. In ML: "softmax saturates at large logits" is an informal description of a logit tending to infinity.
What is $\lim_{x \to \infty} \frac{3x^2 + 2x}{x^2 - 5}$?
Divide numerator and denominator by the highest power $x^2$: $\frac{3 + 2/x}{1 - 5/x^2} \to \frac{3 + 0}{1 - 0} = 3$, the ratio of the leading coefficients.
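The same limit can be checked by evaluating the fraction at increasingly large $x$. A minimal sketch; the function name `r` and the sample points are illustrative.

```python
def r(x):
    return (3 * x**2 + 2 * x) / (x**2 - 5)

# The gap to 3 shrinks as x grows: this is the "x > M" side of the definition.
for x in (1e1, 1e3, 1e6):
    print(f"x={x:g}  r(x)={r(x):.8f}  gap={abs(r(x) - 3):.2e}")
```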
Remarkable limits
Two limits are so fundamental they are called **remarkable** - they generate entire families of results: $\lim_{x \to 0} \frac{\sin x}{x} = 1$ and $\lim_{x \to \infty} (1 + 1/x)^x = e$ (equivalently, $\lim_{x \to 0} (1 + x)^{1/x} = e$).
The first remarkable limit explains why ML papers write $\sin \theta \approx \theta$ for small $\theta$. The approximation is backed by an exact statement: the ratio $\sin \theta / \theta$ tends to 1 as $\theta \to 0$. The second limit is connected to $e \approx 2.71828$ - the base of the natural logarithm, living inside softmax, temperature scaling, and continuous discounting.
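Both remarkable limits are easy to watch numerically. In this sketch the helper names `sin_ratio` and `one_plus_x_power` are made up for illustration, and the second limit is used in its $x \to 0$ form $(1 + x)^{1/x}$.

```python
import math

def sin_ratio(x):
    return math.sin(x) / x

def one_plus_x_power(x):
    return (1 + x) ** (1 / x)

for x in (1e-1, 1e-3, 1e-5):
    print(f"x={x:g}  sin(x)/x={sin_ratio(x):.10f}  (1+x)^(1/x)={one_plus_x_power(x):.8f}")

print("e =", math.e)
```

Neither expression can be evaluated at $x = 0$ itself, which is the whole point: the values $1$ and $e$ are limits, not function values.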
What is $\lim_{x \to 0} \frac{\sin(3x)}{x}$?
Use the first remarkable limit: $\frac{\sin(3x)}{x} = 3 \cdot \frac{\sin(3x)}{3x}$. As $x \to 0$ we have $3x \to 0$, and $\frac{\sin(3x)}{3x} \to 1$. Answer: $3 \cdot 1 = 3$.
Properties of limits
If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$ exist, then:
- **Sum/difference**: $\lim(f \pm g) = L \pm M$
- **Product**: $\lim(f \cdot g) = L \cdot M$
- **Quotient**: $\lim(f / g) = L / M$ if $M \neq 0$
- **Constant**: $\lim(c \cdot f) = c \cdot L$
- **Power**: $\lim f^n = L^n$ for a positive integer $n$
If $\lim_{x \to a} f(x) = 4$ and $\lim_{x \to a} g(x) = 2$, what is $\lim_{x \to a} \frac{f(x)}{g(x)}$?
By the quotient property of limits (valid because $\lim g \ne 0$): $\lim (f/g) = (\lim f)/(\lim g) = 4/2 = 2$.
Indeterminate forms
Direct substitution sometimes gives a meaningless expression - an **indeterminate form**. Not an answer - a signal that a different method is needed:
| Form | Example | Resolution method |
|---|---|---|
| $\frac{0}{0}$ | $\lim_{x\to1}\frac{x^2-1}{x-1}$ | Factor and cancel |
| $\frac{\infty}{\infty}$ | $\lim_{x\to\infty}\frac{x^2}{x^3}$ | Divide by the highest power |
| $0 \cdot \infty$ | $\lim_{x\to0^+} x \ln x$ | Convert to a fraction |
| $\infty - \infty$ | $\lim_{x\to\infty}(\sqrt{x+1} - \sqrt{x})$ | Multiply by conjugate |
| $1^\infty$ | $\lim_{x\to\infty}(1+1/x)^x$ | Second remarkable limit |
| $0^0, \infty^0$ | $\lim_{x\to0^+} x^x$ | Take logarithm |
**L'Hôpital's rule** is a powerful tool for resolving $\frac{0}{0}$ and $\frac{\infty}{\infty}$: differentiate numerator and denominator separately and take the limit of the new ratio, provided that limit exists. More details - in the lesson on computing limits!
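Three rows of the table can be explored numerically. This is a sketch under stated assumptions: the function names are hypothetical, and `sqrt_gap` uses the conjugate form $\sqrt{x+1} - \sqrt{x} = \frac{1}{\sqrt{x+1} + \sqrt{x}}$ rather than the raw difference.

```python
import math

def x_log_x(x):
    return x * math.log(x)  # 0 * infinity form as x -> 0+

def x_to_x(x):
    return x ** x  # 0^0 form as x -> 0+

def sqrt_gap(x):
    # infinity - infinity form, rewritten by multiplying with the conjugate
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

for x in (1e-1, 1e-4, 1e-9):
    print(f"x={x:g}  x*ln(x)={x_log_x(x):.3e}  x^x={x_to_x(x):.9f}")

for x in (1e2, 1e6, 1e12):
    print(f"x={x:g}  sqrt(x+1)-sqrt(x)={sqrt_gap(x):.3e}")
```

The conjugate form is also the numerically safer one: subtracting two nearly equal square roots directly loses significant digits for large $x$.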
Which of these expressions is an indeterminate form?
The seven classical indeterminate forms include $0/0$, $\infty/\infty$, $0 \cdot \infty$, $\infty - \infty$, $0^0$, $1^\infty$, $\infty^0$. Forms like $1/0$ or $\infty + \infty$ are not indeterminate.
Practice
Compute $\lim_{x \to 2} \frac{x^2 - 4}{x - 2}$
Numerator: $x^2 - 4 = (x-2)(x+2)$
$\lim_{x \to 2} \frac{(x-2)(x+2)}{x-2} = \lim_{x \to 2} (x+2) = 4$
Compute $\lim_{x \to \infty} \frac{3x^2 + 5x - 1}{2x^2 - x + 7}$
Divide everything by $x^2$:
$\lim_{x \to \infty} \frac{3 + 5/x - 1/x^2}{2 - 1/x + 7/x^2}$
As $x \to \infty$, all fractions with $x$ in the denominator tend to 0:
$= \frac{3 + 0 - 0}{2 - 0 + 0} = \frac{3}{2}$
Prove using the ε-δ definition that $\lim_{x \to 3} (2x + 1) = 7$
We need to prove: $\forall \varepsilon > 0 \; \exists \delta > 0: 0 < |x - 3| < \delta \Rightarrow |(2x+1) - 7| < \varepsilon$
Simplify: $|(2x+1) - 7| = |2x - 6| = 2|x - 3|$
We want: $2|x - 3| < \varepsilon$, i.e. $|x - 3| < \varepsilon/2$
**Choose $\delta = \varepsilon/2$**
Verification: if $0 < |x - 3| < \delta = \varepsilon/2$, then $|(2x+1) - 7| = 2|x - 3| < 2 \cdot \varepsilon/2 = \varepsilon$ ✓
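The choice $\delta = \varepsilon/2$ can be spot-checked by playing the game in code: pick an $\varepsilon$, compute $\delta$, sample points inside the punctured $\delta$-neighborhood of 3 and test the inequality. This is a sanity check under assumed names (`delta_for`, `defender_wins`) and a fixed random seed, not a substitute for the proof.

```python
import random

def f(x):
    return 2 * x + 1

def delta_for(eps):
    # The Defender's response from the proof above.
    return eps / 2

def defender_wins(eps, samples=1000, seed=0):
    """Sample x with 0 < |x - 3| < delta and check |f(x) - 7| < eps."""
    rng = random.Random(seed)
    delta = delta_for(eps)
    for _ in range(samples):
        # Stay slightly inside the interval so rounding cannot push x onto its edge.
        x = 3 + rng.uniform(-1.0, 1.0) * delta * 0.999
        if x == 3:
            continue  # the definition excludes the point itself
        if not abs(f(x) - 7) < eps:
            return False
    return True

for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(f"eps={eps:g}  delta={delta_for(eps):g}  ok={defender_wins(eps)}")
```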
What is $\lim_{x \to 0} \frac{\sin(3x)}{x}$?
Use the standard limit $\lim_{u \to 0} \sin(u)/u = 1$: rewrite as $3 \cdot \sin(3x)/(3x) \to 3 \cdot 1 = 3$.
Connection with other topics
The limit of a function is the foundation of all analysis
- Sequences — The limit of a function generalizes the limit of a sequence (Heine's definition)
- Continuity — A function is continuous at a point if the limit equals the value
- Derivative — The derivative is the limit of a difference quotient. All of backprop lives here
- Integral — The definite integral is the limit of Riemann sums
- Taylor series — Use limits to approximate functions
Summary
- **Limit $\lim_{x \to a} f(x) = L$** describes function behavior near $a$, not at it - $f(a)$ may not even exist
- **ε-δ definition**: for any precision ε there exists a neighborhood δ ensuring that precision. The formal foundation of gradient checking
- **One-sided limits** may differ. The two-sided limit exists only when they are equal - exactly how ReLU is analyzed
- **Remarkable limits**: $\frac{\sin x}{x} \to 1$ and $(1+x)^{1/x} \to e$ as $x \to 0$. The second underlies continuous compounding and other continuous-time models
- **Indeterminate forms** (0/0, ∞/∞, etc.) require special resolution methods - not an error, a signal
Questions for reflection
- Why can the limit of a function at a point differ from the value of the function at that point? Give an example from ML.
- How can the ε-δ definition be explained to a junior developer using gradient checking as an analogy?
- Why is $\infty$ not a number? What breaks when treating it as one in a softmax computation?
- How are the limit of a sequence and the limit of a function related?