Numerical Methods
Interpolation: Lagrange, Newton, and the Runge Phenomenon
Bilinear interpolation in `torch.nn.functional.grid_sample` is Lagrange degree 1 applied in two spatial dimensions. RIFE interpolates video frames with learned flow plus bilinear sampling. Gauss-Chebyshev quadrature in scipy uses Chebyshev polynomial roots as optimal integration nodes. Interpolation is not an academic exercise - it is the algorithmic core of computer graphics, numerical integration, and signal processing.
- **torch.nn.functional.grid_sample**: bilinear/bicubic interpolation of feature maps in spatial transformers, deformable convolutions, and RIFE video interpolation
- **scipy.interpolate**: CubicSpline, BarycentricInterpolator - Lagrange formula in barycentric form for numerical stability
- **Gauss-Chebyshev quadrature**: scipy.integrate.fixed_quad uses Chebyshev polynomial roots as optimal integration nodes
- **Computer graphics**: Bezier curves are a special case of polynomial interpolation; splines drive TrueType fonts and SVG paths
- **torch.autograd.gradcheck**: validates Jacobians numerically via finite differences - first-order divided differences with step h -> 0
Предварительные знания
Lagrange Formula
1795. Lagrange publishes a formula that produces an explicit polynomial through n+1 points - no linear systems, no iterations. The idea is elegant: for each point, build a basis polynomial equal to 1 at that point and 0 at all others. Neural networks use exactly the same principle in bilinear interpolation when rescaling feature maps.
**Lagrange interpolating polynomial:** L(x) = sum(i=0..n) yi * li(x) where **basis polynomials:** li(x) = product(j != i) (x - xj) / (xi - xj) **Kronecker property:** li(xj) = 1 if i=j, 0 otherwise. Therefore: L(xi) = yi - the polynomial passes through all points.
In PyTorch, `torch.nn.functional.grid_sample` implements bilinear interpolation for feature map warping: each new pixel is a weighted sum of four neighbors - exactly Lagrange degree 1 in each spatial dimension. RIFE (Real-time Intermediate Flow Estimation) for video frame interpolation uses the same mechanism with learned flow fields.
The main drawback: adding a new point forces recomputation of **all** basis polynomials - $O(n^2)$ work. Newton's form solves this cleanly.
The Lagrange basis polynomial li(x) at node xj equals:
Newton's Form
Newton proposed a different representation of the same polynomial - via **divided differences**. The critical property: adding a new point $x_{n+1}$ requires computing only one new coefficient; all previous ones stay intact. This is online updating - the same pattern as Adam maintaining moment estimates and updating only what a new gradient touches.
**Divided differences (recursive):** f[xi] = yi f[xi, xi+1] = (f[xi+1] - f[xi]) / (xi+1 - xi) f[xi, ..., xi+k] = (f[xi+1, ..., xi+k] - f[xi, ..., xi+k-1]) / (xi+k - xi) **Newton's polynomial:** N(x) = f[x0] + f[x0,x1]*(x-x0) + f[x0,x1,x2]*(x-x0)(x-x1) + ... Uniqueness theorem: Lagrange and Newton give the **same** polynomial.
Horner's scheme ($O(n)$ for polynomial evaluation) reuses intermediate products - the same principle behind fast neural network forward passes. `torch.autograd.gradcheck` validates Jacobians numerically via finite differences $(f(x+h) - f(x-h)) / 2h$ - a first-order divided difference with step $h \to 0$.
The main advantage of Newton's form over Lagrange's form:
Chebyshev Nodes
Interpolation quality depends not on the number of points but on their **placement**. A uniform grid seems like the natural choice - and turns out to be catastrophic. In 1854 Chebyshev found the optimal distribution: nodes that minimize the maximum error. These same nodes appear as optimal integration points in Gauss-Chebyshev quadrature inside scipy.
**Chebyshev nodes on [-1, 1]:** xk = cos((2k + 1) * pi / (2(n + 1))), k = 0, 1, ..., n On an arbitrary [a, b]: xk = (a + b)/2 + (b - a)/2 * cos((2k + 1) * pi / (2(n + 1))) **Why they are optimal:** minimize max|omega(x)| where omega(x) = product(x - xk) is the interpolation error factor.
Minimax theorem: among all monic polynomials of degree n+1, the Chebyshev polynomial $T_{n+1}(x)/2^n$ has the smallest deviation from zero on [-1, 1]. Its roots are the Chebyshev nodes - that is what makes them optimal. In `scipy.integrate.fixed_quad`, Gauss-Legendre nodes follow the same logic: roots of orthogonal polynomials minimize integration error.
Why do Chebyshev nodes cluster at the endpoints of the interval?
Runge Phenomenon
1901. Carl Runge takes $f(x) = 1/(1 + 25x^2)$ and demonstrates a paradox: increasing the number of uniform nodes causes the interpolating polynomial to **diverge**. Not "converge slowly" - the polynomial literally explodes at the endpoints. This kills the intuition that "more data = more accurate" for polynomial interpolation on a uniform grid.
**Runge phenomenon:** for f(x) = 1/(1 + 25x^2) on [-1, 1], polynomial interpolation on a uniform grid diverges as n -> inf. The maximum error grows exponentially with the number of nodes! **Solution:** Chebyshev nodes (convergence) or splines (low-degree polynomials on each subinterval).
Three fixes for Runge: 1. **Chebyshev nodes** - polynomial interpolation converges 2. **splines** - piecewise low-degree polynomials, the standard in scipy and sklearn preprocessing 3. **trigonometric interpolation** for periodic functions (FFT). In neural networks the same logic applies: local operations (convolutions) generalize better than global high-degree polynomials.
More interpolation points always give a better approximation
For polynomial interpolation on a uniform grid, adding more points can WORSEN the result (Runge phenomenon). Quality depends on node placement, not just count
A degree-n polynomial can oscillate between nodes with amplitude growing exponentially in n. On a uniform grid these oscillations concentrate near the endpoints. Chebyshev nodes solve the problem; splines circumvent it by keeping polynomial degree low.
When interpolating f(x) = 1/(1+25x^2) on [-1,1] with uniform nodes as n increases:
Key Ideas
- **Lagrange formula:** L(x) = sum yi * li(x), explicit via basis polynomials with the Kronecker delta property
- **Newton's form:** divided differences + Horner's scheme; adding a point costs one new coefficient
- **Uniqueness theorem:** Lagrange and Newton give the same polynomial - just different representations
- **Chebyshev nodes:** xk = cos(...), clustering at endpoints, minimize the max-error over the whole interval
- **Runge phenomenon:** uniform nodes + high degree = exponentially growing oscillations at endpoints
- **Callback:** bilinear in PyTorch = Lagrange degree 1; Chebyshev nodes power scipy quadratures
Related Topics
Interpolation is the foundation for splines and numerical integration:
- Errors and floating point — Rounding errors limit the practical degree of the interpolating polynomial
- Splines — Piecewise interpolation - the fix for Runge: low-degree polynomials on each subinterval
- Taylor series — Alternative polynomial approximation: local expansion vs global node-based interpolation
Вопросы для размышления
- Why do uniform nodes work fine for trigonometric functions (sin, cos) but fail for 1/(1+25x^2)? What property of a function prevents the Runge phenomenon?
- torch.autograd.gradcheck computes (f(x+h) - f(x-h)) / 2h. This is a first-order divided difference. Why can h not be made arbitrarily small - and how does this connect to catastrophic cancellation from lesson nm-01?
- Bilinear interpolation in grid_sample is a tensor product of two degree-1 Lagrange polynomials. What happens to quality with bicubic (degree 3)? When is degree 3 worse than degree 1?
Связанные уроки
- nm-01 — float64 rounding errors cap the practical degree of interpolating polynomials
- nm-03 — Splines are piecewise interpolation - the standard fix for the Runge phenomenon
- calc-16-taylor — Taylor series is also polynomial approximation, but local rather than global
- la-15-svd — SVD underlies polynomial least-squares fitting and regression
- nm-04 — Gauss-Chebyshev quadrature uses Chebyshev nodes as optimal integration points
- calc-11-definite