Numerical Methods

Interpolation: Lagrange, Newton, and the Runge Phenomenon

Bilinear interpolation in `torch.nn.functional.grid_sample` is Lagrange degree 1 applied in two spatial dimensions. RIFE interpolates video frames with learned flow plus bilinear sampling. Gauss-Chebyshev quadrature in scipy uses Chebyshev polynomial roots as optimal integration nodes. Interpolation is not an academic exercise - it is the algorithmic core of computer graphics, numerical integration, and signal processing.

**torch.nn.functional.grid_sample**: bilinear/bicubic interpolation of feature maps in spatial transformers, deformable convolutions, and RIFE video interpolation
**scipy.interpolate**: CubicSpline, BarycentricInterpolator - Lagrange formula in barycentric form for numerical stability
**Gauss-Chebyshev quadrature**: scipy.integrate.fixed_quad uses Chebyshev polynomial roots as optimal integration nodes
**Computer graphics**: Bezier curves are a special case of polynomial interpolation; splines drive TrueType fonts and SVG paths
**torch.autograd.gradcheck**: validates Jacobians numerically via finite differences - first-order divided differences with step h -> 0

Предварительные знания

Errors and Floating-Point Arithmetic

Lagrange Formula

1795. Lagrange publishes a formula that produces an explicit polynomial through n+1 points - no linear systems, no iterations. The idea is elegant: for each point, build a basis polynomial equal to 1 at that point and 0 at all others. Neural networks use exactly the same principle in bilinear interpolation when rescaling feature maps.

**Lagrange interpolating polynomial:** L(x) = sum(i=0..n) yi * li(x) where **basis polynomials:** li(x) = product(j != i) (x - xj) / (xi - xj) **Kronecker property:** li(xj) = 1 if i=j, 0 otherwise. Therefore: L(xi) = yi - the polynomial passes through all points.

In PyTorch, `torch.nn.functional.grid_sample` implements bilinear interpolation for feature map warping: each new pixel is a weighted sum of four neighbors - exactly Lagrange degree 1 in each spatial dimension. RIFE (Real-time Intermediate Flow Estimation) for video frame interpolation uses the same mechanism with learned flow fields.

The main drawback: adding a new point forces recomputation of **all** basis polynomials - $O(n^2)$ work. Newton's form solves this cleanly.

The Lagrange basis polynomial li(x) at node xj equals:

Newton's Form

Newton proposed a different representation of the same polynomial - via **divided differences**. The critical property: adding a new point $x_{n+1}$ requires computing only one new coefficient; all previous ones stay intact. This is online updating - the same pattern as Adam maintaining moment estimates and updating only what a new gradient touches.

**Divided differences (recursive):** f[xi] = yi f[xi, xi+1] = (f[xi+1] - f[xi]) / (xi+1 - xi) f[xi, ..., xi+k] = (f[xi+1, ..., xi+k] - f[xi, ..., xi+k-1]) / (xi+k - xi) **Newton's polynomial:** N(x) = f[x0] + f[x0,x1]*(x-x0) + f[x0,x1,x2]*(x-x0)(x-x1) + ... Uniqueness theorem: Lagrange and Newton give the **same** polynomial.

Horner's scheme ($O(n)$ for polynomial evaluation) reuses intermediate products - the same principle behind fast neural network forward passes. `torch.autograd.gradcheck` validates Jacobians numerically via finite differences $(f(x+h) - f(x-h)) / 2h$ - a first-order divided difference with step $h \to 0$.

The main advantage of Newton's form over Lagrange's form:

Chebyshev Nodes

Interpolation quality depends not on the number of points but on their **placement**. A uniform grid seems like the natural choice - and turns out to be catastrophic. In 1854 Chebyshev found the optimal distribution: nodes that minimize the maximum error. These same nodes appear as optimal integration points in Gauss-Chebyshev quadrature inside scipy.

**Chebyshev nodes on [-1, 1]:** xk = cos((2k + 1) * pi / (2(n + 1))), k = 0, 1, ..., n On an arbitrary [a, b]: xk = (a + b)/2 + (b - a)/2 * cos((2k + 1) * pi / (2(n + 1))) **Why they are optimal:** minimize max|omega(x)| where omega(x) = product(x - xk) is the interpolation error factor.

Minimax theorem: among all monic polynomials of degree n+1, the Chebyshev polynomial $T_{n+1}(x)/2^n$ has the smallest deviation from zero on [-1, 1]. Its roots are the Chebyshev nodes - that is what makes them optimal. In `scipy.integrate.fixed_quad`, Gauss-Legendre nodes follow the same logic: roots of orthogonal polynomials minimize integration error.

Why do Chebyshev nodes cluster at the endpoints of the interval?

Runge Phenomenon

1901. Carl Runge takes $f(x) = 1/(1 + 25x^2)$ and demonstrates a paradox: increasing the number of uniform nodes causes the interpolating polynomial to **diverge**. Not "converge slowly" - the polynomial literally explodes at the endpoints. This kills the intuition that "more data = more accurate" for polynomial interpolation on a uniform grid.

**Runge phenomenon:** for f(x) = 1/(1 + 25x^2) on [-1, 1], polynomial interpolation on a uniform grid diverges as n -> inf. The maximum error grows exponentially with the number of nodes! **Solution:** Chebyshev nodes (convergence) or splines (low-degree polynomials on each subinterval).

Three fixes for Runge: 1. **Chebyshev nodes** - polynomial interpolation converges 2. **splines** - piecewise low-degree polynomials, the standard in scipy and sklearn preprocessing 3. **trigonometric interpolation** for periodic functions (FFT). In neural networks the same logic applies: local operations (convolutions) generalize better than global high-degree polynomials.

More interpolation points always give a better approximation

For polynomial interpolation on a uniform grid, adding more points can WORSEN the result (Runge phenomenon). Quality depends on node placement, not just count

A degree-n polynomial can oscillate between nodes with amplitude growing exponentially in n. On a uniform grid these oscillations concentrate near the endpoints. Chebyshev nodes solve the problem; splines circumvent it by keeping polynomial degree low.

When interpolating f(x) = 1/(1+25x^2) on [-1,1] with uniform nodes as n increases:

Key Ideas

**Lagrange formula:** L(x) = sum yi * li(x), explicit via basis polynomials with the Kronecker delta property
**Newton's form:** divided differences + Horner's scheme; adding a point costs one new coefficient
**Uniqueness theorem:** Lagrange and Newton give the same polynomial - just different representations
**Chebyshev nodes:** xk = cos(...), clustering at endpoints, minimize the max-error over the whole interval
**Runge phenomenon:** uniform nodes + high degree = exponentially growing oscillations at endpoints
**Callback:** bilinear in PyTorch = Lagrange degree 1; Chebyshev nodes power scipy quadratures

Вопросы для размышления

Why do uniform nodes work fine for trigonometric functions (sin, cos) but fail for 1/(1+25x^2)? What property of a function prevents the Runge phenomenon?
torch.autograd.gradcheck computes (f(x+h) - f(x-h)) / 2h. This is a first-order divided difference. Why can h not be made arbitrarily small - and how does this connect to catastrophic cancellation from lesson nm-01?
Bilinear interpolation in grid_sample is a tensor product of two degree-1 Lagrange polynomials. What happens to quality with bicubic (degree 3)? When is degree 3 worse than degree 1?

Связанные уроки

nm-01 — float64 rounding errors cap the practical degree of interpolating polynomials
nm-03 — Splines are piecewise interpolation - the standard fix for the Runge phenomenon
calc-16-taylor — Taylor series is also polynomial approximation, but local rather than global
la-15-svd — SVD underlies polynomial least-squares fitting and regression
nm-04 — Gauss-Chebyshev quadrature uses Chebyshev nodes as optimal integration points
calc-11-definite

Numerical Methods

Interpolation: Lagrange, Newton, and the Runge Phenomenon

**torch.nn.functional.grid_sample**: bilinear/bicubic interpolation of feature maps in spatial transformers, deformable convolutions, and RIFE video interpolation
**scipy.interpolate**: CubicSpline, BarycentricInterpolator - Lagrange formula in barycentric form for numerical stability
**Gauss-Chebyshev quadrature**: scipy.integrate.fixed_quad uses Chebyshev polynomial roots as optimal integration nodes
**Computer graphics**: Bezier curves are a special case of polynomial interpolation; splines drive TrueType fonts and SVG paths
**torch.autograd.gradcheck**: validates Jacobians numerically via finite differences - first-order divided differences with step h -> 0

Предварительные знания

Errors and Floating-Point Arithmetic

Lagrange Formula

The main drawback: adding a new point forces recomputation of **all** basis polynomials - $O(n^2)$ work. Newton's form solves this cleanly.

The Lagrange basis polynomial li(x) at node xj equals:

Newton's Form

The main advantage of Newton's form over Lagrange's form:

Chebyshev Nodes

Why do Chebyshev nodes cluster at the endpoints of the interval?

Runge Phenomenon

More interpolation points always give a better approximation

For polynomial interpolation on a uniform grid, adding more points can WORSEN the result (Runge phenomenon). Quality depends on node placement, not just count

When interpolating f(x) = 1/(1+25x^2) on [-1,1] with uniform nodes as n increases:

Key Ideas

**Lagrange formula:** L(x) = sum yi * li(x), explicit via basis polynomials with the Kronecker delta property
**Newton's form:** divided differences + Horner's scheme; adding a point costs one new coefficient
**Uniqueness theorem:** Lagrange and Newton give the same polynomial - just different representations
**Chebyshev nodes:** xk = cos(...), clustering at endpoints, minimize the max-error over the whole interval
**Runge phenomenon:** uniform nodes + high degree = exponentially growing oscillations at endpoints
**Callback:** bilinear in PyTorch = Lagrange degree 1; Chebyshev nodes power scipy quadratures

Вопросы для размышления

Why do uniform nodes work fine for trigonometric functions (sin, cos) but fail for 1/(1+25x^2)? What property of a function prevents the Runge phenomenon?
torch.autograd.gradcheck computes (f(x+h) - f(x-h)) / 2h. This is a first-order divided difference. Why can h not be made arbitrarily small - and how does this connect to catastrophic cancellation from lesson nm-01?
Bilinear interpolation in grid_sample is a tensor product of two degree-1 Lagrange polynomials. What happens to quality with bicubic (degree 3)? When is degree 3 worse than degree 1?

Связанные уроки

nm-01 — float64 rounding errors cap the practical degree of interpolating polynomials
nm-03 — Splines are piecewise interpolation - the standard fix for the Runge phenomenon
calc-16-taylor — Taylor series is also polynomial approximation, but local rather than global
la-15-svd — SVD underlies polynomial least-squares fitting and regression
nm-04 — Gauss-Chebyshev quadrature uses Chebyshev nodes as optimal integration points
calc-11-definite

Interpolation: Lagrange, Newton, and the Runge Phenomenon

Предварительные знания

Lagrange Formula

Newton's Form

Chebyshev Nodes

Runge Phenomenon

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки

Interpolation: Lagrange, Newton, and the Runge Phenomenon

Предварительные знания

Lagrange Formula

Newton's Form

Chebyshev Nodes

Runge Phenomenon

Key Ideas

Related Topics

Вопросы для размышления

Связанные уроки