Computer Vision

Filtering and Convolution

Lesson goals

Understand convolution: kernel, padding, stride - and the output size formula
Choose the right blur type for a given noise profile
Apply Canny edge detection with correct threshold parameters
Build a morphological pipeline to clean binary masks

Prerequisites

Digital Images: Pixels and Color

Yann LeCun and the First Practical CNN

1989. Yann LeCun at Bell Labs trains LeNet, a convolutional neural network for handwritten digit recognition, on ZIP code digits from the US Postal Service. The revolutionary insight: instead of fully connected layers, 5x5 convolution kernels with shared weights. Fewer parameters, and crucially, robustness to position: because the same kernel is applied at every location, a stroke pattern is detected wherever the digit happens to sit. The refined LeNet-5 followed in 1998, and by the late 1990s LeNet-based systems read an estimated 10-20% of bank checks in the US. In 2012, its ideas powered AlexNet, which broke ImageNet records and launched the deep learning era.

2015. ResNet: 3.57% top-5 error on ImageNet, below the commonly cited human estimate of 5.1%. The deepest model has 152 layers, made trainable by one trick: residual (skip) connections. Today, DALL-E 3, Stable Diffusion, and Tesla Autopilot all contain convolutional components. At the core: a 3x3 matrix sliding across an image, the same idea Yann LeCun applied in 1989.

**Portrait Mode on smartphones** - Gaussian blur applied to a mask computed by a neural network. Every photo taken involves convolution running 30-40 times in sequence
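**Lane detection** - the classic pipeline is Canny edge detection + Hough transform for lane markings; production systems such as Tesla Autopilot, which processes 8 cameras in real time, use neural networks for this task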
**OCR (Google Lens, Apple Live Text)** - morphological operations join broken letter segments and remove scan artifacts before the neural network runs
**DALL-E 3 and Stable Diffusion** - a convolutional encoder compresses the image into latent space before the diffusion process begins

The Convolution Operation

The key decision behind LeNet was the **convolution kernel**: a small matrix (5x5 in LeNet) that slides across the image. At each position, every kernel weight is multiplied by the pixel beneath it and the products are summed into one output pixel. The same weights are reused at every position, so a convolutional layer needs far fewer parameters than a fully connected one; that is what made LeNet the first practical CV neural network. Today, convolution kernels are inside iPhone Portrait Mode, Tesla Autopilot, and DALL-E 3.

**Padding** solves the border problem: when a 3x3 kernel slides near the edges, boundary pixels lack a full set of neighbors. **Same** padding adds a border (usually zeros) so that, at stride 1, the output matches the input size. **Valid** means no padding, so the output shrinks. **Stride** is the step size of the kernel shift (typically 1). In CNNs, a stride of 2 is often used instead of pooling to halve the resolution.
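The output size along each axis follows from counting kernel positions: with input size N, kernel size K, padding P on each side and stride S, the output is floor((N + 2P - K) / S) + 1. A minimal Python sketch of that formula; the helper name `conv_output_size` is our own illustration, not a library function.

```python
# Illustrative helper written for this lesson, not a library API.
def conv_output_size(n: int, k: int, padding: int = 0, stride: int = 1) -> int:
    """Output length along one axis of a convolution.

    n: input size, k: kernel size, padding: pixels added on EACH side,
    stride: step between neighboring kernel positions.
    """
    if k > n + 2 * padding:
        raise ValueError("kernel is larger than the padded input")
    # Number of kernel positions: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - k) // stride + 1


# LeNet-style: 32x32 input, 5x5 kernel, no padding ("valid")
print(conv_output_size(32, 5))                        # 28
# "Same" padding for a 3x3 kernel at stride 1: padding = (k - 1) // 2 = 1
print(conv_output_size(64, 3, padding=1))             # 64
# Stride 2 halves the resolution, the pooling replacement used in CNNs
print(conv_output_size(64, 3, padding=1, stride=2))   # 32
```

For an odd kernel at stride 1, choosing P = (K - 1) / 2 is exactly what makes the output equal to N.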

A pure-Python loop implementation is **catastrophically slow** (seconds per image). OpenCV calls optimized C++ code: `cv2.filter2D(img, -1, kernel)` (where `-1` keeps the input's depth) runs orders of magnitude faster. Write convolution by hand only to understand the mechanics.
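For understanding the mechanics only, here is a minimal pure-Python sketch of a valid convolution (no padding, stride 1) on a grayscale image stored as nested lists. The helper `convolve2d_valid` is our own illustration. Like `cv2.filter2D`, it does not flip the kernel, so strictly speaking it computes correlation; for a symmetric kernel such as the box blur used here the result is the same.

```python
# Illustrative helper written for this lesson, not a library API.
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are lists of lists of numbers. The kernel is NOT flipped,
    matching what cv2.filter2D does (i.e. this is cross-correlation).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):            # every vertical kernel position
        row = []
        for x in range(iw - kw + 1):        # every horizontal kernel position
            acc = 0
            for i in range(kh):             # multiply-accumulate over the window
                for j in range(kw):
                    acc += kernel[i][j] * image[y + i][x + j]
            row.append(acc)
        out.append(row)
    return out


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box = [[1 / 9] * 3 for _ in range(3)]       # 3x3 averaging kernel
result = convolve2d_valid(img, box)
print([[round(v, 2) for v in r] for r in result])  # [[6.0, 7.0], [10.0, 11.0]]
```

`cv2.filter2D` pads the border by default (`BORDER_REFLECT_101`), so its output keeps the input size instead of shrinking to 2x2 as it does here.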

The kernel determines **what the convolution does**. All-ones, normalized by the number of weights - averaging (blur). Positive center, negative surroundings - edge detection. Gaussian - smooth blur without blocky artifacts. One operation, endless effects. CNNs learn kernels automatically from data - that is the central breakthrough of deep learning in CV.
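To see "one operation, different effects" at a single kernel position, the sketch below applies two 3x3 kernels to two 3x3 patches: the normalized averaging kernel and a Laplacian-style kernel with a positive center and negative surroundings. The patches and the helper `apply_at` are made up for illustration.

```python
# Illustrative helper written for this lesson, not a library API.
def apply_at(patch, kernel):
    """Response of a kernel at ONE position: sum of elementwise products."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


average = [[1 / 9] * 3 for _ in range(3)]          # weights sum to 1
laplacian = [[-1, -1, -1],
             [-1,  8, -1],
             [-1, -1, -1]]                          # weights sum to 0

flat = [[50] * 3 for _ in range(3)]                 # uniform region
edge = [[0, 0, 100],
        [0, 0, 100],
        [0, 0, 100]]                                # vertical edge

for name, patch in [("flat", flat), ("edge", edge)]:
    print(name, round(apply_at(patch, average), 1), apply_at(patch, laplacian))
# flat 50.0 0     -> blur keeps the level, the edge kernel stays silent
# edge 33.3 -300  -> the edge kernel responds strongly
```

Because the edge kernel's weights sum to zero, any uniform region gives zero response; only brightness changes produce output. The sign depends on which side of the edge the center pixel falls.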

A 100x100 image is convolved with a 5x5 kernel, no padding (valid). What is the output size?

Blurring and Noise Reduction

A paradox: blurring is one of the most useful operations in CV. It removes noise (random brightness fluctuations), smooths textures before edge detection, and reduces distracting detail. Run a gradient-based edge detector on a noisy image without blurring and it reports thousands of false edges; after blurring, mostly real contours survive.

**Averaging blur** - the simplest: a kernel of equal values. Each pixel is replaced by the average of its neighbors. It blurs everything uniformly including edges - which is why it is rarely used in practice. **Gaussian blur** - the gold standard: weights distributed along a bell curve, the center pixel contributes the most.
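A 1D sketch of why averaging blurs edges: a 3-tap box filter turns a sharp step into a ramp. The signal, the helper `box_blur_1d`, and the edge-replicating border handling are assumptions chosen for illustration.

```python
# Illustrative helper written for this lesson, not a library API.
def box_blur_1d(signal, radius=1):
    """Average each sample with its neighbors; borders replicate the edge value."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


step = [0, 0, 0, 90, 90, 90]        # a sharp edge between index 2 and 3
print(box_blur_1d(step))            # [0.0, 0.0, 30.0, 60.0, 90.0, 90.0]
```

The one-pixel jump becomes a two-pixel ramp, and a larger radius widens it further.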

Filter	Operation	Best for	Preserves edges?
Averaging	All weights = 1/N^2 (N x N kernel)	Quick preview	No
Gaussian	Gaussian-weighted average	General noise reduction	Partially
Median	Median of neighbors (nonlinear)	Salt-and-pepper noise	Yes
Bilateral	Gaussian + intensity weighting	Noise reduction with edge preservation	Yes
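The table's claim about salt-and-pepper noise can be checked on a 1D signal with one "salt" pixel (255) near an edge. A minimal sketch comparing a 3-tap mean with a 3-tap median; the signal values and the helper `filter_1d` are made up for illustration.

```python
from statistics import mean, median


# Illustrative helper written for this lesson, not a library API.
def filter_1d(signal, reducer, radius=1):
    """Apply `reducer` (mean or median) over a sliding window; borders are clamped."""
    n = len(signal)
    return [
        reducer([signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)])
        for i in range(n)
    ]


# dark region with one salt pixel, then a bright region
noisy = [10, 10, 255, 10, 10, 200, 200, 200]
print([round(v) for v in filter_1d(noisy, mean)])
# [10, 92, 92, 92, 73, 137, 200, 200]  -> spike smeared, edge softened
print(filter_1d(noisy, median))
# [10, 10, 10, 10, 10, 200, 200, 200]  -> spike removed, edge kept
```

In OpenCV the corresponding calls are `cv2.medianBlur(img, ksize)` with an odd `ksize`, and `cv2.bilateralFilter(img, d, sigmaColor, sigmaSpace)` for the edge-preserving case.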

In a CV pipeline, **Gaussian blur almost always comes first**. Canny edge detector, Harris corner detector, SIFT - all start with blurring. The **sigma** parameter controls blur strength: sigma=1 is gentle, sigma=10 is aggressive.
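To see what sigma controls, the sketch below builds a normalized 1D Gaussian kernel. A 2D Gaussian kernel is the outer product of two such 1D kernels, which is why Gaussian blur can run as two cheap 1D passes. The radius of ceil(3 x sigma) is a common rule of thumb assumed here, and `gaussian_kernel_1d` is our own helper.

```python
import math


# Illustrative helper written for this lesson, not a library API.
def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian weights; radius = ceil(3 * sigma) (rule of thumb)."""
    radius = math.ceil(3 * sigma)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]   # weights sum to 1: brightness is preserved


for sigma in (1, 3):
    k = gaussian_kernel_1d(sigma)
    print(f"sigma={sigma}: size={len(k)}, center weight={k[len(k) // 2]:.3f}")
# sigma=1: size=7, center weight=0.399
# sigma=3: size=19, center weight=0.133
```

A larger sigma spreads the weight over more neighbors, so the center pixel matters less and the blur is stronger. In OpenCV, `cv2.GaussianBlur(img, (0, 0), sigma)` lets the library derive the kernel size from sigma; an explicit `ksize` must be positive and odd.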

A surveillance camera image has heavy salt-and-pepper noise (random black and white pixels). Which filter works best?

Edge Detection: Sobel and Canny

Tesla Autopilot processes 8 video streams in real time, and any driving system must find lane markings, curbs, and object silhouettes. In classical CV these are **edges**: locations of sharp brightness changes. For an algorithm, an edge is where the **gradient** (rate of brightness change) is maximal. Two 3x3 convolution kernels are all it takes for a first-pass detection.

**Sobel** - the simplest detector: two 3x3 kernels. Gx measures brightness change along the horizontal axis (so it responds to vertical edges), Gy along the vertical axis (so it responds to horizontal edges). Full gradient magnitude: G = sqrt(Gx^2 + Gy^2).
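A minimal sketch of the Sobel step at a single pixel, using the standard 3x3 Sobel kernels applied without flipping, so Gx is positive when brightness increases to the right. The patch and the helper `apply_at` are made up for illustration; the gradient direction atan2(Gy, Gx) is included because Canny's next step uses it.

```python
import math

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]      # change along x -> responds to vertical edges
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]    # change along y -> responds to horizontal edges


# Illustrative helper written for this lesson, not a library API.
def apply_at(patch, kernel):
    """Sum of elementwise products of a 3x3 patch and a 3x3 kernel (no flip)."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


patch = [[0, 0, 10],
         [0, 0, 10],
         [0, 0, 10]]        # dark on the left, bright on the right

gx = apply_at(patch, SOBEL_X)
gy = apply_at(patch, SOBEL_Y)
magnitude = math.hypot(gx, gy)                   # sqrt(gx^2 + gy^2)
direction = math.degrees(math.atan2(gy, gx))     # 0 degrees = pointing right
print(gx, gy, magnitude, direction)              # 40 0 40.0 0.0
```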

**Canny** - the gold standard in edge detection, by John Canny in 1986. Not just one filter, but a **4-step pipeline**, each step fixing a specific weakness of Sobel: Gaussian blur (remove noise) - Sobel (find gradient) - Non-maximum suppression (thin to 1 pixel) - Hysteresis thresholding (drop weak isolated edges).
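Non-maximum suppression is the step that thins a thick gradient ridge to one pixel. Real NMS compares each pixel with its two neighbors along that pixel's gradient direction in 2D; the sketch below is a deliberately simplified 1D version along one scan line crossing an edge, with made-up magnitudes and our own helper name.

```python
# Illustrative helper written for this lesson, not a library API.
def non_max_suppression_1d(magnitudes):
    """Keep a value only if it is >= both neighbors along the line; zero elsewhere.

    1D simplification: in Canny the two neighbors are taken along each pixel's
    gradient direction (commonly quantized to 0, 45, 90 and 135 degrees).
    """
    n = len(magnitudes)
    out = [0] * n
    for i in range(n):
        left = magnitudes[i - 1] if i > 0 else 0
        right = magnitudes[i + 1] if i < n - 1 else 0
        if magnitudes[i] >= left and magnitudes[i] >= right:
            out[i] = magnitudes[i]
    return out


# Gradient magnitude across a blurred edge: a ridge several pixels wide
ridge = [0, 3, 9, 14, 10, 4, 0]
print(non_max_suppression_1d(ridge))   # [0, 0, 0, 14, 0, 0, 0]
```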

**Hysteresis thresholding** is Canny's defining feature. With a single threshold, a low value lets noise through and a high value breaks contours. Canny uses two: pixels above the high threshold are accepted as strong edges, and pixels between the low and high thresholds are kept only if they connect to a strong edge. Practical rule: high:low = 2:1 to 3:1.
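Hysteresis is essentially region growing from strong pixels through weak ones. A minimal sketch on a small 2D magnitude map; the values, thresholds, helper name, and 8-connectivity are assumptions for illustration. In OpenCV the whole pipeline is `cv2.Canny(img, threshold1, threshold2)`, where the smaller of the two thresholds acts as the low one.

```python
from collections import deque


# Illustrative helper written for this lesson, not a library API.
def hysteresis(mag, low, high):
    """Return a 0/1 edge map: strong pixels (>= high) plus weak pixels
    (>= low) that are 8-connected to a strong pixel through other weak ones."""
    h, w = len(mag), len(mag[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:           # seeds: strong edges
                edges[y][x] = 1
                queue.append((y, x))
    while queue:                            # grow the seeds through weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and mag[ny][nx] >= low:
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


mag = [
    [0, 0,  0,  0,  0, 0],
    [0, 90, 40, 40, 0, 0],   # strong pixel followed by a weak continuation
    [0, 0,  0,  0,  0, 0],
    [0, 0,  0,  0, 45, 0],   # isolated weak pixel, not connected to a strong one
]
for row in hysteresis(mag, low=30, high=80):   # high:low is roughly 2.7:1
    print(row)
# row 1 becomes [0, 1, 1, 1, 0, 0]; the isolated 45 in row 3 is dropped
```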

In Canny edge detection, non-maximum suppression is needed to:

Morphological Operations

OCR systems (Google Lens, Tesseract, Apple Live Text) often work with binary masks. After threshold segmentation, a text mask typically has holes inside letters, small noise specks, and broken contours. **Morphological operations** are the cleanup tools. They work with a **structuring element** (a small kernel) that defines the neighborhood shape.

Two basic operations: **erosion** - if at least one pixel under the kernel = 0, the result = 0. Objects shrink, small specks disappear. **Dilation** - if at least one pixel = 1, the result = 1. Objects grow, holes fill in.

**Opening** = erosion -> dilation. Removes small noise (erosion deletes specks), then restores object size (dilation). **Closing** = dilation -> erosion. Fills holes and gaps (dilation expands), then returns the contour (erosion shrinks). Standard pipeline for cleaning OCR masks: opening first, then closing.
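A minimal pure-Python sketch of binary erosion and dilation with a 3x3 square structuring element, and of opening and closing built from them. Out-of-bounds neighbors are simply ignored (an assumption made for simplicity); the helper names and test masks are our own.

```python
# Illustrative helpers written for this lesson, not a library API.
def _morph(mask, keep_if):
    """Apply a 3x3 square structuring element; out-of-bounds pixels are ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = 1 if keep_if(window) else 0
    return out


def erode(mask):     # 1 only if ALL pixels under the element are 1
    return _morph(mask, all)


def dilate(mask):    # 1 if AT LEAST ONE pixel under the element is 1
    return _morph(mask, any)


def opening(mask):   # erode -> dilate: removes specks smaller than the element
    return dilate(erode(mask))


def closing(mask):   # dilate -> erode: fills holes smaller than the element
    return erode(dilate(mask))


# Opening: a 3x3 object survives, a 1-pixel speck does not
noisy = [[0] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        noisy[y][x] = 1
noisy[5][5] = 1                                   # noise speck
cleaned = opening(noisy)
print(cleaned[5][5], sum(map(sum, cleaned)))      # 0 9

# Closing: a 1-pixel hole inside a 5x5 object gets filled
holey = [[0] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        holey[y][x] = 1
holey[4][4] = 0                                   # hole
filled = closing(holey)
print(filled[4][4], sum(map(sum, filled)))        # 1 25
```

With OpenCV the same steps are `cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)` and `cv2.MORPH_CLOSE`, with the element from `cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))`.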

Operation	Definition	Effect	Use case
Erosion	Minimum over the structuring element	Objects get smaller	Remove thin connections, noise
Dilation	Maximum over the structuring element	Objects get larger	Fill holes, connect segments
Opening	Erode -> Dilate	Remove small objects	Noise cleanup
Closing	Dilate -> Erode	Fill holes	Close contours
Gradient	Dilate - Erode	Object outline	Outline visualization

**Morphological gradient** (`MORPH_GRADIENT`) = dilation - erosion. The result is an object outline with thickness proportional to the kernel size. A fast alternative to Canny for binary masks.
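A 1D sketch showing that the morphological gradient marks the boundary and that its thickness grows with the structuring-element size. On a binary 1D mask, dilation and erosion reduce to a sliding max and min; the mask and helper names are made up, and out-of-range samples are ignored (an assumption).

```python
# Illustrative helpers written for this lesson, not a library API.
def sliding(mask, radius, reducer):
    """Apply max (dilation) or min (erosion) over a window of 2*radius+1 samples."""
    n = len(mask)
    return [reducer(mask[max(i - radius, 0):min(i + radius + 1, n)]) for i in range(n)]


def morph_gradient(mask, radius):
    dilated = sliding(mask, radius, max)
    eroded = sliding(mask, radius, min)
    return [d - e for d, e in zip(dilated, eroded)]


mask = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(morph_gradient(mask, 1))   # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
print(morph_gradient(mask, 2))   # [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]
```

Going from a 3-sample to a 5-sample element doubles the outline from 2 to 4 samples at each boundary: one half outside the object (from dilation), one half inside (from erosion).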

Misconception: convolution and correlation are the same operation

Mathematically, convolution flips the kernel 180 degrees before applying it (reflecting along both axes). Correlation applies the kernel as-is. For symmetric kernels (Gaussian, averaging) there is no difference. For asymmetric kernels (Sobel, directional) the difference is significant.

OpenCV and most deep learning frameworks implement correlation but call it convolution. True mathematical convolution is used in signal processing. When constructing a kernel for a specific direction, remember that cv2.filter2D() performs correlation.
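A minimal sketch of the difference at a single position: correlation multiplies the patch by the kernel as-is, convolution by the kernel rotated 180 degrees. For the symmetric box kernel the results match; for the Sobel x-kernel the sign flips. The helper names are our own. If true convolution is needed with OpenCV, its documentation suggests flipping the kernel (e.g. `cv2.flip(kernel, -1)`) and adjusting the anchor.

```python
# Illustrative helpers written for this lesson, not a library API.
def correlate_at(patch, kernel):
    """Kernel applied as-is (what cv2.filter2D and CNN layers compute)."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


def convolve_at(patch, kernel):
    """True convolution: rotate the kernel 180 degrees (flip both axes) first."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate_at(patch, flipped)


patch = [[0, 0, 10],
         [0, 0, 10],
         [0, 0, 10]]
box = [[1, 1, 1]] * 3                 # symmetric
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]                # antisymmetric left-right

print(correlate_at(patch, box), convolve_at(patch, box))          # 30 30
print(correlate_at(patch, sobel_x), convolve_at(patch, sobel_x))  # 40 -40
```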

A binary mask has small black holes inside an object and small white noise specks around it. Which pipeline is correct?

Main points

**Convolution** - one operation (kernel x region), but different kernels produce different effects: blur, edge detection, sharpening. Yann LeCun applied this idea in 1989 - it underlies all modern CV
**Blur** removes noise and prepares for analysis. Gaussian - universal, Median - for salt-and-pepper noise, Bilateral - when edges need to be preserved
**Canny** = 4-step pipeline (blur - gradient - NMS - hysteresis). Two thresholds solve the noise-vs-broken-contours dilemma. Rule of thumb: high:low = 2:1 to 3:1
**Morphology** cleans binary masks: opening removes noise, closing fills holes. OCR, medical scanners, object detection - all commonly use this pipeline
**CNNs in 2024** learn kernels automatically - but understanding Sobel, Gaussian, Canny helps interpret what the network has learned

Questions for reflection

Why does the Canny edge detector start with blurring when blurring destroys detail? Doesn't that contradict the goal of finding edges?
If CNNs learn convolution kernels automatically, why understand Sobel and Canny? In which situations are classical filters more reliable?
Morphological operations work on binary masks. What if an object has semi-transparent edges (alpha gradient)? How would the pipeline be adapted?

Related lessons

cv-01 — Pixels, coordinates, and data types - the foundation for convolution
cv-03 — Feature detectors SIFT and ORB use Gaussian blur and gradients internally
dl-05 — CNNs learn convolution kernels automatically - this is the evolution of manual filters
aie-25-multimodal — DALL-E 3 and Stable Diffusion use convolutional encoders inside
cv-04 — Object detection and tracking build on edge detection and morphology
la-06-transformations
