Machine Learning
Support Vector Machines
Three steps to the support vector machine
The SVM was built in three steps over three decades. Starting in the 1960s, Vladimir Vapnik and Alexey Chervonenkis developed VC theory, the statistical framework explaining why a wide margin between classes leads to better generalization. In 1992 Bernhard Boser, Isabelle Guyon, and Vapnik added the kernel trick: by replacing dot products with kernel functions, a linear separator could carve out nonlinear boundaries without ever computing high-dimensional coordinates. Then in 1995 Corinna Cortes and Vapnik introduced the soft-margin SVM, which tolerates a few misclassifications so the method can cope with noisy, overlapping data. The result dominated practical classification until deep learning took over in the 2010s.
Imagine you are a security guard in a museum and need to stretch a rope between two groups of exhibits so that the distance from the rope to the nearest exhibit is as large as possible. The wider the gap, the more reliable the separation: a wandering visitor won't mistake one zone for the other. That is exactly how an SVM works: it doesn't settle for just any boundary between the classes, but looks for the one with the maximum safety margin. And if the exhibits are mixed so that no straight rope can separate them? Then the SVM uses the kernel trick, which implicitly maps the data into a higher-dimensional space where a straight boundary can separate what looked inseparable.
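The rope analogy can be made concrete with a small sketch. Everything in it is an illustrative assumption: the exhibit coordinates, the two candidate ropes, and the helper names `margin`, `phi`, and `poly_kernel` are all hypothetical. It measures the safety margin of two separating lines, then checks that a degree-2 polynomial kernel gives the same number as a dot product in an explicit feature space, which is the idea behind the kernel trick.

```python
import math

def margin(w, b, points):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

def phi(p):
    """Explicit feature map whose dot product equals the kernel (p . q)**2."""
    x, y = p
    return (x * x, math.sqrt(2) * x * y, y * y)

def poly_kernel(p, q):
    """Degree-2 polynomial kernel, computed without ever building phi."""
    return (p[0] * q[0] + p[1] * q[1]) ** 2

# Hypothetical exhibit positions on a floor plan, in metres.
left_group = [(1.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
right_group = [(5.0, 1.0), (6.0, 2.0), (5.0, 3.0)]
exhibits = left_group + right_group

# Two ropes that both separate the groups: the vertical lines x = 2.5 and x = 3.5,
# each stored as (w, b) for the line w . p + b = 0.
rope_near = ((1.0, 0.0), -2.5)
rope_mid = ((1.0, 0.0), -3.5)

near_margin = margin(*rope_near, exhibits)
mid_margin = margin(*rope_mid, exhibits)
print(f"rope near the left group: margin {near_margin} m")
print(f"rope in the middle:       margin {mid_margin} m")

# Kernel trick: the kernel value matches the dot product in feature space.
p, q = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(p, q)
explicit_value = sum(a * b for a, b in zip(phi(p), phi(q)))
print(kernel_value, explicit_value)

# "Mixed" exhibits: an inner ring surrounded by an outer ring has no straight
# separator in 2-D, but in feature space the linear rule
# phi[0] + phi[2] (= x^2 + y^2) < 5 splits the two rings.
inner = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
outer = [(3.0, 0.0), (0.0, -3.0), (-3.0, 0.0)]
inner_ok = all(phi(pt)[0] + phi(pt)[2] < 5 for pt in inner)
outer_ok = all(phi(pt)[0] + phi(pt)[2] > 5 for pt in outer)
print(inner_ok, outer_ok)
```

Both ropes separate the groups, but the middle one keeps every exhibit three times farther away, and that wider margin is what an SVM optimizes for. A real SVM solver finds such a line automatically and only ever evaluates the kernel; the explicit map `phi` appears here purely to show the equivalence.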
- **Handwritten digit recognition** - before deep learning, SVMs were a standard approach to the MNIST task (70,000 images of the digits 0-9), reaching about 98.5% accuracy with an RBF kernel, and they are still used as a baseline in image classification tasks
- **Text classification** - spam filters, sentiment analysis of reviews, news categorization: an SVM with a linear kernel handles thousands of features (words) and remains a strong choice for text tasks with limited data
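To give a feel for the text-classification case, here is a minimal sketch of a linear soft-margin SVM on bag-of-words features. It makes several assumptions worth stating: the four training messages are invented, the helper names (`featurize`, `score`, `train_linear_svm`, `predict`) are hypothetical, there is no bias term, and training uses plain sub-gradient descent on the regularized hinge loss rather than the quadratic-programming solvers that production libraries typically rely on.

```python
from collections import Counter

def featurize(text):
    """Bag-of-words: map each lowercase token to its count."""
    return Counter(text.lower().split())

def score(weights, features):
    """Linear decision function w . x over sparse word counts."""
    return sum(weights.get(word, 0.0) * count for word, count in features.items())

def train_linear_svm(docs, labels, epochs=20, lr=0.1, lam=0.01):
    """Sub-gradient descent on lam/2 * ||w||^2 + hinge loss, one document at a time."""
    weights = {}
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            x = featurize(text)
            # A document violates the margin if it is misclassified or too close.
            violated = y * score(weights, x) < 1
            # Regularization step: shrink every weight toward zero.
            for word in weights:
                weights[word] *= 1 - lr * lam
            # Hinge-loss step: only margin violators move the weights.
            if violated:
                for word, count in x.items():
                    weights[word] = weights.get(word, 0.0) + lr * y * count
    return weights

def predict(weights, text):
    """Return +1 (spam) or -1 (not spam)."""
    return 1 if score(weights, featurize(text)) > 0 else -1

# Invented toy data: +1 = spam, -1 = not spam.
docs = [
    "win cash prize now",
    "cheap cash offer now",
    "meeting agenda for monday",
    "lunch with the team monday",
]
labels = [1, 1, -1, -1]

weights = train_linear_svm(docs, labels)
print(predict(weights, "win cash now"))
print(predict(weights, "monday team lunch"))
```

Each distinct word is one feature, so a realistic vocabulary quickly reaches thousands of dimensions; the sparse dictionary representation keeps the cost of each update proportional to the length of the message rather than the size of the vocabulary.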