Machine Learning

What Is Machine Learning

Every time Netflix recommends a film, Google translates a text, or a bank blocks a suspicious transaction in milliseconds - machine learning is behind it. But what does it actually mean for a computer to "learn"? It has no brain, no experience, no intuition. And how is it different from regular programming, where a developer writes every rule by hand?

**Google Translate** processes 100+ billion words daily in 130 languages - and its quality improved more in 10 years than in the previous 50 years of hand-written linguistic rules
**Fraud detection** in banks analyzes every transaction in 50 ms, catching fraud with 99.5% accuracy - saving $30+ billion in losses per year worldwide
**AlphaFold** by DeepMind solved a 50-year-old biology problem - predicting the 3D structure of a protein from its amino acid sequence - in minutes instead of months of lab experiments

Arthur Samuel coins the term

The phrase "machine learning" traces back to Arthur Samuel, an IBM engineer who in 1959 built a checkers program that played thousands of games against itself and grew strong enough to beat him. Samuel never coded the strategy; the program worked it out from experience. The groundwork came earlier: in 1950 Alan Turing's paper "Computing Machinery and Intelligence" asked whether machines could think and proposed what we now call the Turing Test, and in 1958 Frank Rosenblatt unveiled the perceptron, the first model that learned to recognize patterns by adjusting its own weights. Those three ideas, a learning definition, a test for intelligence, and a trainable model, set the agenda for everything that followed.

Defining Machine Learning

In 1959, Arthur Samuel - an IBM engineer - gave a definition still used today: **Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.** Samuel wrote a checkers program that played itself thousands of times and eventually learned to beat its creator. He never programmed strategy - the program *found it* on its own.

In 1997, Tom Mitchell formalized this more precisely: *"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."* It sounds academic, but the idea is simple: the program **gets better with practice**, like a person.

**Mitchell's definition in action:** Gmail's spam filter.
- **Task T:** classify an email as spam or not spam
- **Experience E:** millions of emails labeled by users (the 'Report spam' button)
- **Performance P:** percentage of correctly classified emails

The more labeled emails - the more accurate the filter. That's "learning."
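Mitchell's three ingredients can be made concrete in a few lines of Python. The sketch below is a toy illustration, not how Gmail's filter works: the emails are invented, and the word-counting classifier and the names `train`, `predict`, and `accuracy` are assumptions chosen for readability. It trains on progressively more labeled emails (E) and measures the share of correct answers (P) on the same spam/not-spam task (T).

```python
from collections import Counter

def train(labeled_emails):
    """Experience E: count how often each word appears in spam vs. non-spam."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in labeled_emails:
        (spam_words if is_spam else ham_words).update(text.split())
    return spam_words, ham_words

def predict(model, text):
    """Task T: flag an email as spam if its words lean toward spam."""
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in text.split())
    return score > 0

def accuracy(model, labeled_emails):
    """Performance P: fraction of emails classified correctly."""
    correct = sum(predict(model, text) == is_spam for text, is_spam in labeled_emails)
    return correct / len(labeled_emails)

# Hypothetical labeled emails (True = spam), invented for this example.
TRAIN = [
    ("win money now", True),
    ("meeting at noon", False),
    ("free prize win", True),
    ("lunch meeting tomorrow", False),
    ("claim free money", True),
    ("project report attached", False),
]
TEST = [
    ("win free prize money", True),
    ("report for meeting", False),
    ("claim your prize", True),
    ("lunch at noon", False),
]

# Same task, same measure, growing experience: 0, 2, 4, then 6 labeled emails.
accuracies = [accuracy(train(TRAIN[:n]), TEST) for n in (0, 2, 4, 6)]
print(accuracies)  # [0.5, 0.75, 1.0, 1.0]
```

With no experience the filter calls everything "not spam" and is right half the time; each extra batch of labeled emails gives it more words to go on. Real data is far noisier, so improvement is rarely this clean.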

What does "learning" mean mathematically? ML searches for a **function f(x) -> y** that maps input x to the correct output y. Learning is the process of finding f such that it **minimizes error** on available data. When the function makes mistakes, the algorithm adjusts its parameters and tries again.
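One common way to "adjust parameters and try again" is gradient descent. The following is a minimal sketch under stated assumptions: the model is a straight line f(x) = w*x + b, the error is mean squared error, and the data, learning rate, and step count are arbitrary values picked for illustration. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Find w and b so that f(x) = w*x + b has low mean squared error."""
    w, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # How wrong is the current function on each example?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Nudge the parameters in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1; the algorithm is not told this rule.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # values close to 2 and 1
```

The loop never sees the rule "multiply by 2 and add 1"; it only sees inputs, correct outputs, and its own error, and recovers the rule from those.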

**No Free Lunch Theorem (Wolpert, 1996):** averaged over all possible problems, no learning algorithm outperforms any other, so there is no single ML algorithm that is best for *all* problems. An algorithm well suited to face recognition may be useless for predicting stock prices. The choice of approach always depends on the data and the task.

What fundamentally distinguishes ML from traditional programming?

ML History: From the Perceptron to Deep Learning

The history of ML is not a smooth progression - it's **a series of breakthroughs, disappointments, and revivals**. Twice the field experienced "AI winters" - periods when funding dried up and researchers moved on. And twice, new ideas brought it back to life.

In **1943**, Warren McCulloch and Walter Pitts proposed a mathematical model of a neuron - a simple function that takes several inputs and produces one output. In **1957**, Frank Rosenblatt introduced the **perceptron** - the first machine capable of *learning* to recognize patterns - and demonstrated it publicly in 1958. The New York Times wrote: "The embryo of an electronic computer that is expected to be able to walk, talk, see, and... be conscious of its existence."

But in **1969**, Marvin Minsky and Seymour Papert published *"Perceptrons"*, mathematically proving that a single-layer perceptron cannot even solve XOR. This triggered the **first AI winter**: neural network funding evaporated, researchers moved to other fields.
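The XOR limitation is easy to reproduce. The sketch below uses the classic perceptron learning rule with two inputs and a bias; the function names, the 20-epoch budget, and the zero starting weights are assumptions made for this example. The same code learns AND perfectly but can never get all four XOR cases right, because no single straight line separates XOR's outputs.

```python
def step(value):
    """Threshold activation: fire (1) only if the weighted sum is positive."""
    return 1 if value > 0 else 0

def train_perceptron(samples, epochs=20):
    """Perceptron rule: after each mistake, shift the weights toward the target."""
    w1, w2, bias = 0, 0, 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + bias)
            w1 += error * x1
            w2 += error * x2
            bias += error
    return w1, w2, bias

def accuracy(weights, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    w1, w2, bias = weights
    hits = sum(step(w1 * x1 + w2 * x2 + bias) == target
               for (x1, x2), target in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(train_perceptron(AND), AND)
xor_accuracy = accuracy(train_perceptron(XOR), XOR)
print(and_accuracy)  # 1.0
print(xor_accuracy)  # stays below 1.0 no matter how long it trains
```

Raising `epochs` does not help with XOR: the weights keep cycling, because the problem is not linearly separable.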

Revival came in **1986**: Rumelhart, Hinton, and Williams published **backpropagation**, enabling training of *multi-layer* networks. But the computers of the era lacked the power to train large networks, and the **second winter** followed. The turning point came in **2012**, when AlexNet won the ImageNet challenge, cutting the top-5 error rate from about 26% to 15.3%. This launched the **deep learning** revolution that continues today.
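To see why *multi-layer* networks mattered, note that one hidden layer is already enough to represent XOR, the function a single-layer perceptron cannot solve. In the sketch below the weights are hand-picked for illustration, not learned; backpropagation is what lets a network find weights like these on its own. The function and variable names are hypothetical.

```python
def step(value):
    """Threshold activation: fire (1) only if the weighted sum is positive."""
    return 1 if value > 0 else 0

def xor_network(x1, x2):
    """Two-layer network with hand-set weights (assumed, not trained)."""
    # Hidden layer: one unit behaves like OR, the other like AND.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: fire when OR is on but AND is off, which is exactly XOR.
    return step(h_or - h_and - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_network(x1, x2))
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```

Each hidden unit draws its own straight line through the input space; the output unit combines the two, which a single layer has no way to do.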

**Three factors that changed everything after 2012:**
1. **Data** - the internet generated petabytes of labeled data (ImageNet - 14 million images)
2. **Compute** - GPUs turned out to be ideal for the matrix operations neural nets require
3. **Algorithms** - dropout, batch normalization, and residual connections solved deep training problems

Without all three together, deep learning wouldn't have happened.

In **2017**, a Google team published *"Attention Is All You Need"*, introducing the **Transformer** architecture. This became the foundation for GPT, BERT, Claude, and all modern language models. In **2022**, ChatGPT brought ML to the masses: the product reached an estimated 100 million users in about 2 months - faster than any consumer app before it.

Why did the first AI winter follow the publication of Minsky and Papert's "Perceptrons" (1969)?

ML Applications in the Real World

Today ML is not experimental technology - it's **the backbone of products used by billions of people**. Opening a social feed, searching on Google, or getting a Netflix recommendation - each interaction is powered by an ML model trained on petabytes of data.

**Computer Vision:** In 2020, a Google Health ML model detected breast cancer in mammograms better than radiologists, reducing false negatives by **9.4%** on the study's US dataset. Tesla Autopilot processes 8 cameras simultaneously, analyzing road conditions 36 times per second. Face ID on iPhone uses a neural network robust to beard growth and glasses.

**Recommendation systems** generate a massive share of revenue for the world's largest companies. Netflix estimates that its recommendations are worth **$1 billion per year** in subscriber retention. On YouTube, about **70%** of watched content comes from recommendations. Spotify Discover Weekly analyzes the behavior of 600+ million users to create a personalized 30-track playlist every Monday.

**ML is already everywhere:**
- **Smartphone keyboard** - predicts the next word (a language model)
- **Google Maps** - estimates travel time based on data from millions of drivers
- **Banking app** - blocks a suspicious transaction in 50 ms
- **Shazam** - identifies a song from a 10-second clip among 100+ million tracks
- **Medicine** - DeepMind's AlphaFold predicted the structure of 200+ million proteins, accelerating drug discovery by years
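The keyboard example is the easiest of these to imitate at toy scale. The sketch below is a bigram model: it counts which word follows which in a tiny invented corpus and suggests the most frequent follower. Real keyboards use far larger models and data; the corpus and the names `build_bigram_model` and `predict_next` are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def build_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Suggest the most frequent follower, or None for an unseen word."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Hypothetical typing history, invented for this example.
corpus = "see you soon . see you later . see you soon . thank you very much"
model = build_bigram_model(corpus)

print(predict_next(model, "see"))     # you
print(predict_next(model, "you"))     # soon
print(predict_next(model, "banana"))  # None
```

Nobody wrote a rule saying "after 'see' comes 'you'"; the suggestion falls out of the counts, and it changes as soon as the typing history changes.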

**Myth:** ML/AI will soon replace all programmers and most professions.

**Reality:** ML automates routine tasks, but it still requires people to define problems, prepare data, interpret results, and provide ethical oversight.

ML models operate in narrow domains (narrow AI), have no general intelligence (AGI), and their quality critically depends on data and human problem framing.

Which statement about the relationship between AI, ML, and Deep Learning is correct?

Key Takeaways

**The core of ML:** instead of writing rules by hand, an algorithm extracts patterns from data (data + answers = model)
**ML history** is 80 years of breakthroughs and AI winters: from the perceptron (1957) through backpropagation (1986) and AlexNet (2012) to Transformer and ChatGPT
**AI, ML, Deep Learning** - nested sets: every deep learning model is an ML model, and every ML system is a form of AI, but not vice versa
**ML is everywhere:** from smartphone keyboards to cancer diagnostics - powering Netflix recommendations and enabling banks to block suspicious transactions in milliseconds

Questions for Reflection

If an ML model is trained on historical data, can it predict truly novel events? What is the core limitation of the 'learn from experience' approach?
What ethical problems arise when an ML model makes decisions about people - like approving a loan or hiring for a job?
Why did ML 'explode' in the 2010s, even though the core ideas (neural networks, backpropagation) existed since the 1980s?

Related Lessons

ml-02-types — Next step: types of learning (supervised, unsupervised, RL)
prob-01-intro — Probabilities and distributions - the language of ML
la-01-vectors-intro — Vectors and matrices - math of features and weights
stat-01-sampling — Sampling and estimation - how ML works with data
aie-01-ai-for-backend-dev — Applying ML models in real production systems
st-01-feedback-loops — Feedback loop as the archetype of learning