Machine Learning
Decision Trees
Three lineages of the decision tree
Modern decision trees grew from three separate roots. In 1963 the social scientists James Morgan and John Sonquist built AID (Automatic Interaction Detection), an early program that split survey data into subgroups to study social patterns. In 1984 the statisticians Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone published CART (Classification and Regression Trees), giving the field a rigorous statistical foundation with binary splits and cost-complexity pruning. In parallel, the computer scientist Ross Quinlan came from the AI side: his ID3 algorithm (1986) used information gain to choose splits, and its 1993 successor C4.5 added handling of continuous features, missing values, and pruning. The split criteria, pruning strategies, and handling of messy data in today's tree learners come largely from these three lines of work.
When a doctor makes a diagnosis, they ask a series of questions: Is there a fever? If so, is it above 38 °C? Is there a cough? Is it dry or wet? Each answer narrows the list of possible diagnoses. A decision tree works the same way: it splits the data with a sequence of questions, moving from general to specific. But how does the algorithm decide which question to ask first? Why is asking about temperature more useful than asking about eye color? The answer comes from information gain (based on entropy) and Gini impurity. Both measure how mixed the labels in a group are, and the best question is the one whose answers leave the resulting subgroups least mixed, that is, the one that reduces uncertainty the most.
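A minimal sketch of how these criteria score a question, in plain Python with no libraries. The toy patient records, the feature names `fever` and `eye_color`, and the labels are invented for illustration. The code computes the entropy and Gini impurity of a set of labels, then the impurity decrease from splitting on each feature (with entropy, this decrease is the information gain), and picks the feature with the largest decrease. For simplicity it splits on every distinct value of a feature, ID3-style; CART would instead use binary splits.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: H = sum(p * log2(1/p)) over label frequencies p."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: G = 1 - sum(p^2) over label frequencies p."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(rows, labels, feature, impurity):
    """Parent impurity minus the size-weighted impurity of the child groups
    produced by splitting on each distinct value of `feature`.
    With impurity=entropy this is the information gain."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted

# Hypothetical toy data: does the patient have the flu?
rows = [
    {"fever": "yes", "eye_color": "brown"},
    {"fever": "yes", "eye_color": "blue"},
    {"fever": "yes", "eye_color": "brown"},
    {"fever": "no", "eye_color": "blue"},
    {"fever": "no", "eye_color": "brown"},
    {"fever": "no", "eye_color": "blue"},
]
labels = ["flu", "flu", "flu", "healthy", "healthy", "healthy"]
features = ("fever", "eye_color")

for feature in features:
    ig = impurity_decrease(rows, labels, feature, entropy)
    gd = impurity_decrease(rows, labels, feature, gini)
    print(f"{feature}: information gain={ig:.3f}, Gini decrease={gd:.3f}")

# The first question the tree asks is the one with the largest gain.
best = max(features, key=lambda f: impurity_decrease(rows, labels, f, entropy))
print("first question:", best)
```

On this toy data, splitting on `fever` separates the two classes perfectly (information gain 1.000 bit, Gini decrease 0.500), while `eye_color` leaves both groups mixed (about 0.082 and 0.056), so `fever` is chosen as the first question.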
- **Bank credit scoring:** many banks use decision trees to decide whether to approve a loan, and unlike a neural network, a tree can state why: "rejected because income < 30k and more than 2 late payments in the past year". Regulators in many jurisdictions require lenders to give reasons for a rejection, and a tree provides them directly.
- **Medical diagnosis:** a tree classifier in the emergency room can triage patients by urgency: if pulse > 120 and systolic blood pressure < 90, the patient gets immediate care. When there is no time to wait for a complex model, a rule this easy to read and check can save lives.
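The explanation a tree gives is simply the path from the root to the leaf it reaches. A minimal sketch, assuming a hand-built tree that encodes the credit rule from the bullet above; the `Node`/`Leaf` classes, the feature names `income` and `late_payments`, and the thresholds (30000, and 3 for "more than 2") are hypothetical and not taken from any real bank's model.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    decision: str

@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # branch taken when value < threshold
    right: "Tree"  # branch taken when value >= threshold

Tree = Union[Node, Leaf]

def predict(tree, applicant):
    """Walk from the root to a leaf, recording each test as a readable reason."""
    path = []
    while isinstance(tree, Node):
        value = applicant[tree.feature]
        if value < tree.threshold:
            path.append(f"{tree.feature} < {tree.threshold:g}")
            tree = tree.left
        else:
            path.append(f"{tree.feature} >= {tree.threshold:g}")
            tree = tree.right
    return tree.decision, path

# Hypothetical rule: reject if income < 30000 and late payments > 2 (i.e. >= 3).
credit_tree = Node(
    "income", 30000,
    left=Node("late_payments", 3, left=Leaf("approve"), right=Leaf("reject")),
    right=Leaf("approve"),
)

decision, reasons = predict(credit_tree, {"income": 25000, "late_payments": 4})
print(decision, "because", " and ".join(reasons))
```

For this applicant the script prints `reject because income < 30000 and late_payments >= 3`. A learned tree (for example, one grown with the split criterion sketched earlier) has the same structure, only with thresholds chosen from data, so the same path-tracing gives its explanations.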