AI Engineering

The Future: Path to AGI - Scaling Laws, Emergent Abilities, the Alignment Problem

Lesson Objectives

Understand scaling laws (Kaplan, Chinchilla) and their implications for AI development
Grasp the phenomenon of emergent capabilities and the debate surrounding it
Master the key approaches to alignment: RLHF, Constitutional AI, DPO
Form an informed opinion on AGI timelines based on arguments from all sides

In December 2023, Ilya Sutskever said in a NeurIPS hallway conversation: "Maybe we already have AGI." In December 2024, OpenAI o3 scored 87.5% on ARC-AGI - a benchmark Francois Chollet built specifically as a "barrier impassable without genuine reasoning." Microsoft Research titled their GPT-4 analysis "Sparks of Artificial General Intelligence." Anthropic quietly shifted its own AGI definition away from "smarter than any human" toward "capable of autonomously conducting scientific research." The boundary is dissolving, and it is dissolving in real time.

OpenAI o3 scored 87.5% on ARC-AGI (December 2024) - a benchmark deliberately designed by Francois Chollet to be unsolvable by LLMs without genuine reasoning; two years earlier the record was below 30%
"Sparks of Artificial General Intelligence" (Microsoft Research, 2023): GPT-4 displayed capabilities across tasks previously considered exclusively human - from bar exams to creative writing and visual reasoning
OpenAI published a formal 5-level AGI framework: L1 Chatbots → L2 Reasoners → L3 Agents → L4 Innovators → L5 Organizations; by their assessment, the industry moved from L1 to L2 between 2023 and 2024
Anthropic redefined AGI in 2024 as "the ability to autonomously conduct world-class scientific research" - shifting from an IQ-style metric to operational utility

AI Winters and the Current Boom

AI research has been through two "winters": periods of disillusionment and funding cuts. The first AI winter (1974-1980) followed the failure of machine translation and the Lighthill Report. The second (1987-1993) followed the collapse of expert systems. Each time, researchers' promises didn't match reality. The current boom (since 2012, the deep learning revolution) has lasted 13 years and is orders of magnitude larger than previous ones. The question: is this sustainable progress or a third bubble? The key difference is that this time AI generates real economic value: Copilot, ChatGPT, and Midjourney are products with revenues in the hundreds of millions to billions of dollars.

Prerequisites

How LLMs Work: Tokens, Embeddings, Attention

Scaling Hypothesis: More Compute = Smarter Model?

The **scaling hypothesis** holds that increasing three components (model size, data volume, training compute) improves AI capabilities predictably. The empirical basis is strong: loss falls as a smooth power law of scale (not in direct proportion to it), and independent research groups have reproduced that trend across several orders of magnitude. Whether the trend continues indefinitely is the part that remains a hypothesis.

**Kaplan et al. (2020)** at OpenAI, in "Scaling Laws for Neural Language Models," showed that model loss follows a **power law** in parameters, data, and compute; for compute the fit is roughly `Loss ∝ C^(-0.05)`. A power law is a straight line on a log-log graph. The paper itself predates GPT-3 and GPT-4, but later models up to the GPT-4 generation have broadly stayed on such curves.
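To get a feel for what an exponent of 0.05 means in practice, the sketch below simply evaluates the power law quoted above. The function name `loss_ratio` and the default exponent are illustrative assumptions based on the `C^(-0.05)` figure in the text, not code from the paper.

```python
def loss_ratio(compute_multiplier: float, alpha: float = 0.05) -> float:
    """Factor by which loss changes when compute is multiplied, assuming Loss ∝ C^(-alpha)."""
    if compute_multiplier <= 0:
        raise ValueError("compute_multiplier must be positive")
    return compute_multiplier ** (-alpha)


# Assumed exponent: 0.05, as quoted in the text above.
for multiplier in (10, 100, 1000):
    ratio = loss_ratio(multiplier)
    print(f"{multiplier:>5}x compute -> loss x {ratio:.3f} ({(1 - ratio) * 100:.1f}% lower)")
```

Under this assumption, 10x more compute lowers loss by only about 11%, which is why each visible improvement requires an order-of-magnitude jump in budget.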

**Chinchilla (Hoffmann et al., 2022, DeepMind)** dismantled GPT-3's reputation: it turned out to be severely *undertrained*, with far too many parameters for far too little data. The compute-optimal ratio is roughly 20 tokens per parameter; GPT-3 saw about 1.7 (300B tokens for 175B parameters). Chinchilla (70B parameters, 1.4T tokens) outperformed GPT-3 (175B parameters, 300B tokens) with fewer than half the parameters, and it beat the 4x larger Gopher (280B) at the same compute budget. After Chinchilla, frontier labs shifted toward training smaller models on far more data.

| Model | Parameters | Training tokens | Compute | Training cost (est.) |
| --- | --- | --- | --- | --- |
| GPT-3 (2020) | 175B | 300B | ~3.6×10²³ FLOP | ~5M USD |
| Chinchilla (2022) | 70B | 1.4T | ~5.8×10²³ FLOP | ~3M USD |
| Llama 2 70B (2023) | 70B | 2T | ~10²⁴ FLOP | ~10M USD |
| GPT-4 (2023) | ~1.8T (MoE) | ~13T | ~2×10²⁵ FLOP | ~100M USD |
| Llama 3 405B (2024) | 405B | 15T | ~4×10²⁵ FLOP | ~100M+ USD |
| GPT-5 (2025, est.) | ? | ? | ~10²⁶ FLOP | 200-500M USD |
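The compute column for the dense models above can be roughly reproduced with the common approximation C ≈ 6 · N · D (FLOP ≈ 6 × parameters × tokens). That approximation is an assumption brought in here for illustration, as are the helper names; combined with the 20-tokens-per-parameter rule it also gives a quick way to size a model for a given budget.

```python
import math

TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb from the text


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the C ≈ 6 * N * D rule (an assumption)."""
    return 6.0 * params * tokens


def compute_optimal(flops_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) with tokens = 20 * params."""
    params = math.sqrt(flops_budget / (6.0 * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params


# Parameter and token counts taken from the table above.
models = {
    "GPT-3": (175e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
    "Llama 3 405B": (405e9, 15e12),
}
for name, (n, d) in models.items():
    print(f"{name}: {d / n:.1f} tokens/param, ~{training_flops(n, d):.2e} FLOP")

n_opt, d_opt = compute_optimal(5.8e23)
print(f"Budget 5.8e23 FLOP -> ~{n_opt / 1e9:.0f}B params, ~{d_opt / 1e12:.1f}T tokens")
```

Plugging in Chinchilla's own budget returns roughly 70B parameters and 1.4T tokens, which matches the row in the table. The rule does not apply cleanly to mixture-of-experts models such as the GPT-4 estimate, where only a fraction of the parameters is active per token.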

**Training cost for frontier models is rising steeply, by some estimates doubling every 6 months or so.** GPT-4 cost ~USD 100M. GPT-5, by some estimates, will cost USD 200-500M. This creates a natural barrier: only companies with USD 10B+ in capital can train frontier models. Counterargument: open-weight models (Llama, Mistral, DeepSeek) are available for fine-tuning at roughly 1000x lower cost.

The key question of 2025: **will scaling hit a wall?** Two limits are converging.
1. Data: the internet holds roughly 10T tokens of quality text, and Llama 3 already trained on 15T, which is more than that estimate. Further growth therefore depends on repeated, lower-quality, or synthetic data. Synthetic data helps partially, but AI trained on AI-generated text risks entering a degradation loop.
2. Energy: training a GPT-5-class model is estimated to require ~50 MW for months, the output of a dedicated power plant.
Ilya Sutskever (former Chief Scientist, OpenAI) argued in 2024 that the era of simply scaling up pre-training data is coming to an end. The frontier has shifted to test-time compute scaling, the mechanism behind the o1/o3 breakthrough.
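The internals of o1/o3 are not public, so the sketch below does not model them. It uses a much simpler stand-in for the idea of test-time compute: sample several independent answers and take a majority vote (often called self-consistency). The function name and the independence assumption are illustrative.

```python
from math import comb


def majority_vote_accuracy(p_correct: float, num_samples: int) -> float:
    """Probability that a strict majority of independent samples is correct.

    Simplifying assumptions: each sample is right independently with probability
    p_correct, and all wrong samples agree on one wrong answer (the worst case).
    """
    if num_samples % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    needed = num_samples // 2 + 1
    return sum(
        comb(num_samples, k) * p_correct**k * (1 - p_correct) ** (num_samples - k)
        for k in range(needed, num_samples + 1)
    )


# Same model, same weights: only the inference budget changes.
for n in (1, 5, 25, 101):
    print(f"{n:>3} samples -> accuracy {majority_vote_accuracy(0.6, n):.3f}")
```

With a per-sample accuracy of 0.6, three samples already give 0.648, and accuracy keeps rising with more samples. The gain is bought entirely with inference compute, which is what makes it a separate scaling axis from training.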

Emergent Capabilities: Abilities Nobody Expected

**Emergence** in AI refers to abilities that are absent in smaller models and appear *abruptly* when scale crosses a threshold. Nobody trained the model to solve math olympiad problems; next-token prediction was the only objective. Yet at a scale of tens to hundreds of billions of parameters, chain-of-thought reasoning starts to work without being explicitly trained. This makes capability roadmaps hard to predict: it is difficult to know in advance what the next model generation will be able to do.

"Emergent Abilities of Large Language Models" (Wei et al., Google, 2022) catalogued dozens of such abilities. The pattern is consistent: accuracy hovers near the random baseline from 10M to 10B parameters, then jumps. The thresholds are approximate and vary by model family: multi-step arithmetic appears at ~100B, chain-of-thought at ~60B, non-trivial code generation at ~50B. None of these abilities was explicitly trained for.

**Controversy:** Schaeffer et al. (Stanford, 2023) in "Are Emergent Abilities a Mirage?" challenged the entire framing. Their argument: the abrupt jump is an artifact of all-or-nothing metrics such as exact match (right/wrong). Measured on a continuous scale instead, such as token-level log-probability, the improvement is *smooth* across model sizes. On this view emergence is not magic: it is a threshold created by the choice of metric, not a genuine discontinuity in model behavior.
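The metric argument can be reproduced with a toy calculation. The sketch below assumes a hypothetical per-token accuracy that rises smoothly with log model size, and an answer that only counts as correct if all of its tokens are right. The curve, the numbers, and the assumption that token errors are independent are all illustrative, not data from the paper.

```python
import math


def per_token_accuracy(params: float) -> float:
    """Hypothetical smooth curve: accuracy rises linearly in log10(params)."""
    # 0.50 at 1e7 parameters, 0.98 at 1e12 parameters; purely illustrative numbers.
    frac = (math.log10(params) - 7) / (12 - 7)
    return 0.50 + 0.48 * min(max(frac, 0.0), 1.0)


def exact_match(token_acc: float, answer_len: int) -> float:
    """All-or-nothing metric: every one of answer_len tokens must be right."""
    return token_acc ** answer_len


for params in (1e7, 1e8, 1e9, 1e10, 1e11, 1e12):
    acc = per_token_accuracy(params)
    em = exact_match(acc, 30)
    print(f"{params:.0e} params: per-token {acc:.2f}, exact match on 30 tokens {em:.4f}")
```

The per-token column climbs in even steps, while exact match stays near zero for most of the range and then jumps from roughly 2% to roughly 55% over the last factor of ten. Same underlying model behavior, two very different-looking curves.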

**Practical implication for AI engineers:** even if emergence is a metric artifact, it creates a real problem: capabilities of the next model cannot be predicted from the current one. GPT-3 couldn't solve complex programming tasks. GPT-4 suddenly could. This makes AI product roadmaps unpredictable.

The most practically significant emergent ability is **in-context learning**: a model solves a new task from 2-5 prompt examples *without any fine-tuning*. GPT-2 could not do this. GPT-3 could - and the entire few-shot application ecosystem is built on that fact. The mechanism remains incompletely understood: Olsson et al. (2022) link it to "induction heads" - specific attention patterns in the Transformer that implement analogy-based generalization.
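In practice, in-context learning is just prompt construction: the "training set" lives in the prompt. A minimal sketch follows; the `Input:`/`Output:` template and the function name are assumptions, and any consistent format works.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format (input, output) demonstrations followed by the unanswered query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Three demonstrations of an English-to-French task, then a new query.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("book", "livre"),
]
print(build_few_shot_prompt(examples, "water"))
```

The resulting string is sent to the model as-is. No weights change; the model infers the task from the pattern and continues it.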


The Alignment Problem: Making AI Safe

**Alignment** is the challenge of ensuring an AI system acts in accordance with human intentions and values. The severity of misalignment scales nonlinearly with capability: a model that writes bad poetry is an inconvenience; an agent autonomously managing production infrastructure on an ambiguous prompt is a potential catastrophe. This is what makes alignment an engineering concern, not an academic one.

**RLHF (Reinforcement Learning from Human Feedback)** is what turned a base GPT-3.5 model into ChatGPT. The pipeline:
1. Pre-train a base model on internet text (in InstructGPT, followed by supervised fine-tuning on human-written demonstrations).
2. Collect ~300K human comparison pairs of the form "answer A is better than B".
3. Train a reward model on those pairs.
4. Optimize the LLM via PPO to maximize the reward.
InstructGPT (ChatGPT's predecessor) demonstrated that a 1.3B RLHF-tuned model was preferred by humans over the raw 175B GPT-3.
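The reward-model step is usually trained with a pairwise (Bradley-Terry style) objective: the loss is small when the reward for the preferred answer exceeds the reward for the rejected one. The sketch below shows that loss for a single pair using plain floats; the function name is illustrative, and a real pipeline would compute it over batches in a deep learning framework.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The reward model is penalized when it ranks the rejected answer higher.
print(f"agrees with the human label:    {pairwise_reward_loss(2.0, 0.0):.3f}")
print(f"undecided:                      {pairwise_reward_loss(0.0, 0.0):.3f}")
print(f"disagrees with the human label: {pairwise_reward_loss(0.0, 2.0):.3f}")
```

Only the difference between the two rewards matters, so the reward scale is arbitrary. That is one reason a reward model is a proxy rather than a calibrated measure of quality.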

**Constitutional AI (Anthropic, 2022)** replaces most of the army of human raters. Instead of thousands of human pairwise comparisons for harmlessness, a set of principles (a "constitution") guides the model to critique and revise its own responses, and the model is fine-tuned on the revisions. In a second phase, an AI model compares pairs of responses against the constitution, and those AI-generated preferences train a preference model: RLAIF instead of RLHF. Claude is built on this approach. Key advantage: the process scales to any data volume without a proportional increase in annotation cost.
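The critique-and-revise step can be sketched as a loop over principles. Everything below is hypothetical scaffolding: `generate` stands for any text-in, text-out model call, the prompt wording is invented for illustration, and `stub_model` exists only so the example runs offline. It is not Anthropic's implementation.

```python
from typing import Callable


def critique_and_revise(
    generate: Callable[[str], str], prompt: str, principles: list[str]
) -> dict[str, str]:
    """Run one critique-and-revision pass per principle; return original and revision."""
    original = generate(prompt)
    response = original
    for principle in principles:
        critique = generate(
            f"Critique the response according to this principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}"
        )
    return {"prompt": prompt, "original": original, "revision": response}


def stub_model(text: str) -> str:
    """Stand-in for a real LLM call so the sketch runs without network access."""
    if text.startswith("Critique"):
        return "The response could be more careful."
    if text.startswith("Rewrite"):
        return "A more careful answer."
    return "A first-draft answer."


record = critique_and_revise(stub_model, "Explain the risk.", ["Be honest about uncertainty."])
print(record)
```

Each returned record is a training example produced without a human rater, which is where the scaling advantage comes from.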

The deepest problem of alignment is **Goodhart's Law**: "When a measure becomes a target, it ceases to be a good measure." A reward model is a proxy for human preferences, not the preferences themselves. Any AI sufficiently optimizing a proxy will find shortcuts: models learn to sound confident rather than be accurate, give longer answers instead of correct ones, agree with the user rather than correct them. This is reward hacking - and it emerges consistently under sufficient optimization pressure.

| Method | Origin | Key idea | Drawback |
| --- | --- | --- | --- |
| RLHF | OpenAI (2022) | Reward model from human ratings + RL | Expensive, doesn't scale, reward hacking |
| Constitutional AI | Anthropic (2022) | Set of principles, AI self-evaluation | Hard to formalize all values |
| DPO | Stanford (2023) | Direct optimization without a reward model | Less stable on complex tasks |
| RLAIF | Google (2023) | AI evaluates instead of humans | AI inherits its own biases |
| Scalable Oversight | Research frontier | Weak AI oversees strong AI | Still theoretical work |
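To make the DPO row concrete: DPO drops the separate reward model and instead compares how much the policy has moved away from a frozen reference model on the preferred versus the rejected answer. The sketch below computes that loss for one pair from summed log-probabilities. The argument names and the `beta=0.1` default are illustrative assumptions, and real training would run this over batches of tensors.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable form of log(1 + exp(-logits)).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Policy identical to the reference: no preference learned yet.
print(f"{dpo_loss(-10.0, -12.0, -10.0, -12.0):.3f}")
# Policy has shifted probability toward the chosen answer: lower loss.
print(f"{dpo_loss(-8.0, -14.0, -10.0, -12.0):.3f}")
```

The structure mirrors the pairwise reward-model loss used in RLHF, with the log-probability ratio playing the role of an implicit reward.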

**The superalignment problem (Ilya Sutskever, OpenAI, 2023):** how to align an AI smarter than humans? If AI surpasses human intelligence, human evaluators cannot adequately assess its responses. It's like trying to evaluate a proof of a theorem the evaluator does not understand. OpenAI created a Superalignment team in 2023 - and disbanded it in 2024 after key researchers departed.

AGI Timelines: Optimists, Pessimists, and Realists

**AGI (Artificial General Intelligence)** has no agreed definition, and that is the root of every debate. OpenAI's charter defines AGI as "highly autonomous systems that outperform humans at most economically valuable work." Anthropic shifted toward "autonomous conduct of world-class scientific research." Francois Chollet (creator of ARC-AGI) requires the ability to generalize to entirely novel tasks without prior exposure. Optimists and skeptics are often arguing about different things while using the same word.

OpenAI formalized progress into five levels: L1 (Chatbots) → L2 (Reasoners) → L3 (Agents) → L4 (Innovators) → L5 (Organizations). By their assessment, the L1 to L2 transition happened with o1 in late 2024. o3, scoring 87.5% on ARC-AGI and achieving Codeforces Grandmaster level, sits at the L2/L3 boundary. Sam Altman has suggested AGI could arrive in 2025-2026; Dario Amodei (Anthropic) speaks of "powerful AI" by 2026-2027.

**Optimists' case:** o3 scored 87.5% on ARC-AGI (December 2024), a benchmark considered an impassable barrier two years earlier; GPT-4 reportedly passes the bar exam in the top 10%; reasoning models (o1, o3, Gemini 2.0 Thinking) showed that test-time compute is an additional scaling axis alongside training; Microsoft Research documented "sparks of AGI" in GPT-4 across 20+ capability categories.

**Skeptics' case:** Yann LeCun (Meta) argues LLMs are a structural dead end without world models and causal reasoning; hallucinations do not decrease proportionally with scale; o3's ARC-AGI score was achieved at a reported cost on the order of USD 1000 or more of compute per problem, far from human-level efficiency; every "solved" benchmark is immediately replaced by the next unsolved one.

| Position | Key argument | Supporting evidence | Weak point |
| --- | --- | --- | --- |
| Scaling is all you need | Scaling laws are predictable, GPT-5 will be even smarter | Kaplan laws, emergence, o1/o3 benchmarks | Data is running out, energy is expensive, diminishing returns |
| New architectures needed | LLMs are stochastic parrots with no understanding | Hallucinations, failure on causal tasks, ARC-AGI | Every new benchmark record weakens this argument |
| Hybrid approach | Need LLMs + world models + reasoning + embodiment | JEPA, robotics research, multimodal models | Integrating different approaches is an unsolved problem |

Regardless of timelines, **existential risk (x-risk)** from AI has moved beyond science fiction. In 2023, 350+ leading researchers signed: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." Signatories included Hinton, Bengio, Altman, and Hassabis. This is not panic - it is engineering precaution in a domain where capabilities are growing exponentially while alignment progress remains incremental.

Key Ideas

Scaling laws (Kaplan 2020, Chinchilla 2022) give a concrete rule of thumb: compute-optimal training uses roughly 20 tokens per parameter; GPT-3, at about 1.7 tokens per parameter, spent much of its compute on parameters it did not have enough data to train
Emergent abilities are not magic - they are metric thresholds (Schaeffer, Stanford 2023); the practical fact stands: 100B+ models do things 10B models cannot, regardless of whether improvement was smooth or not
RLHF gave ChatGPT, Constitutional AI gave Claude, DPO simplified both - the difference matters: RLHF is expensive and prone to reward hacking, Constitutional AI scales without raters, DPO trains directly on preferences
OpenAI's 5 AGI levels: the industry is at L2 (Reasoners) now; the engineering path to L3 (Agents) is the core challenge of the next 2-3 years
For engineers, the question is not "when AGI" but how to build systems resilient to rapidly changing model capabilities: eval-driven development, model routing, and guardrails as architectural patterns
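As one possible reading of "eval-driven development", the sketch below gates a model upgrade on a fixed set of test cases. All names, the substring checks, and the 0.02 tolerance are hypothetical; the stub models stand in for real API calls.

```python
from typing import Callable

EvalCase = tuple[str, Callable[[str], bool]]


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of eval cases whose check accepts the model's output."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def safe_to_switch(candidate_rate: float, baseline_rate: float, tolerance: float = 0.02) -> bool:
    """Allow a model swap only if quality did not regress beyond the tolerance."""
    return candidate_rate >= baseline_rate - tolerance


# Hypothetical eval set and stub models, for illustration only.
cases: list[EvalCase] = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("Capital of France?", lambda out: "Paris" in out),
]
current_model = lambda prompt: "4" if "2 + 2" in prompt else "Paris"
new_model = lambda prompt: "4"

baseline = pass_rate(current_model, cases)
candidate = pass_rate(new_model, cases)
print(f"baseline={baseline:.2f} candidate={candidate:.2f} switch={safe_to_switch(candidate, baseline)}")
```

The point of the pattern is that the eval set, not the model version, is the stable artifact: when a new model ships, the same cases decide whether it is routed into production.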

What's Next

AGI is the strategic horizon. The following lessons focus on how these trends are already changing the economy, professions, and everyday life.

AI Economy — How scaling AI is changing the job market and professions right now
Reasoning Models — A concrete implementation of the "path to AGI": test-time compute scaling
World Models — An alternative path to AGI through physical understanding of the world

Related Lessons

aie-03-llm-fundamentals — Scaling laws build on LLM fundamentals
aie-65-alignment-rlhf-dpo — The alignment problem is studied via RLHF and CAI
aie-53-future-reasoning — Emergent reasoning shapes the path to AGI
aie-36-fine-tuning — Constitutional AI applies fine-tuning techniques
prob-04-bayes — AGI timeline forecasts reason under deep uncertainty
ml-01