AI Engineering
Reasoning Models: o3, o4, Extended Thinking - How Next-Gen Models Think
Lesson objectives
- Understand test-time compute scaling and how it differs from train-time scaling
- Grasp the architecture of reasoning models (o1, o3, DeepSeek-R1)
- Learn to identify tasks where reasoning models deliver a dramatic advantage
- Master model routing and reasoning-escalation patterns for production systems
o4-mini scores at the 99th percentile of human contestants on AIME. Claude 4 Opus solves PhD-level chemistry and biology problems, not because it memorized the answers, but because it thinks out loud for several minutes. DeepSeek R2 does the same as an open-source model. This is not the future; this is production in 2026. Reasoning models are already embedded in Cursor, GitHub Copilot, and Notion AI. The question is no longer whether they will arrive, but how many reasoning tokens a specific task costs.
- o4-mini (OpenAI, 2025) in production: used in Cursor to analyze complex bug reports where standard (non-reasoning) generation produced errors
- Claude 4.x Extended Thinking (Anthropic, 2025-2026) in production for code audits and architecture reviews: the thinking process is visible and its token budget is controllable
- DeepSeek R2: open-source reasoning at frontier level, available for self-hosting, so reasoning without vendor lock-in is a practical option
- Reasoning token cost is the key metric of 2026: o4-mini is 10x cheaper than o3 at comparable quality on most tasks, so routing by task type saves 70-90% of the budget
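To make the routing arithmetic concrete, here is a minimal sketch of routing by task type. Only the 10x price ratio between the two tiers comes from the text above. The model labels, the task categories, the token counts, and the absolute prices are hypothetical placeholders, not real pricing or a real API.

```python
from dataclasses import dataclass

# Hypothetical relative prices per 1K reasoning tokens.
# Only the 10x ratio between the tiers is taken from the lesson text.
PRICE_PER_1K = {"small-reasoner": 1.0, "large-reasoner": 10.0}

# Assumed task categories that justify the expensive tier.
HARD_TASK_TYPES = {"architecture_review", "multi_step_proof"}


@dataclass(frozen=True)
class Task:
    task_type: str
    reasoning_tokens: int  # estimated reasoning tokens the task consumes


def route(task: Task) -> str:
    """Send only hard task types to the expensive model."""
    if task.task_type in HARD_TASK_TYPES:
        return "large-reasoner"
    return "small-reasoner"


def cost(task: Task, model: str) -> float:
    """Cost of running a task on a given model, in relative price units."""
    return task.reasoning_tokens / 1000 * PRICE_PER_1K[model]


def savings_fraction(tasks: list[Task]) -> float:
    """Budget share saved by routing versus sending everything to the large model."""
    baseline = sum(cost(t, "large-reasoner") for t in tasks)
    routed = sum(cost(t, route(t)) for t in tasks)
    return 1 - routed / baseline


# Illustrative workload: nine routine tasks and one hard task, 2,000 tokens each.
workload = [Task("bug_triage", 2000)] * 9 + [Task("architecture_review", 2000)]
print(f"saved: {savings_fraction(workload):.0%}")
```

With this assumed 90/10 split the routed cost is 38 units against a baseline of 200, a saving of about 81%. The saving shrinks as the share of hard tasks grows, which is why the actual task mix decides where in the 70-90% range a given system lands.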