AI Engineering
AI Agents: ReAct, Planning, Memory, Observe-Think-Act Loop
Цели урока
- Understand the architecture of an AI agent and how it differs from a chain and a simple API call
- Implement the ReAct pattern (Yao et al. 2022): the Thought → Action → Observation cycle
- Design short-term and long-term memory for agents
- Apply planning strategies: task decomposition, plan-and-execute, self-reflection
- Protect an agent from typical errors: loops, cost explosion, hallucinated tools
Предварительные знания
- Tool calling: the request → execute → respond cycle
- JSON Schema for describing functions
An agent is an LLM in a loop. The model acts, observes the result, decides what to do next. In theory. In practice, the agent gets stuck in a loop, loses context at step 15, and confidently does the wrong thing. That's not a bug - it's an architectural problem. March 2024: Cognition demos Devin - the "first AI software engineer." The video hits 10 million views in 48 hours. Under the hood: not magic. The same GPT-4, but running in a ReAct loop with planning and memory. The agent architecture is what separates "smart autocomplete" from a system that can actually finish a task.
- Cursor - an agent with codebase access: reads files, runs tests, interprets compiler errors through a ReAct loop
- Devin (Cognition, 2024) - plans, codes, tests, and deploys in an autonomous cycle without human intervention
- AutoGPT (2023) - the first viral open-source agent, 150K GitHub stars in a month - and a vivid demonstration of failure modes without guardrails
- Claude Code (Anthropic) - agentic mode for working with codebases: navigation, editing, running commands, loop detection
How Agents Emerged
**Yao et al. 2022** - the paper "ReAct: Synergizing Reasoning and Acting in Language Models." A simple idea - interleave reasoning and actions - pushed accuracy on HotpotQA by +21% over act-only. This became the foundation of all agent architecture. **April 2023** - AutoGPT. 150K GitHub stars in the first month. People expected AGI. They got an agent that looped forever, spent `USD 50` per run, and finished nothing. But AutoGPT proved that guardrails aren't optional. **March 2024** - Devin. Cognition releases a demo: the agent receives a task, opens an IDE, writes code, runs tests, reads errors, fixes them - and deploys. No human clicks. Under the hood: ReAct + Plan-and-Execute + memory. Nothing magical. Just engineering.
What Is an AI Agent: How It Differs from a Simple API Call
An agent is an LLM in a loop. The model acts, observes the result, decides what to do next. In theory. In practice, the agent gets stuck in a loop, loses context at step 15, and confidently does the wrong thing. That's not a bug - it's an architectural problem.
A regular LLM call is **one question → one answer**. Even with tool calling: user asks about the weather - model calls a function - returns the result. But the task "find the cheapest flight to Barcelona, check the hotel is near the beach, and book everything" is a **chain of decisions** where each step depends on the previous one. No stateless API handles that.
An **AI agent** is a program where the LLM works in a **loop**: receives a task, reasons, picks an action, observes the result, reasons again. Until the goal is reached. The key difference from a chain: the agent decides the next step on its own rather than following a hardcoded route. That autonomy is what makes Cursor, Devin, and Claude Code fundamentally different from autocomplete.
| Characteristic | Simple API Call | Chain | Agent |
|---|---|---|---|
| Number of steps | 1 | Fixed (2-5) | Dynamic (1-N) |
| Who decides the next step | No next step | Developer (hardcoded) | LLM based on context |
| Tool usage | 0-1 tool call | Specific tools in a specific order | Any tools in any order |
| Error handling | Crash or retry | Crash at step N | Rethinking and finding an alternative path |
| Example | "What's the weather?" | "Find → filter → format" | "Plan a trip from A to Z" |
The agent loop is a loop with an iteration limit where the LLM decides at each iteration: (a) call a tool and continue, or (b) return a final answer. It is precisely this "right to choose" that makes the system an agent rather than a chain. MAX_ITERATIONS is not an implementation detail. It is the first line of defense against infinite loops.
What fundamentally distinguishes an AI agent from a chain of prompts?
ReAct: Reasoning + Acting in a Single Loop
**ReAct** (Reasoning + Acting) is a pattern from Yao et al. 2022, and it became the standard for AI agents. The idea is deceptively simple: the model alternates between **reasoning** (Thought) and **actions** (Action), and after each action receives an **observation** (Observation). The cycle: Think → Act → Observe → Think → Act → Observe → ... → Final Answer. On HotpotQA, this pushed accuracy from 52% to 73% compared to act-only approaches.
Without ReAct, the model would try to answer from its training knowledge - and would likely hallucinate. ReAct forces the model to **reason explicitly** before each action and **adjust the plan** based on observations. This is exactly how Cursor works: it doesn't just insert code - it reads files, runs the compiler, interprets errors, reasons again.
| Approach | Accuracy (HotpotQA) | Wasted steps | Characteristic |
|---|---|---|---|
| Direct answer (no tools) | 28% | 0 | Hallucinations due to lack of facts |
| Act-only (tools without reasoning) | 52% | Many | Mindless tool trial-and-error |
| ReAct (Think + Act) | 73% | Few | Reasoning guides the search |
| ReAct + Self-Reflection | 81% | Minimal | Error analysis between steps |
ReAct works better when the system prompt explicitly requires reasoning. Without the instruction "reason before acting," the model often skips the Thought phase and calls tools randomly - leading to unnecessary calls and errors.
What three phases alternate in a ReAct agent cycle?
Agent Memory: Short-Term and Long-Term
An agent without memory is 128K tokens melting away with every step. GPT-4o: 128K token context. With active tool use, 50-60 iterations fill it completely. At step 61 the agent literally doesn't remember the beginning of the task - and starts circling, expensively and confidently.
**Short-term memory** (the messages array in context) stores the current task and intermediate results. **Long-term memory** (external storage - vector DB or database) persists knowledge across sessions. MemGPT (2023) attacks exactly this problem: an OS-like virtual memory where the agent manages what stays in context and what gets paged out.
| Memory Type | Storage | Lifetime | Example |
|---|---|---|---|
| Short-term (working) | Messages array in context | One session / task | Intermediate tool call results, reasoning |
| Long-term (episodic) | Vector DB, file, database | Across sessions | "Last time, searching via Aviasales worked better" |
| Long-term (semantic) | Knowledge base, embeddings | Permanent | Product documentation, FAQ, company policies |
| Procedural | Tool code, prompts | Permanent | How to call an API, what data format to use |
The context window is the agent's main enemy. GPT-4o has 128K tokens, but with active tool usage 50-60 steps fill the context completely. Without a compaction strategy (summarization, sliding window), the agent forgets the beginning of the task and starts going in circles.
An agent is solving a long task, and the messages array has grown to 80 messages. The context window is 90% full. What should be done?
Planning Strategies: Decomposition and Tree-of-Thought
A plain ReAct agent acts **reactively**: step by step, without an overall plan. Like a developer who opens a ticket and immediately starts writing code with no architecture in mind. For complex tasks - "plan a database migration" or "prepare a competitive analysis across 5 companies" - a **planning strategy** is needed: build the plan first, then execute it, adjusting along the way.
- **Task Decomposition** - breaking a task into subtasks. "Prepare a report" → ["Gather data A", "Gather data B", "Compare", "Format"]
- **Plan-and-Execute** - first a full plan, then step-by-step execution with the ability to adjust
- **Tree-of-Thought (ToT)** - for each step, multiple options are generated, each is evaluated, and the best is chosen
- **Reflection/Self-Critique** - after each step, the agent evaluates progress: "Does this result bring us closer to the goal?"
Plan-and-Execute is effective for tasks with a clear structure: research, comparison, report generation. For creative and open-ended tasks, pure ReAct works better - a rigid plan hinders adaptation.
An agent is given the task: "Compare 3 cloud providers by price, performance, and support." Which planning strategy is most appropriate?
Agent Errors: Loops, Hallucinations, Cost Explosion
AutoGPT launched in April 2023 and hit 150K GitHub stars in one month. Everyone wanted an autonomous agent. Everyone ran it. Almost nobody finished a task: agents got stuck in loops, hallucinated function names, burned through `USD 50` per run, and crashed on timeout. That was the first mass lesson: **an agent without guardrails is not a feature, it's a liability.**
Real statistics from early autonomous agent testing: 30% of sessions end in an infinite loop, 15% - calling nonexistent tools, 8% - cost explosion (a single request exceeds `USD 50`). This isn't an edge case. It's the default behavior without protection.
| Problem | Symptom | Cause | Solution |
|---|---|---|---|
| Infinite loop | Agent repeats the same actions | Tool result provides no new information | MAX_ITERATIONS + repeat detection |
| Hallucinated tools | Calling a tool with a name that doesn't exist in the list | The model "invented" a function | Name whitelist + fallback message |
| Cost explosion | Bill of USD 50+ for a single task | Too many iterations, long context | Budget per task + cost tracking |
| Stuck state | Agent makes no progress but doesn't finish | No suitable tool for the next step | Timeout + forced termination prompt |
| Wrong tool selection | Calling search instead of calculate | Poor tool descriptions | Improve descriptions, add examples |
The golden rule: **an agent without limits is a liability, not a feature**. Always set maxIterations, token budget, and timeout. In production, add alerts: if the agent uses more than 50% of the budget - log it; if 100% - graceful termination.
An agent calls search("best restaurants") with the same arguments for the third time in a row. What is happening and how should it be handled?
Agent = a smarter ChatGPT that will handle everything on its own - just give it a task
An agent without guardrails will break production: loop indefinitely, exceed the budget, call nonexistent functions, and return a confidently wrong answer
ChatGPT gives one answer. An agent makes dozens of decisions in a loop - and each decision can be wrong. Without maxIterations, the agent loops forever. Without a token budget, a single run can cost `USD 50+`. Without loop detection, the agent calls the same tool hundreds of times. Autonomy is an engineering responsibility, not magic.
Key Takeaways
- AI agent = LLM in a loop that decides the next step on its own - unlike a chain with a fixed hardcoded route
- ReAct (Yao et al. 2022): Think → Act → Observe - reasoning before action pushes accuracy from 52% to 73%
- Short-term memory (messages, 128K tokens) and long-term memory (vector DB) are both needed; without compaction, context fills up at 50-60 steps
- Plan-and-Execute for structured tasks: a plan of subtasks, each executed by a ReAct sub-agent
- A production agent must have: maxIterations, token budget, timeout, loop detection, graceful termination - an agent without limits is a liability
Вопросы для размышления
- AutoGPT failed despite the hype in 2023 - which specific failure modes were not handled? How do modern agent frameworks address them?
- Cursor behaves like an agent, but users don't notice - how is that UX achieved? What hides behind the surface?
- When is an agent with Plan-and-Execute worse than plain ReAct? Describe a scenario where a rigid plan hurts.
What's Next
The agent architecture is clear. But writing an agent loop from scratch for every project is tedious. Frameworks exist that abstract common tasks: state management, tool orchestration, memory.
- Agent Frameworks — LangGraph, CrewAI, Vercel AI SDK - ready-made frameworks for building agents
- Multi-Agent Systems — When one agent isn't enough - multiple specialized agents work together
- RAG and Knowledge Base — Long-term memory through retrieval - how an agent accesses a knowledge base
Связанные уроки
- aie-16-tool-calling — Agents act in the world through tool calls
- aie-18-agent-frameworks — Patterns here are codified by agent frameworks
- aie-12-rag-fundamentals — Long-term memory uses retrieval over a knowledge base
- aie-19-multi-agent — A single agent loop scales into multi-agent systems
- ml-48-rl-intro — The observe-act loop mirrors the RL agent-environment loop
- alg-13-dfs
- alg-14-dijkstra