AI Engineering

Production Prompt Patterns: system/user/assistant, Few-Shot, Chain-of-Thought

Цели урока

Write structured system prompts - with sections, rules, and format
Use few-shot examples for stable output
Apply Chain-of-Thought for tasks requiring logic
Choose an output formatting strategy: JSON mode, Zod, XML tags
Build prompt templates for production - reusable and testable

Предварительные знания

LLM API Integration

API Integration

Prompt engineering is not an art. It's an interface to probabilistic computation - and it has strict rules. Structure the model has seen in training data. Examples that shift the output distribution. Phrases that activate the right "chains" through the weights. The quality gap between a naive and an engineered prompt reaches 40%. Same model. Same money. Different result.

Notion AI - 50+ prompt templates for different tasks (summary, translate, brainstorm), all A/B tested like code
Cursor - chain-of-thought in prompts improved code autocomplete accuracy by 30% without touching the model
Stripe - few-shot examples for ticket classification hit 95% accuracy without fine-tuning (Brown et al. 2020 in the wild)
GitHub Copilot - a structured system prompt of 2000+ tokens sets repository context; not a "hint" - a specification

Five Words That Changed ML

2022. Jason Wei (Google Brain) publishes "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" - show the model a few worked examples with reasoning written out, and accuracy on math problems climbs sharply. Months later Kojima et al. 2022 found the trick needs no examples at all: just append "Let's think step by step" and accuracy on MultiArith jumps from about 18% to 79%. The model didn't change. Only the prompt. Until then, the assumption was: better results require a bigger model or fine-tuning. Chain-of-Thought revealed a third path: **ask correctly**. And earlier, Brown et al. 2020 (GPT-3) found few-shot learning - the model learns from in-context examples with zero gradient steps. Both discoveries underlie every production prompt written today.

System Prompt: Architecture, Not a Hint

One developer writes a one-line system prompt: "You are a helpful assistant." Another structures it as a **specification** - with sections, constraints, and format. The second approach gets better results - not because the model is different, but because a well-structured prompt aligns with what the model was trained on.

Why does this work? The model was trained on billions of documents - and most of them are structured: READMEs, specs, API docs, markdown files. When a prompt looks like a structured document, it lands in the statistical "zone of the familiar" - and instruction-following improves sharply. Not magic. Statistics.

**Role** - who the model is, what context it operates in
**Rules** - what it can and can't do (constraints)
**Tone** - communication style
**Format** - what the response should look like
**Examples** (optional) - 1-2 examples of an ideal response

**System prompt does NOT guarantee behavior.** Users can "convince" the model to break the rules (prompt injection). Don't rely on the system prompt as a security boundary - validate output on the backend.

Why does a structured system prompt (with sections like ## Role, ## Rules) work better than a single line?

Few-Shot: Teaching the Model by Example

Brown et al. 2020 - the GPT-3 paper - discovered a phenomenon they called **few-shot learning**: show the model 2-5 "input → output" examples directly in the prompt and it immediately grasps the pattern. No training. No gradient update. Just context. It rewrote what "learning" means for an LLM.

**Rule: 3 examples is the sweet spot.** 1 example - the model might not catch the pattern. 5+ examples - tokens get burned without meaningful improvement. 3 examples cover positive, negative, and edge case.

**When few-shot is critical:**

Non-standard output format (specific JSON schema, CSV, XML)
Classification with custom categories (not general purpose)
Stylistic tasks - the model should mimic the style of examples
Extraction from unstructured text into a specific structure

**Store few-shot examples in a database**, not in code. This enables A/B testing of different example sets and updates without redeploying.

Building an API that classifies support tickets into 12 custom categories. Which approach is more reliable?

Chain-of-Thought: Making the Model Think Aloud

2022. Jason Wei from Google Brain showed that worked reasoning examples elicit step-by-step thinking - Chain-of-Thought prompting. Then Kojima et al. found one phrase was enough: **"Let's think step by step"**. Math accuracy jumps from 18% to 79%. Not a new architecture. Not more training data. Not fine-tuning. Five words. It became one of the most cited ML discoveries of the year.

Why it works - mechanism, not intuition. LLMs generate one token at a time. When the model "thinks aloud," the intermediate reasoning tokens become **context** for the next tokens. A scratch pad built directly into the context window. The model literally uses its own text as working memory - it has no other kind.

Task	Without CoT	With CoT	Improvement
Math (GSM8K)	~57%	~93%	+36%
Logic puzzles	~45%	~85%	+40%
Multi-step analysis	~60%	~90%	+30%
Simple classification	~95%	~95%	0% (not needed)

**CoT uses more output tokens** - reasoning takes up space. Don't use CoT for simple tasks (classification, extraction) - it's a waste of money.

Which task would benefit most from Chain-of-Thought?

Output Formatting: JSON, XML Tags, Structured Output

In production, "text" isn't enough - what's needed is **structured data**: JSON that can be parsed and stored in a database. There are several strategies to get stable output. The right choice depends on the provider and strictness requirements.

**Strategy 1: JSON mode** (OpenAI) - guarantees valid JSON:

**Strategy 2: Structured Outputs** (OpenAI) - even stricter, with a Zod schema:

**Strategy 3: XML tags** - works with any provider (Claude, open-source):

Method	Format Guarantee	Provider	When to Use
JSON mode	Valid JSON	OpenAI	Simple JSON responses
Structured Outputs + Zod	Exact schema match	OpenAI	When strict typing is needed
XML tags	No guarantee (needs validation)	Any	Multi-provider, complex responses with CoT + data
Plaintext + regex	No guarantee	Any	Simple responses (yes/no, a number)

**Best practice:** for OpenAI use Structured Outputs + Zod. For Anthropic and open-source - XML tags + backend validation. In any case - always wrap parsing in try/catch.

Building a production API that extracts structured data and must work with both OpenAI and Anthropic. Which approach?

Prompt Composition: A Template Engine for AI

In production, a prompt isn't a hardcoded string. It's a **template** where data from the request, database, and config gets injected. The prompt is assembled dynamically - like a SQL query or an HTML template. The difference: a prompt has no compiler to catch mistakes. That's why the architecture matters even more.

**Advanced pattern: prompts in files** - store templates separately from code:

**Why extract prompts?** A product manager can edit prompts via CMS/admin panel without a developer. A/B tests of different prompts - without redeploying. Prompt versioning - rollback to a previous version if quality degrades.

The main reason to use prompt templates instead of string literals in code:

The longer and more detailed the prompt, the better the result

Extra instructions add noise: the model loses focus as attention spreads across irrelevant parts of the context

Attention in a transformer is literally a distribution of weights across all tokens in the context window. A system prompt with 50 rules gives each rule less "attention" weight. The optimal system prompt is precise and minimal - only what genuinely shapes behavior. Everything else isn't help, it's noise.

The system prompt reliably restricts model behavior - it's a security boundary

A system prompt is a strong shift in the probability distribution, not a hard constraint

The model is a probabilistic machine. A clever enough user prompt can shift the distribution back. This is called prompt injection. The first CVEs have already been filed. The only reliable barrier is backend output validation, guardrails, and privilege separation.

Patterns at a Glance

System prompt - a specification, not a hint. Structure (Role → Rules → Format) works because the model was trained on structured documents
Few-shot (3 examples, Brown et al. 2020) - shifts the output distribution without fine-tuning; store examples in a database, not in code
Chain-of-Thought (Wei et al. 2022) - five words give +40% on logic tasks; intermediate tokens become a scratch pad in context
Structured Outputs (Zod) for OpenAI, XML tags + validation for multi-provider setups
Prompts in templates separate from code - A/B tests and edits without redeploys
Longer prompt does not mean better: extra instructions add noise and dilute attention

What's Next

Now it's clear how to write prompts like an engineer - as an interface to probabilistic computation with strict rules. The next step: getting the model to return data in a strict format and call backend functions.

Structured Output — A closer look at JSON Schema, function calling, tool use
Prompt Injection — How to protect prompts from user attacks
Evaluation — How to measure prompt quality and automate testing

Вопросы для размышления

Which pattern (few-shot, CoT, structured output) would give the biggest impact in a typical LLM project? Why that one?
If a system prompt isn't a security boundary, what needs to be added to the backend architecture for real protection?
CoT increases the number of output tokens. What would a cost calculation look like for 100K requests per day with CoT vs. without?

Связанные уроки

aie-05-api-integration — Prompt patterns run on top of the chat API
aie-07-structured-output — Patterns lead into schema-constrained outputs
aie-34-prompt-injection-deep — Robust prompts must resist injection attacks
aie-31-evaluation — Prompt quality needs systematic measurement
ml-37-bert-gpt — Few-shot prompting exploits in-context learning of GPT models
alg-20-greedy