The five layers of every real agent system.
If your agent project gets stuck, it's almost always one of these layers. Diagnose by layer, fix by layer.
Bottom up.
Evaluation
How do you know it's working? Offline evals (golden set + judge LLM), online evals (user signals, win-rate), regression suites. Without this layer you're guessing.
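A minimal sketch of the offline half of this layer, assuming a golden set of prompt/expected pairs. Every name here (GoldenCase, pass_rate, is_regression) is hypothetical, and exact_match_judge is only a stand-in for a judge LLM, which would normally grade on a rubric rather than string equality.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GoldenCase:
    prompt: str
    expected: str


Agent = Callable[[str], str]
Judge = Callable[[str, str, str], bool]  # (prompt, expected, actual) -> pass?


def exact_match_judge(prompt: str, expected: str, actual: str) -> bool:
    # Stand-in for a judge LLM: passes only on a case-insensitive exact match.
    return expected.strip().lower() == actual.strip().lower()


def pass_rate(agent: Agent, judge: Judge, cases: Sequence[GoldenCase]) -> float:
    # Fraction of golden cases the judge accepts.
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(judge(c.prompt, c.expected, agent(c.prompt)) for c in cases)
    return passed / len(cases)


def is_regression(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    # Flag a regression when the pass rate drops below baseline minus tolerance.
    return current < baseline - tolerance
```

Run pass_rate on every change and compare against the last accepted baseline; that comparison is the regression suite in its simplest form.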
Reasoning
The model itself, plus how you prompt it to think. Choose a model with the reasoning depth your task needs; turn on extended reasoning when complexity demands it.
Orchestration
The control flow between LLM calls (the patterns from Chapter 2), plus the operational plumbing around it: timeouts, retries, fallbacks, audit logging, and streaming.
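A rough sketch of three of those concerns (timeouts, retries, fallbacks) wrapped around a single model call. call_with_policy and the ModelCall type are hypothetical; the sketch assumes each model is exposed as an async function that takes a prompt and returns text, and that ConnectionError is the transient error worth retrying.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Sequence

log = logging.getLogger("orchestrator")

ModelCall = Callable[[str], Awaitable[str]]


async def call_with_policy(
    prompt: str,
    models: Sequence[ModelCall],  # primary first, fallbacks after
    timeout_s: float = 10.0,
    max_retries: int = 2,
    backoff_s: float = 0.5,
) -> str:
    last_error: Exception | None = None
    for model in models:
        for attempt in range(max_retries + 1):
            try:
                # Timeout: abandon the call if it exceeds timeout_s.
                return await asyncio.wait_for(model(prompt), timeout=timeout_s)
            except (asyncio.TimeoutError, ConnectionError) as exc:
                last_error = exc
                log.warning("model call failed (attempt %d): %r", attempt + 1, exc)
                if attempt < max_retries:
                    # Retry: exponential backoff before the next attempt.
                    await asyncio.sleep(backoff_s * 2 ** attempt)
        # Fallback: retries exhausted, move on to the next model in the list.
    raise RuntimeError("all models failed") from last_error
```

The warning log doubles as a crude audit trail; a production system would likely write structured records instead.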
Tools
The world the agent can act on. Web search, code execution, DB queries, vector retrieval, third-party APIs. Each tool is a contract: name, schema, side effects, errors.
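One way to make that contract explicit in code. This is a sketch, not any framework's API: the Tool class, its field names, and ToolError are hypothetical, and the "schema" is simplified to a mapping from argument name to Python type.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolError(Exception):
    """The single error type a tool is allowed to surface to the agent."""


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict[str, type]  # simplified schema: argument name -> type
    has_side_effects: bool       # e.g. True for writes, False for lookups
    run: Callable[..., Any]

    def call(self, **kwargs: Any) -> Any:
        # Validate arguments against the schema before touching the world.
        missing = self.parameters.keys() - kwargs.keys()
        unexpected = kwargs.keys() - self.parameters.keys()
        if missing or unexpected:
            raise ToolError(
                f"{self.name}: missing={sorted(missing)} unexpected={sorted(unexpected)}"
            )
        for key, expected_type in self.parameters.items():
            if not isinstance(kwargs[key], expected_type):
                raise ToolError(f"{self.name}: {key} must be {expected_type.__name__}")
        try:
            return self.run(**kwargs)
        except ToolError:
            raise
        except Exception as exc:
            # Normalize arbitrary failures into the contract's error type.
            raise ToolError(f"{self.name} failed: {exc}") from exc
```

Declaring has_side_effects up front lets the orchestration layer decide which calls are safe to retry or run speculatively.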
Context
Everything the model sees: system prompt, profile, guidelines, retrieved chunks, conversation history, tool results. Curating this is half the job.
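Curation can be sketched as packing prioritized pieces into a token budget. Everything below is an assumption for illustration: build_context is a hypothetical helper, priorities are caller-assigned integers (lower means more important), and the four-characters-per-token estimate is a crude heuristic, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly four characters per token.
    return max(1, len(text) // 4)


def build_context(
    system_prompt: str,
    candidates: list[tuple[int, str]],  # (priority, text); lower = more important
    budget_tokens: int,
) -> list[str]:
    # The system prompt is always included; everything else competes for space.
    remaining = budget_tokens - estimate_tokens(system_prompt)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the budget")

    # Admit candidates in priority order while they still fit.
    kept: set[int] = set()
    for i in sorted(range(len(candidates)), key=lambda i: candidates[i][0]):
        cost = estimate_tokens(candidates[i][1])
        if cost <= remaining:
            kept.add(i)
            remaining -= cost

    # Emit survivors in their original order so the context reads naturally.
    return [system_prompt] + [
        text for i, (_, text) in enumerate(candidates) if i in kept
    ]
```

In practice the priorities might come from retrieval scores or message recency; the point is that something has to decide what gets dropped when the window fills up.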
Four forces that shape every decision.
Cost
Each token costs money. Use a cheaper model for routing and classification, an expensive one for the final draft. Cache aggressively.
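A small sketch of both ideas: route by task type, and cache identical requests. The model names, the task labels, and the complete(model, prompt) callable are placeholders, not a real provider API. Caching like this only makes sense when returning the same answer for the same prompt is acceptable.

```python
from functools import lru_cache
from typing import Callable

CHEAP_MODEL = "small-model"   # hypothetical model name
STRONG_MODEL = "large-model"  # hypothetical model name

CHEAP_TASKS = {"routing", "classification"}


def pick_model(task: str) -> str:
    # Cheap model for routing and classification, strong model for the rest.
    return CHEAP_MODEL if task in CHEAP_TASKS else STRONG_MODEL


def make_cached(complete: Callable[[str, str], str]) -> Callable[[str, str], str]:
    # Wrap a completion function so repeated (model, prompt) pairs are free.
    @lru_cache(maxsize=1024)
    def cached(model: str, prompt: str) -> str:
        return complete(model, prompt)

    return cached
```

An in-process lru_cache disappears on restart; a shared cache would need an external store, which is outside the scope of this sketch.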
Latency
Every loop iteration adds seconds. Stream early. Parallelize what's independent. Spend reasoning effort only where the task needs it.
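"Parallelize what's independent" in its smallest form: three calls that do not depend on each other, awaited together with asyncio.gather. fake_tool is a hypothetical stand-in for a real tool or model call, and the delays are made up.

```python
import asyncio
import time
from typing import Awaitable, Callable, Sequence


async def run_parallel(calls: Sequence[Callable[[], Awaitable[str]]]) -> list[str]:
    # gather starts every call at once and returns results in input order.
    return list(await asyncio.gather(*(call() for call in calls)))


async def fake_tool(name: str, delay_s: float) -> str:
    # Stand-in for a network-bound tool call.
    await asyncio.sleep(delay_s)
    return f"{name} done"


def timed_demo() -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(
        run_parallel(
            [
                lambda: fake_tool("search", 0.2),
                lambda: fake_tool("db", 0.2),
                lambda: fake_tool("retrieval", 0.2),
            ]
        )
    )
    return results, time.perf_counter() - start
```

Run sequentially, the three calls would take about 0.6 seconds; gathered, the total is close to the slowest single call.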
Quality
What's the worst output the user will tolerate? An internal tool can get away with being right 80% of the time; a customer-facing email can't.
Compliance
PII handling, audit trail, data residency, content moderation. Bake it in early; bolted on later, it's painful.