The AI Engineering Stack

◢ Chapter 04 · Stack

The five layers of every real agent system.

If your agent project gets stuck, it's almost always one of these layers. Diagnose by layer, fix by layer.

◢ The stack

Read bottom up: Layer 01 is the foundation, Layer 05 sits on top.

Layer 05

Evaluation

Golden datasets · LLM-as-judge · Pairwise A/B · Online win-rate · Regression alerts

How do you know it's working? Offline evals (golden set + judge LLM), online evals (user signals, win-rate), regression suites. Without this layer you're guessing.
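A minimal sketch of the offline half, assuming a golden set of prompt/reference pairs and a judge that returns a score in [0, 1]. `GoldenCase`, `run_offline_eval`, `exact_match_judge`, and `regressed` are hypothetical names, and the exact-match judge is only a stand-in for a real judge-LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    prompt: str
    reference: str


def exact_match_judge(prompt: str, reference: str, output: str) -> float:
    # Stand-in for a judge LLM: 1.0 on a case-insensitive exact match, else 0.0.
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def run_offline_eval(
    cases: List[GoldenCase],
    agent: Callable[[str], str],
    judge: Callable[[str, str, str], float],
    threshold: float = 0.8,
) -> float:
    """Run the agent on every golden case and return the fraction that pass."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = 0
    for case in cases:
        output = agent(case.prompt)
        if judge(case.prompt, case.reference, output) >= threshold:
            passed += 1
    return passed / len(cases)


def regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    # Regression alert: the pass rate dropped by more than the tolerance.
    return current < baseline - tolerance
```

Run it on every prompt or model change and compare against the last known-good pass rate; the threshold and tolerance values here are placeholders to tune per project.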

Layer 04

Reasoning

Model choice · Reasoning effort · Chain-of-thought · Self-consistency · Reflection

The model itself, plus how you prompt it to think. Choose a model with the reasoning depth your task needs; turn on extended reasoning when complexity demands it.
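Self-consistency, one of the techniques listed above, can be sketched as sampling several answers and keeping the majority. This assumes a hypothetical `sample` callable that returns one final answer per call (for example, one model call at non-zero temperature); `self_consistent_answer` is an illustrative name, not a library function.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> Tuple[str, float]:
    """Draw n answers and return the most common one with its agreement rate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample().strip() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n
```

A low agreement rate is a cheap signal that the task may need a stronger model or more reasoning effort. Each extra sample costs tokens and time, so n is a trade-off against the Cost and Latency constraints.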

Layer 03

Orchestration

Pattern composition · SSE streaming · Retries & fallbacks · Audit logs · Idempotency

The control flow between LLM calls — the patterns from Chapter 2. Plus: timeouts, retries, fallbacks, audit logging, streaming.
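Retries and fallbacks might look like the sketch below. It assumes each provider is a plain callable that raises on failure, with timeouts enforced inside the provider and surfaced as exceptions; `call_with_fallback` and its parameters are hypothetical.

```python
import time
from typing import Callable, List, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order; retry each with exponential backoff."""
    errors: List[str] = []
    for provider in providers:
        name = getattr(provider, "__name__", "provider")
        for attempt in range(retries + 1):
            try:
                return provider(prompt)
            except Exception as exc:
                # Keep a record of every failure for the audit log.
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < retries:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Retrying is only safe when the call is idempotent. A provider that triggers side effects would need an idempotency key before it goes through a loop like this.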

Layer 02

Tools

Web search · Code exec · RAG retrieval · MCP servers · Custom APIs

The world the agent can act on. Web search, code execution, DB queries, vector retrieval, third-party APIs. Each tool is a contract: name, schema, side effects, errors.
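The four parts of the contract can be written down explicitly. The sketch below is one possible shape, not a standard: `ToolContract`, `invoke`, and the example `divide` tool are hypothetical, and the schema is simplified to a mapping from argument name to Python type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class ToolContract:
    name: str
    schema: Dict[str, type]      # argument name -> expected Python type
    side_effects: str            # e.g. "none", "writes to the orders table"
    errors: Tuple[type, ...]     # exception types the tool is expected to raise
    run: Callable[..., Any]


def invoke(tool: ToolContract, **kwargs: Any) -> Dict[str, Any]:
    """Validate arguments against the schema, run the tool, return a structured result."""
    missing = sorted(set(tool.schema) - set(kwargs))
    extra = sorted(set(kwargs) - set(tool.schema))
    if missing or extra:
        return {"ok": False, "error": f"bad arguments: missing={missing} extra={extra}"}
    for key, expected in tool.schema.items():
        if not isinstance(kwargs[key], expected):
            return {"ok": False, "error": f"{key} must be {expected.__name__}"}
    try:
        return {"ok": True, "result": tool.run(**kwargs)}
    except tool.errors as exc:
        # Declared errors become data the model can read; anything else propagates.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


divide = ToolContract(
    name="divide",
    schema={"a": float, "b": float},
    side_effects="none",
    errors=(ZeroDivisionError,),
    run=lambda a, b: a / b,
)
```

Returning declared errors as structured results lets the agent recover on its next step, while undeclared exceptions still surface as bugs.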

Layer 01

Context

Prompts · RAG chunks · Memory · Few-shot · Token budget

Everything the model sees: system prompt, profile, guidelines, retrieved chunks, conversation history, tool results. Curating this is half the job.
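One way to curate under a token budget is to rank the parts by priority, keep what fits, and emit the survivors in their original order. This is a sketch under loud assumptions: `build_context` and `estimate_tokens` are hypothetical, and the four-characters-per-token estimate is a crude placeholder for a real tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def build_context(parts: List[Tuple[int, str, str]], budget: int) -> str:
    """parts holds (priority, label, text); a lower priority number is more important."""
    ranked = sorted(range(len(parts)), key=lambda i: parts[i][0])
    kept = set()
    used = 0
    for i in ranked:
        cost = estimate_tokens(parts[i][2])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    # Emit the surviving parts in their original order so the prompt layout stays stable.
    return "\n\n".join(
        f"[{parts[i][1]}]\n{parts[i][2]}" for i in range(len(parts)) if i in kept
    )
```

Dropping a whole part is the bluntest option. Summarizing old history or trimming low-ranked chunks are gentler alternatives within the same budget logic.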

◢ Constraints

Four forces that shape every decision.

Cost

Each token costs money. Cheaper model for routing/classification, expensive for the final draft. Cache aggressively.
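A sketch of both ideas together: route by task type, then cache on the (model, prompt) pair. The model names `small-model` and `large-model`, the task labels, and the `CachedClient` class are all placeholders, and `call_model` stands in for whatever client the project actually uses.

```python
from typing import Callable, Dict, Tuple

CHEAP_TASKS = {"routing", "classification"}


def pick_model(task: str) -> str:
    # Placeholder model names: cheap tier for routing work, expensive tier otherwise.
    return "small-model" if task in CHEAP_TASKS else "large-model"


class CachedClient:
    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: Dict[Tuple[str, str], str] = {}
        self.hits = 0

    def complete(self, task: str, prompt: str) -> str:
        key = (pick_model(task), prompt)
        if key in self._cache:
            self.hits += 1          # served from cache, no tokens spent
            return self._cache[key]
        result = self._call_model(key[0], prompt)
        self._cache[key] = result
        return result
```

An exact-match cache like this only pays off for repeated prompts, and it assumes a cached answer is still acceptable; anything time-sensitive would need an expiry.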

Latency

Every loop iteration adds seconds. Stream early. Parallelize what's independent. Raise reasoning effort only where the task needs it.
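Parallelizing independent calls can be sketched with `asyncio.gather`. Here `fake_tool` is a hypothetical stand-in that just sleeps; in a real system it would be an async tool or model call.

```python
import asyncio
import time
from typing import List, Sequence, Tuple


async def fake_tool(name: str, seconds: float) -> str:
    # Stand-in for an independent I/O-bound call (search, retrieval, profile lookup).
    await asyncio.sleep(seconds)
    return name


async def run_parallel(calls: Sequence[Tuple[str, float]]) -> List[str]:
    # gather starts all calls concurrently and returns results in input order.
    return list(await asyncio.gather(*(fake_tool(n, s) for n, s in calls)))


def timed_parallel(calls: Sequence[Tuple[str, float]]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    results = asyncio.run(run_parallel(calls))
    return results, time.perf_counter() - start
```

With three calls of similar duration, total wall time should land near the slowest call rather than the sum. This only helps when the calls do not depend on each other's output.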

Quality bar

What's the worst output the user will tolerate? An internal tool can ship when it's right about 80% of the time; a customer-facing email can't.

Compliance

PII handling, audit trail, data residency, content moderation. Bake it in early — bolted on later, it's painful.

◢ The diagnostic loop

Output is bad? Look at Context (layer 01) first. Output is slow? Look at Orchestration (03). Output is wrong about facts? Look at Tools (02) and Context (01). Output is inconsistent? Look at Reasoning (04) and Evaluation (05).
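The same triage can be kept as a lookup table next to your runbook. The symptom labels and the `layers_to_check` name are hypothetical; the mapping itself mirrors the paragraph above, and the bottom-up fallback for unknown symptoms is an assumption.

```python
from typing import List

LAYERS = {1: "Context", 2: "Tools", 3: "Orchestration", 4: "Reasoning", 5: "Evaluation"}

# Symptom -> layers to inspect, in order.
TRIAGE = {
    "bad": [1],
    "slow": [3],
    "wrong facts": [2, 1],
    "inconsistent": [4, 5],
}


def layers_to_check(symptom: str) -> List[str]:
    # Assumption: for an unlisted symptom, walk the whole stack bottom up.
    numbers = TRIAGE.get(symptom.strip().lower(), sorted(LAYERS))
    return [f"{LAYERS[n]} ({n:02d})" for n in numbers]
```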

◇ See also · Stack layers in production

→ PM Patterns: cost · quality · governance
→ PM Spec: Latency & Cost · Failure Modes
→ Briefing: full 7-stage stack