◢ Reference appendix

The agentic AI vocabulary.

22 terms used across this guide. Definitions are deliberately short. Click a card to jump to the chapter that uses it.

RAG

Retrieval-Augmented Generation. Pull relevant chunks from a vector store (or hybrid search) and stuff them into the prompt so the model can answer with facts beyond its training cutoff.

CAG

Cache-Augmented Generation. Precompute and cache an expensive retrieval/synthesis step; serve future queries from the cache when input matches.

Tool calling

The LLM emits a structured 'I want to call X with args Y' message. Your code runs X and feeds the result back as another turn. Foundation of every agent.

MCP

Model Context Protocol. An open standard for exposing tools/resources to LLMs across vendors. Think 'USB-C for tools'.

ReAct

Reason + Act loop. The model alternates between thinking out loud and calling tools, until it decides it's done. The classic single-agent pattern.

Chain-of-Thought

Asking the model to reason step-by-step before answering. Improves accuracy on multi-step problems at the cost of tokens.

Reflection

An agent reviewing its own output and revising. The simplest form of evaluator-optimizer.

Guardrails

Programmatic checks before/after model calls. Block PII leaks, off-topic answers, or jailbreaks. Layer them; don't trust the model to police itself.

Evals

Tests for LLMs. Golden datasets + scoring (exact match, judge LLM, pairwise). The only way to know if a change is real or wishful thinking.

Token budget

The max tokens (input + output) a request can use. Shapes how much retrieval, history, and few-shot you can include.

Context window

The max tokens a model can attend to in one call. Bigger windows ≠ better — quality often degrades past the middle of the window.

Few-shot

Including 2-10 input/output examples in the prompt to demonstrate the desired behavior. Cheap, ferocious behavioral lever.

Structured outputs

Forcing the model to return JSON conforming to a schema, usually via tool calling with tool_choice locked. Eliminates parsing nightmares.

System prompt

The first, persistent message that defines the agent's identity, goals, and hard rules. Not user-overridable.

Temperature

Randomness knob. 0 = deterministic, 1 = creative, >1 = unhinged. Lower for analysis; higher for ideation.

Orchestrator

The top-level controller in a multi-agent system. Plans subtasks, dispatches to workers, synthesizes the result.

Evaluator-Optimizer

A two-role loop where one model produces and another critiques (with structured output), then a third revises. The quality powerhouse pattern.

Streaming (SSE)

Server-Sent Events. One-way stream from server to browser. Used to render tokens and audit events as they're produced.

Audit log

Append-only record of every event in an agent run: status, tool calls, tool results, reviews. Critical for debugging and trust.

Writing profile

Static rules describing structure, terminology, and character of the writer. Travels across all guidelines.

Guideline

Per-task contract: angle, audience, beats, tone, must-include/avoid. Generated, edited by humans, consumed by the writer.

Reasoning effort

A knob on reasoning models that allocates more compute to internal thought. 'low'/'medium'/'high'/'xhigh'. Use sparingly.