The agentic AI vocabulary.
22 terms used across this guide. Definitions are deliberately short. Click a card to jump to the chapter that uses it.
RAG
↗Retrieval-Augmented Generation. Pull relevant chunks from a vector store (or hybrid search) and stuff them into the prompt so the model can answer with facts beyond its training cutoff.
CAG
↗Cache-Augmented Generation. Precompute and cache an expensive retrieval/synthesis step; serve future queries from the cache when input matches.
Tool calling
↗The LLM emits a structured 'I want to call X with args Y' message. Your code runs X and feeds the result back as another turn. Foundation of every agent.
MCP
↗Model Context Protocol. An open standard for exposing tools/resources to LLMs across vendors. Think 'USB-C for tools'.
ReAct
↗Reason + Act loop. The model alternates between thinking out loud and calling tools, until it decides it's done. The classic single-agent pattern.
Chain-of-Thought
↗Asking the model to reason step-by-step before answering. Improves accuracy on multi-step problems at the cost of tokens.
Reflection
↗An agent reviewing its own output and revising. The simplest form of evaluator-optimizer.
Guardrails
↗Programmatic checks before/after model calls. Block PII leaks, off-topic answers, or jailbreaks. Layer them; don't trust the model to police itself.
Evals
↗Tests for LLMs. Golden datasets + scoring (exact match, judge LLM, pairwise). The only way to know if a change is real or wishful thinking.
Token budget
↗The max tokens (input + output) a request can use. Shapes how much retrieval, history, and few-shot you can include.
Context window
↗The max tokens a model can attend to in one call. Bigger windows ≠ better — quality often degrades past the middle of the window.
Few-shot
↗Including 2-10 input/output examples in the prompt to demonstrate the desired behavior. Cheap, ferocious behavioral lever.
Structured outputs
↗Forcing the model to return JSON conforming to a schema, usually via tool calling with tool_choice locked. Eliminates parsing nightmares.
System prompt
↗The first, persistent message that defines the agent's identity, goals, and hard rules. Not user-overridable.
Temperature
↗Randomness knob. 0 = deterministic, 1 = creative, >1 = unhinged. Lower for analysis; higher for ideation.
Orchestrator
↗The top-level controller in a multi-agent system. Plans subtasks, dispatches to workers, synthesizes the result.
Evaluator-Optimizer
↗A two-role loop where one model produces and another critiques (with structured output), then a third revises. The quality powerhouse pattern.
Streaming (SSE)
↗Server-Sent Events. One-way stream from server to browser. Used to render tokens and audit events as they're produced.
Audit log
↗Append-only record of every event in an agent run: status, tool calls, tool results, reviews. Critical for debugging and trust.
Writing profile
↗Static rules describing structure, terminology, and character of the writer. Travels across all guidelines.
Guideline
↗Per-task contract: angle, audience, beats, tone, must-include/avoid. Generated, edited by humans, consumed by the writer.
Reasoning effort
↗A knob on reasoning models that allocates more compute to internal thought. 'low'/'medium'/'high'/'xhigh'. Use sparingly.