◢ Chapter 04¾ · PM Patterns

PM design patterns for agent flows.

The engineering patterns in Chapter 02 are the LEGO bricks. These are the recipes — patterns that emerged specifically from PM-led agent design where the hard problem isn't 'how does the model work' but 'how do we make it trustworthy, on-voice, and operationally sane'. Each one is mapped to (a) the concrete flow components that implement it and (b) the PM Spec slots it forces you to think hardest about.

◢ How to read each pattern
Every card has the same shape: Problem · Solution, then a side-by-side map of flow components (left) and the PM Spec slots (right) the pattern forces you to nail down. Use the family chip to scan: Quality, Governance, Voice, or Cost.
◢ The library

Six PM-shaped patterns. Grouped by what they solve.

PM Pattern · 01

Confidence-Gated Escalation

Governance · Builds on: Routing + HITL
Problem

Agents that auto-respond to everything fail loudly on edge cases. Agents that escalate everything kill operational efficiency.

Solution

Generate a confidence score with every output. Auto-handle above the threshold; route below it to a human queue with full context attached.

◢ Maps to flow components
  • Generator · Primary LLM with structured output

    Returns answer + self-rated confidence 0–1

  • Threshold gate · Deterministic logic

    Compares confidence vs. policy threshold (often 0.7–0.85)

  • Auto-path · Tool call or response

    Sends final output to user or downstream system

  • Escalation queue · Database + notification

    Routes case to human reviewer with input, draft, and confidence breakdown

◆ Stresses these PM Spec slots
Example in the wild

Support agent auto-resolves tickets with confidence ≥ 0.8. Below that, ticket is queued for human review with the agent's draft, sources, and uncertainty reasons attached.
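The threshold gate is simple enough to sketch directly. A minimal illustration, assuming a 0.8 policy threshold as in the example; the `AgentOutput` and `route` names are hypothetical, not from any framework:

```python
from dataclasses import dataclass, field

# Policy threshold: the card suggests 0.7-0.85 is typical; 0.8 matches the example.
CONFIDENCE_THRESHOLD = 0.8

@dataclass
class AgentOutput:
    answer: str
    confidence: float                         # self-rated 0-1, via structured output
    sources: list = field(default_factory=list)

def route(output: AgentOutput) -> str:
    """Deterministic threshold gate: auto-handle or escalate with full context."""
    if output.confidence >= CONFIDENCE_THRESHOLD:
        return "auto"      # send final output to the user / downstream system
    return "escalate"      # queue for a human, with input, draft, and sources attached

draft = AgentOutput(answer="Reset via the account settings page.", confidence=0.65)
print(route(draft))  # escalate
```

Keeping the gate deterministic (plain comparison, not another LLM call) is what makes the escalation rate auditable and the threshold tunable as a single policy knob.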

PM Pattern · 02

Critic Rubric

Quality · Builds on: Evaluator–Optimizer
Problem

'Good' is subjective. Without an explicit rubric, the Reflection loop has nothing meaningful to improve against and the critic agent nitpicks irrelevantly.

Solution

Encode quality as a weighted, explicit rubric the critic LLM scores against. Each criterion has a weight, a definition, and a 'what good looks like' example.

◢ Maps to flow components
  • Rubric document · Markdown / JSON config

    Weighted criteria with examples — versioned alongside the prompt

  • Critic agent · LLM with structured output

    Returns per-criterion score + overall weighted score

  • Score threshold · Deterministic logic

    Below threshold triggers regeneration; above triggers acceptance

  • Editor agent · LLM with diff capability

    Applies critic feedback to produce next iteration

◆ Stresses these PM Spec slots
Example in the wild

AI PM Briefing post generator: Hook 30% · Voice 25% · Signal 20% · Format 15% · CTA 10%. Critic must score ≥ 7.5 weighted before any variant is shown to the human.
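The weighted-score logic behind that example is a few lines. A sketch using the exact weights and 7.5 threshold from the card; per-criterion scores are assumed to come back 0–10 from the critic's structured output:

```python
# Weights mirror the example card: Hook 30% · Voice 25% · Signal 20% · Format 15% · CTA 10%.
RUBRIC_WEIGHTS = {"hook": 0.30, "voice": 0.25, "signal": 0.20, "format": 0.15, "cta": 0.10}
ACCEPT_THRESHOLD = 7.5  # weighted score required before a variant reaches the human

def weighted_score(per_criterion: dict) -> float:
    """Combine the critic's per-criterion scores (0-10) into one weighted score."""
    return sum(weight * per_criterion[name] for name, weight in RUBRIC_WEIGHTS.items())

def accept(per_criterion: dict) -> bool:
    """Below threshold triggers regeneration; at or above triggers acceptance."""
    return weighted_score(per_criterion) >= ACCEPT_THRESHOLD

scores = {"hook": 9, "voice": 8, "signal": 7, "format": 8, "cta": 6}
print(round(weighted_score(scores), 2), accept(scores))  # 7.9 True
```

Because the weights live in config rather than inside the prompt, you can rebalance "what good looks like" without retouching the critic prompt itself.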

PM Pattern · 03

Voice-Anchored Generation

Voice · Builds on: Single Agent + Few-Shot
Problem

LLM outputs sound generic, hedged, and AI-flavored by default. For brand-voiced or persona-driven content, generic output kills trust.

Solution

Anchor every generation pass against a curated voice guide and a few-shot pack of canonical good outputs. The critic also scores against the voice guide.

◢ Maps to flow components
  • Voice guide · Markdown document

    Tone markers, phrases to use, phrases to avoid, structural rules

  • Few-shot pack · 5–10 archetypal examples

    Real past outputs that exemplify the voice; loaded as user/assistant pairs

  • Generator · LLM with system prompt

    Receives voice guide in system prompt, few-shots in messages

  • Voice critic · LLM with rubric subset

    Scores tone/voice match independently from content quality

◆ Stresses these PM Spec slots
Example in the wild

LinkedIn Writer loads a voice guide describing Rahul's tone, plus 8 of his best-performing posts. The critic explicitly scores 'Voice match' as 25% of overall quality.
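The assembly step — voice guide into the system prompt, few-shots as user/assistant pairs — can be sketched as a message builder. This assumes the common chat-message dict shape (`role`/`content`); `build_messages` and the sample strings are illustrative:

```python
def build_messages(voice_guide: str, few_shot_pack: list, user_input: str) -> list:
    """Voice guide anchors the system prompt; few-shots load as user/assistant pairs."""
    messages = [{"role": "system", "content": voice_guide}]
    for example_input, example_output in few_shot_pack:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages

msgs = build_messages(
    voice_guide="Tone: direct, first-person. Avoid hedging openers and filler.",
    few_shot_pack=[("Write about eval-driven development.",
                    "Most teams ship agents blind. Here's the fix...")],
    user_input="Write about memory poisoning.",
)
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```

Loading examples as real conversation turns, rather than pasting them into the system prompt, tends to anchor tone more strongly because the model treats them as its own prior outputs.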

PM Pattern · 04

Diversity Fan-Out

Quality · Builds on: Parallelization / Fan-Out
Problem

Vanilla parallel sampling produces three rewrites of the same idea. The point of fan-out is genuine diversity — different angles, not different phrasings.

Solution

Predefine N distinct generation strategies (hook types, framing angles, personas) and fan out one variant per strategy. The critic scores; human picks.

◢ Maps to flow components
  • Strategy definitions · Constants in prompt config

    E.g. Hook A: surprising stat · Hook B: trend read · Hook C: contrarian take

  • Parallel generators · N LLM calls in parallel

    Each receives same input + its assigned strategy

  • Critic · Single LLM call scoring all N

    Comparative scoring — relative quality, not absolute

  • Selection UI · Side-by-side diff view

    Human picks; selection logged for future strategy tuning

◆ Stresses these PM Spec slots
Example in the wild

AI PM Briefing post generator runs 3 fan-out strategies (surprising claim · trend read · contrarian take) in parallel, scores all three, surfaces the best for human selection.
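The fan-out mechanics can be sketched with a thread pool; `generate` here is a stand-in for the real LLM call, and the strategy names are the hypothetical trio from the example:

```python
from concurrent.futures import ThreadPoolExecutor

# Strategy definitions as constants, per the card: one variant per distinct angle.
STRATEGIES = {
    "surprising_claim": "Open with a surprising claim or stat.",
    "trend_read": "Open by naming the trend behind the news.",
    "contrarian_take": "Open with a contrarian take.",
}

def generate(topic: str, strategy: str, instruction: str) -> str:
    # Real version: one LLM call receiving the same input plus its assigned strategy.
    return f"[{strategy}] {instruction} Topic: {topic}"

def fan_out(topic: str) -> dict:
    """One variant per predefined strategy, generated in parallel."""
    with ThreadPoolExecutor(max_workers=len(STRATEGIES)) as pool:
        futures = {name: pool.submit(generate, topic, name, instruction)
                   for name, instruction in STRATEGIES.items()}
        return {name: future.result() for name, future in futures.items()}

variants = fan_out("agent memory patterns")
print(len(variants))  # 3 — genuinely different angles, not three rephrasings
```

Assigning strategies up front is the whole trick: temperature alone gives you paraphrases, while distinct instructions give the critic something meaningful to compare.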

PM Pattern · 05

Memory Write-Approval

Governance · Builds on: Memory-Augmented + HITL
Problem

Agents that write to long-term memory unsupervised will eventually poison their own context. Bad data persists and compounds over time.

Solution

All memory writes pass through a human approval gate in v1. Once you have signal on what good memories look like, replace human approval with a critic LLM gated by rubric.

◢ Maps to flow components
  • Fact extractor · LLM with structured output

    Proposes candidate memories from a session — never writes directly

  • Approval queue · Database + UI

    Pending memories shown with source context for human review

  • Memory store · Vector DB or KV store

    Only writes accept-flagged memories

  • Audit log · Append-only table

    Every write — and who/what approved it — is logged for rollback

◆ Stresses these PM Spec slots
Example in the wild

Personal assistant agent extracts 'user prefers morning meetings' from a chat. The memory is queued; user sees and approves before it persists to long-term memory.
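The propose/approve split can be sketched as a small gate object; `MemoryGate` and its in-memory lists are hypothetical stand-ins for the real queue, store, and append-only log:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryGate:
    pending: list = field(default_factory=list)     # approval queue
    store: list = field(default_factory=list)       # long-term memory (writes gated)
    audit_log: list = field(default_factory=list)   # append-only: every write + approver

    def propose(self, fact: str, source: str) -> None:
        """Extractor proposes a candidate memory; it never writes directly."""
        self.pending.append({"fact": fact, "source": source})

    def approve(self, index: int, approver: str) -> None:
        """Only accept-flagged memories persist; the approval is logged for rollback."""
        entry = self.pending.pop(index)
        self.store.append(entry["fact"])
        self.audit_log.append({"fact": entry["fact"], "approved_by": approver})

gate = MemoryGate()
gate.propose("user prefers morning meetings", source="chat session")
gate.approve(0, approver="user")
print(gate.store)  # ['user prefers morning meetings']
```

Swapping v1's human approver for a rubric-gated critic later means changing only who calls `approve` — the queue, store, and audit trail stay identical.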

PM Pattern · 06

Cheap-First Cascade

Cost · Builds on: Routing + Reflection
Problem

Using a frontier model for every step burns budget on tasks where a small model would do. But hardcoding 'use small model here' is brittle as quality bars shift.

Solution

Try the cheap model first. If its output fails a structured quality check, escalate to the expensive model. Log both outputs for ongoing tuning.

◢ Maps to flow components
  • Tier 1 model · Small/cheap LLM (e.g. Flash/Mini)

    Handles the 70–80% common case

  • Quality gate · Structured validation or critic LLM

    Checks output against rubric — fails fast on weak responses

  • Tier 2 model · Frontier LLM (e.g. Pro/GPT-5)

    Only invoked when Tier 1 fails the gate

  • Tier-shift telemetry · Logging + dashboards

    Tracks escalation rate; informs future tier-routing decisions

◆ Stresses these PM Spec slots
Example in the wild

Support classifier uses a small model for routine intents. If classification confidence < 0.7, the case re-runs on the frontier model before being dispatched.
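The cascade itself is a short control-flow wrapper. A sketch with stub models and a toy gate; in practice the two models are LLM calls and the gate is the structured quality check (or confidence threshold) from the card:

```python
def cascade(task: str, tier1_model, tier2_model, quality_gate, telemetry: list) -> str:
    """Try the cheap model first; escalate only when it fails the gate. Log the tier."""
    draft = tier1_model(task)
    if quality_gate(draft):
        telemetry.append({"task": task, "tier": 1})   # common case: 70-80% stop here
        return draft
    telemetry.append({"task": task, "tier": 2})       # escalation rate drives future tuning
    return tier2_model(task)

# Stubs: real versions are LLM calls; the gate stands in for a rubric or confidence check.
tier1 = lambda task: f"draft: {task}"
tier2 = lambda task: f"frontier draft: {task}"
passes = lambda draft: not draft.endswith("?")        # toy quality check

telemetry = []
cascade("summarize release notes", tier1, tier2, passes, telemetry)  # tier 1 suffices
cascade("why did churn spike?", tier1, tier2, passes, telemetry)     # fails gate, escalates
print([entry["tier"] for entry in telemetry])  # [1, 2]
```

Because both the gate and the telemetry live outside the models, you can tighten the quality bar or retune tier routing without touching either prompt.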