The Pattern Index



"A field guide is only useful if you can navigate it. This index is the navigation layer."


This index lists every chapter and pattern in the book by part, by category, and by the problem they address. Use it to:

  • Find a chapter you half-remember
  • Discover all chapters relevant to a particular problem
  • Navigate by archetype or by phase of the practice

Part 1 — Decisions

The decisions you commit to before you start.

Title | Key question
Pick an Archetype | What kind of system is this — Advisor, Executor, Guardian, Synthesizer, or Orchestrator?
The Advisor | Information-surfacing archetype: full specification
The Executor | Bounded-action archetype: full specification
The Guardian | Constraint-enforcement archetype: full specification
The Synthesizer | Composite-output archetype: full specification
The Orchestrator | Multi-agent coordination archetype: full specification
Calibrate the Four Dimensions | How much autonomy, agency, responsibility, and reversibility does this system get?
Four Dimensions of Governance | How do agency, risk, oversight, and reversibility interact in formal governance terms?
The Archetype Selection Tree | How do you choose the right archetype when the answer isn't obvious?
Composing Archetypes | How do multiple archetypes work together in a single deployment?
Governed Archetype Evolution | How do you update the archetype catalog as the technology and your domain change?
Multi-Agent Governance | How do you govern an N-agent system as a system, not as N individually-specified components?
Intent vs. Implementation | When something goes wrong, was the spec wrong, or did the agent fail to execute it?
Failure Modes and How to Diagnose Them | What are the seven failure categories, and how do you diagnose them?
The Intent Design Session | What is the time-boxed working ritual that turns the framework into a session a team can run?
What Changes for the Senior Engineer | If late judgment was the senior engineer's value-add, what is the value-add now?

Part 2 — The Spec

How to write the artifact the agent executes against.

Title | Key question
Spec-Driven Development | What is SDD and how is it different from requirements writing?
The Spec as Control Surface | How does a spec actually control what an agent does?
The Spec Lifecycle | What phases does a spec move through from intent to validation?
Writing for Machine Execution | What makes an agent-executable spec different from a human-readable one?
The Living Spec | How do specs evolve after execution and capture learning?
The Canonical Spec Template | What does a complete spec look like?
Architectural Decision Records | How do ADRs and specs relate; when to write each; the canonical ADR format with Spec Mapping section
SpecKit | How does the SpecKit toolchain support spec-driven development?

Part 3 — The Agent

What agents are structurally, what capabilities they need, how to bound them.

Title | Key question
What Agents Are | What precisely is an agent, and what are its operational limits?
Autonomy Without Agency | Why does the autonomy/agency distinction matter in practice?
The Executor Model | How do agents relate to the intent encoded in specs?
Least Capability | How do tool manifests and MCP define what an agent can reach?
Portable Domain Knowledge | What are SKILL.md files and how do they carry domain context?
Coding Agents | How do the framework's archetypes, spec, and oversight apply to the most-deployed agent class (Cursor, Cline, Devin, Claude Code)?
Computer-Use Agents | How do the framework's disciplines apply to GUI-acting agents (Claude Computer Use, OpenAI Operator, Gemini computer use); the new Cat 7 Perceptual Failure category

Knowledge & Context

Title | Purpose
The System Prompt | The agent's constitution at runtime
The Skill File | Encoding domain knowledge the agent can reference
The Tool Manifest | Declaring what tools the agent can access
Per-Task Context | Task-scoped context provision
Retrieval-Augmented Generation | Grounding outputs in retrieved content
Long-Term Memory | Cross-session memory patterns
Context Window Budget | Managing context window allocation
Grounding with Verified Sources | Constraining outputs to verified facts

Tools and MCP

Title | Purpose
The Model Context Protocol | Protocol overview
Designing MCP Tools | Designing tools that enforce intent rather than expose raw capability
MCP Safety | Safety considerations for MCP tool design
The Read-Only Tool | Boundary pattern for read-only access
The State-Changing Tool | Pattern for stateful operations
The Idempotent Tool | Idempotency guarantee pattern
The MCP Server | Standard MCP server design
Direct Function Calling | Tool calling protocol
Code Execution Sandbox | Safe code execution boundary
File System Access | File I/O patterns

Part 4 — Oversight, Safety & Operations

Title | Purpose
Proportional Oversight | The four oversight models (Monitoring / Periodic / Output Gate / Pre-authorized)
Human-in-the-Loop Gate | Structured decision gate before consequential actions
Retry with Structured Feedback | Structured retry that improves first-pass execution
Escalation Chain | Escalation hierarchy design

Safety

Title | Purpose
Prompt Injection Defense | Multi-layer defense for any externally-facing agent
Output Validation Gate | Tiered validation (programmatic → Guardian → human)
Sensitive Data Boundary | PII/secret handling pattern
Graceful Degradation | Partial-failure handling
Rate Limiting and Throttle | Preventing runaway execution
Blast Radius Containment | Limiting the consequence of a single failure

Observability

Title | Purpose
Structured Execution Log | Auditable execution trace
Cost Tracking per Spec | Cost attribution per agent and spec
Distributed Trace | Tracing multi-agent flows
Health Check and Heartbeat | Agent health monitoring
Anomaly Detection Baseline | Anomaly detection setup

Testing & Validation

Title | Purpose
Spec Conformance Testing | Making spec constraints testable and verifiable
Adversarial Input Test | Robustness testing
Multi-Agent Integration Test | Testing agent coordination
Evaluation by Judge Agent | Using an agent to validate another agent's output

Part 5 — Ship

Title | Purpose
Canary Deployment | Safe spec rollout
Rollback on Failure | Reverting a broken spec
Spec Versioning | Managing spec versions
Model Upgrade Validation | Re-validating when the underlying model changes
Agent Deprecation Path | Sunsetting old agents and specs
Proportional Governance | The lightest governance structure that prevents both chaos and bureaucracy
Intent Review Before Output Review | Spec review as a practice
Four Signal Metrics | What to measure, what not to
Evals and Benchmarks | The four-level eval stack: unit asserts, spec acceptance, regression, production sampling
Red-Team Protocol | Four red-team batteries (pre-launch, per-release, monthly regression, quarterly fresh-attacks) feeding the spec gap log
Cost and Latency Engineering | Model-tier selection, prompt caching strategy, latency budget decomposition, anti-patterns
Cacheable Prompt Architecture | Prompt caching as architecture, not optimization: layered prompt structure, cache breakpoints, prompt-stability spec constraint, eval-time pre-warm, cache_hit_rate as first-class telemetry
Production Telemetry | The integrated telemetry stack: what to instrument, what to retain, alerts vs monitors, OpenTelemetry GenAI semantic conventions
Adoption Playbook | How to introduce SDD discipline to a team without big-bang rollout, spec theater, or governance over-investment; CI/CD wiring with hard-gate / soft-gate / observe tiers
Minimum Viable Architecture of Intent | The floor of the discipline for small systems: when is the IDS too heavy, what's the smallest set of artifacts that still does work, when should an MVP graduate to the full framework
Signs Your Architecture of Intent Is Degrading | The 12-anti-pattern catalog of how the discipline itself decays — spec theater, oversight kabuki, metrics theater, citation theater, prompt-patch drift, archetype drift, the retrofit IDS — and the quarterly discipline-health audit that surfaces them
Mapping the Framework to the DevSquad 8-Phase Cadence | Phase-by-phase mapping of the book's artifacts and disciplines into Microsoft DevSquad Copilot's 8-phase iterative cycle
Co-adoption with DevSquad Copilot | The minimum additions from this book that give a DevSquad team the most leverage; vocabulary translation; 30-day co-adoption plan
Multi-Tenant Fleet Governance | The four structural moves a platform team needs to scale single-system governance to a fleet of tenant teams sharing infrastructure: constraint inheritance hierarchy, cross-tenant isolation contract, fleet-partitioned telemetry, platform-tier failure-locus rule

Part 6 — Worked Pilots

Title | Demonstrates
How to Use These Examples | Reading guide
Designing an AI Customer Support System | Multi-agent Orchestrator + Executor + Guardian + Advisor
Selecting the Archetypes (Example 1) | Five-archetype evaluation worked through
Writing the Spec (Example 1) | Annotated SDD spec for the Account Executor
Agent Instructions (Example 1) | Operational instructions derived from spec
Validating Outcomes (Example 1) | 14-test acceptance suite
Post-mortem Through Intent (Example 1) | $0.00 refund incident — spec gap traced and closed
A Code Generation Pipeline | Synthesizer-Executor-Guardian pipeline with no live human
Selecting the Archetypes (Example 2) | Orchestrator rejected; Synthesizer as primary coordinator
Writing the Spec (Example 2) | Annotated spec for the Scaffold Synthesizer
Agent Instructions (Example 2) | Non-conversational instructions for all three agents
Validating Outcomes (Example 2) | 9-test pipeline acceptance suite
Designing an AI Coding Agent | In-loop coding agent for an internal repo; Executor with Synthesizer composition; explicit decision against Devin-style autonomy
Selecting the Archetypes (Example 3) | Decision-tree walk for a coding agent; the "why not Orchestrator-over-self" decision recorded explicitly
Writing the Spec (Example 3) | Full canonical spec with coding-agent specifics: file-system scope, dependency allowlist, test-set protection
Agent Instructions (Example 3) | System prompt + tool manifest with capability minimalism (no general shell, no web fetch, no merge/close)
Evals and Acceptance (Example 3) | The four-level eval stack instantiated; 75-issue golden set construction methodology
Post-mortem Through Intent (Example 3) | The deleted-test incident; spec v1.1 → v1.2 change with constraint-library entry

Cross-Cutting Patterns

Coordination and state patterns to consult once your pilot is running. Most patterns in the book live inside Parts 3–5 alongside their parent chapters; this section gathers the cross-cutting ones.

Coordination

Title | Purpose
Sequential Pipeline | Linear pipeline pattern
Parallel Fan-Out | Parallel execution pattern
Conditional Routing | Decision-based routing
Event-Driven Agent Activation | Event-based coordination
Supervisor Agent | Supervisor agent pattern
Agent-to-Agent Contract | Contracted agent-to-agent interaction

State & Memory

Title | Purpose
Session Isolation | Multi-user isolation
Shared Context Store | Context sharing between agents
Checkpoint and Resume | Long-running execution pattern
Conversation History Management | Storing conversation state
Agent Registry | Registry of agent capabilities
Artifact Store | Storing agent-produced artifacts

Repertoire

Title | Purpose
The Organizational Repertoire | Why repertoires exist and how they compound
The Intent Archetype Catalog | Decision-ready archetype catalog entries
Spec Template Library | Organized spec templates
Feature Spec Template | Template for feature-development tasks
Agent Instruction Template | Template for system-prompt instructions
Integration Spec Template | Template for integration and API tasks
Constraint Library Template | Template for reusable constraint sets
Validation & Acceptance Templates | Reusable acceptance test templates

Code Standards

Title | Purpose
Standards as Agent Skill Source | How code standards are structured for agent validation
Standards for .NET / C# | .NET constraints, patterns, and validation rules
Standards for TypeScript / Node | TypeScript constraints and patterns
Standards for Python | Python constraints and patterns
Standards for REST APIs | REST API design constraints
Standards for Infrastructure as Code | IaC constraints for Bicep, Terraform, YAML

Pattern Justification Map

This map names, for each of the 50 patterns in the book, the section of the Canonical Spec Template that pulls it. A pattern that cannot be mapped to a spec section is inventory, not infrastructure: either remove it or amend the spec template to add the missing section. A spec section that needs patterns it doesn't currently name is a candidate for elaboration.

This is the audit that prevents the "pattern inventory" anti-pattern in Signs Your Architecture of Intent Is Degrading: patterns adopted from a generic best-practice catalog rather than from what one specific spec requires.
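The audit described above can be sketched in a few lines. This is a minimal illustration, not the book's tooling: the dictionary layout, the function names, and the "Prompt Zoo" entry (a made-up pattern with no justification) are all assumptions introduced for the example. The three real rows are copied from the map in this appendix.

```python
import re

# The canonical template has 12 sections, written here as §1..§12.
SPEC_SECTIONS = {f"§{n}" for n in range(1, 13)}

# Pattern -> "Justified by" cell. "Prompt Zoo" is hypothetical and
# deliberately left without a spec section.
JUSTIFIED_BY = {
    "The System Prompt": "§11 Agent Execution Instructions",
    "The Idempotent Tool": "§6 Invariants + §8",
    "Spec Versioning": "§10 Assumptions & Open Questions",
    "Prompt Zoo": "",
}


def sections_of(cell: str) -> set[str]:
    """Extract every §N reference from a 'Justified by' cell."""
    return set(re.findall(r"§\d+", cell))


def audit(justified_by: dict[str, str]) -> dict[str, list[str]]:
    """Split patterns into unjustified inventory and bad section references."""
    inventory, unknown = [], []
    for pattern, cell in justified_by.items():
        refs = sections_of(cell)
        if not refs:
            inventory.append(pattern)      # no spec section pulls it
        elif not refs <= SPEC_SECTIONS:
            unknown.append(pattern)        # cites a section the template lacks
    return {"inventory": sorted(inventory), "unknown_section": sorted(unknown)}


result = audit(JUSTIFIED_BY)
print(result)
```

A non-empty `inventory` list is the signal to either remove the pattern or amend the spec template; a non-empty `unknown_section` list suggests the map and the template have drifted apart.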

Capability patterns (Knowledge & Context, Tools)

Pattern | Justified by | Why this spec section pulls it
The System Prompt | §11 Agent Execution Instructions | The runtime constitution the agent reads each turn — §11 is where it gets specified
The Skill File | §5 Functional Intent + §11 | Encodes the domain knowledge the agent's functional intent depends on
The Tool Manifest | §8 Authorization Boundary | The manifest is the expression of what tools the agent may reach
Per-Task Context | §11 Agent Execution Instructions | Per-step context provision is §11's territory
Retrieval-Augmented Generation | §5 Functional Intent + §11 | Grounds output in a retrieved source the spec names as authoritative
Long-Term Memory | §6 Invariants + §11 | What persists across sessions is an invariant; how it's accessed is in §11
Context Window Budget | §7 Non-Functional Constraints (Cost Posture) | The latency/cost budget that the context budget operationalizes
Grounding with Verified Sources | §6 Invariants | "Outputs grounded in verified sources" is an invariant clause

Integration patterns (Tools and MCP)

Pattern | Justified by | Why this spec section pulls it
The Read-Only Tool | §8 Authorization Boundary | The boundary that distinguishes read from write
The State-Changing Tool | §8 Authorization Boundary | The boundary on what state the agent may mutate
The Idempotent Tool | §6 Invariants + §8 | Idempotency is an invariant the tool enforces
The MCP Server | §8 Authorization Boundary | The protocol-layer instantiation of §8
Direct Function Calling | §8 Authorization Boundary | Tool-calling protocol; alternative to MCP
Code Execution Sandbox | §8 Authorization Boundary + §6 Invariants | Sandbox boundary is §8; "no escape" is an invariant
File System Access | §8 Authorization Boundary | File-system scope is part of §8

Coordination patterns (Sequencing, Routing, Oversight)

Pattern | Justified by | Why this spec section pulls it
Sequential Pipeline | §4 Composition Declaration + §11 | Linear composition shape; declared in §4, executed per §11
Parallel Fan-Out | §4 Composition Declaration + §11 | Parallel composition shape
Conditional Routing | §11 Agent Execution Instructions | Per-step routing decisions
Event-Driven Agent Activation | §11 Agent Execution Instructions | Trigger-to-step mapping
Supervisor Agent | §4 Composition Declaration | Orchestrator-over-Executors composition
Agent-to-Agent Contract | §4 Composition Declaration + §6 Invariants | Cross-mode invariants between composed agents
Human-in-the-Loop Gate | §11 + §6 Invariants | When the gate fires is in §11; the invariant that it must fire is §6
Retry with Structured Feedback | §11 Agent Execution Instructions | The retry rhythm is per-step instruction
Escalation Chain | §11 + §6 Invariants | Escalation triggers in §11; the invariant that escalation must occur in §6

Safety patterns

Pattern | Justified by | Why this spec section pulls it
Prompt Injection Defense | §6 Invariants | Invariants must hold under adversarial input
Output Validation Gate | §9 Acceptance Criteria + §12 Validation Checklist | Defines what passes the gate
Sensitive Data Boundary | §6 Invariants + §8 Authorization Boundary | PII/secret invariants; auth-boundary restrictions
Graceful Degradation | §6 Invariants + §11 | Partial-failure invariants; degradation rhythm
Rate Limiting and Throttle | §7 Non-Functional Constraints | Cost/availability budget
Blast Radius Containment | §6 Invariants + §8 Authorization Boundary | Containment as invariant; scope as boundary

Observability patterns

Pattern | Justified by | Why this spec section pulls it
Structured Execution Log | §12 Validation Checklist | Audit trail the validation step reads
Cost Tracking per Spec | §7 Non-Functional (Cost Posture) + §12 | Cost ceiling enforcement and reporting
Distributed Trace | §12 Validation Checklist | Multi-agent flows need cross-agent traces to validate
Health Check and Heartbeat | §7 Non-Functional + §12 | Availability budget; validation that the agent is up
Anomaly Detection Baseline | §12 Validation Checklist | Drift detection in production

Testing patterns

Pattern | Justified by | Why this spec section pulls it
Spec Conformance Testing | §9 Acceptance Criteria + §12 | Makes acceptance criteria executable
Adversarial Input Test | §6 Invariants | Tests invariants under adversarial conditions
Multi-Agent Integration Test | §4 Composition Declaration + §9 | Tests cross-mode invariants between composed agents
Evaluation by Judge Agent | §9 Acceptance Criteria + §12 | Judge agent operationalizes subjective acceptance criteria

State & Memory patterns

Pattern | Justified by | Why this spec section pulls it
Session Isolation | §6 Invariants + §8 Authorization Boundary | Cross-session isolation is both an invariant and a boundary
Shared Context Store | §11 + §6 Invariants | Cross-agent state-sharing rhythm; consistency invariants
Checkpoint and Resume | §11 + §6 Invariants | Long-running rhythm; transactional invariants on restart
Conversation History Management | §11 Agent Execution Instructions | What history the agent reads each turn
Agent Registry | §4 Composition Declaration + §8 | Registry expresses composition graph and authorization scope
Artifact Store | §11 + §6 Invariants | Where outputs land; integrity invariants

Deployment patterns

Pattern | Justified by | Why this spec section pulls it
Canary Deployment | §7 Non-Functional (Availability) + §6 Reversibility invariants | Phased rollout preserves reversibility
Rollback on Failure | §6 Reversibility invariants | Rollback is the reversibility mechanism
Spec Versioning | §10 Assumptions & Open Questions | Spec evolution requires versioning
Model Upgrade Validation | §9 Acceptance Criteria + §12 | Re-validation when the model underneath shifts
Agent Deprecation Path | §6 Reversibility + §10 | Sunsetting must preserve reversibility; documented in §10

Audit results

All 50 patterns in the book map to at least one section of the canonical 12-section spec template plus the Composition Declaration sub-block (§4). No pattern is unjustified inventory. The pattern density per spec section is uneven — §11 (Agent Execution Instructions), §8 (Authorization Boundary), and §6 (Invariants) pull the most patterns; §1 (Problem Statement) and §2 (Desired Outcome) pull none, which is correct because those sections are framing rather than enforcement.

When you add a new pattern to the book, add a row to this map first. If you cannot name the spec section that pulls the pattern, the pattern does not belong in the book — or the spec template needs a new section to justify it. Either is a real design decision; neither is "ship the pattern anyway."
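The density observation in the audit results can be reproduced by tallying how many patterns cite each spec section. The sketch below is illustrative only: it uses five rows copied from the map rather than all 50, and the `ROWS` layout and `density` helper are assumptions made for the example, so the counts it prints describe this sample, not the full book.

```python
import re
from collections import Counter

# A five-row sample of the justification map (pattern -> "Justified by" cell).
ROWS = {
    "The Tool Manifest": "§8 Authorization Boundary",
    "Long-Term Memory": "§6 Invariants + §11",
    "Human-in-the-Loop Gate": "§11 + §6 Invariants",
    "Sensitive Data Boundary": "§6 Invariants + §8 Authorization Boundary",
    "Rate Limiting and Throttle": "§7 Non-Functional Constraints",
}


def density(rows: dict[str, str]) -> Counter:
    """Count, per spec section, how many patterns cite it at least once."""
    counts: Counter = Counter()
    for cell in rows.values():
        # set() so a pattern that names a section twice is counted once
        counts.update(set(re.findall(r"§\d+", cell)))
    return counts


counts = density(ROWS)
for section, n in counts.most_common():
    print(section, n)
```

Sections that never appear in the tally (here, for example, §1 and §2) simply have a count of zero, which matches the framing-versus-enforcement distinction drawn in the audit results.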


Cross-Reference: By Problem

Find patterns by the problem you're trying to solve.

"I don't know which archetype to use"

"I don't know how to write a good spec"

"I don't know what constraints to include"

"I'm trying to calibrate how much autonomy to give"

"Something went wrong and I need to diagnose it"

"I need to design oversight for this agent"

"I need to set up safety controls"

"I need to set up governance"

"I need to measure and report on the practice"

"I need to design a multi-agent system"

"I'm building a coding agent (Cursor / Cline / Devin / Claude Code style)"

"I need to red-team my system"

"My agent program's cost or latency isn't penciling out"

"I need real production observability for my agents"

"I'm trying to introduce this framework to my team"

"I'm evaluating the framework, not yet adopting it"

"I'm a senior engineer wondering what this all means for me"

"My system is too small for the full framework"

"My team has been using the framework for a while and something feels off"

"My team already uses Microsoft DevSquad Copilot"

"I'm building a computer-use / browser-use agent (Claude Computer Use / Operator / Gemini)"

"I need to design safe agent tools"

"I need to ship safely without making the change irreversible"

"I need to build or expand a team repertoire"


Cross-Reference: By Archetype

Find all chapters relevant to a specific archetype.

Archetype | Definition | Used in example | Governance | Constraints
Advisor | advisor.md | Example 1 (Policy Advisor) | Proportional Governance | Spec template library
Executor | executor.md | Example 1 (Account Executor), Example 3 (Coding Agent) | Proportional Governance | Validation templates
Guardian | guardian.md | Example 1 (Compliance Guardian), Example 2 (Standards Guardian) | Proportional Governance | Least Capability
Synthesizer | synthesizer.md | Example 2 (Scaffold Synthesizer) | Proportional Governance | Spec template library
Orchestrator | orchestrator.md | Example 1 (Inquiry Orchestrator) | Proportional Governance | Proportional Oversight

Cross-Reference: By Agent Class

Find all chapters relevant to a specific deployment class. The book treats archetypes (Advisor / Executor / Guardian / Synthesizer / Orchestrator) and agent classes (coding agents, computer-use agents, multi-agent systems) as orthogonal — every agent class is a composition of one or more archetypes.

Agent class | Primary chapter | Worked example | Specific failure modes | Specific red-team patterns
Conversational support agent | The Five Archetypes (Advisor or Executor depending on action authority) | Designing an AI Customer Support System | Cat 1–6 (general taxonomy) | OWASP LLM01, LLM07, LLM02 (system-prompt extraction, sensitive-data disclosure)
Code generation pipeline | Multi-Agent Governance (Synthesizer + Executor + Guardian composition) | A Code Generation Pipeline | Cat 5 (compounding) particularly relevant | OWASP LLM05 (improper output handling)
Coding agent (in-loop) | Coding Agents (Executor with Synthesizer composition; can escalate to Orchestrator-over-self) | Designing an AI Coding Agent | Test deletion (Cat 1+3), dependency typosquat (Cat 2), hallucinated APIs (Cat 6), scope-creep refactors (Cat 3) | Supply-chain (LLM03), excessive agency (LLM06), coding-agent-specific patterns in Red-Team Protocol
Computer-use / browser-use agent | Computer-Use Agents (deployment-posture-dependent: Advisor / Executor / Orchestrator-over-self) | (no worked example yet — under-served chapter) | Cat 1–6 plus Cat 7 (Perceptual Failure) with 4 sub-categories | Computer-use-specific test patterns in Red-Team Protocol: lookalike domains, visual instruction injection, modal popup interception, etc.
Multi-agent system | Multi-Agent Governance (any composition; supervisor / pipeline / peer patterns) | Both Example 1 and Example 2 | MAST 14-category empirical taxonomy applies; the book's Cat 5 (compounding) is the dominant shape | Cross-agent injection, handoff manipulation, A2A protocol-layer attacks

Cross-Reference: By 2024–2026 Innovation

Find where each significant 2024–2026 development is addressed, and how the framework responds to it. This is the practitioner's "what's new and where do I read about it" index. The full citations live in the References appendix.

Innovation | Year | Where addressed in the book | What the book contributes around it
Anthropic MCP + cross-vendor adoption (OpenAI, Google, Microsoft) | 2024–25 | The Model Context Protocol, Designing MCP Tools, MCP Safety, Least Capability | The protocol layer through which Least Capability becomes operationally enforceable; capability-gating discipline at the tool layer
GitHub spec-kit | 2024–25 | Spec-Driven Development, SpecKit | Direct ancestor of the canonical spec template; the book extends spec-kit's discipline with the archetype framework and the failure taxonomy
Microsoft DevSquad Copilot | 2026 | DevSquad Mapping, Co-adoption with DevSquad, Architectural Decision Records | A complete bridge: phase-by-phase mapping, vocabulary translation, ranked addition list, 30-day co-adoption plan, ADRs as a first-class artifact
Anthropic Computer Use | Oct 2024 | Computer-Use Agents, Red-Team Protocol | New agent class chapter with archetype mapping by deployment posture; new Cat 7 (Perceptual Failure) added to the diagnostic protocol; four structural controls (sandboxed environment, auth scope minimization, domain allowlist, high-consequence confirmation gate); computer-use-specific red-team patterns
OpenAI Operator / Gemini computer use | 2025 | Computer-Use Agents | Same chapter — three implementations of the new class, all subject to the same structural controls and Cat 7 framework
Reasoning-tier models (o1, o3, Claude extended thinking, Gemini reasoning) | 2024–25 | Cost and Latency Engineering | Distinct model tier in the per-role selection table; explicit cost/latency profile (2–10× cost, 5–60s latency); when-to-use vs when-not-to budgeting discipline
Anthropic Constitutional Classifiers | 2025 | Prompt Injection Defense | Treated honestly as a probabilistic perimeter, not a fix; documented escape rate and over-refusal cost made explicit
Anthropic prompt caching / OpenAI cached input / Gemini context caching | 2024–25 | Cacheable Prompt Architecture, Cost and Latency Engineering | Caching as architecture (layered prompt with cache breakpoints; prompt-stability as a spec constraint; cache-hit-rate as first-class telemetry); 40–70% input-cost reduction is normal when treated architecturally
Google Agent2Agent (A2A) Protocol | 2025 | Multi-Agent Governance | Protocol-layer counterpart to MCP at the tool layer; the governance question for protocol-mediated multi-agent systems
OpenTelemetry GenAI semantic conventions | 2024–25 | Production Telemetry | Vendor-neutral observability standard; the book recommends emitting OTel-compliant spans alongside vendor SDK telemetry for portability
OWASP LLM Top 10 (2025 update) | 2025 | Prompt Injection Defense, Red-Team Protocol, Computer-Use Agents | Baseline coverage for the four red-team batteries; instantiation per deployment specifics
MAST taxonomy (Cemri et al.) | 2025 | Failure Modes and How to Diagnose Them, Multi-Agent Governance | Empirical 14-category multi-agent failure partition; complementary to (not replacing) the book's seven-category fix-locus taxonomy
Indirect prompt injection (Greshake et al. 2023) + the lethal trifecta (Willison) | 2023, ongoing | Prompt Injection Defense | The structural defense (trifecta reduction; capability gating) is centered on the indirect injection class that cannot be filtered at the prompt layer
SWE-bench Verified, AgentBench, τ-bench, GAIA, BFCL, WebArena, OSWorld, ScreenSpot-Pro | 2023–25 | Evals and Benchmarks, Coding Agents, Computer-Use Agents | External calibration benchmarks; the book recommends using public benchmarks for harness calibration and team-built golden sets for actual task fit
Open-source eval / red-team frameworks (Inspect, OpenAI Evals, Promptfoo, PyRIT, Garak) | 2024–25 | Evals and Benchmarks, Red-Team Protocol | The toolchain layer the book recommends adopting rather than building custom
Production observability stacks (LangSmith, Langfuse, Phoenix, Helicone, Datadog LLM) | 2024–25 | Production Telemetry | Vendor-stack landscape with a clear "which to choose if you have X" decision rule
Coding agent platforms (Cursor, Cline, Aider, Devin, Claude Code, Codex CLI) | 2023–25 | Coding Agents, Designing an AI Coding Agent | Treated as deployment-posture-dependent compositions; explicit decision-against-Devin-style-autonomy criteria documented in Example 3
Anthropic Skills as deployable artifact | 2025 | Portable Domain Knowledge | The maturation of "domain knowledge as packaged context" — skills as versioned, distributed deployment units
Lost in the Middle long-context attention degradation (Liu et al. 2023) | 2023, ongoing | Coding Agents, Cost and Latency Engineering | Empirical grounding for the long-context anti-pattern; informs context-budget discipline and the warning against long-context dumping
NIST AI RMF / ISO 42001 / Anthropic RSP / OpenAI Preparedness Framework | 2023–25 | Calibrate Agency, Autonomy, Responsibility, Reversibility | Compliance-layer reference points; the book's four-dimensions framing is compatible with each