Proportional Governance

Part 6 — Operations


"Every governance failure has the same root cause: the control was in the wrong place at the wrong time. Bureaucracy controls everything moderately. Good governance controls the right things exactly."


Where this sits in v2.0.0: this chapter is part of Part 6 — Operations. Governance is the role-and-responsibility frame that makes the closed loop survive contact with a real organization; without it, the loop runs as a per-incident discipline only and stops at the team boundary. The chapter develops what proportional means structurally — control concentrated where consequence is concentrated — and how that shape composes with the per-system Discipline-Health Audit and the cross-team Adoption Playbook.

Where this sits in the work: the chapters in Part 6 sustain what the Intent Design Session sets up — they run alongside the five activities, not in sequence with them. Governance is the standing structure that decides who reviews what, on what cadence; the IDS produces the per-system commitment, governance keeps the organization's portfolio of those commitments coherent.


Context

Governance of AI agent systems tends to fail in one of two directions. Organizations that have experienced an AI-related incident make governance heavy: extensive approval chains, mandatory committees, documentation requirements that add weeks to every deployment. Organizations that have not yet experienced an incident make governance minimal: verbal approval, informal guidelines, the implicit assumption that the engineers are being careful.

Both patterns fail. Heavy governance is slow enough that teams route around it — using lightweight tools or informal channels to avoid the process. Minimal governance leaves the organization exposed until the first significant failure, at which point the response is usually overcorrection.

This chapter describes a third model: governance that is proportional, embedded in the workflow, and lightweight enough to not generate avoidance behavior.


The Problem

Traditional governance has a structural problem in the agent context: it is designed for human actors making human decisions, where the governance moment is a conversation, an approval meeting, or a signature.

Agent systems operate differently. They execute repeatedly, automatically, at scale. A governance model that requires a human conversation before every execution cannot work at any meaningful deployment scale. But a governance model that requires nothing between deployments cannot catch the decision drift, scope creep, and spec decay that builds up over time.

The second problem is that governance of the wrong things wastes trust. If every PR requires a three-person approval before any agent task can proceed, teams learn that approvals are procedural rather than substantive — they look for what to write to pass the review, rather than what might actually be wrong. The approval process is observed, but it provides no meaningful protection.

Effective governance is selective and structural: it controls the decisions that have large downstream effects, embeds control in the workflow rather than adding it as a separate process, and trusts practitioners to execute within governed bounds without per-execution approval.


Forces

  • Governance overhead vs. governance necessity. Every governance activity consumes human attention. Yet ungoverned agent systems drift, compound errors, and lose architectural coherence.
  • Heavyweight process vs. lightweight chaos. Full bureaucracy creates compliance theater. No process creates invisible risk. The framework must be proportional.
  • Automated efficiency vs. human judgment. Some governance activities can be automated (structural validation, monitoring). Others require human judgment (spec quality, archetype selection).
  • Consistency vs. context-sensitivity. Standard governance cadences apply across teams. Yet different teams have different risk profiles and deployment scales.

The Solution

The Four Governance Layers

Governance in the architecture of intent operates at four layers, each with a different control mechanism and a different pace:

Layer 1: Constitutional (high leverage, slow pace)

The constitutional layer governs the permanent constraints of the system: what archetypes exist and what they mean, what the organizational constraint libraries say, what the approval authority matrix requires.

Constitutional layer decisions persist until explicitly changed. They are version-controlled. They require the highest authority level to change (per Chapter 7.3). They apply to all work in their domain regardless of what any individual spec says.

This is where governance investment is highest-value and process friction is most justified. A constitutional layer decision made well governs thousands of executions. A constitutional layer decision made carelessly does the same.

Layer 2: Spec approval (medium leverage, per-task pace)

Every spec is approved before execution. This is where the decision "should this agent do this?" is formally answered. Spec approval is not a bureaucratic rubber-stamp — it is the moment when the intent is examined by a qualified reviewer against the archetype profile and constraint library.

What makes spec approval lightweight rather than bureaucratic:

  • The review template is specific: reviewers answer five questions, not an open-ended review
  • Decision has a time limit: approval or rejection with specific reasons within [N hours]
  • Templated specs reduce review burden: a well-structured spec is fast to review
  • Approval authority is proportional to archetype risk: low-risk specs need one reviewer, high-risk need three

The five spec approval questions:

  1. Does the objective clearly describe a single, completable outcome?
  2. Are the NOT-authorized clauses sufficient to prevent obvious gaps-in-scope?
  3. Are the success criteria testable without the author's involvement?
  4. Does the archetype selection match the task's risk and reversibility profile?
  5. Is the tool manifest minimal — only what the task actually requires?

A reviewer who can answer all five questions confidently in under 15 minutes is working with a well-written spec. A reviewer who cannot is holding a spec that isn't ready.

Layer 3: Execution monitoring (low leverage per execution, aggregate signal)

During execution, governance manifests as monitoring rather than control. The agent runs; the audit log records what it did; anomaly detection surfaces patterns that warrant review.

This layer should be mostly automated. Manual review of every execution is not governance; it is microsupervision. The human attention is reserved for anomalies: unusual call volumes, unexpected tool usage, error rates outside declared bounds, escalations not responded to within SLA.

A team operating model for execution monitoring:

  • Automated: call volume, error rates, latency, cost against declared limits
  • Weekly human review: anomaly report, recent escalations, Spec Gap Log additions
  • Monthly human review: archetype catalog accuracy, constraint library currency, skill review cycle

Layer 4: Post-execution validation (high leverage for learning, per-task pace)

Every completed agent task generates a validation record: did the output satisfy the spec's success criteria? What was the outcome? Were any gaps identified?

Post-execution validation is where the feedback loop closes. A team that validates every output against its spec, logs gaps, and routes discoveries back to the repertoire is the team that continuously improves. A team that validates nothing is the team that repeats the same gaps indefinitely.

The validation record doesn't require a separate system — a structured comment on the ticket, a row in a shared log, a field in the spec file. The discipline is completing it, not the tooling that holds it.

The Governance Calendar

Governance operates on multiple timescales simultaneously:

CadenceActivityOwnerTime Investment
Per-taskSpec approvalDesignated reviewer15 min per spec
Per-taskOutput validationSpec author10–30 min per output
WeeklyAnomaly report reviewTeam lead30 min
WeeklySpec Gap Log reviewIntent architect20 min
MonthlyConstraint library auditDomain owner1–2 hours
MonthlyArchetype catalog reviewStaff engineer/above1–2 hours
MonthlySkill file reviewSkill owner30 min per skill
QuarterlyConstitutional layer reviewPrincipal/VP levelHalf day

The total monthly governance overhead for a five-person team operating at moderate agent deployment volume: approximately 8–12 hours across the team. This estimate is derived from the time budgets in the table above, aggregated over a typical month of spec reviews, anomaly reports, and library audits. Individual teams will vary based on deployment complexity, agent maturity, and organizational overhead. But as a baseline, this is comparable to the overhead of a traditional code review and sprint retrospective cycle, and it produces a much more durable signal.

What to Automate and What Not To

Automate: Structural validation (does the spec include all required sections? are there references to approved constraint libraries? is the tool manifest non-empty?), execution monitoring and alerting, escalation SLA tracking, cost/volume anomaly detection.

Do not automate: Spec quality assessment, archetype selection correctness, success criteria completeness, output validation against spec. These require human judgment. Tools that claim to automate spec quality review are currently unreliable and create false confidence.

The dangerous middle ground: Automated approval based on structural completeness. A spec that passes all structural checks can still be substantively wrong. Automation should gate structural quality; it cannot replace substantive review.

Governance Anti-Patterns

The approval theater. An approval process exists but reviewers don't read the spec — they look for obvious red flags and sign. This is common when reviewers are overloaded, approval criteria are unclear, or the team culture treats approval as a formality. The remedy is shorter, more specific review checklists and explicit time allocation for spec review.

The governance gap. Governance exists for initial deployment but nothing governs changes to running agents. The system that was carefully reviewed six months ago has accumulated spec debt, tool scope creep, and skill files that drift from the constraint library. Scheduled cadence reviews exist specifically to catch this.

Governance by department. A central AI governance team reviews and approves all agent work across the organization. They have no domain knowledge. They apply uniform treatments to non-uniform risks. They become a bottleneck that teams route around. The remedy is delegated governance with clear authority matrices — the central team governs the framework; domain teams govern the execution within the framework.

The post-incident overreaction. After a significant failure, governance becomes comprehensive and heavy. Every execution requires multiple approvals. The cost of operating agents rises until teams stop using them, or use them informally, or transfer liability to a vendor ("the vendor's product, not our agent"). The antidote is proportional response calibrated to the failure's category, not to its emotional intensity.


Resulting Context

After applying this pattern:

  • Governance has a named, bounded overhead. Teams know the approximate time cost of governance and can plan for it.
  • Four governance layers create proportional control. Constitutional, spec approval, execution monitoring, and post-execution validation provide defense in depth without redundancy.
  • What to automate is explicit. Structural validation and monitoring are automated. Substantive review remains human. The boundary is named.
  • Anti-patterns are recognized and preventable. Approval theater, governance gaps, and governance-by-department are named failures with described remedies.

Therefore

Effective governance is proportional and structural: controlled at the constitutional layer (archetype and constraint definitions), confirmed at the spec layer (approval before execution), observed at the execution layer (monitoring without microsupervision), and learned at the validation layer (gap log and repertoire update). The total overhead is comparable to traditional engineering processes; its value is durability — risks caught before execution rather than repaired after, and organizational learning captured in artifacts rather than lost with personnel.


Connections

This pattern assumes:

This pattern enables: