Appendix B: Platform Landscape and Governance Standards

The agentic systems landscape is evolving rapidly. This appendix maps the current ecosystem — platforms, standards, and governance frameworks — to help practitioners orient their architectural decisions within the broader industry context.

This appendix will age faster than any other part of this book. Use it as a snapshot of the landscape at the time of writing and as a framework for evaluating new entries as they appear.

Agent Platforms

The market has stratified into distinct categories:

Foundation Model Providers with Agent Capabilities

These companies provide the underlying models and are adding agent infrastructure directly to their APIs.

| Provider | Agent Offering | Strengths | Limitations |
|---|---|---|---|
| OpenAI | Assistants API, GPT Actions, Function Calling | Mature API, code interpreter, file handling, managed threads | Vendor lock-in; limited orchestration flexibility |
| Anthropic | Claude tool use, computer use, extended thinking | Strong reasoning, large context (200K+), careful safety design | No managed agent infrastructure; BYO orchestration |
| Google | Gemini + ADK (Agent Development Kit), Vertex AI Agents | Multi-modal, long context (2M), tight GCP integration | Ecosystem still maturing |
| Amazon | Bedrock Agents, action groups, knowledge bases | Multi-model support, AWS integration, managed infrastructure | AWS-centric; less flexibility for multi-cloud |
| Microsoft | Azure AI Agent Service, Copilot Studio | Enterprise integration (M365, Dynamics), Semantic Kernel | Complex licensing; enterprise-focused |

Orchestration Frameworks

These are the open-source and commercial frameworks for building agent systems.

| Framework | Architecture | Community | Production Readiness |
|---|---|---|---|
| LangGraph | Graph-based state machines | Large (LangChain ecosystem) | High — used in production by many companies |
| Semantic Kernel | Plugin-based with planners | Growing (Microsoft backing) | High — production-grade with enterprise support |
| AutoGen | Conversation-based multi-agent | Active research community | Medium — strong for research, evolving for production |
| CrewAI | Role-based agent teams | Growing rapidly | Medium — maturing quickly |
| LlamaIndex | Data-focused agent workflows | Large | High for RAG-centric applications |
| Haystack | Pipeline-based NLP/agent workflows | Established | High — production-tested |

Agent Infrastructure

These platforms provide the runtime infrastructure for agent systems.

| Platform | Focus | Key Capability |
|---|---|---|
| LangSmith | Observability, testing, evaluation | End-to-end tracing, prompt playground, dataset management |
| Langfuse | Open-source LLM observability | Self-hostable, cost tracking, prompt management |
| Arize Phoenix | LLM observability and evaluation | Traces, evaluations, embedding analysis |
| E2B | Code sandboxing | Secure code execution environments for agents |
| Modal | Serverless compute | GPU-enabled serverless for agent workloads |
| Weights & Biases Weave | Experiment tracking | LLM application monitoring and evaluation |

Emerging Standards

Model Context Protocol (MCP)

Origin: Anthropic (open-sourced November 2024)
Status: Rapidly adopted across the industry
Purpose: Standardized protocol for connecting AI models to external tools and data sources

MCP is the most significant standardization effort in the agent tooling space. It maps directly to the Operator Fabric’s tool registry and tool invocation patterns:

  • Resources: Expose data to the agent (files, database records, API responses).
  • Tools: Expose actions the agent can take (create file, run query, send message).
  • Prompts: Expose reusable prompt templates.
  • Sampling: Allow servers to request LLM completions from the host.

Architectural significance: MCP decouples tool implementation from agent implementation. A tool built as an MCP server works with any MCP-compatible agent, regardless of the orchestration framework. This is the Operator Adapter pattern implemented as an industry standard.
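The decoupling can be sketched in a few lines. This is an illustration of the pattern, not the official MCP SDK: the `ToolServer` class and its method names are assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Illustrative sketch (not the MCP SDK): tools register against a
# protocol-level interface rather than a specific agent framework.
@dataclass
class ToolServer:
    name: str
    tools: dict[str, Callable] = field(default_factory=dict)

    def register_tool(self, tool_name: str, fn: Callable) -> None:
        self.tools[tool_name] = fn

    def invoke(self, tool_name: str, **kwargs: Any) -> Any:
        # Any protocol-compatible host can call this entry point,
        # regardless of which orchestration framework drives the agent.
        return self.tools[tool_name](**kwargs)

server = ToolServer("filesystem")
server.register_tool("read_file", lambda path: f"<contents of {path}>")
print(server.invoke("read_file", path="notes.txt"))
# → <contents of notes.txt>
```

The point is the indirection: the host sees only the registry, so swapping the orchestration framework leaves the tool untouched.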

OpenAI Function Calling Schema

Status: De facto standard adopted by most providers
Purpose: Standard format for declaring tool schemas that models can invoke

Most model providers have converged on a JSON Schema-based format for function calling. This near-standard enables portable tool definitions:

```json
{
  "name": "search_codebase",
  "description": "Search the codebase for files matching a pattern",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Search pattern" },
      "max_results": { "type": "integer", "default": 10 }
    },
    "required": ["query"]
  }
}
```
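Because the format is plain JSON Schema, tool calls can be checked before dispatch. The validator below is a minimal sketch — it handles only required keys, flat types, and defaults, not the full JSON Schema specification:

```python
# Minimal argument check against a schema like the one above — an
# illustrative sketch, not a full JSON Schema validator.
SCHEMA = {
    "name": "search_codebase",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search pattern"},
            "max_results": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}

TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_call(schema: dict, args: dict) -> dict:
    params = schema["parameters"]
    for name in params.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    validated = {}
    for name, spec in params["properties"].items():
        if name in args:
            if not isinstance(args[name], TYPE_MAP[spec["type"]]):
                raise TypeError(f"{name} must be {spec['type']}")
            validated[name] = args[name]
        elif "default" in spec:
            validated[name] = spec["default"]  # apply the schema default
    return validated

print(validate_call(SCHEMA, {"query": "def main"}))
# → {'query': 'def main', 'max_results': 10}
```

Validating at the boundary catches malformed model output before it reaches a tool, which is cheaper than handling a tool-side failure.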

Agent-to-Agent Protocols

Status: Early stage
Purpose: Standardized communication between independent agent systems

Several proposals are emerging for agent-to-agent communication:

  • Google A2A (Agent-to-Agent): Protocol for agent interoperability, discovery, and task delegation between independent agent systems.
  • AGNTCY / ACP (Agent Communication Protocol): Open standards initiative for inter-agent messaging.

These map to the Multi-OS Coordination patterns (Chapter 34) — federation bus, capability discovery, and cross-OS messaging. The standards are nascent, but the architectural patterns are stable.
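Whatever wire format these protocols settle on, the shape of a capability-discovery exchange is already clear from the patterns. The envelope below is purely hypothetical — every field name is an assumption for illustration, not a quote from A2A or ACP:

```python
import json

# Hypothetical capability-discovery exchange between two agent systems.
# All field names are illustrative; no real protocol is quoted here.
discovery_request = {
    "protocol": "agent-discovery/0.1",   # assumed version string
    "from": "os-alpha.example",
    "intent": "discover_capabilities",
}

discovery_response = {
    "protocol": "agent-discovery/0.1",
    "from": "os-beta.example",
    "capabilities": [
        {"name": "code_review", "cost_hint": "medium", "auth": "delegated"},
        {"name": "data_extraction", "cost_hint": "low", "auth": "open"},
    ],
}

# A federation bus would route these envelopes; here we just round-trip one.
wire = json.dumps(discovery_response)
decoded = json.loads(wire)
print([c["name"] for c in decoded["capabilities"]])
# → ['code_review', 'data_extraction']
```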

Governance and Safety Standards

Regulatory Landscape

| Regulation / Framework | Jurisdiction | Agent-Relevant Requirements |
|---|---|---|
| EU AI Act | European Union | Risk classification for AI systems; high-risk systems require conformity assessment, human oversight, transparency, and record-keeping |
| NIST AI RMF | United States | Risk management framework: govern, map, measure, manage. Voluntary but influential |
| ISO/IEC 42001 | International | AI management system standard. Certification for responsible AI practices |
| Executive Order 14110 | United States | Requirements for AI safety testing, red-teaming, and reporting for frontier models |
| Singapore AI Governance Framework | Singapore | Principles-based governance with practical implementation guidance |

How the Agentic OS Maps to Regulatory Requirements

| Regulatory Requirement | Agentic OS Component |
|---|---|
| Human oversight | Permission Gates, Human Escalation, Staged Autonomy |
| Transparency | Execution Journal, Auditable Action, Active Plan Board |
| Risk management | Risk-Tiered Execution, Policy-Aware Scheduler, Governance Plane |
| Record-keeping | Audit logging in the Governance Plane, Execution Journal |
| Robustness | Failure Containment, Recovery Process, Checkpoints and Rollback |
| Data governance | Memory Plane scoping, Capability-Based Access, data classification at boundaries |

The Agentic OS architecture is not designed for compliance with any specific regulation. It is designed around principles — governance, transparency, accountability, isolation — that happen to align well with what regulators require. This is not coincidence; well-engineered systems and well-designed regulations both derive from the same insight: autonomous systems need structure.

Industry Safety Frameworks

| Framework | Focus | Relevance |
|---|---|---|
| OWASP Top 10 for LLM Applications | Security vulnerabilities specific to LLM-powered applications | Directly applicable: prompt injection, data leakage, excessive agency, insecure plugins |
| MLCommons AI Safety Benchmarks | Standardized safety evaluations for AI models | Useful for evaluating model providers in the Model Provider Layer |
| Anthropic Responsible Scaling Policy | Framework for scaling AI capabilities safely | Informs Staged Autonomy and Risk-Tiered Execution patterns |
| MITRE ATLAS | Adversarial threat landscape for AI systems | Threat modeling for the Governance Plane |

Practical Governance Checklist

For teams deploying agentic systems, a minimum governance implementation:

  • Audit trail: Every agent action is logged with timestamp, inputs, outputs, and authorization context.
  • Human-in-the-loop: Irreversible actions require human approval. The system can halt on demand.
  • Cost controls: Per-task and per-session budgets with automatic cutoff.
  • Capability scoping: Each worker has explicit tool permissions. No worker has unconstrained access.
  • Output validation: Generated outputs (code, communications, data modifications) are validated before delivery.
  • Incident response: A process exists for investigating and responding to agent misbehavior.
  • Data boundaries: Sensitive data is classified and scoped. Data does not leak across security boundaries.
  • Model evaluation: Regular evaluation of model outputs against quality and safety benchmarks.
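Two of these items — the audit trail and cost controls — can be combined in a single execution wrapper. The sketch below is illustrative; the class name, budget figures, and log fields are assumptions, not a prescribed implementation:

```python
import datetime
import json

# Illustrative sketch of two checklist items (audit trail, cost controls)
# as a wrapper around tool execution. Names and budgets are assumptions.
class GovernedExecutor:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.audit_log: list[str] = []

    def execute(self, worker: str, action: str, cost_usd: float, fn, *args):
        # Cost control: automatic cutoff before the budget is exceeded.
        if self.spent_usd + cost_usd > self.budget_usd:
            raise RuntimeError("budget exceeded: automatic cutoff")
        result = fn(*args)
        self.spent_usd += cost_usd
        # Audit trail: timestamp, actor, action, inputs, outputs.
        self.audit_log.append(json.dumps({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "worker": worker,
            "action": action,
            "inputs": list(args),
            "output": result,
        }))
        return result

ex = GovernedExecutor(budget_usd=1.00)
ex.execute("worker-1", "summarize", 0.25, lambda t: t.upper(), "hello")
print(ex.spent_usd, len(ex.audit_log))
# → 0.25 1
```

Putting both checks in the same wrapper guarantees that no action can spend budget without also leaving an audit record.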

Evaluation and Testing Ecosystem

Evaluation Frameworks

| Framework | Purpose |
|---|---|
| LMSYS Chatbot Arena | Crowd-sourced model comparison via blind pairwise evaluation |
| HELM (Stanford) | Holistic evaluation of language models across scenarios and metrics |
| SWE-bench | Evaluating agent capability on real-world software engineering tasks |
| AgentBench | Cross-environment benchmark for agent capabilities |
| Inspect AI (UK AISI) | Framework for evaluating AI system capabilities and safety properties |

Testing Agentic Systems

Traditional software testing assumes deterministic behavior. Agentic systems require additional testing strategies:

  • Behavioral benchmarks: Curated test suites that evaluate the system’s behavior across representative scenarios, scored on multiple dimensions (correctness, safety, efficiency).
  • Regression detection: Compare system outputs before and after changes. Flag significant behavioral differences using LLM-as-judge evaluations.
  • Red teaming: Adversarial testing where evaluators attempt to cause the system to violate its governance policies, leak data, or produce harmful outputs.
  • Simulation testing: Run the system against simulated environments and users to test behavior at scale without real-world consequences.
  • Cost benchmarking: Track tokens consumed, latency, and monetary cost per task type. Detect efficiency regressions.
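The last strategy is straightforward to mechanize. The sketch below records a per-task-type token baseline and flags regressions; the class name and the 20% threshold are illustrative assumptions:

```python
import statistics

# Sketch of cost benchmarking: record per-task-type token baselines and
# flag efficiency regressions. The threshold value is an assumption.
class CostBenchmark:
    def __init__(self, regression_threshold: float = 1.2):
        self.baseline: dict[str, float] = {}
        self.threshold = regression_threshold

    def record_baseline(self, task_type: str, token_samples: list[int]) -> None:
        self.baseline[task_type] = statistics.mean(token_samples)

    def check(self, task_type: str, token_samples: list[int]) -> bool:
        """Return True if mean cost exceeds baseline by the threshold."""
        new_mean = statistics.mean(token_samples)
        return new_mean > self.baseline[task_type] * self.threshold

bench = CostBenchmark()
bench.record_baseline("code_review", [1000, 1100, 900])  # mean = 1000
print(bench.check("code_review", [1300, 1250]))          # 1275 > 1200
# → True
```

The same structure extends to latency and monetary cost; the essential discipline is keeping a baseline per task type rather than one global number.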

The number of tools, frameworks, and standards is overwhelming and growing. Three principles help navigate it:

  1. Architecture over tools. Choose your architecture first (this book provides one). Then select tools that implement each layer. Do not let a tool’s capabilities define your architecture.

  2. Standards over proprietary. Where standards exist (MCP for tools, OpenTelemetry for observability, JSON Schema for function calling), prefer them. They reduce lock-in and increase composability.

  3. Governance from day one. Do not treat governance as a phase-two concern. Audit logging, cost controls, and capability scoping are cheap to implement early and expensive to retrofit. The regulatory landscape is tightening, not loosening.

The landscape will look different in a year. The principles will not.