Appendix B: Platform Landscape and Governance Standards

The agentic systems landscape is evolving rapidly. This appendix maps the current ecosystem — platforms, standards, and governance frameworks — to help practitioners orient their architectural decisions within the broader industry context.

This appendix will age faster than any other part of this book. Use it as a snapshot of the landscape at the time of writing and as a framework for evaluating new entries as they appear.

Agent Platforms

The market has stratified into distinct categories:

Foundation Model Providers with Agent Capabilities

These companies provide the underlying models and are adding agent infrastructure directly to their APIs.

| Provider | Agent Offering | Strengths | Limitations |
|---|---|---|---|
| OpenAI | Assistants API, GPT Actions, Function Calling | Mature API, code interpreter, file handling, managed threads | Vendor lock-in; limited orchestration flexibility |
| Anthropic | Claude tool use, computer use, extended thinking | Strong reasoning, large context (200K+), careful safety design | No managed agent infrastructure; BYO orchestration |
| Google | Gemini + ADK (Agent Development Kit), Vertex AI Agents | Multi-modal, long context (2M), tight GCP integration | Ecosystem still maturing |
| Amazon | Bedrock Agents, action groups, knowledge bases | Multi-model support, AWS integration, managed infrastructure | AWS-centric; less flexibility for multi-cloud |
| Microsoft | Azure AI Agent Service, Copilot Studio | Enterprise integration (M365, Dynamics), Semantic Kernel | Complex licensing; enterprise-focused |

Orchestration Frameworks

These are the open-source and commercial frameworks for building agent systems.

| Framework | Architecture | Community | Production Readiness |
|---|---|---|---|
| LangGraph | Graph-based state machines | Large (LangChain ecosystem) | High — used in production by many companies |
| Semantic Kernel | Plugin-based with planners | Growing (Microsoft backing) | High — production-grade with enterprise support |
| AutoGen | Conversation-based multi-agent | Active research community | Medium — strong for research, evolving for production |
| CrewAI | Role-based agent teams | Growing rapidly | Medium — maturing quickly |
| LlamaIndex | Data-focused agent workflows | Large | High for RAG-centric applications |
| Haystack | Pipeline-based NLP/agent workflows | Established | High — production-tested |

Agent Infrastructure

These platforms provide the runtime infrastructure for agent systems.

| Platform | Focus | Key Capability |
|---|---|---|
| LangSmith | Observability, testing, evaluation | End-to-end tracing, prompt playground, dataset management |
| Langfuse | Open-source LLM observability | Self-hostable, cost tracking, prompt management |
| Arize Phoenix | LLM observability and evaluation | Traces, evaluations, embedding analysis |
| E2B | Code sandboxing | Secure code execution environments for agents |
| Modal | Serverless compute | GPU-enabled serverless for agent workloads |
| Weights & Biases Weave | Experiment tracking | LLM application monitoring and evaluation |

Emerging Standards

Model Context Protocol (MCP)

Origin: Anthropic (open-sourced November 2024)
Status: Rapidly adopted across the industry
Purpose: Standardized protocol for connecting AI models to external tools and data sources

MCP is the most significant standardization effort in the agent tooling space. It maps directly to the Operator Fabric’s tool registry and tool invocation patterns:

  • Resources: Expose data to the agent (files, database records, API responses).
  • Tools: Expose actions the agent can take (create file, run query, send message).
  • Prompts: Expose reusable prompt templates.
  • Sampling: Allow servers to request LLM completions from the host.

Architectural significance: MCP decouples tool implementation from agent implementation. A tool built as an MCP server works with any MCP-compatible agent, regardless of the orchestration framework. This is the Operator Adapter pattern implemented as an industry standard.
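The decoupling can be sketched in a few lines. This is an illustration of the pattern, not the official MCP SDK: the `ToolServer` class and its method names are assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Illustrative sketch (not the MCP SDK): tools register against a
# protocol-level interface rather than a specific agent framework.
@dataclass
class ToolServer:
    name: str
    tools: dict[str, Callable] = field(default_factory=dict)

    def register_tool(self, tool_name: str, fn: Callable) -> None:
        self.tools[tool_name] = fn

    def invoke(self, tool_name: str, **kwargs: Any) -> Any:
        # Any protocol-compatible host can call this entry point,
        # regardless of which orchestration framework drives the agent.
        return self.tools[tool_name](**kwargs)

server = ToolServer("filesystem")
server.register_tool("read_file", lambda path: f"<contents of {path}>")
print(server.invoke("read_file", path="notes.txt"))
# → <contents of notes.txt>
```

The point is the indirection: the host sees only the registry, so swapping the orchestration framework leaves the tool untouched.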

OpenAI Function Calling Schema

Status: De facto standard adopted by most providers
Purpose: Standard format for declaring tool schemas that models can invoke

Most model providers have converged on a JSON Schema-based format for function calling. This near-standard enables portable tool definitions:

```json
{
  "name": "search_codebase",
  "description": "Search the codebase for files matching a pattern",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Search pattern" },
      "max_results": { "type": "integer", "default": 10 }
    },
    "required": ["query"]
  }
}
```
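Because the format is plain JSON Schema, tool calls can be checked before dispatch. The validator below is a minimal sketch — it handles only required keys, flat types, and defaults, not the full JSON Schema specification:

```python
# Minimal argument check against a schema like the one above — an
# illustrative sketch, not a full JSON Schema validator.
SCHEMA = {
    "name": "search_codebase",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search pattern"},
            "max_results": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}

TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_call(schema: dict, args: dict) -> dict:
    params = schema["parameters"]
    for name in params.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    validated = {}
    for name, spec in params["properties"].items():
        if name in args:
            if not isinstance(args[name], TYPE_MAP[spec["type"]]):
                raise TypeError(f"{name} must be {spec['type']}")
            validated[name] = args[name]
        elif "default" in spec:
            validated[name] = spec["default"]  # apply the schema default
    return validated

print(validate_call(SCHEMA, {"query": "def main"}))
# → {'query': 'def main', 'max_results': 10}
```

Validating at the boundary catches malformed model output before it reaches a tool, which is cheaper than handling a tool-side failure.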

Agent-to-Agent Protocols

Status: Early stage
Purpose: Standardized communication between independent agent systems

Several proposals are emerging for agent-to-agent communication:

  • Google A2A (Agent-to-Agent): Protocol for agent interoperability, discovery, and task delegation between independent agent systems.
  • AGNTCY / ACP (Agent Communication Protocol): Open standards initiative for inter-agent messaging.

These map to the Multi-OS Coordination patterns (Chapter 34) — federation bus, capability discovery, and cross-OS messaging. The standards are nascent, but the architectural patterns are stable.
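Whatever wire format these protocols settle on, the shape of a capability-discovery exchange is already clear from the patterns. The envelope below is purely hypothetical — every field name is an assumption for illustration, not a quote from A2A or ACP:

```python
import json

# Hypothetical capability-discovery exchange between two agent systems.
# All field names are illustrative; no real protocol is quoted here.
discovery_request = {
    "protocol": "agent-discovery/0.1",   # assumed version string
    "from": "os-alpha.example",
    "intent": "discover_capabilities",
}

discovery_response = {
    "protocol": "agent-discovery/0.1",
    "from": "os-beta.example",
    "capabilities": [
        {"name": "code_review", "cost_hint": "medium", "auth": "delegated"},
        {"name": "data_extraction", "cost_hint": "low", "auth": "open"},
    ],
}

# A federation bus would route these envelopes; here we just round-trip one.
wire = json.dumps(discovery_response)
decoded = json.loads(wire)
print([c["name"] for c in decoded["capabilities"]])
# → ['code_review', 'data_extraction']
```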

Governance and Safety Standards

Regulatory Landscape

| Regulation / Framework | Jurisdiction | Agent-Relevant Requirements |
|---|---|---|
| EU AI Act | European Union | Risk classification for AI systems; high-risk systems require conformity assessment, human oversight, transparency, and record-keeping |
| NIST AI RMF | United States | Risk management framework: govern, map, measure, manage. Voluntary but influential |
| ISO/IEC 42001 | International | AI management system standard. Certification for responsible AI practices |
| Executive Order 14110 | United States | Requirements for AI safety testing, red-teaming, and reporting for frontier models |
| Singapore AI Governance Framework | Singapore | Principles-based governance with practical implementation guidance |

How the Agentic OS Maps to Regulatory Requirements

| Regulatory Requirement | Agentic OS Component |
|---|---|
| Human oversight | Permission Gates, Human Escalation, Staged Autonomy |
| Transparency | Execution Journal, Auditable Action, Active Plan Board |
| Risk management | Risk-Tiered Execution, Policy-Aware Scheduler, Governance Plane |
| Record-keeping | Audit logging in the Governance Plane, Execution Journal |
| Robustness | Failure Containment, Recovery Process, Checkpoints and Rollback |
| Data governance | Memory Plane scoping, Capability-Based Access, data classification at boundaries |

The Agentic OS architecture is not designed for compliance with any specific regulation. It is designed around principles — governance, transparency, accountability, isolation — that happen to align well with what regulators require. This is not coincidence; well-engineered systems and well-designed regulations both derive from the same insight: autonomous systems need structure.

Industry Safety Frameworks

| Framework | Focus | Relevance |
|---|---|---|
| OWASP Top 10 for LLM Applications | Security vulnerabilities specific to LLM-powered applications | Directly applicable: prompt injection, data leakage, excessive agency, insecure plugins |
| MLCommons AI Safety Benchmarks | Standardized safety evaluations for AI models | Useful for evaluating model providers in the Model Provider Layer |
| Anthropic Responsible Scaling Policy | Framework for scaling AI capabilities safely | Informs Staged Autonomy and Risk-Tiered Execution patterns |
| MITRE ATLAS | Adversarial threat landscape for AI systems | Threat modeling for the Governance Plane |

Practical Governance Checklist

For teams deploying agentic systems, a minimum governance implementation:

  • Audit trail: Every agent action is logged with timestamp, inputs, outputs, and authorization context.
  • Human-in-the-loop: Irreversible actions require human approval. The system can halt on demand.
  • Cost controls: Per-task and per-session budgets with automatic cutoff.
  • Capability scoping: Each worker has explicit tool permissions. No worker has unconstrained access.
  • Output validation: Generated outputs (code, communications, data modifications) are validated before delivery.
  • Incident response: A process exists for investigating and responding to agent misbehavior.
  • Data boundaries: Sensitive data is classified and scoped. Data does not leak across security boundaries.
  • Model evaluation: Regular evaluation of model outputs against quality and safety benchmarks.
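Two of these items — the audit trail and cost controls — can be combined in a single execution wrapper. The sketch below is illustrative; the class name, budget figures, and log fields are assumptions, not a prescribed implementation:

```python
import datetime
import json

# Illustrative sketch of two checklist items (audit trail, cost controls)
# as a wrapper around tool execution. Names and budgets are assumptions.
class GovernedExecutor:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.audit_log: list[str] = []

    def execute(self, worker: str, action: str, cost_usd: float, fn, *args):
        # Cost control: automatic cutoff before the budget is exceeded.
        if self.spent_usd + cost_usd > self.budget_usd:
            raise RuntimeError("budget exceeded: automatic cutoff")
        result = fn(*args)
        self.spent_usd += cost_usd
        # Audit trail: timestamp, actor, action, inputs, outputs.
        self.audit_log.append(json.dumps({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "worker": worker,
            "action": action,
            "inputs": list(args),
            "output": result,
        }))
        return result

ex = GovernedExecutor(budget_usd=1.00)
ex.execute("worker-1", "summarize", 0.25, lambda t: t.upper(), "hello")
print(ex.spent_usd, len(ex.audit_log))
# → 0.25 1
```

Putting both checks in the same wrapper guarantees that no action can spend budget without also leaving an audit record.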

Evaluation and Testing Ecosystem

Evaluation Frameworks

| Framework | Purpose |
|---|---|
| LMSYS Chatbot Arena | Crowd-sourced model comparison via blind pairwise evaluation |
| HELM (Stanford) | Holistic evaluation of language models across scenarios and metrics |
| SWE-bench | Evaluating agent capability on real-world software engineering tasks |
| AgentBench | Cross-environment benchmark for agent capabilities |
| Inspect AI (UK AISI) | Framework for evaluating AI system capabilities and safety properties |

Testing Agentic Systems

Traditional software testing assumes deterministic behavior. Agentic systems require additional testing strategies:

  • Behavioral benchmarks: Curated test suites that evaluate the system’s behavior across representative scenarios, scored on multiple dimensions (correctness, safety, efficiency).
  • Regression detection: Compare system outputs before and after changes. Flag significant behavioral differences using LLM-as-judge evaluations.
  • Red teaming: Adversarial testing where evaluators attempt to cause the system to violate its governance policies, leak data, or produce harmful outputs.
  • Simulation testing: Run the system against simulated environments and users to test behavior at scale without real-world consequences.
  • Cost benchmarking: Track tokens consumed, latency, and monetary cost per task type. Detect efficiency regressions.
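The last strategy is straightforward to mechanize. The sketch below records a per-task-type token baseline and flags regressions; the class name and the 20% threshold are illustrative assumptions:

```python
import statistics

# Sketch of cost benchmarking: record per-task-type token baselines and
# flag efficiency regressions. The threshold value is an assumption.
class CostBenchmark:
    def __init__(self, regression_threshold: float = 1.2):
        self.baseline: dict[str, float] = {}
        self.threshold = regression_threshold

    def record_baseline(self, task_type: str, token_samples: list[int]) -> None:
        self.baseline[task_type] = statistics.mean(token_samples)

    def check(self, task_type: str, token_samples: list[int]) -> bool:
        """Return True if mean cost exceeds baseline by the threshold."""
        new_mean = statistics.mean(token_samples)
        return new_mean > self.baseline[task_type] * self.threshold

bench = CostBenchmark()
bench.record_baseline("code_review", [1000, 1100, 900])  # mean = 1000
print(bench.check("code_review", [1300, 1250]))          # 1275 > 1200
# → True
```

The same structure extends to latency and monetary cost; the essential discipline is keeping a baseline per task type rather than one global number.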

The number of tools, frameworks, and standards is overwhelming and growing. Three principles help navigate it:

  1. Architecture over tools. Choose your architecture first (this book provides one). Then select tools that implement each layer. Do not let a tool’s capabilities define your architecture.

  2. Standards over proprietary. Where standards exist (MCP for tools, OpenTelemetry for observability, JSON Schema for function calling), prefer them. They reduce lock-in and increase composability.

  3. Governance from day one. Do not treat governance as a phase-two concern. Audit logging, cost controls, and capability scoping are cheap to implement early and expensive to retrofit. The regulatory landscape is tightening, not loosening.

The landscape will look different in a year. The principles will not.