
Use MCP Servers as Primary Infrastructure Access Layer

Context and Problem Statement

Claude Ops is an AI agent that monitors and remediates infrastructure from inside a Docker container. To do its job, it must interact with diverse infrastructure components: Docker containers, PostgreSQL databases, web UIs (for credential rotation), and arbitrary HTTP endpoints. The agent needs a consistent, safe, and extensible mechanism for accessing these systems.

The Claude Code CLI supports the Model Context Protocol (MCP), which defines a standard interface for connecting AI models to external tools and data sources. MCP servers run as stdio subprocesses (spawned via npx -y <package>) and expose typed tool interfaces that Claude can invoke directly. The question is whether MCP should be the primary way Claude Ops accesses infrastructure, or whether alternative approaches (direct CLI commands, client libraries, or a custom API gateway) would better serve the system's goals.

Currently, four baseline MCP servers are configured in .claude/mcp.json:

  1. Docker MCP (@anthropic-ai/mcp-docker) -- container inspection and management
  2. PostgreSQL MCP (@anthropic-ai/mcp-postgres) -- database querying
  3. Chrome DevTools MCP (@anthropic-ai/mcp-chrome-devtools) -- browser automation for credential rotation via web UIs
  4. Fetch MCP (@anthropic-ai/mcp-fetch) -- HTTP requests

Mounted infrastructure repos can extend this set by providing .claude-ops/mcp.json files, which the entrypoint merges into the baseline before each run.
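As an illustration, a repo-provided .claude-ops/mcp.json might look like the following sketch. The mcpServers/command/args/env shape matches the baseline config format described above; the prometheus-mcp package name and PROM_URL variable are hypothetical, shown only to illustrate the structure:

```json
{
  "mcpServers": {
    "prometheus": {
      "command": "npx",
      "args": ["-y", "prometheus-mcp"],
      "env": {
        "PROM_URL": "http://prometheus:9090"
      }
    }
  }
}
```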

Decision Drivers

  • Structured tool interfaces -- Claude performs better when it has typed, well-documented tool schemas rather than constructing arbitrary CLI commands. MCP servers expose JSON Schema-defined tools that Claude can reason about without memorizing CLI flags.
  • Safety through constrained surface area -- An MCP server exposes a finite set of operations. This is inherently safer than giving the agent unrestricted shell access, where any command is possible. The tool surface area can be audited.
  • Extensibility for mounted repos -- Claude Ops discovers infrastructure repos at runtime. Repos need a way to bring their own integrations (e.g., a custom Ansible inventory server, a Prometheus query server) without modifying the base container. The mechanism must support runtime composition.
  • Ecosystem alignment -- MCP is the emerging standard for AI-tool integration in the Claude ecosystem. Building on it means benefiting from upstream improvements, community-contributed servers, and Anthropic's investment in the protocol.
  • Operational simplicity -- The agent runs in a container with Node.js available. MCP servers via npx -y require no pre-installation, compilation, or dependency management beyond a network connection and npm registry access.
  • Debuggability -- When the agent takes an action, operators need to understand what happened. The mechanism should produce clear, traceable interactions.

Considered Options

  1. MCP servers for structured infrastructure access -- Use MCP as the primary access layer, with baseline servers for common infrastructure and repo-provided servers for custom integrations.
  2. Direct CLI execution -- Have Claude construct and execute shell commands (docker, psql, curl, etc.) via the Bash tool for all infrastructure interactions.
  3. Language-specific client libraries -- Write a Node.js or Python service layer using Docker SDK, pg client, puppeteer, etc., that Claude calls through a custom tool interface.
  4. Custom REST API gateway -- Build and deploy a REST API that wraps all infrastructure access behind authenticated endpoints, which Claude calls via HTTP.

Decision Outcome

Chosen option: "MCP servers for structured infrastructure access", because it provides the best combination of safety, extensibility, and operational simplicity for an AI agent that must interact with heterogeneous infrastructure at runtime.

MCP servers give Claude typed tool schemas to reason about, which reduces the risk of malformed commands compared to raw CLI execution. The stdio subprocess model means servers start on demand, require no persistent daemons, and are isolated per process. The merge-based extension mechanism (entrypoint.sh lines 32-58) allows mounted repos to bring their own MCP servers via .claude-ops/mcp.json without rebuilding the container, which is essential for Claude Ops' runtime-discovery architecture.
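A minimal sketch of the merge semantics, assuming the entrypoint relies on jq's recursive-merge operator (the file paths, heredocs, and the prometheus-mcp package name are illustrative, not the actual entrypoint.sh contents):

```shell
#!/bin/sh
# Baseline config shipped in the image.
cat > /tmp/baseline.json <<'EOF'
{"mcpServers": {"docker": {"command": "npx", "args": ["-y", "@anthropic-ai/mcp-docker"]}}}
EOF

# Config contributed by a mounted repo (hypothetical server).
cat > /tmp/repo.json <<'EOF'
{"mcpServers": {"prometheus": {"command": "npx", "args": ["-y", "prometheus-mcp"]}}}
EOF

# jq -s slurps both files into a two-element array; the * operator
# deep-merges objects, with keys from the right-hand (repo) file
# winning on collision -- the repo-wins semantics described above.
jq -s '.[0] * .[1]' /tmp/baseline.json /tmp/repo.json > /tmp/merged.json
cat /tmp/merged.json
```

With these inputs the merged file contains both the baseline docker server and the repo's prometheus server; a repo entry named docker would instead replace the baseline one.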

The four baseline servers cover the core infrastructure access patterns: container management (Docker MCP), database querying (PostgreSQL MCP), browser automation for web UIs that lack APIs (Chrome DevTools MCP), and general HTTP requests (Fetch MCP). This maps directly to the playbooks the agent executes -- for example, the rotate-api-key.md playbook uses both the Fetch MCP (for REST API-based rotation) and Chrome DevTools MCP (for browser-based rotation when no API is available).

Consequences

Positive:

  • Claude interacts with infrastructure through well-defined tool schemas rather than constructing arbitrary shell commands, reducing the risk of malformed or dangerous operations.
  • The finite tool surface area of each MCP server can be audited and understood. Operators can review what operations are possible by inspecting the MCP server's tool definitions.
  • Mounted repos can extend the agent's capabilities by providing .claude-ops/mcp.json files. The entrypoint's merge logic (jq -s merge with repo-wins-on-collision semantics) handles composition automatically.
  • MCP servers are started as stdio subprocesses via npx -y, requiring no pre-installation or persistent infrastructure. Adding a new integration is as simple as specifying an npm package name.
  • The system benefits from Anthropic's and the community's investment in MCP server quality, documentation, and new server packages. As the ecosystem grows, more infrastructure integrations become available without custom development.
  • Each MCP server runs in its own process with its own environment variables, providing natural isolation between infrastructure concerns (e.g., the PostgreSQL connection string is only available to the PostgreSQL MCP server).

Negative:

  • Runtime npx installs introduce latency and fragility. Each MCP server is fetched from npm on first invocation. If the npm registry is unreachable, the network is slow, or a package version is yanked, the agent cannot function. This is a hard dependency on external infrastructure for the infrastructure monitoring tool itself.
  • Versions are not pinned by default. Using npx -y <package> without a version specifier resolves to the latest published release, so the agent could pick up breaking changes in MCP server packages between runs. Production deployments should consider pinning versions (e.g., @anthropic-ai/mcp-docker@1.2.3) at the cost of manual update management.
  • MCP is a relatively new protocol. Server implementations may have bugs, incomplete tool coverage, or undocumented behavior. The agent may need to fall back to the Bash tool for operations that MCP servers do not yet support.
  • Debugging MCP interactions requires understanding the MCP protocol layer. Raw CLI commands are more transparent -- operators can copy-paste and run them manually. MCP tool invocations require inspecting the tool schema and the server's behavior.
  • The merge mechanism for repo-provided MCP configs uses a last-writer-wins strategy based on alphabetical repo ordering. If two repos define an MCP server with the same name, the behavior depends on directory sort order, which could be surprising.
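Pinning, as suggested above, is a one-line change per server entry in mcp.json. npx resolves a package@version specifier to that exact release instead of the latest; the version number below is the illustrative one from the text, not a known release:

```json
{
  "mcpServers": {
    "docker": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/mcp-docker@1.2.3"]
    }
  }
}
```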

Pros and Cons of the Options

MCP servers for structured infrastructure access

  • Good, because Claude receives typed tool schemas with parameter descriptions, enabling more reliable tool use than constructing CLI commands from memory.
  • Good, because the tool surface area is bounded and auditable -- each MCP server exposes a known set of operations, unlike unrestricted shell access.
  • Good, because the extension mechanism (.claude-ops/mcp.json merge) allows repos to bring custom integrations without container rebuilds, aligning with Claude Ops' runtime-discovery architecture.
  • Good, because npx -y subprocess spawning requires zero pre-installation -- any MCP server on npm is available immediately.
  • Good, because MCP servers handle connection management, error handling, and response formatting, reducing the logic Claude must implement in each interaction.
  • Good, because the protocol is an active Anthropic investment, meaning continued ecosystem growth and quality improvements.
  • Bad, because npx -y runtime installs depend on npm registry availability and network connectivity, creating a bootstrap dependency for the monitoring tool.
  • Bad, because without explicit version pinning, MCP server behavior may change between runs due to automatic package updates.
  • Bad, because MCP adds a protocol layer between Claude and the infrastructure, which can obscure what is actually happening and complicate debugging compared to raw CLI commands.
  • Bad, because the Chrome DevTools MCP requires a separate browser sidecar container, adding deployment complexity for browser automation capabilities.

Direct CLI execution (docker, psql, curl commands via Bash tool)

  • Good, because CLI commands are universally understood, debuggable, and can be copy-pasted by operators to reproduce actions.
  • Good, because there are no additional dependencies -- docker, psql, and curl are standard tools available in any infrastructure container.
  • Good, because there is no protocol overhead or abstraction layer between the agent and the infrastructure.
  • Good, because CLI output is plain text that Claude can parse, and operators can read directly in logs.
  • Bad, because Claude must construct correct command strings from memory, including flags, quoting, and escaping, which is error-prone for complex operations.
  • Bad, because the full power of the shell is available, meaning the agent could construct any command -- there is no bounded tool surface area to audit.
  • Bad, because there is no standard extension mechanism. Repos cannot "bring their own tools" without modifying the container or adding shell scripts.
  • Bad, because connection management (database connections, Docker socket access, authentication) must be handled ad-hoc in each command.
  • Bad, because different CLI tools have different output formats, requiring Claude to parse each one differently.

Language-specific client libraries (Docker SDK, pg client, puppeteer)

  • Good, because client libraries provide strong typing, error handling, and connection pooling out of the box.
  • Good, because they offer the most complete coverage of each service's API surface.
  • Good, because library-level abstractions can handle retries, timeouts, and authentication transparently.
  • Bad, because it requires writing and maintaining a custom service layer (Node.js or Python) that wraps each library and exposes it to Claude, effectively building custom MCP servers without the protocol.
  • Bad, because each library has its own dependency tree, version constraints, and breaking change cadence, creating a significant maintenance burden.
  • Bad, because mounted repos cannot extend the system at runtime -- new libraries require container rebuilds with the new dependencies.
  • Bad, because it couples the system to specific language runtimes and package ecosystems, reducing flexibility.
  • Bad, because Claude Ops is intentionally a "no application code" system -- introducing a bespoke service layer contradicts the architecture of markdown runbooks executed by the CLI.

Custom REST API gateway that wraps infrastructure access

  • Good, because it provides a single, well-defined API surface that can be versioned, authenticated, and rate-limited.
  • Good, because the API gateway can enforce fine-grained authorization policies (e.g., read-only endpoints for Tier 1, write endpoints for Tier 2+).
  • Good, because it decouples the agent from infrastructure details -- Claude calls a stable API, and the gateway handles the messy integration work.
  • Bad, because it requires designing, building, deploying, and maintaining a separate API service -- a significant engineering investment for a project that deliberately avoids application code.
  • Bad, because the gateway becomes a single point of failure. If it goes down, the monitoring agent is blind.
  • Bad, because every new infrastructure integration requires adding an endpoint to the gateway, deploying a new version, and updating the API documentation.
  • Bad, because repos cannot extend the agent's capabilities at runtime -- new integrations require gateway changes, not just a JSON config file in the repo.
  • Bad, because the API gateway must itself be monitored and maintained, creating a recursive monitoring problem.