Invoke Claude via Claude Code CLI as Subprocess

Context and Problem Statement

Claude Ops is an AI infrastructure monitoring agent that runs inside a Docker container on a scheduled loop. Each cycle, it must invoke one or more Claude models to read markdown-based checks and playbooks, execute commands against mounted infrastructure repos, and optionally escalate through tiered remediation. The system needs a mechanism to invoke Claude that supports model selection, prompt loading from files, tool filtering for permission enforcement, runtime environment injection, MCP server management, and subagent spawning for tier escalation.

The core question is: How should Claude Ops invoke Claude models at runtime -- through a CLI tool, a language-specific SDK, or a custom agent framework?

The answer has significant implications for the project's architecture. Claude Ops is deliberately not a traditional application codebase -- it contains no application code to compile or test. Its "code" is markdown documents, a shell entrypoint, a Dockerfile, and configuration files. Whatever invocation mechanism is chosen must fit this paradigm and not force the project into maintaining a runtime application in Python, TypeScript, or another language.

Decision Drivers

Zero application code -- Claude Ops is an AI agent runbook, not a software application. The invocation mechanism should not require writing or maintaining application code in any programming language. The entrypoint is a Bash script; the intelligence lives in markdown prompts.
Feature completeness out of the box -- The mechanism must support model selection, prompt file loading, tool filtering, system prompt injection, non-interactive output, MCP server configuration, and subagent spawning (the Task tool) without custom implementation.
Permission enforcement -- The tiered escalation model (ADR-0001) requires restricting which tools each tier can use. The mechanism must support allowlisting tools at invocation time, not just at the prompt level, so that Tier 1 (observe-only) genuinely cannot call remediation tools.
MCP server management -- Claude Ops uses MCP servers (Docker, Postgres, Chrome DevTools, Fetch) defined in .claude/mcp.json, with repo-level overrides merged at startup. The mechanism must natively read and manage MCP server configurations without custom connection code.
Subagent spawning -- Tier escalation relies on spawning subagents (Tier 1 spawns Tier 2, Tier 2 spawns Tier 3) via the Task tool, passing context forward. The mechanism must support this natively.
Container-friendly -- The mechanism must install cleanly in a Docker image, require minimal dependencies, and not introduce version conflicts with the existing Node.js 22 base image.
Operational simplicity -- Debugging, logging, and monitoring should be straightforward. The mechanism's output should be capturable via standard Unix tools (tee, pipe, redirect).
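To make the operational-simplicity driver concrete, here is a minimal sketch of capturing a command's output with tee while still seeing the command's own exit status. The helper name run_logged and the log path are hypothetical and do not come from the project's entrypoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, mirror its output to a log file,
# and return the command's exit status rather than tee's.
run_logged() {
    local log_file="$1"
    shift
    # Send stderr through the same pipe so both streams land in the log.
    "$@" 2>&1 | tee -a "${log_file}"
    # PIPESTATUS[0] holds the status of the first command in the pipeline.
    return "${PIPESTATUS[0]}"
}
```

A call such as run_logged /tmp/cycle.log some-command would then return some-command's status. Without PIPESTATUS (or set -o pipefail), the pipeline would report tee's status and hide failures.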

Considered Options

Claude Code CLI as subprocess -- Install @anthropic-ai/claude-code globally via npm; invoke it from entrypoint.sh with command-line flags.
Anthropic Python SDK -- Write a Python script that uses the anthropic Python package to make direct API calls.
Anthropic TypeScript SDK -- Write a TypeScript/Node.js script that uses the @anthropic-ai/sdk package to make direct API calls (leveraging the existing Node.js runtime).
Claude Agent SDK -- Build a custom agent runtime using the @anthropic-ai/claude-code SDK programmatically in TypeScript, managing the agent loop, tool execution, and subagent coordination in code.

Decision Outcome

Chosen option: "Claude Code CLI as subprocess", because it is the only option that preserves the zero-application-code architecture while providing every required feature -- model selection, prompt file loading, tool filtering, system prompt injection, MCP server management, and subagent spawning -- as a built-in CLI flag or native CLI behavior. The entire invocation is a single shell command:

claude \
    --model "${MODEL}" \
    --print \
    --prompt-file "${PROMPT_FILE}" \
    --allowedTools "${ALLOWED_TOOLS}" \
    --append-system-prompt "Environment: ${ENV_CONTEXT}"

This maps directly to the project's design philosophy: the entrypoint is a Bash script, the intelligence lives in markdown prompts, and the CLI handles everything in between. There is no Python script, no TypeScript application, no custom agent loop to maintain. The CLI reads .claude/mcp.json for MCP server configuration, supports the Task tool for subagent spawning, and enforces tool allowlists for permission tiers -- all without a single line of application code.
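One way the entrypoint might wrap that command is sketched below. The names run_tier and CLAUDE_BIN are hypothetical; the flags are the ones this ADR lists and are passed through unchanged. Making the binary overridable is an assumption added here so the wrapper can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the invocation shown in this ADR.
# CLAUDE_BIN defaults to the real CLI but can point at a stub for testing.
CLAUDE_BIN="${CLAUDE_BIN:-claude}"

run_tier() {
    local model="$1" prompt_file="$2" allowed_tools="$3" env_context="$4"
    # Build the arguments as an array so values containing spaces or
    # commas reach the CLI as single arguments.
    local -a args=(
        --model "${model}"
        --print
        --prompt-file "${prompt_file}"
        --allowedTools "${allowed_tools}"
        --append-system-prompt "Environment: ${env_context}"
    )
    "${CLAUDE_BIN}" "${args[@]}"
}
```

The entrypoint could then call run_tier "${MODEL}" "${PROMPT_FILE}" "${ALLOWED_TOOLS}" "${ENV_CONTEXT}" once per cycle.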

Consequences

Positive:

The project maintains its zero-application-code architecture. The entrypoint remains a Bash script; all intelligence stays in markdown prompts. There is no src/ directory, no main.py, no compiled artifacts.
Every required feature is available as a CLI flag or built-in CLI behavior, eliminating the need to implement model selection, prompt loading, tool filtering, system prompt injection, or MCP management in code.
Tool filtering via --allowedTools enforces permission tiers at the runtime level, not just the prompt level. Tier 1 genuinely cannot invoke remediation tools even if a prompt injection or reasoning error occurs.
MCP server configuration is managed by the CLI natively. The entrypoint merges repo-level configs into .claude/mcp.json, and the CLI handles server lifecycle (starting, connecting, shutting down MCP servers).
Subagent spawning via the Task tool works out of the box. Tier 1 can spawn Tier 2 with Task(model: "sonnet", prompt: ...) without any custom orchestration code.
Output is plain text on stdout/stderr, trivially captured via tee and standard Unix logging.
The CLI is installed with a single npm install -g in the Dockerfile, adding no new language runtimes or dependency chains beyond the existing Node.js base image.
Upgrades to the CLI (new features, bug fixes, new model support) are picked up by updating a single npm package version -- no code changes required.
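The MCP config merge mentioned above could look roughly like the following sketch. It assumes both files keep their servers under a top-level mcpServers object and that repo-level entries should override same-named base entries. The ADR states neither detail, and merge_mcp_configs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deep-merge a repo-level MCP config over the base
# config. On conflicting keys, the repo-level value (second file) wins.
merge_mcp_configs() {
    local base="$1" override="$2"
    # -s reads both files into one array; `*` merges objects recursively.
    jq -s '.[0] * .[1]' "${base}" "${override}"
}
```

When updating a config in place, write the merged result to a temporary file and move it over the original. Redirecting straight into a file that jq is still reading would truncate it.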

Negative:

The project takes a hard dependency on the Claude Code CLI's command-line interface. Flag names, output format, or behavioral changes in CLI updates could break the entrypoint.
The CLI is a larger installation than an SDK (~100MB+ for the npm package), which increases Docker image size.
Error handling is limited to exit codes and stderr parsing. An SDK would provide structured error types (rate limit, authentication failure, context window exceeded) that could be handled programmatically.
The CLI's internal agent loop is opaque. There is no way to hook into tool execution, intercept MCP responses, or add custom middleware -- the CLI is a black box between invocation and output.
Subprocess invocation adds process startup overhead on each cycle compared to a long-running SDK-based process that maintains connections.
The --print flag produces streaming text output that must be parsed if structured results are needed. An SDK would return structured responses natively.
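To illustrate what handling errors through exit codes and stderr parsing might involve, the sketch below maps a failed run to a coarse category. The function name classify_failure and the grep patterns are assumptions made for illustration, not documented CLI messages. Real patterns would need to be taken from what the CLI actually prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an exit status plus captured stderr text to a
# coarse failure category. The patterns below are illustrative guesses.
classify_failure() {
    local status="$1" stderr_file="$2"
    if [ "${status}" -eq 0 ]; then
        echo "ok"
    elif grep -qi 'rate limit' "${stderr_file}"; then
        echo "rate_limited"
    elif grep -qiE 'authentication|api key' "${stderr_file}"; then
        echo "auth_error"
    else
        echo "unknown_error"
    fi
}
```

Text matching of this kind is brittle, which is the trade-off this list describes: a wording change in a CLI release silently moves a failure into the unknown_error bucket.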

Pros and Cons of the Options

Claude Code CLI as Subprocess

Good, because it requires zero application code -- the entire invocation is a single command in a Bash script.
Good, because --allowedTools enforces permission tiers at the runtime level, providing defense-in-depth beyond prompt instructions.
Good, because --prompt-file loads tier prompts directly from markdown files, aligning with the markdown-as-instructions architecture (ADR-0002).
Good, because --append-system-prompt injects runtime environment variables (state dir, repos dir, dry run mode, Apprise URLs) without modifying prompt files.
Good, because --model enables the tiered escalation model (ADR-0001) through simple flag changes per invocation.
Good, because MCP server management (.claude/mcp.json) is handled natively, including server lifecycle (starting, connecting to, and shutting down MCP servers).
Good, because the Task tool enables subagent spawning for tier escalation without custom orchestration.
Good, because --print produces non-interactive output suitable for logging and piping.
Bad, because the project is tightly coupled to the CLI's interface -- breaking changes in flag names or behavior require entrypoint updates.
Bad, because the CLI is a black box: there is no visibility into or control over tool execution, MCP communication, or internal reasoning steps.
Bad, because error handling is limited to exit codes and text parsing rather than structured error types.
Bad, because the npm package is large, increasing Docker image size relative to a lightweight SDK.
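Since --model and --allowedTools vary per tier, the entrypoint needs some mapping from tier to settings. The sketch below shows one possible shape. The model aliases and tool lists are assumptions chosen for illustration, and the real per-tier values are not given in this ADR. The function name tier_settings is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from tier number to "<model> <allowed tools>".
# The assignments below are illustrative assumptions, not project values.
tier_settings() {
    case "$1" in
        1) echo "haiku Read,Grep,Glob,Task" ;;
        2) echo "sonnet Read,Grep,Glob,Bash,Task" ;;
        3) echo "opus Read,Grep,Glob,Bash,Write,Edit" ;;
        *) echo "unknown tier: $1" >&2; return 1 ;;
    esac
}
```

A caller could split the result with read -r MODEL ALLOWED_TOOLS <<< "$(tier_settings 1)" and pass both values to the CLI flags. Keeping the mapping in one function makes it easy to audit which tools each tier can reach.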

Anthropic Python SDK (Direct API Calls)

Good, because it provides structured, typed responses and explicit error types (rate limit, auth, context window).
Good, because Python has a large ecosystem of infrastructure tooling (Ansible, Fabric, Boto3) that could be leveraged alongside API calls.
Good, because it offers fine-grained control over request parameters, retry logic, and streaming.
Bad, because it requires writing and maintaining a Python application -- a main.py, requirements management, error handling, logging, and all the overhead of application code.
Bad, because tool execution must be implemented manually. The SDK returns tool-use requests; the application must execute the tools and return results. This means reimplementing what the CLI does natively (Bash execution, file reading, grep, glob, MCP connections).
Bad, because MCP server management must be implemented from scratch -- starting servers, managing connections, routing tool calls to the correct server.
Bad, because subagent spawning (the Task tool) must be implemented as custom code rather than using the built-in mechanism.
Bad, because the tool filtering that --allowedTools provides must be reimplemented at the application level, which is more error-prone than relying on the CLI's built-in enforcement.
Bad, because it makes Python a core runtime dependency of the agent itself; today Python is installed only to support Apprise.
Bad, because it fundamentally changes the project from an AI agent runbook into a Python application that happens to call an AI API.

Anthropic TypeScript SDK (Direct API Calls)

Good, because the container already runs Node.js 22, so no additional runtime is needed.
Good, because it provides typed responses and structured error handling in TypeScript.
Good, because it could share the same Node.js process as MCP servers, potentially reducing overhead.
Bad, because it requires writing and maintaining a TypeScript application -- src/ directory, tsconfig.json, build step, compiled output.
Bad, because all tool execution (Bash, file operations, grep, glob) must be implemented manually, duplicating CLI functionality.
Bad, because MCP server management must be built from scratch, including the merge logic currently handled by shell + jq.
Bad, because subagent spawning must be implemented as recursive API calls with manual context threading.
Bad, because permission enforcement (--allowedTools) must be reimplemented as application-level logic.
Bad, because introducing a TypeScript build step contradicts the project's zero-build, zero-compile design.
Bad, because it transforms the project from markdown-driven runbook into a TypeScript application, requiring developers to understand both the prompts and the application code.

Claude Agent SDK (Custom Agent Runtime)

Good, because it provides programmatic access to the full agent loop, including tool execution hooks and middleware.
Good, because it could enable advanced features like custom tool implementations, execution tracing, and structured result collection.
Good, because it offers the most control over agent behavior, including custom retry logic, context window management, and parallel tool execution.
Good, because it could integrate MCP servers programmatically with fine-grained control over server lifecycle.
Bad, because it requires building and maintaining a substantial TypeScript application -- an agent runtime is significantly more code than a simple API wrapper.
Bad, because it introduces the highest maintenance burden: the custom runtime must be updated for SDK changes, new tool types, and protocol evolution.
Bad, because it is the most radical departure from the zero-code architecture. The project would become a TypeScript agent framework with markdown prompts as configuration.
Bad, because the complexity of a custom agent runtime (error recovery, state management, tool routing, MCP lifecycle) is orders of magnitude beyond the current ~95-line entrypoint script.
Bad, because debugging shifts from reading agent logs to debugging a custom TypeScript runtime, raising the bar for contributors.
Bad, because the Agent SDK is relatively new and has a smaller community, meaning less documentation and fewer examples for edge cases.
