Design: Markdown as Executable Instructions

Overview

This document describes the technical design of Claude Ops' approach to using markdown documents as the primary format for all operational instructions — health checks, remediation playbooks, agent prompts, and repo-contributed extensions. Instead of executable scripts or structured DSLs, the system relies on the Claude Code CLI's ability to read prose instructions and interpret them contextually at runtime.

Architecture

Document Categories

The system uses four categories of markdown documents, each serving a distinct role:

/app/
├── checks/                     # Built-in health check instructions
│   ├── http.md                 # HTTP endpoint checks
│   ├── dns.md                  # DNS resolution checks
│   ├── containers.md           # Docker container state checks
│   ├── databases.md            # Database connectivity checks
│   └── services.md             # Service-specific checks
├── playbooks/                  # Built-in remediation procedures
│   ├── restart-container.md    # Container restart procedure
│   ├── rotate-api-key.md       # API key rotation via browser
│   └── redeploy-service.md     # Full service redeployment
├── prompts/                    # Agent tier behavior definitions
│   ├── tier1-observe.md        # Tier 1: observation-only
│   ├── tier2-investigate.md    # Tier 2: safe remediation
│   └── tier3-remediate.md      # Tier 3: full remediation
└── CLAUDE.md                   # Top-level agent runbook

/repos/<repo>/                  # Mounted infrastructure repos
└── .claude-ops/
    ├── checks/                 # Repo-specific health checks
    ├── playbooks/              # Repo-specific playbooks
    └── skills/                 # Repo-specific capabilities

Execution Model

The execution model differs from traditional automation in four ways:

No interpreter/executor split: There is no separate engine that parses instructions and runs them. The Claude model IS the interpreter. It reads the markdown, understands the intent, and executes the appropriate commands using the tools available to it (Bash, Read, Glob, etc.).
No schema or parsing: Documents are not parsed into structured data. The model reads them as natural language and extracts the relevant information contextually. There is no document schema to validate against.
Contextual parameter substitution: Embedded command examples use placeholders like <url>, <container>, and <service>. The agent substitutes these with actual values from the infrastructure context (service inventory, container names, URLs) at runtime.
Judgment-based execution: Instructions can contain conditional guidance in prose ("if a service has a dedicated /health endpoint, prefer that"). The agent interprets these conditions against the actual infrastructure state, rather than following rigid if/else branches.
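To make the placeholder convention concrete, the following minimal sketch shows what the substitution amounts to if it were done mechanically. In Claude Ops the model performs this step itself while reading the document; the function name `fill_placeholders` and the context dictionary are hypothetical and exist only for illustration.

```python
import re


def fill_placeholders(template: str, context: dict) -> str:
    """Replace <name> placeholders with values from context.

    Placeholders with no matching key are left untouched, so a
    missing value stays visible instead of silently disappearing.
    """
    def replace(match):
        return context.get(match.group(1), match.group(0))

    return re.sub(r"<([a-z_]+)>", replace, template)


# Hypothetical service context; real values come from the service inventory.
command = fill_placeholders(
    'curl -s -o /dev/null -w "HTTP %{http_code}" --max-time 10 <url>',
    {"url": "https://myapp.example.com/health"},
)
print(command)
```

The sketch only builds the command string. It does not execute anything, and it does not capture the judgment the agent applies when deciding which value belongs in which placeholder.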

Discovery and Loading Flow

Monitoring Cycle Start
  │
  ├── 1. Read /app/checks/*.md (built-in checks)
  │
  ├── 2. For each repo in $CLAUDEOPS_REPOS_DIR:
  │     ├── Read CLAUDE-OPS.md (repo manifest)
  │     └── Read .claude-ops/checks/*.md (repo-specific checks)
  │
  ├── 3. Agent has full set of check instructions in context
  │
  ├── 4. Execute checks against discovered services
  │     └── For each service × check:
  │           ├── Read check document
  │           ├── Determine if this check applies to this service
  │           ├── Adapt commands to service-specific parameters
  │           └── Execute and evaluate results
  │
  └── 5. If issues found, load playbook instructions:
        ├── Read /app/playbooks/*.md
        ├── Read .claude-ops/playbooks/*.md from repos
        └── Select and execute the appropriate playbook

The agent does not load every document up front. Check documents are read at the start of the cycle, while playbooks are read on demand, only when an issue is found. In both cases the agent uses the Read tool to access files from the filesystem.
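The discovery order for check documents can be sketched as a short directory walk. This is an illustration under stated assumptions, not the system's implementation: the agent discovers files with its own tools (Glob, Read), and the function name `discover_check_documents` is hypothetical. Reading the CLAUDE-OPS.md manifest is omitted for brevity.

```python
from pathlib import Path


def markdown_files(directory: Path) -> list:
    """Return the .md files in a directory, or an empty list if it is absent."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.md"))


def discover_check_documents(app_dir: Path, repos_dir: Path) -> list:
    """Built-in checks first, then repo-specific checks from each mounted repo."""
    documents = markdown_files(app_dir / "checks")
    if repos_dir.is_dir():
        for repo in sorted(p for p in repos_dir.iterdir() if p.is_dir()):
            documents.extend(markdown_files(repo / ".claude-ops" / "checks"))
    return documents
```

Called as `discover_check_documents(Path("/app"), Path("/repos"))`, it would return the built-in checks followed by any checks contributed by mounted repos. Repos without a `.claude-ops/checks/` directory contribute nothing.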

Document Format Conventions

While documents are freeform prose, conventions have emerged for consistency:

Check documents follow a pattern:

Title (H1): what the check covers
"When to Run" (H2): which services trigger this check
"How to Check" (H2): embedded command examples in code blocks
"What's Healthy" (H2): evaluation criteria, often as a bullet list
"What to Record" (H2): data points to capture
"Special Cases" (H2): edge conditions and exceptions

Playbook documents follow a pattern:

Title (H1): the remediation procedure name
Tier (bold text): minimum permission tier required
"When to Use" (H2): triggering conditions
"Prerequisites" (H2): conditions to verify before execution
"Steps" (H2): ordered remediation steps with embedded commands
"If It Doesn't Work" (H2): escalation and failure handling

Tier prompts follow a pattern:

Title (H1): tier identity
Environment section: which variables to read
Permissions section: explicit lists of permitted and prohibited actions
Steps section: ordered procedural steps for the cycle
Output format section: expected output structure

These patterns are conventions, not enforced schemas. The agent interprets documents that deviate from the pattern using its general language understanding.

Data Flow

Check Execution Flow

Agent reads checks/http.md
  → Understands: "check services with web endpoints using curl"
  → Understands: "200-299 is healthy, 500-599 is unhealthy"
  → Understands: "prefer /health endpoint if available"

Agent retrieves service inventory
  → Service: myapp at https://myapp.example.com
  → Service has known /health endpoint

Agent adapts and executes:
  → curl -s -o /dev/null -w "HTTP %{http_code}" --max-time 10 https://myapp.example.com/health
  → Result: HTTP 200
  → Classification: healthy
  → Records: service=myapp, url=https://myapp.example.com/health, status=200, result=healthy
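For comparison, here is roughly what the classification step would look like as explicit code. The ranges for healthy and unhealthy come from the flow above, and the 401/403 case comes from the special case quoted under Key Decisions. The function name, the `behind_auth` flag, and the "needs-judgment" label are assumptions introduced for illustration; the document does not specify how other status codes are classified.

```python
def classify_http_status(status: int, behind_auth: bool = False) -> str:
    """Classify an HTTP status code using the rules stated in the check document."""
    if 200 <= status <= 299:
        return "healthy"
    if behind_auth and status in (401, 403):
        # Expected for services behind authentication.
        return "healthy"
    if 500 <= status <= 599:
        return "unhealthy"
    # Not covered by the stated rules; the agent would decide from context.
    return "needs-judgment"


print(classify_http_status(200))
print(classify_http_status(403, behind_auth=True))
print(classify_http_status(503))
```

The final branch is where the prose approach differs most from a script: a script must pick a fixed answer for every unlisted status, while the agent can weigh the response against what it knows about the service.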

Playbook Execution Flow

Agent reads playbooks/restart-container.md
  → Understands: requires Tier 2 minimum
  → Understands: check cooldown first (max 2 restarts per 4h)
  → Understands: record state → restart → wait → verify → update cooldown

Agent checks prerequisites:
  → Reads cooldown.json: myapp restart_count_4h = 1 (under limit)
  → Verifies container exists and is unhealthy

Agent executes steps:
  1. docker inspect myapp → records state
  2. docker restart myapp → executes restart
  3. Waits 20 seconds (not a database, so mid-range wait)
  4. curl https://myapp.example.com/health → HTTP 200
  5. Updates cooldown.json: restart_count_4h = 2
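The cooldown prerequisite in this flow can be sketched as follows. The limit of 2 restarts per 4 hours and the `restart_count_4h` field come from the flow above; the overall layout of cooldown.json (one object per service) and the function names are assumptions made for this example. Expiry of the 4-hour window is not modeled.

```python
import copy
import json

MAX_RESTARTS_PER_4H = 2  # limit stated in the playbook flow


def can_restart(cooldown: dict, service: str) -> bool:
    """True while the service is still under the restart limit."""
    count = cooldown.get(service, {}).get("restart_count_4h", 0)
    return count < MAX_RESTARTS_PER_4H


def record_restart(cooldown: dict, service: str) -> dict:
    """Return a copy of the state with the service's restart count incremented."""
    updated = copy.deepcopy(cooldown)
    entry = updated.setdefault(service, {})
    entry["restart_count_4h"] = entry.get("restart_count_4h", 0) + 1
    return updated


# Assumed file layout, mirroring the example values in the flow.
state = json.loads('{"myapp": {"restart_count_4h": 1}}')
if can_restart(state, "myapp"):
    state = record_restart(state, "myapp")
print(json.dumps(state))
```

After this run the count for myapp is 2, so a further `can_restart(state, "myapp")` returns False and the playbook's prerequisite would no longer be met.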

Extension Loading Flow

Tier 1 discovers repos:
  → /repos/myinfra/CLAUDE-OPS.md exists → reads manifest
  → /repos/myinfra/.claude-ops/checks/custom-api.md exists → queued for execution
  → /repos/myinfra/.claude-ops/playbooks/fix-custom.md exists → available for Tier 2+

Agent runs standard checks:
  → checks/http.md, checks/dns.md, checks/containers.md, ...

Agent runs repo-specific checks:
  → /repos/myinfra/.claude-ops/checks/custom-api.md

If remediation needed, repo playbooks are available:
  → /repos/myinfra/.claude-ops/playbooks/fix-custom.md

Key Decisions

Why prose instructions instead of executable scripts

Traditional automation uses scripts because execution must be deterministic and mechanical. In Claude Ops, the executor is an AI model that excels at interpreting natural language. Prose instructions leverage this capability:

Special cases are trivially expressed: "Services behind authentication may return 401/403 — this is expected and healthy" is a single sentence in prose but would require explicit conditional logic in a script.
Contextual adaptation is built-in: "If a service has a dedicated /health endpoint, prefer that" requires no code — the model adapts naturally.
No execution engine to maintain: There is no parser, schema validator, or DSL runtime. The model IS the runtime.

The trade-off is non-determinism: the same markdown may be interpreted slightly differently across runs. This is acceptable because infrastructure operations inherently involve judgment, and the model's variations are typically within the bounds of reasonable judgment.

Why no document schema validation

A schema would provide guarantees about document structure but would create friction:

Contributors would need to learn the schema
A validation step would be required before changes take effect
The schema would need to evolve as new document patterns emerge
The guarantees would add little, because the agent can already handle missing sections by falling back to general knowledge

Instead, the system relies on conventions (documented in this design) and the agent's ability to interpret incomplete or non-standard documents.

Why embedded command examples rather than separate command files

Commands are embedded as code blocks within the prose because:

Context preservation: The command appears next to the prose that explains when and how to use it, making the document self-contained.
Adaptation guidance: Prose around the command explains how to adapt it ("adjust based on service — databases need longer"), which would be lost if commands were in separate files.
Single source of truth: The check description, the command to execute, and the evaluation criteria are all in one document.

Why playbooks specify minimum tiers in the document

Tier gating is a metadata annotation within the playbook's markdown rather than a separate permissions configuration because:

Co-location: The tier requirement is part of the playbook's operational context, alongside prerequisites and steps.
Self-documenting: A human reading the playbook immediately sees what permission level is needed.
No external permission registry: There is no separate file mapping playbooks to tiers that could drift from the actual documents.

The agent enforces tier gating by reading the playbook's tier annotation and comparing it against its own tier level (defined in its prompt).
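The comparison the agent performs can be illustrated with a small sketch. The exact wording of the tier annotation is not specified in this design, so the `**Tier**: 2` format matched below is an assumption, as are the function names. Denying execution when no annotation is found is also a choice made for this example, not documented behavior.

```python
import re
from typing import Optional


def required_tier(playbook_text: str) -> Optional[int]:
    """Extract the minimum tier from a bold annotation such as '**Tier**: 2'."""
    match = re.search(r"\*\*Tier\*\*:\s*(\d+)", playbook_text)
    return int(match.group(1)) if match else None


def may_execute(playbook_text: str, agent_tier: int) -> bool:
    """Allow execution only if the agent's tier meets the playbook's minimum."""
    tier = required_tier(playbook_text)
    # Assumption: a playbook without a tier annotation is not executed.
    return tier is not None and agent_tier >= tier


playbook = "# Restart Container\n\n**Tier**: 2 minimum\n\n## When to Use\n"
print(may_execute(playbook, agent_tier=1))
print(may_execute(playbook, agent_tier=2))
```

Because the model performs this check rather than code, the gate is only as strict as the agent's adherence to its prompt.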

Trade-offs

Gained

Zero authoring friction: Anyone who can write prose can add a health check or playbook. No programming language, DSL syntax, or build tools required.
Self-documenting system: The instructions ARE the documentation. There is no separate doc layer that can drift from the actual procedures.
Graceful handling of ambiguity: Infrastructure operations involve judgment calls that are naturally expressed in prose and naturally handled by the AI agent.
Instant extensibility: Drop a markdown file into .claude-ops/checks/ and it takes effect on the next cycle. No registration, compilation, or restart.
Reduced maintenance surface: No script interpreter, DSL parser, schema validator, or plugin system to maintain.

Lost

Deterministic execution: The same markdown may be interpreted differently across runs. The agent may choose different wait times, check different endpoints, or classify edge cases differently.
Static analysis: Typos in embedded commands are not caught until runtime. A missing section in a check document is only discovered when the agent tries to use it.
Unit testability: There is no function to call with known inputs. Testing requires running the full agent against real or mock infrastructure.
Performance overhead: The agent reads and reasons about prose on every cycle. A pre-parsed script or DSL would execute faster. However, since the monitoring interval is 60 minutes and a cycle typically completes in seconds to a few minutes, this overhead is negligible.
Debugging opacity: When the agent does something unexpected, debugging requires reading agent logs to understand how it interpreted the instructions, rather than stepping through deterministic code.

Future Considerations

Document linting: A lightweight linter could verify that check documents contain the expected sections (When to Run, How to Check, etc.) without imposing a rigid schema. This would catch structural omissions before runtime.
Instruction testing framework: A test harness could run the agent against mock infrastructure with expected outcomes, validating that markdown instructions produce correct behavior. This would not be unit testing but rather integration testing of the instruction set.
Version-controlled instruction sets: Tagging instruction versions would allow rollback if a modified check or playbook causes issues. Currently, changes take effect on the next cycle with no rollback mechanism.
Instruction metrics: Tracking which checks and playbooks are read, how often they lead to successful remediation, and which produce false positives would enable data-driven refinement of the instruction set.
Template system: For teams contributing many similar checks, a lightweight template mechanism (markdown includes or parameterized documents) could reduce duplication while preserving the prose-based approach.
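As an illustration of the document linting idea, a linter of this kind could be as small as the sketch below. The section names are the ones listed under Document Format Conventions for check documents; the function name `missing_sections` and the choice to report omissions rather than fail are assumptions. The sketch does not skip headings that appear inside fenced code blocks.

```python
import re

# Expected H2 sections for a check document, per the conventions above.
EXPECTED_CHECK_SECTIONS = [
    "When to Run",
    "How to Check",
    "What's Healthy",
    "What to Record",
    "Special Cases",
]


def missing_sections(markdown_text: str) -> list:
    """Return the expected H2 headings that the document does not contain."""
    headings = {
        match.group(1)
        for match in re.finditer(r"^##\s+(.+?)\s*$", markdown_text, re.MULTILINE)
    }
    return [name for name in EXPECTED_CHECK_SECTIONS if name not in headings]


document = "# HTTP Checks\n\n## When to Run\n...\n\n## How to Check\n...\n\n## What's Healthy\n...\n"
print(missing_sections(document))
```

A result like this would be reported as a warning, in keeping with the idea that the patterns are conventions and not an enforced schema.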
