SPEC-0005: Mounted Repo Extension Model
Overview
Claude Ops monitors infrastructure health by running checks and remediating issues autonomously inside a Docker container. The infrastructure it monitors is defined across multiple repositories owned by different teams -- Ansible inventories, Docker image repos, Helm chart repos, and others. These repos are mounted as Docker volumes under a configurable parent directory (default: /repos/).
This specification defines a convention-based extension model that allows mounted repositories to extend the agent's behavior at runtime. Repos can declare their identity and rules via a manifest file (CLAUDE-OPS.md), provide custom health checks, remediation playbooks, operational skills, and MCP server configurations through a well-known directory structure (.claude-ops/). The agent discovers these extensions by scanning the repos directory at the start of each monitoring cycle, requiring zero configuration changes to the Claude Ops platform itself.
Definitions
- Mounted repo: An infrastructure repository mounted as a Docker volume subdirectory under
$CLAUDEOPS_REPOS_DIR(default:/repos/). - Manifest: A
CLAUDE-OPS.mdmarkdown file at a repo's root that declares the repo's identity, capabilities, and rules for the agent to follow. - Extension directory: A
.claude-ops/directory at a repo's root containing custom checks, playbooks, skills, and MCP configurations. - Check: A markdown file describing a health check procedure that the agent reads and executes by running the described commands.
- Playbook: A markdown file describing a remediation procedure for a specific failure scenario.
- Skill: A markdown file describing a freeform operational capability (maintenance, reporting, cleanup).
- MCP config: A JSON file (
.claude-ops/mcp.json) defining additional MCP servers to merge into the agent's baseline configuration. - Monitoring cycle: One complete execution of the agent's observe-investigate-remediate loop, initiated by
entrypoint.sh. - Tier: A permission level (1=observe, 2=safe remediation, 3=full remediation) that constrains what actions the agent may take.
Requirements
REQ-1: Repo Discovery via Directory Scanning
The system MUST discover mounted repos by scanning all immediate subdirectories of $CLAUDEOPS_REPOS_DIR at the start of each monitoring cycle. The scan MUST be performed every cycle so that newly mounted or removed repos are detected without requiring a container restart.
Scenario: Discover repos at startup
Given the agent starts a new monitoring cycle
And $CLAUDEOPS_REPOS_DIR is set to /repos
And /repos/ contains subdirectories infra-ansible/, docker-images/, and helm-charts/
When the agent scans for mounted repos
Then it identifies three repos: infra-ansible, docker-images, and helm-charts
Scenario: New repo detected on subsequent cycle
Given the agent completed a monitoring cycle with repos infra-ansible/ and docker-images/
And a new volume helm-charts/ is mounted under /repos/ before the next cycle
When the next monitoring cycle begins
Then the agent discovers and processes all three repos including the newly added helm-charts/
Scenario: Custom repos directory
Given $CLAUDEOPS_REPOS_DIR is set to /my/custom/repos
And /my/custom/repos/ contains subdirectory my-infra/
When the agent scans for mounted repos
Then it discovers the repo at /my/custom/repos/my-infra/
REQ-2: Manifest Discovery and Reading
The system MUST look for a CLAUDE-OPS.md file at the root of each discovered repo. If found, the agent MUST read the manifest and incorporate its declared capabilities and rules into the current monitoring cycle's context.
Scenario: Repo with CLAUDE-OPS.md manifest
Given a repo is mounted at /repos/infra-ansible
And it contains a CLAUDE-OPS.md at the root
When the agent scans for repo extensions
Then it reads the manifest and incorporates the repo's declared capabilities and rules
Scenario: Repo without CLAUDE-OPS.md manifest
Given a repo is mounted at /repos/docker-images
And it does not contain a CLAUDE-OPS.md file
When the agent scans for repo extensions
Then it proceeds to check for a .claude-ops/ directory and falls back to file inference (REQ-8)
REQ-3: Manifest Content Structure
The manifest (CLAUDE-OPS.md) SHOULD include the following sections to describe the repo to the agent:
- Kind: The type of infrastructure repo (e.g., "Ansible infrastructure", "Docker images", "Helm charts").
- Capabilities: What the repo provides to the agent (e.g.,
service-discovery,redeployment,image-inspection), including which tiers are required. - Rules: Constraints the agent MUST follow when interacting with this repo (e.g., "never modify files", "playbooks require Tier 3", "always use
--limit").
The agent MUST respect all rules declared in a manifest throughout the monitoring cycle across all tiers.
Scenario: Manifest declares tier-restricted capabilities
Given a repo's CLAUDE-OPS.md declares the capability redeployment with the note "Only Opus (Tier 3) should run playbooks"
When a Tier 2 agent encounters a failure that could be resolved by redeployment
Then the Tier 2 agent MUST NOT execute redeployment playbooks from this repo and MUST escalate to Tier 3
Scenario: Manifest declares operational rules
Given a repo's CLAUDE-OPS.md includes the rule "Always use --limit when running playbooks to target specific hosts"
When a Tier 3 agent runs a playbook from this repo
Then the agent includes the --limit flag targeting specific hosts
Scenario: Manifest declares read-only rule
Given a repo's CLAUDE-OPS.md includes the rule "Never modify any files in this repo"
When the agent processes this repo at any tier
Then the agent MUST NOT write to, delete, or modify any files within the repo's directory tree
REQ-4: Extension Directory Discovery
The system MUST check for a .claude-ops/ directory at the root of each discovered repo. If present, the system MUST scan its contents for the following subdirectories: checks/, playbooks/, skills/, and an mcp.json file.
Scenario: Repo with full extension directory
Given a repo at /repos/infra-ansible contains a .claude-ops/ directory
And the directory contains checks/, playbooks/, skills/, and mcp.json
When the agent scans for extensions
Then it discovers and processes all four extension types
Scenario: Repo with partial extension directory
Given a repo at /repos/docker-images contains a .claude-ops/ directory
And the directory contains only checks/
When the agent scans for extensions
Then it discovers the custom checks and does not error on missing playbooks/, skills/, or mcp.json
Scenario: Repo without extension directory
Given a repo at /repos/simple-repo does not contain a .claude-ops/ directory
When the agent scans for extensions
Then it skips extension scanning for this repo without error
REQ-5: Custom Health Checks
Custom health check files placed in .claude-ops/checks/ MUST be discovered and executed alongside the built-in health checks during Tier 1 observation. Each check file MUST be a markdown document describing what to check and how. The agent reads the check description and executes the appropriate commands.
Custom checks from all mounted repos MUST be combined into a single health check routine. If two repos define checks for the same service or concern, both MUST run.
Scenario: Custom check discovered and executed
Given a repo at /repos/infra-ansible has .claude-ops/checks/verify-backups.md
And the file describes checking for backup freshness in /data/backups/
When the Tier 1 agent runs health checks
Then it executes the backup freshness check described in the file alongside the built-in checks
Scenario: Multiple repos contribute checks
Given repo infra-ansible has .claude-ops/checks/verify-backups.md
And repo docker-images has .claude-ops/checks/check-base-images.md
When the Tier 1 agent runs health checks
Then both custom checks run alongside built-in checks from checks/
Scenario: Custom check format
Given a check file at .claude-ops/checks/verify-backups.md
Then the file SHOULD contain: a title, a description of what to check and how, criteria for healthy state, and criteria for unhealthy state
REQ-6: Custom Playbooks
Playbook files placed in .claude-ops/playbooks/ MUST be discovered and made available for remediation. Custom playbooks MUST follow the same tier permission model as built-in playbooks: a playbook that requires actions beyond the current tier's permissions MUST NOT be executed and MUST trigger escalation.
Scenario: Custom playbook available to correct tier
Given a repo has .claude-ops/playbooks/fix-media-perms.md that describes fixing file ownership (a Tier 2 action)
When a Tier 2 agent encounters a media permissions issue
Then the agent MAY use the custom playbook to remediate
Scenario: Custom playbook requires higher tier
Given a repo has .claude-ops/playbooks/redeploy-service.md that describes running Ansible playbooks (a Tier 3 action)
When a Tier 2 agent encounters a failure that matches this playbook
Then the Tier 2 agent MUST NOT execute the playbook and MUST escalate to Tier 3
Scenario: Custom playbook supplements built-in playbooks
Given the built-in playbooks include restart-container.md
And a repo provides .claude-ops/playbooks/restart-with-custom-config.md for its specific service
When the agent needs to remediate a failure for that service
Then both the built-in and custom playbooks are available for the agent to choose from
REQ-7: Custom Skills
Skill files placed in .claude-ops/skills/ MUST be discovered and made available to the agent. Skills are freeform capabilities (maintenance tasks, reporting, cleanup operations) and MUST follow the same tier permission model as checks and playbooks.
Scenario: Custom skill discovered
Given a repo has .claude-ops/skills/prune-old-logs.md describing a log cleanup procedure
When the agent scans for extensions
Then the skill is available for use when appropriate during the monitoring cycle
Scenario: Skill respects tier permissions
Given a custom skill describes an operation requiring docker restart (a Tier 2 action)
When a Tier 1 agent processes this skill
Then the Tier 1 agent MUST NOT execute the skill
REQ-8: Fallback Discovery for Repos Without Extensions
If a mounted repo has neither a CLAUDE-OPS.md manifest nor a .claude-ops/ directory, the agent SHOULD attempt to infer the repo's purpose by reading top-level files such as README.md, examining the directory structure, and inspecting configuration files.
Scenario: Repo without any Claude Ops files
Given a repo at /repos/legacy-scripts has no CLAUDE-OPS.md or .claude-ops/ directory
And it contains a README.md and several shell scripts
When the agent scans this repo
Then it reads the README.md and directory listing to infer the repo's purpose and notes what it found
Scenario: Graceful degradation
Given a repo cannot be fully understood through file inference When the agent completes scanning Then it records that the repo was discovered but has limited operational context, without treating this as an error
REQ-9: MCP Configuration Merging
Repos MAY provide additional MCP server definitions in .claude-ops/mcp.json. The entrypoint script MUST merge these into the baseline MCP configuration (/app/.claude/mcp.json) before each monitoring cycle. The merge MUST use additive semantics: repo-defined MCP servers are added to the baseline set.
If a repo defines an MCP server with the same name as a baseline server, the repo version MUST override the baseline version.
If multiple repos define an MCP server with the same name, the merge order MUST be deterministic. Repos MUST be processed in alphabetical order by directory name, with later repos overriding earlier ones on name collision.
Scenario: Repo adds a new MCP server
Given the baseline config defines servers docker, postgres, chrome-devtools, and fetch
And repo infra-ansible provides .claude-ops/mcp.json defining server ansible-inventory
When the entrypoint merges MCP configs
Then the merged config contains all five servers: docker, postgres, chrome-devtools, fetch, and ansible-inventory
Scenario: Repo overrides a baseline MCP server
Given the baseline config defines a postgres server with POSTGRES_CONNECTION set to a default
And repo prod-db provides .claude-ops/mcp.json redefining postgres with a different connection string
When the entrypoint merges MCP configs
Then the merged config uses the postgres definition from prod-db, not the baseline
Scenario: Multiple repos with name collision
Given repo alpha-infra defines MCP server custom-monitor
And repo beta-infra also defines MCP server custom-monitor with different settings
When the entrypoint merges configs in alphabetical order
Then beta-infra's definition of custom-monitor wins because it is processed after alpha-infra
Scenario: Baseline preserved on first run
Given the baseline MCP config exists at /app/.claude/mcp.json
When the entrypoint runs for the first time
Then it saves a copy of the baseline to /app/.claude/mcp.json.baseline before merging
Scenario: Baseline restored each cycle
Given a baseline backup exists at /app/.claude/mcp.json.baseline
When a new monitoring cycle begins
Then the entrypoint restores the baseline before re-merging repo configs, ensuring removed repos or changed configs take effect
REQ-10: Extension Tier Permission Enforcement
All repo-provided extensions (checks, playbooks, skills) MUST operate within the same permission tier model as built-in extensions. The agent MUST NOT execute an operation from a repo extension that exceeds the current tier's permissions, regardless of what the extension instructs.
Scenario: Repo extension attempts privilege escalation
Given a repo provides a check file that instructs the agent to run docker restart (a Tier 2 action)
When a Tier 1 agent reads this check
Then the Tier 1 agent MUST NOT execute the restart command and MUST note the issue for escalation
Scenario: Repo extension within tier permissions
Given a repo provides a check file that instructs the agent to run curl to check an HTTP endpoint
When a Tier 1 agent reads this check
Then the Tier 1 agent executes the curl command, as HTTP checks are within Tier 1 permissions
REQ-11: Read-Only Mount Convention
Repos SHOULD be mounted with the :ro (read-only) Docker volume flag by default. The system MUST NOT require write access to mounted repos under normal operation. If a repo requires write access, this MUST be explicitly documented in the repo's CLAUDE-OPS.md manifest.
Scenario: Standard read-only mount
Given a repo is mounted with the :ro flag
When the agent processes the repo
Then all operations complete successfully without requiring write access to the repo
Scenario: Write access explicitly documented
Given a repo is mounted with :rw
And its CLAUDE-OPS.md documents the reason write access is needed
When the agent processes the repo
Then the agent may use write operations only as described in the manifest
REQ-12: Unified Repo Map
After scanning all repos, the agent MUST build a unified map of discovered repos, their capabilities, extensions, and rules. This map MUST be used throughout the monitoring cycle to inform health check execution, remediation decisions, and escalation context.
Scenario: Unified map constructed
Given repos infra-ansible (capabilities: service-discovery, redeployment), docker-images (capabilities: image-inspection), and helm-charts (capabilities: redeployment) are mounted
When the agent completes repo scanning
Then the unified map contains all three repos with their respective capabilities, custom checks, playbooks, skills, and rules
Scenario: Map used during remediation
Given the unified map indicates that infra-ansible provides redeployment capability restricted to Tier 3
When a Tier 3 agent needs to redeploy a service
Then it consults the map to locate the correct playbook and follows the repo's declared rules
REQ-13: Extension Composability
Extensions from multiple repos MUST compose without conflict. Two repos defining checks for different services MUST both be executed. The system MUST NOT require coordination between repos except for MCP server naming (REQ-9).
Scenario: Independent checks from multiple repos
Given repo infra-ansible provides a check for backup freshness
And repo docker-images provides a check for base image staleness
When the agent runs health checks
Then both checks execute independently without conflict
Scenario: Multiple repos provide playbooks for different services
Given repo web-app provides a playbook for restarting the web service
And repo database provides a playbook for PostgreSQL recovery
When failures are detected in both services
Then the appropriate playbook from each repo is available for the corresponding failure