Design: Upstream Model Auto-Discovery
Context
Claude Ops invokes the Claude Code CLI as a subprocess for each escalation tier and
passes the per-tier model via --model (SPEC-0010, ADR-0010). When the operator points
Claude Ops at an upstream OpenAI/Anthropic-compatible gateway (LiteLLM) by setting the
ANTHROPIC_BASE_URL environment variable (see docker-compose.yaml, .env.example,
and README), the CLI's API traffic is routed through that gateway, which can in turn
route to many backing models.
Currently the tier model fields — tier1_model, tier2_model, tier3_model — are
free-text strings in internal/config/config.go (Config struct, loaded via viper).
They are read and written through:
- The REST API: handleAPIGetConfig / handleAPIUpdateConfig in internal/web/api_handlers.go, with DTOs APIConfig / APIUpdateConfigRequest in internal/web/api_types.go (SPEC-0017 REQ-12/REQ-13).
- The HTML config page: handleConfigGet / handleConfigPost in internal/web/handlers.go, rendering templates/config.html (SPEC-0008 REQ-10).
Operators must know exact model identifiers to type them in. LiteLLM (and any
OpenAI-compatible gateway) exposes GET /v1/models listing the routable models, which we
can query to populate a selectable list.
A critical disambiguation: Claude Ops ALSO serves its own OpenAI-compatible
GET /v1/models endpoint (handleModels in internal/web/chat_handler.go, SPEC-0024 /
ADR-0020) advertising the synthetic claude-ops, claude-ops-tier1/2/3 identifiers used
by chat clients. That is a SERVED endpoint for inbound chat clients. This capability adds
a CONSUMED query against the operator's UPSTREAM gateway. They are unrelated and must not
be merged.
This spec depends on SPEC-0008 (configuration) and SPEC-0017 (REST API config endpoints).
Goals / Non-Goals
Goals
- Discover the models a configured upstream gateway can route to, via ${ANTHROPIC_BASE_URL}/v1/models.
- Expose the discovered list through the config API and the dashboard config page so operators pick from a list instead of typing blind.
- Let operators assign any discovered (or manually typed) model to any tier.
- Cache discovered models with a TTL plus on-demand refresh; keep upstream load low.
- Fail safe: discovery problems never break config rendering, retrieval, or saving.
Non-Goals
- Validating that a chosen model actually works end-to-end (the gateway/CLI surface that at runtime; an invalid model fails at invocation, as it does today).
- Tracking the upstream base URL or credential as new managed config fields — they remain process environment variables consumed at request time. (A future spec may promote them to managed config.)
- Changing Claude Ops' own served /v1/models endpoint (SPEC-0024).
- Per-model metadata beyond the identifier (context window, pricing, capabilities): out of scope for this iteration; only id is consumed.
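Because only id is consumed, the parsing step can stay very small. The following is a minimal sketch, not the project's implementation: it assumes the gateway answers with a JSON object whose data array holds entries with an id field (the shape shared by OpenAI-style and Anthropic-style models lists). The names modelList and parseModelIDs, the de-duplication, and the model ids in main are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelList mirrors only the part of a models-list response that discovery
// needs. Any other fields (display names, timestamps, paging cursors) are
// ignored by encoding/json.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// parseModelIDs returns the non-empty ids in response order. Dropping
// duplicates is an assumption of this sketch, not a stated requirement.
func parseModelIDs(body []byte) ([]string, error) {
	var list modelList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("parse models list: %w", err)
	}
	seen := make(map[string]bool, len(list.Data))
	ids := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		if m.ID == "" || seen[m.ID] {
			continue
		}
		seen[m.ID] = true
		ids = append(ids, m.ID)
	}
	return ids, nil
}

func main() {
	// Illustrative payload; the ids are made up.
	body := []byte(`{"object":"list","data":[{"id":"model-a","object":"model"},{"id":"model-b","owned_by":"gateway"}]}`)
	ids, err := parseModelIDs(body)
	fmt.Println(ids, err)
}
```

A parse error here would map to the "discovery unavailable" state rather than to an error response.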
Decisions
Discovery source is the upstream gateway, read from the environment
Choice: Build the discovery target from ANTHROPIC_BASE_URL (env var), querying the
gateway's models-list endpoint (${ANTHROPIC_BASE_URL}/v1/models), authenticating with
ANTHROPIC_API_KEY.
Rationale: This is exactly where the operator already configures the gateway the CLI
routes through; reusing it guarantees the discovered list matches what the tiers will
actually hit. No new configuration surface is required.
Alternatives considered:
- Promote base URL / key into the Config struct now: rejected. It is a larger change, and the env vars are the established source of truth consumed by the CLI subprocess; promoting them is a separable decision.
- Query Claude Ops' own /v1/models: rejected. That returns synthetic tier IDs, not the upstream model inventory; it would be circular and wrong.
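Deriving the discovery target from the environment could look like the sketch below. It is illustrative only: modelsURL is a hypothetical helper, and the rule that only absolute http(s) URLs are accepted is an assumption of this sketch. With the SDK-based decision that follows, the base URL would instead be handed to the SDK, which builds the path itself.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// modelsURL appends the models-list path to a gateway base URL.
// ok is false when base is empty or is not an absolute http(s) URL, which a
// caller would treat as "discovery unavailable" rather than as an error.
func modelsURL(base string) (target string, ok bool) {
	u, err := url.Parse(base)
	if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
		return "", false
	}
	// Tolerate a trailing slash on the configured base URL.
	return strings.TrimRight(base, "/") + "/v1/models", true
}

func main() {
	// The target comes only from the process environment, never from
	// request input.
	target, ok := modelsURL(os.Getenv("ANTHROPIC_BASE_URL"))
	fmt.Println(target, ok)
}
```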
Use the Anthropic Go SDK for the models query (not a hand-rolled HTTP client)
Choice: Fetch the model list via the official Anthropic Go SDK
(github.com/anthropics/anthropic-sdk-go, already a project dependency) configured with
option.WithBaseURL(ANTHROPIC_BASE_URL), option.WithAPIKey(ANTHROPIC_API_KEY), a
bounded option.WithHTTPClient + option.WithRequestTimeout, and option.WithMaxRetries(0),
calling client.Models.ListAutoPaging.
Rationale: Per reviewer request, use a maintained SDK rather than hand-rolling the
HTTP call. The gateway is Anthropic-compatible (it is configured through
ANTHROPIC_BASE_URL/ANTHROPIC_API_KEY and is the same endpoint the CLI subprocess
hits), and the Anthropic SDK is already vendored, so this adds no new dependency and
matches the client family used elsewhere. The SDK authenticates via the x-api-key
header (which LiteLLM accepts for Anthropic-compatible routing) rather than
Authorization: Bearer. Body-size bounding is preserved by wrapping the injected
http.Client's transport with a LimitReader-based round-tripper.
Alternatives considered:
- The OpenAI Go SDK (github.com/openai/openai-go): rejected. It would add a second, redundant LLM SDK for a single list call when the Anthropic SDK is already present and the endpoint is reached as an Anthropic-compatible gateway.
- Keep the hand-rolled http.Client + manual JSON parsing: rejected per reviewer feedback in favor of the maintained SDK.
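The body-size bound mentioned in the rationale can be sketched with the standard library alone. This is an assumption-laden illustration, not the project's code: limitTransport, limitedBody, staticTransport, and readVia are hypothetical names, and the fake transport stands in for the real gateway so the example runs without network access. In the design above, an http.Client built this way would be the one injected into the SDK.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// limitTransport caps how many response-body bytes a caller can read.
type limitTransport struct {
	base http.RoundTripper
	max  int64
}

// limitedBody pairs the capped reader with the original Close method so the
// underlying body is still closed by the caller.
type limitedBody struct {
	io.Reader
	io.Closer
}

func (t limitTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	resp.Body = limitedBody{Reader: io.LimitReader(resp.Body, t.max), Closer: resp.Body}
	return resp, nil
}

// staticTransport is a stand-in upstream that answers every request with
// its own contents as the body.
type staticTransport string

func (s staticTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(string(s))),
	}, nil
}

// readVia performs one GET through a bounded client and returns what the
// caller was able to read.
func readVia(upstreamBody string, max int64) (string, error) {
	client := &http.Client{
		Timeout:   5 * time.Second, // bounded request time
		Transport: limitTransport{base: staticTransport(upstreamBody), max: max},
	}
	resp, err := client.Get("http://gateway.invalid/v1/models")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	got, err := readVia(`{"data":[{"id":"model-a"}]}`, 8)
	fmt.Printf("%q %v\n", got, err)
}
```

io.LimitReader truncates silently, so an oversized JSON body reaches the parser cut short and fails to decode, which lines up with treating overflow as a parse failure.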
Dedicated discovery component with a cached, single-flight client
Choice: Introduce a small discovery component (internal/models) that owns a bounded
http.Client (injected into the SDK), a mutex-guarded cache holding
(modelIDs, lastRefreshed, available), and single-flight refresh.
Rationale: Centralizes timeout, auth, parsing, caching, and concurrency safety in one
testable place; both the API handler and the HTML handler consume the same cache.
Single-flight prevents refresh stampedes against the gateway.
Alternatives considered:
- Query on every config render: rejected — adds latency and upstream load to every page view and API read.
- Background poller only (no lazy refresh): rejected — wastes upstream calls when the config page is rarely viewed; lazy-on-access with TTL is simpler and sufficient. A background refresh MAY be layered in later without changing the contract.
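The mutex-guarded cache with single-flight refresh could be sketched as follows, using only the standard library. Everything here is an assumption for illustration: the type and method names, the joined counter (kept only so the demo can hold the refresh open deterministically), and the choice to replace the snapshot with an empty, unavailable one when the fetch fails.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// snapshot is the cached tuple (modelIDs, lastRefreshed, available).
type snapshot struct {
	ModelIDs      []string
	LastRefreshed time.Time
	Available     bool
}

// cache guards the snapshot with a mutex and collapses concurrent refreshes
// into a single call to fetch.
type cache struct {
	fetch func() ([]string, error) // hypothetical upstream call

	mu       sync.Mutex
	snap     snapshot
	inflight chan struct{} // non-nil while a refresh is running
	joined   int           // callers that shared someone else's refresh
}

// refresh runs fetch unless a refresh is already running, in which case the
// caller waits for that refresh and reads the snapshot it stored.
func (c *cache) refresh() snapshot {
	c.mu.Lock()
	if done := c.inflight; done != nil {
		c.joined++
		c.mu.Unlock()
		<-done
		return c.get()
	}
	done := make(chan struct{})
	c.inflight = done
	c.mu.Unlock()

	ids, err := c.fetch() // the upstream call runs outside the lock

	c.mu.Lock()
	c.snap = snapshot{ModelIDs: ids, LastRefreshed: time.Now(), Available: err == nil}
	c.inflight = nil
	c.mu.Unlock()
	close(done)
	return c.get()
}

// get returns the current snapshot without touching the upstream.
func (c *cache) get() snapshot {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.snap
}

// upstreamCalls starts n concurrent refreshes and reports how many times the
// fake upstream was actually called.
func upstreamCalls(n int) int {
	calls := 0
	c := &cache{}
	c.fetch = func() ([]string, error) {
		calls++
		// Hold the refresh open until every other caller has joined it.
		for {
			c.mu.Lock()
			joined := c.joined
			c.mu.Unlock()
			if joined >= n-1 {
				return []string{"model-a"}, nil
			}
			time.Sleep(time.Millisecond)
		}
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.refresh()
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println("upstream calls:", upstreamCalls(5))
}
```

Running the upstream call outside the lock keeps cached reads from blocking behind a slow gateway.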
New API endpoints rather than folding into GET /api/v1/config
Choice: Add GET /api/v1/models/available (with ?refresh=true) and
POST /api/v1/models/available/refresh, separate from GET /api/v1/config.
Rationale: Keeps the config payload stable (SPEC-0017 consumers unaffected), lets the
discovered list and its freshness metadata evolve independently, and gives the UI a clean
endpoint to refresh without re-fetching all config. Discovery latency/failure is isolated
from the config read path. Handlers should follow the existing handleAPI* naming
convention in internal/web/api_handlers.go (e.g. handleAPIModelsAvailable).
Note on auth baseline: internal/web/server.go currently registers /api/v1/* routes
with no auth/security-header/CSRF middleware, so the new endpoints inherit the same
(currently unauthenticated) baseline as /api/v1/config. The spec's security
requirements define the target regime; they do not assume reusable middleware exists
today. Establishing that middleware project-wide is a separable effort.
Alternatives considered:
- Embed available_models in APIConfig: rejected. It couples config reads to upstream latency and bloats a stable DTO; a failed discovery shouldn't perturb config reads.
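The routing behavior of the two endpoints can be sketched as below. This is a hedged illustration: handleAPIModelsAvailable follows the naming example given above, while handleAPIModelsRefresh, lister, and probe are hypothetical names, the response body is omitted, and the method checks are written out by hand rather than assuming any particular router feature.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// lister is the slice of the discovery component these handlers need.
// force=true asks for a refresh that bypasses the TTL.
type lister func(force bool)

// handleAPIModelsAvailable serves GET /api/v1/models/available[?refresh=true].
func handleAPIModelsAvailable(list lister) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", http.MethodGet)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		list(r.URL.Query().Get("refresh") == "true")
		w.WriteHeader(http.StatusOK) // body omitted in this sketch
	}
}

// handleAPIModelsRefresh serves POST /api/v1/models/available/refresh.
func handleAPIModelsRefresh(list lister) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.Header().Set("Allow", http.MethodPost)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		list(true) // an explicit refresh always bypasses the TTL
		w.WriteHeader(http.StatusOK)
	}
}

// probe sends one request through the routes and reports the status code and
// whether the discovery component was asked to bypass the TTL.
func probe(method, target string) (status int, forced bool) {
	list := func(force bool) { forced = force }
	mux := http.NewServeMux()
	mux.Handle("/api/v1/models/available", handleAPIModelsAvailable(list))
	mux.Handle("/api/v1/models/available/refresh", handleAPIModelsRefresh(list))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code, forced
}

func main() {
	for _, tc := range [][2]string{
		{"GET", "/api/v1/models/available"},
		{"GET", "/api/v1/models/available?refresh=true"},
		{"POST", "/api/v1/models/available/refresh"},
	} {
		status, forced := probe(tc[0], tc[1])
		fmt.Println(tc[0], tc[1], status, forced)
	}
}
```

Note that neither handler reads an upstream URL from the request; the request can only choose between cached and forced refresh.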
Free-text fallback is always permitted
Choice: Tier model fields remain free-text-capable; discovery only augments the UI
with a dropdown and the API with a list. Any value (discovered or not) is accepted and
persisted.
Rationale: Preserves existing behavior, supports gateways that under-report models,
private/aliased models, and air-gapped setups with no ANTHROPIC_BASE_URL. Discovery is
an aid, never a gate.
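One way the config page could keep a manually typed value selectable alongside discovered models is sketched below. tierOptions is a hypothetical helper and the ordering (current value first when it was not discovered) is an arbitrary choice of this sketch; the design only requires that any value remains accepted.

```go
package main

import "fmt"

// tierOptions returns the choices to render for one tier's model field: the
// discovered ids, plus the tier's current value when discovery did not
// report it (for example a private alias or an air-gapped setup).
func tierOptions(discovered []string, current string) []string {
	opts := make([]string, 0, len(discovered)+1)
	found := current == "" // an empty current value needs no extra option
	for _, id := range discovered {
		if id == current {
			found = true
		}
		opts = append(opts, id)
	}
	if !found {
		// Keep the operator's value; discovery never removes it.
		opts = append([]string{current}, opts...)
	}
	return opts
}

func main() {
	fmt.Println(tierOptions([]string{"model-a", "model-b"}, "my-private-alias"))
	fmt.Println(tierOptions(nil, "my-private-alias"))
}
```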
Graceful degradation returns 200 with discovery_available: false
Choice: When discovery can't run (no base URL, timeout, non-2xx, parse error), the
available-models endpoint returns 200 with an empty list and
discovery_available: false instead of an error status; the HTML page falls back to
free-text inputs.
Rationale: Discovery is best-effort metadata. Returning an error status would push
failure handling onto every consumer and risk breaking the config page. A clear flag lets
clients decide how to present the fallback.
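A degraded response might be produced as in the sketch below. Only discovery_available is named in this design; the models and last_refreshed field names, the availableModelsResponse type, and the helpers are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// availableModelsResponse is a hypothetical DTO for the available-models
// endpoint.
type availableModelsResponse struct {
	Models             []string   `json:"models"`
	DiscoveryAvailable bool       `json:"discovery_available"`
	LastRefreshed      *time.Time `json:"last_refreshed,omitempty"`
}

// errDiscovery is a stand-in for any discovery failure (no base URL,
// timeout, non-2xx, parse error).
var errDiscovery = errors.New("upstream unavailable")

// writeAvailable always answers 200. On failure it sends an empty list with
// discovery_available set to false; the error text is never written out.
func writeAvailable(w http.ResponseWriter, ids []string, err error) {
	resp := availableModelsResponse{Models: []string{}} // degraded default
	if err == nil {
		now := time.Now().UTC()
		resp.DiscoveryAvailable = true
		resp.LastRefreshed = &now
		if ids != nil {
			resp.Models = ids
		}
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(resp)
}

// render runs writeAvailable against an in-memory recorder.
func render(ids []string, err error) (int, string) {
	rec := httptest.NewRecorder()
	writeAvailable(rec, ids, err)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := render(nil, errDiscovery)
	fmt.Print(code, " ", body)
}
```

A client can then branch on the flag alone, for example by showing free-text inputs whenever discovery_available is false.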
Architecture
The web Server gains a model-discovery dependency. The API and HTML config handlers
read the cached discovered list through it; the cache lazily refreshes from the upstream
gateway on TTL expiry or explicit refresh.
Refresh / cache decision flow for a single access:
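One reading of this flow, inferred only from the prose in this document, is sketched below as a pure decision function. The action names, the decide signature, and the TTL value are hypothetical; a failed upstream refresh would additionally end in the degrade outcome.

```go
package main

import (
	"fmt"
	"time"
)

// action is the outcome of one cache access.
type action int

const (
	serveCached     action = iota // answer from the cache, no upstream call
	refreshUpstream               // run a single-flight upstream query
	degrade                       // empty list, discovery_available: false
)

func (a action) String() string {
	return [...]string{"serveCached", "refreshUpstream", "degrade"}[a]
}

// decide picks the outcome for a single access.
//   - hasBaseURL: ANTHROPIC_BASE_URL is set
//   - force:      the caller asked for an explicit refresh
//   - populated:  the cache holds a result from an earlier refresh
//   - age, ttl:   time since the last refresh and the configured TTL
func decide(hasBaseURL, force, populated bool, age, ttl time.Duration) action {
	switch {
	case !hasBaseURL:
		return degrade // nothing to query, even on explicit refresh
	case force:
		return refreshUpstream // explicit refresh is the only TTL bypass
	case !populated || age >= ttl:
		return refreshUpstream // first access or TTL expired
	default:
		return serveCached
	}
}

func main() {
	ttl := 5 * time.Minute // hypothetical TTL
	fmt.Println(decide(true, false, true, time.Minute, ttl))
	fmt.Println(decide(true, false, true, 10*time.Minute, ttl))
	fmt.Println(decide(true, true, true, time.Minute, ttl))
	fmt.Println(decide(false, true, false, 0, ttl))
}
```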
Risks / Trade-offs
- Upstream gateway slow or down stalls config UX → bounded HTTP timeout (default ~5s), cache serves prior results, and degradation returns immediately with a fallback flag.
- Refresh stampede floods the gateway → single-flight collapses concurrent refreshes to one in-flight upstream call; explicit refresh is the only TTL bypass.
- Leaking the upstream API key → key is read from the environment only, passed solely to the SDK (which sends it in the x-api-key header), and excluded from logs, errors, API payloads, and rendered HTML; structured logging avoids interpolating secrets.
- Discovered list diverges from what the CLI can actually use → discovery is advisory; free-text assignment is always allowed and the runtime CLI invocation is unchanged, so an unusable model fails exactly as it does today.
- Oversized/malicious upstream body → response read is bounded to a maximum size; overflow is treated as a parse failure (discovery unavailable).
- New endpoints widen attack surface → both sit behind the same auth regime as /api/v1/config (which is unauthenticated today; see the note on auth baseline); the upstream target is derived only from ANTHROPIC_BASE_URL, never from request input.
Migration Plan
Additive; no data migration. Deployment steps:
- Ship the discovery component, the two new API endpoints, and the config-page dropdown with refresh control. Behavior is unchanged when ANTHROPIC_BASE_URL is unset.
- Operators with a gateway configured see populated dropdowns automatically on next config-page load; no action required.
- Rollback is safe: removing the feature reverts the config page to free-text inputs and drops the new endpoints. Persisted tier model values (free-text strings) are unaffected since the storage format is unchanged.
Open Questions
- Should the upstream base URL and key eventually become managed Config fields (so they can be changed from the dashboard without a restart), rather than env-only? Deferred to a follow-up spec.
- Should a background refresh run on the existing run interval to keep the cache warm, or is lazy-on-access sufficient? Lazy is the initial choice; revisit if operators report staleness.
- Should discovery surface per-model metadata (context window, pricing) when the gateway provides it, to aid tier selection? Out of scope for this iteration.