ADR-0014: Real-Time Dashboard and Events System

Context and Problem Statement

The dashboard currently has two problems:

Static pages: The sessions list and overview page are rendered once on load. When a session starts, completes, or changes status, the user must manually refresh to see updates. For a monitoring dashboard, this is unacceptable.
Services section is misplaced: The Services page duplicates functionality that belongs in external monitoring tools (Uptime Kuma, Prometheus, etc.). Claude Ops should not be a service status dashboard — it's an autonomous agent. What the operator actually needs is a feed of notable events the agent discovers: "Jellyfin's database is corrupted," "Restarted nginx after config drift," "DNS resolution failing for 3 services." These are the agent's observations and actions, surfaced as a first-class concept.

How should the dashboard become real-time, and what should replace the Services section?

Decision Drivers

The operator should see session status changes without refreshing the page
The agent should be able to surface notable findings as discrete, severity-tagged events
Events should flow in real-time — when the agent finds something during a session, it appears immediately in the dashboard
The solution must work with the existing HTMX + server-rendered HTML architecture (no client-side JS framework)
Events need persistence (SQLite) so they survive page reloads and are browsable historically
The approach should be simple enough that a prompt instruction is all the LLM needs to emit events

Considered Options

Option A: HTMX polling + parsed event markers — Use hx-trigger="every Ns" for page sections. Parse a structured marker format ([EVENT:level] message) from the LLM's assistant text during stream-json processing. Insert parsed events into a new events table.
Option B: SSE-driven page updates — Create a global SSE endpoint that pushes HTML fragments for sessions list, overview, and events feed. Use htmx-ext-sse with sse-swap to update page sections.
Option C: MCP tool for events — Expose an MCP server with a create_event tool that the LLM calls directly to insert events into the database.

Decision Outcome

Chosen option: Option A (HTMX polling + parsed event markers), because it requires zero new infrastructure (no new SSE endpoints, no MCP server), works reliably with HTMX's built-in polling, and the event parsing piggybacks on the existing stream-json processing pipeline. The LLM just writes a line in a known format — no tool calls needed.

Consequences

Good, because HTMX polling is trivial to add (hx-trigger="every 5s") and works with the existing server-rendered architecture
Good, because event parsing integrates into the existing FormatStreamEvent pipeline — no new goroutines or channels
Good, because the LLM only needs a prompt instruction to emit events; no new tools or MCP configuration
Good, because removing Services simplifies the codebase and the nav
Bad, because polling adds server load proportional to connected clients x poll frequency (acceptable for single-user dashboard)
Bad, because parsed markers can produce false positives if the LLM accidentally outputs the pattern (mitigated by requiring strict format and ignoring tool result blocks)

Confirmation

Sessions list updates within 5 seconds when a session starts or completes
Overview page reflects the current session state without manual refresh
Events appear in the feed within seconds of the LLM writing them during a session
Removing Services does not break any existing functionality

Pros and Cons of the Options

Option A: HTMX Polling + Parsed Event Markers

Sessions list and overview sections add hx-trigger="every 5s" with hx-get targeting themselves. The server re-renders the HTML fragment and HTMX swaps it in. For events, the stream-json parser scans assistant text blocks for lines matching [EVENT:info|warning|critical] message. Matched events are inserted into an events table with session_id, level, message, service (optional), and timestamp.

Good, because HTMX polling is 2 attributes — zero JavaScript, zero new endpoints
Good, because event parsing is ~20 lines added to the existing FormatStreamEvent function
Good, because the events table is a simple append-only log, trivial to query and paginate
Good, because the LLM needs only a prompt section explaining the [EVENT:...] format
Neutral, because polling interval trades off freshness vs server load (5s is a good default)
Bad, because polling re-renders the full fragment each time (but fragments are small HTML tables)

Option B: SSE-Driven Page Updates

A new global SSE endpoint (/stream/dashboard) pushes HTML fragments when state changes. The hub would need a new "dashboard" channel that the session manager publishes to when sessions start, complete, or emit events.

Good, because updates are instant — no polling delay
Good, because it uses the existing SSE infrastructure (hub pattern)
Bad, because it requires a new SSE endpoint and a new publish channel type
Bad, because managing SSE subscriptions for page sections is complex (what if the user navigates between pages?)
Bad, because HTMX SSE swap (sse-swap) requires specific event naming and fragment targeting that gets fragile with multiple dynamic sections
Bad, because it's significantly more complex than polling for marginal latency improvement on a single-user dashboard

Option C: MCP Tool for Events

Expose an MCP server endpoint that provides a create_event tool. The LLM calls it during sessions to insert events. The dashboard polls or uses SSE to display them.

Good, because it gives the LLM explicit control over event creation (structured input, error handling)
Good, because events are guaranteed to be well-formed (validated by the tool handler)
Bad, because it requires running an MCP server alongside the main process
Bad, because it adds a dependency between the Claude CLI session and a local service
Bad, because MCP tools add latency to each event creation (tool call round-trip)
Bad, because it requires MCP configuration changes and is harder to test locally

Architecture Diagram

More Information

Event marker format: [EVENT:info|warning|critical] message — only parsed from assistant text blocks (not tool results, not user messages). Optional service tag: [EVENT:warning:jellyfin] message.
Event levels: info (routine observation), warning (degraded but not down), critical (needs human attention immediately).
Events table schema: id, session_id, level, service, message, created_at. Simple append-only log.
Prompt integration: Add an "Event Reporting" section to the tier prompts explaining the format. The agent is instructed to emit events for: service state changes, remediation actions taken, cooldown limit hits, and anything requiring human attention.
Removing Services: Drop /services, /services/{name} routes, handlers, templates, and the Services nav link. The health_checks table stays (the agent still writes to it internally) but it's not exposed in the UI.
Overview redesign: Replace the services grid on the overview page with the latest events feed and a running session indicator.
Related: ADR-0008 (web dashboard), ADR-0011 (session CLI output), ADR-0013 (ad-hoc sessions)

Context and Problem Statement​

Decision Drivers​

Considered Options​

Decision Outcome​

Consequences​

Confirmation​

Pros and Cons of the Options​

Option A: HTMX Polling + Parsed Event Markers​

Option B: SSE-Driven Page Updates​

Option C: MCP Tool for Events​

Architecture Diagram​

More Information​