theaios-agent-monitor

Governance-first observability for AI agents -- real-time metrics, anomaly detection, kill switches, compliance export.

theaios-agent-monitor is a monitoring engine that lets you observe, baseline, and control AI agent behavior with YAML configs. Record events, compute real-time metrics, detect anomalies via z-score baselines, trigger automatic kill switches, and export compliance reports. No external services. No vendor lock-in.

Why Governance-First?

Because monitoring AI agents is not the same as monitoring web apps. The failure modes are different:

- An agent that starts denying everything is broken -- even if the server is up.
- A cost spike from $0.01/min to $5/min is invisible to standard APM tools.
- A sudden change in denial patterns means your guardrails are either catching a real attack or misconfigured.
- When something goes wrong, you need to kill the agent instantly -- not wait for an on-call rotation.

theaios-agent-monitor is built for these scenarios. Traditional observability tools (Datadog, Grafana, LangSmith) collect data. This library collects data and acts on it -- anomaly detection, automatic kill switches, compliance export.

What It Does

Agent event (action, guardrail_trigger, denial, approval_request, cost, error, ...)
    |
    v
EventStore (append-only JSONL log)
    |
    v
MetricsEngine (rolling window: event_count, denial_rate, cost/min, latency)
    |
    v
BaselineTracker (Welford's algorithm: mean, stddev, z-score)
    |
    v
AnomalyDetector (z-score threshold rules with cooldown)
    |
    v
KillSwitch (manual or auto kill/revive, persistence)
    |
    v
AlertDispatcher (console, file, webhook)
    |
    v
ComplianceExporter (SOC 2, GDPR, JSON)

Every event is recorded. Every metric is computed in real time. Every anomaly is detectable. Every agent is killable.
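
To make the BaselineTracker and AnomalyDetector stages concrete, here is a minimal sketch of Welford's online algorithm feeding a z-score rule with a cooldown. It is not the library's implementation: `RunningBaseline`, `ZScoreRule`, and their parameters are hypothetical names chosen for illustration, and the threshold of 3.0 and cooldown of 60 seconds are assumed values.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Online mean/variance via Welford's algorithm (hypothetical class)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; with fewer than two samples report 0.0.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def z_score(self, value: float) -> float:
        sd = self.stddev
        return 0.0 if sd == 0.0 else (value - self.mean) / sd


class ZScoreRule:
    """Fires when |z| exceeds a threshold, at most once per cooldown period."""

    def __init__(self, threshold: float = 3.0, cooldown_seconds: float = 60.0):
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self._last_fired = None

    def check(self, baseline: RunningBaseline, value: float, now=None) -> bool:
        now = time.time() if now is None else now
        if abs(baseline.z_score(value)) <= self.threshold:
            return False
        if self._last_fired is not None and now - self._last_fired < self.cooldown_seconds:
            return False  # still cooling down, suppress the repeat alert
        self._last_fired = now
        return True


# Build a baseline from typical per-minute costs, then test a spike.
baseline = RunningBaseline()
for cost in [0.010, 0.012, 0.009, 0.011, 0.010, 0.008]:
    baseline.update(cost)

rule = ZScoreRule(threshold=3.0, cooldown_seconds=60.0)
first = rule.check(baseline, 5.0, now=1000.0)   # spike, rule fires
second = rule.check(baseline, 5.0, now=1010.0)  # inside cooldown, suppressed
third = rule.check(baseline, 5.0, now=1100.0)   # cooldown elapsed, fires again
```

Welford's update needs only three numbers per metric, so a baseline can be kept per agent without storing the full history. In this sketch the spike value is not added to the baseline; whether anomalous samples should update the baseline is a design choice this page does not specify.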

Quick Start

pip install theaios-agent-monitor

# monitor.yaml
version: "1.0"
metadata:
  name: my-monitor

metrics:
  default_window_seconds: 300

kill_switch:
  enabled: true
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical

alerts:
  channels:
    - type: console

import time
from theaios.agent_monitor import Monitor, load_config, AgentEvent

# Build a monitor from the YAML config above.
monitor = Monitor(load_config("monitor.yaml"))

# Record one agent action along with its cost and latency.
monitor.record(AgentEvent(
    timestamp=time.time(), event_type="action", agent="sales-agent",
    cost_usd=0.007, latency_ms=350.0,
    data={"model": "gpt-4"},
))

# Read the current metrics snapshot for this agent.
snap = monitor.get_metrics("sales-agent")
print(f"Events: {snap.event_count}, Cost/min: ${snap.cost_per_minute:.4f}")
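
To show how the `default_window_seconds: 300` setting and the `auto-kill-on-high-cost` policy could interact, the following standalone sketch computes a rolling cost per minute and compares it against the policy threshold. It does not use the library: `RollingCost`, `policy_triggers`, and the `OPERATORS` table are hypothetical helpers, and dividing by the full window length is an assumption about how `cost_per_minute` might be derived.

```python
import operator
from collections import deque

# Map the YAML "operator" strings to comparison functions (assumed set).
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


class RollingCost:
    """Keeps (timestamp, cost_usd) pairs inside a sliding time window (hypothetical)."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._events = deque()

    def add(self, timestamp: float, cost_usd: float) -> None:
        self._events.append((timestamp, cost_usd))

    def cost_per_minute(self, now: float) -> float:
        # Drop events older than the window, then average spend over the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        total = sum(cost for _, cost in self._events)
        return total / (self.window_seconds / 60.0)


def policy_triggers(policy: dict, metrics: dict) -> bool:
    """Return True when the policy's metric breaches its threshold."""
    compare = OPERATORS[policy["operator"]]
    return compare(metrics[policy["metric"]], policy["threshold"])


# Same fields as the auto-kill-on-high-cost policy in monitor.yaml.
policy = {
    "name": "auto-kill-on-high-cost",
    "metric": "cost_per_minute",
    "operator": ">",
    "threshold": 5.0,
    "action": "kill_agent",
}

window = RollingCost(window_seconds=300.0)
window.add(timestamp=0.0, cost_usd=0.007)
quiet = window.cost_per_minute(now=10.0)

# Simulate a runaway agent: 24 expensive calls over four minutes.
for second in range(60, 300, 10):
    window.add(timestamp=float(second), cost_usd=1.25)
busy = window.cost_per_minute(now=300.0)

print(policy_triggers(policy, {"cost_per_minute": quiet}))  # False
print(policy_triggers(policy, {"cost_per_minute": busy}))   # True
```

Averaging over the full window understates the rate while the window is still filling, which delays a kill during the first few minutes of an agent's life. A shorter window reacts faster but is noisier, so the window length and threshold are best tuned together.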

Documentation

Page	What you'll learn
Concepts	Monitor pipeline, event model, metrics, baselines, anomaly detection, kill switches
Config Syntax	Complete YAML reference for every field
Events	Event types, how to record, EventStore
Metrics & Baselines	Metrics engine, Welford's algorithm, z-score
Kill Switches	Kill/revive, auto-kill policies, persistence
Compliance	SOC 2, GDPR, JSON export
Integration	Guardrails adapter, OpenTelemetry, custom adapters
CLI Reference	`agent-monitor version`, `validate`, `inspect`, `status`, `events`, `kill`, `revive`, `export`
Python API	`Monitor`, `load_config`, `AgentEvent`, all data types
AI Config Generator	Copy-paste prompts for generating monitor.yaml with any LLM

Part of the theaios Ecosystem

theaios-agent-monitor is one of the theaios platform components. It works standalone or alongside:

- theaios-guardrails -- declarative guardrails for AI agent governance
- theaios-context-router -- intelligent context routing for AI agents