
theaios-agent-monitor

Governance-first observability for AI agents -- real-time metrics, anomaly detection, kill switches, compliance export.

theaios-agent-monitor is a monitoring engine that lets you observe, baseline, and control AI agent behavior with YAML configs. Record events, compute real-time metrics, detect anomalies via z-score baselines, trigger automatic kill switches, and export compliance reports. No external services. No vendor lock-in.

Why Governance-First?

Because monitoring AI agents is not the same as monitoring web apps. The failure modes are different:

  • An agent that starts denying everything is broken -- even if the server is up
  • A cost spike from $0.01/min to $5/min is invisible to standard APM tools
  • A sudden change in denial patterns means your guardrails are either catching a real attack or misconfigured
  • When something goes wrong, you need to kill the agent instantly -- not wait for an on-call rotation

theaios-agent-monitor is built for these scenarios. Traditional observability tools (Datadog, Grafana, LangSmith) collect data. This library collects data and acts on it: it detects anomalies, triggers automatic kill switches, and exports compliance reports.

What It Does

Agent event (action, guardrail_trigger, denial, approval_request, cost, error, ...)
    |
    v
EventStore (append-only JSONL log)
    |
    v
MetricsEngine (rolling window: event_count, denial_rate, cost/min, latency)
    |
    v
BaselineTracker (Welford's algorithm: mean, stddev, z-score)
    |
    v
AnomalyDetector (z-score threshold rules with cooldown)
    |
    v
KillSwitch (manual or auto kill/revive, persistence)
    |
    v
AlertDispatcher (console, file, webhook)
    |
    v
ComplianceExporter (SOC 2, GDPR, JSON)

Every event is recorded. Every metric is computed in real time. Every anomaly is detectable. Every agent is killable.
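
To make the BaselineTracker and AnomalyDetector stages more concrete, here is a minimal, standalone sketch of Welford's online algorithm feeding a z-score rule with a cooldown. It does not use or reproduce the library's internals. The names `RunningBaseline` and `ZScoreRule`, the threshold of 3.0, and the 60-second cooldown are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Hypothetical baseline: Welford's single-pass mean and variance."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the already-updated mean, which keeps the update stable.
        self.m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; reported as 0.0 below two samples.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def z_score(self, value: float) -> float:
        sd = self.stddev
        return 0.0 if sd == 0.0 else (value - self.mean) / sd


class ZScoreRule:
    """Hypothetical rule: fire when |z| exceeds a threshold, then stay
    quiet until the cooldown has elapsed."""

    def __init__(self, threshold: float, cooldown_seconds: float) -> None:
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self._last_fired = None

    def check(self, z: float, now: float) -> bool:
        if abs(z) <= self.threshold:
            return False
        if (self._last_fired is not None
                and now - self._last_fired < self.cooldown_seconds):
            return False  # still cooling down from the previous alert
        self._last_fired = now
        return True


# Learn a baseline from normal cost-per-minute readings (illustrative values).
baseline = RunningBaseline()
for cost_per_minute in [0.010, 0.012, 0.009, 0.011, 0.010]:
    baseline.update(cost_per_minute)

# Score a spike to $5/min against that baseline.
spike_z = baseline.z_score(5.0)

# The same spike seen at t=0s, t=30s and t=90s: the middle one is suppressed.
rule = ZScoreRule(threshold=3.0, cooldown_seconds=60.0)
fired = [rule.check(spike_z, now=t) for t in (0.0, 30.0, 90.0)]
print(f"z-score: {spike_z:.1f}, fired: {fired}")
```

Because Welford's update needs only the count, mean, and sum of squared deviations, the baseline can be maintained per metric without storing past samples.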

Quick Start

pip install theaios-agent-monitor

# monitor.yaml
version: "1.0"
metadata:
  name: my-monitor

metrics:
  default_window_seconds: 300

kill_switch:
  enabled: true
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical

alerts:
  channels:
    - type: console

import time
from theaios.agent_monitor import Monitor, load_config, AgentEvent

monitor = Monitor(load_config("monitor.yaml"))

monitor.record(AgentEvent(
    timestamp=time.time(), event_type="action", agent="sales-agent",
    cost_usd=0.007, latency_ms=350.0,
    data={"model": "gpt-4"},
))

snap = monitor.get_metrics("sales-agent")
print(f"Events: {snap.event_count}, Cost/min: ${snap.cost_per_minute:.4f}")
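
The `auto-kill-on-high-cost` policy in monitor.yaml compares a metric against a threshold. As a rough illustration of what evaluating such a policy could look like, the standalone sketch below mirrors the YAML fields in a hypothetical `KillPolicy` dataclass. This is not the library's implementation: `KillPolicy`, `triggered_policies`, and the set of supported operators other than ">" are assumptions.

```python
import operator as op
from dataclasses import dataclass

# Assumed operator set; only ">" appears in the config above.
OPERATORS = {
    ">": op.gt,
    ">=": op.ge,
    "<": op.lt,
    "<=": op.le,
    "==": op.eq,
}


@dataclass
class KillPolicy:
    """Hypothetical mirror of one entry under kill_switch.policies."""
    name: str
    metric: str
    operator: str
    threshold: float
    action: str
    severity: str


def triggered_policies(policies, metrics):
    """Return the policies whose condition holds for the given metrics.

    `metrics` is a plain dict of metric name -> current value; metrics
    that are missing are skipped rather than treated as zero.
    """
    hits = []
    for policy in policies:
        value = metrics.get(policy.metric)
        if value is None:
            continue
        if OPERATORS[policy.operator](value, policy.threshold):
            hits.append(policy)
    return hits


policy = KillPolicy(
    name="auto-kill-on-high-cost",
    metric="cost_per_minute",
    operator=">",
    threshold=5.0,
    action="kill_agent",
    severity="critical",
)

normal = triggered_policies([policy], {"cost_per_minute": 0.42})
spike = triggered_policies([policy], {"cost_per_minute": 5.2})
print([p.name for p in normal], [p.name for p in spike])
```

A lookup table of comparison functions keeps the policy format declarative: a new operator string only needs a new entry in the table, not a new branch in the evaluator.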

Documentation

| Page | What you'll learn |
|------|-------------------|
| Concepts | Monitor pipeline, event model, metrics, baselines, anomaly detection, kill switches |
| Config Syntax | Complete YAML reference for every field |
| Events | Event types, how to record, EventStore |
| Metrics & Baselines | Metrics engine, Welford's algorithm, z-score |
| Kill Switches | Kill/revive, auto-kill policies, persistence |
| Compliance | SOC 2, GDPR, JSON export |
| Integration | Guardrails adapter, OpenTelemetry, custom adapters |
| CLI Reference | agent-monitor version, validate, inspect, status, events, kill, revive, export |
| Python API | Monitor, load_config, AgentEvent, all data types |
| AI Config Generator | Copy-paste prompts for generating monitor.yaml with any LLM |

Part of the theaios Ecosystem

theaios-agent-monitor is one of the theaios platform components. It works standalone or alongside: