Concepts
How theaios-agent-monitor works under the hood.
The Event Model
Everything in agent-monitor starts with an event -- something happening in your agentic system that needs to be recorded and analyzed.
```python
import time

# AgentEvent is provided by theaios-agent-monitor (import path not shown here).
event = AgentEvent(
    timestamp=time.time(),          # Required: epoch seconds
    agent="sales-agent",            # Required: which agent
    event_type="action",            # Required: what kind of event
    data={                          # Optional: arbitrary event data
        "model": "gpt-4",
        "prompt_tokens": 150,
        "completion_tokens": 80,
    },
    cost_usd=0.007,                 # Optional: cost in USD
    latency_ms=350.0,               # Optional: latency in ms
    session_id="sess-123",          # Optional: enables session-level kill
    user="user@example.com",        # Optional: user identifier
    tags=["production"],            # Optional: tags for filtering
)
```
Cost and latency are top-level fields on AgentEvent (not inside data). The data dict is freeform: put whatever additional fields are relevant there. Everything is stored as-is for compliance export and auditing.
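To make the shape concrete, here is a minimal stand-in, assuming AgentEvent behaves like a plain dataclass with the fields shown above. The class name AgentEventSketch, the to_jsonl helper, and the None/empty defaults for optional fields are all hypothetical, not the library's API. It shows cost and latency sitting beside the freeform data dict, and the whole event serialized unchanged into one JSONL line.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class AgentEventSketch:
    """Hypothetical stand-in mirroring the documented AgentEvent fields."""
    timestamp: float                 # required: epoch seconds
    agent: str                       # required: which agent
    event_type: str                  # required: what kind of event
    data: dict[str, Any] = field(default_factory=dict)  # freeform payload
    cost_usd: Optional[float] = None     # top-level, not inside data
    latency_ms: Optional[float] = None   # top-level, not inside data
    session_id: Optional[str] = None
    user: Optional[str] = None
    tags: list[str] = field(default_factory=list)


def to_jsonl(event: AgentEventSketch) -> str:
    """Serialize one event, as-is, into a single JSONL line (hypothetical helper)."""
    return json.dumps(asdict(event), sort_keys=True)


event = AgentEventSketch(
    timestamp=time.time(),
    agent="sales-agent",
    event_type="action",
    data={"model": "gpt-4", "prompt_tokens": 150, "completion_tokens": 80},
    cost_usd=0.007,
    latency_ms=350.0,
)
line = to_jsonl(event)
```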
Event Types
| Event Type | When to record | What it feeds |
|---|---|---|
| action | An agent performs an action (LLM call, tool call, etc.) | action_count, event_count |
| guardrail_trigger | A guardrail evaluates with a non-denial outcome (allow, redact, log) | event_count |
| denial | A guardrail denies a request | denial_count, denial_rate |
| approval_request | An action requires human approval | approval_count |
| approval_response | A human responds to an approval request | approval_count |
| cost | An explicit cost record | cost_total, cost_per_minute |
| error | Something goes wrong | error_count |
| session_start | An agent session begins | event_count |
| session_end | An agent session ends | event_count |
The Monitor Pipeline
When monitor.record(event) is called, the event flows through seven stages:
```text
Event arrives
  |
  +-- 1. Kill Switch Check
  |      Is this agent/session killed?
  |      YES --> silently drop event, return None
  |      NO  --> continue
  |
  +-- 2. Agent Track Filter
  |      Is this agent/event_type tracked by config?
  |      NO  --> silently drop event
  |      YES --> continue
  |
  +-- 3. Event Storage
  |      Append to the EventStore (JSONL on disk)
  |
  +-- 4. Metrics Computation
  |      Update the MetricsEngine with the new event
  |      Compute rolling window metrics
  |
  +-- 5. Baseline Update
  |      Feed current metric values into the BaselineTracker
  |      Update mean and stddev via Welford's algorithm
  |
  +-- 6. Anomaly Detection
  |      For each anomaly rule, compute z-score against baseline
  |      If z-score > threshold --> trigger alert
  |
  +-- 7. Kill Switch Evaluation
         For each kill policy, check metric against threshold
         If exceeded --> kill the agent automatically
```
This entire pipeline runs synchronously and in-process. No network calls. No background threads. No message queues. The pipeline adds microseconds of overhead per event.
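The seven stages can be condensed into a single synchronous method. This is a minimal sketch, not the library's implementation: PipelineSketch, Event, and every field name are hypothetical; the metrics engine is reduced to one metric (cost_per_minute); the track filter checks agent names only; storage is an in-memory list instead of a JSONL file; and dropped events return None at both filter stages (documented for the kill check, assumed for the track filter).

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    timestamp: float
    agent: str
    event_type: str
    cost_usd: float = 0.0


@dataclass
class PipelineSketch:
    """Hypothetical monitor that runs the seven stages in order, in-process."""
    tracked_agents: set[str]
    window_seconds: int = 300
    z_threshold: float = 2.5
    kill_threshold: float = 5.0  # kill when cost_per_minute > 5.0
    min_samples: int = 30
    killed: set[str] = field(default_factory=set)
    events: list[Event] = field(default_factory=list)
    baselines: dict[str, tuple[int, float, float]] = field(default_factory=dict)
    alerts: list[str] = field(default_factory=list)

    def record(self, e: Event) -> Optional[Event]:
        # 1. Kill switch check: drop events for killed agents.
        if e.agent in self.killed:
            return None
        # 2. Track filter (simplified to agent names only).
        if e.agent not in self.tracked_agents:
            return None
        # 3. Storage: a list here; the real EventStore appends JSONL on disk.
        self.events.append(e)
        # 4. Metrics: this agent's cost_per_minute over the rolling window.
        cutoff = e.timestamp - self.window_seconds
        cost = sum(x.cost_usd for x in self.events
                   if x.agent == e.agent and x.timestamp >= cutoff)
        cpm = cost / (self.window_seconds / 60)
        # 5. Baseline update with Welford's algorithm (count, mean, M2).
        n, mean, m2 = self.baselines.get(e.agent, (0, 0.0, 0.0))
        n += 1
        delta = cpm - mean
        mean += delta / n
        m2 += delta * (cpm - mean)
        self.baselines[e.agent] = (n, mean, m2)
        # 6. Anomaly detection, only once the baseline has min_samples points.
        stddev = math.sqrt(m2 / n)
        if n >= self.min_samples and stddev > 0:
            if (cpm - mean) / stddev > self.z_threshold:
                self.alerts.append(f"{e.agent}: cost_per_minute anomaly")
        # 7. Kill policy evaluation: crossing the threshold kills the agent.
        if cpm > self.kill_threshold:
            self.killed.add(e.agent)
        return e
```

Rescanning the stored list on every event keeps the sketch short; an engine meant for production traffic would more likely maintain per-window aggregates incrementally.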
Metrics Engine
The metrics engine computes rolling-window metrics for each agent independently. Key metrics:
| Metric | How it's computed | Source field |
|---|---|---|
| event_count | Count of events in the window | -- |
| action_count | Count of action events | event_type |
| denial_count | Count of denial events | event_type |
| denial_rate | denial_count / (action_count + denial_count) | event_type |
| approval_count | Count of approval events | event_type |
| error_count | Count of error events | event_type |
| cost_total | Sum of cost_usd in the window | cost_usd |
| cost_per_minute | cost_total / (window_seconds / 60) | cost_usd |
| avg_latency_ms | Mean of latency_ms over events that have a latency | latency_ms |
The rolling window is configured by metrics.default_window_seconds (default: 300 seconds). Events older than the window are automatically excluded.
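A minimal sketch of the window computation, applying the formulas from the table to one agent's events. It assumes events are plain dicts carrying the event-model fields; compute_snapshot is a hypothetical helper, not the MetricsEngine API, and returning 0.0 for an empty denial_rate or latency average is an assumption that matches the MetricSnapshot defaults.

```python
from typing import Any


def compute_snapshot(events: list[dict[str, Any]], agent: str, now: float,
                     window_seconds: int = 300) -> dict[str, float]:
    """Hypothetical rolling-window metrics for one agent."""
    cutoff = now - window_seconds
    # Events older than the window, or from other agents, are excluded.
    in_window = [e for e in events
                 if e["agent"] == agent and cutoff <= e["timestamp"] <= now]

    def count(event_type: str) -> int:
        return sum(1 for e in in_window if e["event_type"] == event_type)

    actions, denials = count("action"), count("denial")
    cost_total = sum(e.get("cost_usd") or 0.0 for e in in_window)
    latencies = [e["latency_ms"] for e in in_window
                 if e.get("latency_ms") is not None]
    return {
        "event_count": len(in_window),
        "action_count": actions,
        "denial_count": denials,
        # Assumed: 0.0 when there were no actions or denials at all.
        "denial_rate": denials / (actions + denials) if actions + denials else 0.0,
        "approval_count": count("approval_request") + count("approval_response"),
        "error_count": count("error"),
        "cost_total": cost_total,
        "cost_per_minute": cost_total / (window_seconds / 60),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0.0,
    }
```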
MetricSnapshot
```python
from dataclasses import dataclass


@dataclass
class MetricSnapshot:
    """Rolling-window metrics for one agent at one point in time."""
    agent: str
    window_seconds: int
    timestamp: float
    event_count: int = 0
    action_count: int = 0
    denial_count: int = 0
    denial_rate: float = 0.0
    approval_count: int = 0
    approval_rate: float = 0.0
    error_count: int = 0
    cost_total: float = 0.0
    cost_per_minute: float = 0.0
    avg_latency_ms: float = 0.0
```
Baselines (Welford's Algorithm)
The baseline tracker uses Welford's online algorithm to maintain running mean and standard deviation for each metric, per agent. This is an incremental algorithm -- it doesn't store historical values, just the running statistics.
After each metric computation, the current value is fed into the baseline:
```text
update("sales-agent", "cost_per_minute", 0.03)
  --> count += 1
  --> delta = value - mean
  --> mean += delta / count
  --> M2 += delta * (value - mean)
  --> variance = M2 / count
  --> stddev = sqrt(variance)
```
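The update steps above translate directly into a small class. This is a minimal sketch of Welford's algorithm using the population variance (M2 / count), exactly as in the steps shown; RunningStats and the dict keyed by (agent, metric) are hypothetical names, not the BaselineTracker API.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance for one (agent, metric) pair."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean          # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)  # old delta times new deviation

    @property
    def stddev(self) -> float:
        # Population variance, matching variance = M2 / count above.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


# One tracker per (agent, metric); no historical values are kept.
baselines: dict[tuple[str, str], RunningStats] = {}
for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    baselines.setdefault(("sales-agent", "cost_per_minute"), RunningStats()).update(v)
```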
The z-score for any new value is:

$$
z = \frac{\text{value} - \text{mean}}{\text{stddev}}
$$
A z-score of 3.0 means the value is 3 standard deviations above the mean -- a strong signal of anomalous behavior.
Min samples: baselines require min_samples data points before z-scores are computed. This prevents false alerts during cold start. Default: 30 samples.
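Putting the z-score formula and the min_samples gate together gives the following minimal sketch. The function name z_score is hypothetical, and both the choice to return None during cold start and the handling of a zero standard deviation are assumptions rather than documented behavior.

```python
from typing import Optional


def z_score(value: float, mean: float, stddev: float, count: int,
            min_samples: int = 30) -> Optional[float]:
    """(value - mean) / stddev, or None while the baseline is not ready."""
    if count < min_samples:
        return None  # cold start: too few samples to trust the baseline
    if stddev == 0.0:
        return None  # assumed: a perfectly flat baseline is not scored
    return (value - mean) / stddev


# Baseline: mean 0.03 USD/min, stddev 0.01, built from 40 samples.
z = z_score(0.06, mean=0.03, stddev=0.01, count=40)  # about 3.0
```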
Anomaly Detection
Anomaly rules define when to trigger alerts. Each rule specifies:
- name -- identifies the rule in alerts and cooldown tracking
- metric -- which metric to monitor
- z_threshold -- how many standard deviations before alerting
- severity -- alert severity (critical, high, medium, low)
- cooldown_seconds -- minimum time between repeated alerts for the same rule
```yaml
anomaly_detection:
  enabled: true
  rules:
    - name: cost-spike
      metric: cost_per_minute
      z_threshold: 2.5
      severity: critical
      cooldown_seconds: 600
```
When a metric's z-score exceeds the threshold, the detector:
- Creates an alert with the rule name, metric value, z-score, and severity
- Dispatches the alert to all configured channels
- Records the alert time for cooldown tracking
The cooldown prevents alert storms. With cooldown_seconds: 600, the same rule won't fire again for 10 minutes, even if the anomaly persists.
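A minimal sketch of rule evaluation with cooldown, following the three steps above. AnomalyRule and Detector are hypothetical names, alerts are plain dicts, channel dispatch is reduced to appending to a list, and cooldown is keyed by rule name alone because the text says "the same rule" (whether the real detector also keys by agent is not stated).

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AnomalyRule:
    name: str
    metric: str
    z_threshold: float
    severity: str
    cooldown_seconds: float


@dataclass
class Detector:
    """Hypothetical detector: fires alerts and suppresses repeats within the cooldown."""
    rules: list[AnomalyRule]
    last_fired: dict[str, float] = field(default_factory=dict)  # rule name -> time
    dispatched: list[dict[str, Any]] = field(default_factory=list)  # stands in for channels

    def check(self, agent: str, metric: str, value: float,
              z: Optional[float], now: float) -> None:
        for rule in self.rules:
            if rule.metric != metric or z is None or z <= rule.z_threshold:
                continue
            last = self.last_fired.get(rule.name)
            if last is not None and now - last < rule.cooldown_seconds:
                continue  # still cooling down: suppress the repeat alert
            alert = {"rule": rule.name, "agent": agent, "severity": rule.severity,
                     "metric_value": value, "z_score": z}
            self.dispatched.append(alert)     # dispatch to channels
            self.last_fired[rule.name] = now  # record time for cooldown
```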
Kill Switches
Kill switches are the most important safety mechanism. They provide three levels of control:
| Level | Method | What it does |
|---|---|---|
| Agent | kill_agent(name, reason) | Blocks all events for a specific agent |
| Session | kill_session(session_id) | Blocks all events for a specific session |
| Global | kill_global(reason) | Blocks all events for all agents |
When an agent is killed, monitor.record() silently drops its events and returns None. A dropped event is neither stored nor processed. This is the fastest possible circuit breaker -- it runs before any metrics computation.
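The three levels can be modeled as three pieces of state that are checked before anything else. A minimal sketch, assuming kill state is a mapping of agent names to reasons, a set of session IDs, and one global reason; KillSwitch and is_blocked are hypothetical names, and although the kill_* methods mirror the table, their exact signatures in the library may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KillSwitch:
    """Hypothetical kill state covering agent, session, and global levels."""
    killed_agents: dict[str, str] = field(default_factory=dict)  # name -> reason
    killed_sessions: set[str] = field(default_factory=set)
    global_reason: Optional[str] = None

    def kill_agent(self, name: str, reason: str) -> None:
        self.killed_agents[name] = reason

    def kill_session(self, session_id: str) -> None:
        self.killed_sessions.add(session_id)

    def kill_global(self, reason: str) -> None:
        self.global_reason = reason

    def is_blocked(self, agent: str, session_id: Optional[str] = None) -> bool:
        # Checked first in record(), before storage or metrics.
        return (self.global_reason is not None
                or agent in self.killed_agents
                or (session_id is not None and session_id in self.killed_sessions))
```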
Auto-Kill Policies
Kill policies are evaluated after every metric snapshot. If a metric crosses its threshold (according to the policy's operator), the corresponding action fires automatically:
```yaml
kill_switch:
  enabled: true
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical
```
Actions: kill_agent, kill_session, kill_global.
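Policy evaluation amounts to comparing one snapshot value against a threshold with the configured operator. A minimal sketch, assuming the YAML above has been loaded into a list of dicts; evaluate_policies is hypothetical, and since the documented example only uses ">", the rest of the operator table is an assumption.

```python
import operator
from typing import Any

# Assumed operator set; the documented example only uses ">".
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "==": operator.eq}


def evaluate_policies(policies: list[dict[str, Any]],
                      snapshot: dict[str, float]) -> list[tuple[str, str]]:
    """Return (policy name, action) for every policy whose condition holds."""
    fired = []
    for policy in policies:
        value = snapshot.get(policy["metric"])
        if value is None:
            continue  # metric not present in this snapshot
        if OPERATORS[policy["operator"]](value, policy["threshold"]):
            fired.append((policy["name"], policy["action"]))
    return fired


policies = [{"name": "auto-kill-on-high-cost", "metric": "cost_per_minute",
             "operator": ">", "threshold": 5.0, "action": "kill_agent",
             "severity": "critical"}]
```

The caller would then map each returned action to kill_agent, kill_session, or kill_global.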
Persistence
Kill state can be persisted to disk. On restart, the monitor loads the saved state -- agents that were killed stay killed until explicitly revived.
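A minimal sketch of persisting kill state as JSON, assuming a single file that is rewritten on every change and read back at startup. The file layout, path, and function names are hypothetical. Writing to a temporary file and then calling os.replace means a crash mid-write leaves the previous state file intact.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def save_kill_state(path: Path, state: dict[str, Any]) -> None:
    """Write kill state via a temp file so a crash never leaves a half-written file."""
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(state, sort_keys=True), encoding="utf-8")
    os.replace(tmp, path)  # swaps in the new file in one step (atomic on POSIX)


def load_kill_state(path: Path) -> dict[str, Any]:
    """Return the saved state, or a fresh empty state on first start."""
    if not path.exists():
        return {"agents": {}, "sessions": [], "global_reason": None}
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as d:
    state_file = Path(d) / "kill_state.json"
    # Sessions are stored as a list because JSON has no set type.
    save_kill_state(state_file, {"agents": {"sales-agent": "cost spike"},
                                 "sessions": ["sess-123"], "global_reason": None})
    restored = load_kill_state(state_file)  # what a restarted monitor would load
```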
Alert Channels
Alerts are dispatched to one or more channels:
| Channel | Output | Use case |
|---|---|---|
| console | stderr | Development, debugging |
| file | JSONL file | Production logging, audit trail |
| webhook | HTTP POST | PagerDuty, Slack, OpsGenie |
All channels receive the same alert payload:
```json
{
  "timestamp": "2026-03-28T14:23:01.123Z",
  "rule": "cost-spike",
  "agent": "sales-agent",
  "severity": "critical",
  "message": "cost_per_minute z-score 4.2 exceeds threshold 2.5",
  "metric_value": 2.5,
  "z_score": 4.2
}
```
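One way to fan a single payload out to several channels, shown as a minimal sketch in which each channel is a callable taking the alert dict. The console and file channels use only the standard library; the webhook channel POSTs JSON with urllib.request and is defined but not exercised here. All channel function names are hypothetical.

```python
import json
import sys
import tempfile
import urllib.request
from pathlib import Path
from typing import Any, Callable

Alert = dict[str, Any]
Channel = Callable[[Alert], None]


def console_channel(alert: Alert) -> None:
    """Development/debugging: one JSON line on stderr."""
    print(json.dumps(alert), file=sys.stderr)


def file_channel(path: Path) -> Channel:
    """Production logging: append one JSON object per line (JSONL)."""
    def write(alert: Alert) -> None:
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(alert) + "\n")
    return write


def webhook_channel(url: str, timeout: float = 5.0) -> Channel:
    """Integrations: HTTP POST of the JSON payload."""
    def post(alert: Alert) -> None:
        req = urllib.request.Request(
            url, data=json.dumps(alert).encode("utf-8"),
            headers={"Content-Type": "application/json"}, method="POST")
        with urllib.request.urlopen(req, timeout=timeout):
            pass  # error responses raise urllib.error.HTTPError
    return post


def dispatch(alert: Alert, channels: list[Channel]) -> None:
    """Send the same payload to every configured channel."""
    for channel in channels:
        channel(alert)


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "alerts.jsonl"
    alert = {"rule": "cost-spike", "agent": "sales-agent",
             "severity": "critical", "z_score": 4.2}
    dispatch(alert, [console_channel, file_channel(log)])
    lines = log.read_text(encoding="utf-8").splitlines()
```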
Compliance Export
The compliance exporter generates reports from the event store. Three formats are supported:
| Format | Purpose | Fields |
|---|---|---|
| soc2 | SOC 2 Type II audits | Events, summary, access controls, guardrail enforcement |
| gdpr | GDPR data processing records | Events, data subjects, processing activities |
| json | Generic machine-readable export | Events, filters, total count |
Reports can be filtered by agent, time range, and event type.
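Filtering is the common core of all three formats. A minimal sketch of the generic json export, assuming the event store is a JSONL file of event dicts with the event-model fields. export_json and its parameter names are hypothetical; the output keys only follow the json row of the table (events, filters, total count).

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable, Optional


def export_json(store: Path, agent: Optional[str] = None,
                start: Optional[float] = None, end: Optional[float] = None,
                event_types: Optional[Iterable[str]] = None) -> dict[str, Any]:
    """Read a JSONL event store and keep events matching every given filter."""
    types = set(event_types) if event_types is not None else None
    events = []
    with store.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            e = json.loads(line)
            if agent is not None and e["agent"] != agent:
                continue
            if start is not None and e["timestamp"] < start:
                continue
            if end is not None and e["timestamp"] > end:
                continue
            if types is not None and e["event_type"] not in types:
                continue
            events.append(e)
    filters = {"agent": agent, "start": start, "end": end,
               "event_types": sorted(types) if types is not None else None}
    return {"events": events, "filters": filters, "total_count": len(events)}


with tempfile.TemporaryDirectory() as d:
    store = Path(d) / "events.jsonl"
    store.write_text("\n".join(json.dumps(e) for e in [
        {"timestamp": 100.0, "agent": "sales-agent", "event_type": "action"},
        {"timestamp": 200.0, "agent": "sales-agent", "event_type": "denial"},
        {"timestamp": 300.0, "agent": "support-agent", "event_type": "denial"},
    ]) + "\n", encoding="utf-8")
    report = export_json(store, agent="sales-agent", event_types=["denial"])
    window = export_json(store, start=150.0, end=300.0)
```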
Performance
The monitor is designed for inline use -- it runs in the same process as your agent.
| Metric | Value |
|---|---|
| Record + metrics computation | <0.1ms per event |
| Baseline update | <0.01ms per metric |
| Anomaly detection | <0.01ms per rule |
| Memory per event | ~200 bytes |
| Dependencies | 3 (pyyaml, click, rich) |
This is fast because there are no network calls, no background threads, and no message queues. Apart from appending each event to the JSONL event store, everything is in-memory computation.