Configuration Reference¶

A context-router configuration is a single YAML file that defines your sources, routing rules, permissions, and budget. It is the source of truth for how your AI agents receive context.

This page is the complete reference. Read it once, then use it as a lookup.

File Structure¶

A config file can contain these top-level sections:

version: "1.0"              # Required. Always "1.0" for now.
metadata:                    # Optional. Who wrote this, when, why.
variables:                   # Optional. Shared values used in route expressions.
sources:                     # Required. Where context lives.
routes:                      # Required. Which sources to consult for which queries.
permissions:                 # Optional. Per-agent access control.
budget:                      # Optional. Token budget and ranking configuration.
cache:                       # Optional. Disk cache settings.

Only version, at least one source, and at least one route are required. Everything else has sensible defaults.

Version¶

version: "1.0"

Always "1.0" for the current release. The engine rejects unknown versions.

Field	Type	Required	Description
`version`	string	Yes	Config format version. Must be `"1.0"`.

Metadata¶

metadata:
  name: acme-corp-context-router
  description: Context routing for ACME's AI agents
  author: engineering@acme.com

Metadata is for humans and tooling. The engine doesn't use it for routing -- it stores it for context-router inspect output and audit purposes.

Field	Type	Default	Description
`name`	string	`""`	Router configuration name. Shows in `context-router inspect`.
`description`	string	`""`	What this configuration governs.
`author`	string	`""`	Who owns this configuration.

Variables¶

Variables are shared values that route expressions reference with $variable_name. They keep your routes DRY and make it easy to update a value in one place.

variables:
  company_name: "ACME Corp"
  engineering_teams: ["platform", "backend", "frontend", "data"]
  max_context_priority: 10

Use variables in when clauses:

routes:
  - name: engineering-docs
    when: "agent in $engineering_teams"
    sources: [eng_docs, api_docs]

Variables can be strings, numbers, booleans, or lists. They are substituted at evaluation time, not at parse time, so a list variable is still a list when the expression runs and works correctly with every operator, including the in operator.

# This works: checks if agent is in the list
when: "agent in $engineering_teams"
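
To make the evaluation-time behaviour concrete, here is a minimal sketch in Python. It is not the engine's expression language: the helper names `resolve_operand` and `eval_in` are hypothetical, and the sketch handles only a single `in` comparison. The point it illustrates is that `$engineering_teams` is looked up as a real list when the expression runs, rather than being pasted into the expression text.

```python
def resolve_operand(token: str, context: dict, variables: dict):
    """Turn one token of the expression into a Python value (hypothetical helper)."""
    if token.startswith("$"):
        # Looked up at evaluation time, so a list variable stays a list.
        return variables[token[1:]]
    if len(token) >= 2 and token[0] == token[-1] == '"':
        # Double-quoted string literal.
        return token[1:-1]
    # Anything else is treated as a query context field such as agent or tags.
    return context.get(token)


def eval_in(expr: str, context: dict, variables: dict) -> bool:
    """Evaluate a single '<left> in <right>' condition."""
    left, right = (part.strip() for part in expr.split(" in ", 1))
    return resolve_operand(left, context, variables) in resolve_operand(
        right, context, variables
    )


variables = {"engineering_teams": ["platform", "backend", "frontend", "data"]}
print(eval_in("agent in $engineering_teams", {"agent": "backend"}, variables))
```

Because the variable resolves to a list object, membership is tested element by element, which a parse-time text substitution could not guarantee.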

Sources¶

Sources define where context lives. Each source has a name, a type, and type-specific configuration fields.

sources:
  system_prompt:
    type: inline
    content: "You are a helpful engineering assistant."
    priority: 10

  docs:
    type: directory
    path: "./docs"
    patterns: ["**/*.md"]
    exclude_patterns: ["**/drafts/**"]
    tags: [documentation]

Common Source Fields¶

These fields apply to all source types:

Field	Type	Default	Description
`type`	string	(required)	Source type: `inline`, `directory`, `git_repo`, or `http_api`.
`enabled`	bool	`true`	Set to `false` to disable without removing.
`description`	string	`""`	Human-readable description of this source.
`tags`	list	`[]`	Labels for filtering with `context-router inspect --tag`.
`priority`	int	`0`	Priority value. Higher priority sources are fetched first. Used by manual ranking.

Source Type: `inline`¶

Static content embedded directly in the config. Returns a single chunk.

sources:
  system_prompt:
    type: inline
    content: |
      You are a helpful assistant for ACME Corp.
      Be concise and accurate. Cite your sources.
    priority: 10

Field	Type	Default	Required	Description
`content`	string	`""`	Yes	The text content to return. Supports YAML multiline strings.

Source Type: `directory`¶

Reads files from a local directory. Supports glob patterns, recursive traversal, file size limits, encoding configuration, and automatic markdown H2 splitting.

sources:
  knowledge_base:
    type: directory
    path: "./data/knowledge"
    patterns: ["**/*.md", "**/*.txt"]
    exclude_patterns: ["**/archive/**", "**/.git/**"]
    recursive: true
    encoding: utf-8
    max_file_size: 1000000

Field	Type	Default	Required	Description
`path`	string	`""`	Yes	Path to the directory. Relative paths are resolved from the working directory.
`patterns`	list	`["**/*"]`	No	Glob patterns for files to include. Matched against relative paths within the directory.
`exclude_patterns`	list	`[]`	No	Glob patterns for files to exclude. Checked after include patterns.
`recursive`	bool	`true`	No	Whether to traverse subdirectories.
`encoding`	string	`"utf-8"`	No	File encoding. Files that fail to decode are skipped.
`max_file_size`	int	`1000000`	No	Maximum file size in bytes. Files exceeding this are skipped.

Markdown splitting: Files with .md or .markdown extensions are automatically split by H2 headings (##). Each section becomes a separate chunk with the heading text as its title. Files without H2 headings are returned as a single chunk.
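
The splitting rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's code: the function name `split_markdown_by_h2` and the chunk shape (`title`, `content`) are hypothetical, text before the first H2 is kept as its own chunk under a fallback title, and `##` lines inside fenced code blocks are not treated specially.

```python
import re

# Matches "## Heading" but not "# Heading" or "### Heading".
H2 = re.compile(r"^##\s+(.*\S)\s*$")


def split_markdown_by_h2(text: str, fallback_title: str = "") -> list[dict]:
    """Split markdown text into one chunk per H2 section."""
    chunks: list[dict] = []
    title, lines = fallback_title, []

    def flush() -> None:
        content = "\n".join(lines).strip()
        if content:  # skip empty sections, e.g. a file that starts with an H2
            chunks.append({"title": title, "content": content})

    for line in text.splitlines():
        match = H2.match(line)
        if match:
            flush()  # close the previous section
            title, lines = match.group(1), [line]
        else:
            lines.append(line)
    flush()  # close the last section (or the whole file if there was no H2)
    return chunks


doc = "# Guide\nintro\n## Setup\nstep 1\n### Detail\nmore\n## Usage\nrun it\n"
for chunk in split_markdown_by_h2(doc, "guide.md"):
    print(chunk["title"])
```

A file without any H2 heading falls through to the final `flush()` and comes back as a single chunk, matching the behaviour described above.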

Metadata: Each chunk from a directory source includes metadata.mtime (file modification timestamp as a float), which enables recency ranking.

Source Type: `git_repo`¶

Reads files from a git repository at a specific ref (branch, tag, or commit). Uses git ls-tree to enumerate files and git show to read content. No working tree modifications -- safe for production use.

sources:
  versioned_docs:
    type: git_repo
    path: "/path/to/repo"
    ref: "main"
    patterns: ["docs/**/*.md"]
    exclude_patterns: ["docs/drafts/**"]
    max_file_size: 500000

Field	Type	Default	Required	Description
`path`	string	`""`	Yes	Path to the git repository root.
`ref`	string	`"HEAD"`	No	Git ref to read from: branch name, tag, or commit SHA.
`patterns`	list	`["**/*"]`	No	Glob patterns for files to include.
`exclude_patterns`	list	`[]`	No	Glob patterns for files to exclude.
`max_file_size`	int	`1000000`	No	Maximum file size in bytes (estimated from content length).

Markdown splitting: Same H2-based splitting as the directory source.

Metadata: Each chunk includes metadata.ref (the git ref used).

Requirements: git must be available on the system PATH. Operations time out after 30 seconds (ls-tree) or 10 seconds (show per file).
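
For readers who want to reproduce the read path by hand, the following Python sketch shows one way to enumerate and read files at a ref without touching the working tree. The function names are hypothetical and the exact git flags the engine passes are an assumption; `git -C`, `ls-tree -r --name-only`, and `git show <ref>:<path>` are standard git, and the timeouts mirror the values stated above.

```python
import subprocess


def ls_tree_command(repo: str, ref: str) -> list[str]:
    """Command that lists every file path in the tree at `ref`."""
    return ["git", "-C", repo, "ls-tree", "-r", "--name-only", ref]


def show_command(repo: str, ref: str, path: str) -> list[str]:
    """Command that prints the content of `path` as of `ref`."""
    return ["git", "-C", repo, "show", f"{ref}:{path}"]


def list_files(repo: str, ref: str = "HEAD") -> list[str]:
    result = subprocess.run(
        ls_tree_command(repo, ref),
        capture_output=True, text=True, check=True, timeout=30,
    )
    return result.stdout.splitlines()


def read_file(repo: str, ref: str, path: str) -> str:
    result = subprocess.run(
        show_command(repo, ref, path),
        capture_output=True, text=True, check=True, timeout=10,
    )
    return result.stdout
```

Both commands only read from the object database, which is why a source of this type never modifies the checkout.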

Source Type: `http_api`¶

Queries a REST API endpoint. Supports GET and POST methods, template substitution for the query text, custom headers, and JSON response parsing with dot-notation path navigation.

sources:
  search_api:
    type: http_api
    url: "https://api.example.com/search?q={{query}}"
    method: GET
    headers:
      Authorization: "Bearer ${API_TOKEN}"
      Accept: "application/json"
    response_path: "results"
    result_text_field: "content"
    result_title_field: "title"

  vector_search:
    type: http_api
    url: "https://api.example.com/embeddings/search"
    method: POST
    headers:
      Authorization: "Bearer ${API_TOKEN}"
    body_template: '{"query": "{{query}}", "top_k": 5}'
    response_path: "matches"
    result_text_field: "text"
    result_title_field: "title"

Field	Type	Default	Required	Description
`url`	string	`""`	Yes	API endpoint URL. `{{query}}` is replaced with the query text.
`method`	string	`"GET"`	No	HTTP method: `GET` or `POST`.
`headers`	dict	`{}`	No	HTTP headers. Values support `${ENV_VAR}` interpolation.
`body_template`	string	`""`	No	Request body for POST requests. `{{query}}` is replaced with the query text.
`response_path`	string	`""`	No	Dot-notation path to navigate the JSON response to the results array. Empty means the root.
`result_text_field`	string	`"text"`	No	Field name in each result object containing the text content.
`result_title_field`	string	`"title"`	No	Field name in each result object containing the title.

Template substitution: The {{query}} placeholder in url and body_template is replaced with the raw query text at request time.

Response parsing:

The JSON response is navigated using response_path (e.g., "results" navigates to data["results"], "data.items" navigates to data["data"]["items"])
The result is normalized to a list (a single dict becomes [dict])
Each item's result_text_field and result_title_field are extracted as chunk content and title
Non-JSON responses are returned as a single chunk
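
The substitution and parsing steps above can be sketched as follows. This is a simplified illustration, not the engine's implementation: the function names and the chunk shape are hypothetical, missing fields default to an empty string by assumption, and items that are not JSON objects are ignored.

```python
import json


def render_template(template: str, query: str) -> str:
    """Replace the {{query}} placeholder with the raw query text."""
    return template.replace("{{query}}", query)


def navigate(data, path: str):
    """Follow a dot-notation path; an empty path returns the root."""
    for key in filter(None, path.split(".")):
        data = data[key]
    return data


def parse_response(body: str, response_path: str = "",
                   text_field: str = "text", title_field: str = "title") -> list[dict]:
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Non-JSON responses become a single chunk.
        return [{"title": "", "content": body}]
    items = navigate(data, response_path)
    if isinstance(items, dict):
        items = [items]  # normalize a single object to a list
    return [
        {"title": str(item.get(title_field, "")),
         "content": str(item.get(text_field, ""))}
        for item in items
        if isinstance(item, dict)
    ]


body = '{"data": {"items": [{"title": "A", "content": "alpha"}]}}'
print(parse_response(body, response_path="data.items", text_field="content"))
```

Note that `render_template` inserts the query verbatim, so a query containing quotes or `&` reaches the API unescaped.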

Metadata: Each chunk includes metadata.url (the source URL).

Timeouts: HTTP requests time out after 30 seconds.

Routes¶

Routes map query conditions to sources. They answer: "For this query, which sources should we consult?"

routes:
  - name: default
    when: ""
    sources: [system_prompt, docs]
    description: "Always include system prompt and general docs"

  - name: policy-questions
    when: 'text contains "policy" or text contains "compliance"'
    sources: [compliance_docs]
    description: "Add compliance docs for policy-related queries"
    tags: [compliance]

  - name: engineering
    when: 'agent == "eng-assistant"'
    sources: [api_docs, code_examples]
    tags: [engineering]

Route Fields¶

Field	Type	Required	Default	Description
`name`	string	Yes	--	Unique identifier. Shows in `response.matched_routes` and `context-router inspect`.
`sources`	list	Yes	--	Source names to consult when this route matches. Must reference defined sources.
`when`	string	No	`""`	Condition expression. Empty = always matches. See Expression Language.
`description`	string	No	`""`	Human-readable description.
`enabled`	bool	No	`true`	Set to `false` to disable without deleting.
`tags`	list	No	`[]`	Labels for filtering and reporting.

How Routes Are Evaluated¶

Routes are evaluated top-to-bottom in declaration order
All matching routes contribute sources -- this is not first-match-wins
Source lists from matching routes are merged and deduplicated
An empty when clause ("") always matches
Disabled routes (enabled: false) are skipped entirely

This means you can have a catch-all default route plus specialized routes that add extra sources for specific query types.
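
The evaluation rules can be summarized in a short Python sketch. The function `select_sources` is hypothetical, and the `matches` callback stands in for the real expression evaluator; keeping first-seen order when deduplicating is an assumption.

```python
from typing import Callable


def select_sources(routes: list[dict],
                   matches: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Return (matched route names, merged source names)."""
    matched: list[str] = []
    sources: list[str] = []
    for route in routes:  # declaration order
        if not route.get("enabled", True):
            continue  # disabled routes are skipped entirely
        when = route.get("when", "")
        if when == "" or matches(when):  # empty clause always matches
            matched.append(route["name"])
            for source in route["sources"]:
                if source not in sources:  # deduplicate across routes
                    sources.append(source)
    return matched, sources


routes = [
    {"name": "default", "when": "", "sources": ["system_prompt", "docs"]},
    {"name": "policy", "when": 'text contains "policy"',
     "sources": ["compliance_docs", "docs"]},
    {"name": "off", "when": "", "sources": ["legacy"], "enabled": False},
]
# Stand-in evaluator: pretend every non-empty condition is true.
print(select_sources(routes, matches=lambda expr: True))
```

Every matching route contributes, so the result here contains the default sources plus `compliance_docs`, with `docs` listed once.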

The `when` Clause¶

The when clause uses a safe expression language to evaluate conditions against the query. The primary field for most conditions is text (the query text).

Available context fields:

Field	Source	Example
`text`	`query.text`	`text contains "policy"`
`agent`	`query.agent`	`agent == "hr-bot"`
`tags`	`query.tags`	`"onboarding" in tags`
Any key	`query.metadata`	`department == "engineering"`
`$variable`	`config.variables`	`agent in $allowed_agents`

See the Expression Language page for the full syntax reference.

Permissions¶

Permissions control which agents can access which sources. They are evaluated after route matching -- even if a route matches, the agent might not have access to all of the route's sources.

permissions:
  - agent: "*"
    default: allow

  - agent: "intern-bot"
    allow_sources: [public_docs]
    deny_sources: [internal_docs, financial_data]
    deny_paths: ["**/confidential/**"]
    default: deny

  - agent: "eng-assistant"
    deny_sources: [hr_docs]
    deny_paths: ["**/salaries/**"]

Permission Fields¶

Field	Type	Default	Description
`agent`	string	`"*"`	Agent identifier to match. `"*"` matches all agents.
`allow_sources`	list	`[]`	Sources this agent is explicitly allowed to access.
`deny_sources`	list	`[]`	Sources this agent is explicitly denied from accessing. Deny takes precedence over allow.
`deny_paths`	list	`[]`	Glob patterns for file paths to exclude from results. Applied after fetching.
`default`	string	`"allow"`	Default permission when a source is not in either list: `"allow"` or `"deny"`.

How Permissions Are Resolved¶

Permission rules are evaluated in order
Rules matching the exact agent name and wildcard rules ("*") are merged
deny_sources lists are unioned
allow_sources lists are unioned
deny_paths lists are concatenated
The most restrictive default wins ("deny" beats "allow")
If no rules match the agent, the default is "allow"

For each source, the decision logic is:

If the source is in deny_sources -> denied
If the source is in allow_sources -> allowed
Otherwise -> use the default setting
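
As a worked illustration of the merge and decision rules, here is a minimal Python sketch. The function names `resolve_permissions` and `can_access` are hypothetical, and `deny_paths` filtering is left out because it applies to fetched results rather than to source selection.

```python
def resolve_permissions(rules: list[dict], agent: str) -> dict:
    """Merge every rule that names this agent or the wildcard."""
    merged = {"allow_sources": set(), "deny_sources": set(),
              "deny_paths": [], "default": "allow"}  # no matching rule -> allow
    for rule in rules:  # evaluated in order
        if rule.get("agent", "*") not in (agent, "*"):
            continue
        merged["allow_sources"] |= set(rule.get("allow_sources", []))
        merged["deny_sources"] |= set(rule.get("deny_sources", []))
        merged["deny_paths"] += rule.get("deny_paths", [])
        if rule.get("default", "allow") == "deny":
            merged["default"] = "deny"  # most restrictive default wins
    return merged


def can_access(perm: dict, source: str) -> bool:
    if source in perm["deny_sources"]:
        return False  # deny takes precedence over allow
    if source in perm["allow_sources"]:
        return True
    return perm["default"] == "allow"


rules = [
    {"agent": "*", "default": "allow"},
    {"agent": "intern-bot", "allow_sources": ["public_docs"],
     "deny_sources": ["internal_docs", "financial_data"], "default": "deny"},
]
perm = resolve_permissions(rules, "intern-bot")
print(can_access(perm, "public_docs"), can_access(perm, "api_docs"))
```

One consequence worth noticing: a wildcard rule with `default: deny` makes the merged default `deny` for every agent, even one whose own rule says `default: allow`.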

See Permissions for detailed examples and patterns.

Budget¶

The budget section controls token limits, ranking strategy, and truncation behavior.

budget:
  max_tokens: 8000
  ranking: relevance
  truncation: drop
  estimator: chars_div4
  reserve_tokens: 500

Budget Fields¶

Field	Type	Default	Description
`max_tokens`	int	`8000`	Maximum total tokens for the assembled context. Must be >= 1.
`ranking`	string	`"relevance"`	How to rank chunks: `"relevance"`, `"recency"`, `"manual"`, or `"embedding"`.
`truncation`	string	`"drop"`	What to do when a chunk exceeds the remaining budget: `"drop"`, `"truncate_end"`, or `"truncate_middle"`.
`estimator`	string	`"chars_div4"`	Token estimation method: `"chars_div4"`, `"words"`, or `"whitespace"`.
`reserve_tokens`	int	`0`	Tokens to reserve (subtracted from `max_tokens`). Useful for system prompts injected separately. Must be >= 0.
`embedding`	object	—	Embedding configuration. Required when `ranking` is `"embedding"`. See Token Budgets.

Ranking Strategies¶

Value	Sort key	Description
`relevance`	`relevance_score` descending	Keyword overlap scoring. Free, ~0.6ms, deterministic. Default.
`recency`	`metadata.mtime` descending	Most recently modified files appear first. Best for "what changed" queries.
`manual`	Insertion order	Preserves the order sources are declared. Best when source `priority` controls ordering.
`embedding`	Cosine similarity descending	Semantic similarity via OpenAI embeddings. +10% P@1 vs keyword, ~200ms, ~$0.0002/query. Requires `pip install theaios-context-router[embeddings]`.

Truncation Strategies¶

Value	Behavior	When to use
`drop`	Skip the chunk entirely	Safe default. No partial content that might confuse the LLM.
`truncate_end`	Cut from the end, append `[...]`	When the beginning of a document is most important.
`truncate_middle`	Keep start and end, insert `[...truncated...]` in the middle	When both the beginning (introduction) and end (conclusion) matter.
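
The two truncating strategies can be sketched as below. This is an approximation for illustration: it measures the limit in characters rather than estimated tokens, the function names are hypothetical, and how the engine divides the kept text between start and end is an assumption (an even split here).

```python
def truncate_end(text: str, max_chars: int, marker: str = "[...]") -> str:
    """Keep the start of the text and append the marker."""
    if len(text) <= max_chars:
        return text
    keep = max(max_chars - len(marker), 0)
    return text[:keep] + marker


def truncate_middle(text: str, max_chars: int,
                    marker: str = "[...truncated...]") -> str:
    """Keep the start and the end, with the marker in between."""
    if len(text) <= max_chars:
        return text
    keep = max(max_chars - len(marker), 0)
    head = (keep + 1) // 2  # assumption: split the kept text evenly
    tail = keep - head
    return text[:head] + marker + (text[-tail:] if tail else "")


print(truncate_end("abcdefghij", 8))
print(truncate_middle("a" * 10 + "b" * 10 + "c" * 10, 21))
```

With `drop`, neither function is called: the chunk is simply left out of the assembled context.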

Token Estimators¶

Value	Method	Accuracy
`chars_div4`	`ceil(len(text) / 4)`	Good approximation for GPT-like tokenizers. Default.
`words`	`len(text.split())`	Rougher estimate, faster.
`whitespace`	`len(text.split())`	Same as `words` (whitespace-based splitting).
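
A short Python sketch shows how the estimators and `reserve_tokens` interact under the `drop` strategy. The estimator formulas come from the table above; the function `pack_with_drop` is hypothetical, and it assumes that after a chunk is dropped, later chunks that still fit are kept.

```python
import math


def chars_div4(text: str) -> int:
    return math.ceil(len(text) / 4)


def words(text: str) -> int:
    return len(text.split())


# "whitespace" uses the same whitespace-based split as "words".
ESTIMATORS = {"chars_div4": chars_div4, "words": words, "whitespace": words}


def pack_with_drop(chunks: list[str], max_tokens: int = 8000,
                   reserve_tokens: int = 0,
                   estimator: str = "chars_div4") -> list[str]:
    """Keep ranked chunks while they fit; skip any chunk that does not."""
    estimate = ESTIMATORS[estimator]
    remaining = max_tokens - reserve_tokens  # reserve is subtracted up front
    kept: list[str] = []
    for chunk in chunks:
        cost = estimate(chunk)
        if cost <= remaining:
            kept.append(chunk)
            remaining -= cost
    return kept


print(chars_div4("hello world"), words("hello world"))
```

With the values from the example config (`max_tokens: 8000`, `reserve_tokens: 500`), the chunks share an effective budget of 7500 estimated tokens.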

See Budget Management for tuning guidance.

Cache¶

The cache section enables disk-based caching of source fetch results.

cache:
  enabled: true
  directory: ".context-router-cache"
  ttl: 300
  max_entries: 1000

Cache Fields¶

Field	Type	Default	Description
`enabled`	bool	`false`	Enable disk caching. When false, every query fetches fresh data.
`directory`	string	`".context-router-cache"`	Directory for cache files. Created automatically if it doesn't exist.
`ttl`	int	`300`	Time-to-live in seconds. Entries older than this are evicted on read. Must be >= 0.
`max_entries`	int	`1000`	Maximum number of cached entries. Oldest entries are evicted when the limit is reached. Must be >= 1.

Cache entries are keyed by the SHA-256 hash of the pair (source_name, query_text). Each source's results are cached independently.
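
A key of this kind can be derived as in the sketch below. How the engine serializes the two values before hashing is not documented here, so the NUL separator is an assumption, as are the function names; the TTL check follows the "older than ttl" rule from the table.

```python
import hashlib


def cache_key(source_name: str, query_text: str) -> str:
    """SHA-256 hex digest over the source name and query text."""
    # Assumption: join with a NUL byte so ("ab", "c") and ("a", "bc") differ.
    payload = f"{source_name}\x00{query_text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_fresh(stored_at: float, ttl: int, now: float) -> bool:
    """An entry is evicted on read once it is older than ttl seconds."""
    return (now - stored_at) <= ttl


print(cache_key("docs", "vacation policy"))
```

Because the source name is part of the key, the same query text produces a separate entry for each source.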

Environment Variable Interpolation¶

All string values in the config support ${ENV_VAR} interpolation:

sources:
  api:
    type: http_api
    url: "https://${API_HOST}/search"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

If the environment variable is not set, the placeholder is left as-is (no error, no expansion). Interpolation is applied recursively to all string values before parsing.
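
The behaviour described above, including the "leave unset placeholders alone" rule, can be sketched with a recursive helper. The function name `interpolate` and the exact set of characters allowed in a variable name are assumptions.

```python
import os
import re

# Assumption: names follow the usual shell identifier rules.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Expand ${NAME} in every string inside nested dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unset variables keep their placeholder text unchanged.
        return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: interpolate(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, env) for item in value]
    return value  # numbers, booleans, None


config = {"url": "https://${API_HOST}/search",
          "headers": {"Authorization": "Bearer ${API_TOKEN}"}}
print(interpolate(config, env={"API_HOST": "api.example.com"}))
```

Since a missing variable is not an error, a request can go out with a literal `Bearer ${API_TOKEN}` header; checking that required variables are set before starting the router avoids that surprise.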

Complete Enterprise Example¶

A production-ready config for a multi-agent enterprise system:

version: "1.0"
metadata:
  name: acme-corp-context-router
  description: Context routing for ACME Corp's AI agent fleet
  author: engineering@acme.com

variables:
  company_name: "ACME Corp"
  engineering_teams: ["platform", "backend", "frontend", "data", "infra"]
  sensitive_departments: ["finance", "legal", "hr"]

sources:
  # --- Static context ---
  system_prompt:
    type: inline
    content: |
      You are an AI assistant for ACME Corp.
      Be helpful, concise, and always cite your sources.
      Do not share confidential information outside of authorized contexts.
    priority: 10
    tags: [core]

  agent_instructions:
    type: inline
    content: |
      When answering questions:
      1. Start with the most relevant information
      2. Include links to source documents when available
      3. Flag if the information might be outdated
    priority: 9
    tags: [core]

  # --- Documentation ---
  public_docs:
    type: directory
    path: "./docs/public"
    patterns: ["**/*.md", "**/*.txt"]
    exclude_patterns: ["**/drafts/**"]
    description: Public-facing documentation
    tags: [documentation, public]

  internal_docs:
    type: directory
    path: "./docs/internal"
    patterns: ["**/*.md"]
    exclude_patterns: ["**/archived/**"]
    max_file_size: 500000
    description: Internal engineering documentation
    tags: [documentation, internal]

  hr_handbook:
    type: directory
    path: "./docs/hr"
    patterns: ["**/*.md"]  # text files only: binary files such as PDFs fail to decode and are skipped
    description: HR policies and employee handbook
    tags: [hr, compliance]

  # --- Git-versioned content ---
  api_reference:
    type: git_repo
    path: "/repos/api-docs"
    ref: "main"
    patterns: ["docs/**/*.md", "openapi/**/*.yaml"]
    description: API reference from the docs repo
    tags: [engineering, api]

  runbooks:
    type: git_repo
    path: "/repos/runbooks"
    ref: "main"
    patterns: ["**/*.md"]
    exclude_patterns: ["**/deprecated/**"]
    description: Operational runbooks
    tags: [engineering, ops]

  # --- External APIs ---
  knowledge_search:
    type: http_api
    url: "https://search.internal.acme.com/api/v1/search?q={{query}}&limit=5"
    method: GET
    headers:
      Authorization: "Bearer ${SEARCH_API_TOKEN}"
    response_path: "results"
    result_text_field: "content"
    result_title_field: "title"
    description: Internal knowledge base search
    tags: [search, knowledge]

  confluence_api:
    type: http_api
    url: "https://acme.atlassian.net/wiki/rest/api/search"
    method: POST
    headers:
      Authorization: "Bearer ${CONFLUENCE_TOKEN}"
      Content-Type: "application/json"
    body_template: '{"cql": "text ~ \"{{query}}\"", "limit": 5}'
    response_path: "results"
    result_text_field: "content.body.view.value"
    result_title_field: "content.title"
    description: Confluence wiki search
    tags: [search, wiki]

routes:
  # Default route: always provide system context
  - name: always
    when: ""
    sources: [system_prompt, agent_instructions]
    description: "System prompt and instructions for every query"
    tags: [core]

  # General documentation
  - name: general-docs
    when: ""
    sources: [public_docs]
    description: "Public docs available for all queries"
    tags: [documentation]

  # Policy and HR questions
  - name: hr-questions
    when: 'text contains "policy" or text contains "handbook" or text contains "pto" or text contains "benefits" or text contains "onboarding"'
    sources: [hr_handbook]
    description: "HR handbook for policy-related queries"
    tags: [hr]

  # Engineering queries
  - name: engineering
    when: 'agent in $engineering_teams or text contains "api" or text contains "deploy" or text contains "architecture"'
    sources: [internal_docs, api_reference, runbooks]
    description: "Engineering docs for technical queries"
    tags: [engineering]

  # Search augmentation for complex queries
  - name: knowledge-search
    when: 'text contains "how" or text contains "why" or text contains "explain" or text contains "troubleshoot"'
    sources: [knowledge_search, confluence_api]
    description: "Search APIs for complex questions"
    tags: [search]

permissions:
  # Default: all agents can access public sources
  - agent: "*"
    allow_sources: [system_prompt, agent_instructions, public_docs]
    default: deny

  # Engineering agents: broad access, deny HR
  - agent: "eng-assistant"
    allow_sources: [internal_docs, api_reference, runbooks, knowledge_search, confluence_api]
    deny_sources: [hr_handbook]
    deny_paths: ["**/salaries/**", "**/compensation/**"]
    default: deny

  # HR agent: HR access, deny engineering internals
  - agent: "hr-bot"
    allow_sources: [hr_handbook, public_docs, knowledge_search]
    deny_sources: [internal_docs, api_reference, runbooks]
    deny_paths: ["**/security/**", "**/credentials/**"]
    default: deny

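  # Executive agent: broad read access. The wildcard rule's "default: deny"
  # wins when rules are merged, so every permitted source must be listed here.
  - agent: "exec-assistant"
    allow_sources: [public_docs, internal_docs, hr_handbook, knowledge_search, confluence_api]
    deny_paths: ["**/credentials/**", "**/secrets/**"]
    default: deny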

budget:
  max_tokens: 8000
  ranking: relevance
  truncation: truncate_end
  estimator: chars_div4
  reserve_tokens: 500

cache:
  enabled: true
  directory: ".context-router-cache"
  ttl: 300
  max_entries: 2000

Validation¶

Always validate your config before deploying:

context-router validate --config context-router.yaml
# Config is valid: 9 sources, 5 routes, 4 permissions

The validator checks:

Version is supported ("1.0")
Every source has a valid type and its required fields
Every route has a unique name and references defined sources
Permission rules reference defined sources
Budget values are within valid ranges
Cache values are within valid ranges
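
To show what one of these checks involves, here is a minimal Python sketch of the route check (unique names, defined sources). It is not the validator's code: the function `check_routes` is hypothetical, and the message wording only imitates the style of the tool's output.

```python
def check_routes(config: dict) -> list[str]:
    """Return one message per route problem; an empty list means no problems."""
    errors: list[str] = []
    defined = set(config.get("sources", {}))
    seen: set[str] = set()
    for index, route in enumerate(config.get("routes", [])):
        name = route.get("name", "")
        label = f"routes[{index}] ({name})"
        if name in seen:
            errors.append(f"{label}: duplicate route name")
        seen.add(name)
        for source in route.get("sources", []):
            if source not in defined:
                errors.append(f"{label}: source '{source}' is not defined")
    return errors


config = {
    "sources": {"docs": {"type": "directory", "path": "./docs"}},
    "routes": [
        {"name": "default", "sources": ["docs"]},
        {"name": "engineering", "sources": ["api_docs"]},
    ],
}
for message in check_routes(config):
    print(message)
```

Running a check like this in CI before deployment catches a renamed or deleted source while the change is still in review.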

If validation fails, the error output includes the specific field and reason:

Validation failed:
  - sources.api: http_api source requires 'url'
  - routes[2] (engineering): source 'api_docs' is not defined
  - budget.ranking: invalid value 'custom', expected one of ['embedding', 'manual', 'recency', 'relevance']

Tips for Writing Good Configs¶

Start small. Begin with 2-3 sources and a default route. Add complexity as you learn what your agents actually need.

Use the default route pattern. Have one route with when: "" that provides baseline context (system prompt, core docs). Add specialized routes on top.

Name sources clearly. The source name shows up in chunk metadata and CLI output. internal_docs is better than source_3.

Use tags for organization. Tags let you filter sources in context-router inspect --tag engineering and understand your config at a glance.

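Use variables for values that change. Team lists, agent groups, department names -- put them in variables so config updates don't require editing every route. Variables are resolved in route expressions; for values that appear in source fields, such as API hosts and tokens, use ${ENV_VAR} interpolation instead.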

Set appropriate budgets. 4000 tokens for simple assistants, 8000-16000 for complex agents. Always set reserve_tokens if you inject a system prompt separately.

Enable cache for API sources. HTTP API sources benefit most from caching. Directory sources are already fast. Set TTL based on how often your data changes.

Version in git. The YAML file is the config. Treat it like code: pull requests, reviews, blame history.