What are Agent Gates?
Agent gates extend standard gates with session tracking, tool visibility, and per-session spending controls. They’re designed for multi-step AI agent workflows where a single user intent triggers many model calls.
Agent gates are not a separate system — they’re gates with additional configuration. The same proxy, SDK, and dashboard, just with session awareness layered on top.
When to Use Agent Gates
| Scenario | Gate Type |
|---|---|
| Single LLM call (chatbot, summarization) | Standard gate |
| Multi-turn agent with tool calls | Agent gate |
| Agent with multiple models for different tasks | Agent gate (orchestrated) |
| Need session-level cost tracking | Agent gate |
Key Concepts
Sessions
A session groups related LLM requests into one agent run. You generate a session ID with the SDK, and all requests sharing that ID are tracked together.
```typescript
import { Layer } from '@layer-ai/sdk';

const layer = new Layer({ apiKey: process.env.LAYER_API_KEY });
const sessionId = layer.generateSessionId(); // UUIDv4, no API call
```
Sessions have a lifecycle:
```text
active → idle → completed
                  → runaway (requests after completed)
       → budget_exceeded (hard limit hit, blocks requests)
```
- Active: Accepting requests. This is the default state when a session is created.
- Idle: No requests received within the timeout window (default 30 min). The session automatically reactivates to active if a new request arrives with the same session ID, so idle is not terminal.
- Completed: The developer called endSession(). If requests continue after completion, the session flips to runaway.
- Runaway: Requests arrived after the session was marked completed. Layer allows the requests but flags the session, which is useful for detecting agent loops or unexpected behavior.
- Budget exceeded: The hard spending limit was hit. New requests are rejected with HTTP 402. This is the only status that blocks requests.
Sessions are created implicitly — the first request with an unseen sessionId creates the session. No explicit session creation call is needed.
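The lifecycle rules above can be condensed into a small state machine, which is handy for reasoning about what a given event does to a session. This is a sketch under assumptions, not Layer's implementation: the event names (request, timeout, endSession, hardLimitHit) and the nextStatus and acceptsRequests helpers are hypothetical, and transitions the rules above do not mention (for example, a timeout on a runaway session) are assumed to leave the status unchanged.

```typescript
// Hypothetical client-side model of the documented session lifecycle.
type SessionStatus = 'active' | 'idle' | 'completed' | 'runaway' | 'budget_exceeded';

// Hypothetical event names for the transitions described above.
type SessionEvent = 'request' | 'timeout' | 'endSession' | 'hardLimitHit';

function nextStatus(status: SessionStatus, event: SessionEvent): SessionStatus {
  // budget_exceeded blocks all further requests, so treat it as absorbing.
  if (status === 'budget_exceeded') return status;
  switch (event) {
    case 'hardLimitHit':
      return 'budget_exceeded';
    case 'request':
      // Idle sessions reactivate; requests after completion flag a runaway.
      if (status === 'idle') return 'active';
      if (status === 'completed') return 'runaway';
      return status;
    case 'timeout':
      // Only an active session idles out after the inactivity window.
      return status === 'active' ? 'idle' : status;
    case 'endSession':
      // Assumption: endSession() applies to active or idle sessions only.
      return status === 'active' || status === 'idle' ? 'completed' : status;
  }
}

// Only budget_exceeded rejects requests (HTTP 402); runaway still allows them.
function acceptsRequests(status: SessionStatus): boolean {
  return status !== 'budget_exceeded';
}
```

For example, nextStatus('completed', 'request') yields 'runaway', matching the rule that requests after endSession() are still allowed but flag the session.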
Modes
Agent gates operate in one of two modes:
Observability
Passthrough mode. All requests use the gate’s configured model. Layer tracks sessions, extracts tool calls, and provides analytics — but makes no routing decisions.
Orchestrated
Active routing mode. Requests route to different sub-gates based on the task. Currently supports static orchestration where the developer calls sub-gates directly.
| Orchestration Type | Who Decides Routing |
|---|---|
| Static | Developer calls specific sub-gates at runtime. Layer tracks and groups everything under one session. |
| Dynamic | Layer picks the best sub-gate from a pool per request. (Coming soon) |
Routing Behavior
How requests flow depends on the mode:
Observability
```text
Request arrives at agent gate with sessionId
  → Layer logs the request to the session
  → Request routes through the gate's configured model
  → Response logged, session metrics updated
```
No routing decisions. Layer is a transparent proxy with session awareness.
Static Orchestrated
```text
Request arrives at agent gate with sessionId
  → Treated as orchestrator reasoning
  → Routes through the agent gate's configured model

Request arrives at sub-gate with sessionId
  → Layer resolves the parent agent gate via the sub-gate relationship
  → Associates the request with the parent's session
  → Routes through the sub-gate's own model, params, and fallbacks
  → Trace labels the request with the sub-gate's name
```
Both types of requests update the same session’s metrics (cost, tokens, latency, request count).
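The two routing paths differ only in which gate serves the request; both resolve to the same session. A minimal sketch of that attribution step, assuming a hypothetical in-memory gate registry in which each sub-gate records a single parent agent gate. Layer does this resolution server-side, and the GateInfo, Attribution, and attributeRequest names are illustrative only.

```typescript
// Hypothetical gate registry entry: a sub-gate records its parent agent gate.
interface GateInfo {
  name: string;
  parentAgentGateId?: string; // unset for the agent gate itself
}

// Where a request is counted, which gate's model serves it, and its trace label.
interface Attribution {
  sessionGateId: string; // agent gate that owns the session
  servingGateId: string; // gate whose model, params, and fallbacks are used
  traceLabel: string;    // gate name shown in the trace
  role: 'orchestrator' | 'sub-gate';
}

function attributeRequest(gates: Map<string, GateInfo>, gateId: string): Attribution {
  const gate = gates.get(gateId);
  if (gate === undefined) throw new Error(`Unknown gate: ${gateId}`);

  if (gate.parentAgentGateId === undefined) {
    // Request hit the agent gate: treated as orchestrator reasoning.
    return { sessionGateId: gateId, servingGateId: gateId, traceLabel: gate.name, role: 'orchestrator' };
  }
  // Request hit a sub-gate: counted in the parent's session, served by the sub-gate.
  return {
    sessionGateId: gate.parentAgentGateId,
    servingGateId: gateId,
    traceLabel: gate.name,
    role: 'sub-gate',
  };
}
```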
Sub-Gates
A sub-gate is a regular gate used as a routing target within an agent workflow. Any existing gate can serve as a sub-gate — it doesn’t know or care that it’s being used as one.
Sub-gates let you use different models for different parts of your agent:
- Orchestrator calls → expensive reasoning model (e.g., Claude Opus)
- Data extraction → cheap fast model (e.g., Claude Haiku)
- Code generation → code-specialized model (e.g., Codestral)
Session Spending Limits
Two-tier cost control per session:
| Tier | Behavior | Default |
|---|---|---|
| Soft limit | Request proceeds. Layer adds X-Layer-Session-Warning: soft_limit_exceeded header so your code can react. | User-configured |
| Hard limit | Request rejected (HTTP 402). Session marked budget_exceeded. | 2x soft limit |
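The two tiers reduce to a single check per request. This sketch assumes the limits are compared against the session's cumulative cost, that reaching the hard limit counts as hitting it, and that the caller has already resolved the hard-limit default (2x the soft limit). evaluateSessionBudget and BudgetDecision are hypothetical names, and Layer's exact comparison point may differ.

```typescript
// Hypothetical outcome of checking a session's spend against its two limits.
type BudgetDecision =
  | { action: 'allow' }
  | { action: 'warn'; header: 'X-Layer-Session-Warning: soft_limit_exceeded' }
  | { action: 'reject'; httpStatus: 402; newStatus: 'budget_exceeded' };

function evaluateSessionBudget(
  totalCostUsd: number,
  softLimitUsd: number | null, // null means no soft limit configured
  hardLimitUsd: number | null, // already resolved (explicit value or 2x soft)
): BudgetDecision {
  // Hard tier first: it blocks the request and marks the session.
  // Assumption: reaching the limit exactly counts as hitting it.
  if (hardLimitUsd !== null && totalCostUsd >= hardLimitUsd) {
    return { action: 'reject', httpStatus: 402, newStatus: 'budget_exceeded' };
  }
  // Soft tier: the request proceeds, but the response carries a warning header.
  if (softLimitUsd !== null && totalCostUsd > softLimitUsd) {
    return { action: 'warn', header: 'X-Layer-Session-Warning: soft_limit_exceeded' };
  }
  return { action: 'allow' };
}
```

Checking the hard tier first matters: a session past both limits must be rejected, not merely warned.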
Integration
Layer SDK
```typescript
import { Layer } from '@layer-ai/sdk';

const layer = new Layer({ apiKey: process.env.LAYER_API_KEY });
const sessionId = layer.generateSessionId();

// Make requests with the session ID
await layer.chat({
  gateId: 'your-agent-gate-uuid',
  sessionId,
  data: {
    messages: [{ role: 'user', content: 'Research climate change impacts' }],
  },
});

// End the session when done (optional: sessions idle out naturally)
await layer.endSession(sessionId);
```
Anthropic SDK (Drop-in)
Keep your existing Anthropic code: change the base URL and API key, and add two headers:
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { Layer } from '@layer-ai/sdk';

const layer = new Layer({ apiKey: process.env.LAYER_API_KEY });
const sessionId = layer.generateSessionId();

const anthropic = new Anthropic({
  baseURL: 'https://api.uselayer.ai',
  apiKey: process.env.LAYER_API_KEY,
  defaultHeaders: {
    'x-layer-gate-id': 'your-agent-gate-uuid',
    'x-layer-session-id': sessionId,
  },
});

// Your existing tool schemas; this one is a placeholder example
const toolDefinitions: Anthropic.Tool[] = [{
  name: 'web_search',
  description: 'Search the web for a query',
  input_schema: { type: 'object', properties: { query: { type: 'string' } }, required: ['query'] },
}];

// All existing code stays identical
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 4096,
  messages: [{ role: 'user', content: 'Research climate change impacts' }],
  tools: toolDefinitions,
});
```
OpenAI SDK (Drop-in)
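The same pattern works with the OpenAI SDK. Note that the base URL includes /v1 here:
```typescript
import OpenAI from 'openai';
import { Layer } from '@layer-ai/sdk';

const layer = new Layer({ apiKey: process.env.LAYER_API_KEY });
const sessionId = layer.generateSessionId();

const openai = new OpenAI({
  baseURL: 'https://api.uselayer.ai/v1',
  apiKey: process.env.LAYER_API_KEY,
  defaultHeaders: {
    'x-layer-gate-id': 'your-agent-gate-uuid',
    'x-layer-session-id': sessionId,
  },
});

// Your existing tool schemas (OpenAI function-calling format); placeholder example
const toolDefinitions = [{
  type: 'function' as const,
  function: {
    name: 'web_search',
    description: 'Search the web for a query',
    parameters: { type: 'object', properties: { query: { type: 'string' } }, required: ['query'] },
  },
}];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Research climate change impacts' }],
  tools: toolDefinitions,
});
```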
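With either drop-in client, a request rejected by the session's hard limit surfaces as an HTTP 402 error. Both the Anthropic and OpenAI Node SDKs throw an APIError that exposes the HTTP status on a status property, so a structural check avoids importing either SDK's error class. isSessionBudgetExceeded and runStep are hypothetical helpers, and stopping the run on 402 is one possible policy rather than something Layer requires.

```typescript
// Structural check for an SDK error carrying HTTP status 402.
function isSessionBudgetExceeded(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    'status' in err &&
    (err as { status?: unknown }).status === 402
  );
}

// Hypothetical wrapper: run one agent step and stop cleanly (return null)
// when the session's hard limit has been hit, instead of retrying.
async function runStep<T>(step: () => Promise<T>): Promise<T | null> {
  try {
    return await step();
  } catch (err) {
    if (isSessionBudgetExceeded(err)) {
      console.warn('Session hard limit reached; stopping this agent run.');
      return null;
    }
    throw err; // anything else is a real failure
  }
}
```

In an agent loop this would wrap each call, for example runStep(() => openai.chat.completions.create({ ... })), and end the loop when it returns null.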
Multi-Gate Pattern (Static Orchestrated)
When your agent uses different models for different tasks, create one SDK client per gate: one for the agent gate (orchestrator calls) and one for each sub-gate, all sharing the same session ID:
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { Layer } from '@layer-ai/sdk';

const layer = new Layer({ apiKey: process.env.LAYER_API_KEY });
const sessionId = layer.generateSessionId();

// Orchestrator: expensive model for reasoning, routed through the agent gate
const orchestrator = new Anthropic({
  baseURL: 'https://api.uselayer.ai',
  apiKey: process.env.LAYER_API_KEY,
  defaultHeaders: {
    'x-layer-gate-id': 'your-agent-gate-uuid',
    'x-layer-session-id': sessionId,
  },
});

// Extractor: cheap model for data extraction, routed through a sub-gate
const extractor = new Anthropic({
  baseURL: 'https://api.uselayer.ai',
  apiKey: process.env.LAYER_API_KEY,
  defaultHeaders: {
    'x-layer-gate-id': 'your-extraction-sub-gate-uuid',
    'x-layer-session-id': sessionId, // same session ties it together
  },
});

// Main agent loop (one orchestrator turn shown)
const response = await orchestrator.messages.create({
  model: 'claude-opus-4-20250514',
  max_tokens: 4096,
  messages: [{ role: 'user', content: 'Research climate change impacts' }],
  // tools: your tool schemas, including one whose handler calls extractData()
});

// Inside a tool function
async function extractData(content: string) {
  return extractor.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 1024, // required by the Messages API
    messages: [{ role: 'user', content: `Extract key facts: ${content}` }],
  });
}
```
Layer sees both calls under the same session. The orchestrator calls are labeled with the agent gate name; extraction calls are labeled with the sub-gate name.
Tool Visibility
Layer automatically extracts tool usage from request and response content, with no additional configuration:
- tool_use blocks in responses: which tools the LLM decided to call
- tool_result blocks in requests: what the tools returned

This gives the dashboard a complete picture of your agent’s behavior, including tools that run entirely in your own code and never call an LLM (like database queries or API calls), because their inputs and outputs pass through those blocks.
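To make the two block types concrete, here is a sketch that pairs each tool_use block with the tool_result that answers it, using a simplified subset of the Anthropic Messages API content-block shapes (tool_use carries id, name, and input; tool_result carries tool_use_id and content). It only shows what information the payloads carry; it is not Layer's extraction code, and ToolActivity and extractToolActivity are hypothetical names.

```typescript
// Simplified subset of Anthropic Messages API content blocks.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; tool_use_id: string; content?: unknown };

interface Message {
  role: 'user' | 'assistant';
  content: string | ContentBlock[];
}

interface ToolActivity {
  toolUseId: string;
  toolName: string;
  input: unknown;
  result?: unknown; // set once a later message carries the matching tool_result
}

// Pair each tool_use (assistant turns) with its tool_result (user turns).
function extractToolActivity(messages: Message[]): ToolActivity[] {
  const byId = new Map<string, ToolActivity>();
  for (const message of messages) {
    if (typeof message.content === 'string') continue; // plain text, no blocks
    for (const block of message.content) {
      if (block.type === 'tool_use') {
        byId.set(block.id, { toolUseId: block.id, toolName: block.name, input: block.input });
      } else if (block.type === 'tool_result') {
        const activity = byId.get(block.tool_use_id);
        if (activity) activity.result = block.content;
      }
    }
  }
  return Array.from(byId.values());
}
```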
What Layer Does NOT Require
- No code rewrite: Keep your existing Anthropic/OpenAI SDK code. Just change the base URL and API key, and add headers.
- No action labels: Layer infers what each request is doing from which gate or sub-gate was called and from the tool_use/tool_result blocks in the message content.
- No parent/child wiring: Layer infers request relationships from timestamps, gate identity, and response content. You never manage a request tree.
- No session creation call: Sessions are created implicitly on the first request with a new sessionId.
- No session ID management: The SDK generates collision-safe UUIDs.
- No session cleanup: Sessions idle out on their own and reactivate automatically if new requests arrive. You can call endSession() to explicitly mark a session as done, but it’s optional.
Dashboard
Session List
The agent gate dashboard shows all sessions with:
- Status, duration, request count, total cost
- Expandable timeline of requests within each session
Session Detail (Trace View)
Three view modes for inspecting a session:
- Timeline — Chronological view of all requests with gate labels
- By Action — Grouped by sub-gate/action with aggregate metrics
- Trace — Tree-structured view showing orchestrator turns, tool calls, and sub-gate responses
Intelligence Insights
After a session completes, Layer’s intelligence agent automatically analyzes it for:
- Behavioral patterns and loop detection
- Quality scoring and anomaly detection
- Cost efficiency with potential savings estimation
- Actionable recommendations (model suggestions, prompt improvements)
Creating an Agent Gate
- Go to Dashboard → Agent Gates → Create New
- Configure the gate using the dashboard tabs described below.
Dashboard Tabs
When editing an agent gate in the dashboard, configuration is organized into tabs:
Basic Info
Name, description, mode (Observability or Orchestrated), orchestration type (Static or Dynamic for orchestrated mode), and tags.
Primary Model
Primary model selection, fallback chain, routing strategy (single/fallback/round-robin), optimization weights (cost/latency/quality), and smart routing via the Architect. Same configuration surface as standard gates.
Model Pool
Sub-gate management for orchestrated mode. Create new sub-gates inline (name + description + model recommendations) or attach existing gates. Each sub-gate shows its assigned model and a copyable gate ID. Advanced configuration opens the full gate settings for that sub-gate.
Session Settings
Per-session spending limits (soft + hard), session timeout duration, and alert configuration.
Request Settings
System prompt, temperature, max tokens, top P, and override permissions. Same as standard gates.
Sessions
Live session list with status filters, duration, request count, and total cost. Expand a session to see the request timeline. Includes aggregated intelligence insights across sessions (average quality, cost efficiency, loop rate, anomaly rate).
Intelligence
Gate-level intelligence overview with aggregated metrics and recommendations from Layer’s intelligence agent across all analyzed sessions.
Danger Zone
Delete the agent gate. This action is irreversible. Active sessions will no longer accept new requests.
Agent Gate Configuration Reference
Agent gates share all configuration fields from standard gates (model, fallbacks, routing strategy, spending limits, smart routing, etc. — see Gates). The fields below are specific to agent gates.
Agent Gate Fields
| Field | Type | Required | Description |
|---|---|---|---|
| gateType | 'standard' \| 'agent' | Yes | Must be 'agent' for agent gates. |
| mode | 'observability' \| 'orchestrated' | Yes | Operating mode. Observability is passthrough; orchestrated enables sub-gate routing. |
| orchestrationType | 'static' \| 'dynamic' | No | Only for orchestrated mode. Static: the developer calls sub-gates directly. Dynamic: Layer selects from a pool (coming soon). |
| subGates | string[] | No | Array of sub-gate IDs for static orchestrated mode. Managed via the Model Pool tab in the dashboard. |
| subGatePool | string[] | No | Array of sub-gate IDs for dynamic orchestrated mode. Layer selects from this pool at runtime. |
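The field dependencies in this table can be captured as a TypeScript type plus a small consistency check, which is useful if you generate gate configs in code. This is a sketch under assumptions: the AgentGateConfig interface mirrors only the fields listed above, validateAgentGate is hypothetical, and Layer's own validation rules and messages may differ (for example, whether a dynamic gate may be saved with an empty pool).

```typescript
// Mirrors only the agent-specific fields from the table above.
interface AgentGateConfig {
  gateType: 'standard' | 'agent';
  mode: 'observability' | 'orchestrated';
  orchestrationType?: 'static' | 'dynamic';
  subGates?: string[];
  subGatePool?: string[];
}

// Returns a list of problems; an empty array means the config looks consistent.
function validateAgentGate(config: AgentGateConfig): string[] {
  const problems: string[] = [];
  if (config.gateType !== 'agent') {
    problems.push("gateType must be 'agent' for agent gates");
  }
  if (config.mode === 'observability') {
    // Observability is passthrough, so orchestration settings do not apply.
    if (config.orchestrationType !== undefined) {
      problems.push('orchestrationType only applies to orchestrated mode');
    }
  } else if (config.orchestrationType === 'dynamic' && (config.subGatePool ?? []).length === 0) {
    // Assumption: dynamic routing needs at least one pool member to pick from.
    problems.push('dynamic orchestration selects from subGatePool, which is empty');
  }
  return problems;
}
```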
Session Settings
| Field | Type | Default | Description |
|---|---|---|---|
| sessionSpendingLimit | number \| null | null | Soft spending limit per session (USD). When exceeded, requests proceed but Layer adds X-Layer-Session-Warning: soft_limit_exceeded to the response. |
| sessionHardLimit | number \| null | 2x soft limit | Hard spending limit per session (USD). When exceeded, requests are rejected with HTTP 402 and the session is marked budget_exceeded. |
| sessionTimeoutMinutes | number | 30 | Minutes of inactivity before a session transitions from active to idle. |
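The defaults in this table interact: the hard limit falls back to twice the soft limit, and with no soft limit there is nothing to double. A sketch of resolving the effective settings, assuming an unset hard limit stays null when the soft limit is also null; resolveSessionSettings and the Effective* field names are hypothetical.

```typescript
// Session fields as listed in the table; all optional when configuring.
interface SessionSettings {
  sessionSpendingLimit?: number | null;
  sessionHardLimit?: number | null;
  sessionTimeoutMinutes?: number;
}

interface EffectiveSessionSettings {
  softLimitUsd: number | null;
  hardLimitUsd: number | null;
  timeoutMinutes: number;
}

function resolveSessionSettings(settings: SessionSettings): EffectiveSessionSettings {
  const soft = settings.sessionSpendingLimit ?? null;
  // An explicit hard limit wins; otherwise default to 2x the soft limit, if any.
  const hard = settings.sessionHardLimit ?? (soft !== null ? soft * 2 : null);
  return {
    softLimitUsd: soft,
    hardLimitUsd: hard,
    timeoutMinutes: settings.sessionTimeoutMinutes ?? 30, // documented default
  };
}
```

For example, a $5 soft limit with no explicit hard limit resolves to a $10 hard limit and the 30-minute timeout.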
Session Read-Only Fields
These fields are returned on session objects but cannot be set directly.
| Field | Type | Description |
|---|---|---|
| id | string | Session UUID (auto-generated). |
| sessionId | string | SDK-generated session identifier (the value from generateSessionId()). |
| gateId | string | The parent agent gate UUID. |
| status | SessionStatus | One of: active, idle, completed, runaway, budget_exceeded. |
| mode | AgentGateMode | Mode inherited from the gate at session creation time. |
| totalRequests | number | Running count of requests in this session. |
| totalCost | number | Cumulative cost in USD. |
| totalTokens | number | Cumulative token count. |
| totalLatencyMs | number | Cumulative latency in milliseconds. |
| startedAt | string | Timestamp of the first request. |
| lastRequestAt | string | Timestamp of the most recent request. |
| completedAt | string \| null | When the session ended (any terminal state). |
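The running totals make simple per-session metrics easy to derive on the client. A sketch assuming a session object shaped like the table above, with ISO 8601 timestamp strings; how you obtain the object is not covered here, and summarizeSession is a hypothetical helper.

```typescript
type SessionStatus = 'active' | 'idle' | 'completed' | 'runaway' | 'budget_exceeded';

// Shape of a session object as described in the read-only fields table.
interface AgentSession {
  id: string;
  sessionId: string;
  gateId: string;
  status: SessionStatus;
  mode: 'observability' | 'orchestrated';
  totalRequests: number;
  totalCost: number;
  totalTokens: number;
  totalLatencyMs: number;
  startedAt: string;
  lastRequestAt: string;
  completedAt: string | null;
}

function summarizeSession(s: AgentSession) {
  const n = s.totalRequests;
  return {
    avgCostPerRequestUsd: n > 0 ? s.totalCost / n : 0,
    avgLatencyMs: n > 0 ? s.totalLatencyMs / n : 0,
    // Wall-clock span between the first and most recent request.
    wallClockMs: Date.parse(s.lastRequestAt) - Date.parse(s.startedAt),
    // Statuses that usually deserve a closer look.
    flagged: s.status === 'runaway' || s.status === 'budget_exceeded',
  };
}
```

Because totalLatencyMs sums per-request latencies, it can exceed wallClockMs for an agent that issues requests in parallel.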
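Request headers:

| Header | Required | Description |
|---|---|---|
| X-Layer-Gate-Id | Yes | The agent gate or sub-gate UUID |
| X-Layer-Session-Id | Yes (agent workflows) | Session identifier from generateSessionId(). Required for agent gates and for sub-gate calls that belong to an agent session; ignored by standard gates used on their own. |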
Response headers:
| Header | Description |
|---|---|
| X-Layer-Session-Warning | Set to soft_limit_exceeded when session cost exceeds the soft spending limit |
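To react to the soft-limit warning, read this header from the raw HTTP response. With the Anthropic and OpenAI Node SDKs, calling .withResponse() on a request returns both the parsed data and the underlying fetch Response. The helper below only needs something with a get() method, so it stays independent of either SDK; softLimitWarned and HeaderReader are hypothetical names.

```typescript
// Minimal interface for reading a header. A fetch Headers object satisfies it
// (and matches names case-insensitively); a lowercase-keyed Map also works.
interface HeaderReader {
  get(name: string): string | null | undefined;
}

// True when Layer reports that the session passed its soft spending limit.
function softLimitWarned(headers: HeaderReader): boolean {
  return headers.get('x-layer-session-warning') === 'soft_limit_exceeded';
}

// Usage with the drop-in Anthropic client (sketch):
// const { data, response } = await anthropic.messages
//   .create({ model, max_tokens, messages })
//   .withResponse();
// if (softLimitWarned(response.headers)) {
//   // e.g. route remaining work to a cheaper sub-gate, or wrap up the run
// }
```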