What is a Gate?
A gate is the core building block of Layer AI. It’s a configuration that sits between your application and AI providers, controlling how requests are routed, which models are used, and how failures are handled. When you make a request through Layer, you reference a gate by its ID (a UUID). The gate determines:
- Which model handles the request
- What happens if that model fails
- What parameters are applied (temperature, max tokens, etc.)
- Whether spending limits are enforced
- What system prompt is prepended
Gate Types
Layer supports two gate types:

| Type | Use Case |
|---|---|
| Standard | Discrete, stateless LLM calls — chat, image generation, embeddings, etc. |
| Agent | Multi-turn agent workflows with session tracking, tool visibility, and per-session spending. See Agent Gates. |
Task Types
Every gate is configured for a specific task type, which determines which models are available:

| Task Type | Description | Example Models |
|---|---|---|
| `chat` | Text generation, conversation | GPT-4o, Claude Sonnet, Gemini |
| `image` | Image generation | DALL-E, Stable Diffusion |
| `video` | Video generation | Runway, Pika |
| `tts` | Text-to-speech | OpenAI TTS, ElevenLabs |
| `embeddings` | Text embeddings | text-embedding-3, Ada |
| `ocr` | Document processing | OpenAI Vision, Claude Vision |

Chat gates can additionally be specialized with a subtype:
- Reasoning: o3, o4-mini, Gemini 2.5 Pro
- Code: Codestral, Devstral
- Realtime: gpt-4o-realtime
Routing Strategies
Gates support three strategies for handling requests:
Single (Default)
Use only the primary model. No fallbacks. Simplest and most predictable.
Fallback
Try the primary model first. If it fails (provider outage, rate limit, etc.), try each fallback model in order until one succeeds.
Round-Robin
Randomly distribute requests across the primary model and all fallback models. Useful for load balancing or informal cost distribution across providers.
Creating a Gate
From the Dashboard
- Go to Dashboard → Gates → Create New Gate
- Fill in the required fields:
  - Name: A human-readable label for the gate (e.g., `customer-support`). The gate ID (UUID) is what you use in API calls.
  - Task Type: What kind of requests this gate handles
  - Model: The primary model to use
- Optionally configure:
  - Fallback models and routing strategy
  - System prompt applied to all requests
  - Temperature, max tokens, and top P defaults
  - Spending limits with alert or block enforcement
  - Structured output (JSON schema) for consistent response formats
Via the API
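As a sketch of what a creation payload might contain, the TypeScript below assembles the three required fields plus a few optional ones, using field names from the Gate Configuration Reference. The `GateCreatePayload` type and `buildGatePayload` helper are hypothetical names introduced for illustration; the endpoint or SDK method that accepts the payload is not shown here and should be taken from the API reference.

```typescript
// Hypothetical shape: only a subset of the documented fields is modeled here.
type TaskType =
  | "chat" | "image" | "video" | "tts" | "embeddings"
  | "ocr" | "audio" | "stt" | "moderation";

interface GateCreatePayload {
  name: string;        // required
  model: string;       // required
  taskType: TaskType;  // required
  description?: string;
  tags?: string[];
  routingStrategy?: "single" | "fallback" | "round-robin";
  fallbackModels?: string[];
}

// Hypothetical helper: checks the required fields and fills the documented defaults.
function buildGatePayload(input: GateCreatePayload): GateCreatePayload {
  if (input.name.trim() === "") {
    throw new Error("name is required");
  }
  if (input.model.trim() === "") {
    throw new Error("model is required");
  }
  return {
    ...input,
    routingStrategy: input.routingStrategy ?? "single",
    fallbackModels: input.fallbackModels ?? [],
  };
}

const payload = buildGatePayload({
  name: "customer-support",
  model: "gpt-4o",
  taskType: "chat",
  tags: ["support"],
});
console.log(JSON.stringify(payload, null, 2));
```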
Using a Gate
Once created, reference the gate by its ID in your requests. You can do this from the Layer SDK, the OpenAI SDK, or cURL.
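Because requests reference a gate by its UUID rather than by its name, a client can cheaply check the ID's shape before sending anything. The `isGateId` helper below is a hypothetical name, and the pattern assumes the canonical 8-4-4-4-12 hexadecimal UUID layout; the example ID is made up.

```typescript
// Canonical UUID text form: 8-4-4-4-12 hexadecimal digits.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: true when the string looks like a gate ID (UUID), not a gate name.
function isGateId(value: string): boolean {
  return UUID_PATTERN.test(value);
}

console.log(isGateId("3f2b8c1e-9a4d-4e7b-b1c2-5d6e7f8a9b0c")); // made-up ID
console.log(isGateId("customer-support")); // a gate name, not an ID
```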
Gate Configuration Reference
The full gate configuration interface. Fields marked required must be provided when creating a gate. All other fields are optional.
Core Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | `string` | Yes | Human-readable label for the gate (e.g., `customer-support`). Must be unique per user. Used for display in the dashboard and logs. |
| `model` | `SupportedModel` | Yes | Primary model to use for requests (e.g., `claude-sonnet-4-20250514`, `gpt-4o`). |
| `taskType` | `ModelType` | Yes | Determines which models are available. One of: `chat`, `image`, `video`, `tts`, `embeddings`, `ocr`, `audio`, `stt`, `moderation`. |
| `taskSubtype` | `ModelSubtype` | No | Specialization for chat models. One of: `reasoning`, `code`, `realtime`. |
| `description` | `string` | No | Describes what this gate is for. Used by the Architect for smart routing recommendations. The dashboard offers an Auto-enhance description option; if you accept the Architect’s suggestion, the enhanced version replaces this field. |
| `tags` | `string[]` | No | Organizational labels for filtering and grouping gates. |
Model Routing
| Field | Type | Default | Description |
|---|---|---|---|
| `routingStrategy` | `'single' \| 'fallback' \| 'round-robin'` | `'single'` | How requests are distributed across models. |
| `fallbackModels` | `SupportedModel[]` | `[]` | Ordered list of fallback models. Used by `fallback` and `round-robin` strategies. |
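
The three strategies can be sketched as a function that decides which models to try and in what order. This is an illustration of the behavior described in this document, not Layer's implementation; `modelsToTry`, `callWithGate`, and the `callModel` callback are hypothetical names. Round-robin is modeled as a uniform random pick per request, following the description above.

```typescript
type RoutingStrategy = "single" | "fallback" | "round-robin";

interface RoutingConfig {
  model: string;
  routingStrategy?: RoutingStrategy; // defaults to "single"
  fallbackModels?: string[];         // defaults to []
}

// Returns the models to attempt, in order, for a single request.
function modelsToTry(
  cfg: RoutingConfig,
  random: () => number = Math.random,
): string[] {
  const fallbacks = cfg.fallbackModels ?? [];
  switch (cfg.routingStrategy ?? "single") {
    case "single":
      return [cfg.model]; // primary only, no fallbacks
    case "fallback":
      return [cfg.model, ...fallbacks]; // primary first, then fallbacks in order
    case "round-robin": {
      // Assumption: one model is picked uniformly at random per request.
      const pool = [cfg.model, ...fallbacks];
      const pick = pool[Math.floor(random() * pool.length)] ?? cfg.model;
      return [pick];
    }
  }
}

// Tries each candidate in order and returns the first successful result.
async function callWithGate<T>(
  cfg: RoutingConfig,
  callModel: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("no models configured");
  for (const model of modelsToTry(cfg)) {
    try {
      return await callModel(model);
    } catch (err) {
      lastError = err; // e.g. provider outage or rate limit; try the next model
    }
  }
  throw lastError;
}

console.log(
  modelsToTry({
    model: "gpt-4o",
    routingStrategy: "fallback",
    fallbackModels: ["claude-sonnet-4-20250514"],
  }),
);
```

Passing `random` as a parameter keeps the round-robin branch testable with a fixed value.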
Request Parameters
Defaults applied to all requests through the gate. Can be overridden per-request if `allowOverrides` is configured.
| Field | Type | Default | Description |
|---|---|---|---|
| `systemPrompt` | `string` | — | System prompt prepended to all requests. |
| `temperature` | `number` | — | Controls randomness (0 = deterministic, 2 = most random). |
| `maxTokens` | `number` | — | Maximum response length in tokens. |
| `topP` | `number` | — | Nucleus sampling threshold (0 - 1). Lower = more focused. |
| `allowOverrides` | `boolean \| OverrideConfig` | `false` | Whether clients can override gate defaults per-request. Can be `true` (all overrides) or an object specifying which fields: `{ model?: boolean, temperature?: boolean, maxTokens?: boolean, topP?: boolean }`. |
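
One way to picture `allowOverrides` is as a filter applied when per-request values are merged over gate defaults. The sketch below assumes that disallowed overrides are silently ignored; whether Layer ignores or rejects them is not specified here, and `resolveParams` is a hypothetical helper.

```typescript
interface OverrideConfig {
  model?: boolean;
  temperature?: boolean;
  maxTokens?: boolean;
  topP?: boolean;
}

interface RequestParams {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
}

const OVERRIDABLE = ["model", "temperature", "maxTokens", "topP"] as const;

// Merges per-request values over gate defaults, honoring allowOverrides.
function resolveParams(
  defaults: RequestParams,
  requested: RequestParams,
  allowOverrides: boolean | OverrideConfig = false,
): RequestParams {
  const resolved: RequestParams = { ...defaults };
  for (const key of OVERRIDABLE) {
    const allowed =
      typeof allowOverrides === "boolean"
        ? allowOverrides
        : allowOverrides[key] === true;
    if (allowed && requested[key] !== undefined) {
      // Assumption: a permitted override simply replaces the gate default.
      Object.assign(resolved, { [key]: requested[key] });
    }
  }
  return resolved;
}

const gateDefaults: RequestParams = { model: "gpt-4o", temperature: 0.2, maxTokens: 512 };
const perRequest: RequestParams = { temperature: 0.9, maxTokens: 2048 };

console.log(resolveParams(gateDefaults, perRequest, false));
console.log(resolveParams(gateDefaults, perRequest, { temperature: true }));
```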
Smart Routing (Architect)
Layer’s AI agent (the Architect) analyzes your gate’s description and usage patterns to recommend optimal models.

| Field | Type | Default | Description |
|---|---|---|---|
| `costWeight` | `number` | — | Weight for cost optimization (0 - 1). |
| `latencyWeight` | `number` | — | Weight for latency optimization (0 - 1). |
| `qualityWeight` | `number` | — | Weight for quality optimization (0 - 1). |
| `analysisMethod` | `'cost' \| 'balanced' \| 'performance' \| 'custom'` | — | Preset optimization profile. `custom` uses the individual weights above. |
| `maxCostPer1kTokens` | `number` | — | Maximum acceptable cost per 1,000 tokens (USD). |
| `maxLatencyMs` | `number` | — | Maximum acceptable latency in milliseconds. |
| `reanalysisPeriod` | `'daily' \| 'weekly' \| 'monthly' \| 'never'` | `'never'` | How often the Architect re-evaluates model recommendations. |
| `autoApplyRecommendations` | `boolean` | `false` | Automatically apply Architect recommendations without manual review. |
| `taskAnalysis` | `TaskAnalysis` | — | Read-only. The Architect’s current recommendation including primary model, alternatives, and reasoning. |
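
The numeric ranges in the table can be checked client-side before saving a gate. The `validateArchitectSettings` helper below is hypothetical and enforces only what the table states for weights (values between 0 and 1); treating non-positive cost or latency ceilings as errors is an added assumption.

```typescript
interface ArchitectSettings {
  costWeight?: number;
  latencyWeight?: number;
  qualityWeight?: number;
  analysisMethod?: "cost" | "balanced" | "performance" | "custom";
  maxCostPer1kTokens?: number;
  maxLatencyMs?: number;
}

// Returns a list of problems; an empty list means the settings look consistent.
function validateArchitectSettings(s: ArchitectSettings): string[] {
  const problems: string[] = [];
  const weights: Array<[string, number | undefined]> = [
    ["costWeight", s.costWeight],
    ["latencyWeight", s.latencyWeight],
    ["qualityWeight", s.qualityWeight],
  ];
  for (const [name, value] of weights) {
    if (value !== undefined && (value < 0 || value > 1)) {
      problems.push(`${name} must be between 0 and 1`);
    }
  }
  // Assumption: a zero or negative ceiling is never meaningful.
  if (s.maxCostPer1kTokens !== undefined && s.maxCostPer1kTokens <= 0) {
    problems.push("maxCostPer1kTokens must be positive");
  }
  if (s.maxLatencyMs !== undefined && s.maxLatencyMs <= 0) {
    problems.push("maxLatencyMs must be positive");
  }
  return problems;
}

console.log(
  validateArchitectSettings({
    analysisMethod: "custom",
    costWeight: 0.5,
    latencyWeight: 0.2,
    qualityWeight: 1.3, // out of range on purpose
  }),
);
```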
Structured Output
Force responses into a consistent format. Native JSON schema support for OpenAI models; prompt-injected for other providers.

| Field | Type | Default | Description |
|---|---|---|---|
| `responseFormatEnabled` | `boolean` | `false` | Enable structured output. |
| `responseFormatType` | `'text' \| 'json_object' \| 'json_schema'` | `'text'` | Output format. `json_object` guarantees valid JSON. `json_schema` validates against a strict schema. |
| `responseFormatSchema` | `object` | — | JSON schema for `json_schema` mode. Defines the exact structure of responses. |
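
For illustration, a gate's structured-output settings for `json_schema` mode might look like the object below. The ticket-triage schema is a made-up example; only the three field names and the `json_schema` value come from the table.

```typescript
// Structured-output fields from the table, paired with a made-up schema.
const structuredOutput = {
  responseFormatEnabled: true,
  responseFormatType: "json_schema" as const,
  responseFormatSchema: {
    type: "object",
    properties: {
      category: { type: "string", enum: ["billing", "technical", "other"] },
      urgent: { type: "boolean" },
    },
    required: ["category", "urgent"],
    additionalProperties: false,
  },
};

console.log(JSON.stringify(structuredOutput, null, 2));
```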
Spending Limits
Control costs at the gate level. See Spending for account-level controls.

| Field | Type | Default | Description |
|---|---|---|---|
| `spendingLimit` | `number \| null` | `null` | Dollar cap per period. `null` = no limit. |
| `spendingLimitPeriod` | `'monthly' \| 'daily'` | `'monthly'` | How often the spending counter resets. |
| `spendingEnforcement` | `'alert_only' \| 'block'` | `'alert_only'` | `alert_only` warns but allows requests. `block` rejects requests when the limit is hit. |
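
The two enforcement modes can be sketched as a decision made before each request. This is an illustration of the behavior in the table, not Layer's implementation; `checkSpending` is a hypothetical helper, and treating spending exactly equal to the limit as having hit it is an assumption.

```typescript
interface SpendingState {
  spendingLimit: number | null;                // null = no limit
  spendingEnforcement: "alert_only" | "block";
  spendingCurrent: number;                     // current period spending (USD)
}

type SpendingDecision = "allow" | "allow_with_alert" | "reject";

// Decides what happens to a request given the gate's spending state.
function checkSpending(gate: SpendingState): SpendingDecision {
  if (gate.spendingLimit === null) {
    return "allow"; // no cap configured
  }
  // Assumption: reaching the limit exactly counts as hitting it.
  if (gate.spendingCurrent < gate.spendingLimit) {
    return "allow";
  }
  return gate.spendingEnforcement === "block" ? "reject" : "allow_with_alert";
}

console.log(checkSpending({ spendingLimit: 50, spendingEnforcement: "block", spendingCurrent: 50 }));
console.log(checkSpending({ spendingLimit: 50, spendingEnforcement: "alert_only", spendingCurrent: 72.5 }));
```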
Agent gates have additional configuration fields for sessions, sub-gates, and orchestration. See the Agent Gates documentation for those fields.
Read-Only Fields
These fields are returned by the API but cannot be set directly.

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Gate UUID. This is what you use in API calls. |
| `userId` | `string` | Owner’s user ID. |
| `createdAt` | `Date` | When the gate was created. |
| `updatedAt` | `Date` | When the gate was last modified. |
| `spendingCurrent` | `number` | Current period spending (USD). |
| `spendingPeriodStart` | `string` | Start of the current spending period. |
| `spendingStatus` | `'active' \| 'suspended'` | Whether the gate is active or suspended due to spending limit. |
Dashboard Tabs
When editing a gate in the dashboard, configuration is organized into tabs:
Basic Info
Name, description, task type, subtypes, and tags. Includes an Auto-enhance description toggle: when enabled, the Architect generates an improved version of your description optimized for smart routing analysis. If you accept the suggestion, it replaces your gate’s `description` field with the enhanced version.