

How Routing Works

Helicone AI Gateway intelligently routes your requests to the best available provider based on:
  • Authentication method (BYOK vs PTB)
  • Provider priority (OpenAI, Anthropic, etc.)
  • Cost optimization
  • Availability and rate limits
  • Explicit routing rules

Routing Priority

When you specify a model without a provider, the gateway builds a list of attempts and tries them in order:
1. BYOK Attempts First

If you have provider keys configured, BYOK attempts are prioritized:
// You have OpenAI and Anthropic keys configured
model: "gpt-4o-mini"
// Tries: OpenAI BYOK → PTB providers
2. Provider Priority

Within BYOK and PTB groups, providers are prioritized by:
  1. Native provider (e.g., OpenAI for GPT models)
  2. Major cloud providers (Azure, Bedrock, Vertex)
  3. Alternative providers (DeepInfra, OpenRouter, etc.)
3. Fallback Chain

If a request fails, the gateway automatically tries the next provider in the list.
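
The BYOK-first ordering and fallback chain can be sketched as a small attempt-list builder. This is a hypothetical illustration (the names `Attempt`, `buildAttempts`, and the provider list are made up here); the gateway performs this ordering server-side:

```typescript
// Providers that can serve the model, in priority order (illustrative).
const PROVIDER_PRIORITY = ["openai", "azure", "deepinfra"];

interface Attempt {
  provider: string;
  auth: "byok" | "ptb";
}

// Build the ordered attempt list: all BYOK attempts first (only for
// providers you have keys for), then PTB attempts, each group sorted
// by provider priority.
function buildAttempts(byokProviders: string[]): Attempt[] {
  const byok = PROVIDER_PRIORITY
    .filter((p) => byokProviders.includes(p))
    .map((p) => ({ provider: p, auth: "byok" as const }));
  const ptb = PROVIDER_PRIORITY
    .map((p) => ({ provider: p, auth: "ptb" as const }));
  return [...byok, ...ptb];
}
```

With an OpenAI BYOK key configured, `buildAttempts(["openai"])` yields the OpenAI BYOK attempt first, followed by the PTB attempts in priority order.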

Routing Examples

Automatic Routing

Let the gateway choose the best provider:
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
Routing behavior:
  • If you have an OpenAI BYOK key: Uses OpenAI directly
  • If no BYOK key: Uses PTB (Helicone billing)
  • Automatically tries alternatives if the first attempt fails
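
For context, the `client` in these examples is an OpenAI-compatible SDK pointed at the gateway. A minimal configuration might look like the following sketch; the base URL shown is an assumption, so verify it against your Helicone settings before use. The options object can be passed directly to `new OpenAI(...)`:

```typescript
// Options for pointing any OpenAI-compatible SDK at the Helicone AI Gateway.
// The baseURL below is an assumption -- confirm it in your gateway settings.
const gatewayOptions = {
  baseURL: "https://ai-gateway.helicone.ai/v1",
  apiKey: "<your-helicone-api-key>", // in real code, read from an env var
};
```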

Explicit Provider Routing

Specify the exact provider you want:
// OpenAI direct
model: "gpt-4o/openai"

// Azure OpenAI
model: "gpt-4o/azure"

// AWS Bedrock
model: "claude-3-7-sonnet-20250219/bedrock"

// Anthropic direct
model: "claude-sonnet-4/anthropic"

// DeepInfra
model: "llama-3.3-70b/deepinfra"
Explicit routing still respects BYOK → PTB priority. If you have a BYOK key for the specified provider, it’s used first.

Multi-Model Fallback

Specify multiple models to try in order:
const response = await client.chat.completions.create({
  model: "gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra",
  messages: [{ role: "user", content: "Hello!" }],
});
Behavior:
  1. Tries OpenAI BYOK (if configured)
  2. Falls back to OpenAI PTB
  3. Then tries Azure BYOK (if configured)
  4. Then Azure PTB
  5. Finally tries DeepInfra
See Fallbacks for more advanced fallback strategies.
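
The numbered behavior above can be sketched as a comma-list expansion. This helper is hypothetical (the real expansion happens inside the gateway), but it mirrors the documented order: for each `model/provider` entry, a BYOK attempt if a key is configured, then a PTB attempt:

```typescript
interface Attempt {
  model: string;
  provider: string;
  auth: "byok" | "ptb";
}

// Expand "model/provider,model/provider,..." into the full attempt order.
function expandAttempts(modelParam: string, byokProviders: string[]): Attempt[] {
  const attempts: Attempt[] = [];
  for (const entry of modelParam.split(",")) {
    const [model, provider] = entry.split("/");
    if (byokProviders.includes(provider)) {
      attempts.push({ model, provider, auth: "byok" }); // BYOK first
    }
    attempts.push({ model, provider, auth: "ptb" });    // then PTB
  }
  return attempts;
}
```

With OpenAI and Azure BYOK keys, `expandAttempts("gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra", ["openai", "azure"])` produces exactly the five-step order listed above.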

Provider Exclusions

Exclude specific providers from automatic routing:
// Exclude Anthropic from all routing
model: "!anthropic,gpt-4o-mini"

// Exclude multiple providers
model: "!anthropic,!deepinfra,gpt-4o-mini"
The !provider syntax excludes providers globally for all models in the comma-separated list.
// Use gpt-4o-mini from any provider except Anthropic
const response = await client.chat.completions.create({
  model: "!anthropic,gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
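
If you build exclusion lists dynamically, the `!provider` syntax composes from plain strings. A small hypothetical helper (not part of any SDK):

```typescript
// Compose a model parameter with provider exclusions using the
// gateway's "!provider" syntax: exclusions first, then the model.
function modelWithExclusions(model: string, exclude: string[]): string {
  return [...exclude.map((p) => `!${p}`), model].join(",");
}
```

For example, `modelWithExclusions("gpt-4o-mini", ["anthropic", "deepinfra"])` yields the same string as the second example above.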

Provider Priority Order

The gateway uses the following priority order for routing (within BYOK and PTB groups):
For GPT models (gpt-4o, gpt-5, o1, etc.):
  1. OpenAI (native provider)
  2. Azure OpenAI
  3. Alternative providers (DeepInfra, OpenRouter, etc.)
This priority order is implemented in the AttemptBuilder class and can be customized with explicit routing or exclusions.
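
As a sketch, the three-tier ordering could be expressed as a comparator. The tier assignments below are illustrative for GPT models only; the authoritative mapping lives in the gateway's AttemptBuilder:

```typescript
// Illustrative tiers for GPT models: native provider first,
// major clouds second, alternatives last.
const TIER: Record<string, number> = {
  openai: 0,                        // native provider
  azure: 1, bedrock: 1, vertex: 1,  // major clouds
  deepinfra: 2, openrouter: 2,      // alternatives
};

// Sort comparator: lower tier wins; unknown providers go last.
function byPriority(a: string, b: string): number {
  return (TIER[a] ?? 3) - (TIER[b] ?? 3);
}
```

Sorting `["deepinfra", "openai", "azure"]` with this comparator puts OpenAI first, then Azure, then DeepInfra.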

BYOK vs PTB Routing

BYOK (Bring Your Own Key)

When you configure provider keys in Settings → API Keys:
  • Priority: BYOK attempts always come first
  • Billing: Direct from the provider
  • Failover: Falls back to PTB if BYOK fails
// With OpenAI BYOK configured
model: "gpt-4o-mini"
// 1. Tries OpenAI BYOK
// 2. Falls back to OpenAI PTB
// 3. Falls back to alternative providers

PTB (Pass-Through Billing)

When using Helicone’s API key without BYOK:
  • Priority: After BYOK attempts
  • Billing: Through Helicone (add credits at helicone.ai/credits)
  • Authentication: Single API key for all providers
// Without BYOK keys
model: "gpt-4o-mini"
// 1. Tries OpenAI PTB
// 2. Falls back to alternative PTB providers

Regional Routing

Some providers support regional endpoints:

AWS Bedrock

// Specific region
model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock"

// With fallback to Anthropic
model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic"

Azure OpenAI

Configure Azure deployments in Settings → API Keys, then:
model: "gpt-4o/azure"

Cost-Optimized Routing

The gateway considers provider pricing when routing:
// Automatically uses the lowest cost provider
model: "llama-3.3-70b"
// Compares: DeepInfra, Together AI, Bedrock, etc.
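
A minimal sketch of what cost-based selection amounts to, using made-up per-token prices (real, current pricing comes from the Cost API below; the gateway does this comparison for you):

```typescript
// Hypothetical per-1M-token prices for illustration only.
const PRICE_PER_1M_TOKENS: Record<string, number> = {
  deepinfra: 0.23,
  together: 0.88,
  bedrock: 0.99,
};

// Pick the cheapest provider among candidates that serve the model.
function cheapestProvider(candidates: string[]): string {
  return candidates.reduce((best, p) =>
    (PRICE_PER_1M_TOKENS[p] ?? Infinity) < (PRICE_PER_1M_TOKENS[best] ?? Infinity)
      ? p
      : best
  );
}
```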

Cost Tracking

View costs by provider in the Helicone dashboard

LLM Cost API

Access pricing data at helicone.ai/llm-cost

Routing Observability

Track which providers were attempted in the Helicone dashboard:
1. View Request Details

Open any request in Requests
2. Check Provider

See which provider was used and if fallbacks occurred
3. Analyze Patterns

Use filters to analyze routing patterns across requests

Advanced Routing

Model-Specific Provider Routing

Route different models to different providers:
// Use Bedrock for Claude, OpenAI for GPT
const claudeResponse = await client.chat.completions.create({
  model: "claude-sonnet-4/bedrock",
  messages: [{ role: "user", content: "Hello!" }],
});

const gptResponse = await client.chat.completions.create({
  model: "gpt-4o/openai",
  messages: [{ role: "user", content: "Hello!" }],
});

Conditional Routing

Route based on application logic:
function selectModel(priority: "cost" | "speed" | "quality") {
  switch (priority) {
    case "cost":
      return "gpt-4o-mini/deepinfra,gpt-4o-mini";
    case "speed":
      return "llama-3.3-70b/groq,gpt-4o-mini";
    case "quality":
      return "gpt-4o,claude-sonnet-4";
  }
}

const response = await client.chat.completions.create({
  model: selectModel("cost"),
  messages: [{ role: "user", content: "Hello!" }],
});

Routing Errors

The gateway returns errors when routing fails:
400 Bad Request

Invalid model or provider name:
{
  "error": {
    "message": "No available providers for the requested models",
    "type": "request_failed"
  }
}
401 Unauthorized

Invalid or missing API key:
{
  "error": {
    "message": "Invalid Helicone API key",
    "type": "authentication_failed"
  }
}
429 Rate Limited

Rate limit exceeded or insufficient credits:
{
  "error": {
    "message": "Insufficient credit limit",
    "type": "insufficient_credit_limit"
  }
}
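
Client code can branch on the `type` field of these errors. The type strings below are taken from the examples above; the handler itself is a hypothetical sketch, not a gateway API:

```typescript
interface GatewayError {
  message: string;
  type: string;
}

// Decide how to react to a routing error, keyed on the "type"
// values shown in the error examples above.
function handleRoutingError(
  err: GatewayError
): "fix-request" | "fix-key" | "add-credits" | "retry" {
  switch (err.type) {
    case "request_failed":
      return "fix-request"; // 400: invalid model or provider name
    case "authentication_failed":
      return "fix-key";     // 401: check your Helicone API key
    case "insufficient_credit_limit":
      return "add-credits"; // 429: top up at helicone.ai/credits
    default:
      return "retry";       // anything else: assume transient, retry
  }
}
```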

Best Practices

Let the gateway choose providers automatically:
model: "gpt-4o-mini"  // Best for most use cases
This provides:
  • Automatic BYOK → PTB fallback
  • Cost optimization
  • Built-in resilience
Use explicit providers when you need:
  • Data residency requirements
  • Specific SLAs
  • Regulatory compliance
model: "gpt-4o/azure"  // Required for data residency
Add provider keys for:
  • Better cost control
  • Direct provider billing
  • Priority routing
Configure at Settings → API Keys
Use the Helicone dashboard to:
  • Track provider usage
  • Identify cost optimization opportunities
  • Detect routing issues
View at Helicone Dashboard

Next Steps

Fallbacks

Configure automatic failover strategies

Getting Started

Start using the AI Gateway

Browse Models

Explore all available models

Cost API

Access provider pricing data