

How Routing Works

Helicone AI Gateway intelligently routes your requests to the best available provider based on:
  • Authentication method (BYOK vs PTB)
  • Provider priority (OpenAI, Anthropic, etc.)
  • Cost optimization
  • Availability and rate limits
  • Explicit routing rules

Routing Priority

When you specify a model without a provider, the gateway builds a list of attempts and tries them in order:
1. BYOK Attempts First

If you have provider keys configured, BYOK attempts are prioritized:
// You have OpenAI and Anthropic keys configured
model: "gpt-4o-mini"
// Tries: OpenAI BYOK → PTB providers
2. Provider Priority

Within BYOK and PTB groups, providers are prioritized by:
  1. Native provider (e.g., OpenAI for GPT models)
  2. Major cloud providers (Azure, Bedrock, Vertex)
  3. Alternative providers (DeepInfra, OpenRouter, etc.)
3. Fallback Chain

If a request fails, the gateway automatically tries the next provider in the list.
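
The BYOK-first ordering and fallback chain can be sketched as a small attempt-list builder. This is a hypothetical illustration (the names `Attempt`, `buildAttempts`, and the provider list are made up here); the gateway performs this ordering server-side:

```typescript
// Providers that can serve the model, in priority order (illustrative).
const PROVIDER_PRIORITY = ["openai", "azure", "deepinfra"];

interface Attempt {
  provider: string;
  auth: "byok" | "ptb";
}

// Build the ordered attempt list: all BYOK attempts first (only for
// providers you have keys for), then PTB attempts, each group sorted
// by provider priority.
function buildAttempts(byokProviders: string[]): Attempt[] {
  const byok = PROVIDER_PRIORITY
    .filter((p) => byokProviders.includes(p))
    .map((p) => ({ provider: p, auth: "byok" as const }));
  const ptb = PROVIDER_PRIORITY
    .map((p) => ({ provider: p, auth: "ptb" as const }));
  return [...byok, ...ptb];
}
```

With an OpenAI BYOK key configured, `buildAttempts(["openai"])` yields the OpenAI BYOK attempt first, followed by the PTB attempts in priority order.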

Routing Examples

Automatic Routing

Let the gateway choose the best provider:
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
Routing behavior:
  • If you have an OpenAI BYOK key: Uses OpenAI directly
  • If no BYOK key: Uses PTB (Helicone billing)
  • Automatically tries alternatives if the first attempt fails
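
For context, the `client` in these examples is an OpenAI-compatible SDK pointed at the gateway. A minimal configuration might look like the following sketch; the base URL shown is an assumption, so verify it against your Helicone settings before use. The options object can be passed directly to `new OpenAI(...)`:

```typescript
// Options for pointing any OpenAI-compatible SDK at the Helicone AI Gateway.
// The baseURL below is an assumption -- confirm it in your gateway settings.
const gatewayOptions = {
  baseURL: "https://ai-gateway.helicone.ai/v1",
  apiKey: "<your-helicone-api-key>", // in real code, read from an env var
};
```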

Explicit Provider Routing

Specify the exact provider you want:
// OpenAI direct
model: "gpt-4o/openai"

// Azure OpenAI
model: "gpt-4o/azure"

// AWS Bedrock
model: "claude-3-7-sonnet-20250219/bedrock"

// Anthropic direct
model: "claude-sonnet-4/anthropic"

// DeepInfra
model: "llama-3.3-70b/deepinfra"
Explicit routing still respects BYOK → PTB priority. If you have a BYOK key for the specified provider, it’s used first.

Multi-Model Fallback

Specify multiple models to try in order:
const response = await client.chat.completions.create({
  model: "gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra",
  messages: [{ role: "user", content: "Hello!" }],
});
Behavior:
  1. Tries OpenAI BYOK (if configured)
  2. Falls back to OpenAI PTB
  3. Then tries Azure BYOK (if configured)
  4. Then Azure PTB
  5. Finally tries DeepInfra
See Fallbacks for more advanced fallback strategies.
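
The numbered behavior above can be sketched as a comma-list expansion. This helper is hypothetical (the real expansion happens inside the gateway), but it mirrors the documented order: for each `model/provider` entry, a BYOK attempt if a key is configured, then a PTB attempt:

```typescript
interface Attempt {
  model: string;
  provider: string;
  auth: "byok" | "ptb";
}

// Expand "model/provider,model/provider,..." into the full attempt order.
function expandAttempts(modelParam: string, byokProviders: string[]): Attempt[] {
  const attempts: Attempt[] = [];
  for (const entry of modelParam.split(",")) {
    const [model, provider] = entry.split("/");
    if (byokProviders.includes(provider)) {
      attempts.push({ model, provider, auth: "byok" }); // BYOK first
    }
    attempts.push({ model, provider, auth: "ptb" });    // then PTB
  }
  return attempts;
}
```

With OpenAI and Azure BYOK keys, `expandAttempts("gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra", ["openai", "azure"])` produces exactly the five-step order listed above.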

Provider Exclusions

Exclude specific providers from automatic routing:
// Exclude Anthropic from all routing
model: "!anthropic,gpt-4o-mini"

// Exclude multiple providers
model: "!anthropic,!deepinfra,gpt-4o-mini"
The !provider syntax excludes providers globally for all models in the comma-separated list.
// Use gpt-4o-mini from any provider except Anthropic
const response = await client.chat.completions.create({
  model: "!anthropic,gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
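
If you build exclusion lists dynamically, the `!provider` syntax composes from plain strings. A small hypothetical helper (not part of any SDK):

```typescript
// Compose a model parameter with provider exclusions using the
// gateway's "!provider" syntax: exclusions first, then the model.
function modelWithExclusions(model: string, exclude: string[]): string {
  return [...exclude.map((p) => `!${p}`), model].join(",");
}
```

For example, `modelWithExclusions("gpt-4o-mini", ["anthropic", "deepinfra"])` yields the same string as the second example above.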

Provider Priority Order

The gateway uses the following priority order for routing (within BYOK and PTB groups):
For GPT models (gpt-4o, gpt-5, o1, etc.):
  1. OpenAI (native provider)
  2. Azure OpenAI
  3. Alternative providers (DeepInfra, OpenRouter, etc.)
This priority order is implemented in the AttemptBuilder class and can be customized with explicit routing or exclusions.
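
As a sketch, the three-tier ordering could be expressed as a comparator. The tier assignments below are illustrative for GPT models only; the authoritative mapping lives in the gateway's AttemptBuilder:

```typescript
// Illustrative tiers for GPT models: native provider first,
// major clouds second, alternatives last.
const TIER: Record<string, number> = {
  openai: 0,                        // native provider
  azure: 1, bedrock: 1, vertex: 1,  // major clouds
  deepinfra: 2, openrouter: 2,      // alternatives
};

// Sort comparator: lower tier wins; unknown providers go last.
function byPriority(a: string, b: string): number {
  return (TIER[a] ?? 3) - (TIER[b] ?? 3);
}
```

Sorting `["deepinfra", "openai", "azure"]` with this comparator puts OpenAI first, then Azure, then DeepInfra.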

BYOK vs PTB Routing

BYOK (Bring Your Own Key)

When you configure provider keys in Settings → API Keys:
  • Priority: BYOK attempts always come first
  • Billing: Direct from the provider
  • Failover: Falls back to PTB if BYOK fails
// With OpenAI BYOK configured
model: "gpt-4o-mini"
// 1. Tries OpenAI BYOK
// 2. Falls back to OpenAI PTB
// 3. Falls back to alternative providers

PTB (Pass-Through Billing)

When using Helicone’s API key without BYOK:
  • Priority: After BYOK attempts
  • Billing: Through Helicone (add credits at helicone.ai/credits)
  • Authentication: Single API key for all providers
// Without BYOK keys
model: "gpt-4o-mini"
// 1. Tries OpenAI PTB
// 2. Falls back to alternative PTB providers

Regional Routing

Some providers support regional endpoints:

AWS Bedrock

// Specific region
model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock"

// With fallback to Anthropic
model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic"

Azure OpenAI

Configure Azure deployments in Settings → API Keys, then:
model: "gpt-4o/azure"

Cost-Optimized Routing

The gateway considers provider pricing when routing:
// Automatically uses the lowest cost provider
model: "llama-3.3-70b"
// Compares: DeepInfra, Together AI, Bedrock, etc.
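
A minimal sketch of what cost-based selection amounts to, using made-up per-token prices (real, current pricing comes from the Cost API below; the gateway does this comparison for you):

```typescript
// Hypothetical per-1M-token prices for illustration only.
const PRICE_PER_1M_TOKENS: Record<string, number> = {
  deepinfra: 0.23,
  together: 0.88,
  bedrock: 0.99,
};

// Pick the cheapest provider among candidates that serve the model.
function cheapestProvider(candidates: string[]): string {
  return candidates.reduce((best, p) =>
    (PRICE_PER_1M_TOKENS[p] ?? Infinity) < (PRICE_PER_1M_TOKENS[best] ?? Infinity)
      ? p
      : best
  );
}
```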

Cost Tracking

View costs by provider in the Helicone dashboard

LLM Cost API

Access pricing data at helicone.ai/llm-cost

Routing Observability

Track which providers were attempted in the Helicone dashboard:
1. View Request Details

Open any request in Requests
2. Check Provider

See which provider was used and if fallbacks occurred
3. Analyze Patterns

Use filters to analyze routing patterns across requests

Advanced Routing

Model-Specific Provider Routing

Route different models to different providers:
// Use Bedrock for Claude, OpenAI for GPT
const claudeResponse = await client.chat.completions.create({
  model: "claude-sonnet-4/bedrock",
  messages: [{ role: "user", content: "Hello!" }],
});

const gptResponse = await client.chat.completions.create({
  model: "gpt-4o/openai",
  messages: [{ role: "user", content: "Hello!" }],
});

Conditional Routing

Route based on application logic:
function selectModel(priority: "cost" | "speed" | "quality") {
  switch (priority) {
    case "cost":
      return "gpt-4o-mini/deepinfra,gpt-4o-mini";
    case "speed":
      return "llama-3.3-70b/groq,gpt-4o-mini";
    case "quality":
      return "gpt-4o,claude-sonnet-4";
  }
}

const response = await client.chat.completions.create({
  model: selectModel("cost"),
  messages: [{ role: "user", content: "Hello!" }],
});

Routing Errors

The gateway returns errors when routing fails:
400 Bad Request

Invalid model or provider name:
{
  "error": {
    "message": "No available providers for the requested models",
    "type": "request_failed"
  }
}
401 Unauthorized

Invalid or missing API key:
{
  "error": {
    "message": "Invalid Helicone API key",
    "type": "authentication_failed"
  }
}
429 Rate Limited

Rate limit exceeded or insufficient credits:
{
  "error": {
    "message": "Insufficient credit limit",
    "type": "insufficient_credit_limit"
  }
}
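
Client code can branch on the `type` field of these errors. The type strings below are taken from the examples above; the handler itself is a hypothetical sketch, not a gateway API:

```typescript
interface GatewayError {
  message: string;
  type: string;
}

// Decide how to react to a routing error, keyed on the "type"
// values shown in the error examples above.
function handleRoutingError(
  err: GatewayError
): "fix-request" | "fix-key" | "add-credits" | "retry" {
  switch (err.type) {
    case "request_failed":
      return "fix-request"; // 400: invalid model or provider name
    case "authentication_failed":
      return "fix-key";     // 401: check your Helicone API key
    case "insufficient_credit_limit":
      return "add-credits"; // 429: top up at helicone.ai/credits
    default:
      return "retry";       // anything else: assume transient, retry
  }
}
```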

Best Practices

Let the gateway choose providers automatically:
model: "gpt-4o-mini"  // Best for most use cases
This provides:
  • Automatic BYOK → PTB fallback
  • Cost optimization
  • Built-in resilience
Use explicit providers when you need:
  • Data residency requirements
  • Specific SLAs
  • Regulatory compliance
model: "gpt-4o/azure"  // Required for data residency
Add provider keys for:
  • Better cost control
  • Direct provider billing
  • Priority routing
Configure at Settings → API Keys
Use the Helicone dashboard to:
  • Track provider usage
  • Identify cost optimization opportunities
  • Detect routing issues
View at Helicone Dashboard

Next Steps

Fallbacks

Configure automatic failover strategies

Getting Started

Start using the AI Gateway

Browse Models

Explore all available models

Cost API

Access provider pricing data