Overview

The Chat Completions endpoint provides a unified OpenAI-compatible interface to interact with multiple LLM providers. Use this endpoint to send messages and receive responses from models across providers like OpenAI, Anthropic, Google, and more.

Endpoint

POST https://ai-gateway.helicone.ai/v1/chat/completions

Authentication

Include your Helicone API key in the request:
curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
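
The same request can be built from Python using only the standard library. This is an illustrative sketch, not an official client — the API key value is a placeholder:

```python
import json
import urllib.request

GATEWAY_URL = "https://ai-gateway.helicone.ai/v1/chat/completions"

def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the gateway."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("YOUR_HELICONE_API_KEY", "gpt-4",
                    [{"role": "user", "content": "Hello!"}])
# urllib.request.urlopen(req) would send it; omitted here.
```

Because the gateway is OpenAI-compatible, any OpenAI SDK pointed at the gateway base URL with a Helicone API key should also work.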

Request Body

model
string
required
The model to use for completion. Supports comma-separated fallback models (e.g., "gpt-4,claude-3-5-sonnet-20241022"). See supported models for the full list.
messages
array
required
An array of message objects that form the conversation history. Each message must include:
  • role: One of "system", "user", "assistant", "tool", or "developer"
  • content: The message content (string or array of content parts)
Example:
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is the capital of France?"}
]
max_tokens
integer
The maximum number of tokens to generate in the completion.
max_completion_tokens
integer
Alternative to max_tokens. The maximum number of tokens to generate.
temperature
number
Sampling temperature between 0 and 2. Higher values make output more random. Default: 1.0
top_p
number
Nucleus sampling parameter. The model considers only the tokens comprising the top_p probability mass. Default: 1.0
n
integer
Number of completions to generate. Must be between 1 and 128. Default: 1
stream
boolean
Whether to stream the response as server-sent events. Default: false
stream_options
object
Options for streaming responses.
  • include_usage: Whether to include usage statistics in stream
  • include_obfuscation: Whether to include obfuscation data
stop
string | array
Up to 4 sequences where the API will stop generating tokens.
frequency_penalty
number
Number between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the text so far, reducing repetition. Default: 0
presence_penalty
number
Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the text, encouraging the model to cover new topics. Default: 0
logprobs
boolean
Whether to return log probabilities of the output tokens. Default: false
top_logprobs
integer
Number of most likely tokens to return at each position (0-20). Requires logprobs: true.
logit_bias
object
Modify likelihood of specified tokens appearing. Maps token IDs to bias values (-100 to 100).
user
string
A unique identifier for the end-user, for abuse monitoring.
seed
integer
Seed for deterministic sampling. Accepts any signed 64-bit integer (-9223372036854775808 to 9223372036854775807).
response_format
object
Format of the response. Options:
  • {"type": "text"} - Plain text response
  • {"type": "json_object"} - Valid JSON object
  • {"type": "json_schema", "json_schema": {...}} - JSON matching schema
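
As an illustration, a structured-output request body using the json_schema option might be assembled like this. The schema name and fields below are made-up examples, not part of the API:

```python
# Hypothetical schema: constrain output to a {"city": ..., "country": ...} object.
capital_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "capital_answer",  # schema name chosen for this example
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
        },
    },
}

request_body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "response_format": capital_schema,
}
```
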
tools
array
List of tools the model can call. Each tool has:
  • type: "function" or "custom"
  • function: Function definition with name, description, and parameters
tool_choice
string | object
Controls which tool is called:
  • "none": No tool is called
  • "auto": Model decides
  • "required": Model must call a tool
  • Object: Force specific tool
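
For the object form, the request names the specific function to force. A sketch, reusing the get_weather function from the Function Calling example below (a hypothetical tool, not a built-in):

```python
# Tool definition; get_weather is a hypothetical function name for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

request_body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [weather_tool],
    # Force this specific tool instead of "auto" selection.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```
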
parallel_tool_calls
boolean
Whether to enable parallel function calling. Default: true
reasoning_effort
string
Level of reasoning effort for models that support it. Options: "minimal", "low", "medium", "high"
reasoning_options
object
Advanced reasoning options:
  • budget_tokens: Token budget for reasoning
modalities
array
Output modalities for the response. Currently only ["text"] is supported.
prediction
object
Predicted content to optimize latency:
  • type: "content"
  • content: Predicted message content
  • reasoning: Optional reasoning text
context_editing
object
Context management configuration (Anthropic models only):
  • enabled: Enable context editing
  • clear_tool_uses: Auto-clear tool call history
  • clear_thinking: Manage reasoning traces
store
boolean
Whether to store the completion for later retrieval. Default: false
metadata
object
Additional metadata to attach to the request.
service_tier
string
Service tier for the request. Options: "auto", "default", "flex", "scale", "priority"
cache_control
object
Cache control settings:
  • type: "ephemeral"
  • ttl: Time to live for cached response

Helicone-Specific Parameters

Model Fallbacks

Provide multiple models separated by commas for automatic fallback:
{
  "model": "gpt-4,claude-3-5-sonnet-20241022,gemini-2.0-flash-exp",
  "messages": [...]
}

Provider Exclusion

Exclude specific providers using the ! prefix:
{
  "model": "!openai,gpt-4",
  "messages": [...]
}
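
Both conventions compose into a single model string; a small helper can make that explicit (the helper itself is illustrative, not part of any SDK):

```python
def build_model_string(models, exclude_providers=()):
    """Join '!'-prefixed provider exclusions and fallback models into one model field."""
    return ",".join([f"!{p}" for p in exclude_providers] + list(models))

print(build_model_string(["gpt-4", "claude-3-5-sonnet-20241022"]))
# gpt-4,claude-3-5-sonnet-20241022
print(build_model_string(["gpt-4"], exclude_providers=["openai"]))
# !openai,gpt-4
```
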

Prompt Integration

Use stored prompts with variable substitution:
prompt_id
string
ID of the Helicone prompt to use
version_id
string
Specific version of the prompt (optional)
environment
string
Environment for prompt resolution (e.g., "production", "staging")
inputs
object
Variables to substitute in the prompt template

Plugins

plugins
array
Array of plugins to apply to the request. Each plugin configures additional functionality.

Response Format

id
string
Unique identifier for the completion
object
string
Object type, always "chat.completion"
created
integer
Unix timestamp of when the completion was created
model
string
The model used for completion
choices
array
Array of completion choices. Each choice contains:
  • index: Choice index
  • message: The generated message with role and content
  • finish_reason: Reason completion stopped ("stop", "length", "tool_calls", etc.)
  • logprobs: Log probabilities if requested
usage
object
Token usage statistics:
  • prompt_tokens: Tokens in the prompt
  • completion_tokens: Tokens in the completion
  • total_tokens: Total tokens used
system_fingerprint
string
System fingerprint for the model configuration

Examples

Basic Chat Completion

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

Streaming Response

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
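
On the client side, each streamed event arrives as a `data: ` line carrying a JSON chunk, with `data: [DONE]` marking the end of the stream. A minimal parser sketch — the sample chunks below are fabricated, not captured output:

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        yield json.loads(data)

# Fabricated sample stream for illustration.
sample = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"].get("content", "")
               for c in iter_sse_chunks(sample))
# text == "Once upon"
```
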

Function Calling

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Model Fallback

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4,claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using Stored Prompts

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "prompt_id": "my-prompt-id",
    "environment": "production",
    "inputs": {
      "user_name": "Alice",
      "topic": "AI"
    }
  }'

Response Example

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
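
Pulling the generated text and token counts out of this response is plain dictionary access; a sketch using the example payload above:

```python
# The example response from the documentation, as a Python dict.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1709251200,
    "model": "gpt-4",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

content = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
total = response["usage"]["total_tokens"]
# content == "The capital of France is Paris.", finish == "stop", total == 33
```
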

Error Responses

The endpoint returns standard HTTP status codes:
  • 400: Invalid request (missing required fields, invalid model, etc.)
  • 401: Authentication failed
  • 403: Access forbidden (suspended account, etc.)
  • 429: Rate limit exceeded or insufficient credits
  • 500: Internal server error
Error response format:
{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}
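
A client might classify these responses roughly as follows — a sketch under the assumption that 429 and 500 are the retryable cases, with the error message pulled from the body when it parses:

```python
import json

def parse_error(status, body):
    """Return (message, retryable) for an error response; illustrative, not exhaustive."""
    try:
        err = json.loads(body).get("error", {})
        message = err.get("message", "unknown error")
    except (ValueError, AttributeError):
        message = body or "unknown error"
    # Retry on rate limits / insufficient credits (429) and server errors (500).
    return message, status in (429, 500)

msg, retry = parse_error(
    400,
    '{"error": {"message": "Invalid model specified", '
    '"type": "invalid_request_error", "code": "invalid_model"}}',
)
# msg == "Invalid model specified", retry is False
```
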

Additional Headers

Helicone supports custom headers for enhanced functionality:
  • Helicone-User-Id: Track requests by user
  • Helicone-Session-Id: Group requests into sessions
  • Helicone-Property-*: Add custom properties
  • Helicone-Cache-Enabled: Enable response caching
  • Helicone-RateLimit-Policy: Apply custom rate limits
  • Helicone-Prompt-Id: Use a stored prompt
See the Features documentation for more details on these capabilities.
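
A small helper can assemble these headers alongside the authorization header. This is an illustrative sketch; the property name ("Env") and values are made up:

```python
def helicone_headers(api_key, user_id=None, session_id=None,
                     properties=None, cache=False):
    """Assemble Helicone custom headers for a gateway request (sketch)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if user_id:
        headers["Helicone-User-Id"] = user_id
    if session_id:
        headers["Helicone-Session-Id"] = session_id
    # Custom properties use the Helicone-Property-* prefix.
    for key, value in (properties or {}).items():
        headers[f"Helicone-Property-{key}"] = value
    if cache:
        headers["Helicone-Cache-Enabled"] = "true"
    return headers

h = helicone_headers("YOUR_HELICONE_API_KEY",
                     user_id="alice", properties={"Env": "prod"})
```
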