Overview

The Chat Completions endpoint provides a unified OpenAI-compatible interface to interact with multiple LLM providers. Use this endpoint to send messages and receive responses from models across providers like OpenAI, Anthropic, Google, and more.

Endpoint

POST https://ai-gateway.helicone.ai/v1/chat/completions

Authentication

Include your Helicone API key in the request:
curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
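
The same request can be built from Python using only the standard library. This is an illustrative sketch, not an official client — the API key value is a placeholder:

```python
import json
import urllib.request

GATEWAY_URL = "https://ai-gateway.helicone.ai/v1/chat/completions"

def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the gateway."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("YOUR_HELICONE_API_KEY", "gpt-4",
                    [{"role": "user", "content": "Hello!"}])
# urllib.request.urlopen(req) would send it; omitted here.
```

Because the gateway is OpenAI-compatible, any OpenAI SDK pointed at the gateway base URL with a Helicone API key should also work.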

Request Body

model
string
required
The model to use for completion. Supports comma-separated fallback models (e.g., "gpt-4,claude-3-5-sonnet-20241022"). See supported models for the full list.
messages
array
required
An array of message objects that form the conversation history. Each message must include:
  • role: One of "system", "user", "assistant", "tool", or "developer"
  • content: The message content (string or array of content parts)
Example:
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is the capital of France?"}
]
max_tokens
integer
The maximum number of tokens to generate in the completion.
max_completion_tokens
integer
Alternative to max_tokens. The maximum number of tokens to generate.
temperature
number
Sampling temperature between 0 and 2. Higher values make output more random. Default: 1.0
top_p
number
Nucleus sampling parameter. The model considers only the tokens comprising the top_p probability mass. Default: 1.0
n
integer
Number of completions to generate. Must be between 1 and 128. Default: 1
stream
boolean
Whether to stream the response as server-sent events. Default: false
stream_options
object
Options for streaming responses.
  • include_usage: Whether to include usage statistics in stream
  • include_obfuscation: Whether to include obfuscation data
stop
string | array
Up to 4 sequences where the API will stop generating tokens.
frequency_penalty
number
Number between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the text so far, reducing repetition. Default: 0
presence_penalty
number
Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the text, encouraging the model to cover new topics. Default: 0
logprobs
boolean
Whether to return log probabilities of the output tokens. Default: false
top_logprobs
integer
Number of most likely tokens to return at each position (0-20). Requires logprobs: true.
logit_bias
object
Modify likelihood of specified tokens appearing. Maps token IDs to bias values (-100 to 100).
user
string
A unique identifier for the end-user, for abuse monitoring.
seed
integer
Seed for deterministic sampling. Accepts any signed 64-bit integer (-9223372036854775808 to 9223372036854775807).
response_format
object
Format of the response. Options:
  • {"type": "text"} - Plain text response
  • {"type": "json_object"} - Valid JSON object
  • {"type": "json_schema", "json_schema": {...}} - JSON matching schema
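
As an illustration, a structured-output request body using the json_schema option might be assembled like this. The schema name and fields below are made-up examples, not part of the API:

```python
# Hypothetical schema: constrain output to a {"city": ..., "country": ...} object.
capital_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "capital_answer",  # schema name chosen for this example
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
        },
    },
}

request_body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "response_format": capital_schema,
}
```
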
tools
array
List of tools the model can call. Each tool has:
  • type: "function" or "custom"
  • function: Function definition with name, description, and parameters
tool_choice
string | object
Controls which tool is called:
  • "none": No tool is called
  • "auto": Model decides
  • "required": Model must call a tool
  • Object: Force specific tool
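
For the object form, the request names the specific function to force. A sketch, reusing the get_weather function from the Function Calling example below (a hypothetical tool, not a built-in):

```python
# Tool definition; get_weather is a hypothetical function name for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

request_body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [weather_tool],
    # Force this specific tool instead of "auto" selection.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```
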
parallel_tool_calls
boolean
Whether to enable parallel function calling. Default: true
reasoning_effort
string
Level of reasoning effort for models that support it. Options: "minimal", "low", "medium", "high"
reasoning_options
object
Advanced reasoning options:
  • budget_tokens: Token budget for reasoning
modalities
array
Output modalities for the response. Currently only ["text"] is supported.
prediction
object
Predicted content to optimize latency:
  • type: "content"
  • content: Predicted message content
  • reasoning: Optional reasoning text
context_editing
object
Context management configuration (Anthropic models only):
  • enabled: Enable context editing
  • clear_tool_uses: Auto-clear tool call history
  • clear_thinking: Manage reasoning traces
store
boolean
Whether to store the completion for later retrieval. Default: false
metadata
object
Additional metadata to attach to the request.
service_tier
string
Service tier for the request. Options: "auto", "default", "flex", "scale", "priority"
cache_control
object
Cache control settings:
  • type: "ephemeral"
  • ttl: Time to live for cached response

Helicone-Specific Parameters

Model Fallbacks

Provide multiple models separated by commas for automatic fallback:
{
  "model": "gpt-4,claude-3-5-sonnet-20241022,gemini-2.0-flash-exp",
  "messages": [...]
}

Provider Exclusion

Exclude specific providers using the ! prefix:
{
  "model": "!openai,gpt-4",
  "messages": [...]
}
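
Both conventions compose into a single model string; a small helper can make that explicit (the helper itself is illustrative, not part of any SDK):

```python
def build_model_string(models, exclude_providers=()):
    """Join '!'-prefixed provider exclusions and fallback models into one model field."""
    return ",".join([f"!{p}" for p in exclude_providers] + list(models))

print(build_model_string(["gpt-4", "claude-3-5-sonnet-20241022"]))
# gpt-4,claude-3-5-sonnet-20241022
print(build_model_string(["gpt-4"], exclude_providers=["openai"]))
# !openai,gpt-4
```
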

Prompt Integration

Use stored prompts with variable substitution:
prompt_id
string
ID of the Helicone prompt to use
version_id
string
Specific version of the prompt (optional)
environment
string
Environment for prompt resolution (e.g., "production", "staging")
inputs
object
Variables to substitute in the prompt template

Plugins

plugins
array
Array of plugins to apply to the request. Each plugin configures additional functionality.

Response Format

id
string
Unique identifier for the completion
object
string
Object type, always "chat.completion"
created
integer
Unix timestamp of when the completion was created
model
string
The model used for completion
choices
array
Array of completion choices. Each choice contains:
  • index: Choice index
  • message: The generated message with role and content
  • finish_reason: Reason completion stopped ("stop", "length", "tool_calls", etc.)
  • logprobs: Log probabilities if requested
usage
object
Token usage statistics:
  • prompt_tokens: Tokens in the prompt
  • completion_tokens: Tokens in the completion
  • total_tokens: Total tokens used
system_fingerprint
string
System fingerprint for the model configuration

Examples

Basic Chat Completion

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

Streaming Response

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
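
On the client side, each streamed event arrives as a `data: ` line carrying a JSON chunk, with `data: [DONE]` marking the end of the stream. A minimal parser sketch — the sample chunks below are fabricated, not captured output:

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            return
        yield json.loads(data)

# Fabricated sample stream for illustration.
sample = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"].get("content", "")
               for c in iter_sse_chunks(sample))
# text == "Once upon"
```
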

Function Calling

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in Boston?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Model Fallback

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4,claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using Stored Prompts

curl https://ai-gateway.helicone.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "prompt_id": "my-prompt-id",
    "environment": "production",
    "inputs": {
      "user_name": "Alice",
      "topic": "AI"
    }
  }'

Response Example

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
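
Pulling the generated text and token counts out of this response is plain dictionary access; a sketch using the example payload above:

```python
# The example response from the documentation, as a Python dict.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1709251200,
    "model": "gpt-4",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

content = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
total = response["usage"]["total_tokens"]
# content == "The capital of France is Paris.", finish == "stop", total == 33
```
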

Error Responses

The endpoint returns standard HTTP status codes:
  • 400: Invalid request (missing required fields, invalid model, etc.)
  • 401: Authentication failed
  • 403: Access forbidden (suspended account, etc.)
  • 429: Rate limit exceeded or insufficient credits
  • 500: Internal server error
Error response format:
{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}
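
A client might classify these responses roughly as follows — a sketch under the assumption that 429 and 500 are the retryable cases, with the error message pulled from the body when it parses:

```python
import json

def parse_error(status, body):
    """Return (message, retryable) for an error response; illustrative, not exhaustive."""
    try:
        err = json.loads(body).get("error", {})
        message = err.get("message", "unknown error")
    except (ValueError, AttributeError):
        message = body or "unknown error"
    # Retry on rate limits / insufficient credits (429) and server errors (500).
    return message, status in (429, 500)

msg, retry = parse_error(
    400,
    '{"error": {"message": "Invalid model specified", '
    '"type": "invalid_request_error", "code": "invalid_model"}}',
)
# msg == "Invalid model specified", retry is False
```
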

Additional Headers

Helicone supports custom headers for enhanced functionality:
  • Helicone-User-Id: Track requests by user
  • Helicone-Session-Id: Group requests into sessions
  • Helicone-Property-*: Add custom properties
  • Helicone-Cache-Enabled: Enable response caching
  • Helicone-RateLimit-Policy: Apply custom rate limits
  • Helicone-Prompt-Id: Use a stored prompt
See the Features documentation for more details on these capabilities.
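
A small helper can assemble these headers alongside the authorization header. This is an illustrative sketch; the property name ("Env") and values are made up:

```python
def helicone_headers(api_key, user_id=None, session_id=None,
                     properties=None, cache=False):
    """Assemble Helicone custom headers for a gateway request (sketch)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if user_id:
        headers["Helicone-User-Id"] = user_id
    if session_id:
        headers["Helicone-Session-Id"] = session_id
    # Custom properties use the Helicone-Property-* prefix.
    for key, value in (properties or {}).items():
        headers[f"Helicone-Property-{key}"] = value
    if cache:
        headers["Helicone-Cache-Enabled"] = "true"
    return headers

h = helicone_headers("YOUR_HELICONE_API_KEY",
                     user_id="alice", properties={"Env": "prod"})
```
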