Documentation Index
Fetch the complete documentation index at: https://mintlify.com/helicone/helicone/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The Chat Completions endpoint provides a unified, OpenAI-compatible interface to multiple LLM providers. Use it to send messages and receive responses from models across providers such as OpenAI, Anthropic, Google, and more.

Endpoint
Authentication
Include your Helicone API key in the request.

Request Body
model
The model to use for completion. Supports comma-separated fallback models (e.g., "gpt-4,claude-3-5-sonnet-20241022"). See supported models for the full list.

messages
An array of message objects that form the conversation history. Each message must include:
- role: One of "system", "user", "assistant", "tool", or "developer"
- content: The message content (a string or an array of content parts)

max_tokens
The maximum number of tokens to generate in the completion.

max_completion_tokens
Alternative to max_tokens. The maximum number of tokens to generate.

temperature
Sampling temperature between 0 and 2. Higher values make output more random. Default: 1.0

top_p
Nucleus sampling parameter. The model considers tokens within the top_p probability mass. Default: 1.0

n
Number of completions to generate. Must be between 1 and 128. Default: 1

stream
Whether to stream the response as server-sent events. Default: false

stream_options
Options for streaming responses:
- include_usage: Whether to include usage statistics in the stream
- include_obfuscation: Whether to include obfuscation data

stop
Up to 4 sequences where the API will stop generating tokens.

frequency_penalty
Number between -2.0 and 2.0. Positive values penalize tokens based on their frequency. Default: 0

presence_penalty
Number between -2.0 and 2.0. Positive values penalize tokens based on their presence. Default: 0

logprobs
Whether to return log probabilities of output tokens. Default: false

top_logprobs
Number of most likely tokens to return at each position (0-20). Requires logprobs: true.

logit_bias
Modify the likelihood of specified tokens appearing. Maps token IDs to bias values (-100 to 100).

user
A unique identifier for the end user, for abuse monitoring.

seed
Seed for deterministic sampling. Must be between -9223372036854775808 and 9223372036854775807.

response_format
Format of the response. Options:
- {"type": "text"} - Plain text response
- {"type": "json_object"} - Valid JSON object
- {"type": "json_schema", "json_schema": {...}} - JSON matching the given schema
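As a sketch, a json_schema response format might be built like this; the field layout follows the OpenAI-compatible spec, and the schema name and properties are purely illustrative:

```python
# Illustrative response_format payload constraining output to a JSON schema.
# The schema name and fields below are examples, not part of the Helicone spec.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
            "additionalProperties": False,
        },
    },
}
```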
tools
List of tools the model can call. Each tool has:
- type: "function" or "custom"
- function: Function definition with name, description, and parameters

tool_choice
Controls which tool is called:
- "none": No tool is called
- "auto": Model decides
- "required": Model must call a tool
- Object: Force a specific tool

parallel_tool_calls
Whether to enable parallel function calling. Default: true

reasoning_effort
Level of reasoning effort for models that support it. Options: "minimal", "low", "medium", "high"

Advanced reasoning options:
- budget_tokens: Token budget for reasoning

modalities
Output modalities supported by the model. Currently supports ["text"].

prediction
Predicted content to optimize latency:
- type: "content"
- content: Predicted message content
- reasoning: Optional reasoning text

Context management configuration (Anthropic models only):
- enabled: Enable context editing
- clear_tool_uses: Auto-clear tool call history
- clear_thinking: Manage reasoning traces

store
Whether to store the completion for later use. Default: false

metadata
Additional metadata to attach to the request.

service_tier
Service tier for the request. Options: "auto", "default", "flex", "scale", "priority"

cache_control
Cache control settings:
- type: "ephemeral"
- ttl: Time to live for the cached response
Helicone-Specific Parameters
Model Fallbacks
Provide multiple models separated by commas for automatic fallback.

Provider Exclusion
Exclude specific providers using the ! prefix.
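A minimal sketch of both model strings; the exact placement of the ! prefix within the comma-separated list is an assumption and should be checked against the supported-models documentation:

```python
# Fallback: try gpt-4 first, then fall back to the next model in order.
model_with_fallback = "gpt-4,claude-3-5-sonnet-20241022"

# Provider exclusion: the "!" prefix excludes a provider. The provider
# name below is a placeholder, not a documented value.
model_excluding_provider = "gpt-4,!some-provider"
```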
Prompt Integration
Use stored prompts with variable substitution:
- ID of the Helicone prompt to use
- Specific version of the prompt (optional)
- Environment for prompt resolution (e.g., "production", "staging")
- Variables to substitute in the prompt template
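A sketch of a request body using a stored prompt. All four field names here (prompt_id, version, environment, inputs) are assumptions mapped onto the parameters described above; verify the exact names in the Helicone prompt documentation:

```python
# Assumed field names for prompt integration; only the semantics
# (ID, version, environment, variables) come from the docs above.
body = {
    "model": "gpt-4",
    "prompt_id": "my-prompt",            # ID of the Helicone prompt
    "version": "1",                      # optional specific version
    "environment": "production",         # environment for prompt resolution
    "inputs": {"customer_name": "Ada"},  # variables for the template
}
```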
Plugins
Array of plugins to apply to the request. Each plugin configures additional functionality.
Response Format

id
Unique identifier for the completion.

object
Object type, always "chat.completion".

created
Unix timestamp of when the completion was created.

model
The model used for the completion.

choices
Array of completion choices. Each choice contains:
- index: Choice index
- message: The generated message with role and content
- finish_reason: Reason the completion stopped ("stop", "length", "tool_calls", etc.)
- logprobs: Log probabilities, if requested

usage
Token usage statistics:
- prompt_tokens: Tokens in the prompt
- completion_tokens: Tokens in the completion
- total_tokens: Total tokens used

system_fingerprint
System fingerprint for the model configuration.
Examples
Basic Chat Completion
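A minimal sketch in Python, assuming the gateway endpoint is https://ai-gateway.helicone.ai/v1/chat/completions and that authentication is a standard Bearer token; both assumptions should be verified against the Endpoint and Authentication sections:

```python
import json
import os

# Assumed endpoint URL and auth scheme (not confirmed by this page).
url = "https://ai-gateway.helicone.ai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ.get('HELICONE_API_KEY', '')}",
    "Content-Type": "application/json",
}
body = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 100,
}

# To actually send the request (requires the `requests` package and a valid key):
# import requests
# resp = requests.post(url, headers=headers, data=json.dumps(body))
# print(resp.json()["choices"][0]["message"]["content"])
print(json.dumps(body, indent=2))
```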
Streaming Response
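A hedged sketch of a streaming request body; stream=True requests server-sent events, and stream_options.include_usage asks for usage statistics in the stream:

```python
# Streaming request body per the parameters documented above.
body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

# Consuming the SSE stream with the `requests` package would look roughly like:
# import json, requests
# with requests.post(url, headers=headers, json=body, stream=True) as resp:
#     for line in resp.iter_lines():
#         if line.startswith(b"data: ") and line != b"data: [DONE]":
#             chunk = json.loads(line[len(b"data: "):])
```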
Function Calling
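A sketch of a function-calling request; the tool shape follows the tools parameter described above, and the function name and schema are illustrative:

```python
# One function tool; with tool_choice "auto" the model decides whether to
# respond with a tool_calls entry instead of plain text. The function
# name and parameters are hypothetical.
body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
```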
Model Fallback
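Fallback is expressed entirely in the model string, as documented under Helicone-Specific Parameters:

```python
# Comma-separated models: if the first model/provider fails, the request
# falls back to the next one in order.
body = {
    "model": "gpt-4,claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Hello!"}],
}
fallback_chain = body["model"].split(",")
```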
Using Stored Prompts
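One route, per the Additional Headers section, is the Helicone-Prompt-Id header; the inputs body field for variable substitution is an assumed name:

```python
# Reference a stored prompt via the Helicone-Prompt-Id header (listed
# under Additional Headers). The "inputs" field name is an assumption.
headers = {
    "Authorization": "Bearer <HELICONE_API_KEY>",
    "Content-Type": "application/json",
    "Helicone-Prompt-Id": "my-prompt",
}
body = {
    "model": "gpt-4",
    "inputs": {"customer_name": "Ada"},
}
```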
Response Example
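An illustrative response shape, assembled from the Response Format fields above; the id, timestamp, content, and token counts are placeholders, not real output:

```python
# Placeholder values showing the documented response structure.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
            "logprobs": None,
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
}
```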
Error Responses
The endpoint returns standard HTTP status codes:
- 400: Invalid request (missing required fields, invalid model, etc.)
- 401: Authentication failed
- 403: Access forbidden (suspended account, etc.)
- 429: Rate limit exceeded or insufficient credits
- 500: Internal server error
Additional Headers
Helicone supports custom headers for enhanced functionality:
- Helicone-User-Id: Track requests by user
- Helicone-Session-Id: Group requests into sessions
- Helicone-Property-*: Add custom properties
- Helicone-Cache-Enabled: Enable response caching
- Helicone-RateLimit-Policy: Apply custom rate limits
- Helicone-Prompt-Id: Use a stored prompt
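A sketch of a header set using the names above; the values are illustrative:

```python
# Example Helicone header set; values are illustrative. Helicone-Property-*
# is a wildcard: any suffix becomes a custom property on the request.
headers = {
    "Helicone-User-Id": "user-1234",
    "Helicone-Session-Id": "session-5678",
    "Helicone-Property-Environment": "staging",  # custom property "Environment"
    "Helicone-Cache-Enabled": "true",
}
```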
