Now that your requests are flowing through Helicone, let’s explore the platform architecture and understand how everything works together.

Architecture Overview

Helicone comprises five core services that work together to provide its AI Gateway and observability capabilities:

Core Components

Worker (Cloudflare Workers)

The Worker is the edge proxy that intercepts and routes all LLM requests. It's deployed globally on Cloudflare's edge network for minimal latency.

Key responsibilities:
  • Route requests to 100+ LLM providers
  • Apply intelligent fallbacks when providers fail
  • Enforce rate limits and caching rules
  • Log request metadata to Jawn
  • Add <50ms overhead on average
Code location: /worker

How it works:
// Simplified worker flow
export default {
  async fetch(request: Request): Promise<Response> {
    // 1. Parse Helicone headers
    const heliconeAuth = request.headers.get("Helicone-Auth");

    // 2. Determine target provider and model from the request body
    const body = await request.clone().json();
    const { provider, model } = parseModel(body.model);

    // 3. Route to the correct provider
    const providerResponse = await routeToProvider({
      provider,
      model,
      request,
    });

    // 4. Log request/response to Jawn; cost, tokens, and latency
    //    are derived from the provider response
    await logToJawn({
      request,
      response: providerResponse,
      metadata: extractMetadata(providerResponse), // { cost, tokens, latency }
    });

    // 5. Return the provider response
    return providerResponse;
  },
};
Deployment:
  • Hosted on Cloudflare Workers (cloud)
  • Runs locally with wrangler dev (self-hosted)

Request Flow

Let’s trace a complete request through the system:
1. User sends LLM request

Your application sends a request to the Helicone AI Gateway:
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
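That call reaches Helicone because the client is constructed against the gateway rather than the provider directly. A minimal setup, assuming the OpenAI SDK and Helicone's OpenAI proxy endpoint:

import OpenAI from "openai";

// Point the SDK at the Helicone gateway instead of api.openai.com
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});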
2. Worker routes to provider

The Worker (Cloudflare edge):
  • Authenticates the request
  • Determines target provider (OpenAI)
  • Applies any caching or rate limit rules
  • Forwards request to OpenAI
  • Receives response
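Caching and rate-limit rules are controlled per request via Helicone headers. A sketch using the Helicone-Cache-Enabled and Helicone-RateLimit-Policy headers (the policy value shown is illustrative, not a recommendation):

// Per-request Helicone headers, passed through the OpenAI SDK's options
const response = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello!" }],
  },
  {
    headers: {
      "Helicone-Cache-Enabled": "true",         // serve repeat requests from cache
      "Helicone-RateLimit-Policy": "1000;w=60", // e.g. 1000 requests per 60s window
    },
  }
);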
3. Provider responds

OpenAI processes the request and returns the completion. The Worker then:
  • Calculates cost and token usage
  • Streams response back to user (if streaming)
  • Logs request metadata to Jawn
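Cost is derived from the token counts the provider returns. A minimal sketch of that calculation (the per-token prices here are illustrative; the real values live in the cost registry under packages/cost):

// Hypothetical pricing table keyed by model, in dollars per million tokens
const PRICING = {
  "gpt-4o-mini": { inputPerMillion: 0.15, outputPerMillion: 0.6 },
};

function calculateCost(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
): number {
  const price = PRICING[model as keyof typeof PRICING];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (
    (usage.prompt_tokens / 1_000_000) * price.inputPerMillion +
    (usage.completion_tokens / 1_000_000) * price.outputPerMillion
  );
}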
4. Jawn stores data

Jawn receives the log and:
  • Stores metadata in Supabase (request ID, user, timestamp)
  • Writes metrics to ClickHouse (cost, tokens, latency)
  • Uploads request/response to MinIO (full bodies)
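A sketch of that tiered write; the client objects, table names, and bucket names below are illustrative stand-ins, not Jawn's actual code:

// Illustrative only: `supabase`, `clickhouse`, and `s3` are assumed,
// pre-configured clients mirroring the three storage layers above
async function handleLog(log: {
  requestId: string;
  userId: string;
  timestamp: string;
  cost: number;
  tokens: number;
  latencyMs: number;
  requestBody: unknown;
  responseBody: unknown;
}) {
  // 1. Metadata: small rows, relational lookups
  await supabase.from("request").insert({
    id: log.requestId,
    user_id: log.userId,
    created_at: log.timestamp,
  });

  // 2. Metrics: columnar storage built for aggregation
  await clickhouse.insert({
    table: "request_metrics",
    values: [{ cost: log.cost, tokens: log.tokens, latency_ms: log.latencyMs }],
  });

  // 3. Bodies: large blobs, fetched only on demand
  await s3.putObject({
    Bucket: "request-bodies",
    Key: `${log.requestId}.json`,
    Body: JSON.stringify({ request: log.requestBody, response: log.responseBody }),
  });
}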
5. Dashboard displays the data

The Web dashboard:
  • Queries the Jawn API for the request list
  • Displays new requests in near real time (<5 seconds)
  • Loads full request/response bodies from MinIO on click
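You can query the same data directly. A sketch, assuming a request-query endpoint at api.helicone.ai; check the API reference for the exact request shape:

// Fetch recent requests from the Jawn-backed API (body shape assumed)
const res = await fetch("https://api.helicone.ai/v1/request/query", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.HELICONE_API_KEY}`,
  },
  body: JSON.stringify({ filter: "all", limit: 10 }),
});
const requests = await res.json();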

Key Features Enabled by Architecture

Edge Routing

Worker on Cloudflare edge means:
  • <50ms latency overhead
  • Global availability
  • Automatic failover
  • DDoS protection

Scalable Analytics

ClickHouse columnar storage:
  • Query millions of requests
  • Real-time aggregations
  • Cost-effective at scale
  • Sub-second dashboard loads

Flexible Storage

Tiered storage approach:
  • Hot data in Supabase
  • Analytics in ClickHouse
  • Bodies in S3/MinIO
  • Optimized costs

Multi-Provider

Worker intelligence:
  • 100+ provider integrations
  • Automatic fallbacks
  • Smart load balancing
  • Unified observability
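The fallback logic itself follows a simple pattern: try providers in order until one succeeds. A minimal sketch of the idea, not the Worker's actual implementation:

// Generic fallback pattern: try each provider until one returns OK
async function routeWithFallbacks(
  providers: Array<(req: Request) => Promise<Response>>,
  request: Request
): Promise<Response> {
  let lastError: unknown;
  for (const callProvider of providers) {
    try {
      const response = await callProvider(request.clone());
      if (response.ok) return response;
      lastError = new Error(`Provider returned ${response.status}`);
    } catch (err) {
      lastError = err; // network error, timeout, etc. — try the next one
    }
  }
  throw lastError;
}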

Self-Hosting Options

Helicone is fully open source and can be self-hosted in multiple ways:

All-in-One Docker Container

Run everything in a single container for testing or small deployments:
docker pull helicone/helicone-all-in-one:latest
docker run -d \
  --name helicone \
  -p 3000:3000 \
  -p 8585:8585 \
  -p 9080:9080 \
  helicone/helicone-all-in-one:latest
Ports:
  • 3000 - Web dashboard
  • 8585 - Jawn API + LLM proxy
  • 9080 - MinIO S3 storage
Perfect for:
  • Local development
  • Testing integrations
  • Small teams
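Once the container is up, point your SDK at the local proxy instead of the cloud gateway. A sketch, assuming port 8585 exposes the same OpenAI-compatible interface; the exact path may differ by version, so check the self-hosting docs:

import OpenAI from "openai";

// Target the local all-in-one container (proxy path assumed)
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://localhost:8585/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});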

Development Setup

Want to contribute or customize Helicone? Here's the local development setup.

Prerequisites:
  • Docker - For infrastructure (Postgres, ClickHouse, MinIO)
  • Node.js 20+ - Use nvm to manage versions
  • Yarn - Package manager
  • Supabase CLI - For database management
  • Wrangler - For Cloudflare Worker development
# Clone the repo
git clone https://github.com/Helicone/helicone.git
cd helicone

# Install dependencies
nvm install 20 && nvm use 20
npm install -g yarn wrangler
yarn install

# Set up environment
cp .env.example .env
supabase start

# Start infrastructure
cd docker
docker compose -f docker-compose-local.yml up -d

# Start services (3 terminals)
cd valhalla/jawn && yarn dev    # Terminal 1
cd web && yarn dev:local         # Terminal 2
cd worker && yarn dev            # Terminal 3
Once all three services are running, the dashboard is at http://localhost:3000 and the Jawn API at http://localhost:8585.

Repository structure:
helicone/
├── web/                    # Next.js dashboard
├── worker/                 # Cloudflare Worker proxy
├── valhalla/jawn/          # API server
├── packages/
│   ├── cost/              # Cost calculation registry
│   ├── llm-mapper/        # Provider mappings
│   └── prompts/           # Prompt templates
├── supabase/              # Database migrations
├── clickhouse/            # Analytics schema
├── docker/                # Docker configs
├── examples/              # Integration examples
└── docs/                  # This documentation

Security & Compliance

SOC 2 Compliant

Type II certified for cloud hosting. Enterprise-grade security controls and annual audits.

GDPR Compliant

Full GDPR compliance with data residency options and user data controls.

Data Encryption

  • TLS 1.3 in transit
  • AES-256 at rest
  • Encrypted backups

Data Ownership

  • You own your data
  • Export anytime via API
  • Self-host for complete control

Performance Characteristics

Edge routing keeps overhead minimal:
  • p50: <25ms added latency
  • p95: <50ms added latency
  • p99: <100ms added latency
The Worker runs on Cloudflare's edge network (300+ locations), so each request enters at the location nearest your application and is routed to the provider from there.

What’s Next?

Now that you understand how Helicone works, explore the features:

Sessions & Agent Debugging

Track multi-step AI workflows with session trees

Gateway Fallbacks

Configure automatic failover when providers go down

Prompt Management

Version control and deploy prompts without code

Cost Tracking

Understand your LLM economics by user or feature

Questions?

Join our Discord or email help@helicone.ai for support.