Requests

The Requests page is your central hub for monitoring and debugging LLM requests. Every API call flowing through Helicone is captured with complete context, allowing you to trace issues, analyze performance, and understand how your AI application behaves in production.

What’s Captured

For every LLM request, Helicone records:

Request Details

  • Full request body (messages, parameters)
  • Model and provider information
  • Custom properties and metadata
  • User ID and session information

Response Details

  • Complete response body
  • Generated text and function calls
  • Finish reason and stop sequences
  • Token counts and cost

Performance Metrics

  • Total latency (start to finish)
  • Time to first token (TTFT)
  • Tokens per second
  • Request and response timestamps

Metadata

  • Request ID (for reference)
  • HTTP status codes
  • Error messages (if any)
  • Cache hit/miss status
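
Taken together, each logged request can be thought of as one record. A minimal sketch of that record as a TypeScript type; the field names here are illustrative, not Helicone's exact API schema:

// Illustrative shape of a captured request (field names are assumptions)
interface LoggedRequest {
  // Metadata
  requestId: string;
  status: number;                       // HTTP status code
  cacheHit: boolean;
  error?: string;

  // Request details
  model: string;
  provider: string;
  requestBody: unknown;                 // messages, parameters
  userId?: string;
  properties: Record<string, string>;   // custom properties

  // Response details
  responseBody: unknown;                // generated text, function calls
  finishReason?: string;

  // Performance metrics
  latencyMs: number;                    // start to finish
  timeToFirstTokenMs?: number;          // streaming requests only
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
}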

Accessing Requests

Dashboard View

Visit helicone.ai/requests to see all your requests in a table view:
  • Real-time updates: New requests appear automatically
  • Sortable columns: Click column headers to sort by any field
  • Quick filters: Filter by model, status, user, or date range
  • Request drawer: Click any row to see full request details

Request Details Drawer

Click any request to open a detailed view that renders the conversation in a chat-like format:
  • System prompts and instructions
  • User messages with role indicators
  • Assistant responses with streaming indicators
  • Function/tool calls and responses

Filtering Requests

Built-in Filters

Use the dashboard’s filter interface to narrow down requests:
Time Range
  • Last hour, day, week, month
  • Custom date range picker
  • Timezone-aware filtering
Model & Provider
  • Filter by specific model (e.g., gpt-4o-mini)
  • Filter by provider (OpenAI, Anthropic, etc.)
  • Include/exclude specific models
Status
  • Success (2xx responses)
  • Client errors (4xx)
  • Server errors (5xx)
  • Specific status codes
User & Properties
  • Filter by user ID
  • Filter by any custom property
  • Combine multiple property filters

Advanced Filtering

For complex queries, use the filter builder:
// Example: Production errors from last 24 hours
{
  "AND": [
    { "status": { "gte": 400 } },
    { "properties.Environment": { "equals": "production" } },
    { "created_at": { "gte": "2024-03-09T14:00:00Z" } }
  ]
}

Querying via API

Retrieve requests programmatically using the REST API:

Basic Query

curl --request POST \
  --url https://api.helicone.ai/v1/request/query-clickhouse \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $HELICONE_API_KEY" \
  --data '{
  "filter": {
    "request_response_rmt": {
      "model": {
        "equals": "gpt-4o-mini"
      }
    }
  },
  "limit": 100
}'
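
The same query from Node or TypeScript using fetch; the parsing line at the end is illustrative, since the exact response shape is documented in the API reference:

const res = await fetch("https://api.helicone.ai/v1/request/query-clickhouse", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.HELICONE_API_KEY}`
  },
  body: JSON.stringify({
    filter: {
      request_response_rmt: {
        model: { equals: "gpt-4o-mini" }
      }
    },
    limit: 100
  })
});

const requests = await res.json(); // see the API reference for the response shape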

Filter by Custom Properties

Important: When filtering by custom properties, you MUST wrap the properties filter inside a request_response_rmt object.
curl --request POST \
  --url https://api.helicone.ai/v1/request/query-clickhouse \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $HELICONE_API_KEY" \
  --data '{
  "filter": {
    "request_response_rmt": {
      "properties": {
        "Environment": {
          "equals": "production"
        }
      }
    }
  },
  "limit": 100
}'

Complex Filters

Combine multiple conditions using AND/OR operators:
curl --request POST \
  --url https://api.helicone.ai/v1/request/query-clickhouse \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $HELICONE_API_KEY" \
  --data '{
  "filter": {
    "left": {
      "request_response_rmt": {
        "request_created_at": {
          "gte": "2024-03-01T00:00:00Z"
        }
      }
    },
    "operator": "and",
    "right": {
      "left": {
        "request_response_rmt": {
          "model": {
            "equals": "gpt-4o-mini"
          }
        }
      },
      "operator": "and",
      "right": {
        "request_response_rmt": {
          "properties": {
            "Environment": {
              "equals": "production"
            }
          }
        }
      }
    }
  },
  "limit": 1000
}'
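
Deeply nested left/operator/right trees are tedious to write by hand. A small helper (hypothetical, not part of any Helicone SDK) can fold a flat list of request_response_rmt conditions into the nested shape shown above:

// Fold a flat list of conditions into a right-nested AND tree
// (assumes at least one condition)
function andAll(conditions: object[]): object {
  const leaves = conditions.map((c) => ({ request_response_rmt: c }));
  return leaves.reduceRight((right, left) => ({ left, operator: "and", right }));
}

// Builds the same filter as the curl example above
const filter = andAll([
  { request_created_at: { gte: "2024-03-01T00:00:00Z" } },
  { model: { equals: "gpt-4o-mini" } },
  { properties: { Environment: { equals: "production" } } }
]);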

Export Large Datasets

For exporting large amounts of data, use the CLI tool:
# Export all requests since 2024-02-01
HELICONE_API_KEY="your-api-key" \
  npx @helicone/export \
  --start-date 2024-02-01 \
  --limit 100000 \
  --include-body

# Export with property filter to CSV
HELICONE_API_KEY="your-api-key" \
  npx @helicone/export \
  --property Environment=production \
  --format csv \
  --include-body

Common Use Cases

Debug Failed Requests

  1. Filter by status code (4xx or 5xx)
  2. Look for patterns in error messages
  3. Check request parameters and prompts
  4. Verify custom properties (environment, version)
// Add debugging context to every request
const response = await client.chat.completions.create(
  { /* request */ },
  {
    headers: {
      "Helicone-Property-Environment": process.env.NODE_ENV,
      "Helicone-Property-Version": packageJson.version,
      "Helicone-Property-RequestType": "user_chat",
      "Helicone-User-Id": userId
    }
  }
);

Analyze Slow Requests

  1. Sort by latency (descending)
  2. Identify patterns in slow requests
  3. Check prompt length and token counts
  4. Compare across models and providers
// Query slow requests via API
const slowRequests = await fetch('https://api.helicone.ai/v1/request/query-clickhouse', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${HELICONE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    filter: {
      request_response_rmt: {
        latency: { gte: 5000 } // >= 5 seconds
      }
    },
    limit: 100
  })
});

Track User-Specific Issues

  1. Filter by user ID
  2. Review their request history
  3. Check for error patterns
  4. Analyze usage patterns
// Tag all requests with user ID
const response = await client.chat.completions.create(
  { /* request */ },
  {
    headers: {
      "Helicone-User-Id": userId,
      "Helicone-Property-UserTier": userTier,
      "Helicone-Property-Feature": featureName
    }
  }
);
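
You can also pull a user's history programmatically through the same query endpoint. A sketch, assuming the request_response_rmt filter exposes the user ID as user_id (check the API reference for the exact field name):

// Fetch this user's recent requests (the user_id field name is an assumption)
const history = await fetch("https://api.helicone.ai/v1/request/query-clickhouse", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.HELICONE_API_KEY}`
  },
  body: JSON.stringify({
    filter: {
      request_response_rmt: {
        user_id: { equals: userId }
      }
    },
    limit: 100
  })
});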

Monitor Cost by Feature

  1. Filter by custom property (e.g., Feature)
  2. Sum costs across requests
  3. Compare costs across features
  4. Identify cost optimization opportunities
// Tag requests by feature
const features = ['chat', 'summarize', 'translate', 'analyze'];

for (const feature of features) {
  await client.chat.completions.create(
    { /* request */ },
    {
      headers: {
        "Helicone-Property-Feature": feature,
        "Helicone-Property-Environment": "production"
      }
    }
  );
}

// Query costs by feature via dashboard or API
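
One way to do the aggregation in code is to query the requests tagged with each feature and sum their cost. This is a sketch: the assumption that the endpoint returns an array of rows with a numeric cost field should be checked against the API reference:

// Sum per-request cost for one feature
// (assumes the response is an array of rows with a numeric `cost` field)
async function costForFeature(feature: string): Promise<number> {
  const res = await fetch("https://api.helicone.ai/v1/request/query-clickhouse", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.HELICONE_API_KEY}`
    },
    body: JSON.stringify({
      filter: {
        request_response_rmt: {
          properties: { Feature: { equals: feature } }
        }
      },
      limit: 1000
    })
  });
  const rows = await res.json();
  return rows.reduce((sum: number, row: { cost?: number }) => sum + (row.cost ?? 0), 0);
}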

Request Metadata

Custom Request IDs

Provide your own request ID for easy reference:
import { randomUUID } from "crypto";

const requestId = randomUUID();

const response = await client.chat.completions.create(
  { /* request */ },
  {
    headers: {
      "Helicone-Request-Id": requestId
    }
  }
);

// Later, query by this ID (the API requires your Helicone key)
const requestDetails = await fetch(
  `https://api.helicone.ai/v1/request/${requestId}`,
  { headers: { Authorization: `Bearer ${process.env.HELICONE_API_KEY}` } }
);

Excluding Sensitive Data

Omit request or response bodies for sensitive data:
const response = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Sensitive information..." }]
  },
  {
    headers: {
      "Helicone-Omit-Request": "true",   // Don't log request body
      "Helicone-Omit-Response": "true"   // Don't log response body
    }
  }
);

Performance Metrics

Time to First Token (TTFT)

For streaming requests, Helicone tracks when the first token arrives:
const stream = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Write a story..." }],
    stream: true
  },
  {
    headers: {
      "Helicone-Property-Feature": "story_generation"
    }
  }
);

// Consume the stream; the first chunk to arrive marks the TTFT
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

// TTFT is automatically tracked and visible in the dashboard

Latency Analysis

Analyze latency patterns:
  • p50 (median): Typical latency
  • p95: 95th percentile - catches slow outliers
  • p99: 99th percentile - identifies worst-case performance
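
If you export raw latencies (for example with the CLI above), these percentiles are easy to compute yourself. A minimal sketch using the nearest-rank method:

// Nearest-rank percentile over latencies in milliseconds
function percentile(latencies: number[], p: number): number {
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [820, 950, 1100, 1240, 4800, 9300]; // example values
console.log(percentile(latencies, 50)); // p50: typical latency
console.log(percentile(latencies, 95)); // p95: slow outliers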

Related Features

Sessions

Group related requests into sessions for workflow tracking

Custom Properties

Add metadata to requests for filtering and analysis

User Metrics

Analyze per-user costs and usage patterns

Alerts

Get notified about errors, rate limits, or cost thresholds
