
Overview

Helicone alerts monitor your AI application metrics and notify you when specific conditions are met. Set thresholds for cost, latency, errors, and custom metrics to stay informed about your system’s health and prevent issues before they impact users.
Alerts help you:
  • Monitor spending and prevent budget overruns
  • Detect performance degradation early
  • Track error rates and quality issues
  • Ensure SLAs are maintained
  • Get notified of unusual patterns

Key Benefits

Flexible Metrics

Alert on cost, latency, error rates, token usage, and custom properties

Smart Aggregation

Use sum, average, percentile, or count aggregations with time windows

Multi-Channel

Receive notifications via email, Slack, or both

Advanced Filtering

Apply filters to monitor specific users, models, or properties

Alert Types

Cost Alerts

Monitor spending to stay within budget:
{
  "name": "Daily Cost Limit",
  "metric": "cost",
  "threshold": 100.0,
  "aggregation": "sum",
  "time_window": "1d",
  "emails": ["finance@company.com"],
  "slack_channels": ["#ai-budget"]
}

Latency Alerts

Detect performance issues:
{
  "name": "High P95 Latency",
  "metric": "latency",
  "threshold": 5000,
  "aggregation": "p95",
  "percentile": 95,
  "time_window": "1h",
  "emails": ["oncall@company.com"],
  "minimum_request_count": 100
}

Error Rate Alerts

Monitor reliability:
{
  "name": "High Error Rate",
  "metric": "error_rate",
  "threshold": 0.05,  // 5% error rate
  "aggregation": "average",
  "time_window": "15m",
  "slack_channels": ["#incidents"],
  "minimum_request_count": 50
}

Token Usage Alerts

Track token consumption:
{
  "name": "High Token Usage",
  "metric": "total_tokens",
  "threshold": 1000000,
  "aggregation": "sum",
  "time_window": "1h",
  "emails": ["team@company.com"]
}

Creating Alerts

Via API

curl -X POST https://api.helicone.ai/v1/alert/create \
  -H "Authorization: Bearer $HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily Cost Alert",
    "metric": "cost",
    "threshold": 100.0,
    "aggregation": "sum",
    "time_window": "1d",
    "emails": ["admin@company.com"],
    "slack_channels": [],
    "minimum_request_count": 1
  }'

Via Dashboard

1. Navigate to Alerts

Go to your Helicone Dashboard and click Alerts in the sidebar.

2. Create New Alert

Click Create Alert and configure:
  • Alert name
  • Metric to monitor
  • Threshold value
  • Aggregation method
  • Time window

3. Configure Notifications

Add email addresses and/or Slack channels to receive notifications.

4. Set Filters (Optional)

Apply filters to monitor specific segments:
  • Model name
  • User ID
  • Custom properties
  • Request path

5. Save and Activate

Review your configuration and save. The alert becomes active immediately.

Alert Configuration

Metrics

Metric              Description                       Unit
cost                Total cost of requests            USD
latency             Request latency                   milliseconds
error_rate          Percentage of failed requests     decimal (0-1)
total_tokens        Sum of input and output tokens    count
prompt_tokens       Input tokens only                 count
completion_tokens   Output tokens only                count
request_count       Number of requests                count

Aggregation Methods

Method    Description                   Use Case
sum       Total value in time window    Cost, token usage
average   Mean value                    Error rate, average latency
p50       50th percentile               Median latency
p75       75th percentile               Above-average latency
p95       95th percentile               Tail latency, outliers
p99       99th percentile               Worst-case latency
count     Number of occurrences         Request volume

Time Windows

Window       Format   Use Case
5 minutes    5m       Real-time monitoring
15 minutes   15m      Short-term trends
1 hour       1h       Hourly budgets
6 hours      6h       Business hours
1 day        1d       Daily budgets
1 week       7d       Weekly planning

Advanced Configuration

Grouping

Group alerts by dimension to get per-segment notifications:
{
  "name": "Cost per User",
  "metric": "cost",
  "threshold": 10.0,
  "aggregation": "sum",
  "time_window": "1d",
  "grouping": "user",
  "emails": ["admin@company.com"]
}
Supported grouping:
  • user - Alert per user ID
  • model - Alert per model
  • Custom properties (e.g., team, environment; see the sketch below)
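
To group by a custom property rather than a built-in dimension, note that the per-user quota pattern later on this page sets grouping_is_property to false for the built-in user dimension, which suggests (an assumption, not confirmed on this page) that custom-property grouping sets it to true. A minimal sketch, assuming requests carry an environment property:
{
  "name": "Cost per Environment",
  "metric": "cost",
  "threshold": 50.0,
  "aggregation": "sum",
  "time_window": "1d",
  "grouping": "environment",
  "grouping_is_property": true,  // assumed: marks "environment" as a custom property
  "emails": ["admin@company.com"]
}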

Minimum Request Count

Avoid false positives from low traffic:
{
  "name": "High Latency Alert",
  "metric": "latency",
  "threshold": 3000,
  "aggregation": "p95",
  "percentile": 95,
  "time_window": "1h",
  "minimum_request_count": 100,  // Only alert if 100+ requests
  "emails": ["sre@company.com"]
}

Filters

Monitor specific segments using filter expressions:
{
  "name": "Production Cost Alert",
  "metric": "cost",
  "threshold": 200.0,
  "aggregation": "sum",
  "time_window": "1d",
  "filter": {
    "properties": {
      "environment": "production"
    }
  },
  "emails": ["ops@company.com"]
}
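
The two filter shapes on this page (the properties shape above and the request.model shape in the patterns below) can presumably be combined in a single alert; whether the API accepts both keys together is an assumption here, so treat this as a sketch:
{
  "name": "Production GPT-4 Cost",
  "metric": "cost",
  "threshold": 100.0,
  "aggregation": "sum",
  "time_window": "1d",
  "filter": {
    "properties": { "environment": "production" },
    "request": { "model": { "equals": "gpt-4" } }
  },
  "emails": ["ops@company.com"]
}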

Notification Channels

Email Notifications

Add one or more email addresses:
{
  "emails": [
    "admin@company.com",
    "team@company.com",
    "oncall@company.com"
  ]
}
Email format:
Subject: [Helicone Alert] Daily Cost Limit Exceeded

Your alert "Daily Cost Limit" has been triggered.

Metric: cost
Threshold: $100.00
Actual Value: $127.45
Time Window: 1 day
Time: 2024-03-10 14:32:00 UTC

View details: https://us.helicone.ai/alerts/alert_123

Slack Notifications

Connect your Slack workspace and specify channels:
{
  "slack_channels": [
    "#alerts",
    "#engineering",
    "#incidents"
  ]
}
Setup:
  1. Install the Helicone Slack app in your workspace
  2. Invite the bot to desired channels: /invite @Helicone
  3. Use channel names in alert configuration

Managing Alerts

List Alerts

curl https://api.helicone.ai/v1/alert/query \
  -H "Authorization: Bearer $HELICONE_API_KEY"
Response:
{
  "data": {
    "alerts": [
      {
        "id": "alert_123",
        "name": "Daily Cost Alert",
        "metric": "cost",
        "threshold": 100.0,
        "status": "active",
        "created_at": "2024-03-10T10:00:00Z"
      }
    ],
    "history": [
      {
        "id": "history_456",
        "alert_id": "alert_123",
        "alert_name": "Daily Cost Alert",
        "status": "triggered",
        "triggered_value": "127.45",
        "alert_start_time": "2024-03-10T14:32:00Z",
        "alert_end_time": null
      }
    ]
  }
}
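
To act on this response from a script, the triggered entries in history can be pulled out with standard JSON tooling. A minimal sketch, assuming jq is installed:
curl -s https://api.helicone.ai/v1/alert/query \
  -H "Authorization: Bearer $HELICONE_API_KEY" \
  | jq '.data.history[] | select(.status == "triggered") | {alert_name, triggered_value}'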

Delete Alert

curl -X DELETE https://api.helicone.ai/v1/alert/{alertId} \
  -H "Authorization: Bearer $HELICONE_API_KEY"

Common Alert Patterns

Set multiple cost alerts with increasing urgency. Each fragment below omits the shared fields (name, metric, aggregation, time_window) for brevity; thresholds assume a $1,000 budget:
// Warning at 80% of budget
{ "threshold": 800, "emails": ["team@company.com"] }

// Critical at 95% of budget
{ "threshold": 950, "emails": ["admin@company.com"], "slack_channels": ["#critical"] }

// Emergency at 100% of budget
{ "threshold": 1000, "emails": ["ceo@company.com"], "slack_channels": ["#emergency"] }
Track P95 latency to ensure performance SLAs:
{
  "name": "SLA Breach - P95 Latency",
  "metric": "latency",
  "threshold": 2000,  // 2 second SLA
  "aggregation": "p95",
  "percentile": 95,
  "time_window": "5m",
  "minimum_request_count": 20
}
Alert on expensive models separately:
{
  "name": "GPT-4 Daily Cost",
  "metric": "cost",
  "threshold": 50.0,
  "aggregation": "sum",
  "time_window": "1d",
  "filter": {
    "request": {
      "model": { "equals": "gpt-4" }
    }
  }
}
Track per-user usage:
{
  "name": "User Quota Alert",
  "metric": "request_count",
  "threshold": 1000,
  "aggregation": "count",
  "time_window": "1d",
  "grouping": "user",
  "grouping_is_property": false
}

Best Practices

  1. Set meaningful thresholds: Base thresholds on historical data and business requirements
  2. Use minimum request counts: Avoid noise from low-traffic periods
  3. Layer alerts: Create warning, critical, and emergency tiers
  4. Monitor trends: Use longer time windows to catch gradual increases
  5. Test alerts: Verify notification delivery before relying on alerts (see the sketch after this list)
  6. Document runbooks: Include action items for each alert type
  7. Review regularly: Adjust thresholds as usage patterns change
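
One way to test delivery end to end (practice 5) is to create a throwaway alert with a threshold low enough to trigger on normal traffic, confirm the notification arrives, then delete it. A sketch using the create and delete endpoints shown above; the alert ID must be taken from the dashboard or the create response, whose exact shape is not documented here:
# Disposable alert that should trigger on the next request
curl -X POST https://api.helicone.ai/v1/alert/create \
  -H "Authorization: Bearer $HELICONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Delivery Test",
    "metric": "request_count",
    "threshold": 1,
    "aggregation": "count",
    "time_window": "5m",
    "emails": ["you@company.com"],
    "slack_channels": [],
    "minimum_request_count": 1
  }'

# After the notification arrives, clean up
# (substitute the alert ID from the create response or the dashboard)
curl -X DELETE https://api.helicone.ai/v1/alert/{alertId} \
  -H "Authorization: Bearer $HELICONE_API_KEY"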

Troubleshooting

Not receiving notifications

  • Verify email addresses and Slack channels are correct
  • Check spam folders for email notifications
  • Ensure the Helicone bot is in the Slack channels
  • Confirm the alert is active and not deleted

Too many false positives

  • Increase minimum_request_count to filter low-traffic noise
  • Adjust the threshold based on normal variance
  • Use longer time windows for smoother trends
  • Add filters to focus on relevant requests

Alerts not triggering or triggering too late

  • Lower the threshold to catch issues earlier
  • Use shorter time windows for faster detection
  • Remove minimum_request_count if appropriate
  • Verify filters aren’t excluding relevant data

Webhooks

Build custom notification systems with real-time webhooks

Cost Tracking

Analyze spending patterns and optimize costs