While individual requests show you what happened, metrics reveal patterns and trends across your entire LLM application. Helicone aggregates data from all your requests to provide actionable insights about performance, costs, usage, and quality.
Key Metrics Categories
Usage Metrics
- Request volume over time
- Active users (daily, weekly, monthly)
- Requests per user
- Model usage distribution
- Provider distribution
Performance Metrics
- Latency percentiles (p50, p95, p99)
- Time to first token (TTFT)
- Throughput (requests/second)
- Error rates by model and provider
- Cache hit rates
Cost Metrics
- Total spend over time
- Cost per user
- Cost per feature/workflow
- Cost by model and provider
- Token usage and costs
Quality Metrics
- Success rate (2xx vs errors)
- User feedback scores
- Retry rates
- Session completion rates
- Average response length
Dashboard Overview
The Helicone dashboard provides real-time metrics visualization at helicone.ai/dashboard.
High-Level Metrics
At the top of your dashboard, see your key metrics at a glance:
- Total Requests: Request count for the selected time period
- Total Cost: Cumulative cost across all requests
- Average Latency: Mean latency across all requests
- Error Rate: Percentage of failed requests (4xx/5xx)
- Active Users: Unique users making requests
Time-Series Graphs
Visualize trends over time:
- Requests Over Time: See usage patterns and identify spikes
- Cost Over Time: Track spending trends and budget
- Latency Over Time: Monitor performance degradation
- Errors Over Time: Identify reliability issues
Breakdowns
Understand your usage composition:
- By Model: Which models are used most
- By Provider: OpenAI, Anthropic, Google, etc.
- By User: Top users by request count or cost
- By Property: Custom property breakdowns (environment, feature, etc.)
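Per-user and per-property breakdowns depend on tagging each request. Helicone reads metadata headers such as `Helicone-Property-<Name>` and `Helicone-User-Id` on proxied requests; a minimal sketch of building those headers (the key and values here are placeholders):

```python
# Sketch: build the extra headers that drive Helicone's property and user
# breakdowns. Helicone reads "Helicone-Property-<Name>" and "Helicone-User-Id"
# headers on each proxied request.

def helicone_headers(api_key: str, user_id: str, **properties: str) -> dict:
    """Build Helicone metadata headers to attach to an LLM request."""
    headers = {
        "Helicone-Auth": f"Bearer {api_key}",
        "Helicone-User-Id": user_id,  # powers per-user metrics
    }
    for name, value in properties.items():
        # e.g. environment="production" -> "Helicone-Property-Environment"
        headers[f"Helicone-Property-{name.capitalize()}"] = value
    return headers

headers = helicone_headers("sk-helicone-...", "user-123",
                           environment="production", feature="chat")
```

Pass these alongside your normal SDK headers; every dashboard breakdown below can then segment on them.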
Request Metrics
Volume & Distribution
Track how many requests you’re making.
Latency Analysis
Understand request performance with latency percentiles.
Percentiles explained:
- p50 (median): Half of requests are faster, half are slower
- p95: 95% of requests are faster - identifies slow outliers
- p99: 99% of requests are faster - catches worst-case scenarios
Warning signs to watch for:
- p50 increasing: Overall performance is degrading
- p95/p99 spikes: Some requests are becoming very slow
- Large p99-p50 gap: Inconsistent performance
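The percentiles above can be computed directly from raw request latencies; a small sketch using the nearest-rank method:

```python
# Sketch: compute latency percentiles from a list of request latencies (ms).
def percentile(values, p):
    """Nearest-rank percentile: the value that p% of requests fall at or below."""
    ranked = sorted(values)
    k = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[k]

latencies = [120, 130, 140, 150, 200, 210, 250, 400, 900, 3000]
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
# A large gap between p50 and p99 here signals inconsistent performance.
```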
Time to First Token (TTFT)
For streaming requests, TTFT measures perceived responsiveness:
- Lower TTFT = faster perceived response
- Critical for chat interfaces
- Varies significantly by model
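TTFT is straightforward to measure yourself: time how long the stream takes to yield its first chunk. A sketch with a stand-in generator in place of a real SDK streaming call:

```python
import time

# Sketch: measure time to first token (TTFT) for a streaming response.
# `stream` is any iterator of token chunks (e.g. an SDK streaming response).
def measure_ttft(stream):
    start = time.monotonic()
    first_chunk = next(stream)  # blocks until the first token arrives
    return time.monotonic() - start, first_chunk

def fake_stream():
    # Stand-in for a real streaming call; simulates model "thinking" time.
    time.sleep(0.05)
    yield "Hello"
    yield ", world"

ttft, first = measure_ttft(fake_stream())
```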
Error Rates
Track request failures:
- 429 (Rate Limit): Hitting provider rate limits
- 400 (Bad Request): Invalid request parameters
- 500 (Server Error): Provider outages
- 503 (Service Unavailable): Provider capacity issues
Session Metrics
For workflows using sessions, track aggregate session metrics.
Session Performance
Session Cost Analysis
Understand the cost of complete workflows:
- Total session cost: Sum of all requests in the session
- Cost distribution: Which parts of the workflow are most expensive
- Cost per success: Total cost divided by successful sessions
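Cost per success is worth computing explicitly, because spend on failed sessions still counts toward it. A small worked sketch with made-up numbers:

```python
# Sketch: aggregate session costs. Each session is a list of request costs
# plus a success flag; cost-per-success counts spend on failures too.
sessions = [
    {"costs": [0.002, 0.010, 0.004], "success": True},
    {"costs": [0.003, 0.012], "success": False},  # failed mid-workflow
    {"costs": [0.002, 0.011, 0.005], "success": True},
]

total_cost = sum(sum(s["costs"]) for s in sessions)
successes = sum(1 for s in sessions if s["success"])
cost_per_success = total_cost / successes  # failed spend inflates this
```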
User Metrics
Analyze per-user behavior and costs.
User Activity
User Segmentation
Group users by behavior:
- Power Users: Top 10% by request volume
- Active Users: Made requests in last 7 days
- New Users: First request in last 30 days
- At-Risk Users: Declining usage patterns
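Segments like these are easy to derive from per-user request counts. A sketch implementing the power-user rule of thumb above (top 10% by volume):

```python
import math

# Sketch: segment users by request volume; "power users" are the top 10%.
def power_users(request_counts: dict) -> list:
    """request_counts maps user_id -> request count for the period."""
    ranked = sorted(request_counts, key=request_counts.get, reverse=True)
    top_n = max(1, math.ceil(len(ranked) * 0.10))  # at least one user
    return ranked[:top_n]

counts = {"u1": 900, "u2": 40, "u3": 35, "u4": 20, "u5": 5}
```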
User Costs
Track spending per user.
Cost Metrics
Total Spend
Track your LLM spending over time.
Cost by Model
Understand which models drive costs.
Cost by Custom Property
Segment costs by any dimension, such as feature, environment, or customer.
Token Usage
Track token consumption.
Performance Optimization
Identifying Slow Requests
Use metrics to find performance bottlenecks:
- Sort by latency: Find the slowest requests
- Check patterns: Do slow requests share characteristics?
- Analyze prompts: Are slow requests using longer prompts?
- Compare models: Are certain models consistently slower?
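The "check patterns" and "compare models" steps can be sketched as a simple group-by over exported request data (the request records here are illustrative):

```python
from collections import defaultdict
from statistics import median

# Sketch: group request latencies by model to see whether slowness clusters.
requests = [
    {"model": "gpt-4o", "latency_ms": 1800},
    {"model": "gpt-4o", "latency_ms": 2100},
    {"model": "gpt-4o-mini", "latency_ms": 400},
    {"model": "gpt-4o-mini", "latency_ms": 450},
]

by_model = defaultdict(list)
for r in requests:
    by_model[r["model"]].append(r["latency_ms"])

median_by_model = {m: median(v) for m, v in by_model.items()}
slowest = max(median_by_model, key=median_by_model.get)
```

The same grouping works for any shared characteristic: prompt length bucket, provider, or a custom property.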
Cache Hit Rate
Track cache effectiveness:
- High hit rate (>30%): Cache working well
- Low hit rate (<10%): Review cache strategy
- Consider increasing cache bucket size
- Check cache TTL settings
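Those thresholds translate directly into a health check you can run against hit/miss counts:

```python
# Sketch: classify cache effectiveness using the thresholds above.
def cache_health(hits: int, total: int) -> str:
    hit_rate = hits / total if total else 0.0
    if hit_rate > 0.30:
        return "healthy"          # cache working well
    if hit_rate < 0.10:
        return "review strategy"  # revisit bucket size and TTL settings
    return "acceptable"
```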
Cost Optimization
Model Selection
Compare costs across models:
- Use cheaper models for simple tasks
- Reserve expensive models for complex tasks
- A/B test model quality vs cost
- Implement model fallbacks
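A cheap-first routing pattern combines several of these ideas; a sketch where `complete` and `good_enough` are hypothetical stand-ins for your SDK call and quality check:

```python
# Sketch: route to a cheap model first, escalating to a stronger model only
# when the cheap answer fails a quality check.
CHEAP, STRONG = "gpt-4o-mini", "gpt-4o"

def answer(prompt, complete, good_enough):
    """complete(model, prompt) -> text; good_enough(text) -> bool."""
    draft = complete(CHEAP, prompt)
    if good_enough(draft):
        return CHEAP, draft  # cheap model handled it
    return STRONG, complete(STRONG, prompt)  # escalate to the expensive model
```

Tagging each request with the chosen model (via a custom property) lets you measure how often escalation happens and what it costs.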
Prompt Optimization
Reduce token usage:
- Shorten system prompts
- Remove redundant instructions
- Use fewer examples in few-shot prompts
- Implement prompt compression
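You can estimate the savings from trimming few-shot examples before shipping the change. A rough sketch using the common ~4 characters-per-token heuristic (real tokenizers vary by model):

```python
# Sketch: estimate token savings from trimming few-shot examples, using the
# rough heuristic of ~4 characters per token (model tokenizers vary).
def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

system = "You are a helpful assistant."
examples = ["Q: ... A: ..."] * 8  # 8 few-shot examples (illustrative)
trimmed = examples[:3]            # keep only the 3 most useful

before = approx_tokens(system) + sum(map(approx_tokens, examples))
after = approx_tokens(system) + sum(map(approx_tokens, trimmed))
savings = before - after
```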
Feature Cost Analysis
Identify expensive features.
Custom Metric Tracking
Add custom properties to enable rich analytics.
Alerts & Monitoring
Set up alerts based on metrics:
Cost Alerts
- Daily spend exceeds threshold
- User spend exceeds limit
- Unusual cost spike detected
Performance Alerts
- Latency p95 exceeds threshold
- Error rate exceeds threshold
- TTFT degradation detected
Usage Alerts
- Request rate spike
- Unusual traffic pattern
- Provider rate limit approaching
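Threshold-style alerts like the cost and performance ones above reduce to comparing current metrics against limits. A minimal evaluation sketch (the threshold names and values are illustrative):

```python
# Sketch: evaluate current metrics against alert thresholds; returns the
# alerts that should fire. Names and limits here are examples.
THRESHOLDS = {
    "daily_spend_usd": 50.0,
    "latency_p95_ms": 5000,
    "error_rate": 0.05,
}

def fired_alerts(metrics: dict) -> list:
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

current = {"daily_spend_usd": 72.3, "latency_p95_ms": 2400, "error_rate": 0.08}
```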
Exporting Metrics
API Export
Export metrics for external analysis.
Data Warehouse Integration
Integrate with your data warehouse:
- Export data via API
- Load into your warehouse (Snowflake, BigQuery, etc.)
- Join with your business data
- Build custom dashboards
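The export step above can be sketched as a paginated query against Helicone's API. This assumes a `POST /v1/request/query` endpoint with bearer-key auth; check the API reference for the exact endpoint and filter schema:

```python
import json
import urllib.request

# Sketch of the export step, assuming Helicone's request query API
# (POST /v1/request/query with a bearer key) -- verify against the API docs.
API_URL = "https://api.helicone.ai/v1/request/query"

def build_export_request(api_key: str, limit: int = 100, offset: int = 0):
    """Build a paginated export request; increase offset to page through."""
    payload = {"limit": limit, "offset": offset, "sort": {"created_at": "desc"}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# req = build_export_request("sk-helicone-...")
# rows = json.load(urllib.request.urlopen(req))  # then load into your warehouse
```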
Best Practices
Metric Collection
✅ Do:
- Tag all requests with custom properties for rich segmentation
- Use consistent property names across your application
- Track both business and technical metrics
- Set up alerts for critical metrics
❌ Don't:
- Collect metrics without acting on them
- Use inconsistent property names
- Ignore low-level metrics (they reveal patterns)
- Wait for issues to become critical
Metric Analysis
✅ Do:
- Review metrics weekly to identify trends
- Compare across time periods (week-over-week, month-over-month)
- Segment by user cohorts and features
- Look for correlations between metrics
❌ Don't:
- Look at metrics in isolation
- Ignore gradual degradation
- Focus only on averages (check percentiles too)
- Optimize prematurely without data
Related Features
Requests
Drill down into individual requests from metrics
Custom Properties
Add dimensions for richer metric analysis
User Metrics
Deep dive into user-level analytics
Alerts
Set up alerts based on metric thresholds
Questions?
Need help or have questions? We’re here to help:
- Discord Community: Join our Discord server for quick help
- GitHub Issues: Report bugs or request features on GitHub
- Documentation: Check our full documentation for more guides
