# Debugging LLM Applications

> Identify errors, diagnose issues, and optimize LLM application performance with Helicone's debugging tools

Debugging LLM applications differs from traditional software debugging. Issues can be subtle: wrong responses, inconsistent behavior, or silent failures that affect quality rather than functionality. Helicone provides debugging tools to identify, diagnose, and resolve these issues.

## Common LLM Issues

* **API errors** - API failures, rate limits, timeouts, and provider outages
* **Quality issues** - Wrong answers, inconsistent outputs, hallucinations, and context loss
* **Performance problems** - Slow responses, high latency, and token inefficiency
* **Cost overruns** - Unexpected spending, inefficient prompts, and suboptimal model selection

## Debugging Workflow

Start by identifying failed requests using status code filters:

Figure: Helicone request page showing status code filter for error identification

Common status codes:

* **200** - Success
* **400** - Bad request (malformed input)
* **401** - Authentication failed
* **429** - Rate limit exceeded
* **500** - Provider error
* **503** - Provider unavailable

```typescript
// Add request IDs for easier debugging
const requestId = `req-${Date.now()}`;

const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [...],
  },
  {
    headers: {
      "Helicone-Request-Id": requestId,
      "Helicone-Property-Feature": "document-processing",
    },
  }
);
```

Click on any request to see complete details:

Figure: Helicone request detail page with full request and response data

Key information available:

* **Full request body** - Exact prompt and parameters sent
* **Complete response** - What the model returned
* **Timing breakdown** - Where latency occurred
* **Token usage** - Input/output token counts
* **Cost** - Exact cost of this request
* **Custom properties** - Your metadata for filtering

Test fixes immediately without redeploying code:

Figure: Playground button on request detail page

The Playground allows you to:

* Modify the prompt and see new results
* Change model parameters (temperature, max tokens)
* Switch models to compare outputs
* Test different approaches quickly

Figure: Helicone playground interface for testing prompts

Note: Currently, only OpenAI models are supported in the Playground.

Debug issues in multi-turn conversations by viewing complete sessions:

```typescript
const sessionId = `session-${userId}-${Date.now()}`;

// First request in conversation
await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Name": "Customer Chat",
      "Helicone-Session-Path": "/greeting",
    },
  }
);

// Follow-up request (same session)
await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: conversationHistory,
  },
  {
    headers: {
      "Helicone-Session-Id": sessionId,
      "Helicone-Session-Path": "/follow-up",
    },
  }
);
```

Sessions help you:

* See the full conversation context
* Identify where context was lost
* Track how costs accumulate
* Understand user interaction patterns

## Debugging Specific Issues

### API Errors & Rate Limits

When you see 429 or 500 errors:

```typescript
import OpenAI from "openai";

async function makeRequestWithRetry(
  client: OpenAI,
  params: any,
  maxRetries = 3
) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.chat.completions.create(params, {
        headers: {
          // Tag each attempt so retries are visible in Helicone
          "Helicone-Property-Retry-Attempt": String(i),
        },
      });
    } catch (error: any) {
      // Retry on rate limits (429) and provider errors (5xx)
      const retryable = error?.status === 429 || error?.status >= 500;
      if (retryable && i < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        const delay = Math.pow(2, i) * 1000;
        await new Promise((resolve) => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
```

```typescript
// Prevent rate limit errors
headers: {
  "Helicone-RateLimit-Policy": "100;w=60;s=user", // 100 per minute per user
}
```

```typescript
import OpenAI from "openai";

// Point the OpenAI SDK at the gateway
const gateway = new OpenAI({
  apiKey: process.env.GATEWAY_API_KEY,
  baseURL: "https://gateway.helicone.ai/v1",
});

// Automatically falls back if primary provider fails
const response = await gateway.chat.completions.create({
  model: "gpt-4o",
  messages: [...],
});
```

### Quality Issues

When responses are
wrong or inconsistent:

Filter requests by custom properties to identify patterns:

```typescript
headers: {
  "Helicone-Property-Query-Type": "technical-support",
  "Helicone-Property-User-Type": "premium",
}
```

Then filter in the dashboard to see:

* Do technical queries fail more often?
* Are premium users having different issues?
* Which features have the most quality problems?

Tag requests with prompt versions to compare quality:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={
        "Helicone-Property-Prompt-Version": "v2.1",
        "Helicone-Property-System-Prompt": "technical-assistant",
    },
)
```

This helps you:

* A/B test prompt changes
* Track quality regressions
* Identify which version works best

Add quality scores to track improvements:

```typescript
// After getting user feedback
await fetch(`https://api.helicone.ai/v1/request/${requestId}/score`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${HELICONE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    scores: {
      "user-satisfaction": 5,
      "accuracy": 0.9,
      "helpfulness": 4,
    },
  }),
});
```

### Performance Problems

When responses are slow:

Check the request details for timing breakdown:

* **Queue time** - How long before processing started
* **Processing time** - Model inference time
* **Network time** - Transfer latency

```typescript
// Add timing metadata
const startTime = Date.now();

const response = await client.chat.completions.create(params, {
  headers: {
    "Helicone-Property-Client-Start-Time": String(startTime),
  },
});

const endTime = Date.now();
console.log(`Total latency: ${endTime - startTime}ms`);
```

Review token counts in request details:

```typescript
// Reduce max tokens for faster responses
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [...],
    max_tokens: 500, // Limit output length
  },
  {
    headers: {
      "Helicone-Property-Max-Tokens": "500",
    },
  }
);
```

Switch to faster models for simple queries:

```typescript
function selectModel(complexity: string) {
  switch (complexity) {
    case "simple":
      return "gpt-4o-mini"; // Much faster
    case "complex":
      return "gpt-4o";
    default:
      return "gpt-4o-mini";
  }
}
```

### Cost Overruns

When costs are higher than expected:

```typescript
// Add cost tracking properties
headers: {
  "Helicone-Property-Feature": "document-analysis",
  "Helicone-Property-Document-Length": String(docLength),
  "Helicone-Session-Id": sessionId,
}
```

Then analyze in the dashboard:

1. **Filter by feature** to find expensive operations
2. **Check session costs** to see complete workflows
3. **Review token usage** to identify inefficient prompts
4. **Compare model costs** to find cheaper alternatives

See the [Cost Tracking guide](/guides/cost-tracking) for detailed optimization strategies.

## Advanced Debugging Techniques

### Custom Request IDs

Use predictable IDs to correlate with your own logs:

```typescript
const requestId = `${userId}-${feature}-${timestamp}`;

headers: {
  "Helicone-Request-Id": requestId,
}
```

Then search for this ID in both Helicone and your application logs.

### Property-Based Filtering

Tag requests with rich metadata for powerful filtering:

```python
import os

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={
        "Helicone-Property-Environment": os.getenv("ENV"),
        "Helicone-Property-User-Tier": user.tier,
        "Helicone-Property-Feature": "search",
        "Helicone-Property-Version": "v2.3",
        "Helicone-Property-AB-Test": "prompt-variant-B",
    },
)
```

Filter combinations like:

* "Show me production errors for premium users"
* "Compare v2.3 vs v2.2 response times"
* "Which A/B test variant has better quality?"

### Session Replay

Replay entire sessions to reproduce issues:

1. Find the problematic session in the dashboard
2. Click **"Replay Session"**
3. View the exact sequence of requests
4. Test fixes against the same inputs

Session replay is especially useful for debugging multi-turn conversations where context matters.

## Debugging Checklist

When investigating an issue:

* [ ] Check status codes for obvious errors
* [ ] Review request/response in detail
* [ ] Test fixes in Playground
* [ ] Look at session context if multi-turn
* [ ] Filter by custom properties to find patterns
* [ ] Compare with working requests
* [ ] Check timing breakdown for performance
* [ ] Review token usage for cost issues
* [ ] Add more logging for future debugging

## Proactive Debugging

Prevent issues before they happen:

### Set Up Alerts

```typescript
// Configure in Helicone dashboard:
// 1. Error rate > 5%
// 2. Average latency > 2 seconds
// 3. Daily cost > $100
// 4. Any 500 errors
```

### Add Comprehensive Logging

```typescript
function makeTrackedRequest(feature: string, userId: string, params: any) {
  return client.chat.completions.create(params, {
    headers: {
      "Helicone-Session-Id": `${userId}-${Date.now()}`,
      "Helicone-Property-Feature": feature,
      "Helicone-Property-Environment": process.env.NODE_ENV,
      "Helicone-Property-Version": APP_VERSION,
      "Helicone-User-Id": userId,
    },
  });
}
```

### Monitor Key Metrics

Track these metrics weekly:

* **Error rate** - Should stay below 2%
* **P95 latency** - Should be under 3 seconds
* **Average cost per session** - Watch for increases
* **Cache hit rate** - Should be above 50% for cacheable content

## Debugging Tools Reference

* **Requests** - Filter by status, model, properties, and more
* **Sessions** - Track multi-turn conversations and workflows
* **Custom properties** - Add metadata for powerful filtering
* **Alerts** - Get notified of issues immediately

## Next Steps

* Debug complex agent workflows with tool calls
* Identify and optimize expensive operations
* A/B test fixes before deploying to production
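
As a supplement to the Monitor Key Metrics section, the sketch below shows one way the weekly numbers could be computed from a batch of request records and checked against the thresholds listed there. The `RequestRecord` shape and the function names are hypothetical and chosen for illustration; they are not Helicone's export schema or API, so adapt the fields to whatever your export or logs actually contain.

```typescript
// Hypothetical record shape for illustration; not Helicone's export schema.
interface RequestRecord {
  status: number;    // HTTP status code of the request
  latencyMs: number; // end-to-end latency in milliseconds
  cacheHit: boolean; // whether the response was served from cache
}

interface WeeklyMetrics {
  errorRate: number;    // fraction of requests with status >= 400
  p95LatencyMs: number; // 95th percentile latency
  cacheHitRate: number; // fraction of requests served from cache
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(records: RequestRecord[]): WeeklyMetrics {
  const total = records.length;
  if (total === 0) return { errorRate: 0, p95LatencyMs: 0, cacheHitRate: 0 };
  const errors = records.filter((r) => r.status >= 400).length;
  const hits = records.filter((r) => r.cacheHit).length;
  return {
    errorRate: errors / total,
    p95LatencyMs: percentile(records.map((r) => r.latencyMs), 95),
    cacheHitRate: hits / total,
  };
}

// Thresholds taken from the Monitor Key Metrics list:
// error rate below 2%, P95 latency under 3 seconds, cache hit rate above 50%.
function violations(m: WeeklyMetrics): string[] {
  const out: string[] = [];
  if (m.errorRate >= 0.02) out.push("error rate");
  if (m.p95LatencyMs >= 3000) out.push("p95 latency");
  if (m.cacheHitRate <= 0.5) out.push("cache hit rate");
  return out;
}
```

Calling `violations(summarize(records))` on a week of records returns the names of the metrics that are out of range, which could feed a weekly report. The cache hit threshold only makes sense for cacheable content, so filter the records accordingly before summarizing.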