Build a production-ready customer support assistant that automatically selects the right AI model for each query, optimizing both quality and cost. This tutorial uses Vercel AI SDK for model access and Helicone for monitoring.
## What You'll Build

A customer support system that:

- Classifies query complexity using fast, cheap models
- Routes each query to an appropriate model based on that classification
- Caches responses to reduce costs
- Tracks everything in Helicone for analysis and optimization
## Prerequisites

## Setup

### Install Dependencies

Create a new project and install the required packages:

```shell
mkdir support-assistant
cd support-assistant
npm init -y
npm install @ai-sdk/gateway ai zod
```
### Configure Environment

Create a `.env` file with your API keys:

```shell
VERCEL_AI_GATEWAY_API_KEY=your_vercel_key
HELICONE_API_KEY=sk-your-helicone-key
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-your-anthropic-key
```
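Note that Node does not load `.env` automatically; run your script with Node 20.6+'s `--env-file=.env` flag, or use a loader such as dotenv. A small guard (our own helper, not part of the tutorial) fails fast if a key is missing:

```typescript
// Fail fast if any required key is absent from the environment.
// Run with: node --env-file=.env app.js
function assertEnv(keys: string[]): void {
  for (const key of keys) {
    if (!process.env[key]) {
      throw new Error(`Missing required environment variable: ${key}`);
    }
  }
}

// assertEnv(['VERCEL_AI_GATEWAY_API_KEY', 'HELICONE_API_KEY']);
```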
### Initialize Gateway with Helicone

Set up the AI Gateway to route all requests through Helicone for monitoring:

```typescript
import { createGateway } from '@ai-sdk/gateway';
import { generateText, tool } from 'ai';
import { z } from 'zod';

const gateway = createGateway({
  apiKey: process.env.VERCEL_AI_GATEWAY_API_KEY!,
  baseURL: 'https://gateway.helicone.ai/v1',
  headers: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
```
## Implementation

### Step 1: Query Classification

Use a small, fast model with tool calling for precise classification:

```typescript
import { tool } from 'ai';
import { z } from 'zod';

const classifyTool = tool({
  description: 'Classify a customer support query by complexity',
  parameters: z.object({
    complexity: z.enum(['simple', 'complex', 'technical']).describe(
      'simple: Basic questions about account, passwords, features. ' +
      'complex: Refunds, complaints, escalations, urgent issues. ' +
      'technical: API errors, integration issues, code problems.'
    ),
    reasoning: z.string().describe('Brief explanation for the classification'),
    urgency: z.enum(['low', 'medium', 'high']).describe('How urgent is this query?'),
  }),
});

async function classifyQuery(query: string) {
  const result = await generateText({
    model: gateway('openai/gpt-4o-mini'), // Fast and cheap for classification
    tools: {
      classify: classifyTool,
    },
    toolChoice: 'required',
    prompt: `Classify this customer support query: "${query}"`,
    headers: {
      'Helicone-Property-Stage': 'classification',
      'Helicone-Property-Tool': 'query-classifier',
    },
  });

  const toolCall = result.toolCalls[0];
  return {
    complexity: toolCall.args.complexity as 'simple' | 'complex' | 'technical',
    reasoning: toolCall.args.reasoning,
    urgency: toolCall.args.urgency,
  };
}
```
### Step 2: Model Selection Strategy

Route queries to the most appropriate model:

```typescript
function selectModel(complexity: string, urgency: string) {
  // High urgency or technical issues get the best model
  if (urgency === 'high' || complexity === 'technical') {
    return gateway('anthropic/claude-3.5-sonnet');
  }
  // Complex issues get GPT-4o
  if (complexity === 'complex') {
    return gateway('openai/gpt-4o');
  }
  // Simple queries use the cheapest model
  return gateway('openai/gpt-4o-mini');
}

function getModelName(complexity: string, urgency: string): string {
  if (urgency === 'high' || complexity === 'technical') {
    return 'claude-3.5-sonnet';
  }
  if (complexity === 'complex') {
    return 'gpt-4o';
  }
  return 'gpt-4o-mini';
}
```
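`selectModel` and `getModelName` duplicate the same branching, which is easy to let drift apart as you add tiers. One way to keep them in sync (a sketch, not part of the original tutorial; `modelIdFor` and `displayName` are names introduced here) is to derive both from a single routing function:

```typescript
// Single source of truth for routing: the gateway model ID string.
// Both the model instance and the display name derive from it.
function modelIdFor(complexity: string, urgency: string): string {
  if (urgency === 'high' || complexity === 'technical') {
    return 'anthropic/claude-3.5-sonnet';
  }
  if (complexity === 'complex') {
    return 'openai/gpt-4o';
  }
  return 'openai/gpt-4o-mini';
}

// Display name is the part after the provider prefix.
function displayName(modelId: string): string {
  return modelId.split('/')[1];
}
```

With this, `selectModel` becomes `gateway(modelIdFor(complexity, urgency))` and `getModelName` becomes `displayName(modelIdFor(complexity, urgency))`, so the two can never disagree.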
### Step 3: Handle Support Tickets

Process tickets with full tracing:

```typescript
interface SupportTicket {
  id: string;
  customerId: string;
  query: string;
  priority: 'low' | 'medium' | 'high';
}

async function processSupportTicket(ticket: SupportTicket) {
  const sessionId = `ticket-${ticket.id}`;

  // Step 1: Classify the query
  const classification = await classifyQuery(ticket.query);
  console.log(`Query classified as ${classification.complexity} (${classification.reasoning})`);

  // Step 2: Select appropriate model
  const model = selectModel(classification.complexity, classification.urgency);
  const modelName = getModelName(classification.complexity, classification.urgency);

  // Step 3: Generate response with caching
  try {
    const response = await generateText({
      model,
      messages: [
        {
          role: 'system',
          content: `You are a customer support agent for TechCorp.
Priority: ${ticket.priority}.
Query complexity: ${classification.complexity}.

Be helpful, professional, and concise. Always:
- Acknowledge the customer's issue
- Provide clear solutions
- Offer to escalate if needed
- Include relevant documentation links`,
        },
        {
          role: 'user',
          content: ticket.query,
        },
      ],
      temperature: 0, // Deterministic for better caching
      maxTokens: 500,
      headers: {
        // Session tracking
        'Helicone-Session-Id': sessionId,
        'Helicone-Session-Name': `Support Ticket ${ticket.id}`,
        'Helicone-Session-Path': '/response-generation',
        // Metadata for analysis
        'Helicone-User-Id': ticket.customerId,
        'Helicone-Property-Ticket-Id': ticket.id,
        'Helicone-Property-Priority': ticket.priority,
        'Helicone-Property-Complexity': classification.complexity,
        'Helicone-Property-Urgency': classification.urgency,
        'Helicone-Property-Model': modelName,
        // Enable caching
        'Helicone-Cache-Enabled': 'true',
        'Helicone-Cache-Bucket-Max-Size': '100',
        'Helicone-Cache-Seed': 'support-v1',
      },
    });

    return {
      ticketId: ticket.id,
      response: response.text,
      model: modelName,
      complexity: classification.complexity,
      reasoning: classification.reasoning,
      usage: response.usage,
    };
  } catch (error) {
    console.error('Support ticket processing failed:', error);

    // Log the error to Helicone as a tagged request
    await generateText({
      model: gateway('openai/gpt-4o-mini'),
      prompt: `Error processing ticket ${ticket.id}: ${error}`,
      headers: {
        'Helicone-Session-Id': sessionId,
        'Helicone-Property-Error': 'true',
        'Helicone-Property-Ticket-Id': ticket.id,
      },
    });

    throw error;
  }
}
```
### Step 4: Add Retry Logic

Handle failures gracefully:

```typescript
async function processSupportTicketWithRetry(
  ticket: SupportTicket,
  maxRetries = 2
) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await processSupportTicket(ticket);
    } catch (error) {
      if (attempt === maxRetries) {
        // Final attempt failed, return fallback response
        return {
          ticketId: ticket.id,
          response: "I apologize, but I'm experiencing technical difficulties. Your ticket has been escalated to a human agent who will respond within 24 hours.",
          model: 'fallback',
          complexity: 'error',
          reasoning: 'Processing failed',
          usage: null,
        };
      }
      // Wait before retrying (exponential backoff)
      await new Promise(resolve =>
        setTimeout(resolve, Math.pow(2, attempt) * 1000)
      );
    }
  }
  // Unreachable: the final loop iteration always returns,
  // but TypeScript cannot prove that under strict settings.
  throw new Error('unreachable');
}
```
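The backoff above doubles the wait each attempt: 1s, then 2s, then 4s. A common refinement (our own sketch, not part of the tutorial) is to cap the delay and optionally add jitter so many failing clients don't retry in lockstep:

```typescript
// Capped exponential backoff with optional jitter.
// With defaults and jitter off, this matches the tutorial's
// Math.pow(2, attempt) * 1000 formula until the cap kicks in.
function backoffDelayMs(attempt: number, capMs = 30_000, jitter = false): number {
  const base = Math.min(Math.pow(2, attempt) * 1000, capMs);
  // Full jitter: pick a random delay in [0, base)
  return jitter ? Math.random() * base : base;
}

// attempt 0 → 1000 ms, attempt 1 → 2000 ms, attempt 2 → 4000 ms
```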
## Complete Example

Put it all together:

```typescript
import { createGateway } from '@ai-sdk/gateway';
import { generateText, tool } from 'ai';
import { z } from 'zod';

const gateway = createGateway({
  apiKey: process.env.VERCEL_AI_GATEWAY_API_KEY!,
  baseURL: 'https://gateway.helicone.ai/v1',
  headers: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

// ...plus classifyQuery, selectModel, getModelName, processSupportTicket,
// and processSupportTicketWithRetry from the steps above

// Example usage
async function main() {
  const tickets: SupportTicket[] = [
    {
      id: 'TICKET-001',
      customerId: 'CUST-789',
      query: 'How do I reset my password?',
      priority: 'low',
    },
    {
      id: 'TICKET-002',
      customerId: 'CUST-456',
      query: 'I need a refund immediately. This is unacceptable!',
      priority: 'high',
    },
    {
      id: 'TICKET-003',
      customerId: 'CUST-123',
      query: 'Getting 401 errors when calling /api/v2/users endpoint with valid auth token',
      priority: 'medium',
    },
  ];

  for (const ticket of tickets) {
    console.log(`\n\nProcessing ticket ${ticket.id}...`);
    const result = await processSupportTicketWithRetry(ticket);
    console.log(`Model: ${result.model}`);
    console.log(`Complexity: ${result.complexity}`);
    console.log(`Response: ${result.response}`);
    if (result.usage) {
      console.log(`Tokens: ${result.usage.totalTokens}`);
    }
  }
}

main().catch(console.error);
```
## Monitor in Helicone

Once your assistant is running, view performance in your Helicone dashboard.

### Filter by Complexity

Filter requests by the Complexity property to see:

- Average response time by complexity
- Cost per complexity tier
- Which models handle which query types
- Cache hit rates

### Session View

Click on any ticket ID to see the complete flow:

1. Classification request (cheap, fast)
2. Response generation (model selected based on complexity)
3. Any retry attempts
4. Total cost for the entire ticket
### Cost Analysis

Compare costs across complexity tiers. With illustrative per-query rates and cache hit figures:

- Simple queries (gpt-4o-mini): ~$0.0002 per query, 80% cache hit rate, effective cost ~$0.00004
- Complex queries (gpt-4o): ~$0.002 per query, 40% cache hit rate, effective cost ~$0.0012
- Technical queries (claude-3.5-sonnet): ~$0.003 per query, 20% cache hit rate, effective cost ~$0.0024
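The effective-cost figures above follow from a simple relationship: cached responses cost nothing, so only the cache-miss fraction pays the per-query rate. A quick sketch (the function name is ours, introduced for illustration):

```typescript
// Effective cost per query = per-query rate × (1 − cache hit rate),
// assuming cache hits are free.
function effectiveCost(perQueryUsd: number, cacheHitRate: number): number {
  return perQueryUsd * (1 - cacheHitRate);
}

// e.g. the simple tier: effectiveCost(0.0002, 0.8) ≈ $0.00004
```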
## Optimization Tips

Monitor which queries are misclassified:

```typescript
headers: {
  'Helicone-Property-User-Satisfaction': userRating,
  'Helicone-Property-Correct-Classification': wasCorrect ? 'yes' : 'no',
}
```

Then filter for incorrect classifications to improve your classifier.

Use temperature 0 and consistent prompts to maximize cache hits:

```typescript
temperature: 0,
headers: {
  'Helicone-Cache-Enabled': 'true',
  'Helicone-Cache-Seed': 'support-v1', // Increment when changing prompts
}
```

Collect user ratings to track quality:

```typescript
// After the user rates a response
await fetch(`https://api.helicone.ai/v1/request/${requestId}/score`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${HELICONE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    scores: {
      'user-rating': rating,
      'resolved-issue': resolved ? 1 : 0,
    },
  }),
});
```

Rate-limit users to prevent abuse and control costs:

```typescript
headers: {
  'Helicone-RateLimit-Policy': '100;w=3600;s=user', // 100 requests per hour per user
}
```
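Going by the example above, the policy string packs three values: a quota, a window length in seconds (`w`), and a segment to scope the limit to (`s`). A small helper (our own, not part of any Helicone SDK) makes the format explicit instead of hand-writing the string:

```typescript
// Builds a "<quota>;w=<window seconds>;s=<segment>" policy string,
// matching the format of the Helicone-RateLimit-Policy example above.
function rateLimitPolicy(quota: number, windowSeconds: number, segment: string): string {
  return `${quota};w=${windowSeconds};s=${segment}`;
}

// rateLimitPolicy(100, 3600, 'user') → '100;w=3600;s=user'
```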
## Production Checklist

Before deploying:
## Next Steps

- Cost Tracking: Deep dive into cost optimization strategies
- Agent Tracing: Track more complex agent workflows
- Structured Outputs: Add function calling for tool use
- Caching Guide: Maximize cache hit rates