Add Scores to Request
Add evaluation scores to a request for detailed quality metrics
POST /v1/request/{requestId}/score
Example request:
curl --request POST \
  --url https://api.helicone.ai/v1/request/{requestId}/score \
  --header 'Content-Type: application/json' \
  --data '{
    "scores": {}
  }'
Example response:
{
  "data": null,
  "error": {}
}
Add evaluation scores to a request to track detailed quality metrics beyond simple thumbs up/down feedback. Scores allow you to measure specific dimensions of LLM outputs like accuracy, relevance, helpfulness, and custom evaluation criteria.
Scores support both integer and boolean values. Integer scores are stored as-is, while boolean values are converted to 1 (true) or 0 (false).
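The conversion rule above can be pictured with a small sketch. This is only an illustration of the documented behavior, not Helicone's server code; the names `toStoredValue` and `toStoredScores` are hypothetical.

```typescript
type ScoreValue = number | boolean;

// Illustrative only: mirrors the documented rule that booleans become 1 or 0
// and integers are kept as-is.
function toStoredValue(value: ScoreValue): number {
  if (typeof value === 'boolean') {
    return value ? 1 : 0;
  }
  return value;
}

// Apply the rule to every entry of a scores object.
function toStoredScores(scores: Record<string, ScoreValue>): Record<string, number> {
  const stored: Record<string, number> = {};
  for (const [name, value] of Object.entries(scores)) {
    stored[name] = toStoredValue(value);
  }
  return stored;
}

const stored = toStoredScores({ accuracy: 95, has_citations: true, contains_errors: false });
console.log(stored); // { accuracy: 95, has_citations: 1, contains_errors: 0 }
```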
Path Parameters
requestId: The unique identifier of the request to add scores to. This can be found in the Helicone-Id response header when making requests through Helicone.
Example: req_abc123def456
Request Body
scores: An object containing score key-value pairs. Each key is the score name, and each value is either an integer or a boolean.
Supported value types:
- number: Must be an integer (floats are not supported)
- boolean: Converted to 1 (true) or 0 (false)
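Because only integers and booleans are accepted, it can help to check a scores object on the client before sending it. The following is a minimal sketch under that assumption; `findScoreProblems` and `ScoreProblem` are hypothetical names, not part of any Helicone SDK.

```typescript
type ScoreProblem = { key: string; reason: string };

// Hypothetical client-side check: report every entry that is neither
// a boolean nor an integer, without sending anything to the API.
function findScoreProblems(scores: Record<string, unknown>): ScoreProblem[] {
  const problems: ScoreProblem[] = [];
  for (const [key, value] of Object.entries(scores)) {
    if (typeof value === 'boolean') {
      continue; // booleans are accepted
    }
    if (typeof value !== 'number') {
      problems.push({ key, reason: `expected integer or boolean, got ${typeof value}` });
    } else if (!Number.isInteger(value)) {
      problems.push({ key, reason: 'numbers must be integers' });
    }
  }
  return problems;
}

const problems = findScoreProblems({
  accuracy: 95,
  relevance: 87.5, // float: flagged
  grade: 'high', // string: flagged
  is_factual: true
});
console.log(problems.map((p) => p.key)); // [ 'relevance', 'grade' ]
```

An empty result means the object satisfies the value-type rules listed above.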
{
  "scores": {
    "accuracy": 95,
    "relevance": 87,
    "helpfulness": 92,
    "has_citations": true,
    "is_factual": true
  }
}
Response
data: Returns null on success.
error: Error message if the request failed.
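A caller might turn the response into a readable outcome along these lines. This sketch assumes that success is signalled by a 2xx HTTP status and that the body has the `data` and `error` fields shown in the example response; the function name, the 404 status, and the error text in the usage line are illustrative assumptions.

```typescript
// Shape taken from the example response; the exact type of `error` is not
// specified in the text above, so it is left as unknown.
interface ScoreResponseBody {
  data: null;
  error: unknown;
}

// Hypothetical helper: summarize the outcome of an "add scores" call.
function describeScoreResult(status: number, body: ScoreResponseBody): string {
  const ok = status >= 200 && status < 300;
  if (ok) {
    return 'scores added';
  }
  const detail = typeof body.error === 'string' ? body.error : JSON.stringify(body.error);
  return `failed (HTTP ${status}): ${detail}`;
}

console.log(describeScoreResult(200, { data: null, error: null }));
console.log(describeScoreResult(404, { data: null, error: 'Request not found' }));
```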
Examples
Add Basic Scores
Add quality scores to a request:
cURL
curl --request POST \
  --url https://api.helicone.ai/v1/request/req_abc123def456/score \
  --header 'Authorization: Bearer <HELICONE_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "scores": {
      "accuracy": 95,
      "relevance": 87,
      "helpfulness": 92
    }
  }'
TypeScript
const requestId = 'req_abc123def456';

const response = await fetch(
  `https://api.helicone.ai/v1/request/${requestId}/score`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      scores: {
        accuracy: 95,
        relevance: 87,
        helpfulness: 92
      }
    })
  }
);

// Report success only when the API returned a 2xx status
if (!response.ok) {
  throw new Error(`Failed to add scores: HTTP ${response.status}`);
}
console.log('Scores added successfully');
Python
import os
import requests

request_id = "req_abc123def456"

response = requests.post(
    f"https://api.helicone.ai/v1/request/{request_id}/score",
    headers={
        "Authorization": f"Bearer {os.environ['HELICONE_API_KEY']}",
        "Content-Type": "application/json"
    },
    json={
        "scores": {
            "accuracy": 95,
            "relevance": 87,
            "helpfulness": 92
        }
    },
    timeout=10
)

# Raise on a non-2xx status instead of reporting success unconditionally
response.raise_for_status()
print("Scores added successfully")
Add Mixed Score Types
Combine integer and boolean scores:
cURL
curl --request POST \
  --url https://api.helicone.ai/v1/request/req_abc123def456/score \
  --header 'Authorization: Bearer <HELICONE_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "scores": {
      "overall_quality": 88,
      "coherence": 92,
      "has_citations": true,
      "is_factual": true,
      "contains_errors": false
    }
  }'
TypeScript
const requestId = 'req_abc123def456';

const response = await fetch(
  `https://api.helicone.ai/v1/request/${requestId}/score`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      scores: {
        overall_quality: 88,
        coherence: 92,
        has_citations: true,
        is_factual: true,
        contains_errors: false
      }
    })
  }
);
Use Cases
Automated LLM-as-Judge Evaluation
Use an LLM to evaluate another LLM’s output:
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://gateway.helicone.ai/v1',
  defaultHeaders: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`
  }
});

// Make the initial request
const { data, response } = await client.chat.completions
  .create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Explain quantum computing' }]
  })
  .withResponse();

const requestId = response.headers.get('helicone-id');
const responseText = data.choices[0].message.content;

// Use a judge model to evaluate the response
const evaluation = await client.chat.completions.create({
  // JSON mode (response_format) requires a model that supports it
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: `Evaluate this response on a scale of 0-100 for:
- Accuracy: How factually correct is the information?
- Clarity: How easy is it to understand?
- Completeness: Does it fully answer the question?
Return ONLY a JSON object with integer values for the keys "accuracy", "clarity", and "completeness".`
    },
    {
      role: 'user',
      content: `Response to evaluate: ${responseText}`
    }
  ],
  response_format: { type: 'json_object' }
});

// message.content can be null, so fall back to an empty object
const scores = JSON.parse(evaluation.choices[0].message.content ?? '{}');

// Add scores to the original request
await fetch(
  `https://api.helicone.ai/v1/request/${requestId}/score`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ scores })
  }
);
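A judge model does not always return clean integers, so it may be worth normalizing its output before submitting it. The sketch below is one possible approach, assuming a 0-100 scale; `normalizeJudgeScores` is a hypothetical helper, not part of the Helicone or OpenAI SDKs.

```typescript
type Scores = Record<string, number | boolean>;

// Hypothetical helper: keep booleans, round and clamp numbers to [min, max],
// lowercase the keys, and drop anything the scores endpoint would reject.
function normalizeJudgeScores(raw: unknown, min = 0, max = 100): Scores {
  const out: Scores = {};
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
    return out;
  }
  for (const [key, value] of Object.entries(raw as Record<string, unknown>)) {
    const name = key.trim().toLowerCase().replace(/\s+/g, '_');
    if (typeof value === 'boolean') {
      out[name] = value;
    } else if (typeof value === 'number' && Number.isFinite(value)) {
      out[name] = Math.min(max, Math.max(min, Math.round(value)));
    }
    // Strings, nested objects, NaN and Infinity are dropped.
  }
  return out;
}

const cleaned = normalizeJudgeScores(
  JSON.parse('{"Accuracy": 91.6, "Clarity": 120, "notes": "good"}')
);
console.log(cleaned); // { accuracy: 92, clarity: 100 }
```

Silently dropping unusable values is a design choice; throwing instead would surface a misbehaving judge prompt sooner.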
Human Evaluation Workflow
Collect detailed human evaluations:
interface EvaluationForm {
  accuracy: number;
  relevance: number;
  helpfulness: number;
  tone: number;
  followedInstructions: boolean;
  containedErrors: boolean;
}

const submitEvaluation = async (
  requestId: string,
  evaluation: EvaluationForm
) => {
  await fetch(
    `https://api.helicone.ai/v1/request/${requestId}/score`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        scores: {
          accuracy: evaluation.accuracy,
          relevance: evaluation.relevance,
          helpfulness: evaluation.helpfulness,
          tone: evaluation.tone,
          followed_instructions: evaluation.followedInstructions,
          contained_errors: evaluation.containedErrors
        }
      })
    }
  );
};

// Usage in evaluation UI
const handleEvaluationSubmit = async (formData: EvaluationForm) => {
  await submitEvaluation('req_abc123def456', formData);
  console.log('Evaluation submitted successfully');
};
Automated Quality Checks
Implement automated quality scoring:
const evaluateResponse = (responseText: string) => {
  const scores = {
    // Split on whitespace runs so repeated spaces and newlines are not counted as words
    word_count: responseText.trim().split(/\s+/).filter(Boolean).length,
    has_citations: /\[\d+\]|\(\d{4}\)/.test(responseText),
    has_code_examples: responseText.includes('```'),
    starts_with_greeting: /^(hello|hi|hey)/i.test(responseText),
    exceeds_min_length: responseText.length > 100,
    contains_markdown: /[#*_`]/.test(responseText)
  };
  return scores;
};

const scoreRequest = async (requestId: string, responseText: string) => {
  const scores = evaluateResponse(responseText);
  await fetch(
    `https://api.helicone.ai/v1/request/${requestId}/score`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores })
    }
  );
};
Multi-Criteria RAG Evaluation
Evaluate RAG (Retrieval-Augmented Generation) systems:
// The score*, detect* and check* helpers below are placeholders for your own
// evaluators; each must return an integer or a boolean.
const evaluateRAG = async (
  requestId: string,
  response: string,
  context: string[],
  userQuery: string
) => {
  // Evaluate different aspects of RAG quality
  const scores = {
    // Answer quality
    answer_relevance: await scoreAnswerRelevance(response, userQuery),
    answer_completeness: await scoreCompleteness(response, userQuery),
    // Context quality
    context_relevance: await scoreContextRelevance(context, userQuery),
    context_precision: await scoreContextPrecision(context, response),
    // Faithfulness
    faithfulness: await scoreFaithfulness(response, context),
    has_hallucinations: await detectHallucinations(response, context),
    // Additional checks
    uses_all_context: checkContextUsage(response, context),
    citations_provided: response.includes('[') && response.includes(']')
  };

  await fetch(
    `https://api.helicone.ai/v1/request/${requestId}/score`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores })
    }
  );

  return scores;
};
Comparative Evaluation (A/B Testing)
Compare different models or prompts:
// `client` is the OpenAI client configured as in the LLM-as-judge example;
// `evaluateLLMResponse` is a placeholder for your own scoring function.
const runComparison = async (userQuery: string) => {
  // Test variant A
  const { data: dataA, response: responseA } = await client.chat.completions
    .create(
      {
        model: 'gpt-4',
        messages: [{ role: 'user', content: userQuery }],
        temperature: 0.7
      },
      {
        headers: {
          'Helicone-Property-Variant': 'A',
          'Helicone-Property-Temperature': '0.7'
        }
      }
    )
    .withResponse();
  const requestIdA = responseA.headers.get('helicone-id');

  // Test variant B
  const { data: dataB, response: responseB } = await client.chat.completions
    .create(
      {
        model: 'gpt-4',
        messages: [{ role: 'user', content: userQuery }],
        temperature: 0.3
      },
      {
        headers: {
          'Helicone-Property-Variant': 'B',
          'Helicone-Property-Temperature': '0.3'
        }
      }
    )
    .withResponse();
  const requestIdB = responseB.headers.get('helicone-id');

  // Evaluate both
  const scoresA = await evaluateLLMResponse(dataA.choices[0].message.content);
  const scoresB = await evaluateLLMResponse(dataB.choices[0].message.content);

  // Add scores
  await Promise.all([
    fetch(`https://api.helicone.ai/v1/request/${requestIdA}/score`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores: scoresA })
    }),
    fetch(`https://api.helicone.ai/v1/request/${requestIdB}/score`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores: scoresB })
    })
  ]);
};
Custom Evaluation Framework
Build a reusable evaluation framework:
class LLMEvaluator {
  private apiKey: string;

  constructor(apiKey: string) {
    this.apiKey = apiKey;
  }

  async evaluate(
    requestId: string,
    response: string,
    criteria: string[]
  ): Promise<Record<string, number>> {
    const scores: Record<string, number> = {};

    // Run each evaluation criterion
    for (const criterion of criteria) {
      scores[criterion] = await this.evaluateCriterion(
        response,
        criterion
      );
    }

    // Submit scores
    await fetch(
      `https://api.helicone.ai/v1/request/${requestId}/score`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ scores })
      }
    );

    return scores;
  }

  private async evaluateCriterion(
    response: string,
    criterion: string
  ): Promise<number> {
    // Implement your evaluation logic here
    // This could use another LLM, heuristics, or external APIs
    // The result must be an integer
    return 0;
  }
}

// Usage (responseText is the LLM output you want to evaluate)
// process.env values are string | undefined, so provide a fallback
const evaluator = new LLMEvaluator(process.env.HELICONE_API_KEY ?? '');
const scores = await evaluator.evaluate(
  'req_abc123def456',
  responseText,
  ['accuracy', 'clarity', 'completeness', 'tone']
);
Querying by Scores
Query requests based on score values:
// Find all high-quality responses (accuracy >= 90)
const response = await fetch(
  'https://api.helicone.ai/v1/request/query',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      filter: {
        request_response_rmt: {
          scores: {
            accuracy: {
              gte: 90
            }
          }
        }
      },
      limit: 100
    })
  }
);
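If several score names are queried this way, the request body can be built by a small helper. This sketch only reproduces the filter shape shown in the example above, with the `gte` operator; `minScoreQuery` is a hypothetical name, and no other operators are assumed to exist.

```typescript
// Hypothetical helper: build the query body from the example above for any
// score name and integer minimum.
function minScoreQuery(scoreName: string, minimum: number, limit = 100) {
  if (!Number.isInteger(minimum)) {
    throw new RangeError('minimum must be an integer, since scores are stored as integers');
  }
  return {
    filter: {
      request_response_rmt: {
        scores: {
          [scoreName]: { gte: minimum }
        }
      }
    },
    limit
  };
}

// Same body as the accuracy example, ready to pass to JSON.stringify
console.log(JSON.stringify(minScoreQuery('accuracy', 90)));
```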
Score Value Constraints
Important constraints:
- Only integer values are supported (no decimals/floats)
- Boolean values are automatically converted to 1 (true) or 0 (false)
- Score keys should be descriptive and consistent across your application
// ✅ Valid scores
{
  "accuracy": 95,            // Integer
  "has_citations": true,     // Boolean (converted to 1)
  "contains_errors": false   // Boolean (converted to 0)
}
// ❌ Invalid scores
{
  "accuracy": 95.5,          // Float - will cause error
  "score": "high"            // String - will cause error
}
Best Practices
- Consistent Naming: Use consistent score names across your evaluation workflows
- Integer Values: Always use integers for numeric scores (0-100 scale is common)
- Boolean Flags: Use booleans for yes/no criteria (presence of citations, factual accuracy, etc.)
- Multiple Dimensions: Track multiple aspects of quality for comprehensive evaluation
- Automated + Human: Combine automated scoring with periodic human evaluation
- Threshold Alerts: Set up monitoring for scores below certain thresholds
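Two of these practices lend themselves to a short sketch: converting a fractional metric to the common 0-100 integer scale, and flagging scores that fall below a threshold. Both function names are hypothetical, and the threshold values are made up for illustration.

```typescript
// Convert a 0..1 metric (for example a similarity score) to an integer 0-100.
function toPercentScore(fraction: number): number {
  if (!Number.isFinite(fraction)) {
    throw new RangeError('fraction must be a finite number');
  }
  const clamped = Math.min(1, Math.max(0, fraction));
  return Math.round(clamped * 100);
}

// Return the names of all scores that are below their configured threshold.
function scoresBelowThreshold(
  scores: Record<string, number>,
  thresholds: Record<string, number>
): string[] {
  return Object.entries(thresholds)
    .filter(([name, limit]) => {
      const value = scores[name];
      return value !== undefined && value < limit;
    })
    .map(([name]) => name);
}

console.log(toPercentScore(0.875)); // 88
console.log(
  scoresBelowThreshold({ accuracy: 72, relevance: 91 }, { accuracy: 80, relevance: 80 })
); // [ 'accuracy' ]
```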
Related Endpoints
- Get Request by ID: Retrieve request with all scores
- Query Requests: Query requests filtered by scores
- Add Feedback: Add simple thumbs up/down feedback
- Add Properties: Add custom properties to requests
