POST /v1/request/{requestId}/score

Add Scores to Request
curl --request POST \
  --url https://api.helicone.ai/v1/request/{requestId}/score \
  --header 'Authorization: Bearer <HELICONE_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
  "scores": {}
}'
{
  "data": null,
  "error": null
}

Add evaluation scores to a request to track detailed quality metrics beyond simple thumbs up/down feedback. Scores allow you to measure specific dimensions of LLM outputs like accuracy, relevance, helpfulness, and custom evaluation criteria.
Scores support both integer and boolean values. Integer scores are stored as-is, while boolean values are converted to 1 (true) or 0 (false).

Path Parameters

requestId
string
required
The unique identifier of the request to add scores to. This can be found in the Helicone-Id response header when making requests through Helicone. Example: req_abc123def456
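As a minimal sketch, capturing the id from the response headers looks like this (`getHeliconeId` is an illustrative helper, not part of any SDK; header lookups are case-insensitive):

```typescript
// Illustrative helper: read the request id Helicone attaches to
// proxied responses via the Helicone-Id header.
function getHeliconeId(headers: Headers): string | null {
  // Headers.get is case-insensitive, so "Helicone-Id" and
  // "helicone-id" resolve to the same value.
  return headers.get("helicone-id");
}
```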

Request Body

scores
object
required
An object containing score key-value pairs. Each key is the score name, and each value is either an integer or boolean. Supported value types:
  • number - Must be an integer (floats are not supported)
  • boolean - Converted to 1 (true) or 0 (false)
Example:
{
  "scores": {
    "accuracy": 95,
    "relevance": 87,
    "helpfulness": 92,
    "has_citations": true,
    "is_factual": true
  }
}
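The boolean-to-integer conversion happens server-side, but it can be mirrored client-side if you want uniform numeric values in your own logs. A small sketch (`normalizeScores` is an assumed helper, not a Helicone API):

```typescript
// Mirror the documented rule locally: booleans become 1/0,
// integers pass through unchanged.
function normalizeScores(
  scores: Record<string, number | boolean>
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, value] of Object.entries(scores)) {
    out[key] = typeof value === "boolean" ? (value ? 1 : 0) : value;
  }
  return out;
}
```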

Response

data
null
Returns null on success.
error
string | null
Error message if the request failed.
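Based on the schema above (data is null on success, error is a string on failure), a caller can treat a non-null error as the failure signal. A hedged sketch, with `assertScoreSuccess` as a hypothetical helper:

```typescript
// Shape of the endpoint's response envelope as documented above.
interface ScoreResponse {
  data: null;
  error: string | null;
}

// Throw if the score submission failed; otherwise return normally.
function assertScoreSuccess(body: ScoreResponse): void {
  if (body.error !== null) {
    throw new Error(`Failed to add scores: ${body.error}`);
  }
}
```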

Examples

Add Basic Scores

Add quality scores to a request:
cURL
curl --request POST \
  --url https://api.helicone.ai/v1/request/req_abc123def456/score \
  --header 'Authorization: Bearer <HELICONE_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
  "scores": {
    "accuracy": 95,
    "relevance": 87,
    "helpfulness": 92
  }
}'
TypeScript
const requestId = 'req_abc123def456';

const response = await fetch(
  `https://api.helicone.ai/v1/request/${requestId}/score`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      scores: {
        accuracy: 95,
        relevance: 87,
        helpfulness: 92
      }
    })
  }
);

const result = await response.json();
console.log(result.error ? `Error: ${result.error}` : 'Scores added successfully');
Python
import os
import requests

request_id = "req_abc123def456"

response = requests.post(
    f"https://api.helicone.ai/v1/request/{request_id}/score",
    headers={
        "Authorization": f"Bearer {os.environ['HELICONE_API_KEY']}",
        "Content-Type": "application/json"
    },
    json={
        "scores": {
            "accuracy": 95,
            "relevance": 87,
            "helpfulness": 92
        }
    }
)

response.raise_for_status()
result = response.json()
print("Scores added successfully")

Add Mixed Score Types

Combine integer and boolean scores:
cURL
curl --request POST \
  --url https://api.helicone.ai/v1/request/req_abc123def456/score \
  --header 'Authorization: Bearer <HELICONE_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
  "scores": {
    "overall_quality": 88,
    "coherence": 92,
    "has_citations": true,
    "is_factual": true,
    "contains_errors": false
  }
}'
TypeScript
const response = await fetch(
  `https://api.helicone.ai/v1/request/${requestId}/score`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      scores: {
        overall_quality: 88,
        coherence: 92,
        has_citations: true,
        is_factual: true,
        contains_errors: false
      }
    })
  }
);

Use Cases

Automated LLM-as-Judge Evaluation

Use an LLM to evaluate another LLM’s output:
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://gateway.helicone.ai/v1',
  defaultHeaders: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`
  }
});

// Make the initial request
const { data, response } = await client.chat.completions
  .create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Explain quantum computing' }]
  })
  .withResponse();

const requestId = response.headers.get('helicone-id');
const responseText = data.choices[0].message.content;

// Use GPT-4 to evaluate the response
const evaluation = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    {
      role: 'system',
      content: `Evaluate this response on a scale of 0-100 for:
      - Accuracy: How factually correct is the information?
      - Clarity: How easy is it to understand?
      - Completeness: Does it fully answer the question?
      
      Return ONLY a JSON object with these scores.`
    },
    {
      role: 'user',
      content: `Response to evaluate: ${responseText}`
    }
  ],
  response_format: { type: 'json_object' }
});

const scores = JSON.parse(evaluation.choices[0].message.content);

// Add scores to the original request
await fetch(
  `https://api.helicone.ai/v1/request/${requestId}/score`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ scores })
  }
);

Human Evaluation Workflow

Collect detailed human evaluations:
interface EvaluationForm {
  accuracy: number;
  relevance: number;
  helpfulness: number;
  tone: number;
  followedInstructions: boolean;
  containedErrors: boolean;
}

const submitEvaluation = async (
  requestId: string,
  evaluation: EvaluationForm
) => {
  await fetch(
    `https://api.helicone.ai/v1/request/${requestId}/score`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        scores: {
          accuracy: evaluation.accuracy,
          relevance: evaluation.relevance,
          helpfulness: evaluation.helpfulness,
          tone: evaluation.tone,
          followed_instructions: evaluation.followedInstructions,
          contained_errors: evaluation.containedErrors
        }
      })
    }
  );
};

// Usage in evaluation UI
const handleEvaluationSubmit = async (formData: EvaluationForm) => {
  await submitEvaluation('req_abc123def456', formData);
  console.log('Evaluation submitted successfully');
};

Automated Quality Checks

Implement automated quality scoring:
const evaluateResponse = (responseText: string) => {
  const scores = {
    word_count: responseText.trim().split(/\s+/).length,
    has_citations: /\[\d+\]|\(\d{4}\)/.test(responseText),
    has_code_examples: responseText.includes('```'),
    starts_with_greeting: /^(hello|hi|hey)/i.test(responseText),
    exceeds_min_length: responseText.length > 100,
    contains_markdown: /[#*_`]/.test(responseText)
  };
  
  return scores;
};

const scoreRequest = async (requestId: string, responseText: string) => {
  const scores = evaluateResponse(responseText);
  
  await fetch(
    `https://api.helicone.ai/v1/request/${requestId}/score`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores })
    }
  );
};

Multi-Criteria RAG Evaluation

Evaluate RAG (Retrieval-Augmented Generation) systems:
const evaluateRAG = async (
  requestId: string,
  response: string,
  context: string[],
  userQuery: string
) => {
  // Evaluate different aspects of RAG quality
  const scores = {
    // Answer quality
    answer_relevance: await scoreAnswerRelevance(response, userQuery),
    answer_completeness: await scoreCompleteness(response, userQuery),
    
    // Context quality
    context_relevance: await scoreContextRelevance(context, userQuery),
    context_precision: await scoreContextPrecision(context, response),
    
    // Faithfulness
    faithfulness: await scoreFaithfulness(response, context),
    has_hallucinations: await detectHallucinations(response, context),
    
    // Additional checks
    uses_all_context: checkContextUsage(response, context),
    citations_provided: response.includes('[') && response.includes(']')
  };
  
  await fetch(
    `https://api.helicone.ai/v1/request/${requestId}/score`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores })
    }
  );
  
  return scores;
};

Comparative Evaluation (A/B Testing)

Compare different models or prompts:
const runComparison = async (userQuery: string) => {
  // Test variant A
  const { data: dataA, response: responseA } = await client.chat.completions
    .create(
      {
        model: 'gpt-4',
        messages: [{ role: 'user', content: userQuery }],
        temperature: 0.7
      },
      {
        headers: {
          'Helicone-Property-Variant': 'A',
          'Helicone-Property-Temperature': '0.7'
        }
      }
    )
    .withResponse();
  
  const requestIdA = responseA.headers.get('helicone-id');
  
  // Test variant B
  const { data: dataB, response: responseB } = await client.chat.completions
    .create(
      {
        model: 'gpt-4',
        messages: [{ role: 'user', content: userQuery }],
        temperature: 0.3
      },
      {
        headers: {
          'Helicone-Property-Variant': 'B',
          'Helicone-Property-Temperature': '0.3'
        }
      }
    )
    .withResponse();
  
  const requestIdB = responseB.headers.get('helicone-id');
  
  // Evaluate both
  const scoresA = await evaluateLLMResponse(dataA.choices[0].message.content);
  const scoresB = await evaluateLLMResponse(dataB.choices[0].message.content);
  
  // Add scores
  await Promise.all([
    fetch(`https://api.helicone.ai/v1/request/${requestIdA}/score`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores: scoresA })
    }),
    fetch(`https://api.helicone.ai/v1/request/${requestIdB}/score`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ scores: scoresB })
    })
  ]);
};

Custom Evaluation Framework

Build a reusable evaluation framework:
class LLMEvaluator {
  private apiKey: string;
  
  constructor(apiKey: string) {
    this.apiKey = apiKey;
  }
  
  async evaluate(
    requestId: string,
    response: string,
    criteria: string[]
  ): Promise<Record<string, number>> {
    const scores: Record<string, number> = {};
    
    // Run each evaluation criterion
    for (const criterion of criteria) {
      scores[criterion] = await this.evaluateCriterion(
        response,
        criterion
      );
    }
    
    // Submit scores
    await fetch(
      `https://api.helicone.ai/v1/request/${requestId}/score`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ scores })
      }
    );
    
    return scores;
  }
  
  private async evaluateCriterion(
    response: string,
    criterion: string
  ): Promise<number> {
    // Implement your evaluation logic here
    // This could use another LLM, heuristics, or external APIs
    return 0;
  }
}

// Usage
const evaluator = new LLMEvaluator(process.env.HELICONE_API_KEY!);
const scores = await evaluator.evaluate(
  'req_abc123def456',
  responseText,
  ['accuracy', 'clarity', 'completeness', 'tone']
);

Querying by Scores

Query requests based on score values:
// Find all high-quality responses (accuracy > 90)
const response = await fetch(
  'https://api.helicone.ai/v1/request/query',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.HELICONE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      filter: {
        request_response_rmt: {
          scores: {
            accuracy: {
              gte: 90
            }
          }
        }
      },
      limit: 100
    })
  }
);

Score Value Constraints

Important constraints:
  • Only integer values are supported (no decimals/floats)
  • Boolean values are automatically converted to 1 (true) or 0 (false)
  • Score keys should be descriptive and consistent across your application
// ✅ Valid scores
{
  "accuracy": 95,           // Integer
  "has_citations": true,    // Boolean (converted to 1)
  "contains_errors": false  // Boolean (converted to 0)
}

// ❌ Invalid scores
{
  "accuracy": 95.5,         // Float - will cause error
  "score": "high"           // String - will cause error
}
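If you want to catch constraint violations before the request leaves your application, a local pre-flight check is one option. A sketch under the constraints listed above (`validateScores` is an assumed helper, not part of Helicone):

```typescript
// Return a list of problems; an empty list means the scores
// object satisfies the documented constraints (integers or booleans only).
function validateScores(scores: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [key, value] of Object.entries(scores)) {
    if (typeof value === "boolean") continue;
    if (typeof value === "number" && Number.isInteger(value)) continue;
    problems.push(`"${key}" must be an integer or boolean, got ${JSON.stringify(value)}`);
  }
  return problems;
}
```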

Best Practices

  • Consistent Naming: Use consistent score names across your evaluation workflows
  • Integer Values: Always use integers for numeric scores (0-100 scale is common)
  • Boolean Flags: Use booleans for yes/no criteria (presence of citations, factual accuracy, etc.)
  • Multiple Dimensions: Track multiple aspects of quality for comprehensive evaluation
  • Automated + Human: Combine automated scoring with periodic human evaluation
  • Threshold Alerts: Set up monitoring for scores below certain thresholds
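For the threshold-alert practice, the check itself is simple once scores are in hand (for example, pulled back via the query endpoint shown earlier). A minimal sketch; `scoresBelowThreshold` is an illustrative helper, not a Helicone API:

```typescript
// Flag score names whose values fall below a minimum acceptable floor,
// e.g. to trigger an alert or route the request for human review.
function scoresBelowThreshold(
  scores: Record<string, number>,
  threshold: number
): string[] {
  return Object.entries(scores)
    .filter(([, value]) => value < threshold)
    .map(([name]) => name);
}
```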

Related

  • Get Request by ID - Retrieve a request with all its scores
  • Query Requests - Query requests filtered by scores
  • Add Feedback - Add simple thumbs up/down feedback
  • Add Properties - Add custom properties to requests