Fine-tuning adapts pre-trained models to your specific use case, improving quality and reducing costs for specialized tasks. Helicone integrates with OpenPipe to streamline the entire fine-tuning workflow.
When to Fine-Tune
Fine-tuning is ideal when:
Specialized Domain
Your domain requires specialized knowledge (medical, legal, technical)
Consistent Format
You need consistent output formatting that prompting can’t achieve
Cost Optimization
High volume makes a smaller fine-tuned model more economical
Latency Requirements
You need faster responses than larger models provide
Fine-Tuning Workflow
Set Up OpenPipe Integration
Connect your Helicone account to OpenPipe:
- Navigate to Settings → Integrations in your Helicone dashboard
- Find the OpenPipe integration
- Click Connect and authorize the integration
This allows you to manage fine-tuning datasets and jobs directly from Helicone.
Collect Training Data
Fine-tuning requires high-quality training examples. You can:
Option 1: Use Production Data
Tag successful requests from your production traffic with custom properties, then filter by those properties in Helicone to export training data.
Option 2: Create Synthetic Data
Generate examples programmatically.
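For Option 1, a minimal sketch of tagging production traffic so it can be filtered and exported later. It assumes requests are routed through the Helicone proxy and follows Helicone's `Helicone-Property-*` header convention; the property names and values (`support-triage`, etc.) are hypothetical.

```python
# Sketch: build the headers that route a request through Helicone and tag
# it as a fine-tuning candidate. Property names/values are illustrative.
def helicone_training_headers(helicone_api_key: str, task: str) -> dict:
    """Headers that authenticate with Helicone and tag the request
    with custom properties, which become filterable in the dashboard."""
    return {
        "Helicone-Auth": f"Bearer {helicone_api_key}",
        "Helicone-Property-Task": task,                   # e.g. "support-triage"
        "Helicone-Property-Fine-Tune-Candidate": "true",  # filter on this to export
    }

# Pass these as default_headers on your OpenAI client pointed at the
# Helicone proxy, e.g. (hypothetical usage):
#   client = OpenAI(base_url="https://oai.helicone.ai/v1",
#                   default_headers=helicone_training_headers(key, "support-triage"))
```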
Create a Training Dataset
In the Helicone dashboard:
- Go to Datasets → Create New Dataset
- Select requests to include (filter by your training properties)
- Review and clean the data
- Export to OpenPipe
Dataset Best Practices
Diverse Examples
Include variety in your training data:
- Different input lengths
- Various edge cases
- Multiple query types
- Representative of production distribution
Consistent Formatting
Ensure all examples follow the same structure:
- Identical system prompts
- Consistent output format
- Same level of detail
High-Quality Labels
Every example should be:
- Factually correct
- Following your desired style
- Representative of ideal behavior
- Free of errors or inconsistencies
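The consistency rules above map directly onto the chat-format JSONL used by OpenAI-style fine-tuning: one JSON object per line, an identical system prompt in every example, and the ideal completion as the assistant message. A sketch, with hypothetical prompts:

```python
import json

# Identical system prompt across every example (consistent formatting rule).
SYSTEM_PROMPT = "You are a support-ticket classifier. Reply with exactly one category."

def to_training_example(user_input: str, ideal_output: str) -> str:
    """One JSONL line in the chat fine-tuning format."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": ideal_output},  # the label: ideal behavior
        ]
    })

lines = [
    to_training_example("My invoice shows the wrong amount", "billing"),
    to_training_example("The app crashes when I log in", "bug"),
]
# Write one example per line to a .jsonl file for upload, e.g.:
#   open("train.jsonl", "w").write("\n".join(lines) + "\n")
```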
Configure Fine-Tuning Job
Click Start Fine-Tuning and configure:
- Base Model: Start with a model family (GPT-4o, GPT-3.5, etc.)
- Training Epochs: Usually 3-5 (more risks overfitting)
- Learning Rate: Use automatic or adjust based on results
- Validation Split: Hold out 10-20% for validation
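The validation split in the last bullet can be produced with a simple shuffled holdout before uploading the dataset. A sketch (the 15% default and fixed seed are arbitrary choices):

```python
import random

def split_dataset(examples: list, validation_fraction: float = 0.15):
    """Shuffle and hold out a validation slice; returns (train, validation)."""
    shuffled = examples[:]
    random.Random(42).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```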
Fine-tuning jobs typically take 10 minutes to a few hours depending on dataset size and model.
Monitor Training Progress
Track your fine-tuning job in real-time:
- Training loss: Should decrease steadily
- Validation loss: Should decrease without diverging from training loss
- Estimated completion time
If validation loss starts increasing while training loss decreases, you're overfitting; stop training early.
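The early-stopping rule above can be sketched as a simple check over per-epoch loss curves (the loss lists here are hypothetical values you would read from the job's training metrics):

```python
def is_overfitting(train_losses, val_losses, patience: int = 2) -> bool:
    """True if validation loss rose for `patience` consecutive epochs
    while training loss kept falling over the same window."""
    if len(val_losses) <= patience:
        return False
    val_rising = all(
        val_losses[-i] > val_losses[-i - 1] for i in range(1, patience + 1)
    )
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    return val_rising and train_falling
```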
Evaluate the Fine-Tuned Model
Once training completes, test your model. Compare outputs against:
- Base model performance
- Your validation set expectations
- Production requirements
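A lightweight way to make the comparison concrete is to score each model's validation outputs with a simple metric. A sketch using exact-match accuracy (suitable for classification-style tasks; `outputs` would come from running your validation prompts through each model):

```python
def exact_match_accuracy(outputs: list, expected: list) -> float:
    """Fraction of outputs that match the expected answers,
    ignoring case and surrounding whitespace."""
    assert len(outputs) == len(expected)
    hits = sum(
        o.strip().lower() == e.strip().lower() for o, e in zip(outputs, expected)
    )
    return hits / len(expected)
```

Run the same metric over the base model, the fine-tuned model, and your validation set to get a like-for-like number.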
Comparing Fine-Tuned vs Base Models
Run side-by-side comparisons.
Cost Analysis
Fine-tuning economics depend on volume.
Example Calculation
Scenario: 100,000 requests/month, 500 input + 200 output tokens each
- GPT-4o (Base)
- GPT-4o-mini Fine-Tuned
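The arithmetic behind the scenario is straightforward. A sketch; the per-million-token rates below are illustrative placeholders, so substitute current pricing for your models:

```python
def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Monthly spend given per-request token counts and $/1M-token rates."""
    total_in = requests * input_tokens / 1_000_000    # input tokens, in millions
    total_out = requests * output_tokens / 1_000_000  # output tokens, in millions
    return total_in * input_rate_per_m + total_out * output_rate_per_m

# 100,000 requests/month, 500 input + 200 output tokens each
base = monthly_cost(100_000, 500, 200, input_rate_per_m=2.50, output_rate_per_m=10.00)
tuned = monthly_cost(100_000, 500, 200, input_rate_per_m=0.30, output_rate_per_m=1.20)
# At these placeholder rates: $325/month (base) vs $39/month (fine-tuned)
```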
Iterating on Fine-Tuned Models
Improve your model over time:
Collect Feedback
Create New Training Data
Filter for low-scoring responses and correct them.
Retrain Periodically
Create new versions as you collect more data:
- Monthly: Add new high-quality examples
- Quarterly: Major updates with improved examples
- Annually: Evaluate if a newer base model would perform better
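The "filter for low-scoring responses" step above can be sketched as a simple threshold over exported requests. `logged_requests` is a hypothetical export of requests with attached feedback scores, and the 0.5 cutoff is arbitrary:

```python
def needs_correction(logged_requests: list, threshold: float = 0.5) -> list:
    """Requests whose feedback score falls below the threshold."""
    return [r for r in logged_requests if r.get("score", 1.0) < threshold]

candidates = needs_correction([
    {"id": "req_1", "score": 0.2},
    {"id": "req_2", "score": 0.9},
    {"id": "req_3"},  # unscored requests are kept out by the default score
])
# candidates contains only req_1; correct its response and add it as a
# new training example.
```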
Troubleshooting
Poor Performance After Fine-Tuning
Overfitting
Symptoms: Great on training data, poor on new inputs
Solutions:
- Reduce training epochs (try 2-3 instead of 5+)
- Add more diverse training examples
- Use a larger validation set (20%)
Insufficient Data
Symptoms: Model behavior is inconsistent
Solutions:
- Collect 2-3x more examples
- Focus on quality over quantity
- Use data augmentation to increase variety
Wrong Base Model
Symptoms: No improvement over base model
Solutions:
- Try a different base model family
- Ensure task matches model capabilities
- Verify training data format is correct
Fine-Tuning Resources
Training Data Best Practices
Deep dive into creating effective training datasets
Model Selection Guide
Choosing the right base model for fine-tuning
RAG vs Fine-Tuning
When to use each approach
OpenAI Fine-Tuning API
Direct API usage without OpenPipe
Next Steps
Cost Tracking
Monitor fine-tuned model economics
Experiments
A/B test fine-tuned vs base models
RAGAS Evaluations
Evaluate fine-tuned model quality systematically
OpenPipe Integration
Learn more about the OpenPipe platform
