How it works
- Create a project with your extraction schema
- Upload test documents with manually verified ground truth annotations
- Run iterations with different model settings (GPT-4o vs GPT-4o-mini, consensus, etc.)
- Compare results to find the optimal configuration for your use case
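Comparing iterations comes down to scoring each iteration's output against your ground truth. The sketch below is a simplified stand-in and not Retab's own scoring method. It assumes exact-match comparison per field, with predictions and annotations given as plain Python dicts. The function names and the toy data are hypothetical.

```python
from typing import Any

Record = dict[str, Any]

def field_accuracy(predictions: list[Record], ground_truth: list[Record]) -> dict[str, float]:
    """Per-field share of documents whose extracted value exactly matches the annotation."""
    if len(predictions) != len(ground_truth):
        raise ValueError("expected one prediction per annotated document")
    fields = sorted({key for truth in ground_truth for key in truth})
    scores: dict[str, float] = {}
    for field in fields:
        # Only documents whose annotation contains the field are scored for it.
        pairs = [(pred.get(field), truth[field])
                 for pred, truth in zip(predictions, ground_truth) if field in truth]
        scores[field] = sum(p == t for p, t in pairs) / len(pairs)
    return scores

def rank_iterations(runs: dict[str, list[Record]], ground_truth: list[Record]) -> list[tuple[str, float]]:
    """Rank iterations by mean field accuracy, best first."""
    ranked = []
    for name, predictions in runs.items():
        scores = field_accuracy(predictions, ground_truth)
        mean = sum(scores.values()) / len(scores) if scores else 0.0
        ranked.append((name, mean))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Toy data for illustration only.
    truth = [{"vendor": "Acme", "total": 120.0}, {"vendor": "Globex", "total": 80.5}]
    runs = {
        "iteration-a": [{"vendor": "Acme", "total": 120.0}, {"vendor": "Globex", "total": 80.5}],
        "iteration-b": [{"vendor": "Acme", "total": 12.0}, {"vendor": "Globex", "total": 80.5}],
    }
    for name, score in rank_iterations(runs, truth):
        print(f"{name}: {score:.2f}")
```

Exact matching is strict: a real evaluation may prefer fuzzy string similarity or numeric tolerances for some fields.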
Schema Optimization Through Projects
One of the most powerful features of projects is schema refinement. When you see poor accuracy on specific fields, you can:
- Improve descriptions: Make field descriptions more specific and unambiguous
- Add reasoning prompts: Use X-ReasoningPrompt for complex calculations or logic
- Refine field types: Adjust data types based on extraction patterns
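As a rough illustration of these three refinements, here is a field definition written as a Python dict holding a JSON Schema. The invoice fields are hypothetical, and the placement of X-ReasoningPrompt (as a key on the property it applies to, next to description) is an assumption; check the Retab schema documentation for the exact convention.

```python
import json

# Hypothetical invoice schema; field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_date": {
            "type": "string",
            # Specific, unambiguous description: says which date and which format.
            "description": "Date the invoice was issued (not the due date), formatted as YYYY-MM-DD.",
        },
        "total_amount": {
            # Refined type: a number instead of free text like "$1,200.00".
            "type": "number",
            "description": "Final amount due including taxes, in the invoice currency, without currency symbols.",
            # Assumed placement of the reasoning prompt for a field that needs a calculation.
            "X-ReasoningPrompt": "Sum the line-item amounts, add taxes, subtract discounts, "
                                 "and check the result against the printed total.",
        },
    },
    "required": ["invoice_date", "total_amount"],
}

print(json.dumps(invoice_schema, indent=2))
```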
A typical refinement loop:
- Run an initial project → identify low-accuracy fields
- Refine descriptions and add reasoning prompts → re-run project
- Compare accuracy improvements → iterate until satisfied
- Deploy optimized schema to production
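The "identify low-accuracy fields" and "compare accuracy improvements" steps can be sketched as two small helpers over per-field accuracy scores (a field-to-score mapping in [0, 1]). The helper names and the 0.9 default threshold are hypothetical choices for illustration, not platform settings.

```python
def fields_to_refine(scores: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Fields scoring below the threshold, worst first (ties broken by name)."""
    return sorted((f for f, s in scores.items() if s < threshold),
                  key=lambda f: (scores[f], f))

def accuracy_delta(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-field change in accuracy between two runs, for fields present in both."""
    return {f: after[f] - before[f] for f in before if f in after}
```

Working through the fields worst first keeps each iteration focused on the descriptions most likely to move the overall score.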
Quick Start
While you can create projects programmatically with the SDK, we recommend using the Retab platform for project management. The web interface provides powerful schema editing tools, visual result comparisons, and collaborative features that make optimization much easier.
Key Benefits
- Objective Measurement: Get precise accuracy scores instead of subjective assessments
- Model Comparison: Test different models to find the best fit
- Schema Validation: Identify which fields are hardest to extract accurately
- Cost Optimization: Balance accuracy against processing costs for your use case
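One way to frame the accuracy/cost trade-off is to fix a minimum acceptable accuracy and then pick the cheapest iteration that meets it. This is a minimal sketch under that assumption; the class and function names are hypothetical, and cost figures would come from your own billing data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IterationResult:
    name: str
    accuracy: float      # mean field accuracy in [0, 1]
    cost_per_doc: float  # cost per document in your billing currency

def cheapest_meeting_target(results: list[IterationResult],
                            min_accuracy: float) -> Optional[IterationResult]:
    """Cheapest iteration whose accuracy meets the target; ties go to the more accurate one."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    if not eligible:
        return None  # no configuration is good enough; keep refining the schema
    return min(eligible, key=lambda r: (r.cost_per_doc, -r.accuracy))
```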
Best Practices
- Diverse Test Data: Include various document formats, qualities, and edge cases
- Sufficient Volume: Use at least 5-10 test documents for reliable metrics
- Ground Truth Quality: Double-check your annotations—bad ground truth leads to misleading results
Deployments
Deployments are project-based configurations for document extraction that can be called via the API route https://api.retab.com/v1/projects/extract/{project_id}/{iteration_id}.
This is the primary method for executing document extraction with a project-based configuration. It returns the extracted data as a JSON object matching the project’s schema.
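A minimal sketch of calling a deployment with only the Python standard library. It assumes the route accepts a JSON POST body and authenticates through an "Api-Key" header; both are assumptions to confirm against the Retab API reference, and the helper names are hypothetical. The request body is left opaque here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Route documented above; project and iteration IDs are appended as path segments.
EXTRACT_ROUTE = "https://api.retab.com/v1/projects/extract"

def build_extract_request(project_id: str, body: dict[str, Any], api_key: str,
                          iteration_id: str = "base-configuration") -> urllib.request.Request:
    """Prepare (but do not send) a deployment call."""
    # Percent-encode each ID so unusual characters cannot break the route.
    path = "/".join(urllib.parse.quote(part, safe="") for part in (project_id, iteration_id))
    return urllib.request.Request(
        f"{EXTRACT_ROUTE}/{path}",
        data=json.dumps(body).encode("utf-8"),
        # Header name for the API key is an assumption.
        headers={"Content-Type": "application/json", "Api-Key": api_key},
        method="POST",
    )

def call_deployment(request: urllib.request.Request, timeout: float = 120.0) -> dict[str, Any]:
    """Send the request and decode the JSON object holding the extracted data."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Defaulting iteration_id to "base-configuration" mirrors the documented way of using the project's default settings.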
Parameters
- ID of the project.
- ID of the specific iteration to use, or "base-configuration" to use the project’s default settings.
- Single document to process (mutually exclusive with documents).
- List of documents to process (mutually exclusive with document).
- Optional temperature for this specific request, overriding the default temperature.
- Optional seed for reproducible results across multiple runs.
- Whether to store the extraction results for later retrieval and analysis.
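The sketch below assembles a request body that respects the document/documents exclusivity and only sends optional overrides when they are set. The keys used (document, documents, temperature, seed, store) are hypothetical labels for the parameters described above; confirm the real field names against the Retab API reference. It also assumes that one of document or documents is required, and that a document is sent as a filename plus a base64 data URL.

```python
import base64
from typing import Any, Optional

def as_document(filename: str, content: bytes, mime_type: str = "application/pdf") -> dict[str, str]:
    """Wrap raw bytes as a filename + base64 data URL (assumed document shape)."""
    encoded = base64.b64encode(content).decode("ascii")
    return {"filename": filename, "url": f"data:{mime_type};base64,{encoded}"}

def build_extract_body(document: Optional[dict[str, Any]] = None,
                       documents: Optional[list[dict[str, Any]]] = None,
                       temperature: Optional[float] = None,
                       seed: Optional[int] = None,
                       store: Optional[bool] = None) -> dict[str, Any]:
    """Hypothetical request body for the extract route."""
    # document and documents are mutually exclusive; we assume one is required.
    if (document is None) == (documents is None):
        raise ValueError("pass exactly one of document or documents")
    body: dict[str, Any] = (
        {"document": document} if document is not None else {"documents": list(documents)}
    )
    # Optional overrides are omitted when unset, so the iteration's defaults apply.
    for key, value in (("temperature", temperature), ("seed", seed), ("store", store)):
        if value is not None:
            body[key] = value
    return body
```

Checking `is not None` rather than truthiness keeps legitimate values such as `temperature=0.0` or `store=False`.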