The `extract` method in Retab's document processing pipeline uses AI models to extract structured data from any document based on a provided JSON schema. This endpoint is ideal for automating data extraction tasks, such as pulling key information from invoices, forms, receipts, images, or scanned documents, for use in workflows like data entry automation, analytics, or integration with databases and applications.
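To make "structured data based on a provided JSON schema" concrete, the sketch below defines a schema for invoices as a Python dict. The field names and descriptions are hypothetical examples, not a schema provided by Retab. The schema uses only standard JSON Schema keywords (`type`, `properties`, `required`, `description`).

```python
import json

# Hypothetical invoice schema: field names and descriptions are illustrative,
# not a schema that ships with Retab.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Identifier printed on the invoice"},
        "issue_date": {"type": "string", "description": "Date the invoice was issued, as YYYY-MM-DD"},
        "vendor_name": {"type": "string", "description": "Name of the company issuing the invoice"},
        "total_amount": {"type": "number", "description": "Grand total including taxes"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "label": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
                "required": ["label", "quantity", "unit_price"],
            },
        },
    },
    "required": ["invoice_number", "issue_date", "vendor_name", "total_amount"],
}

# Serialize to JSON text, e.g. to keep the schema in version control or send it to an API.
schema_json = json.dumps(invoice_schema, indent=2)
```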
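The typical extraction workflow follows these steps:
1. Define a JSON schema describing the data you want to extract.
2. Call the `extract` method to process the document and retrieve structured data.

Unlike the `parse` method, which focuses on raw text extraction, `extract` returns structured data that follows your JSON schema.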
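The workflow can be sketched in plain Python: a schema, a call to an extraction function, and a check that the result fits the schema. This sketch does not call Retab. `extract_fn` is a hypothetical stand-in for whatever client call you use to invoke `extract` (the exact SDK signature is deliberately left out), and `check_against_schema` is an illustrative helper, not part of Retab.

```python
from typing import Any, Callable, Dict, List

Schema = Dict[str, Any]

# Maps JSON Schema primitive type names to Python types for a minimal type check.
_PY_TYPES = {"string": str, "number": (int, float), "integer": int,
             "boolean": bool, "object": dict, "array": list}


def check_against_schema(data: Dict[str, Any], schema: Schema) -> List[str]:
    """Return problems found in top-level fields; an empty list means the data passed.

    Deliberately minimal: it checks required keys and top-level types only.
    A full JSON Schema validator is the better choice in production.
    """
    problems = []
    for key in schema.get("required", []):
        if data.get(key) is None:
            problems.append(f"missing required field: {key}")
    for key, value in data.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None or value is None:
            continue  # unknown or empty fields are not checked here
        expected = _PY_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean fields.
        wrong_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (wrong_bool or not isinstance(value, expected)):
            problems.append(f"{key}: expected {spec['type']}, got {type(value).__name__}")
    return problems


def run_extraction(document: str, schema: Schema,
                   extract_fn: Callable[[str, Schema], Dict[str, Any]]) -> Dict[str, Any]:
    """Send the document and schema to an extractor, then check the result."""
    data = extract_fn(document, schema)  # placeholder for the real extract call
    problems = check_against_schema(data, schema)
    if problems:
        raise ValueError("; ".join(problems))
    return data


# Usage with a stub extractor that returns canned data (hypothetical).
schema = {
    "type": "object",
    "properties": {"vendor_name": {"type": "string"}, "total_amount": {"type": "number"}},
    "required": ["vendor_name", "total_amount"],
}


def fake_extract(document: str, schema: Schema) -> Dict[str, Any]:
    return {"vendor_name": "Acme Corp", "total_amount": 1250.0}


result = run_extraction("invoice.pdf", schema, fake_extract)
```

Injecting the extractor as a function keeps the checking logic testable without network access; swap `fake_extract` for a real call once you have it.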
Choosing a model:
- `gpt-4.1-nano`: Balanced for accuracy and cost; recommended for most extraction tasks.
- `gemini-2.5-pro`: Use for complex documents that require deep contextual understanding.
- `gemini-2.5-flash`: Faster and cheaper, for simple extractions or high-volume processing.

Improving accuracy:
- Add `description` fields to your schema: they help the AI model understand what to extract.
- Use `X-SystemPrompt` for custom guidance, e.g. "Focus on freight details" for domain-specific extractions.
- Set `n_consensus > 1` to average the results of several extraction runs and boost reliability.
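The accuracy tips can be sketched as follows. The schema adds `description` fields and an `X-SystemPrompt` key; placing `X-SystemPrompt` at the top level of the schema is an assumption to verify against Retab's schema documentation, and the field names are hypothetical. `combine_runs` is a hypothetical helper that shows one way to reconcile several extraction runs (averaging numbers, taking the most common value otherwise). It illustrates the idea behind `n_consensus`, not Retab's actual consensus implementation.

```python
import json
from collections import Counter
from statistics import mean
from typing import Any, Dict, List

# Schema annotated with the tips above. Field names are hypothetical, and placing
# X-SystemPrompt as a top-level key is an assumption to check against Retab's docs.
freight_schema = {
    "X-SystemPrompt": "Focus on freight details.",
    "type": "object",
    "properties": {
        "carrier": {"type": "string",
                    "description": "Shipping company named on the bill of lading"},
        "weight_kg": {"type": "number",
                      "description": "Total shipment weight in kilograms"},
    },
    "required": ["carrier", "weight_kg"],
}


def combine_runs(runs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge several extraction results field by field (illustrative only).

    Numeric fields are averaged; any other field takes the most common value,
    with ties going to the value seen first.
    """
    merged: Dict[str, Any] = {}
    for key in sorted({k for run in runs for k in run}):
        values = [run[key] for run in runs if run.get(key) is not None]
        if not values:
            merged[key] = None
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            merged[key] = mean(values)
        else:
            # Serialize so that unhashable values (lists, dicts) can be counted.
            counts = Counter(json.dumps(v, sort_keys=True) for v in values)
            merged[key] = json.loads(counts.most_common(1)[0][0])
    return merged


runs = [
    {"carrier": "Maersk", "weight_kg": 1200},
    {"carrier": "Maersk", "weight_kg": 1210},
    {"carrier": "MSC", "weight_kg": 1190},
]
merged = combine_runs(runs)
```

Majority voting keeps categorical fields such as carrier names stable, while averaging smooths small numeric disagreements; a different merge rule may suit your data better.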