Projects provide a systematic way to test and validate your extraction schemas against known ground truth data. Think of it as unit testing for document AI: you can measure accuracy, compare different models, and optimize your extraction pipelines with confidence.

A project consists of documents with annotations (your test data), iterations (test runs with different settings), and a schema (what you want to extract). This structure lets you run A/B tests between models and systematically improve your document processing accuracy.

How it works

  1. Create a project with your extraction schema
  2. Upload test documents with manually verified ground truth annotations
  3. Run iterations with different model settings (GPT-4o vs GPT-4o-mini, consensus, etc.)
  4. Compare results to find the optimal configuration for your use case
Retab automatically calculates accuracy metrics by comparing each iteration’s output against your ground truth annotations, giving you objective performance measurements.
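
To make the scoring concrete, here is a minimal sketch of field-level exact-match accuracy. This is not Retab's actual scoring code (which may use more sophisticated matching); it is only meant to show the idea:

def field_accuracy(predictions: list[dict], annotations: list[dict]) -> dict[str, float]:
    """Fraction of documents where each field exactly matches its annotation."""
    fields = annotations[0].keys()
    return {
        field: sum(
            pred.get(field) == ann[field]
            for pred, ann in zip(predictions, annotations)
        ) / len(annotations)
        for field in fields
    }

# One iteration's outputs vs. the ground truth for two documents.
preds = [{"total_amount": 1250.0}, {"total_amount": 99.0}]
truth = [{"total_amount": 1250.0}, {"total_amount": 89.0}]
print(field_accuracy(preds, truth))  # {'total_amount': 0.5}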

Schema Optimization Through Projects

One of the most powerful features of projects is schema refinement. When you see poor accuracy on specific fields, you can:
  • Improve descriptions: Make field descriptions more specific and unambiguous
  • Add reasoning prompts: Use X-ReasoningPrompt for complex calculations or logic
  • Refine field types: Adjust data types based on extraction patterns
For example, a schema with vague descriptions like these gives the model little to work with:
{
  "type": "object",
  "properties": {
    "amount": {
      "type": "number",
      "description": "The amount"
    },
    "date": {
      "type": "string", 
      "description": "The date"
    }
  }
}
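
After refinement, the same two fields might look like this. The exact wording is illustrative rather than prescriptive; X-ReasoningPrompt is the reasoning-prompt extension mentioned above:

{
  "type": "object",
  "properties": {
    "amount": {
      "type": "number",
      "description": "The final total due on the invoice, after taxes and discounts, as a plain number without currency symbols",
      "X-ReasoningPrompt": "If several amounts appear, list the subtotal, tax, and any discounts, then confirm which value is the final total due."
    },
    "date": {
      "type": "string",
      "description": "The invoice issue date in YYYY-MM-DD format"
    }
  }
}
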
The project workflow for schema optimization:
  1. Run initial project → identify low-accuracy fields
  2. Refine descriptions and add reasoning prompts → re-run project
  3. Compare accuracy improvements (see the helper sketch below) → iterate until satisfied
  4. Deploy optimized schema to production
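
For step 3, a small helper can make per-field improvements easy to scan. The dictionary shape below (field name → accuracy score) is an assumed representation of a run's per-field metrics, not a documented SDK type:

def accuracy_delta(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-field accuracy change between two project runs."""
    return {
        field: round(after[field] - before[field], 3)
        for field in before
        if field in after
    }

# Example values: descriptions were refined between the two runs.
run_1 = {"invoice_number": 0.98, "total_amount": 0.71, "invoice_date": 0.85}
run_2 = {"invoice_number": 0.98, "total_amount": 0.93, "invoice_date": 0.91}

print(accuracy_delta(run_1, run_2))
# {'invoice_number': 0.0, 'total_amount': 0.22, 'invoice_date': 0.06}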

Quick Start

While you can create projects programmatically with the SDK, we recommend using the Retab platform for project management. The web interface provides powerful schema editing tools, visual result comparisons, and collaborative features that make optimization much easier.
Let’s create a project for invoice processing:
from retab import Retab

client = Retab()

# Define what you want to extract
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor_name": {"type": "string"},
        "invoice_date": {"type": "string"}
    },
    "required": ["invoice_number", "total_amount", "vendor_name"]
}

project = client.projects.create(
    name="Invoice Processing Test",
    json_schema=invoice_schema
)
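
From here, the typical next steps are uploading annotated documents and launching a first iteration. The method names and parameters below (projects.documents.create, projects.iterations.create, project.id) are a sketch of the flow, not confirmed signatures; check the API Reference for the exact calls:

# Sketch only -- method names and parameters are assumptions.
doc = client.projects.documents.create(
    project_id=project.id,
    document="invoices/sample_001.pdf",  # hypothetical test file
    annotation={                         # manually verified ground truth
        "invoice_number": "INV-001",
        "total_amount": 1250.00,
        "vendor_name": "Acme Corp",
        "invoice_date": "2024-01-15",
    },
)

iteration = client.projects.iterations.create(
    project_id=project.id,
    model="gpt-4o-mini",
)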

Key Benefits

  1. Objective Measurement: Get precise accuracy scores instead of subjective assessments
  2. Model Comparison: Test GPT-4o vs GPT-4o-mini vs consensus to find the best fit
  3. Schema Validation: Identify which fields are hardest to extract accurately
  4. Cost Optimization: Balance accuracy against processing costs for your use case

Best Practices

  • Diverse Test Data: Include various document formats, qualities, and edge cases
  • Sufficient Volume: Use at least 10-20 test documents for reliable metrics
  • Regular Testing: Re-run projects when updating schemas or switching models
  • Ground Truth Quality: Double-check your annotations—bad ground truth leads to misleading results

Integration

Projects work seamlessly with other Retab features:
  • Test Reasoning prompts to improve calculation accuracy
  • Validate Processor configurations before production deployment
  • Use Consensus in iterations for higher reliability (sketch below)
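
As an example of the consensus case, reusing the project from the Quick Start: the n_consensus parameter mirrors Retab's extraction API, but its exact name and availability on iterations is an assumption here.

# Hypothetical sketch -- parameter name and placement are assumptions.
consensus_run = client.projects.iterations.create(
    project_id=project.id,
    model="gpt-4o-mini",
    n_consensus=4,  # several parallel extractions, merged by agreement
)
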
Please check the API Reference for complete method documentation.