Introduction

The client.processors module provides a system for configuring and managing document extraction workflows. A processor defines what to extract and how to extract it, as a reusable configuration that can be run on demand or triggered by automations.

Please check the API Reference for more details.

Processors remove the need to configure extraction parameters manually for each document, so you can focus on defining your output schema and reuse proven configurations across documents and automation workflows.

Method    Purpose
create    Creates a new processor configuration with specified extraction parameters.
submit    Processes documents using an existing processor and returns structured JSON.
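
Reusing one processor across several documents can be sketched as a simple loop. In this minimal sketch, `submit_many`, `submit_one`, and `fake_submit` are hypothetical names and not part of the Retab SDK. `submit_one` stands for any callable that runs a single submit call for one file and returns the extracted dictionary.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def submit_many(
    submit_one: Callable[[str, Path], Dict[str, Any]],
    processor_id: str,
    paths: Iterable[Path],
) -> Dict[str, Dict[str, Any]]:
    """Run the same processor over several files and key results by file name."""
    results: Dict[str, Dict[str, Any]] = {}
    for path in paths:
        results[path.name] = submit_one(processor_id, path)
    return results


def fake_submit(processor_id: str, path: Path) -> Dict[str, Any]:
    """Stand-in for a real submit call, used here only to show the data flow."""
    return {"processor_id": processor_id, "source": path.name}


if __name__ == "__main__":
    files = [Path("invoice_a.pdf"), Path("invoice_b.pdf")]
    print(submit_many(fake_submit, "proc_01G34H8J2K", files))
```

In real use, `submit_one` would presumably read the file, wrap it as a document, and call the submit method with the given processor ID. Passing it in as a callable keeps the loop independent of the client and easy to test.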

Processor

Object
Processor

A Processor object containing the extraction configuration.

Create

Creates a new processor with the specified extraction configuration. The processor can then be used repeatedly to extract structured data from documents.

Returns
Processor

A newly created Processor object with the specified configuration.

from retab import Retab

reclient = Retab()

processor = reclient.processors.create(
    name="Invoice Processor",
    json_schema="Invoice_schema.json",
    model="gpt-4o-mini",
    modality="native",
    temperature=0.1,
    reasoning_effort="medium",
    n_consensus=3,  # Enable consensus with 3 parallel runs
    image_resolution_dpi=150
)
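
The example above passes a schema file name to json_schema. As an illustration of what such a file might contain, the sketch below writes and reloads a small JSON Schema using only the standard library. The field names (`invoice_number`, `total_amount`, and so on) and the helpers `write_schema` and `load_schema` are assumptions made for this example, not something the SDK defines.

```python
import json
from pathlib import Path

# Hypothetical invoice schema; the field names are illustrative only.
invoice_schema = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "description": "ISO 8601 date"},
        "total_amount": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}


def write_schema(schema: dict, path: Path) -> Path:
    """Write the schema as pretty-printed JSON and return the path."""
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return path


def load_schema(path: Path) -> dict:
    """Load a schema file and check the minimal shape of an object schema."""
    schema = json.loads(path.read_text(encoding="utf-8"))
    if schema.get("type") != "object" or "properties" not in schema:
        raise ValueError(f"{path} does not look like an object schema")
    return schema
```

Loading the file once before creating the processor catches malformed JSON early, before any extraction is attempted.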

Submit

Processes one or more documents using an existing processor configuration and returns the extracted structured data. This is the primary method for executing document extraction.

Returns
Dict[str, Any]

The extracted data as a JSON object matching the processor’s schema.

from retab import Retab, MIMEData

reclient = Retab()

# Process a single document
with open("invoice.pdf", "rb") as f:
    mime = MIMEData.from_bytes(f.read(), filename="invoice.pdf")

completion = reclient.processors.submit(
    processor_id="proc_01G34H8J2K",
    document=mime
)
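
Because submit returns a plain dictionary, the result can be checked against the schema's required fields before it is used downstream. The following is a minimal sketch under stated assumptions: `missing_required`, the sample schema, and the sample result are hypothetical and stand in for real values.

```python
from typing import Any, Dict, List


def missing_required(result: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return required top-level fields that are absent or None in the result."""
    return [key for key in schema.get("required", []) if result.get(key) is None]


# Hypothetical schema and extraction result, for illustration only.
sample_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}
sample_result = {"invoice_number": "INV-001", "total_amount": None}

print(missing_required(sample_result, sample_schema))  # ['total_amount']
```

This only inspects top-level keys. Nested objects or arrays would need a recursive check or a full JSON Schema validator.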