Extractions

Extractions are the results of document processing. Each extraction contains the structured data extracted from a document, along with metadata about the extraction process. You can list, filter, retrieve, update, and download extractions programmatically.

Listing Extractions

Retrieve a paginated list of extractions with optional filtering by date, origin, review status, or custom metadata.

from datetime import datetime
from retab import Retab

client = Retab()

# List recent extractions
extractions = client.extractions.list(
    limit=10,
    order="desc"
)

# Filter by metadata
extractions = client.extractions.list(
    metadata={"organization_id": "org_acme_corp"},
    limit=50
)

# Filter by date range
extractions = client.extractions.list(
    from_date=datetime(2024, 1, 1),
    to_date=datetime(2024, 12, 31)
)

Parameters

limit (int, default: 10)
Maximum number of extractions to return per page.

order (string, default: "desc")
Sort order by creation date. Either "asc" or "desc".

before (string)
Cursor for pagination: return extractions before this ID.

after (string)
Cursor for pagination: return extractions after this ID.

from_date (datetime)
Filter extractions created on or after this date. Use datetime in Python or Date in JavaScript.

to_date (datetime)
Filter extractions created on or before this date. Use datetime in Python or Date in JavaScript.

metadata (dict[str, str])
Filter by custom metadata key-value pairs.
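
The `after` cursor can be used to walk through every page of results. The sketch below illustrates the loop only and is not the SDK's documented behavior: `fetch_page` is a hypothetical callable standing in for a wrapper around `client.extractions.list`, and it is assumed to return a list of dicts that each carry an `id` key. The real response shape may differ, so check the API Reference before adapting it.

```python
from typing import Callable, Iterator, Optional


def iter_all_extractions(
    fetch_page: Callable[[Optional[str], int], list],
    limit: int = 10,
) -> Iterator[dict]:
    """Yield every item by following the `after` cursor page by page.

    `fetch_page(after, limit)` is a hypothetical stand-in for a call such as
    client.extractions.list(after=after, limit=limit). It is assumed to
    return a list of dicts that each have an "id" key.
    """
    after: Optional[str] = None
    while True:
        page = fetch_page(after, limit)
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        if len(page) < limit:
            return  # a short page is assumed to be the last one
        after = page[-1]["id"]  # next cursor is the ID of the last item seen


# Fake in-memory backend, used only to demonstrate the loop.
_FAKE = [{"id": f"extr_{i:03d}"} for i in range(25)]


def fake_fetch(after: Optional[str], limit: int) -> list:
    start = 0
    if after is not None:
        start = next(i for i, e in enumerate(_FAKE) if e["id"] == after) + 1
    return _FAKE[start:start + limit]


all_ids = [e["id"] for e in iter_all_extractions(fake_fetch, limit=10)]
```

Passing the page fetcher in as a callable keeps the cursor logic separate from the client, so the loop can be exercised without network access.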

Getting an Extraction

Retrieve a single extraction by its ID.

from retab import Retab

client = Retab()

extraction = client.extractions.get("extr_01G34H8J2K")
print(extraction)

Updating an Extraction

Update an extraction’s predictions or other properties.

from retab import Retab

client = Retab()

# Update predictions after human review
updated = client.extractions.update(
    extraction_id="extr_01G34H8J2K",
    predictions={
        "invoice_number": "INV-2024-0789-CORRECTED",
        "total_amount": 1576.75
    }
)
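
The text above does not say whether `update` replaces the whole `predictions` object or merges it with the stored one. A cautious approach, sketched below under that uncertainty, is to lay the reviewer's corrections over the original predictions and send the complete result. The helper `apply_corrections` and the `currency` field are hypothetical and exist only for illustration.

```python
from copy import deepcopy


def apply_corrections(predictions: dict, corrections: dict) -> dict:
    """Return a new dict with reviewer corrections laid over the original predictions."""
    merged = deepcopy(predictions)  # leave the original untouched
    merged.update(corrections)      # top-level keys only; nested dicts are replaced whole
    return merged


# "currency" is a made-up field used only to show that untouched keys survive.
original = {"invoice_number": "INV-2024-0789", "total_amount": 1576.75, "currency": "USD"}
corrections = {"invoice_number": "INV-2024-0789-CORRECTED"}

reviewed = apply_corrections(original, corrections)
# `reviewed` could then be passed as predictions=... to client.extractions.update
```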

Downloading Extractions

Download extractions in bulk as JSONL, CSV, or XLSX format.

from datetime import datetime
from retab import Retab

client = Retab()

# Get download URL for JSONL export
result = client.extractions.download(
    format="jsonl",
    from_date=datetime(2024, 1, 1),
    metadata={"organization_id": "org_acme_corp"}
)

print(f"Download URL: {result['download_url']}")
print(f"Expires at: {result['expires_at']}")
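
Once the file behind the download URL has been saved locally, a JSONL export can be read one line at a time, since JSONL stores one JSON object per line. The sketch below uses only the standard library. The sample records and their fields are made up for illustration, and a real export may contain different fields.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline
            yield json.loads(line)


# Made-up sample content standing in for a downloaded file.
sample = io.StringIO(
    '{"id": "extr_01G34H8J2K", "total_amount": 1576.75}\n'
    '{"id": "extr_hypothetical", "total_amount": 20.0}\n'
    "\n"
)
records = list(read_jsonl(sample))
```

With a real file, `open(path, encoding="utf-8")` would take the place of the `io.StringIO` object.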

Download Parameters

format (string, default: "jsonl")
Export format: "jsonl", "csv", or "xlsx".

from_date (datetime)
Filter extractions created on or after this date. Use datetime in Python or Date in JavaScript.

to_date (datetime)
Filter extractions created on or before this date. Use datetime in Python or Date in JavaScript.

metadata (dict[str, str])
Filter by custom metadata.
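
Both `from_date` and `to_date` are described as inclusive bounds. The documentation above does not say whether a `to_date` set to midnight covers the whole of that day, so the sketch below sets the end of the day explicitly. That is a cautious assumption, not confirmed behavior, and `month_range` is a hypothetical helper.

```python
import calendar
from datetime import datetime
from typing import Tuple


def month_range(year: int, month: int) -> Tuple[datetime, datetime]:
    """Return (from_date, to_date) covering one calendar month, inclusive."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    from_date = datetime(year, month, 1)
    to_date = datetime(year, month, last_day, 23, 59, 59)
    return from_date, to_date


from_date, to_date = month_range(2024, 2)
# These values could then be passed as from_date=... and to_date=...
# to client.extractions.list or client.extractions.download.
```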

Filtering by Metadata

Metadata filtering lets you separate extractions by client, batch, or workflow. Metadata attached when an extraction is created can later be used as a filter, with the same keys and values, in both list and download calls.

from retab import Retab

client = Retab()

# List all extractions for a specific organization
org_extractions = client.extractions.list(
    metadata={"organization_id": "org_acme_corp"},
    limit=100
)


# Download all extractions from a specific batch
batch_download = client.extractions.download(
    format="csv",
    metadata={"batch_id": "batch_2024_04"}
)
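
The `metadata` parameter is typed `dict[str, str]`, so values such as integers need to be strings before they are used as a filter. Whether the API converts non-string values itself is not stated here, so the sketch below does the conversion up front. `build_metadata_filter` is a hypothetical helper, and `batch_number` and `reviewer` are made-up keys.

```python
def build_metadata_filter(**fields: object) -> dict:
    """Turn keyword arguments into a dict[str, str], dropping None values."""
    return {key: str(value) for key, value in fields.items() if value is not None}


metadata_filter = build_metadata_filter(
    organization_id="org_acme_corp",
    batch_number=4,   # hypothetical key; the int is converted to "4"
    reviewer=None,    # hypothetical key; omitted from the filter
)
# `metadata_filter` could then be passed as metadata=... to list or download.
```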

See the API Reference for complete method documentation.
