Extractions are the results of document processing. Each extraction contains the structured data extracted from a document, along with metadata about the extraction process. You can list, filter, retrieve, update, and download extractions programmatically.

Listing Extractions

Retrieve a paginated list of extractions with optional filtering by date, origin, review status, or custom metadata.
from datetime import datetime
from retab import Retab

client = Retab()

# List recent extractions
extractions = client.extractions.list(
    limit=10,
    order="desc"
)

# Filter by metadata
extractions = client.extractions.list(
    metadata={"organization_id": "org_acme_corp"},
    limit=50
)

# Filter by date range
extractions = client.extractions.list(
    from_date=datetime(2024, 1, 1),
    to_date=datetime(2024, 12, 31)
)

Parameters

limit (int, default: 10): Maximum number of extractions to return per page.
order (string, default: "desc"): Sort order by creation date. Either "asc" or "desc".
before (string): Pagination cursor. Return extractions before this ID.
after (string): Pagination cursor. Return extractions after this ID.
from_date (datetime): Filter extractions created on or after this date. Use datetime in Python or Date in JavaScript.
to_date (datetime): Filter extractions created on or before this date. Use datetime in Python or Date in JavaScript.
metadata (dict[str, str]): Filter by custom metadata key-value pairs.
human_review_status (string): Filter by review status: "success", "review_required", or "reviewed".
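
The limit and after parameters can be combined to walk through every page of results. The helper below is a minimal sketch of that loop, not part of the Retab SDK: iter_all and fetch_page are hypothetical names, and it assumes each page is a plain list of dicts carrying an "id" key and that after accepts the ID of the last item on the previous page. This page does not show the actual return shape of client.extractions.list, so a small adapter around the real call is probably needed.

```python
from typing import Any, Callable, Iterator, Optional


def iter_all(
    fetch_page: Callable[..., list[dict[str, Any]]],
    page_size: int = 50,
    **filters: Any,
) -> Iterator[dict[str, Any]]:
    """Yield every item by following the `after` cursor page by page.

    Assumption: `fetch_page` returns a list of dicts that each have an "id".
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    cursor: Optional[str] = None
    while True:
        kwargs = dict(filters, limit=page_size)
        if cursor is not None:
            kwargs["after"] = cursor
        page = fetch_page(**kwargs)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        # Assumption: the last ID on this page is the cursor for the next one.
        cursor = page[-1]["id"]
```

Passing the fetch function in as an argument keeps the loop independent of the response format, so the same helper can be exercised with a fake in tests and wrapped around the real client once the response shape is confirmed.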

Getting an Extraction

Retrieve a single extraction by its ID.
from retab import Retab

client = Retab()

extraction = client.extractions.get("extr_01G34H8J2K")
print(extraction)

Updating an Extraction

Update an extraction’s predictions, review status, or other properties.
from retab import Retab

client = Retab()

# Update predictions after human review
updated = client.extractions.update(
    extraction_id="extr_01G34H8J2K",
    predictions={
        "invoice_number": "INV-2024-0789-CORRECTED",
        "total_amount": 1576.75
    },
    human_review_status="reviewed"
)
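
This page does not say whether update replaces the stored predictions or merges the submitted fields into them. Assuming it might replace them, one cautious approach is to build the full corrected object locally before sending it. The sketch below uses a hypothetical helper, apply_corrections, and a made-up "currency" field purely for illustration.

```python
from copy import deepcopy
from typing import Any


def apply_corrections(
    predictions: dict[str, Any], corrections: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of `predictions` with the reviewer's corrections applied."""
    # Reject keys that are not in the original output, which usually means a typo.
    unknown = corrections.keys() - predictions.keys()
    if unknown:
        raise KeyError(f"Unknown prediction fields: {sorted(unknown)}")
    merged = deepcopy(predictions)  # leave the original untouched
    merged.update(corrections)
    return merged


# Illustrative data; "currency" is a hypothetical field.
original = {
    "invoice_number": "INV-2024-0789",
    "total_amount": 1576.75,
    "currency": "USD",
}
corrected = apply_corrections(
    original, {"invoice_number": "INV-2024-0789-CORRECTED"}
)
```

The resulting corrected dict could then be passed as predictions= in the update call shown above. If the API turns out to merge partial updates on its own, this step is unnecessary.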

Downloading Extractions

Download extractions in bulk as JSONL, CSV, or XLSX format.
from datetime import datetime
from retab import Retab

client = Retab()

# Get download URL for JSONL export
result = client.extractions.download(
    format="jsonl",
    from_date=datetime(2024, 1, 1),
    metadata={"organization_id": "org_acme_corp"}
)

print(f"Download URL: {result['download_url']}")
print(f"Expires at: {result['expires_at']}")
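
The download call returns a URL rather than the file itself, so a second step is needed to fetch the export. The following is a minimal sketch using only the Python standard library. It assumes the URL can be fetched with a plain GET and no extra authentication headers, which this page does not confirm; save_export is a hypothetical helper name.

```python
import shutil
import urllib.request
from pathlib import Path


def save_export(download_url: str, destination: str) -> Path:
    """Stream the export at `download_url` to `destination` and return its path.

    Assumption: the URL is directly fetchable without additional headers.
    """
    target = Path(destination)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large exports are never held fully in memory.
    with urllib.request.urlopen(download_url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

With the result from the example above, a call might look like save_export(result['download_url'], "exports/extractions.jsonl"). Because the response includes an expiry time, it is probably safest to fetch the file soon after requesting the URL.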

Download Parameters

format (string, default: "jsonl"): Export format: "jsonl", "csv", or "xlsx".
from_date (datetime): Filter extractions created on or after this date. Use datetime in Python or Date in JavaScript.
to_date (datetime): Filter extractions created on or before this date. Use datetime in Python or Date in JavaScript.
metadata (dict[str, str]): Filter by custom metadata.
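
Once a "jsonl" export has been saved locally, it can be read one record at a time, since JSONL stores one JSON value per line. The sketch below is generic and makes no assumption about which fields each exported record contains, because this page does not document them; read_jsonl is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Reading lazily like this avoids loading a large export into memory at once.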

Filtering by Metadata

Metadata filtering lets you separate extractions by client, batch, or workflow. Any metadata you attach when running an extraction can later be used as a filter, using the same keys and values.
from retab import Retab

client = Retab()

# List all extractions for a specific organization
org_extractions = client.extractions.list(
    metadata={"organization_id": "org_acme_corp"},
    limit=100
)


# Download all extractions from a specific batch
batch_download = client.extractions.download(
    format="csv",
    metadata={"batch_id": "batch_2024_04"}
)
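
The parameter reference types metadata as dict[str, str], so values such as numeric batch numbers need to be strings, and they need to be converted the same way when attaching metadata and when filtering on it. The helper below is a small sketch of one way to keep that consistent. The name as_metadata is hypothetical, and whether the API coerces non-string values on its own is not stated on this page.

```python
from typing import Any


def as_metadata(**fields: Any) -> dict[str, str]:
    """Build a string-only metadata dict, dropping fields that are None."""
    metadata: dict[str, str] = {}
    for key, value in fields.items():
        if value is None:
            continue  # skip unset fields rather than storing the string "None"
        metadata[key] = str(value)
    return metadata


# Illustrative values only; batch_id is given as an int to show the conversion.
batch_filter = as_metadata(organization_id="org_acme_corp", batch_id=42, reviewer=None)
```

The resulting dict could be passed as metadata= to either list or download.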

See the API Reference for complete method documentation.