Retab is a document processing workflow builder for the age of AI.
Retab transforms any document into structured data, with six core functionalities:

parse: Convert any file (PDFs, Excel, emails, images) into LLM-ready markdown.
extract: Extract structured JSON from documents using your defined schema.
edit: Modify document content while preserving formatting.
split: Intelligently split documents into logical sections.
partition: Group repeated records in a document into chunks by a key such as invoice number or policy ID.
classify: Categorize documents based on content and type.
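
To make the partition idea concrete, here is a minimal Python sketch of grouping repeated records by a key. It illustrates the concept only and is not Retab's implementation or API; the `partition_by_key` helper, the record dictionaries, and the `invoice_number` field are all hypothetical.

```python
from collections import defaultdict
from typing import Any


def partition_by_key(records: list[dict[str, Any]], key: str) -> dict[Any, list[dict[str, Any]]]:
    """Group records that share the same value for `key` (hypothetical helper)."""
    chunks: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        chunks[record[key]].append(record)
    return dict(chunks)


# Hypothetical line items from a single document that contains several invoices.
line_items = [
    {"invoice_number": "INV-001", "description": "Widget", "amount": 10.0},
    {"invoice_number": "INV-002", "description": "Gadget", "amount": 25.0},
    {"invoice_number": "INV-001", "description": "Shipping", "amount": 5.0},
]

chunks = partition_by_key(line_items, "invoice_number")
for invoice_number, items in chunks.items():
    print(invoice_number, len(items))
```

Each chunk holds every record that shares one key value, in the order the records appeared in the document.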

Quickstart

The most basic workflow is extracting structured data from a document. The easiest way to access the API is through the Python or Node SDK.
Find your API key in the dashboard settings.
1. Install the SDK

pip install retab
2. Generate a Schema

from retab import Retab

client = Retab()

# Generate a JSON schema from a sample document.
schema_response = client.schemas.generate(
    documents=["Invoice.pdf"],
    model="retab-small",
)
3. Extract Data

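from retab import Retab

client = Retab()

# Extract structured data using the schema generated in step 2
# (schema_response must still be in scope).
extraction = client.extractions.create(
    json_schema=schema_response.json_schema,
    document="Invoice.pdf",
    model="retab-micro",
)

print(extraction.id)                      # identifier of this extraction
print(extraction.output)                  # extracted data matching the schema
print(extraction.consensus.likelihoods)   # consensus likelihoods for the output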
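
The likelihoods printed above can help decide which extracted fields deserve a manual check. The sketch below assumes, hypothetically, that the likelihoods form a flat mapping from field name to a score between 0 and 1; the shape actually returned by the SDK may differ, for example by being nested for nested schemas. The `fields_needing_review` helper, the sample values, and the 0.8 threshold are illustrative and not part of Retab.

```python
def fields_needing_review(likelihoods: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return the names of fields scoring below `threshold`, sorted alphabetically."""
    return sorted(name for name, score in likelihoods.items() if score < threshold)


# Hypothetical likelihoods; real values would come from an extraction response.
sample_likelihoods = {
    "invoice_number": 0.99,
    "total_amount": 0.62,
    "due_date": 0.85,
}

flagged = fields_needing_review(sample_likelihoods)
print(flagged)
```

With a real response, the sample dictionary would be replaced by the likelihoods from the extraction, converted to a plain dictionary first if the SDK returns a different type.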

Get Started

Workflows

Build complex document workflows with our no-code editor.

API Playground

Explore the API playground and try the Retab API.

Community

Discord

Join our community for tips and best practices.

GitHub

Star us and contribute to the project.