POST /v1/splits
from retab import Retab

client = Retab()

split = client.splits.create(
    document="invoice_batch.pdf",
    model="retab-small",
    subdocuments=[
        {
            "name": "invoice",
            "description": "Invoice documents with billing information",
            "allow_multiple_instances": True,
        },
        {"name": "receipt", "description": "Receipt documents for payments"},
        {"name": "contract", "description": "Legal contract documents"},
    ],
    instructions="Processing Q4 2024 vendor document batch",
    n_consensus=3,
    bust_cache=False,
)

for sub in split.output:
    print(f"{sub.name}: pages {sub.pages}")

{
  "id": "split_01G34H8J2K",
  "file": {
    "id": "file_6dd6eb00688ad8d1",
    "filename": "invoice_batch.pdf",
    "mime_type": "application/pdf"
  },
  "model": "retab-small",
  "subdocuments": [
    {
      "name": "invoice",
      "description": "Invoice documents with billing information",
      "allow_multiple_instances": true
    },
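    {
      "name": "receipt",
      "description": "Receipt documents for payments",
      "allow_multiple_instances": false
    },
    {
      "name": "contract",
      "description": "Legal contract documents",
      "allow_multiple_instances": false
    }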
  ],
  "n_consensus": 3,
  "instructions": "Processing Q4 2024 vendor document batch",
  "output": [
    { "name": "invoice", "pages": [1, 2, 3] },
    { "name": "invoice", "pages": [4, 5] },
    { "name": "receipt", "pages": [6] }
  ],
  "consensus": {
    "likelihoods": [
      { "name": 0.98, "pages": [0.99, 0.97, 0.96] },
      { "name": 0.95, "pages": [0.95, 0.93] },
      { "name": 0.99, "pages": [0.99] }
    ],
    "choices": [
      [
        { "name": "invoice", "pages": [1, 2, 3] },
        { "name": "invoice", "pages": [4, 5] },
        { "name": "receipt", "pages": [6] }
      ],
      [
        { "name": "invoice", "pages": [1, 2] },
        { "name": "invoice", "pages": [3, 4, 5] },
        { "name": "receipt", "pages": [6] }
      ],
      [
        { "name": "invoice", "pages": [1, 2, 3] },
        { "name": "invoice", "pages": [4, 5] },
        { "name": "receipt", "pages": [6] }
      ]
    ]
  },
  "usage": {
    "credits": 3.0
  },
  "created_at": "2024-03-15T10:30:00Z"
}

Split a multi-page document into labeled subdocuments and return the canonical split resource. This endpoint is split-only: key-based grouping belongs to /v1/partitions.

Authorizations

Api-Key
string
header
required
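
For callers not using the SDK, the following is a minimal sketch of assembling the raw request pieces with the Api-Key header. The helper name build_split_request and the default base_url are placeholders chosen for illustration; the real API host is not stated on this page.

```python
import json


def build_split_request(api_key, payload, base_url="https://api.example.invalid"):
    """Return (url, headers, body) for POST /v1/splits.

    base_url is a placeholder assumption, not a documented host.
    """
    if not api_key:
        raise ValueError("an API key is required")
    url = base_url.rstrip("/") + "/v1/splits"
    headers = {
        "Api-Key": api_key,  # header name taken from the Authorizations section
        "Content-Type": "application/json",  # body type taken from the Body section
    }
    body = json.dumps(payload).encode("utf-8")
    return url, headers, body
```

The returned triple can be handed to any HTTP client; nothing is sent by the helper itself.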

Body

application/json

Request body to create a split.

document
MIMEData · object
required

A file represented by its filename and a base64 data url.
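
A filename plus base64 data URL pair could be built roughly as follows. This is a sketch only: the key names "filename" and "url" are assumptions, since this page does not list the MIMEData fields, and to_mime_data is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def to_mime_data(filename, content):
    """Pack raw bytes into a filename + base64 data URL pair.

    The keys "filename" and "url" are assumed, not confirmed field names.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    mime_type = mime_type or "application/octet-stream"
    encoded = base64.b64encode(content).decode("ascii")
    return {
        "filename": Path(filename).name,
        "url": f"data:{mime_type};base64,{encoded}",
    }
```

The SDK example above passes a plain path string, so this manual step is likely only relevant when calling the endpoint directly.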

subdocuments
Subdocument · object[]
required

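The subdocument types to split the document into. Each entry has a name and a description, and may set allow_multiple_instances (false when omitted, as in the example response).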

model
string
default:retab-small

The model to use to split the document

instructions
string | null

Free-form instructions appended to the system prompt to steer the split.

n_consensus
integer
default:1

Number of split runs to perform for consensus. When set to 1, a single deterministic pass is used.
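
To give an intuition for what several runs buy you, here is a simple majority vote over candidate splits shaped like the consensus.choices array in the example response. It is an illustrative assumption, not a description of how Retab reconciles runs; majority_split is a hypothetical helper.

```python
from collections import Counter


def majority_split(choices):
    """Return (split, votes) for the candidate split proposed by the most runs."""

    def freeze(split):
        # Lists and dicts are unhashable, so convert each run to nested tuples.
        return tuple((part["name"], tuple(part["pages"])) for part in split)

    counts = Counter(freeze(split) for split in choices)
    winner, votes = counts.most_common(1)[0]
    return [{"name": name, "pages": list(pages)} for name, pages in winner], votes
```

Applied to the three runs in the example response, two of them agree on invoice [1, 2, 3], invoice [4, 5], receipt [6], which matches the reported output.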

bust_cache
boolean
default:false

If true, skip the LLM cache and force a fresh completion

background
boolean
default:false

If true, run asynchronously: returns immediately with status 'queued' and an empty output. Poll GET /v1//{id} until status is terminal. Mutually exclusive with stream.
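
A polling loop for background runs might look like the sketch below. The fetch argument is a hypothetical callable that performs the GET and returns the split as a dict; the terminal statuses are the ones listed for the status field (completed, failed, cancelled). The interval and timeout defaults are arbitrary.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_split(fetch, split_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call fetch(split_id) repeatedly until the split reaches a terminal status.

    fetch is a hypothetical callable returning a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        split = fetch(split_id)
        if split["status"] in TERMINAL_STATUSES:
            return split
        if time.monotonic() >= deadline:
            raise TimeoutError(f"split {split_id} is still {split['status']}")
        sleep(interval)
```

Injecting sleep keeps the helper easy to test and lets callers substitute a backoff strategy.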

Response

Successful Response

A split result: a document divided into its constituent subdocuments.

id
string
required

Unique identifier of the split result

file
FileRef · object
required

Information about the split file

model
string
required

Model used for the split operation

subdocuments
Subdocument · object[]
required

Subdocuments used for the split operation

n_consensus
integer
default:1

Number of consensus votes used

instructions
string | null

Free-form instructions supplied with the split request.

output
SplitResult · object[]

The list of document splits with their assigned pages. Empty [] until status == 'completed'.
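
Because a subdocument with allow_multiple_instances can appear more than once in output, it can be convenient to group the page ranges by name. A small sketch, operating on plain dicts shaped like the example response (group_by_name is a hypothetical helper):

```python
from collections import defaultdict


def group_by_name(output):
    """Collect the page lists of every instance under its subdocument name."""
    grouped = defaultdict(list)
    for part in output:
        grouped[part["name"]].append(list(part["pages"]))
    return dict(grouped)
```

For the example response this yields two page ranges under "invoice" and one under "receipt".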

status
enum<string>
default:pending

Lifecycle status. The synchronous path returns 'completed'. Background runs progress pending -> queued -> in_progress -> completed | failed | cancelled.

Available options: pending, queued, in_progress, completed, failed, cancelled

error
PrimitiveError · object

Error details when a background run fails; null otherwise. Always present so consumers can read it without an existence check.

consensus
SplitConsensus · object

Consensus metadata for multi-vote split runs
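
In the example response, each entry of consensus.likelihoods appears to line up by index with the matching entry of output, with one score per page. Assuming that alignment holds, pages worth a manual review could be flagged as sketched below; the 0.95 threshold and the helper name low_confidence_pages are arbitrary choices for illustration.

```python
def low_confidence_pages(output, likelihoods, threshold=0.95):
    """List (name, page, score) for pages whose likelihood is below threshold.

    Assumes likelihoods[i] describes output[i], as in the example response.
    """
    flagged = []
    for part, scores in zip(output, likelihoods):
        for page, score in zip(part["pages"], scores["pages"]):
            if score < threshold:
                flagged.append((part["name"], page, score))
    return flagged
```

With the example data, only page 5 of the second invoice (0.93) falls below 0.95.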

usage
RetabUsage · object

Usage information for the split operation

created_at
string<date-time> | null