POST /v1/splits
Split a multi-page document into labeled subdocuments, optionally partitioned by a key, and persist the result as a Split resource that can later be retrieved via GET /v1/splits/{split_id} or listed via GET /v1/splits.
from retab import Retab

client = Retab()

split = client.splits.create(
    document="invoice_batch.pdf",
    model="retab-small",
    subdocuments=[
        {
            "name": "invoice",
            "description": "Invoice documents with billing information",
            "partition_key": "invoice number",
            "allow_multiple_instances": True,
        },
        {"name": "receipt", "description": "Receipt documents for payments"},
        {"name": "contract", "description": "Legal contract documents"},
    ],
    context="Processing Q4 2024 vendor document batch",
    n_consensus=1,
)

for sub in split.output:
    print(f"{sub.name}: pages {sub.pages}")
    for partition in sub.partitions:
        print(f"  {partition.key}: pages {partition.pages}")
{
  "id": "split_01G34H8J2K",
  "organization_id": "org_abc123",
  "file": {
    "id": "file_6dd6eb00688ad8d1",
    "filename": "invoice_batch.pdf",
    "mime_type": "application/pdf"
  },
  "model": "retab-small",
  "subdocuments": [
    {"name": "invoice", "description": "Invoice documents", "partition_key": "invoice number", "allow_multiple_instances": true},
    {"name": "receipt", "description": "Receipts", "partition_key": null, "allow_multiple_instances": false}
  ],
  "n_consensus": 1,
  "output": [
    {
      "name": "invoice",
      "pages": [1, 2, 3],
      "partitions": [
        {"key": "INV-001", "pages": [1, 2]},
        {"key": "INV-002", "pages": [3]}
      ]
    },
    {"name": "receipt", "pages": [4, 5], "partitions": []}
  ],
  "consensus": null,
  "created_at": "2024-03-15T10:30:00Z",
  "updated_at": "2024-03-15T10:30:00Z"
}

Request Body

document
MIMEData
required
The document to split.
subdocuments
array[Subdocument]
required
The labeled subdocuments to split the document into. Each entry has a name, an optional description, an optional partition_key, and an allow_multiple_instances flag. When the flag is true, the split runs an extra vision-based refinement pass for repeated instances.
model
string
default:"retab-small"
The model used for the split operation.
context
string | null
Additional context for the split (e.g. iteration context from a workflow loop).
n_consensus
integer
default:1
Number of consensus split runs to perform, from 1 to 8. A value of 1 uses a deterministic single pass.
bust_cache
boolean
default:false
When true, bypass the cache and re-run the split.
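For callers that do not use the SDK, the request body described above could be assembled by hand. The following is a minimal sketch under stated assumptions: it assumes MIMEData is an object with a `filename` and a base64 data-URL `url` field, which this page does not spell out, and `to_mime_data` and `build_split_payload` are hypothetical helpers rather than part of the Retab SDK.

```python
import base64
import mimetypes


def to_mime_data(filename: str, content: bytes) -> dict:
    """Encode raw bytes as a MIMEData-like dict.

    Assumed shape (not confirmed by this page): {"filename", "url"} where
    "url" is a base64 data URL.
    """
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(content).decode("ascii")
    return {"filename": filename, "url": f"data:{mime_type};base64,{encoded}"}


def build_split_payload(
    filename: str,
    content: bytes,
    subdocuments: list,
    *,
    model: str = "retab-small",
    context=None,
    n_consensus: int = 1,
    bust_cache: bool = False,
) -> dict:
    """Assemble the JSON body using the field names and defaults listed above."""
    payload = {
        "document": to_mime_data(filename, content),
        "subdocuments": subdocuments,
        "model": model,
        "n_consensus": n_consensus,
        "bust_cache": bust_cache,
    }
    # context is optional, so it is only sent when the caller provides it.
    if context is not None:
        payload["context"] = context
    return payload


payload = build_split_payload(
    "invoice_batch.pdf",
    b"%PDF-1.4 placeholder bytes",
    [{"name": "invoice", "partition_key": "invoice number"}],
    context="Processing Q4 2024 vendor document batch",
)
```

Keeping the defaults in one helper makes it harder for a hand-built request to drift from the documented defaults for model, n_consensus, and bust_cache.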

Response Fields

id
string
Unique split identifier.
file
object
File metadata: id, filename, mime_type.
output
array[SplitResult]
The list of subdocuments with their assigned pages.
consensus
object | null
Consensus metadata (likelihoods, choices).
usage
RetabUsage | null
Token and credit usage information.
created_at
string
ISO 8601 creation timestamp.
updated_at
string
ISO 8601 last update timestamp.
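To illustrate how the output field can be consumed, the sketch below flattens it into a per-page lookup. It works on the parsed JSON response (plain dicts) rather than SDK objects, and `label_pages` is a hypothetical helper name. The sample data is the output array from the example response on this page.

```python
from typing import Optional


def label_pages(output: list) -> dict:
    """Map each page number to (subdocument name, partition key or None)."""
    labels: dict = {}
    for sub in output:
        # Every page assigned to the subdocument starts without a partition key.
        for page in sub["pages"]:
            labels[page] = (sub["name"], None)
        # Pages that fall inside a partition get that partition's key.
        for partition in sub.get("partitions", []):
            key: Optional[str] = partition["key"]
            for page in partition["pages"]:
                labels[page] = (sub["name"], key)
    return labels


sample_output = [
    {
        "name": "invoice",
        "pages": [1, 2, 3],
        "partitions": [
            {"key": "INV-001", "pages": [1, 2]},
            {"key": "INV-002", "pages": [3]},
        ],
    },
    {"name": "receipt", "pages": [4, 5], "partitions": []},
]

page_labels = label_pages(sample_output)
for page in sorted(page_labels):
    print(page, page_labels[page])
```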

Authorizations

Api-Key
string
header
required
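As a rough sketch of how the Api-Key header fits into a raw HTTP call, the example below builds (but does not send) a POST request with the standard library. The base URL and the `RETAB_API_KEY` environment variable name are assumptions for illustration, and `build_split_request` is a hypothetical helper.

```python
import json
import os
import urllib.request

# Assumption: the API is served from this host; this page only gives the path.
BASE_URL = "https://api.retab.com"


def build_split_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST /v1/splits request that authenticates via the Api-Key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/splits",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Assumption: the key is stored in an environment variable of this name.
api_key = os.environ.get("RETAB_API_KEY", "placeholder-key")
request = build_split_request({"subdocuments": [{"name": "invoice"}]}, api_key)

# Sending is left commented out so the sketch has no network side effects:
# with urllib.request.urlopen(request) as response:
#     split = json.load(response)
```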

Query Parameters

access_token
string | null

Body

application/json
document
MIMEData · object
required

The document to split

subdocuments
Subdocument · object[]
required

The subdocuments to split the document into

Minimum array length: 1
model
string
default:retab-small

The model to use to split the document

context
string | null

Additional context for the split operation (e.g., iteration context from a loop)

n_consensus
integer
default:1

Number of consensus split runs to perform. Uses deterministic single-pass when set to 1.

Required range: 1 <= x <= 8
bust_cache
boolean
default:false

If true, skip the LLM cache and force a fresh completion
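The constraints listed in this schema (required fields, a minimum of one subdocument, n_consensus between 1 and 8) can be checked on the client before sending a request. The following is a minimal sketch of such a pre-check. `validate_split_request` is a hypothetical helper, and the requirement that each subdocument carries a name is an assumption drawn from the field description rather than from this schema. The server remains the authority on what is accepted.

```python
def validate_split_request(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []

    if "document" not in payload:
        errors.append("document is required")

    subdocuments = payload.get("subdocuments")
    if not isinstance(subdocuments, list) or len(subdocuments) < 1:
        errors.append("subdocuments must be a list with at least one entry")
    else:
        for index, sub in enumerate(subdocuments):
            # Assumption: every subdocument needs a name.
            if not isinstance(sub, dict) or not sub.get("name"):
                errors.append(f"subdocuments[{index}] needs a name")

    n_consensus = payload.get("n_consensus", 1)
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if (
        isinstance(n_consensus, bool)
        or not isinstance(n_consensus, int)
        or not 1 <= n_consensus <= 8
    ):
        errors.append("n_consensus must be an integer between 1 and 8")

    if not isinstance(payload.get("bust_cache", False), bool):
        errors.append("bust_cache must be a boolean")

    return errors


problems = validate_split_request(
    {"document": {}, "subdocuments": [], "n_consensus": 9}
)
print(problems)
```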

Response

Successful Response

The persisted Split resource, scoped to the caller's organization.

file
FileRef · object
required

Information about the split file

model
string
required

Model used for the split operation

subdocuments
Subdocument · object[]
required

Subdocuments used for the split operation

output
SplitResult · object[]
required

The list of document splits with their assigned pages

organization_id
string
required

Organization ID of the user or application

id
string

Unique identifier of the split

n_consensus
integer
default:1

Number of consensus votes used

context
string | null

Additional context supplied with the split request

consensus
SplitConsensus · object

Consensus metadata for multi-vote split runs

usage
RetabUsage · object

Usage information for the split operation

created_at
string<date-time>
updated_at
string<date-time>