Partition - Retab Docs

Introduction

partitions.create groups the pages of a document into chunks, one per detected value of a key such as invoice_number, policy_id, or claim_number. It is a separate primitive from split.

split answers: “What subdocument type is on these pages?”
partition answers: “Which pages belong to each key value?”

Use partition when one document contains many records of the same conceptual type and you want one result per detected key. Common use cases include:

Invoice batches: Group one PDF into one chunk per invoice number.
Claim packets: Group pages by claim ID inside a large insurance packet.
Policy exports: Break a carrier export into one chunk per policy number.
Repeated forms: Segment a homogeneous packet into one chunk per repeated identifier.

Key features of the Partition API:

Key-based grouping: Separate repeated records by business identifier.
Canonical resource: Returns a stored Partition with id, file, model, output, consensus, and usage.
Page-level mapping: Each chunk includes the 1-indexed pages assigned to it.
Consensus support: Increase n_consensus to inspect consensus.likelihoods and consensus.choices.

Partition API

PartitionRequest

document

MIMEData

required

The document to partition. The HTTP API accepts MIMEData. The SDKs also accept convenient local inputs such as file paths, file-like objects, images, buffers, and URLs, then convert them for you.

key

string

required

The field or concept used to separate the document into chunks, such as invoice_number, policy_id, or claim_number.

instructions

string

required

Natural-language guidance describing how the document should be partitioned.

model

string

default:"retab-small"

The model used for partitioning.

n_consensus

integer

default:"1"

Number of partitioning runs to use for consensus voting.

allow_overlap

boolean

default:"true"

When true, partition chunks may share pages. Set to false for exclusive chunks.

bust_cache

boolean

default:"false"

When true, bypass the cache and force a fresh partition run.
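The required fields and defaults listed above can be mirrored locally before a call is made. The following is a minimal sketch, not part of the Retab SDK: the PartitionOptions class is hypothetical, and the check that n_consensus is at least 1 is an assumption rather than a documented rule.

```python
from dataclasses import asdict, dataclass


@dataclass
class PartitionOptions:
    """Hypothetical local mirror of the non-document PartitionRequest fields."""

    key: str                     # required
    instructions: str            # required
    model: str = "retab-small"   # documented default
    n_consensus: int = 1         # documented default
    allow_overlap: bool = True   # documented default
    bust_cache: bool = False     # documented default

    def __post_init__(self) -> None:
        if not self.key or not self.instructions:
            raise ValueError("key and instructions are required")
        # Assumption: fewer than one partitioning run is not meaningful.
        if self.n_consensus < 1:
            raise ValueError("n_consensus must be at least 1")


# Ask for exclusive chunks; every other option keeps its documented default.
options = PartitionOptions(
    key="claim_id",
    instructions="Return one chunk per claim ID.",
    allow_overlap=False,
)
kwargs = asdict(options)
print(kwargs)
```

The resulting dictionary could then be passed alongside the document, for example client.partitions.create(document="claims_packet.pdf", **kwargs), where the file name is a placeholder.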

Returns

Partition

The stored partition record with its grouped chunks and metadata.

id

string

Unique identifier of the partition run (prefix prtn_).

file

FileRef

Reference to the source document (id, filename, mime_type).

model

string

Model used for partitioning.

key

string

Partition key used for the run.

instructions

string

Instructions supplied with the partition request.

n_consensus

integer

Number of consensus votes used.

allow_overlap

boolean

Whether partition chunks were allowed to share pages.

output

array[PartitionChunk]

One chunk per detected key value, each containing:
- key: The detected partition key value.
- pages: The 1-indexed pages assigned to that chunk.

consensus

PartitionConsensus

Present in all responses and populated when n_consensus > 1:
- likelihoods: A tree aligned with output, with a confidence value for key and for each page leaf.
- choices: One entry per consensus run.

usage

RetabUsage | null

Usage information for the partition operation.

created_at

datetime | null

Timestamp when the partition was created.
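Because each chunk carries a key and a list of 1-indexed pages, the output can be audited for pages shared between chunks (possible when allow_overlap is true) and for pages assigned to no chunk. The sketch below is illustrative only: Chunk is a hypothetical stand-in for PartitionChunk that models just the two documented fields, and page_count is assumed to be known by the caller.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a PartitionChunk: only key and pages."""

    key: str
    pages: list[int]


def pages_to_keys(chunks):
    """Map each 1-indexed page to the keys of the chunks that include it."""
    mapping = defaultdict(list)
    for chunk in chunks:
        for page in chunk.pages:
            mapping[page].append(chunk.key)
    return dict(mapping)


def audit_pages(chunks, page_count):
    """Return (pages shared by several chunks, pages assigned to no chunk)."""
    mapping = pages_to_keys(chunks)
    shared = {page: keys for page, keys in mapping.items() if len(keys) > 1}
    unassigned = [p for p in range(1, page_count + 1) if p not in mapping]
    return shared, unassigned


# Made-up sample data for a 5-page document.
chunks = [
    Chunk("INV-001", [1, 2]),
    Chunk("INV-002", [2, 3]),
    Chunk("INV-003", [5]),
]
shared, unassigned = audit_pages(chunks, page_count=5)
print(shared)      # {2: ['INV-001', 'INV-002']}
print(unassigned)  # [4]
```

With a real response, the same functions could be applied to response.output, provided its items expose key and pages as described above.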

Use Case: Partitioning an Invoice Batch

Use partitioning when every record is the same general document type, but each record has its own identifier and should become its own chunk.

from retab import Retab

client = Retab()

response = client.partitions.create(
    document="invoice_batch.pdf",
    key="invoice_number",
    instructions="Return one chunk per invoice number and keep all pages for the same invoice together.",
    model="retab-small",
    n_consensus=3,
)

for chunk in response.output:
    print(chunk.key, chunk.pages)

print(response.consensus.likelihoods)

When to Use Partition vs Split

Use partitions.create when every record is conceptually the same kind of document and you want one chunk per repeated key value.
Use splits.create when you need to classify a document into different subdocument types such as invoice, receipt, and contract.
Use partitions.create after splits.create when you first need to isolate a subdocument type and then group that subset by a key.

Best Practices

Make the key semantically precise: invoice_number, claim_id, policy_number.
Write instructions as grouping guidance, not as extraction schema guidance.
Use partition only when the packet is homogeneous or when you have already isolated the relevant subdocument pages.
Raise n_consensus when key assignment quality is important enough to inspect disagreements.

Pricing

Partition is billed at the same rate as split:

credits_per_page = n_consensus × model_multiplier
total_credits    = credits_per_page × page_count

Model multipliers: retab-micro = 0.2, retab-small = 1.0, retab-large = 3.0. Examples (1 page):

Model	n_consensus = 1	n_consensus = 3
`retab-micro`	0.2	0.6
`retab-small`	1.0	3.0
`retab-large`	3.0	9.0
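The formula and multipliers above can be turned into a small estimator for budgeting a run before it is submitted. This is a sketch under the pricing stated in this section: the function name estimate_partition_credits is hypothetical, and Decimal is used only to avoid floating-point rounding noise such as 3 × 0.2.

```python
from decimal import Decimal

# Multipliers copied from the pricing section above.
MODEL_MULTIPLIERS = {
    "retab-micro": Decimal("0.2"),
    "retab-small": Decimal("1.0"),
    "retab-large": Decimal("3.0"),
}


def estimate_partition_credits(model: str, page_count: int, n_consensus: int = 1) -> Decimal:
    """Apply total_credits = n_consensus x model_multiplier x page_count."""
    if model not in MODEL_MULTIPLIERS:
        raise ValueError(f"unknown model: {model}")
    if page_count < 0 or n_consensus < 1:
        raise ValueError("page_count must be >= 0 and n_consensus >= 1")
    credits_per_page = n_consensus * MODEL_MULTIPLIERS[model]
    return credits_per_page * page_count


# A 40-page invoice batch on retab-small with three consensus runs.
print(estimate_partition_credits("retab-small", page_count=40, n_consensus=3))  # 120.0
```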

No separate preprocessing charge applies to partition operations.