
Introduction

partitions.create groups the pages of a document into chunks, one per value of a repeated key such as invoice_number, policy_id, or claim_number. This is a separate primitive from split.
  • split answers: “What subdocument type is on these pages?”
  • partition answers: “Which pages belong to each key value?”
Use partition when one document contains many records of the same conceptual type and you want one result per detected key. Common use cases include:
  1. Invoice batches: Group one PDF into one chunk per invoice number.
  2. Claim packets: Group pages by claim ID inside a large insurance packet.
  3. Policy exports: Break a carrier export into one chunk per policy number.
  4. Repeated forms: Segment a homogeneous packet into one chunk per repeated identifier.
Key features of the Partition API:
  • Key-based grouping: Separate repeated records by business identifier.
  • Canonical resource: Returns a stored Partition with id, file, model, output, consensus, and usage.
  • Page-level mapping: Each chunk includes the 1-indexed pages assigned to it.
  • Consensus support: Increase n_consensus to inspect consensus.likelihoods and consensus.choices.

Partition API

PartitionRequest
Returns
Partition
The stored partition record with its grouped chunks and metadata.

Use Case: Partitioning an Invoice Batch

Use partitioning when every record is the same general document type, but each record has its own identifier and should become its own chunk.
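from retab import Retab

client = Retab()

# Group the batch by invoice number; n_consensus=3 enables consensus metadata.
response = client.partitions.create(
    document="invoice_batch.pdf",
    key="invoice_number",
    instructions="Return one chunk per invoice number and keep all pages for the same invoice together.",
    model="retab-small",
    n_consensus=3,
)

# Each chunk carries the detected key value and its 1-indexed pages.
for chunk in response.output:
    print(chunk.key, chunk.pages)

# Inspect how strongly the consensus runs agreed.
print(response.consensus.likelihoods)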
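
Once the chunks are returned, it is often useful to check that every page landed in exactly one chunk. The following is a minimal sketch, not part of the Retab SDK: summarize_chunks is a hypothetical helper, and it works on plain (key, pages) pairs, which you could build from response.output with [(c.key, c.pages) for c in response.output]. The sample keys and page numbers are made up for illustration.

```python
from collections import defaultdict


def summarize_chunks(chunks, page_count):
    """Merge (key, pages) pairs and flag unassigned or shared pages.

    Hypothetical helper: `chunks` is an iterable of (key, pages) pairs
    where pages are 1-indexed, as the Partition API describes them.
    """
    pages_by_key = defaultdict(set)
    owners = defaultdict(set)
    for key, pages in chunks:
        for page in pages:
            pages_by_key[key].add(page)
            owners[page].add(key)

    # Pages of the document that no chunk claimed.
    unassigned = [p for p in range(1, page_count + 1) if p not in owners]
    # Pages claimed by more than one key.
    shared = {p: sorted(keys) for p, keys in owners.items() if len(keys) > 1}
    merged = {key: sorted(pages) for key, pages in pages_by_key.items()}
    return merged, unassigned, shared


# Illustrative data only; real values come from response.output.
sample = [
    ("INV-001", [1, 2]),
    ("INV-002", [3]),
    ("INV-001", [4]),
    ("INV-003", [4]),
]
merged, unassigned, shared = summarize_chunks(sample, page_count=5)
print(merged)
print(unassigned)
print(shared)
```

Unassigned or shared pages are a reasonable signal to tighten the instructions or to raise n_consensus before relying on the grouping.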

When to Use Partition vs Split

  • Use partitions.create when every record is conceptually the same kind of document and you want one chunk per repeated key value.
  • Use splits.create when you need to classify a document into different subdocument types such as invoice, receipt, and contract.
  • Use partitions.create after splits.create when you first need to isolate a subdocument type and then group that subset by a key.

Best Practices

  • Make the key semantically precise: invoice_number, claim_id, policy_number.
  • Write instructions as grouping guidance, not as extraction schema guidance.
  • Use partition only when the packet is homogeneous or when you have already isolated the relevant subdocument pages.
  • Raise n_consensus when key assignment accuracy matters enough that you want to inspect disagreements.

Pricing

Partition is billed at the same rate as split:
credits_per_page = n_consensus × model_multiplier
total_credits    = credits_per_page × page_count
Model multipliers: retab-micro = 0.2, retab-small = 1.0, retab-large = 3.0. Examples (1 page):
Model          n_consensus = 1    n_consensus = 3
retab-micro    0.2                0.6
retab-small    1.0                3.0
retab-large    3.0                9.0
No separate preprocessing charge applies to partition operations.
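
The pricing formula above can be sketched as a small estimator. This is illustrative only: partition_credits and MODEL_MULTIPLIERS are hypothetical names, not part of the Retab SDK, and the multipliers are copied from this page, so they may go stale if pricing changes.

```python
# Multipliers as listed on this page (assumed current).
MODEL_MULTIPLIERS = {
    "retab-micro": 0.2,
    "retab-small": 1.0,
    "retab-large": 3.0,
}


def partition_credits(model, n_consensus, page_count):
    """Estimate credits: n_consensus x model_multiplier x page_count."""
    if model not in MODEL_MULTIPLIERS:
        raise ValueError(f"unknown model: {model}")
    if n_consensus < 1 or page_count < 0:
        raise ValueError("n_consensus must be >= 1 and page_count >= 0")
    credits_per_page = n_consensus * MODEL_MULTIPLIERS[model]
    return credits_per_page * page_count


# A 12-page batch on retab-small with n_consensus=3.
estimate = partition_credits("retab-small", n_consensus=3, page_count=12)
print(estimate)  # 36.0
```

Because the cost scales linearly with n_consensus, running an estimate like this before raising it on large batches makes the trade-off explicit.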