Introduction
partitions.create groups a document into repeated chunks using a key such as invoice_number, policy_id, or claim_number.
This is a separate primitive from split:
- split answers: “What subdocument type is on these pages?”
- partition answers: “Which pages belong to each key value?”
Use partition when one document contains many records of the same conceptual type and you want one result per detected key value.
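To make the contrast between the two questions concrete, here is a toy, local sketch. It is not the Retab API, and it assumes a label has already been determined for every page. group_pages is a hypothetical helper. The same grouping step answers the split question when the labels are subdocument types, and the partition question when the labels are key values such as invoice numbers. In the real service, detecting those labels is the model's job.

```python
from collections import defaultdict


def group_pages(page_labels: list[str]) -> dict[str, list[int]]:
    """Group 1-indexed page numbers by the label attached to each page.

    page_labels[i] is the label for page i + 1. The label can be a
    subdocument type (the split question) or a key value such as an
    invoice number (the partition question). Hypothetical helper.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for page_number, label in enumerate(page_labels, start=1):
        groups[label].append(page_number)
    # Plain dict, ordered by the first page on which each label appears.
    return dict(groups)


# Split-style labels: which subdocument type is on each page?
print(group_pages(["invoice", "invoice", "receipt", "contract", "contract"]))
# {'invoice': [1, 2], 'receipt': [3], 'contract': [4, 5]}

# Partition-style labels: which invoice number does each page belong to?
print(group_pages(["INV-001", "INV-001", "INV-002", "INV-003", "INV-003"]))
# {'INV-001': [1, 2], 'INV-002': [3], 'INV-003': [4, 5]}
```

Grouping by label rather than by contiguous runs means a record whose pages are scattered through the packet still ends up in a single group.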
Common use cases include:
- Invoice batches: Group one PDF into one chunk per invoice number.
- Claim packets: Group pages by claim ID inside a large insurance packet.
- Policy exports: Break a carrier export into one chunk per policy number.
- Repeated forms: Segment a homogeneous packet into one chunk per repeated identifier.
- Key-based grouping: Separate repeated records by business identifier.
Key features include:
- Canonical resource: Returns a stored Partition with id, file, model, output, consensus, and usage.
- Page-level mapping: Each chunk includes the 1-indexed pages assigned to it.
- Consensus support: Increase n_consensus to inspect consensus.likelihoods and consensus.choices.
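Because each chunk lists its 1-indexed pages, a quick local check can surface pages that no chunk claims, or that more than one chunk claims, before you process the chunks downstream. The sketch below assumes a hypothetical chunk shape, {"key": ..., "pages": [...]}. This section does not specify the actual field names inside output, and it does not say whether real results can contain gaps or overlaps. The check only reports them.

```python
def check_page_coverage(chunks: list[dict], page_count: int) -> tuple[list[int], list[int]]:
    """Return (unassigned_pages, pages_in_several_chunks), both 1-indexed.

    Each chunk is assumed to look like {"key": "INV-001", "pages": [1, 2]}.
    These field names are hypothetical, not the documented Partition schema.
    """
    counts = {page: 0 for page in range(1, page_count + 1)}
    for chunk in chunks:
        for page in chunk["pages"]:
            if page not in counts:
                raise ValueError(f"page {page} is outside 1..{page_count}")
            counts[page] += 1
    unassigned = [page for page, n in counts.items() if n == 0]
    overlapping = [page for page, n in counts.items() if n > 1]
    return unassigned, overlapping


chunks = [
    {"key": "INV-001", "pages": [1, 2]},
    {"key": "INV-002", "pages": [2, 3]},
]
print(check_page_coverage(chunks, page_count=4))  # ([4], [2])
```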
Partition API
Returns the stored partition record with its grouped chunks and metadata.
Use Case: Partitioning an Invoice Batch
Use partitioning when every record is the same general document type, but each record has its own identifier and should become its own chunk.
When to Use Partition vs Split
- Use partitions.create when every record is conceptually the same kind of document and you want one chunk per repeated key value.
- Use splits.create when you need to classify a document into different subdocument types such as invoice, receipt, and contract.
- Use partitions.create after splits.create when you first need to isolate a subdocument type and then group that subset by a key.
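The third case, split first and then partition, can be simulated locally to show the data flow. This is a hypothetical sketch, not the Retab SDK. It assumes each page already carries a (subdocument type, key value) pair, and split_then_partition is an illustrative name. Keeping original-document page numbers in the result is a choice of this sketch, not documented behavior.

```python
from collections import defaultdict
from typing import Optional


def split_then_partition(
    pages: list[tuple[str, Optional[str]]], target_type: str
) -> dict[str, list[int]]:
    """Keep only pages of target_type, then group those pages by key value.

    pages[i] is a (subdocument_type, key_value) pair for page i + 1.
    Returned page numbers refer to the original document.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for page_number, (doc_type, key_value) in enumerate(pages, start=1):
        if doc_type != target_type:
            continue  # stage 1 (split): discard other subdocument types
        if key_value is None:
            continue  # no key detected: this sketch leaves the page unassigned
        groups[key_value].append(page_number)  # stage 2 (partition): group by key
    return dict(groups)


packet = [
    ("invoice", "INV-001"),
    ("invoice", "INV-001"),
    ("contract", None),
    ("invoice", "INV-002"),
    ("receipt", None),
]
print(split_then_partition(packet, "invoice"))  # {'INV-001': [1, 2], 'INV-002': [4]}
```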
Best Practices
- Make the key semantically precise: invoice_number, claim_id, policy_number.
- Write instructions as grouping guidance, not as extraction schema guidance.
- Use partition only when the packet is homogeneous or when you have already isolated the relevant subdocument pages.
- Raise n_consensus when key assignments matter enough that you want to inspect disagreements.
Pricing
Partition is billed at the same rate as split: retab-micro = 0.2, retab-small = 1.0, retab-large = 3.0.
Examples (1 page):
| Model | n_consensus = 1 | n_consensus = 3 |
|---|---|---|
| retab-micro | 0.2 | 0.6 |
| retab-small | 1.0 | 3.0 |
| retab-large | 3.0 | 9.0 |
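Each cell in the table is the model rate multiplied by n_consensus. The sketch below extends that pattern to multi-page documents. It assumes that cost also scales linearly with page count, which the "1 page" label suggests but does not state outright. estimate_partition_cost is a hypothetical helper. It uses Decimal so that results like 0.2 × 3 come out as exactly 0.6.

```python
from decimal import Decimal

# Rates from the pricing section above (same units as listed there).
# Treating them as per-page rates is an ASSUMPTION based on the 1-page examples.
RATES = {
    "retab-micro": Decimal("0.2"),
    "retab-small": Decimal("1.0"),
    "retab-large": Decimal("3.0"),
}


def estimate_partition_cost(model: str, pages: int, n_consensus: int = 1) -> Decimal:
    """Estimate cost, ASSUMING it is linear in both pages and n_consensus."""
    if pages < 1 or n_consensus < 1:
        raise ValueError("pages and n_consensus must be at least 1")
    return RATES[model] * pages * n_consensus


# Reproduces the 1-page table rows, e.g. "retab-micro 0.2 0.6".
for model in RATES:
    print(model, estimate_partition_cost(model, 1, 1), estimate_partition_cost(model, 1, 3))

# A hypothetical 40-page invoice batch on retab-micro with n_consensus = 3.
print(estimate_partition_cost("retab-micro", pages=40, n_consensus=3))  # 24.0
```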