splits.create assigns pages in a multi-page document to named subdocuments. Each result contains only:
name
pages
That is the full mental model for split. It is a document-labeling primitive, not a key-based grouping primitive.
Use split when one file contains different document types, sections, or repeated subdocument instances and you want to know which pages belong to each one.
Common use cases include:
Mixed document batches: Separate invoices, receipts, contracts, and cover letters from one uploaded PDF.
Report section detection: Find the executive summary, appendix, or financial section inside a long report.
Repeated instances: Detect repeated occurrences of the same subdocument type with allow_multiple_instances=True.
Workflow routing: Route each detected subdocument type into its own downstream extraction or review branch.
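
The routing use case above can be sketched in a few lines. This is a minimal illustration that works on plain dictionaries with the `name` and `pages` fields described earlier. The sample results, the handler functions, and the `route` helper are all hypothetical and are not part of the SDK.

```python
from typing import Callable, Dict, List

# Hypothetical split output. The only fields assumed are `name` and `pages`.
split_results = [
    {"name": "invoice", "pages": [1, 2]},
    {"name": "receipt", "pages": [3]},
    {"name": "contract", "pages": [4, 5, 6]},
]


def handle_invoice(pages: List[int]) -> str:
    return f"extract invoice fields from pages {pages}"


def handle_receipt(pages: List[int]) -> str:
    return f"extract receipt totals from pages {pages}"


def handle_unknown(pages: List[int]) -> str:
    # Fallback branch for names that have no dedicated handler.
    return f"send pages {pages} to manual review"


# Map each subdocument name to its downstream branch (illustrative only).
HANDLERS: Dict[str, Callable[[List[int]], str]] = {
    "invoice": handle_invoice,
    "receipt": handle_receipt,
}


def route(results: List[dict]) -> List[str]:
    """Dispatch every detected subdocument to the handler registered for its name."""
    return [HANDLERS.get(r["name"], handle_unknown)(r["pages"]) for r in results]


routed = route(split_results)
for line in routed:
    print(line)
```

Keeping a fallback handler means a newly added subdocument name does not silently drop pages before its branch exists.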
Key features of the Split API:
Named subdocuments: Define the labels you care about with natural-language descriptions.
Page-level output: Results are explicit 1-indexed page arrays.
Repeated instances: The same name can appear multiple times in output.
Consensus support: Set n_consensus above 1 to get consensus.likelihoods and consensus.choices.
Pure split primitive: Key-based grouping is handled by partitions.create, not by split.
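
Because the same name can appear more than once and pages are 1-indexed, downstream code often needs to group instances by name and shift to zero-based indices before handing pages to a PDF library. The sketch below assumes the output is a list of dictionaries with `name` and `pages`. The sample data and the `instances_by_name` helper are illustrative, not part of the API.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical split output with a repeated name ("invoice" appears twice).
split_results = [
    {"name": "invoice", "pages": [1, 2]},
    {"name": "cover_letter", "pages": [3]},
    {"name": "invoice", "pages": [4, 5, 6]},
]


def instances_by_name(results: List[dict]) -> Dict[str, List[List[int]]]:
    """Group results by name, keeping each instance separate.

    Pages are converted from the API's 1-indexed numbering to
    0-indexed positions, which most PDF libraries expect.
    """
    grouped: Dict[str, List[List[int]]] = defaultdict(list)
    for result in results:
        grouped[result["name"]].append([page - 1 for page in result["pages"]])
    return dict(grouped)


grouped = instances_by_name(split_results)
print(grouped)
```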
The document to split. The HTTP API accepts MIMEData. The SDKs also accept
convenient local inputs such as file paths, file-like objects, images,
buffers, and URLs, then convert them for you.
List of subdocuments to classify the document into. Each subdocument has:
- name: Unique identifier for the subdocument
- description: Detailed description to help the model identify this subdocument
- partition_key (optional): Key used to partition repeated instances inside this subdocument
- allow_overlap (optional, default true): Set to false when partition chunks for this subdocument must be exclusive
- allow_multiple_instances (optional): Set to true when this subdocument type can appear more than once in the document and you want each distinct instance detected separately
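
The fields listed above can be modeled as a small data class that also enforces the unique-name rule before a request is sent. This is a sketch under assumptions: the `Subdocument` class and `build_subdocuments` helper are hypothetical, and the SDK may ship its own types for this. Only the field names and the `allow_overlap` default come from the description above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Subdocument:
    """Illustrative container mirroring the documented subdocument fields."""
    name: str
    description: str
    partition_key: Optional[str] = None
    allow_overlap: bool = True  # documented default
    allow_multiple_instances: Optional[bool] = None

    def to_payload(self) -> Dict[str, Any]:
        # Omit optional fields that were left unset.
        return {k: v for k, v in asdict(self).items() if v is not None}


def build_subdocuments(subdocs: List[Subdocument]) -> List[Dict[str, Any]]:
    """Return plain dictionaries, rejecting duplicate names up front."""
    names = [s.name for s in subdocs]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"subdocument names must be unique, got duplicates: {duplicates}")
    return [s.to_payload() for s in subdocs]


payload = build_subdocuments([
    Subdocument(
        name="invoice",
        description="Vendor invoice with line items, totals, and an invoice number",
        allow_multiple_instances=True,
    ),
    Subdocument(
        name="cover_letter",
        description="Introductory letter addressed to the recipient",
    ),
])
print(payload)
```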
Number of split passes to run before building the final answer. Leave it at
1 for the fastest deterministic pass, or raise it when boundary quality is
business-critical and you want consensus.likelihoods and
consensus.choices.
Present when n_consensus > 1. It contains:
- likelihoods: A tree aligned with output, with a confidence value for name and for each page leaf
- choices: One entry per consensus run
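
One way to use consensus.likelihoods is to flag weak labels and weak page assignments for review. The exact response shape is not spelled out here, so the sketch below assumes that `likelihoods` mirrors `output` one to one: each entry holds a float under `name` and a list of floats under `pages`, aligned with the page list. The sample values, the 0.8 threshold, and both helper functions are hypothetical.

```python
from typing import List, Tuple

# Hypothetical output and likelihoods. The aligned shape is an assumption.
output = [
    {"name": "invoice", "pages": [1, 2, 3]},
    {"name": "receipt", "pages": [4]},
]
likelihoods = [
    {"name": 0.95, "pages": [0.99, 0.97, 0.60]},
    {"name": 0.70, "pages": [0.90]},
]


def low_confidence_pages(
    output: List[dict], likelihoods: List[dict], threshold: float = 0.8
) -> List[Tuple[str, int, float]]:
    """Return (name, page, score) for every page leaf scoring below the threshold."""
    flagged = []
    for result, scores in zip(output, likelihoods):
        for page, score in zip(result["pages"], scores["pages"]):
            if score < threshold:
                flagged.append((result["name"], page, score))
    return flagged


def low_confidence_names(
    output: List[dict], likelihoods: List[dict], threshold: float = 0.8
) -> List[str]:
    """Return the names whose label confidence falls below the threshold."""
    return [
        result["name"]
        for result, scores in zip(output, likelihoods)
        if scores["name"] < threshold
    ]


weak_pages = low_confidence_pages(output, likelihoods)
weak_names = low_confidence_names(output, likelihoods)
print(weak_pages, weak_names)
```

If the real structure differs, only the two lookups (`scores["pages"]` and `scores["name"]`) should need to change.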