Introduction
Retab classifications use the resource-based `POST /v1/classifications` API. A classification request analyzes a document, chooses exactly one category, and persists the result as a Classification object that you can later retrieve or list.
Common use cases include:
- Document Routing: Route incoming files to the right downstream extraction pipeline
- Pre-filtering: Choose the right schema before extraction
- Mailroom Automation: Sort attachments by document type
- Quality Control: Verify documents match the expected category in a workflow
Key features include:
- Single Decision: Returns exactly one category for the document
- Reasoned Output: Includes an explanation for the selected category
- Custom Categories: Define categories specific to your workflow
- Consensus Support: Run multiple votes with `n_consensus` when you want more stability
- Stored Resource: The result is persisted and can be fetched later
Classification API
A request returns a stored Classification resource containing the decision, consensus metadata, and usage.
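A request to this endpoint might be assembled along the following lines. This is a minimal sketch using only the Python standard library. The `POST /v1/classifications` path and the names `n_consensus`, `first_n_pages`, and `retab-small` come from this page; the base URL, the `Api-Key` header, and the body fields `document`, `categories`, and `model` are assumptions made for illustration and should be checked against the API reference.

```python
import json
import urllib.request

# Assumed base URL; confirm it against your account's API reference.
BASE_URL = "https://api.retab.com"


def build_classification_request(document_url, categories, api_key,
                                 model="retab-small", n_consensus=1,
                                 first_n_pages=None):
    """Build (but do not send) a POST /v1/classifications request."""
    body = {
        "document": {"url": document_url},  # assumed field shape
        "categories": categories,           # assumed field name
        "model": model,                     # assumed field name
        "n_consensus": n_consensus,
    }
    # Only include the page limit when the caller asked for one.
    if first_n_pages is not None:
        body["first_n_pages"] = first_n_pages
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/classifications",
        data=json.dumps(body).encode("utf-8"),
        # Assumed auth header name.
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response into a dict."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request builder separate from `send` lets you inspect or log the payload before any network call is made.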
Use Case: Document Routing in a Processing Pipeline
Classify incoming documents, then route them to the right extraction schema based on `output.category`.
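The routing step can be sketched as a lookup keyed on `output.category`. The category names and schema paths below are hypothetical, and the classification is treated as a plain dict with the `output.category` shape described on this page.

```python
# Hypothetical mapping from category name to an extraction schema file.
SCHEMA_BY_CATEGORY = {
    "invoice": "schemas/invoice.json",
    "receipt": "schemas/receipt.json",
    "contract": "schemas/contract.json",
}

# None signals "no schema available, send to manual review".
FALLBACK_SCHEMA = None


def route(classification):
    """Return the schema path for a classification, or the fallback."""
    category = classification["output"]["category"]
    return SCHEMA_BY_CATEGORY.get(category, FALLBACK_SCHEMA)
```

For example, `route({"output": {"category": "invoice"}})` returns `"schemas/invoice.json"`, while an unmapped category falls through to `FALLBACK_SCHEMA`.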
Use Case: Email Attachment Filtering
Classify attachments first, then branch on `output.category`.
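One way to express that branch is a small filter that keeps only the attachment types a workflow cares about. In this sketch, `classify` is a caller-supplied function (hypothetical here) that takes one attachment and returns a classification dict; the category names in `keep` are placeholders.

```python
def filter_attachments(attachments, classify, keep=("invoice", "purchase_order")):
    """Split attachments into (kept, skipped) lists by classified category.

    Each list holds (attachment, category) pairs so that skipped items
    can still be logged or reviewed later.
    """
    kept, skipped = [], []
    for attachment in attachments:
        category = classify(attachment)["output"]["category"]
        target = kept if category in keep else skipped
        target.append((attachment, category))
    return kept, skipped
```

Passing `classify` in as an argument keeps the branching logic independent of how the API is called, which also makes it easy to test with a stub.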
Classify vs Split
| Feature | Classify | Split |
|---|---|---|
| Purpose | Categorize the whole document | Identify sections within a document |
| Output | One output decision | Multiple subdocument assignments |
| Use Case | Routing, filtering, triage | Batch separation, section extraction |
| Result Shape | output.category, output.reasoning | output[] with pages per subdocument |
Use Classify when:
- You need to know what kind of document you have
- The document should map to one main category
- You are routing to another workflow or extraction schema
Use Split when:
- The file contains multiple sections with different meanings
- You need page-level grouping
- You plan to extract multiple subdocuments from one upload
Best Practices
Category Definition
- Be Specific: Describe what makes each category unique
- Use Visual Cues: Mention layouts, headers, logos, or tables when useful
- Keep Overlap Low: Categories should be distinguishable from each other
- Add a Catch-all: Include an `other` category when documents can fall outside the expected set
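A category set following these guidelines might look like the sketch below. It assumes each category is an object with `name` and `description` fields, which this page does not confirm; the category names and descriptions are illustrative only.

```python
# Illustrative categories; each description names distinguishing cues.
CATEGORIES = [
    {"name": "invoice",
     "description": "Bill with an invoice number, line-item table, totals, and a due date."},
    {"name": "receipt",
     "description": "Proof of payment, often narrow, with a store header and a paid total."},
    {"name": "contract",
     "description": "Multi-page legal text with numbered clauses and signature blocks."},
]


def with_catch_all(categories, name="other"):
    """Return a copy of the categories with a catch-all entry appended.

    If a category with the given name already exists, the list is
    returned unchanged (as a copy).
    """
    if any(category["name"] == name for category in categories):
        return list(categories)
    catch_all = {
        "name": name,
        "description": "Any document that does not match the categories above.",
    }
    return list(categories) + [catch_all]
```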
Reasoning and Consensus
- Review `output.reasoning` when you need an audit trail or human validation
- Use `n_consensus > 1` for noisy or ambiguous document sets
- Inspect `consensus.choices` to understand close calls between categories
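Close calls can be surfaced by tallying the votes. The sketch below assumes `consensus.choices` is a list with one entry per vote, each carrying a `category` field; that shape is an assumption, so adjust the accessors to match the actual response. The `min_share` threshold is an arbitrary illustrative value.

```python
from collections import Counter


def vote_summary(classification):
    """Summarize consensus votes: top category, its share, and the runner-up."""
    choices = classification["consensus"]["choices"]  # assumed shape
    if not choices:
        raise ValueError("no consensus choices to summarize")
    ranked = Counter(choice["category"] for choice in choices).most_common()
    top_category, top_votes = ranked[0]
    return {
        "top": top_category,
        "share": top_votes / len(choices),
        "runner_up": ranked[1][0] if len(ranked) > 1 else None,
    }


def needs_review(classification, min_share=0.6):
    """Flag a result for human review when the winning share is low."""
    return vote_summary(classification)["share"] < min_share
```

A flagged result is a natural place to read `output.reasoning` before accepting the category.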
Performance
- Use 3-7 categories for best results in most routing setups
- Limit pages with `first_n_pages` when the first page contains enough signal
- Start with `retab-small` and only scale up when accuracy demands it
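These tips can be turned into a quick pre-flight check on a configuration. The helper below is a hypothetical convenience, not part of any Retab SDK; it only encodes the three guidelines listed here and returns human-readable warnings.

```python
def check_setup(categories, model="retab-small", first_n_pages=None):
    """Return warnings for settings that stray from the performance tips."""
    warnings = []
    if not 3 <= len(categories) <= 7:
        warnings.append(
            f"{len(categories)} categories configured; 3-7 usually works best"
        )
    if first_n_pages is None:
        warnings.append(
            "first_n_pages is unset; consider a limit if page 1 carries enough signal"
        )
    if model != "retab-small":
        warnings.append(
            f"model is {model!r}; confirm retab-small was not accurate enough"
        )
    return warnings
```

An empty list means the configuration matches all three tips.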