Introduction
Retab classifications use the resource-based POST /v1/classifications API. A classification request analyzes a document, chooses exactly one category, and persists the result as a Classification object that you can later retrieve or list.
Common use cases include:
Document Routing: Route incoming files to the right downstream extraction pipeline
Pre-filtering: Choose the right schema before extraction
Mailroom Automation: Sort attachments by document type
Quality Control: Verify that documents match the expected category in a workflow
Key features of the Classification API:
Single Decision: Returns exactly one category for the document
Reasoned Output: Includes an explanation for the selected category
Custom Categories: Define categories specific to your workflow
Consensus Support: Run multiple votes with n_consensus when you want more stability
Stored Resource: The result is persisted and can be fetched later
Classification API
document: The document to classify. Can be a file path, bytes, or image input.
model: The model to use for classification. Recommended: retab-small for most cases.
categories: The candidate categories. Each category has:
name: Stable identifier for the category
description: Instructions that help the model distinguish that category
first_n_pages: Restrict classification to the first N pages. Useful for large documents when the signal appears early.
Free-form instructions appended to the system prompt to steer the classification.
n_consensus: Number of votes to run for consensus. Max: 16.
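The request parameters above can be sanity-checked before a request is sent. The following is a minimal sketch, not part of the Retab SDK: `build_classification_kwargs` is a hypothetical helper, the lower bound of 1 for `n_consensus` is an assumption, and it assumes that `first_n_pages` and `n_consensus` are accepted as keyword arguments next to `document`, `model`, and `categories`.

```python
from typing import Any, Optional

MAX_N_CONSENSUS = 16  # documented upper bound for n_consensus


def build_classification_kwargs(
    document: Any,
    categories: list[dict[str, str]],
    model: str = "retab-small",
    first_n_pages: Optional[int] = None,
    n_consensus: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate inputs and return keyword arguments."""
    if not categories:
        raise ValueError("at least one category is required")
    if first_n_pages is not None and first_n_pages < 1:
        raise ValueError("first_n_pages must be a positive integer")
    # The lower bound of 1 is an assumption; only the maximum is documented.
    if n_consensus is not None and not 1 <= n_consensus <= MAX_N_CONSENSUS:
        raise ValueError(f"n_consensus must be between 1 and {MAX_N_CONSENSUS}")

    kwargs: dict[str, Any] = {
        "document": document,
        "model": model,
        "categories": categories,
    }
    # Optional parameters are only included when explicitly set.
    if first_n_pages is not None:
        kwargs["first_n_pages"] = first_n_pages
    if n_consensus is not None:
        kwargs["n_consensus"] = n_consensus
    return kwargs
```

Under those assumptions, the result could be passed on as `client.classifications.create(**kwargs)`, so that an out-of-range `n_consensus` fails locally instead of in the API call.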
A stored classification resource containing the decision, consensus metadata, and usage.
Unique identifier of the classification.
File metadata for the classified document.
Model used for classification.
Categories the document was classified against.
Number of votes used for the classification.
Free-form instructions supplied with the request.
Final classification decision.
Consensus metadata with flat choices entries.
Usage information for the request.
Use Case: Document Routing in a Processing Pipeline
Classify incoming documents, then route them to the right extraction schema based on output.category.
Python
from retab import Retab

client = Retab()

# Candidate categories: name is a stable identifier, description guides the model
categories = [
    {"name": "invoice", "description": "Invoice documents with billing details, line items, totals, and payment terms"},
    {"name": "receipt", "description": "Payment receipts showing transaction confirmation and amounts paid"},
    {"name": "contract", "description": "Legal contracts with terms, conditions, and signature blocks"},
    {"name": "purchase_order", "description": "Purchase order documents with order details and shipping information"},
]

classification = client.classifications.create(
    document="incoming_document.pdf",
    model="retab-small",
    categories=categories,
)

print(f"Document classified as: {classification.output.category}")
print(f"Reasoning: {classification.output.reasoning}")

# Route to the extraction schema that matches the chosen category
if classification.output.category == "invoice":
    invoice_schema = { ... }
    extraction = client.extractions.create(
        document="incoming_document.pdf",
        model="retab-small",
        json_schema=invoice_schema,
    )
elif classification.output.category == "contract":
    contract_schema = { ... }
    extraction = client.extractions.create(
        document="incoming_document.pdf",
        model="retab-small",
        json_schema=contract_schema,
    )
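As the number of routed categories grows, the if/elif chain in the routing example can be replaced by a lookup table. This is a minimal sketch under stated assumptions: `SCHEMAS_BY_CATEGORY` and `schema_for` are hypothetical names, and the schemas are placeholders standing in for the elided ones above, not real Retab schemas.

```python
from typing import Any, Optional

# Placeholder schemas (hypothetical): substitute your real JSON Schemas here.
SCHEMAS_BY_CATEGORY: dict[str, dict[str, Any]] = {
    "invoice": {"type": "object", "title": "Invoice"},
    "contract": {"type": "object", "title": "Contract"},
}


def schema_for(category: str) -> Optional[dict[str, Any]]:
    """Return the extraction schema for a category, or None if it is not routed."""
    return SCHEMAS_BY_CATEGORY.get(category)


# Usage with the classification result from the example above:
# schema = schema_for(classification.output.category)
# if schema is not None:
#     extraction = client.extractions.create(
#         document="incoming_document.pdf",
#         model="retab-small",
#         json_schema=schema,
#     )
```

Categories without an entry, such as receipt or purchase_order in the example, return `None`, which makes the "no extraction for this category" case explicit rather than an implicit fall-through.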
Use Case: Email Attachment Filtering
Classify attachments first, then branch on output.category.
from retab import Retab

client = Retab()

categories = [
    {"name": "invoice", "description": "Invoice or billing documents requiring payment"},
    {"name": "quote", "description": "Price quotes or proposals from vendors"},
    {"name": "marketing", "description": "Marketing materials, brochures, or promotional content"},
    {"name": "other", "description": "Miscellaneous documents not fitting other categories"},
]

# email_attachments and the handler functions below are defined elsewhere in your application
for attachment in email_attachments:
    classification = client.classifications.create(
        document=attachment,
        model="retab-small",
        categories=categories,
    )

    # Branch on the single chosen category
    if classification.output.category == "invoice":
        process_invoice(attachment)
    elif classification.output.category == "quote":
        queue_for_review(attachment)
    elif classification.output.category == "marketing":
        archive_document(attachment)

    print(f"{attachment.name}: {classification.output.category}")
    print(f"  Reason: {classification.output.reasoning[:100]}...")
Classify vs Split
| Feature | Classify | Split |
| --- | --- | --- |
| Purpose | Categorize the whole document | Identify sections within a document |
| Output | One output decision | Multiple subdocument assignments |
| Use Case | Routing, filtering, triage | Batch separation, section extraction |
| Result Shape | output.category, output.reasoning | output[] with pages per subdocument |
Use Classify when:
You need to know what kind of document you have
The document should map to one main category
You are routing to another workflow or extraction schema
Use Split when:
The file contains multiple sections with different meanings
You need page-level grouping
You plan to extract multiple subdocuments from one upload
Best Practices
Category Definition
Be Specific: Describe what makes each category unique
Use Visual Cues: Mention layouts, headers, logos, or tables when useful
Keep Overlap Low: Categories should be distinguishable from each other
Add a Catch-all: Include an "other" category when documents can fall outside the expected set
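Some of these guidelines can be checked mechanically before a category list goes into production. The sketch below is illustrative only: `lint_categories` is a hypothetical helper, the five-word description threshold is an arbitrary choice for this example, and the 3-7 range is the rule of thumb given on this page. It cannot judge semantic overlap between categories, which still needs human review.

```python
MIN_DESCRIPTION_WORDS = 5   # arbitrary threshold chosen for this sketch
RECOMMENDED_RANGE = (3, 7)  # rule of thumb for the number of categories


def lint_categories(categories: list[dict[str, str]], catch_all: str = "other") -> list[str]:
    """Return human-readable warnings; an empty list means no issues were found."""
    warnings: list[str] = []
    names = [c.get("name", "") for c in categories]

    # Names act as stable identifiers, so they should be unique.
    for name in sorted({n for n in names if names.count(n) > 1}):
        warnings.append(f"duplicate category name: {name!r}")

    # Very short descriptions rarely say what makes a category unique.
    for c in categories:
        if len(c.get("description", "").split()) < MIN_DESCRIPTION_WORDS:
            warnings.append(f"description for {c.get('name')!r} is very short")

    if catch_all not in names:
        warnings.append(f"no catch-all category named {catch_all!r}")

    low, high = RECOMMENDED_RANGE
    if not low <= len(categories) <= high:
        warnings.append(f"{len(categories)} categories; {low}-{high} is the suggested range")
    return warnings
```

Treat the output as warnings rather than errors: a closed document set, for instance, may legitimately have no catch-all category.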
Reasoning and Consensus
Review output.reasoning when you need an audit trail or human validation
Use n_consensus > 1 for noisy or ambiguous document sets
Inspect consensus.choices to understand close calls between categories
Use 3-7 categories for best results in most routing setups
Limit pages with first_n_pages when the first page contains enough signal
Start with retab-small and only scale up when accuracy demands it
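To make the advice about inspecting consensus.choices concrete, the sketch below summarizes a set of votes into shares and a winning margin. It is an assumption-heavy illustration: the exact shape of the entries in consensus.choices is not specified here, so the helper takes a plain list of category names (one per vote) that you would first derive from those entries. `summarize_votes` is a hypothetical name, and the majority count shown is not necessarily how Retab computes the final decision.

```python
from collections import Counter


def summarize_votes(votes: list[str]) -> dict[str, object]:
    """Summarize per-vote category labels into a winner, vote shares, and a margin."""
    if not votes:
        raise ValueError("votes must not be empty")
    ranked = Counter(votes).most_common()  # sorted by descending vote count
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    total = len(votes)
    return {
        "winner": winner,
        "shares": {name: count / total for name, count in ranked},
        # Gap between the top two categories as a fraction of all votes.
        "margin": (top - runner_up) / total,
    }


# Assumed input: one category name per vote, derived from consensus.choices.
summary = summarize_votes(["invoice", "invoice", "receipt", "invoice", "receipt"])
print(summary["winner"], summary["margin"])
```

A small margin indicates a close call between two categories, which is a reasonable trigger for sending the document to human review or for tightening the two category descriptions.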