Introduction

Retab classifications use the resource-based POST /v1/classifications API. A classification request analyzes a document, chooses exactly one category, and persists the result as a Classification object that you can later retrieve or list. Common use cases include:
  1. Document Routing: Route incoming files to the right downstream extraction pipeline
  2. Pre-filtering: Choose the right schema before extraction
  3. Mailroom Automation: Sort attachments by document type
  4. Quality Control: Verify documents match the expected category in a workflow
Key features of the Classification API:
  • Single Decision: Returns exactly one category for the document
  • Reasoned Output: Includes an explanation for the selected category
  • Custom Categories: Define categories specific to your workflow
  • Consensus Support: Run multiple votes with n_consensus when you want more stability
  • Stored Resource: The result is persisted and can be fetched later

Classification API

ClassificationRequest
Returns
Classification Object
A stored classification resource containing the decision, consensus metadata, and usage.

Use Case: Document Routing in a Processing Pipeline

Classify incoming documents, then route them to the right extraction schema based on output.category.
from retab import Retab

client = Retab()

categories = [
    {"name": "invoice", "description": "Invoice documents with billing details, line items, totals, and payment terms"},
    {"name": "receipt", "description": "Payment receipts showing transaction confirmation and amounts paid"},
    {"name": "contract", "description": "Legal contracts with terms, conditions, and signature blocks"},
    {"name": "purchase_order", "description": "Purchase order documents with order details and shipping information"},
]

classification = client.classifications.create(
    document="incoming_document.pdf",
    model="retab-small",
    categories=categories,
)

print(f"Document classified as: {classification.output.category}")
print(f"Reasoning: {classification.output.reasoning}")

# Route to the matching extraction schema ({...} stands in for your own schema).
if classification.output.category == "invoice":
    invoice_schema = {...}
    extraction = client.extractions.create(
        document="incoming_document.pdf",
        model="retab-small",
        json_schema=invoice_schema,
    )
elif classification.output.category == "contract":
    contract_schema = {...}
    extraction = client.extractions.create(
        document="incoming_document.pdf",
        model="retab-small",
        json_schema=contract_schema,
    )
else:
    # receipt and purchase_order have no extraction step in this example
    extraction = None
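As the number of categories grows, the if/elif chain above can be replaced with a lookup table. The following is a minimal sketch and not part of the Retab SDK: SCHEMAS_BY_CATEGORY, its placeholder schemas, and pick_schema are hypothetical names introduced for illustration only.

```python
from typing import Optional

# Hypothetical mapping from category name to extraction schema.
# The schema bodies are illustrative placeholders, not real Retab schemas.
SCHEMAS_BY_CATEGORY: dict[str, dict] = {
    "invoice": {"type": "object", "properties": {"total": {"type": "number"}}},
    "contract": {"type": "object", "properties": {"parties": {"type": "array"}}},
}


def pick_schema(category: str) -> Optional[dict]:
    """Return the extraction schema for a category, or None if none is configured."""
    return SCHEMAS_BY_CATEGORY.get(category)


# In the pipeline above, `category` would be classification.output.category.
for category in ("invoice", "receipt"):
    schema = pick_schema(category)
    if schema is None:
        print(f"{category}: no extraction schema configured, skipping")
    else:
        print(f"{category}: extracting with {len(schema['properties'])} field(s)")
```

Returning None for unmapped categories makes the "no extraction needed" case explicit, so a new category added to the classifier cannot silently fall through.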

Use Case: Email Attachment Filtering

Classify attachments first, then branch on output.category.
from retab import Retab

client = Retab()

categories = [
    {"name": "invoice", "description": "Invoice or billing documents requiring payment"},
    {"name": "quote", "description": "Price quotes or proposals from vendors"},
    {"name": "marketing", "description": "Marketing materials, brochures, or promotional content"},
    {"name": "other", "description": "Miscellaneous documents not fitting other categories"},
]

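# email_attachments and the handlers below (process_invoice, queue_for_review,
# archive_document) are placeholders supplied by your application.
for attachment in email_attachments: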
    classification = client.classifications.create(
        document=attachment,
        model="retab-small",
        categories=categories,
    )

    if classification.output.category == "invoice":
        process_invoice(attachment)
    elif classification.output.category == "quote":
        queue_for_review(attachment)
    elif classification.output.category == "marketing":
        archive_document(attachment)

    print(f"{attachment.name}: {classification.output.category}")
    print(f"  Reason: {classification.output.reasoning[:100]}...")

Classify vs Split

Feature      | Classify                          | Split
-------------|-----------------------------------|-------------------------------------
Purpose      | Categorize the whole document     | Identify sections within a document
Output       | One output decision               | Multiple subdocument assignments
Use Case     | Routing, filtering, triage        | Batch separation, section extraction
Result Shape | output.category, output.reasoning | output[] with pages per subdocument
Use Classify when:
  • You need to know what kind of document you have
  • The document should map to one main category
  • You are routing to another workflow or extraction schema
Use Split when:
  • The file contains multiple sections with different meanings
  • You need page-level grouping
  • You plan to extract multiple subdocuments from one upload

Best Practices

Category Definition

  • Be Specific: Describe what makes each category unique
  • Use Visual Cues: Mention layouts, headers, logos, or tables when useful
  • Keep Overlap Low: Categories should be distinguishable from each other
  • Add a Catch-all: Include an "other" category when documents can fall outside the expected set
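Some of these guidelines can be checked mechanically before a request is sent. The sketch below is a hypothetical helper, not a Retab SDK function: validate_categories and its catch_all parameter are assumptions made for illustration. It only catches duplicate names, empty descriptions, and a missing catch-all; it cannot judge whether two descriptions overlap in meaning.

```python
def validate_categories(categories: list[dict], catch_all: str = "other") -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen = set()
    for cat in categories:
        name = cat.get("name", "")
        if not name:
            problems.append("category with missing name")
        elif name in seen:
            problems.append(f"duplicate category name: {name}")
        seen.add(name)
        # A blank description gives the classifier nothing to distinguish on.
        if not cat.get("description", "").strip():
            problems.append(f"category {name!r} has no description")
    if catch_all not in seen:
        problems.append(f"no catch-all category named {catch_all!r}")
    return problems


categories = [
    {"name": "invoice", "description": "Invoice or billing documents requiring payment"},
    {"name": "invoice", "description": ""},
]
for problem in validate_categories(categories):
    print(problem)
```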

Reasoning and Consensus

  • Review output.reasoning when you need an audit trail or human validation
  • Use n_consensus > 1 for noisy or ambiguous document sets
  • Inspect consensus.choices to understand close calls between categories
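To make the idea of multiple votes concrete, the following is a conceptual sketch of simple majority voting over several classification runs. It is an assumption-laden illustration only: Retab's actual consensus procedure and the exact shape of consensus.choices are not described here, and majority_vote is a hypothetical helper.

```python
from collections import Counter


def majority_vote(votes: list[str]) -> tuple[str, float]:
    """Return the most common category and the share of votes it received."""
    if not votes:
        raise ValueError("votes must not be empty")
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(votes)


# Five hypothetical votes for the same document.
votes = ["invoice", "invoice", "quote", "invoice", "quote"]
category, agreement = majority_vote(votes)
print(f"{category} ({agreement:.0%} agreement)")  # invoice (60% agreement)
```

A low agreement share, as in this 3-to-2 split, is the kind of close call that is worth sending to human review.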

Performance

  • Use 3-7 categories for best results in most routing setups
  • Limit pages with first_n_pages when the first page contains enough signal
  • Start with retab-small and only scale up when accuracy demands it
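One way to keep these settings in a single place is a small helper that assembles the request arguments. This is a sketch under stated assumptions: build_classification_kwargs is a hypothetical helper, and it assumes that first_n_pages and n_consensus are accepted as keyword arguments by client.classifications.create, which this page implies but does not show. Check the API reference before relying on it.

```python
def build_classification_kwargs(
    document,
    categories,
    *,
    model="retab-small",   # start small, scale up only if accuracy demands it
    first_n_pages=None,    # assumed request parameter: limit pages analyzed
    n_consensus=None,      # assumed request parameter: number of votes
):
    """Assemble keyword arguments for a classification request."""
    kwargs = {"document": document, "model": model, "categories": categories}
    # Only include optional settings that were explicitly requested.
    if first_n_pages is not None:
        kwargs["first_n_pages"] = first_n_pages
    if n_consensus is not None:
        kwargs["n_consensus"] = n_consensus
    return kwargs


kwargs = build_classification_kwargs(
    "incoming_document.pdf",
    [{"name": "invoice", "description": "Invoice or billing documents"}],
    first_n_pages=1,
)
print(sorted(kwargs))
# With a configured client: client.classifications.create(**kwargs)
```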