Introduction
Retab offers a consolidated, production-grade pipeline for processing any type of document with AI. Our model reads documents the way humans do: it accepts native digital files (images, PDFs, DOCX, XLSX, email), parses their text, and detects visual structure across pages, tables, forms, and figures. Please check the API Reference for more details. The module exposes four high-level methods:

| Method | Purpose |
|---|---|
| extract | Executes the extraction and returns the parsed object (optionally with consensus voting). Use it for one-step OCR + LLM parsing when only the structured output is required. |
| parse | Converts any document into structured text content, extracted page by page. Perfect for RAG, text extraction, and preparing documents for further processing or indexing. |
| edit | Automatically detects form fields in PDFs using OCR and an LLM, then fills them based on natural-language instructions. Ideal for automated form-completion workflows. |
| split | Analyzes multi-page documents, classifies their pages into user-defined categories, and returns the page range of each section. Perfect for separating mixed document batches and organizing content by type. |
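The split method reports its result as page ranges per category. To illustrate that output shape only (this is not Retab's implementation or its response schema), the sketch below assumes a classifier has already assigned one category to each page and collapses consecutive pages with the same label into inclusive, 1-indexed ranges. `Section` and `pages_to_sections` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Section:
    # Hypothetical output shape: one category plus an inclusive, 1-indexed page range.
    category: str
    start_page: int
    end_page: int


def pages_to_sections(page_labels: list[str]) -> list[Section]:
    """Collapse per-page category labels into contiguous page ranges."""
    sections: list[Section] = []
    for page_number, label in enumerate(page_labels, start=1):
        if sections and sections[-1].category == label:
            # Same category as the previous page: extend the current section.
            sections[-1].end_page = page_number
        else:
            # Category changed (or first page): start a new section.
            sections.append(Section(label, page_number, page_number))
    return sections


if __name__ == "__main__":
    labels = ["invoice", "invoice", "receipt", "contract", "contract", "contract"]
    for section in pages_to_sections(labels):
        print(f"{section.category}: pages {section.start_page}-{section.end_page}")
```

Note that a category can appear in several sections if its pages are not contiguous (for example, two separate invoices in one batch), which is why the result is a list of ranges rather than a mapping from category to pages.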
The document data structure
Documents in Retab are represented as MIMEData objects, which encapsulate the file content and its metadata. This structure lets you work with documents consistently, regardless of their original format. The url field directly matches OpenAI’s expected format for image inputs.
You can pass the document parameter as a file path, bytes, or a PIL.Image.Image object, and we will automatically convert it to a MIMEData object for you.
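The conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Retab's actual MIMEData class or conversion code: `MIMEDataSketch`, `to_data_url` and `to_mime_data` are hypothetical names, the object is assumed to hold only a filename and a url, and the url is assumed to be a base64 data URL (`data:<mime-type>;base64,<payload>`), the form OpenAI accepts for image inputs. PIL images are detected by duck typing (anything with a `save` method) so the sketch runs without Pillow installed.

```python
import base64
import io
import mimetypes
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MIMEDataSketch:
    # Hypothetical stand-in for MIMEData: a filename plus a base64 data URL.
    filename: str
    url: str  # e.g. "data:application/pdf;base64,JVBERi0x..."

    @property
    def mime_type(self) -> str:
        # The MIME type sits between "data:" and ";base64,".
        return self.url[len("data:"):].split(";", 1)[0]


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def to_mime_data(document, filename: str = "document") -> MIMEDataSketch:
    """Normalize a file path, raw bytes, or a PIL-like image into MIMEDataSketch."""
    if isinstance(document, (str, Path)):
        path = Path(document)
        # Guess the MIME type from the file extension; fall back to a generic type.
        mime_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        return MIMEDataSketch(path.name, to_data_url(path.read_bytes(), mime_type))
    if isinstance(document, (bytes, bytearray)):
        # Raw bytes carry no name, so the caller-supplied filename drives the guess.
        mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        return MIMEDataSketch(filename, to_data_url(bytes(document), mime_type))
    if hasattr(document, "save"):
        # Duck-typed PIL.Image.Image: re-encode the pixels as PNG in memory.
        buffer = io.BytesIO()
        document.save(buffer, format="PNG")
        name = filename if filename.lower().endswith(".png") else f"{filename}.png"
        return MIMEDataSketch(name, to_data_url(buffer.getvalue(), "image/png"))
    raise TypeError(f"Unsupported document type: {type(document).__name__}")


if __name__ == "__main__":
    doc = to_mime_data(b"%PDF-1.7 ...", filename="invoice.pdf")
    print(doc.filename, doc.mime_type, doc.url[:40])
```

Guessing the type from the extension keeps the sketch short; a production converter would more likely inspect the file's leading bytes, since raw bytes and misnamed files give the extension no authority.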