Introduction

Retab offers a consolidated, production-grade pipeline for processing any type of document with AI. Our models read documents the way humans do: they accept native digital files (images, PDFs, DOCX, XLSX, e-mail), parse the text, and detect visual structure across pages, tables, forms, and figures. Please check the API Reference for more details. The module exposes four high-level methods:
| Method | Purpose | Typical use case |
|---|---|---|
| create_messages | Generates a verbatim, chat-formatted rendition of the document. | Retrieval-augmented generation or "chat with your PDF". |
| create_inputs | Wraps the document in a developer prompt targeting a supplied JSON schema. | Function-calling or structured extraction with JSON mode. |
| extract | Executes the extraction and returns the parsed object (optionally with consensus voting). | One-step OCR + LLM parsing when only the structured output is required. |
| parse | Converts any document into structured text content with page-by-page extraction. | RAG, text extraction, and preparing documents for further processing or indexing. |
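For orientation, here is a minimal sketch of what calling these methods can look like from the Python SDK. It assumes the methods are exposed under client.documents, that an API key is available in the environment, and that the json_schema and model arguments shown here are accepted; the schema filename is hypothetical. Check the API Reference for the exact signatures.

```python
from retab import Retab

# Assumption: the API key is picked up from the environment.
client = Retab()

# Chat-formatted rendition of the document, e.g. for RAG or "chat with your PDF".
messages = client.documents.create_messages(
    document="Alphabet-10Q-Q1-25.pdf",
)

# One-step OCR + LLM extraction against a JSON schema (hypothetical schema file).
result = client.documents.extract(
    document="Alphabet-10Q-Q1-25.pdf",
    json_schema="financial_report_schema.json",
    model="gpt-4.1",  # assumption: any supported model identifier
)

# Page-by-page structured text, e.g. for indexing.
parsed = client.documents.parse(document="Alphabet-10Q-Q1-25.pdf")
```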
The complexities of OCR and layout reconstruction are handled internally, allowing you to focus solely on downstream prompt- and context-engineering logic.

The document data structure

Documents in Retab are represented as MIMEData objects, which encapsulate the file content and metadata. This structure allows you to work with documents in a consistent way regardless of their original format. The url field directly matches OpenAI’s expected format for image inputs.
MIMEData Object Structure
{
  "document": {
    "filename": "Alphabet-10Q-Q1-25.pdf",
    "url": "data:application/pdf;base64,JVBERi0xLjQKJfbk/N8KMSAwIG9iago8PAovVHlwZS…"
  }
}
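If you need to build this payload yourself, the url field is a standard base64 data URL. A minimal sketch using only the Python standard library (the filename and MIME type are taken from the example above):

```python
import base64
from pathlib import Path

path = Path("Alphabet-10Q-Q1-25.pdf")
encoded = base64.b64encode(path.read_bytes()).decode("utf-8")

document = {
    "filename": path.name,
    "url": f"data:application/pdf;base64,{encoded}",
}
```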
The Python SDK is flexible: you can pass the document parameter as a file path, bytes, or a PIL.Image.Image object, and it will automatically be converted to a MIMEData object for you.
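For example, the following calls are intended to be equivalent (a sketch reusing the client from the earlier example; the filename is hypothetical):

```python
from pathlib import Path
from PIL import Image

# 1. File path (str or Path)
parsed = client.documents.parse(document="invoice.png")

# 2. Raw bytes
parsed = client.documents.parse(document=Path("invoice.png").read_bytes())

# 3. PIL image
parsed = client.documents.parse(document=Image.open("invoice.png"))
```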