The client.documents module offers a consolidated, production-grade pipeline for processing any types of documents with AI. Our model read documents the way humans do. It accepts native digital files (Images, PDFs, DOCX, XLSX, E-mail) and parses text, detects visual structure across pages, tables, forms, and figures. Please check the API Reference for more details.
The module exposes three high-level methods:
Method
Purpose
create_messages
Generates a verbatim, chat-formatted rendition of the document.
Retrieval-augmented generation or “chat with your PDF”.
create_inputs
Wraps the document in a developer prompt targeting a supplied JSON schema.
Function-calling or structured extraction with JSON mode.
extract
Executes the extraction and returns the parsed object (optionally with consensus voting).
One-step OCR + LLM parsing when only the structured output is required.
parse
Converts any document into structured text content with page-by-page extraction.
Perfect for RAG, text extraction, and preparing documents for further processing or indexing.
The complexities of OCR, layout reconstruction are handled internally, allowing to focus solely on downstream prompt and context-engineering logic.
Documents in Retab are represented as MIMEData objects, which encapsulate the file content and metadata. This structure allows you to work with documents in a consistent way regardless of their original format. The url field directly matches OpenAI’s expected format for image inputs.
The python SDK is flexible and allows you to use the document parameter as a file path, bytes, or a PIL.Image.Image object, and we will automatically convert it to a MIMEData object for you.
An OpenAI ParsedChatCompletion object with the extracted data.
Copy
from retab import Retabreclient = Retab()doc_msg = reclient.documents.extract( document = "freight/booking_confirmation.jpg", model="gpt-4.1-nano", json_schema = { 'X-SystemPrompt': 'You are a useful assistant.', 'properties': { 'name': { 'description': 'The name of the calendar event.', 'title': 'Name', 'type': 'string' }, 'date': { 'description': 'The date of the calendar event in ISO 8601 format.', 'title': 'Date', 'type': 'string' } }, 'required': ['name', 'date'], 'title': 'CalendarEvent', 'type': 'object' }, modality="text", n_consensus=1 # 1 means disabled (default), if greater than 1 it will run the extraction with n-consensus mode)
Converts any document into structured text content with page-by-page extraction. This method processes various document types and returns plain text content along with usage information and metadata. Perfect for OCR tasks, text extraction, and preparing documents for further processing or indexing.
Retab’s document processing pipeline automatically converts various file types into LLM-ready formats, eliminating the need for custom parsers. This guide explains how to process different document types and understand the resulting output format.
Converts any document into OpenAI-compatible chat messages. You can choose between different preprocessing parameters according to your needs: modalities (text, image, native) and image settings (dpi, browser_canvas, etc..).
Converts any document and a json schema into OpenAI-compatible responses input. You can choose between different preprocessing parameters according to your needs: modalities (text, image, native) and image settings (dpi, browser_canvas, etc..).