Introduction

Retab offers a consolidated, production-grade pipeline for processing any type of document with AI. Our models read documents the way humans do: they accept native digital files (images, PDFs, DOCX, XLSX, e-mail), parse the text, and detect visual structure across pages, tables, forms, and figures. Please check the API Reference for more details. The module exposes four high-level methods:
| Method | Purpose | Typical use case |
|---|---|---|
| create_messages | Generates a verbatim, chat-formatted rendition of the document. | Retrieval-augmented generation or "chat with your PDF". |
| create_inputs | Wraps the document in a developer prompt targeting a supplied JSON schema. | Function-calling or structured extraction with JSON mode. |
| extract | Executes the extraction and returns the parsed object (optionally with consensus voting). | One-step OCR + LLM parsing when only the structured output is required. |
| parse | Converts any document into structured text content with page-by-page extraction. | RAG, text extraction, and preparing documents for further processing or indexing. |
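For orientation, here is a minimal sketch of what calling these methods can look like from the Python SDK. It assumes the methods are exposed under client.documents, that an API key is available in the environment, and that the json_schema and model arguments shown here are accepted; the schema filename is hypothetical. Check the API Reference for the exact signatures.

```python
from retab import Retab

# Assumption: the API key is picked up from the environment.
client = Retab()

# Chat-formatted rendition of the document, e.g. for RAG or "chat with your PDF".
messages = client.documents.create_messages(
    document="Alphabet-10Q-Q1-25.pdf",
)

# One-step OCR + LLM extraction against a JSON schema (hypothetical schema file).
result = client.documents.extract(
    document="Alphabet-10Q-Q1-25.pdf",
    json_schema="financial_report_schema.json",
    model="gpt-4.1",  # assumption: any supported model identifier
)

# Page-by-page structured text, e.g. for indexing.
parsed = client.documents.parse(document="Alphabet-10Q-Q1-25.pdf")
```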
The complexities of OCR and layout reconstruction are handled internally, allowing you to focus solely on downstream prompt- and context-engineering logic.

The document data structure

Documents in Retab are represented as MIMEData objects, which encapsulate the file content and metadata. This structure allows you to work with documents in a consistent way regardless of their original format. The url field directly matches OpenAI’s expected format for image inputs.
MIMEData Object Structure
{
  "document": {
    "filename": "Alphabet-10Q-Q1-25.pdf",
    "url": "data:application/pdf;base64,JVBERi0xLjQKJfbk/N8KMSAwIG9iago8PAovVHlwZS…"
  }
}
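If you need to build this payload yourself, the url field is a standard base64 data URL. A minimal sketch using only the Python standard library (the filename and MIME type are taken from the example above):

```python
import base64
from pathlib import Path

path = Path("Alphabet-10Q-Q1-25.pdf")
encoded = base64.b64encode(path.read_bytes()).decode("utf-8")

document = {
    "filename": path.name,
    "url": f"data:application/pdf;base64,{encoded}",
}
```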
The Python SDK is flexible: you can pass the document parameter as a file path, bytes, or a PIL.Image.Image object, and it will automatically be converted to a MIMEData object for you.
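For example, the following calls are intended to be equivalent (a sketch reusing the client from the earlier example; the filename is hypothetical):

```python
from pathlib import Path
from PIL import Image

# 1. File path (str or Path)
parsed = client.documents.parse(document="invoice.png")

# 2. Raw bytes
parsed = client.documents.parse(document=Path("invoice.png").read_bytes())

# 3. PIL image
parsed = client.documents.parse(document=Image.open("invoice.png"))
```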