Model Routing

Retab provides intelligent model routing through two special model identifiers: auto-large and auto-small. Requests sent to these identifiers are automatically routed to the current best-performing model, based on availability, performance, and speed metrics. You therefore don't need to update your model selection by hand when new, better-performing models are released: Retab handles the routing, so your application keeps using the best available model for your use case.
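A routed identifier is passed wherever a concrete model name would go. The following is a minimal sketch of choosing one per request. The helper pick_routed_model and its prefer_speed flag are hypothetical. The assumption that auto-small favors speed and auto-large favors capability is inferred from the names, not stated above. The API itself is not called here.

```python
from typing import Literal

RoutedModel = Literal["auto-large", "auto-small"]


def pick_routed_model(prefer_speed: bool) -> RoutedModel:
    """Hypothetical helper: choose one of Retab's routed model identifiers.

    Assumption: "auto-small" targets faster, lighter requests and
    "auto-large" targets harder ones; confirm against the Retab docs.
    """
    return "auto-small" if prefer_speed else "auto-large"


# The routed identifier goes in the `model` argument, e.g. of documents.extract
# (shown here only as keyword arguments; no request is sent).
extract_kwargs = {
    "document": "booking_confirmation.jpg",
    "model": pick_routed_model(prefer_speed=True),
}
print(extract_kwargs["model"])
```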

Sync & Async Client

Retab offers both synchronous and asynchronous client interfaces, making it versatile for different application needs. The asynchronous client (AsyncRetab) is ideal for high-performance, non-blocking applications where multiple tasks run concurrently. For simpler or blocking operations, the synchronous client (Retab) provides a straightforward approach.

Here’s how you can use both:

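# Async client
import asyncio
from retab import AsyncRetab

async def fetch_models():
    reclient = AsyncRetab()
    models = await reclient.models.list()  # non-blocking: other tasks can run while waiting
    print(models)

asyncio.run(fetch_models())  # a coroutine does nothing until it is run by an event loop

# Sync client
from retab import Retab

client = Retab()
models = client.models.list()  # blocks until the response arrives
print(models)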

Both clients provide the same core functionality, enabling you to list models, create messages, extract data from documents, and more, with the flexibility to match your application’s concurrency model.
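The main benefit of the async client is issuing several requests at once. The sketch below shows the asyncio.gather pattern with a hypothetical stand-in, FakeAsyncModels, so that it runs without network access or an API key. With the real client, you would await AsyncRetab methods in the same way.

```python
import asyncio
import time


class FakeAsyncModels:
    """Hypothetical stand-in for an async resource such as AsyncRetab().models."""

    async def list(self, delay=0.1):
        await asyncio.sleep(delay)  # simulates network latency without blocking the loop
        return ["auto-large", "auto-small"]


async def fetch_many(n: int):
    models = FakeAsyncModels()
    # gather runs all n coroutines concurrently, so the total time is about one delay, not n.
    return await asyncio.gather(*(models.list() for _ in range(n)))


start = time.perf_counter()
results = asyncio.run(fetch_many(5))
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")
```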

Pagination

Many top-level resources support bulk fetches through list API methods. For instance, you can list extraction links, email addresses, and logs. These list methods share a common structure and accept at least four parameters: limit, order, after, and before.

Retab paginates with the after and before parameters. Each takes an existing object ID as a cursor, and results are returned sorted by creation time in descending or ascending order, as set by the order parameter.
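Cursor pagination is usually consumed in a loop that passes the last ID of each page as the next after value. The following is a minimal sketch, under these assumptions: fetch_page(limit, after) is a hypothetical callable that returns a list of objects, each with an "id" key, and a page shorter than limit means no objects remain. The real response shape of Retab's list methods may differ. For instance, it may wrap the items or expose a has-more flag.

```python
from typing import Callable, Iterator, Optional

Page = list[dict]


def iterate_all(fetch_page: Callable[[int, Optional[str]], Page], limit: int = 2) -> Iterator[dict]:
    """Yield every object by following `after` cursors until a short page arrives."""
    after: Optional[str] = None
    while True:
        page = fetch_page(limit, after)
        yield from page
        if len(page) < limit:  # assumption: a page shorter than `limit` is the last one
            return
        after = page[-1]["id"]  # the next request starts after the last object seen


# Hypothetical in-memory data standing in for a list endpoint.
DATA = [{"id": f"obj_{i}"} for i in range(5)]


def fake_fetch_page(limit: int, after: Optional[str]) -> Page:
    start = 0 if after is None else next(i for i, o in enumerate(DATA) if o["id"] == after) + 1
    return DATA[start:start + limit]


print([o["id"] for o in iterate_all(fake_fetch_page, limit=2)])
```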

Idempotency

The Retab API supports idempotency, which guarantees that performing the same operation multiple times has the same result as performing it once. This is useful when you need to retry a failed request, or to prevent accidental duplicate requests from creating more than one resource.

To make a request idempotent, add an Idempotency-Key header to any Retab API request, with a unique string as its value. Every subsequent request with the same key returns the same response. We recommend using v4 UUIDs as idempotency keys to avoid collisions.

Idempotency key example
curl --request POST \
  --url https://api.retab.com/v1/emails/tests/webhook \
  -H "Authorization: Bearer sk_test_a2V5XzAxSkgwVjhSN1ZaRTlYUzJYQzhOOTVRVDMzLEJSa3BzTEFuUTRVUWF5dEV5ZHpnRVZpVkI" \
  -H "Idempotency-Key: cd320c5c-e928-4212-a5bd-986c29362867"

Idempotency keys expire after 24 hours. The Retab API will generate a new response if you submit a request with an expired key.
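A key identifies one logical operation, so generate it once and reuse it on every retry of that operation. The following is a minimal sketch using the standard library's uuid.uuid4. The send_with_retries helper, the send callable standing in for your HTTP client, and the retry-on-5xx policy are all hypothetical. No real request is made.

```python
import uuid
from typing import Callable


def send_with_retries(send: Callable[[dict], int], api_key: str, attempts: int = 3) -> tuple[int, str]:
    """Retry one logical request, reusing a single Idempotency-Key across attempts."""
    key = str(uuid.uuid4())  # one v4 UUID per operation, not per attempt
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": key,
    }
    status = 0
    for _ in range(attempts):
        status = send(headers)
        if status < 500:  # hypothetical policy: retry only on server errors
            break
    return status, key


seen_keys = []


def flaky_send(headers: dict) -> int:
    """Hypothetical transport: fails twice, then succeeds."""
    seen_keys.append(headers["Idempotency-Key"])
    return 503 if len(seen_keys) < 3 else 200


status, key = send_with_retries(flaky_send, api_key="sk_test_...")
print(status, len(set(seen_keys)))
```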

Rate Limits

Retab implements rate limiting to ensure stable service for all users. The API uses a rolling window rate limit with the following configuration:

  • 300 requests per 60-second window
  • Applies across the following API endpoints:
    • POST /v1/documents/extractions
    • POST /v1/documents/create_messages

When you exceed the rate limit, the API returns a 429 Too Many Requests response like the following:

Status 429 - {'detail': 'Rate limit exceeded. Please try again later.'}
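A common way to handle 429 responses is to retry with exponential backoff and jitter. The following is a minimal sketch. The call_with_backoff helper, the send callable, and the delay values are hypothetical choices, not documented Retab behavior, and this section does not say whether a Retry-After header is returned.

```python
import random
import time
from typing import Callable


def call_with_backoff(send: Callable[[], int], max_retries: int = 5, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it returns something other than 429, backing off between tries."""
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
        status = send()
    return status


# Simulated responses: rate-limited twice, then successful. Delays are recorded, not slept.
responses = iter([429, 429, 200])
delays = []
final = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(final, len(delays))
```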

For high-volume applications, we can provide a dedicated plan. Contact us for more information.

Modality

LLMs work with text and image data, so Retab converts each document into a suitable modality based on its type.

Native modalities

Here is the list of native modalities supported by Retab:

TEXT_TYPES = Literal[".txt", ".csv", ".tsv", ".md", ".log", ".html", ".htm", ".xml", ".json", ".yaml", ".yml", ".rtf", ".ini", ".conf", ".cfg", ".nfo", ".srt", ".sql", ".sh", ".bat", ".ps1", ".js", ".jsx", ".ts", ".tsx", ".py", ".java", ".c", ".cpp", ".cs", ".rb", ".php", ".swift", ".kt", ".go", ".rs", ".pl", ".r", ".m", ".scala"]
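To check whether a file falls under the text types listed above, you can compare its extension against that list. The following is a minimal sketch. default_modality is a hypothetical helper, and matching extensions case-insensitively is an assumption. It only recognizes the text extensions listed here and returns None for anything else, rather than guessing how other document types are handled.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from TEXT_TYPES above.
TEXT_EXTENSIONS = {
    ".txt", ".csv", ".tsv", ".md", ".log", ".html", ".htm", ".xml", ".json", ".yaml",
    ".yml", ".rtf", ".ini", ".conf", ".cfg", ".nfo", ".srt", ".sql", ".sh", ".bat",
    ".ps1", ".js", ".jsx", ".ts", ".tsx", ".py", ".java", ".c", ".cpp", ".cs", ".rb",
    ".php", ".swift", ".kt", ".go", ".rs", ".pl", ".r", ".m", ".scala",
}


def default_modality(filename: str) -> Optional[str]:
    """Return "text" for extensions in TEXT_TYPES, otherwise None (not covered here)."""
    suffix = Path(filename).suffix.lower()  # assumption: case-insensitive matching
    return "text" if suffix in TEXT_EXTENSIONS else None


print(default_modality("notes.MD"), default_modality("scan.jpg"))
```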

You can also set the modality parameter to override a document's default modality.

import json
from retab import Retab

# Load the JSON schema describing the fields to extract
with open("booking_confirmation_json_schema.json", "r") as f:
    json_schema = json.load(f)

reclient = Retab()

response = reclient.documents.extract(
    json_schema=json_schema,
    document="booking_confirmation.jpg",
    model="gpt-4.1-nano",
    temperature=0,
    modality="text",  # The image is converted to text (with an OCR model) before being sent to the LLM
)

Image Settings

When processing images, several factors can affect the LLM's ability to accurately interpret and extract information. The image_resolution_dpi and browser_canvas parameters let you tune image settings to improve extraction quality.

API Reference

image_resolution_dpi
integer

The resolution of the image, in dots per inch (DPI). Defaults to 96.

browser_canvas
string

The canvas size of the browser. Must be one of:

  • “A3” (11.7in x 16.54in)
  • “A4” (8.27in x 11.7in)
  • “A5” (5.83in x 8.27in)

Defaults to “A4”.
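The two settings interact as follows: if a page of the chosen canvas is rasterized at image_resolution_dpi, its pixel size is the canvas size in inches multiplied by the DPI. This back-of-the-envelope sketch rests on that assumption and does not describe Retab's internal rendering. canvas_pixels is a hypothetical helper.

```python
# Canvas sizes in inches (width, height), taken from the list above.
CANVAS_INCHES = {
    "A3": (11.7, 16.54),
    "A4": (8.27, 11.7),
    "A5": (5.83, 8.27),
}


def canvas_pixels(canvas: str = "A4", dpi: int = 96) -> tuple[int, int]:
    """Approximate pixel dimensions of one page: inches x dots per inch, rounded."""
    width_in, height_in = CANVAS_INCHES[canvas]
    return round(width_in * dpi), round(height_in * dpi)


print(canvas_pixels())           # defaults: A4 at 96 DPI
print(canvas_pixels("A4", 192))  # doubling the DPI roughly doubles each dimension
```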

Consensus

You can use the consensus feature to improve extraction accuracy. It runs the same extraction several times and aggregates the results of these LLM calls.

The principle is simple: multiple runs should give the same result. If they don't, the LLM is not confident about the result, and neither should you be. Retab computes a consensus score for each field.
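One simple way to turn "do the runs agree?" into a number is to take the share of runs that return the most common value. The following is a minimal sketch under that assumption. Retab's exact scoring formula is not specified here, and field_consensus is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Hashable


def field_consensus(values: list[Hashable]) -> tuple[Any, float]:
    """Return the majority value across runs and the fraction of runs that agree with it."""
    counts = Counter(values)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(values)


# Hypothetical outputs for one field across 4 extraction runs.
runs = ["2024-05-01", "2024-05-01", "2024-05-01", "2024-01-05"]
print(field_consensus(runs))  # ('2024-05-01', 0.75)
```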

Additional _consensus_score fields are also added to the likelihoods object. Each is computed as the average of the field-level consensus scores within a given context.
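The averaging step can be sketched as a recursive walk over the likelihoods that adds a _consensus_score to each object, equal to the mean of the field scores nested inside it. This sketch makes three assumptions: "context" means an enclosing object, likelihoods are nested dicts of numbers, and the mean is taken over leaf fields rather than over child averages. add_consensus_scores is hypothetical.

```python
from statistics import mean
from typing import Any


def add_consensus_scores(node: dict[str, Any]) -> list[float]:
    """Set `_consensus_score` on `node` and every nested dict; return the leaf scores beneath it."""
    leaves: list[float] = []
    for key, value in list(node.items()):
        if key == "_consensus_score":
            continue  # ignore scores from a previous pass
        if isinstance(value, dict):
            leaves.extend(add_consensus_scores(value))
        elif isinstance(value, (int, float)):
            leaves.append(float(value))
    if leaves:
        node["_consensus_score"] = mean(leaves)
    return leaves


# Hypothetical per-field consensus scores for a booking confirmation.
likelihoods = {"guest_name": 1.0, "stay": {"check_in": 0.75, "check_out": 0.25}}
add_consensus_scores(likelihoods)
print(likelihoods)
```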

import json
from retab import Retab

# Load the JSON schema describing the fields to extract
with open("booking_confirmation_json_schema.json", "r") as f:
    json_schema = json.load(f)

reclient = Retab()

response = reclient.documents.extract(
    json_schema=json_schema,
    document="booking_confirmation.jpg",
    model="gpt-4.1-nano",
    n_consensus=10,  # Runs 10 calls to the same LLM and combines their results
)