Run long-running document tasks asynchronously.

Some document operations can take longer than a normal request/response cycle, especially for large files, multi-page PDFs, or consensus runs. Background mode lets the API accept the request, return a resource immediately, and continue the work after the client disconnects.

To start a background run, pass background: true when creating a primitive resource such as a parse, extraction, classification, split, partition, or edit. The initial response includes an id and a lifecycle status; poll the resource until it reaches a terminal state.

Create a background run

Set background to true on the create request. The response returns before the output is ready, usually with status queued or pending.
Python
from retab import Retab

client = Retab()

extraction = client.extractions.create(
    document="invoice.pdf",
    json_schema={
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["invoice_number", "total"],
    },
    model="retab-small",
    background=True,
)

print(extraction.id)
print(extraction.status)
TypeScript
import { Retab } from "@retab/node";

const client = new Retab({ apiKey: process.env.RETAB_API_KEY });

// Declared with let so the polling snippet can reassign it.
let extraction = await client.extractions.create(
  "invoice.pdf",
  {
    type: "object",
    properties: {
      invoice_number: { type: "string" },
      total: { type: "number" },
    },
    required: ["invoice_number", "total"],
  },
  "retab-small",
  undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  undefined,
  false,
  true
);

console.log(extraction.id);
console.log(extraction.status);
cURL
curl https://api.retab.com/v1/extractions \
  -H "Api-Key: $RETAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "filename": "invoice.pdf",
      "url": "data:application/pdf;base64,..."
    },
    "json_schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"}
      },
      "required": ["invoice_number", "total"]
    },
    "model": "retab-small",
    "background": true
  }'

Poll a background run

Retrieve the resource by ID to check progress. Keep polling while the status is pending, queued, or in_progress. Once the status changes to completed, failed, or cancelled, the run has reached a terminal state.
Python
from time import sleep

terminal_statuses = {"completed", "failed", "cancelled"}

while extraction.status not in terminal_statuses:
    print(f"Current status: {extraction.status}")
    sleep(2)
    extraction = client.extractions.get(extraction.id)

print(f"Final status: {extraction.status}")

if extraction.status == "completed":
    print(extraction.output)
elif extraction.status == "failed":
    print(extraction.error)
TypeScript
const terminalStatuses = new Set(["completed", "failed", "cancelled"]);

while (!terminalStatuses.has(extraction.status ?? "")) {
  console.log(`Current status: ${extraction.status}`);
  await new Promise((resolve) => setTimeout(resolve, 2000));
  extraction = await client.extractions.get(extraction.id);
}

console.log(`Final status: ${extraction.status}`);

if (extraction.status === "completed") {
  console.log(extraction.output);
} else if (extraction.status === "failed") {
  console.log(extraction.error);
}
cURL
curl https://api.retab.com/v1/extractions/extr_01G34H8J2K \
  -H "Api-Key: $RETAB_API_KEY"
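
The fixed two-second loop above never gives up and polls at a constant rate. A possible refinement is sketched below: a helper with an overall timeout and capped exponential backoff. The name wait_for_terminal, its parameters, and the default values are illustrative assumptions, not part of the Retab SDK; the helper only expects a callable that returns an object with a status attribute.

```python
import time
from typing import Any, Callable

# Terminal statuses as listed in this section.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_terminal(
    fetch: Callable[[], Any],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until the returned resource has a terminal status.

    fetch is any zero-argument callable returning an object with a
    .status attribute (hypothetical helper, not an SDK function).
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        resource = fetch()
        if resource.status in TERMINAL_STATUSES:
            return resource
        # Stop if the next wait would run past the overall deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(
                f"run still {resource.status!r} after {timeout_s} seconds"
            )
        sleep(delay)
        # Double the wait after each attempt, up to the cap.
        delay = min(delay * 2, max_delay_s)
```

With the client from the earlier examples, this could be called as wait_for_terminal(lambda: client.extractions.get(extraction.id)). Backing off reduces request volume for long runs, at the cost of noticing completion slightly later.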

Cancel a background run

You can cancel a run that has not reached a terminal state. Cancellation is idempotent for primitive resources: if the run already finished, Retab returns the current resource state.
Python
extraction = client.extractions.create_extraction_cancel("extr_01G34H8J2K")
print(extraction.status)
TypeScript
const extraction = await client.extractions.create_extraction_cancel(
  "extr_01G34H8J2K"
);

console.log(extraction.status);
cURL
curl -X POST https://api.retab.com/v1/extractions/extr_01G34H8J2K/cancel \
  -H "Api-Key: $RETAB_API_KEY"
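
Because cancellation is idempotent, checking the status first is optional. The sketch below shows one way to skip the cancel request when a run has already finished, while relying on idempotency to cover the case where the run finishes between the check and the cancel call. The helper name cancel_if_running is hypothetical; it takes the get and cancel callables as arguments rather than assuming a particular client.

```python
from typing import Any, Callable

# Terminal statuses as listed in the polling section.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def cancel_if_running(
    get: Callable[[str], Any],
    cancel: Callable[[str], Any],
    resource_id: str,
) -> Any:
    """Cancel a run only if it is still active; return the latest resource."""
    resource = get(resource_id)
    if resource.status in TERMINAL_STATUSES:
        # Already finished: nothing to cancel, so skip the extra request.
        return resource
    # The run may still finish before the cancel request arrives. Since
    # cancellation is idempotent, the response then reflects the final state.
    return cancel(resource_id)
```

Using the SDK methods shown in this section, a call might look like cancel_if_running(client.extractions.get, client.extractions.create_extraction_cancel, "extr_01G34H8J2K").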

Streaming and background mode

background and stream are mutually exclusive. Use streaming when you need incremental events on a live connection. Use background mode when you want the task to keep running even if the client closes the connection, then poll the resource later.
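
If request options are assembled dynamically, it can help to reject the invalid combination on the client side before sending anything. The following is a minimal sketch of such a guard; build_create_options is a hypothetical helper, and how the API itself responds when both flags are set is not covered here.

```python
from typing import Any, Dict


def build_create_options(background: bool = False, stream: bool = False) -> Dict[str, Any]:
    """Return request options, rejecting background combined with stream."""
    if background and stream:
        raise ValueError("background and stream are mutually exclusive; choose one")
    options: Dict[str, Any] = {}
    if background:
        # The run continues after the client disconnects; poll it by ID later.
        options["background"] = True
    if stream:
        # Incremental events are delivered on a live connection.
        options["stream"] = True
    return options
```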

Supported resources

The same pattern applies across primitives:
| Resource | Create with background | Poll status | Cancel |
| --- | --- | --- | --- |
| Classifications | POST /v1/classifications | GET /v1/classifications/{classification_id} | POST /v1/classifications/{classification_id}/cancel |
| Extractions | POST /v1/extractions | GET /v1/extractions/{extraction_id} | POST /v1/extractions/{extraction_id}/cancel |
| Splits | POST /v1/splits | GET /v1/splits/{split_id} | POST /v1/splits/{split_id}/cancel |
| Partitions | POST /v1/partitions | GET /v1/partitions/{partition_id} | POST /v1/partitions/{partition_id}/cancel |
| Edits | POST /v1/edits | GET /v1/edits/{edit_id} | POST /v1/edits/{edit_id}/cancel |
| Parses | POST /v1/parses | GET /v1/parses/{parse_id} | POST /v1/parses/{parse_id}/cancel |
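
Since every primitive follows the same URL shape, the three endpoints can be derived from the collection name and a resource ID. The sketch below illustrates that pattern; background_endpoints is a hypothetical helper, and the base URL is taken from the cURL examples on this page.

```python
from typing import Dict, Tuple

# Base URL as used in the cURL examples.
BASE_URL = "https://api.retab.com/v1"

# Collection path segment for each primitive listed in the table.
PRIMITIVES = ("classifications", "extractions", "splits", "partitions", "edits", "parses")


def background_endpoints(primitive: str, resource_id: str) -> Dict[str, Tuple[str, str]]:
    """Return (HTTP method, URL) pairs for create, poll, and cancel."""
    if primitive not in PRIMITIVES:
        raise ValueError(f"unknown primitive: {primitive!r}")
    collection = f"{BASE_URL}/{primitive}"
    return {
        "create": ("POST", collection),
        "poll": ("GET", f"{collection}/{resource_id}"),
        "cancel": ("POST", f"{collection}/{resource_id}/cancel"),
    }
```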

Limits

  1. Background runs return before the output is ready. Read output only after the resource reaches completed.
  2. Background mode cannot be combined with streaming.
  3. Poll by resource ID. Do not depend on a long-lived HTTP connection for completion.
  4. Failed runs expose error details on the resource’s error field.
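
One way to enforce the first and fourth points in application code is a small accessor that returns output only for completed runs and raises otherwise. This is a sketch under assumptions: BackgroundRunError and output_or_raise are hypothetical names, and the resource is only assumed to expose the id, status, output, and error fields used elsewhere on this page.

```python
from typing import Any


class BackgroundRunError(Exception):
    """Raised when a background run has no usable output (hypothetical)."""


def output_or_raise(resource: Any) -> Any:
    """Return resource.output if the run completed; raise otherwise."""
    if resource.status == "completed":
        return resource.output
    if resource.status == "failed":
        # Failed runs carry their details on the error field.
        raise BackgroundRunError(f"run {resource.id} failed: {resource.error}")
    if resource.status == "cancelled":
        raise BackgroundRunError(f"run {resource.id} was cancelled")
    # pending, queued, or in_progress: the output is not ready yet.
    raise BackgroundRunError(
        f"run {resource.id} is not finished (status: {resource.status})"
    )
```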