Create Parse - Retab Docs

from retab import Retab

client = Retab()

parse = client.parses.create(
    document="document.pdf",
    model="retab-small",
    table_parsing_format="markdown",
    image_resolution_dpi=192,
)

print(f"Parse ID: {parse.id}")
print(f"Filename: {parse.file.filename}")
print(f"Full text: {parse.output.text}")
for i, page in enumerate(parse.output.pages):
    print(f"Page {i + 1}: {page}")

{
  "id": "parse_01G34H8J2K",
  "file": {
    "id": "file_6dd6eb00688ad8d1",
    "filename": "document.pdf",
    "mime_type": "application/pdf"
  },
  "model": "retab-small",
  "table_parsing_format": "markdown",
  "image_resolution_dpi": 192,
  "output": {
    "pages": [
      "# Document Title\n\nFirst page content with a markdown table...",
      "Second page content continues here...",
      "Third and final page content..."
    ],
    "text": "# Document Title\n\nFirst page content with a markdown table...\n\nSecond page content continues here...\n\nThird and final page content..."
  },
  "usage": {
    "page_count": 3,
    "credits": 1.5
  },
  "created_at": "2024-03-15T10:30:00Z"
}

POST /v1/parses

Parse a document into normalized text and persist the result as a Parse resource that can later be retrieved via GET /v1/parses/{parse_id} or listed via GET /v1/parses.

Authorizations

Api-Key

string

header

required
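For callers not using the SDK, the sketch below shows how the Api-Key header could be attached to a raw request. The base URL and the `/v1/parses` path are assumptions inferred from the retrieval endpoints mentioned on this page, and `build_create_parse_request` is a hypothetical helper, not part of the Retab SDK. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: illustrative base URL; confirm the real host for your account.
BASE_URL = "https://api.retab.com"


def build_create_parse_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the Api-Key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/parses",  # assumed path for the create-parse endpoint
        data=json.dumps(body).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# The required `document` field is omitted here for brevity.
req = build_create_parse_request("YOUR_API_KEY", {"model": "retab-small"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; keeping construction separate makes the header handling easy to inspect.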

Body

application/json

Public create-parse request body.

document

MIMEData · object

required

A file represented by its filename and a base64 data url.

MIMEData
FileRef
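
A minimal sketch of how a MIMEData value could be assembled from raw bytes, given the description above (a filename plus a base64 data URL). The key names `filename` and `url` are assumptions, since the child attributes are not listed here, and `to_mime_data` is a hypothetical helper.

```python
import base64
import mimetypes


def to_mime_data(filename: str, content: bytes) -> dict:
    """Encode file bytes as a base64 data URL paired with the filename."""
    # Guess the MIME type from the extension; fall back to a generic binary type.
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(content).decode("ascii")
    # Assumption: the object uses the keys "filename" and "url".
    return {"filename": filename, "url": f"data:{mime_type};base64,{encoded}"}


doc = to_mime_data("document.pdf", b"%PDF-1.4 example bytes")
print(doc["url"][:40])
```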

model

string

default:retab-small

The model to use for parsing

table_parsing_format

enum<string>

default:html

Format used to render tables extracted from the document

Available options:

markdown,

yaml,

html,

json

image_resolution_dpi

integer

default:192

DPI used when rasterizing pages for the parser

Required range: x >= 72
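
The constraints documented so far (the four table formats, the 72 DPI minimum, and the stated defaults) can be checked client-side before a request is sent. This is an illustrative sketch; `build_parse_options` is a hypothetical helper, and the server remains the authority on validation.

```python
# Values taken from the parameter descriptions above.
TABLE_FORMATS = {"markdown", "yaml", "html", "json"}
MIN_DPI = 72


def build_parse_options(
    model: str = "retab-small",
    table_parsing_format: str = "html",
    image_resolution_dpi: int = 192,
) -> dict:
    """Return request options, rejecting values outside the documented ranges."""
    if table_parsing_format not in TABLE_FORMATS:
        raise ValueError(f"table_parsing_format must be one of {sorted(TABLE_FORMATS)}")
    if image_resolution_dpi < MIN_DPI:
        raise ValueError(f"image_resolution_dpi must be >= {MIN_DPI}")
    return {
        "model": model,
        "table_parsing_format": table_parsing_format,
        "image_resolution_dpi": image_resolution_dpi,
    }


options = build_parse_options(table_parsing_format="markdown")
print(options)
```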

instructions

string | null

Free-form instructions appended to the system prompt to steer the parse.

bust_cache

boolean

default:false

If true, skip the LLM cache and force a fresh completion

background

boolean

default:false

If true, run asynchronously: the call returns immediately with status 'queued' and an empty output. Poll GET /v1/parses/{parse_id} until the status is terminal (completed, failed, or cancelled). Mutually exclusive with stream.
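
A background run has to be polled until it finishes. The sketch below keeps the polling logic independent of any SDK method by taking a `fetch` callable that returns the current Parse as a dict; how you implement `fetch` (SDK call or raw GET) is left open, and `wait_for_parse` is a hypothetical helper. The terminal statuses are the ones listed in the response's status field.

```python
import time
from typing import Callable

# Terminal values of the status field, per the response schema on this page.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_parse(
    fetch: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until the parse reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        parse = fetch()
        if parse.get("status") in TERMINAL_STATUSES:
            return parse
        if time.monotonic() >= deadline:
            raise TimeoutError(f"parse still {parse.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


# Simulated responses stand in for real API calls in this demo.
_responses = iter([{"status": "queued"}, {"status": "in_progress"}, {"status": "completed"}])
final = wait_for_parse(lambda: next(_responses), interval_s=0.0)
print(final["status"])
```

Injecting `sleep` and `fetch` keeps the loop testable without network access or real delays.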

Response

Successful Response

A parse result: the per-page and full-document text extracted from a document.

id

string

required

Unique identifier of the parse

file

FileRef · object

required

Information about the parsed file

model

string

required

Model used for parsing

table_parsing_format

enum<string>

required

Format used to render tables extracted from the document

Available options:

markdown,

yaml,

html,

json

image_resolution_dpi

integer

required

DPI used when rasterizing pages for the parser

output

ParseOutput · object

required

The parsed document content

instructions

string | null

Free-form instructions supplied with the parse request.

status

enum<string>

default:pending

Lifecycle status. The synchronous path returns 'completed'. Background runs progress pending -> queued -> in_progress -> completed | failed | cancelled.

Available options:

pending,

queued,

in_progress,

completed,

failed,

cancelled

error

PrimitiveError · object

Error details when a background run fails; null otherwise. Always present so consumers can read it without an existence check.
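
Because `error` is always present and `status` distinguishes the outcomes, a consumer can branch on the two fields directly. The following is a minimal sketch operating on the response as a plain dict; `ParseFailed` and `pages_or_raise` are hypothetical names, and the shape of the error object is passed through untouched since its attributes are not listed here.

```python
class ParseFailed(RuntimeError):
    """Raised when a parse ended in 'failed' or 'cancelled'."""


def pages_or_raise(parse: dict) -> list:
    """Return the per-page text of a completed parse, or raise otherwise."""
    status = parse.get("status")
    if status == "completed":
        return parse["output"]["pages"]
    if status in {"failed", "cancelled"}:
        # `error` is null unless a background run failed, so include it as-is.
        raise ParseFailed(f"parse {parse.get('id')} ended as {status}: {parse.get('error')}")
    raise ValueError(f"parse {parse.get('id')} is not finished (status={status})")


sample = {
    "id": "parse_01G34H8J2K",
    "status": "completed",
    "error": None,
    "output": {"pages": ["First page", "Second page"], "text": "First page\n\nSecond page"},
}
print(len(pages_or_raise(sample)))
```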

usage

RetabUsage · object

Usage information for the parse operation

created_at

string<date-time> | null
