POST /v1/parses
from retab import Retab

client = Retab()

parse = client.parses.create(
    document="document.pdf",
    model="retab-small",
    table_parsing_format="markdown",
    image_resolution_dpi=192,
)

print(f"Parse ID: {parse.id}")
print(f"Filename: {parse.file.filename}")
print(f"Full text: {parse.output.text}")
for i, page in enumerate(parse.output.pages):
    print(f"Page {i + 1}: {page}")
{
  "id": "parse_01G34H8J2K",
  "organization_id": "org_abc123",
  "file": {
    "id": "file_6dd6eb00688ad8d1",
    "filename": "document.pdf",
    "mime_type": "application/pdf"
  },
  "model": "retab-small",
  "table_parsing_format": "markdown",
  "image_resolution_dpi": 192,
  "output": {
    "pages": [
      "# Document Title\n\nFirst page content with a markdown table...",
      "Second page content continues here...",
      "Third and final page content..."
    ],
    "text": "# Document Title\n\nFirst page content with a markdown table...\n\nSecond page content continues here...\n\nThird and final page content..."
  },
  "usage": {
    "page_count": 3,
    "credits": 1.5
  },
  "created_at": "2024-03-15T10:30:00Z",
  "updated_at": "2024-03-15T10:30:00Z"
}
Parse a document into normalized text and persist the result as a Parse resource that can later be retrieved via GET /v1/parses/{parse_id} or listed via GET /v1/parses.
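The retrieve and list endpoints can also be called over plain HTTP. This minimal sketch only builds the requests with Python's standard library and does not send them. It assumes the API is served from https://api.retab.com (an assumption; the base URL is not stated on this page) and authenticates with the Api-Key header listed under Authorizations. The helper names are hypothetical and not part of the Retab SDK.

```python
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the Retab API (not documented on this page).
BASE_URL = "https://api.retab.com"


def build_get_parse_request(parse_id: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build GET /v1/parses/{parse_id}."""
    # Percent-encode the ID so it is always a single path segment.
    url = f"{BASE_URL}/v1/parses/{quote(parse_id, safe='')}"
    return urllib.request.Request(url, headers={"Api-Key": api_key}, method="GET")


def build_list_parses_request(api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build GET /v1/parses."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/parses", headers={"Api-Key": api_key}, method="GET"
    )


req = build_get_parse_request("parse_01G34H8J2K", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```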

Request Body

document
MIMEData
required
The document to parse. HTTP callers must pass a MIMEData object with filename and url (a data URL or an https URL). The Python and Node SDKs also accept file paths, file-like objects, images, buffers, and URLs and convert them for you.
model
string
default:"retab-small"
The model used for parsing.
table_parsing_format
"markdown" | "yaml" | "html" | "json"
default:"html"
Controls how tables are represented in the parsed text.
image_resolution_dpi
integer
default:192
DPI used when rasterizing pages for OCR-backed parsing. Accepted values are 96 to 300, inclusive.
bust_cache
boolean
default:false
When true, bypass the parse cache and re-run the parse even if an identical request was recently fulfilled.
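HTTP callers must send document as a MIMEData object whose url is a data URL or an https URL. The sketch below assumes the file's bytes are already in memory (for a file on disk, Path("document.pdf").read_bytes() would supply them), encodes them as a base64 data URL, and fills in the documented defaults. Both function names are hypothetical, and the validation only mirrors the options and range listed above; the server remains the authority on what it accepts.

```python
import base64
import mimetypes

# Table formats listed for table_parsing_format.
TABLE_FORMATS = {"markdown", "yaml", "html", "json"}


def to_mime_data(filename: str, content: bytes) -> dict:
    """Hypothetical helper: wrap raw bytes as a MIMEData object with a data URL."""
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(content).decode("ascii")
    return {"filename": filename, "url": f"data:{mime_type};base64,{encoded}"}


def build_parse_body(
    document: dict,
    model: str = "retab-small",
    table_parsing_format: str = "html",
    image_resolution_dpi: int = 192,
    bust_cache: bool = False,
) -> dict:
    """Hypothetical helper: assemble a POST /v1/parses body using the documented defaults."""
    if table_parsing_format not in TABLE_FORMATS:
        raise ValueError(f"unsupported table_parsing_format: {table_parsing_format!r}")
    if not 96 <= image_resolution_dpi <= 300:
        raise ValueError("image_resolution_dpi must be between 96 and 300")
    return {
        "document": document,
        "model": model,
        "table_parsing_format": table_parsing_format,
        "image_resolution_dpi": image_resolution_dpi,
        "bust_cache": bust_cache,
    }


doc = to_mime_data("document.pdf", b"%PDF-1.4 example bytes")
body = build_parse_body(doc, table_parsing_format="markdown")
print(body["model"], body["table_parsing_format"], body["image_resolution_dpi"])
```

Only document is required; sending the defaults explicitly, as here, just makes the request self-describing.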

Response Fields

id
string
Unique parse identifier.
organization_id
string
ID of the organization that owns the parse.
file
object
The parsed document’s file metadata: id, filename, mime_type.
model
string
The model that produced this parse.
table_parsing_format
string
Table rendering format used for this parse.
image_resolution_dpi
integer
DPI used when rasterizing pages.
output
object
The parsed content: pages (a list with one text string per page) and text (the full document text).
usage
RetabUsage | null
Processing usage information including page_count and credits.
created_at
string
ISO 8601 creation timestamp.
updated_at
string
ISO 8601 last update timestamp.
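A short sketch of reading these fields from a decoded response, using the example response at the top of this page as sample data. The helper names are hypothetical. Because usage is typed RetabUsage | null, the credits lookup guards against a missing value. In this sample, text happens to equal the pages joined by a blank line; the page does not state that this holds for every parse, so treat it as an observation about the example rather than a guarantee.

```python
from typing import Optional

# Sample data copied from the example response on this page (other fields omitted).
sample = {
    "id": "parse_01G34H8J2K",
    "output": {
        "pages": [
            "# Document Title\n\nFirst page content with a markdown table...",
            "Second page content continues here...",
            "Third and final page content...",
        ],
        "text": "# Document Title\n\nFirst page content with a markdown table..."
        "\n\nSecond page content continues here...\n\nThird and final page content...",
    },
    "usage": {"page_count": 3, "credits": 1.5},
}


def page_texts(parse: dict) -> list[tuple[int, str]]:
    """Hypothetical helper: (1-based page number, page text) pairs."""
    return [(i + 1, text) for i, text in enumerate(parse["output"]["pages"])]


def credits_used(parse: dict) -> Optional[float]:
    """Hypothetical helper: credits consumed, or None when usage is null or absent."""
    usage = parse.get("usage")
    return usage["credits"] if usage else None


for number, text in page_texts(sample):
    print(f"Page {number}: {text[:40]}")
print("Credits:", credits_used(sample))
```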

Authorizations

Api-Key
string
header
required
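A minimal sketch of an authenticated create call over plain HTTP, built with the standard library but not sent. As with any raw HTTP example here, the base URL https://api.retab.com is an assumption not stated on this page, and the helper name is hypothetical. The body uses only the required document field, passed as a MIMEData object with an https URL (example.com is a placeholder).

```python
import json
import urllib.request

# Assumption: base URL of the Retab API (not documented on this page).
BASE_URL = "https://api.retab.com"


def build_create_parse_request(body: dict, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: POST /v1/parses with the Api-Key header and a JSON body."""
    # urllib stores the header name as "Api-key"; HTTP header names are case-insensitive.
    return urllib.request.Request(
        f"{BASE_URL}/v1/parses",
        data=json.dumps(body).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_parse_request(
    {"document": {"filename": "document.pdf", "url": "https://example.com/document.pdf"}},
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; the response body is the Parse JSON.
```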

Query Parameters

access_token
string | null

Body

application/json
document
MIMEData · object
required

The document to parse

model
string
default:retab-small

The model to use for parsing

table_parsing_format
enum<string>
default:html

Format used to render tables extracted from the document

Available options: markdown, yaml, html, json
image_resolution_dpi
integer
default:192

DPI used when rasterizing pages for the parser

Required range: 96 <= x <= 300
bust_cache
boolean
default:false

If true, bypass the cache and force a fresh parse
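To make the image_resolution_dpi range concrete: rasterizing a page that is w inches wide and h inches tall at a given DPI yields a bitmap of about w x DPI by h x DPI pixels. This small sketch (a hypothetical helper, not part of the Retab SDK) computes that for a US Letter page at the minimum, default, and maximum accepted values.

```python
def raster_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rasterized at the given DPI, rounded to whole pixels."""
    if not 96 <= dpi <= 300:
        raise ValueError("image_resolution_dpi must be between 96 and 300")
    return round(width_in * dpi), round(height_in * dpi)


# US Letter (8.5 x 11 in) at the minimum, default, and maximum accepted DPI.
for dpi in (96, 192, 300):
    w, h = raster_size(8.5, 11, dpi)
    print(f"{dpi} DPI -> {w} x {h} px ({w * h / 1e6:.1f} MP)")
```

Because both dimensions scale with the DPI, the pixel count grows with its square: going from 96 to the default 192 DPI quadruples it.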

Response

Successful Response

The created Parse resource, scoped to the caller's organization.

file
FileRef · object
required

Information about the parsed file

model
string
required

Model used for parsing

table_parsing_format
enum<string>
required

Format used to render tables extracted from the document

Available options: markdown, yaml, html, json
image_resolution_dpi
integer
required

DPI used when rasterizing pages for the parser

output
ParseOutput · object
required

The parsed document content

organization_id
string
required

Organization ID of the user or application

id
string

Unique identifier of the parse

usage
RetabUsage · object

Usage information for the parse operation

created_at
string<date-time>

ISO 8601 creation timestamp

updated_at
string<date-time>

ISO 8601 last update timestamp
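The created_at and updated_at values in the example response end in "Z". Python's datetime.fromisoformat accepts that suffix only from Python 3.11 on, so this minimal sketch (the helper name is hypothetical) rewrites it as an explicit UTC offset, which also works on older versions.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as created_at or updated_at into an aware datetime."""
    # fromisoformat rejects a trailing "Z" before Python 3.11; spell it out as +00:00.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


created = parse_timestamp("2024-03-15T10:30:00Z")
updated = parse_timestamp("2024-03-15T10:30:00Z")
print(created.isoformat(), updated >= created)
```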