POST /v1/documents/extract
from retab import Retab

client = Retab()
response = client.documents.extract(
    json_schema="Invoice_schema.json",
    document="Invoice.pdf",
    model="gpt-4.1-nano",
    temperature=0
)
{
    "content": {
        "id": "chatcmpl-AoBs45TNWTB1VKGSXV7NAwCnxMaNN",
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "logprobs": null,
                "message": {
                    "content": "{\"name\": \"Confirmation d'affr\\u00e9tement\", \"date\": \"2024-11-08\"}",
                    "refusal": null,
                    "role": "assistant",
                    "audio": null,
                    "function_call": null,
                    "tool_calls": [],
                    "parsed": {
                        "name": "Confirmation d'affr\u00e9tement",
                        "date": "2024-11-08"
                    }
                }
            }
        ],
        "created": 1736525396,
        "model": "gpt-4.1-nano",
        "object": "chat.completion",
        "service_tier": "default",
        "system_fingerprint": "fp_f2cd28694a",
        "usage": {
            "completion_tokens": 20,
            "prompt_tokens": 2760,
            "total_tokens": 2780,
            "completion_tokens_details": {
                "accepted_prediction_tokens": 0,
                "audio_tokens": 0,
                "reasoning_tokens": 0,
                "rejected_prediction_tokens": 0
            },
            "prompt_tokens_details": {
                "audio_tokens": 0,
                "cached_tokens": 0
            }
        },
        "likelihoods": {
            "name": 0.7227993785831323,
            "date": 0.7306298416895017
        }
    },
    "error": null
}
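
As a rough sketch of how the sample response above might be consumed, the snippet below works on a plain dict trimmed from that payload rather than on an SDK object. It assumes the wrapped `content` / `error` shape shown in the sample; the helper name `extract_fields` is hypothetical and not part of the Retab SDK.

```python
import json

# Trimmed copy of the sample response above (only the keys used here).
sample = {
    "content": {
        "choices": [
            {
                "message": {
                    "content": "{\"name\": \"Confirmation d'affr\\u00e9tement\", \"date\": \"2024-11-08\"}",
                    "parsed": {
                        "name": "Confirmation d'affr\u00e9tement",
                        "date": "2024-11-08",
                    },
                }
            }
        ],
        "likelihoods": {"name": 0.7227993785831323, "date": 0.7306298416895017},
    },
    "error": None,
}


def extract_fields(response: dict) -> dict:
    """Return the extracted fields from the first choice (hypothetical helper)."""
    if response.get("error") is not None:
        raise RuntimeError(f"extraction failed: {response['error']}")
    message = response["content"]["choices"][0]["message"]
    # Prefer the pre-parsed object; fall back to decoding the raw JSON string.
    parsed = message.get("parsed")
    return parsed if parsed is not None else json.loads(message["content"])


fields = extract_fields(sample)
print(fields["date"])
```

In the sample, `message.content` holds the same data as `message.parsed`, only serialized as a JSON string, so either can be used as the source of the extracted values.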

Authorizations

Api-Key
string
header
required

Headers

Idempotency-Key
string | null
Idempotency-ForceRefresh
boolean
default:false
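
The headers above could be assembled along these lines when calling the endpoint over raw HTTP. This is a minimal sketch: the helper `build_headers` is hypothetical, and sending the boolean header as the string "true" is an assumption, since the reference does not specify its wire format or the exact idempotency semantics.

```python
import uuid


def build_headers(api_key, idempotency_key=None, force_refresh=False):
    """Build request headers for the extract endpoint (hypothetical helper)."""
    headers = {"Api-Key": api_key, "Content-Type": "application/json"}
    if idempotency_key is not None:
        # Reuse the same key when retrying the same logical request.
        headers["Idempotency-Key"] = idempotency_key
    if force_refresh:
        # Assumption: the boolean is serialized as the literal string "true".
        headers["Idempotency-ForceRefresh"] = "true"
    return headers


key = str(uuid.uuid4())  # one key per logical request
headers = build_headers("YOUR_API_KEY", idempotency_key=key)
```

Generating the key once and storing it alongside the request lets a retry send the identical value instead of a fresh one.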

Body

application/json
model
string
required

Model used for chat completion

json_schema
object
required

JSON schema format used to validate the output data.

document
object

Document to be analyzed

documents
MIMEData · object[]

Documents to be analyzed (preferred over document)

image_resolution_dpi
integer
default:96

Resolution of the image sent to the LLM

temperature
number
default:0

Temperature for sampling. If not provided, the default temperature for the model will be used.

Examples:

0

reasoning_effort
enum<string> | null
default:minimal

The effort level for the model to reason about the input data. If not provided, the default reasoning effort for the model will be used.

Available options:
minimal,
low,
medium,
high
n_consensus
integer
default:1

Number of consensus models to use for extraction. If greater than 1, the temperature cannot be 0.
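
The constraint between n_consensus and temperature can be checked client-side before sending a request. The function below is an illustrative sketch, not part of the SDK; the name `check_consensus_settings` is hypothetical, and the lower bound of 1 is an assumption drawn from the documented default.

```python
def check_consensus_settings(n_consensus=1, temperature=0.0):
    """Raise ValueError if the settings conflict with the documented rule."""
    if n_consensus < 1:
        # Assumption: values below the default of 1 are not meaningful.
        raise ValueError("n_consensus must be at least 1")
    if n_consensus > 1 and temperature == 0:
        # Documented rule: consensus runs require a non-zero temperature.
        raise ValueError("temperature cannot be 0 when n_consensus > 1")


check_consensus_settings(n_consensus=1, temperature=0)    # passes
check_consensus_settings(n_consensus=3, temperature=0.5)  # passes
```

Calling it with `n_consensus=3, temperature=0` raises a `ValueError`.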

stream
boolean
default:false

If true, the extraction will be streamed to the client over the active WebSocket connection.

seed
integer | null

Seed for the random number generator. If not provided, a random seed will be generated.

Examples:

null

store
boolean
default:true

If true, the extraction will be stored in the database

need_validation
boolean
default:false

If true, the extraction will be validated against the schema

test_exception
enum<string> | null
Available options:
before_handle_extraction,
within_extraction_parse_or_stream,
after_handle_extraction,
within_process_document_stream_generator

Response

Successful Response

id
string
required
choices
RetabParsedChoice · object[]
required
created
integer
required
model
string
required
object
string
required
Allowed value: "chat.completion"
service_tier
enum<string> | null
Available options:
auto,
default,
flex,
scale,
priority
system_fingerprint
string | null
usage
object | null
extraction_id
string | null
likelihoods
object | null

Object defining the uncertainties of the fields extracted when using consensus. Follows the same structure as the extraction object.
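
Because likelihoods mirrors the structure of the extraction, one way to use it is to walk both levels and flag fields whose score falls below a cutoff. The sketch below assumes scores are plain numbers nested inside dicts (lists are not handled); the function name `low_confidence_fields` and the 0.75 threshold are illustrative choices, not documented values.

```python
def low_confidence_fields(likelihoods, threshold=0.75, prefix=""):
    """Return dotted paths of fields whose likelihood is below the threshold."""
    flagged = []
    for key, value in likelihoods.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested objects mirror nested objects in the extraction.
            flagged.extend(low_confidence_fields(value, threshold, path))
        elif isinstance(value, (int, float)) and value < threshold:
            flagged.append(path)
    return flagged


# Values taken from the sample response for this endpoint.
likelihoods = {"name": 0.7227993785831323, "date": 0.7306298416895017}
print(low_confidence_fields(likelihoods))         # ['name', 'date']
print(low_confidence_fields(likelihoods, 0.725))  # ['name']
```

Flagged paths could then be routed to manual review while the remaining fields are accepted automatically.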

schema_validation_error
object | null
request_at
string<date-time> | null

Timestamp of the request

first_token_at
string<date-time> | null

Timestamp of the first token of the document. For non-streaming requests, this is set to last_token_at.

last_token_at
string<date-time> | null

Timestamp of the last token of the document
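
The three timestamps can be combined to estimate latency. The sketch below assumes they arrive as ISO 8601 strings with an explicit UTC offset, which `datetime.fromisoformat` can parse; the timestamp values are made up for illustration and the helper name `timing_summary` is hypothetical.

```python
from datetime import datetime


def timing_summary(request_at, first_token_at, last_token_at):
    """Compute time to first token and total duration, in seconds."""
    start = datetime.fromisoformat(request_at)
    first = datetime.fromisoformat(first_token_at)
    last = datetime.fromisoformat(last_token_at)
    return {
        "time_to_first_token_s": (first - start).total_seconds(),
        "total_s": (last - start).total_seconds(),
    }


# Illustrative values only, not taken from a real response.
summary = timing_summary(
    "2025-01-10T16:09:56+00:00",
    "2025-01-10T16:09:57.500000+00:00",
    "2025-01-10T16:09:59+00:00",
)
print(summary)  # {'time_to_first_token_s': 1.5, 'total_s': 3.0}
```

For non-streaming requests the two figures coincide, since first_token_at is set to last_token_at. All three fields are nullable, so a real caller would check for None before parsing.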