from retab import Retab

client = Retab()

# Convert the invoice into chat messages using the "text" modality.
response = client.documents.create_messages(
    document="Invoice.pdf",
    modality="text",
)
{
  "id": "doc_dTjnQl0d-_aTim9hFbHkO",
  "object": "document_message",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "<Unstructured Text Content>\n \u00a0\nINV-2024-001 \u00b7 $XX.XX USD due [DATE]\nPage 1 of 1\nInvoice\nInvoice number INV-2024-001\nDate of issue\n[DATE]\nDate due\n[DATE]\n[VENDOR COMPANY]\n[VENDOR ADDRESS LINE 1]\n[VENDOR ADDRESS LINE 2]\n[VENDOR CITY], [VENDOR STATE] [VENDOR ZIP]\n[VENDOR COUNTRY]\n[VENDOR EMAIL]\n[VENDOR TAX ID]\nBill to\n[CUSTOMER COMPANY]\n[CUSTOMER ADDRESS LINE 1]\n[CUSTOMER ADDRESS LINE 2]\n[CUSTOMER CITY], [CUSTOMER STATE] [CUSTOMER ZIP]\n[CUSTOMER COUNTRY]\n[CUSTOMER EMAIL]\nShip to\n[RECIPIENT NAME]\n[SHIPPING ADDRESS LINE 1]\n[SHIPPING ADDRESS LINE 2]\n[SHIPPING CITY], [SHIPPING STATE] [SHIPPING ZIP]\n[SHIPPING COUNTRY]\n$XX.XX USD due [DATE]\nPay online\nDescription\nQty\nUnit price\nAmount\n[SERVICE DESCRIPTION]\n[SERVICE PERIOD]\n1\n$XX.XX\n$XX.XX\nSubtotal\n$XX.XX\nTotal\n$XX.XX\nAmount due\n$XX.XX\u00a0USD\n\u00a0\n\n </Unstructured Text Content>\n \n <Structured Text Content from LLM>\n # Invoice\n\nInvoice number INV-2024-001\nDate of issue [DATE]\nDate due [DATE]\n\n[VENDOR COMPANY]\n[VENDOR ADDRESS LINE 1]\n[VENDOR ADDRESS LINE 2]\n[VENDOR CITY], [VENDOR STATE] [VENDOR ZIP]\n[VENDOR COUNTRY]\n[VENDOR EMAIL]\n[VENDOR TAX ID]\n\nBill to\n[CUSTOMER COMPANY]\n[CUSTOMER ADDRESS LINE 1]\n[CUSTOMER ADDRESS LINE 2]\n[CUSTOMER CITY], [CUSTOMER STATE] [CUSTOMER ZIP]\n[CUSTOMER COUNTRY]\n[CUSTOMER EMAIL]\n\nShip to\n[RECIPIENT NAME]\n[SHIPPING ADDRESS LINE 1]\n[SHIPPING ADDRESS LINE 2]\n[SHIPPING CITY], [SHIPPING STATE] [SHIPPING ZIP]\n[SHIPPING COUNTRY]\n\n$XX.XX USD due [DATE]\n\nPay online\n\n~~~yaml\ntable:\n caption: \"\"\n dimensions:\n rows: 1\n columns: 4\n header:\n - [Description, Qty, Unit price, Amount]\n body:\n - [[SERVICE DESCRIPTION], 1, $XX.XX, $XX.XX]\n~~~\n\n[SERVICE PERIOD]\n\n~~~yaml\ntable:\n caption: \"\"\n dimensions:\n rows: 3\n columns: 2\n body:\n - [Subtotal, $XX.XX]\n - [Total, $XX.XX]\n - [Amount due, $XX.XX USD]\n~~~\n\nINV-2024-001 - $XX.XX USD due [DATE]\n\nPage 1 of 1\n </Structured Text Content from LLM>\n ",
          "type": "text"
        }
      ]
    }
  ],
  "created": 1748268213,
  "token_count": {
    "total_tokens": 615,
    "developer_tokens": 0,
    "user_tokens": 615
  }
}
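The response above can be unpacked with the Python standard library alone. The sketch below is illustrative, not part of the Retab SDK: it works on a plain dictionary shaped like the example response (with the text shortened), and the helper name extract_section is hypothetical. The check that total_tokens equals developer_tokens plus user_tokens is an assumption drawn from the example values, not a documented guarantee.

```python
import re
from datetime import datetime, timezone

# Stand-in for a parsed response; the text is shortened for readability.
response = {
    "id": "doc_dTjnQl0d-_aTim9hFbHkO",
    "object": "document_message",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "<Unstructured Text Content>\nInvoice number INV-2024-001\n"
                        "</Unstructured Text Content>\n"
                        "<Structured Text Content from LLM>\n# Invoice\n"
                        "</Structured Text Content from LLM>\n"
                    ),
                }
            ],
        }
    ],
    "created": 1748268213,
    "token_count": {"total_tokens": 615, "developer_tokens": 0, "user_tokens": 615},
}


def extract_section(text: str, tag: str) -> str:
    """Return the text between <tag> and </tag>, or "" if the tag is absent.

    Hypothetical helper for illustration only.
    """
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else ""


# Collect every text part of every message into one string.
full_text = "".join(
    part["text"]
    for message in response["messages"]
    for part in message["content"]
    if part["type"] == "text"
)

structured = extract_section(full_text, "Structured Text Content from LLM")
# "created" is a Unix timestamp in seconds; convert it to an aware UTC datetime.
created_at = datetime.fromtimestamp(response["created"], tz=timezone.utc)
counts = response["token_count"]

print(structured)              # # Invoice
print(created_at.isoformat())  # 2025-05-26T14:03:33+00:00
# Assumption based on the example values: total = developer + user.
print(counts["total_tokens"] == counts["developer_tokens"] + counts["user_tokens"])  # True
```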
A route for converting any document into OpenAI-compatible chat messages. You can choose preprocessing parameters to suit your needs: the modality (text, image, or native) and image settings (dpi, browser_canvas, etc.).
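Because the returned messages already follow the chat format, they can be combined with your own instructions before being passed to a chat completion endpoint. The following is a minimal sketch that operates on a plain dictionary shaped like the example response and makes no network call; the helper name build_chat_messages and the instruction text are hypothetical and not part of the Retab SDK.

```python
from typing import Any


def build_chat_messages(
    document_message: dict[str, Any], instructions: str
) -> list[dict[str, Any]]:
    """Prepend a system prompt to the messages of a document_message payload.

    `document_message` is assumed to be a dict shaped like the example
    response (for instance, the parsed JSON body). Hypothetical helper,
    not part of the Retab SDK.
    """
    system_message = {"role": "system", "content": instructions}
    # Build a new list so the original payload is left untouched.
    return [system_message, *document_message["messages"]]


# Trimmed-down stand-in for the example response.
example = {
    "id": "doc_dTjnQl0d-_aTim9hFbHkO",
    "object": "document_message",
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "Invoice number INV-2024-001"}],
        }
    ],
}

chat_messages = build_chat_messages(example, "Extract the invoice number.")
print([m["role"] for m in chat_messages])  # ['system', 'user']
```

The resulting list can then be supplied as the messages argument of whichever OpenAI-compatible client you use.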
Successful Response
id: A unique identifier for the document loading operation.
messages: A list of messages containing the document content and metadata.
created: The Unix timestamp (in seconds) of when the document was loaded.
token_count: The token count for the document message, calculated from both the text and image content of the messages. It is returned as a TokenCount object (a Pydantic model) with total, user, and developer token counts.
object: The type of object being loaded. Value: "document_message".