The `parse` method in Retab’s document processing pipeline converts any document into cleaned, raw markdown text with page-by-page extraction. This endpoint is ideal for extracting cleaned document content to be used as context for downstream processing, such as RAG pipelines, custom ingestion pipelines, embeddings classification, and content indexing workflows.
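As a rough illustration of how page-by-page markdown could feed a RAG or embeddings pipeline, the sketch below splits page texts into overlapping character windows tagged with their page number. The `pages` list stands in for parsed output; `Chunk` and `chunk_pages` are hypothetical helpers, not part of the Retab SDK, and the window sizes are arbitrary defaults.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number the chunk came from
    text: str  # chunk content


def chunk_pages(pages: list[str], max_chars: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split each page's markdown into overlapping, fixed-size character windows."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    step = max_chars - overlap
    chunks: list[Chunk] = []
    for page_number, text in enumerate(pages, start=1):
        text = text.strip()
        # Slide a window over the page; consecutive windows share `overlap` characters.
        for start in range(0, len(text), step):
            chunks.append(Chunk(page=page_number, text=text[start:start + max_chars]))
            if start + max_chars >= len(text):
                break  # this window already reached the end of the page
    return chunks


# Stand-in data: one long page, one empty page, one short page.
pages = ["a" * 1000, "", "short"]
chunks = chunk_pages(pages)
```

Keeping the page number on each chunk makes it possible to cite the source page when a chunk is later retrieved. Empty pages simply produce no chunks.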
The typical RAG workflow follows these steps:
The `parse` method provides:
- `gemini-2.5-pro`: Most accurate and robust model, recommended for complex or high-stakes document parsing tasks.
- `gemini-2.5-flash`: Best for speed and cost-effectiveness, suitable for most general-purpose documents.
- `gemini-2.5-flash-lite`: Fastest and most cost-efficient, ideal for simple documents or high-volume batch processing where maximum throughput is needed.

Use `doc_msg.items` to get a list of `[PIL.Image.Image | str]` objects.
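A minimal sketch of consuming a mixed list of strings and images like this one: strings are collected as text and everything else is treated as an image. `split_items` is a hypothetical helper, not a Retab function, and the placeholder object stands in for a `PIL.Image.Image` so the example runs without Pillow installed.

```python
from typing import Any


def split_items(items: list[Any]) -> tuple[list[str], list[Any]]:
    """Separate text items from non-text (image) items, preserving order."""
    texts: list[str] = []
    images: list[Any] = []
    for item in items:
        if isinstance(item, str):
            texts.append(item)
        else:
            # Assumption: anything that is not a string is an image object.
            images.append(item)
    return texts, images


# Usage with stand-in data; a real list would hold PIL.Image.Image objects.
fake_image = object()
texts, images = split_items(["# Page 1", fake_image, "Some paragraph text"])
```

With real data, a stricter check such as `isinstance(item, PIL.Image.Image)` would catch unexpected item types instead of silently treating them as images.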