Introduction
The parses.create method turns a document into normalized text content, returned both page-by-page and as one combined string. It is the right tool when you need readable document text for RAG pipelines, search indexing, prompting, debugging, or any workflow that works on free text rather than schema-constrained extraction.
Parse is a first-class resource: client.parses.create(...) (Python) and client.parses.create({...}) (Node) persist the result and make it retrievable via client.parses.get(id) and client.parses.list().
Unlike extract, parse does not try to fit the document into a JSON schema. Instead, it returns a Parse resource with:
- output.pages: one parsed string per page
- output.text: the full document content as a single string
- file: basic file metadata (id, filename, mime_type)
- usage: page count and credits consumed
- id, created_at, updated_at: resource identifiers and timestamps
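As a rough illustration of that shape, the sketch below mirrors the listed fields with plain Python dataclasses. The class names (ParseOutput, ParseFile, ParseResource), the summarize helper, the key used inside usage, and all sample values are hypothetical stand-ins rather than the SDK's own types; the timestamps are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ParseOutput:
    pages: list[str]  # one parsed string per page
    text: str         # full document content as a single string


@dataclass
class ParseFile:
    id: str
    filename: str
    mime_type: str


@dataclass
class ParseResource:
    id: str
    output: ParseOutput
    file: ParseFile
    # Key names inside usage are assumptions; the description only says it
    # holds the page count and credits consumed.
    usage: dict = field(default_factory=dict)


def summarize(parse: ParseResource) -> str:
    """Return a one-line summary, handy when debugging a parse."""
    return (
        f"{parse.file.filename} ({parse.file.mime_type}): "
        f"{len(parse.output.pages)} pages, {len(parse.output.text)} characters"
    )


# Hypothetical sample standing in for a real API response.
sample = ParseResource(
    id="parse_123",
    output=ParseOutput(pages=["Page one.", "Page two."], text="Page one.\nPage two."),
    file=ParseFile(id="file_abc", filename="report.pdf", mime_type="application/pdf"),
    usage={"page_count": 2},
)
print(summarize(sample))
```

In real use the object would come from client.parses.create(...) or client.parses.get(id) instead of being built by hand.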
Tables can be rendered as html, markdown, yaml, or json, depending on what your downstream system expects.
For chunking, chonkie is a good fit for RAG-style pipelines.
Parse API
A persisted parse resource with text content and usage metadata.
Use Case: Preparing Documents For RAG
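A minimal sketch of the RAG preparation step, assuming Retab has already returned the parsed pages as a list of strings (as in output.pages) and the application does the chunking itself. The chunk_pages helper and its fixed-size word window are hypothetical and stand in for a real chunker such as chonkie; numbering pages from 1 is also an assumption, and the sample pages are made up.

```python
def chunk_pages(pages: list[str], max_words: int = 50) -> list[dict]:
    """Split each page into word-window chunks, tagging each with its page number."""
    chunks = []
    # Pages are numbered from 1 here; adjust if your index uses 0-based pages.
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), max_words):
            chunks.append(
                {
                    "page": page_number,
                    "text": " ".join(words[start:start + max_words]),
                }
            )
    return chunks


# Made-up pages standing in for parse.output.pages.
pages = ["alpha beta gamma delta", "epsilon zeta"]
chunks = chunk_pages(pages, max_words=3)
for chunk in chunks:
    print(chunk["page"], chunk["text"])
```

Chunking per page rather than on the combined text keeps chunks from straddling page boundaries, which makes the page number on each chunk unambiguous.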
This pattern is useful when you want Retab to handle document parsing and your application to handle chunking and indexing.
Best Practices
When To Use Parse
- Use parse when your downstream system wants readable text.
- Use extract when you need typed fields that match a schema.
Picking A Table Format
- Use markdown for chunking, prompting, and most RAG pipelines.
- Use html when preserving table structure matters more than readability.
- Use json or yaml when another parser will consume the table output directly.
Choosing DPI
- Start with 192 DPI for general-purpose parsing.
- Drop to 96 DPI when throughput matters more than OCR quality.
- Increase toward 300 DPI for scans, fine print, or low-quality images.
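The DPI guidance above can be encoded as a small lookup so the choice is made in one place. The choose_dpi helper and its profile names are hypothetical; how the value is passed to the parse call is left out because the parameter name is not stated here.

```python
# DPI values taken from the guidance above; profile names are illustrative.
DPI_BY_PROFILE = {
    "throughput": 96,   # speed matters more than OCR quality
    "general": 192,     # general-purpose starting point
    "scan": 300,        # scans, fine print, low-quality images
}


def choose_dpi(profile: str = "general") -> int:
    """Return the suggested DPI for a named profile."""
    try:
        return DPI_BY_PROFILE[profile]
    except KeyError:
        known = ", ".join(sorted(DPI_BY_PROFILE))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {known}") from None


print(choose_dpi())        # default profile
print(choose_dpi("scan"))  # low-quality input
```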
Indexing Advice
- Store the page number with every chunk you create from parse.output.pages.
- Keep the original parse.file.id or parse.file.filename alongside indexed text so retrieval results remain traceable.
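One way to keep that provenance is sketched below with plain dictionaries. The build_index_records and cite helpers, the record keys, and the sample values are hypothetical; only the file id, the filename, and the per-page text correspond to fields of the Parse resource. Pages are numbered from 1 here as an assumption.

```python
def build_index_records(file_id: str, filename: str, pages: list[str]) -> list[dict]:
    """Create one index record per non-blank page, carrying file provenance."""
    records = []
    for page_number, text in enumerate(pages, start=1):
        if not text.strip():
            continue  # skip blank pages but keep the original page numbering
        records.append(
            {
                "file_id": file_id,
                "filename": filename,
                "page": page_number,
                "text": text,
            }
        )
    return records


def cite(record: dict) -> str:
    """Format a human-readable source reference for a retrieval hit."""
    return f"{record['filename']} (file {record['file_id']}), page {record['page']}"


# Made-up values standing in for parse.file.id, parse.file.filename,
# and parse.output.pages.
records = build_index_records("file_abc", "report.pdf", ["Intro", "   ", "Totals"])
print(cite(records[-1]))
```

Storing both the file id and the filename is redundant but cheap: the id stays stable for lookups, while the filename is what a reader recognizes in a citation.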