Introduction
What is Retab?
Retab addresses the main challenges of processing documents with LLMs:
- Universal Document Preprocessing: Convert any file type (PDFs, Excel, emails, etc.) into an LLM-ready format without writing custom parsers
- Structured, Schema-driven Extraction: Get consistent, reliable outputs using schema-based prompt engineering
- Processors: Publish a live, stable, shareable document processor.
- Automations: Create document processing workflows that can be triggered by events (mailbox, upload link, endpoint, Outlook plugin).
- Evaluations: Evaluate the performance of models against annotated datasets
- Optimizations: Identify your most-used processors and fine-tune models to reduce costs and improve performance
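To make the idea of schema-driven extraction concrete, here is a minimal sketch in plain Python. It does not use the Retab SDK: the `INVOICE_SCHEMA` fields and the `validate_extraction` helper are hypothetical names chosen for illustration, and the raw JSON string stands in for whatever a model returns. It only shows why a schema gives consistent outputs: every result is checked against the same set of fields and types.

```python
import json

# Hypothetical schema for an invoice; field names are illustrative only
# and do not come from the Retab documentation.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "total_amount": float,
    "currency": str,
}


def validate_extraction(raw_output: str, schema: dict) -> dict:
    """Parse a model's JSON output and keep only fields that match the schema."""
    data = json.loads(raw_output)

    # Every schema field must be present in the output.
    missing = [key for key in schema if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    result = {}
    for key, expected_type in schema.items():
        value = data[key]
        # JSON has a single number type, so accept ints where floats are expected.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        result[key] = value  # fields outside the schema are dropped
    return result


if __name__ == "__main__":
    raw = '{"invoice_number": "INV-001", "total_amount": 120, "currency": "EUR"}'
    print(validate_extraction(raw, INVOICE_SCHEMA))
```

In practice a schema is usually expressed as JSON Schema or a typed model rather than a dict of Python types; the dict here just keeps the sketch dependency-free.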
We provide the software-defined primitives you need to build your own document processing solutions. We think of it as Stripe for document processing.
Our goal is to make the process of analyzing documents and unstructured data as easy and transparent as possible.
Many people haven’t yet realized how capable LLMs have become at document processing tasks, and we’re here to help unlock these capabilities.
Go further
Jupyter Notebooks
You can view minimal notebooks that demonstrate how to use Retab to process documents:
- Mailbox creation quickstart
- Upload Links creation quickstart
- Document Extractions quickstart
- Document Extractions quickstart - Async
Community
Let’s create the future of document processing together!
Join our Discord community to share tips, discuss best practices, and showcase what you build. Or just tweet at us.
We can’t wait to see how you’ll use Retab.
Roadmap
We share our roadmap publicly on GitHub.
Among the features we’re working on:
- Node.js SDK
- Low-level speed optimizations for Evals Frontend
- New JSON Reconciliation Model