Introduction
The `edit` API in Retab’s document processing pipeline enables intelligent document form filling. It supports PDF, Word (DOCX), Excel (XLSX), and PowerPoint (PPTX) files, automatically detecting fillable elements and populating them based on natural language instructions. This is ideal for automating form completion workflows, document generation, and batch processing of standardized forms.
Supported Formats
| Format | Extension | Processing Method |
|---|---|---|
| PDF | .pdf | Computer vision + LLM form field detection |
| Word | .docx, .doc, .odt | Native XML editing (preserves formatting) |
| Excel | .xlsx, .xls, .ods | Native cell editing |
| PowerPoint | .pptx, .ppt, .odp | Native shape/text editing |
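The table above can be mirrored client-side to reject unsupported files before uploading them. This is a minimal sketch: `detect_format` and `PROCESSING_METHODS` are hypothetical helpers, not part of the Retab SDK, and the API performs its own format detection regardless.

```python
from pathlib import Path

# Extension groups copied from the Supported Formats table.
PROCESSING_METHODS = {
    "PDF": ((".pdf",), "Computer vision + LLM form field detection"),
    "Word": ((".docx", ".doc", ".odt"), "Native XML editing"),
    "Excel": ((".xlsx", ".xls", ".ods"), "Native cell editing"),
    "PowerPoint": ((".pptx", ".ppt", ".odp"), "Native shape/text editing"),
}


def detect_format(filename: str) -> tuple[str, str]:
    """Return (format, processing method) for a filename, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    for fmt, (extensions, method) in PROCESSING_METHODS.items():
        if suffix in extensions:
            return fmt, method
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")
```

Matching is case-insensitive, so `Claim.PDF` and `claim.pdf` are treated the same.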
How It Works
For PDF files:
- Computer Vision: Detect form field bounding boxes with precise coordinates
- LLM Inference: Name and classify detected fields semantically
- Intelligent Filling: Match your instructions to the appropriate form fields
- PDF Generation: Create a new PDF with the filled values
For Office files (Word, Excel, PowerPoint):
- Structure Extraction: Parse the document’s XML structure to identify all text elements
- Element Detection: LLM identifies fillable placeholders (empty cells, whitespace after labels, placeholder text)
- Native Editing: Apply edits directly to the XML, preserving all original formatting and styles
- Document Generation: Return the filled document in its original format
The `edit` API provides:
- Multi-Format Support: PDF, Word, Excel, and PowerPoint files
- Zero Configuration: No need to pre-define field positions or create templates
- Natural Language Instructions: Describe what to fill in plain English or JSON
- Format Preservation: Office files retain all original formatting, styles, and layout
- Automatic Field Matching: LLM intelligently maps your data to form fields
- MIMEData Output: Get the filled document as MIMEData with filename and base64 content
SDK Surface
The edit SDK exposes the same workflow in both clients:
- `client.edits.create(...)` for direct document filling (pass `document=...`) or template-based filling (pass `template_id=...`)
- `client.edits.templates.create(...)` to register a reusable PDF form template
- `client.edits.templates.fill(...)` to fill a saved template with new instructions
In both SDKs, `color` is a top-level argument on `edits.create(...)` and `edits.templates.fill(...)`.
Edit API
The Edit API provides two main approaches:
- Direct Fill (`edits.create` with `document=...`): AI-powered document filling that automatically detects and fills form fields
- Template Fill (`edits.templates.fill`): Optimized filling using pre-defined templates (PDF only)
Direct Fill
The main endpoint for filling documents with AI. Supports all document formats.
Returns an `Edit` resource with the filled document and form data.
Create Template (Pre-Register A Form)
Register a reusable PDF form template with its empty PDF and a list of form fields. You typically detect those fields in the dashboard first, or reuse a form schema you already own, and then persist the template for fast batch filling.
Use Case: PDF Form Filling
Fill PDF forms programmatically using natural language instructions.
Use Case: Create a Template
Persist an empty PDF together with its list of form fields as a reusable template. Templates are the recommended way to fill the same form repeatedly: they skip field detection on every fill and guarantee consistent field mapping.
You can discover field bounding boxes visually in the Retab dashboard editor, then pass the resulting `form_fields` list into `edits.templates.create(...)` to persist it.
Use Case: Template-Based Form Filling
Use pre-defined templates for consistent form filling without re-detecting fields each time. This is optimized for batch processing scenarios.
When to use templates vs direct fill:
- Use `edits.templates.fill()` for batch processing the same PDF form with different data
- Use `edits.create()` with `document=...` for one-off document filling or when you don’t have a pre-defined template
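The choice between the two paths can be wrapped in one small dispatcher. The sketch below is an illustration under assumptions: `fill_document` is a hypothetical helper, the `instructions` keyword and the `template_id` keyword on `edits.templates.fill` are guesses at the SDK signature, and only `document`, `template_id` (on `edits.create`), and `color` are names taken from this guide. Any other arguments the SDK requires, such as a model name, would need to be added.

```python
from typing import Any, Optional


def fill_document(
    client: Any,
    instructions: str,
    *,
    document: Optional[str] = None,
    template_id: Optional[str] = None,
    color: Optional[str] = None,
) -> Any:
    """Route to template fill when a template id is given, else direct fill."""
    if (document is None) == (template_id is None):
        raise ValueError("Pass exactly one of document or template_id")

    # Assumed keyword name: `instructions` is not confirmed by this guide.
    kwargs: dict[str, Any] = {"instructions": instructions}
    if color is not None:
        kwargs["color"] = color  # top-level argument on both calls

    if template_id is not None:
        # Assumed keyword name on templates.fill: `template_id`.
        return client.edits.templates.fill(template_id=template_id, **kwargs)
    return client.edits.create(document=document, **kwargs)
```

Because the client is passed in, the same function can be exercised against a stub in tests before it is pointed at a real Retab client.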
Use Case: Batch Form Processing
Process multiple forms with different data programmatically. For best performance with repeated forms, use templates.
Use Case: Word Document Filling
Fill Word documents (DOCX) while preserving all original formatting and styles.
Use Case: Excel Spreadsheet Filling
Fill Excel spreadsheets by targeting specific cells with data.
Use Case: PowerPoint Presentation Filling
Fill PowerPoint presentations with dynamic content.
Best Practices
Model Selection
- `retab-large`: Most accurate for complex documents with many fields or ambiguous layouts. Recommended for production use.
- `retab-small`: Faster and more cost-effective, suitable for simple documents with clear field labels.
Format-Specific Tips
PDF Forms:
- Works best with forms that have clear field labels
- Supports both text fields and checkboxes
- For checkboxes, use “checked” or “unchecked” as the value
- Use `edits.templates.create()` to register a reusable template with known form fields
- Use the `color` parameter to customize the filled text color (e.g., `color="#FF0000"` for red)
Word Documents:
- Best for documents with placeholders like `[Enter name]` or blank spaces after labels
- All formatting (fonts, styles, colors) is preserved
- Tables are fully supported
Excel Spreadsheets:
- Ideal for templates with empty cells next to labels
- Supports multiple sheets
- Cell references use standard Excel notation (Sheet1!A1)
PowerPoint Presentations:
- Works with text placeholders in shapes
- Tables within slides are supported
- Preserves all slide formatting and layouts
Writing Effective Filling Instructions
- Use JSON for structured data: `{"name": "John", "date": "2025-01-15"}` works well
- Be explicit: Use field labels that match or closely resemble those in the document
- Use key-value pairs: Format as “Field Name: Value” for best matching
- Include context: If a document has multiple similar fields, add context like “Section A - Name: John”
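The tips above can be applied mechanically when the data already lives in a dictionary. The helpers below are hypothetical (not part of the SDK) and show one way to produce either a JSON string or “Field Name: Value” lines with an optional context prefix. Joining the pairs with newlines is an assumption, and booleans are mapped to the “checked”/“unchecked” values described for PDF checkboxes.

```python
import json
from typing import Any, Mapping, Optional


def normalize(value: Any) -> Any:
    """Map booleans to the checkbox strings; leave other values unchanged."""
    if isinstance(value, bool):
        return "checked" if value else "unchecked"
    return value


def as_json(data: Mapping[str, Any]) -> str:
    """Serialize structured data as a JSON instruction string."""
    return json.dumps({label: normalize(value) for label, value in data.items()})


def as_key_value_lines(data: Mapping[str, Any], context: Optional[str] = None) -> str:
    """Render 'Field Name: Value' lines, optionally prefixed with context."""
    prefix = f"{context} - " if context else ""
    return "\n".join(
        f"{prefix}{label}: {normalize(value)}" for label, value in data.items()
    )
```

For example, `as_key_value_lines({"Name": "John"}, context="Section A")` produces `Section A - Name: John`, matching the disambiguation pattern from the last tip.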
Working with MIMEData
- The `filled_document` response is a MIMEData object with `filename` and `url` properties
- The `url` is a data URI with a format-appropriate MIME type
- Extract base64 content by splitting on comma: `url.split(",")[1]`
- The Python SDK accepts file paths directly and handles MIMEData conversion automatically
- Output format matches input format (DOCX in → DOCX out, XLSX in → XLSX out)
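Turning the data URI back into a file needs only the standard library. The sketch below assumes the `url` has the usual `data:<mime>;base64,<payload>` shape; `decode_data_uri` and `save_filled_document` are hypothetical helper names, not SDK functions.

```python
import base64
from pathlib import Path


def decode_data_uri(url: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("Expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def save_filled_document(filename: str, url: str, out_dir: str = ".") -> Path:
    """Write the decoded document into out_dir and return its path."""
    _, content = decode_data_uri(url)
    path = Path(out_dir) / Path(filename).name  # drop any directory part
    path.write_bytes(content)
    return path
```

With a response object in hand, the call would look like `save_filled_document(result.filled_document.filename, result.filled_document.url)`, where `result` stands for whatever the fill call returned.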
Template Workflow
- Register: Use `edits.templates.create()` with a blank PDF and the list of form fields (bounding boxes + descriptions)
- Review: Inspect `template.form_fields` to verify coverage
- Fill: Use `edits.templates.fill()` for fast batch processing against the saved template
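For the fill step, a batch loop usually should not stop at the first bad record. The sketch below is one possible shape, not an SDK feature: `batch_fill` is a hypothetical helper that takes any callable performing a single fill, so the actual SDK call and its keyword names stay in the caller's hands.

```python
from typing import Any, Callable, Iterable


def batch_fill(
    fill_one: Callable[[str], Any],
    instruction_sets: Iterable[str],
) -> tuple[list[Any], list[tuple[int, Exception]]]:
    """Call fill_one for each instruction string, collecting failures."""
    results: list[Any] = []
    failures: list[tuple[int, Exception]] = []
    for index, instructions in enumerate(instruction_sets):
        try:
            results.append(fill_one(instructions))
        except Exception as exc:  # keep going; report the failing index
            failures.append((index, exc))
    return results, failures


# Possible wiring (keyword names are assumptions, not confirmed SDK signatures):
# results, failures = batch_fill(
#     lambda text: client.edits.templates.fill(
#         template_id="your-template-id", instructions=text
#     ),
#     ["Name: John", "Name: Jane"],
# )
```

Failures are returned as `(index, exception)` pairs so the affected records can be retried or inspected after the run.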