What are Workflows?
Workflows are visual, block-based pipelines that let you chain together multiple document processing operations. Instead of writing code for each step, you can drag and drop blocks onto a canvas, connect them, and create powerful document automation flows.
A workflow typically consists of:
- Input blocks - Entry points for data:
  - Document - Upload files (PDF, images, Word, Excel)
  - JSON Input - Pass structured JSON data
- Processing blocks - Operations like Extract, Parse, Split, Classifier
- Logic blocks - Conditional flows like review gates, Function, If/Else routing, and API Call
Tests
Workflow tests validate individual block outputs with saved inputs and assertions. Use them to check that an Extract, Function, Split, or Classifier block still behaves as expected after you change schemas, prompts, code, or block configuration. Learn more in Tests.
Experiments
Experiments measure a block's consistency by replaying the same Extract, Split, Classifier, or split-by-key For Each block over a fixed document set with multiple consensus passes. Use them to compare block configurations, find low-agreement documents or fields, and decide what needs stricter tests. Learn more in Experiments.
Creating a Workflow
- Navigate to the Workflows section in your dashboard
- Click Create Workflow to open a new canvas
- Drag blocks from the sidebar onto the canvas
- Connect blocks by dragging from output handles to input handles
- Configure each block by clicking on it
- Your workflow auto-saves as you build
Connecting Blocks
Blocks communicate through handles that define the type of data they accept or produce:

| Handle Type | Icon | Description |
|---|---|---|
| File | 📎 | Document files (PDF, images, Word, Excel) |
| JSON | { } | Structured data extracted from documents |
Connection Rules
- File → File: Pass documents between processing blocks
- JSON → JSON: Pass extracted data between logic blocks
- Each input handle accepts only one connection
- Connections validate automatically to prevent incompatible links
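
The connection rules above can be sketched as a small validator. This is an illustrative model only, not Retab's own validation code: the `Handle` and `Edge` types, the `kind` field, and `validate_edges` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


# Hypothetical model for illustration; these are not Retab SDK types.
@dataclass(frozen=True)
class Handle:
    block: str  # block identifier
    name: str   # handle name on that block
    kind: str   # "file" or "json"


@dataclass(frozen=True)
class Edge:
    source: Handle  # output handle
    target: Handle  # input handle


def validate_edges(edges: list[Edge]) -> list[str]:
    """Return human-readable rule violations (an empty list means valid)."""
    errors: list[str] = []
    seen_inputs: set[tuple[str, str]] = set()
    for edge in edges:
        # Rule: File -> File and JSON -> JSON only.
        if edge.source.kind != edge.target.kind:
            errors.append(
                f"{edge.source.block}.{edge.source.name} ({edge.source.kind}) "
                f"cannot connect to {edge.target.block}.{edge.target.name} "
                f"({edge.target.kind})"
            )
        # Rule: each input handle accepts only one connection.
        key = (edge.target.block, edge.target.name)
        if key in seen_inputs:
            errors.append(
                f"{edge.target.block}.{edge.target.name} already has a connection"
            )
        seen_inputs.add(key)
    return errors
```

The canvas performs this kind of check for you; a local model like this is mainly useful when you generate graphs from code and want to fail early.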
Declarative Workflow Spec
You can also define and manage workflows from YAML. A declarative spec uses apiVersion: workflows.retab.com/v1alpha2 and keeps topology in spec.edges.
Every edge endpoint is explicit: it names the block and the raw runtime handle.
Use client.workflows.spec: validate() for parse and handle checks, plan() to
preview draft changes, apply() to reconcile the draft workflow, and get() to
fetch canonical YAML from an existing workflow.
plan() and apply() return Terraform-style summary, resource_changes,
and rendered_plan fields so clients can inspect exactly what changed.
Call client.workflows.publish(workflow_id) separately when the draft should
become the live published workflow.
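
A minimal sketch of the validate, plan, apply, publish sequence described above. The method names and the resource_changes and rendered_plan fields come from this page; everything else is an assumption made for illustration: that each spec method accepts the YAML text as its only argument, that resource_changes is list-like, and the `reconcile` helper itself. Check the API reference for the real signatures before relying on it.

```python
from typing import Any


def reconcile(client: Any, workflow_id: str, spec_yaml: str, publish: bool = False) -> bool:
    """Validate, plan, and apply a spec; optionally publish. True if applied.

    Assumption: each spec method takes the YAML text, and plan() returns an
    object with a list-like `resource_changes` and a `rendered_plan` field.
    """
    spec = client.workflows.spec
    spec.validate(spec_yaml)        # parse and handle checks
    plan = spec.plan(spec_yaml)     # preview draft changes
    if not plan.resource_changes:   # nothing to reconcile
        return False
    print(plan.rendered_plan)       # inspect what is about to change
    spec.apply(spec_yaml)           # reconcile the draft workflow
    if publish:
        # Publishing is a separate, explicit step.
        client.workflows.publish(workflow_id)
    return True
```

Keeping `publish` off by default mirrors the separation above: applying a spec only changes the draft, and promoting it to live is a deliberate second call.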
For endpoint details, see:
Edit Mode vs Run Mode
Workflows have two operational modes:
Edit Mode
- Add, remove, and configure blocks
- Create and delete connections
- Rename the workflow
- View generated Python code
Run Mode
- Upload documents to input blocks
- Execute the workflow step-by-step
- View results at each stage
- Download processed files and extracted data
Running a Workflow
A workflow is fundamentally an asynchronous job. When you start it, Retab creates a workflow run, executes each step on the server, and stores the results on that run. You can then poll the run until it finishes and inspect the stored step outputs. For the SDK and HTTP endpoint details, see the workflow API reference.
From the Dashboard
- Switch to Run Mode
- Upload a document to each Document input block
- Click Run Workflow
- Watch as each block processes (status indicators show progress)
- Click on output handles to view results
Using the SDK
The Python, Node, and Go SDKs expose workflow metadata, graph authoring, run execution, and typed step inspection:
- client.workflows.* / client.Workflows.* for list(), get(), create(), update(), delete(), and publish()
- client.workflows.blocks.* / client.Workflows.Blocks.* and client.workflows.edges.* / client.Workflows.Edges.* for programmatic graph changes
- client.workflows.runs.* / client.Workflows.Runs.* and client.workflows.steps.* / client.Workflows.Steps.* for running flows and reading results
Discover input block IDs
Workflow run inputs are keyed by the IDs of your start_document and start_json blocks. List the workflow's blocks to discover them.
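
As a sketch of that discovery step, the helper below groups input block IDs by block type. It assumes each listed block is available as a mapping with "id" and "type" keys, which is an assumption about the response shape rather than something this page confirms; `input_block_ids` and the sample IDs are hypothetical.

```python
from typing import Iterable, Mapping

# Block types named on this page for the two kinds of workflow input.
INPUT_TYPES = ("start_document", "start_json")


def input_block_ids(blocks: Iterable[Mapping[str, str]]) -> dict[str, list[str]]:
    """Group the IDs of input blocks by block type.

    Assumption: every block exposes "id" and "type" keys.
    """
    found: dict[str, list[str]] = {block_type: [] for block_type in INPUT_TYPES}
    for block in blocks:
        if block["type"] in found:
            found[block["type"]].append(block["id"])
    return found


# Placeholder blocks standing in for a real block listing.
blocks = [
    {"id": "blk_doc", "type": "start_document"},
    {"id": "blk_extract", "type": "extract"},
    {"id": "blk_json", "type": "start_json"},
]
print(input_block_ids(blocks))
```

The IDs collected under each type are the keys you would use when building the inputs for a run.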
Run and wait for completion
Workflows support two input maps:
- documents for Document (start_document) blocks
- json_inputs for JSON Input (start_json) blocks
steps.list(run.id) returns the step roster for a run. For the full execution record for one block, including typed inputs and outputs, use steps.get(run.id, block_id).
Inspect step outputs
Start with steps.list(run.id) when you need the blocks that ran. Then call steps.get(run.id, block_id) for the specific execution record you want to inspect.
Step payloads are normalized into HandlePayload objects. For JSON-producing blocks, extracted_data is shorthand for the default output-json-0 handle.
Use steps.list(run.id, block_ids=[...]) when you only need a subset of step summaries. Use steps.get(run.id, block_id) when you need the normalized execution record for a single block.
Fetch the artifact record
Some blocks persist a durable artifact record. step.artifact is only the stable
pointer. Use client.workflows.artifacts.get(step.artifact) to dereference that pointer.
The response is the backing record flattened with operation at the top level,
so consumers can dispatch on one object without juggling an extra record
wrapper.
workflows.artifacts.list(run.id) dereferences every artifact produced by a
run. Pass operation= or block_id= when you only need a subset.
| operation | produced by | record includes |
|---|---|---|
| extraction | extract | extraction result, choices, likelihoods, schema details |
| split | split | split result and output document grouping |
| classification | classifier | selected class and consensus details |
| parse | parse | parsed document content |
| edit | edit | edited document result |
| partition | for_each_sentinel_start | partitioned items for the loop |
| conditional_evaluation | conditional | evaluations, selected_handles, matched_condition_ids |
| while_loop_termination | while_loop | termination reason and final condition evaluations |
| api_call_invocation | api_call | request/response attempts, retry trace, and final error |
| function_invocation | function | function inputs, output, duration, and final error |
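
Because operation sits at the top level of the flattened record, a consumer can dispatch on it with a plain lookup table. The sketch below treats a record as a dictionary; only the operation values and the selected_handles name come from the table above, while the handler functions and the list shape of selected_handles are assumptions for illustration.

```python
from typing import Any, Callable, Mapping

Record = Mapping[str, Any]


def describe_conditional(record: Record) -> str:
    # selected_handles is named in the table above; its list shape is assumed.
    return f"conditional chose {list(record.get('selected_handles', []))}"


def describe_default(record: Record) -> str:
    # Fallback for operations without a dedicated handler.
    return f"{record['operation']} artifact"


# Map each operation value to the handler that understands its record.
HANDLERS: dict[str, Callable[[Record], str]] = {
    "conditional_evaluation": describe_conditional,
}


def describe_artifact(record: Record) -> str:
    """Dispatch on the top-level `operation` field of a flattened record."""
    handler = HANDLERS.get(record["operation"], describe_default)
    return handler(record)
```

Adding support for another operation is then a matter of registering one more handler, with no extra unwrapping step.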
Build workflows from code
The same SDK can create and publish workflow graphs. Use client.workflows.list() or client.workflows.get(workflow_id) when you need to browse existing workflows before launching a run.
Reading Workflow Results
The standard production pattern is to run the workflow, keep the returned run.id,
and poll the run until lifecycle.status reaches completed, error, cancelled,
or awaiting_review.
- Start the workflow from the SDK or API
- Receive a run.id and an initial lifecycle immediately
- Poll the workflow run until it finishes or waits for review
- Read the step results from the completed run
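
The polling step can be sketched as a generic loop. The stop statuses are the ones listed above; `wait_for_run`, the `fetch_status` callable (a stand-in for whatever SDK or HTTP call returns lifecycle.status for your run), and the interval and timeout defaults are assumptions for illustration.

```python
import time
from typing import Callable

# Statuses this page lists as the points where polling should stop.
STOP_STATUSES = {"completed", "error", "cancelled", "awaiting_review"}


def wait_for_run(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a stop status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in STOP_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Treat awaiting_review as a stop condition rather than a failure: the run is paused for a reviewer, so the caller should hand off instead of continuing to poll.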
Workflow Execution Order
Workflows execute in topological order based on the block connections:
- Start from Document input blocks
- Process each block once all its inputs are ready
- Continue until all blocks are processed or an error occurs
- Read outputs from the completed run and its step results
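
To make "topological order" concrete, here is a small sketch using Python's standard graphlib module. It only illustrates the ordering idea and is not how Retab schedules blocks; the block names and `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(edges: list[tuple[str, str]]) -> list[str]:
    """Return one valid run order for blocks given (source, target) edges."""
    sorter = TopologicalSorter()
    for source, target in edges:
        # The target block can only run after its source has produced output.
        sorter.add(target, source)
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(sorter.static_order())


# Hypothetical graph: one input fans out to two extractors that feed a merge.
edges = [
    ("start", "classifier"),
    ("classifier", "extract_invoice"),
    ("classifier", "extract_receipt"),
    ("extract_invoice", "merge"),
    ("extract_receipt", "merge"),
]
print(execution_order(edges))
```

Several orders can be valid for the same graph (the two extract blocks may appear in either order), which is why a block is only guaranteed to run after all of its inputs are ready.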
Conditional Routing
When using Classifier or If/Else blocks, only the branches that receive data are executed. Blocks on skipped branches are marked as "skipped" rather than failed.
Viewing Generated Code
Every workflow can be exported as Python code. Click View Code in the sidebar to see the equivalent SDK calls for your workflow. This is useful for:
- Integrating workflows into your existing codebase
- Running workflows in production environments
- Understanding how the visual blocks translate to API calls
Best Practices
Start simple
Begin with a single Extract or Parse block, then gradually add complexity.
Test each addition before moving on.
Use descriptive labels
Rename blocks to describe their purpose (e.g., “Invoice Data” instead of
“Extract 1”). This makes complex workflows easier to understand.
Add notes for documentation
Use Note blocks to document sections of your workflow. They don’t affect
execution but help explain the logic.
Validate with review gates
For critical data, add a review gate to the extraction block. This ensures a
reviewer checks low-likelihood results before they proceed.
Use Classifier for document routing
When processing different document types, use a Classifier block to route each
document to the appropriate extraction schema.
Test with sample documents
Before deploying, run your workflow with representative sample documents to
catch edge cases.
Example: Invoice Processing Workflow
Here's a common workflow pattern for processing invoices:
- Start block accepts the invoice PDF
- Extract block pulls out vendor, amount, date, line items
- The extract block’s review gate flags low-likelihood extractions for review
- Read the verified data from the completed workflow run
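
The review-gate step in this pattern amounts to a threshold check on per-field likelihoods. The sketch below shows that idea only; the flat field-to-likelihood dictionary, the 0.8 threshold, and `fields_needing_review` are assumptions for illustration, not the gate's actual configuration.

```python
def fields_needing_review(
    likelihoods: dict[str, float], threshold: float = 0.8
) -> list[str]:
    """Return the names of fields whose likelihood is below the threshold."""
    return sorted(name for name, score in likelihoods.items() if score < threshold)


# Placeholder likelihoods for the invoice fields mentioned above.
sample = {"vendor": 0.95, "amount": 0.6, "date": 0.9}
print(fields_needing_review(sample))
```

A run would wait for review only when this list is non-empty, which keeps reviewers focused on the fields the extraction was least sure about.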
Example: Multi-Document Classification Workflow
For workflows that process mixed document bundles:
- Classifier routes documents by category (Invoice, Contract, Receipt)
- Each Extract block uses a document-specific schema
- Function blocks compute derived fields for each document type
- Merge JSON combines results from all branches into a single output