What are Workflows?

Workflows are visual, block-based pipelines that let you chain together multiple document processing operations. Instead of writing code for each step, you can drag and drop blocks onto a canvas, connect them, and create powerful document automation flows. A workflow typically consists of:
  • Input blocks - Entry points for data:
    • Document - Upload files (PDF, images, Word, Excel)
    • JSON Input - Pass structured JSON data
  • Processing blocks - Operations like Extract, Parse, Split, Classifier
  • Logic blocks - Control-flow operations such as Human-in-the-Loop, Function, If/Else routing, and API Call

Creating a Workflow

  1. Navigate to the Workflows section in your dashboard
  2. Click Create Workflow to open a new canvas
  3. Drag blocks from the sidebar onto the canvas
  4. Connect blocks by dragging from output handles to input handles
  5. Configure each block by clicking on it
  6. Your workflow auto-saves as you build

Connecting Blocks

Blocks communicate through handles that define the type of data they accept or produce:
Handle Type   Icon   Description
File          📎     Document files (PDF, images, Word, Excel)
JSON          { }    Structured data extracted from documents

Connection Rules

  • File → File: Pass documents between processing blocks
  • JSON → JSON: Pass extracted data between logic blocks
  • Each input handle accepts only one connection
  • Connections are validated automatically to prevent incompatible links

Edit Mode vs Run Mode

Workflows have two operational modes:

Edit Mode

  • Add, remove, and configure blocks
  • Create and delete connections
  • Rename the workflow
  • View generated Python code

Run Mode

  • Upload documents to input blocks
  • Execute the workflow step-by-step
  • View results at each stage
  • Download processed files and extracted data

Toggle between modes using the switch at the top of the canvas.

Running a Workflow

A workflow is fundamentally an asynchronous job. When you start it, Retab creates a workflow run, executes each step on the server, and stores the results on that run. You can then poll the run until it finishes and inspect the stored step outputs. For the SDK and HTTP endpoint details, see the workflow API reference.

From the Dashboard

  1. Switch to Run Mode
  2. Upload a document to each Document input block
  3. Click Run Workflow
  4. Watch as each block processes (status indicators show progress)
  5. Click on output handles to view results

Using the SDK

The Python SDK exposes workflow metadata, graph authoring, run execution, and typed step inspection:
  • client.workflows.* for list(), get(), create(), publish(), duplicate(), and get_entities()
  • client.workflows.blocks.* and client.workflows.edges.* for programmatic graph changes
  • client.workflows.runs.* and client.workflows.runs.steps.* for running flows and reading results

Discover input block IDs

Workflow run inputs are keyed by the IDs of your start and start_json blocks. get_entities() is the easiest way to discover them.
from retab import Retab

client = Retab()

workflow = client.workflows.get_entities("wf_abc123")

document_start_id = workflow.start_blocks[0].id
json_start_id = workflow.start_json_blocks[0].id

Run and wait for completion

Workflows support two input maps:
  • documents for Document (start) blocks
  • json_inputs for JSON Input (start_json) blocks
from pathlib import Path

from retab import Retab

client = Retab()

workflow = client.workflows.get_entities("wf_abc123")
document_start_id = workflow.start_blocks[0].id
json_start_id = workflow.start_json_blocks[0].id

run = client.workflows.runs.create(
    workflow_id=workflow.workflow.id,
    documents={
        document_start_id: Path("path/to/invoice.pdf"),
    },
    json_inputs={
        json_start_id: {"customer_id": "cust_123", "priority": "high"},
    },
)

run = client.workflows.runs.wait_for_completion(
    run.id,
    poll_interval_seconds=1.0,
)
run.raise_for_status()

print(run.status)
print(run.waiting_for_block_ids)
print(run.final_outputs)

run.steps contains per-block status summaries. For typed inputs and outputs on each block, use the step helpers.

Inspect step outputs

Start with steps.list(run.id) — it returns every persisted step in a single HTTP call. Avoid looping over run.steps and calling steps.get() per block; that’s N+1. Step payloads are normalized into HandlePayload objects. For JSON-producing blocks, extracted_data is shorthand for the default output-json-0 handle.
# Batch: one HTTP call for all steps in the run
for step in client.workflows.runs.steps.list(run.id):
    print(step.block_id, step.status, step.error, step.extracted_data)
    if step.artifact:
        print(step.artifact.operation, step.artifact.id)

# Single step:
step = client.workflows.runs.steps.get(run.id, "extract-block-id")
print(step.status, step.extracted_data)

Use steps.list(run.id, block_ids=[...]) when you only need a subset. Use steps.get_many(run.id, [...]) when you want normalized handle payloads (same shape as steps.get()) for a subset of blocks.
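For example, a brief sketch of both subset calls (the block IDs are placeholders for your own):
# One call, summaries for just the blocks you care about
for step in client.workflows.runs.steps.list(run.id, block_ids=["extract-invoice"]):
    print(step.block_id, step.status)

# Same subset, but with normalized HandlePayload inputs and outputs
for step in client.workflows.runs.steps.get_many(run.id, ["extract-invoice"]):
    print(step.block_id, step.extracted_data)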

Jump from a step to its typed resource

Inference blocks persist a resource; step.artifact is a {operation, id} pointer you use to fetch the full typed result:
step = client.workflows.runs.steps.get(run.id, "extract-block-id")
if step.artifact:
    extraction = client.extractions.get(step.artifact.id)
    print(extraction.choices)   # full consensus, likelihoods, schema

Operation        Block type                  Fetch with
extraction       extract                     client.extractions.get(id)
split            split                       client.splits.get(id)
classification   classifier                  client.classifications.get(id)
parse            parse                       client.parses.get(id)
edit             edit                        client.edits.get(id)
partition        for_each_sentinel_start     client.partitions.get(id)
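
When a run mixes block types, a small dispatch table keeps the fetch logic in one place. A minimal sketch built from the mapping above:
# Map each artifact operation to its typed accessor (see the table above)
FETCHERS = {
    "extraction": client.extractions.get,
    "split": client.splits.get,
    "classification": client.classifications.get,
    "parse": client.parses.get,
    "edit": client.edits.get,
    "partition": client.partitions.get,
}

for step in client.workflows.runs.steps.list(run.id):
    if step.artifact:
        resource = FETCHERS[step.artifact.operation](step.artifact.id)
        print(step.block_id, step.artifact.operation, type(resource).__name__)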

Build workflows from code

The same SDK can create and publish workflow graphs:
workflow = client.workflows.create(name="Invoice Pipeline")
entities = client.workflows.get_entities(workflow.id)
start_block = entities.start_blocks[0]

extract_block = client.workflows.blocks.create(
    workflow.id,
    id="extract-invoice",
    type="extract",
    label="Extract Invoice",
    position_x=320,
    position_y=0,
    config={
        "json_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"},
            },
        },
    },
)

client.workflows.edges.create(
    workflow.id,
    id="edge-start-to-extract",
    source_block=start_block.id,
    target_block=extract_block.id,
    source_handle="output-file-0",
    target_handle="input-file-0",
)

client.workflows.publish(workflow.id, description="Initial version")

Use client.workflows.list() or client.workflows.get(workflow_id) to browse existing workflows before launching a run, and client.workflows.duplicate(workflow_id) to create a draft copy of an existing flow.
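A quick sketch of those discovery calls (the workflow ID and the fields printed are illustrative assumptions):
# Browse existing workflows before picking one to run
for wf in client.workflows.list():
    print(wf.id, wf.name)  # id/name fields assumed on list items

# Fetch one workflow, then branch off a draft copy to experiment on
workflow = client.workflows.get("wf_abc123")
draft = client.workflows.duplicate("wf_abc123")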

Reading Workflow Results

The standard production pattern is to run the workflow, keep the returned run.id, and poll the run until it reaches a terminal status such as completed or error.
  1. Start the workflow from the SDK or API
  2. Receive a run.id and an initial status immediately
  3. Poll the workflow run until it finishes
  4. Read the step results from the completed run
The workflow run is the source of truth for execution state and outputs. This is enough for many scripts, internal tools, and backend services.
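
Put together, a minimal sketch of that pattern using the calls shown earlier (the workflow ID and file path are placeholders):
from pathlib import Path

from retab import Retab

client = Retab()

# Steps 1-2: start the workflow and get a run ID plus initial status back
entities = client.workflows.get_entities("wf_abc123")
run = client.workflows.runs.create(
    workflow_id=entities.workflow.id,
    documents={entities.start_blocks[0].id: Path("path/to/invoice.pdf")},
)
print(run.id, run.status)

# Step 3: poll until the run reaches a terminal status
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()

# Step 4: read the step results from the completed run
for step in client.workflows.runs.steps.list(run.id):
    print(step.block_id, step.status, step.extracted_data)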

Workflow Execution Order

Workflows execute in topological order based on the block connections:
  1. Start from Document input blocks
  2. Process each block once all its inputs are ready
  3. Continue until all blocks are processed or an error occurs
  4. Read outputs from the completed run and its step results
If a block fails, execution stops and the error is displayed on that block.
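
To find the failing block from the SDK, scan the run's steps (a small sketch; the exact step status string is an assumption):
# Locate the first errored block and print its message
for step in client.workflows.runs.steps.list(run.id):
    if step.status == "error":  # terminal failure status assumed
        print(f"Block {step.block_id} failed: {step.error}")
        break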

Conditional Routing

When using Classifier or If/Else blocks, only the branches that receive data are executed. Blocks on skipped branches are marked as “skipped” rather than failed.
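
The same distinction shows up in step statuses when reading results from the SDK (a brief sketch; the "skipped" value mirrors the wording above):
# Skipped branches report a "skipped" status rather than an error
for step in client.workflows.runs.steps.list(run.id):
    marker = "branch not taken" if step.status == "skipped" else step.status
    print(step.block_id, marker)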

Viewing Generated Code

Every workflow can be exported as Python code. Click View Code in the sidebar to see the equivalent SDK calls for your workflow. This is useful for:
  • Integrating workflows into your existing codebase
  • Running workflows in production environments
  • Understanding how the visual blocks translate to API calls

Best Practices

  • Begin with a single Extract or Parse block, then gradually add complexity. Test each addition before moving on.
  • Rename blocks to describe their purpose (e.g., “Invoice Data” instead of “Extract 1”). This makes complex workflows easier to understand.
  • Use Note blocks to document sections of your workflow. They don’t affect execution but help explain the logic.
  • For critical data, add a Human-in-the-Loop (HIL) block after extraction. This ensures a human reviews low-likelihood results before they proceed.
  • When processing different document types, use a Classifier block to route each document to the appropriate extraction schema.
  • Before deploying, run your workflow with representative sample documents to catch edge cases.

Example: Invoice Processing Workflow

Here’s a common workflow pattern for processing invoices:
  1. Start block accepts the invoice PDF
  2. Extract block pulls out vendor, amount, date, line items
  3. HIL block flags low-likelihood extractions for human review
  4. Read the verified data from the completed workflow run
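
A sketch of this pattern from the SDK, reusing the run calls above (the workflow and block IDs are placeholders):
from pathlib import Path

# 1. Start the run with the invoice PDF
run = client.workflows.runs.create(
    workflow_id="wf_invoices",  # placeholder workflow ID
    documents={"document-start-id": Path("path/to/invoice.pdf")},
)

# 2-3. Extraction and HIL review happen server-side; run.waiting_for_block_ids
# lists any blocks paused for a reviewer while you poll
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()

# 4. Read the verified data from the completed run
step = client.workflows.runs.steps.get(run.id, "hil-review")  # placeholder block ID
print(step.extracted_data)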

Example: Multi-Document Classification Workflow

For workflows that process mixed document bundles:
  1. Classifier routes documents by category (Invoice, Contract, Receipt)
  2. Each Extract block uses a document-specific schema
  3. Function blocks compute derived fields for each document type
  4. Merge JSON combines results from all branches into a single output
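
To read the combined result, wait for the run and inspect the Merge JSON block's step, or print run.final_outputs (a sketch; the block ID is a placeholder):
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()

# Branches the Classifier never routed to show up as skipped, not failed
for step in client.workflows.runs.steps.list(run.id):
    print(step.block_id, step.status)

# Merged output from the Merge JSON block
merged = client.workflows.runs.steps.get(run.id, "merge-json")  # placeholder block ID
print(merged.extracted_data)

# run.final_outputs aggregates the workflow's terminal outputs as well
print(run.final_outputs)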