What are Workflows?

Workflows are visual, block-based pipelines that let you chain together multiple document processing operations. Instead of writing code for each step, you can drag and drop blocks onto a canvas, connect them, and create powerful document automation flows. A workflow typically consists of:
  • Input blocks - Entry points for data:
    • Document - Upload files (PDF, images, Word, Excel)
    • JSON Input - Pass structured JSON data
  • Processing blocks - Operations like Extract, Parse, Split, Classifier
  • Logic blocks - Control-flow operations such as Human-in-the-Loop, Function, If/Else routing, and API Call

Creating a Workflow

  1. Navigate to the Workflows section in your dashboard
  2. Click Create Workflow to open a new canvas
  3. Drag blocks from the sidebar onto the canvas
  4. Connect blocks by dragging from output handles to input handles
  5. Configure each block by clicking on it
  6. Your workflow auto-saves as you build

Connecting Blocks

Blocks communicate through handles that define the type of data they accept or produce:
Handle Type   Icon   Description
File          📎     Document files (PDF, images, Word, Excel)
JSON          { }    Structured data extracted from documents

Connection Rules

  • File → File: Pass documents between processing blocks
  • JSON → JSON: Pass extracted data between logic blocks
  • Each input handle accepts only one connection
  • Connections are validated automatically to prevent incompatible links

Edit Mode vs Run Mode

Workflows have two operational modes:

Edit Mode

  • Add, remove, and configure blocks
  • Create and delete connections
  • Rename the workflow
  • View generated Python code

Run Mode

  • Upload documents to input blocks
  • Execute the workflow step-by-step
  • View results at each stage
  • Download processed files and extracted data

Toggle between modes using the switch at the top of the canvas.

Running a Workflow

A workflow is fundamentally an asynchronous job. When you start it, Retab creates a workflow run, executes each step on the server, and stores the results on that run. You can then poll the run until it finishes and inspect the stored step outputs. For the SDK and HTTP endpoint details, see the workflow API reference.

From the Dashboard

  1. Switch to Run Mode
  2. Upload a document to each Document input block
  3. Click Run Workflow
  4. Watch as each block processes (status indicators show progress)
  5. Click on output handles to view results

Using the SDK

The Python SDK exposes workflow metadata, graph authoring, run execution, and typed step inspection:
  • client.workflows.* for list(), get(), create(), publish(), duplicate(), and get_entities()
  • client.workflows.blocks.* and client.workflows.edges.* for programmatic graph changes
  • client.workflows.runs.* and client.workflows.runs.steps.* for running flows and reading results

Discover input block IDs

Workflow run inputs are keyed by the IDs of your start and start_json blocks. get_entities() is the easiest way to discover them.
from retab import Retab

client = Retab()

workflow = client.workflows.get_entities("wf_abc123")

document_start_id = workflow.start_blocks[0].id
json_start_id = workflow.start_json_blocks[0].id

Run and wait for completion

Workflows support two input maps:
  • documents for Document (start) blocks
  • json_inputs for JSON Input (start_json) blocks
from pathlib import Path

from retab import Retab

client = Retab()

workflow = client.workflows.get_entities("wf_abc123")
document_start_id = workflow.start_blocks[0].id
json_start_id = workflow.start_json_blocks[0].id

run = client.workflows.runs.create(
    workflow_id=workflow.workflow.id,
    documents={
        document_start_id: Path("path/to/invoice.pdf"),
    },
    json_inputs={
        json_start_id: {"customer_id": "cust_123", "priority": "high"},
    },
)

run = client.workflows.runs.wait_for_completion(
    run.id,
    poll_interval_seconds=1.0,
)
run.raise_for_status()

print(run.status)
print(run.waiting_for_block_ids)
print(run.final_outputs)

run.steps contains per-block status summaries. For typed inputs and outputs on each block, use the step helpers.

Inspect step outputs

Start with steps.list(run.id) — it returns every persisted step in a single HTTP call. Avoid looping over run.steps and calling steps.get() per block; that’s N+1. Step payloads are normalized into HandlePayload objects. For JSON-producing blocks, extracted_data is shorthand for the default output-json-0 handle.
# Batch: one HTTP call for all steps in the run
for step in client.workflows.runs.steps.list(run.id):
    print(step.block_id, step.status, step.error, step.extracted_data)
    if step.artifact:
        print(step.artifact.operation, step.artifact.id)

# Single step:
step = client.workflows.runs.steps.get(run.id, "extract-block-id")
print(step.status, step.extracted_data)

Use steps.list(run.id, block_ids=[...]) when you only need a subset. Use steps.get_many(run.id, [...]) when you want normalized handle payloads (same shape as steps.get()) for a subset of blocks.
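For example, a brief sketch of both subset calls (the block IDs are placeholders for your own):
# One call, summaries for just the blocks you care about
for step in client.workflows.runs.steps.list(run.id, block_ids=["extract-invoice"]):
    print(step.block_id, step.status)

# Same subset, but with normalized HandlePayload inputs and outputs
for step in client.workflows.runs.steps.get_many(run.id, ["extract-invoice"]):
    print(step.block_id, step.extracted_data)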

Jump from a step to its typed resource

Inference blocks persist a resource; step.artifact is a {operation, id} pointer you use to fetch the full typed result:
step = client.workflows.runs.steps.get(run.id, "extract-block-id")
if step.artifact:
    extraction = client.extractions.get(step.artifact.id)
    print(extraction.choices)   # full consensus, likelihoods, schema

Operation        Block type                  Fetch with
extraction       extract                     client.extractions.get(id)
split            split                       client.splits.get(id)
classification   classifier                  client.classifications.get(id)
parse            parse                       client.parses.get(id)
edit             edit                        client.edits.get(id)
partition        for_each_sentinel_start     client.partitions.get(id)
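
When a run mixes block types, a small dispatch table keeps the fetch logic in one place. A minimal sketch built from the mapping above:
# Map each artifact operation to its typed accessor (see the table above)
FETCHERS = {
    "extraction": client.extractions.get,
    "split": client.splits.get,
    "classification": client.classifications.get,
    "parse": client.parses.get,
    "edit": client.edits.get,
    "partition": client.partitions.get,
}

for step in client.workflows.runs.steps.list(run.id):
    if step.artifact:
        resource = FETCHERS[step.artifact.operation](step.artifact.id)
        print(step.block_id, step.artifact.operation, type(resource).__name__)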

Build workflows from code

The same SDK can create and publish workflow graphs:
workflow = client.workflows.create(name="Invoice Pipeline")
entities = client.workflows.get_entities(workflow.id)
start_block = entities.start_blocks[0]

extract_block = client.workflows.blocks.create(
    workflow.id,
    id="extract-invoice",
    type="extract",
    label="Extract Invoice",
    position_x=320,
    position_y=0,
    config={
        "json_schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                "total_amount": {"type": "number"},
            },
        },
    },
)

client.workflows.edges.create(
    workflow.id,
    id="edge-start-to-extract",
    source_block=start_block.id,
    target_block=extract_block.id,
    source_handle="output-file-0",
    target_handle="input-file-0",
)

client.workflows.publish(workflow.id, description="Initial version")

Use client.workflows.list() or client.workflows.get(workflow_id) to browse existing workflows before launching a run, and client.workflows.duplicate(workflow_id) to create a draft copy of an existing flow.
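A quick sketch of those discovery calls (the workflow ID and the fields printed are illustrative assumptions):
# Browse existing workflows before picking one to run
for wf in client.workflows.list():
    print(wf.id, wf.name)  # id/name fields assumed on list items

# Fetch one workflow, then branch off a draft copy to experiment on
workflow = client.workflows.get("wf_abc123")
draft = client.workflows.duplicate("wf_abc123")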

Reading Workflow Results

The standard production pattern is to run the workflow, keep the returned run.id, and poll the run until it reaches a terminal status such as completed or error.
  1. Start the workflow from the SDK or API
  2. Receive a run.id and an initial status immediately
  3. Poll the workflow run until it finishes
  4. Read the step results from the completed run
The workflow run is the source of truth for execution state and outputs. This is enough for many scripts, internal tools, and backend services.
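
Put together, a minimal sketch of that pattern using the calls shown earlier (the workflow ID and file path are placeholders):
from pathlib import Path

from retab import Retab

client = Retab()

# Steps 1-2: start the workflow and get a run ID plus initial status back
entities = client.workflows.get_entities("wf_abc123")
run = client.workflows.runs.create(
    workflow_id=entities.workflow.id,
    documents={entities.start_blocks[0].id: Path("path/to/invoice.pdf")},
)
print(run.id, run.status)

# Step 3: poll until the run reaches a terminal status
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()

# Step 4: read the step results from the completed run
for step in client.workflows.runs.steps.list(run.id):
    print(step.block_id, step.status, step.extracted_data)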

Workflow Execution Order

Workflows execute in topological order based on the block connections:
  1. Start from Document input blocks
  2. Process each block once all its inputs are ready
  3. Continue until all blocks are processed or an error occurs
  4. Read outputs from the completed run and its step results
If a block fails, execution stops and the error is displayed on that block.
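
To find the failing block from the SDK, scan the run's steps (a small sketch; the exact step status string is an assumption):
# Locate the first errored block and print its message
for step in client.workflows.runs.steps.list(run.id):
    if step.status == "error":  # terminal failure status assumed
        print(f"Block {step.block_id} failed: {step.error}")
        break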

Conditional Routing

When using Classifier or If/Else blocks, only the branches that receive data are executed. Blocks on skipped branches are marked as “skipped” rather than failed.
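
The same distinction shows up in step statuses when reading results from the SDK (a brief sketch; the "skipped" value mirrors the wording above):
# Skipped branches report a "skipped" status rather than an error
for step in client.workflows.runs.steps.list(run.id):
    marker = "branch not taken" if step.status == "skipped" else step.status
    print(step.block_id, marker)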

Viewing Generated Code

Every workflow can be exported as Python code. Click View Code in the sidebar to see the equivalent SDK calls for your workflow. This is useful for:
  • Integrating workflows into your existing codebase
  • Running workflows in production environments
  • Understanding how the visual blocks translate to API calls

Best Practices

  • Begin with a single Extract or Parse block, then gradually add complexity. Test each addition before moving on.
  • Rename blocks to describe their purpose (e.g., “Invoice Data” instead of “Extract 1”). This makes complex workflows easier to understand.
  • Use Note blocks to document sections of your workflow. They don’t affect execution but help explain the logic.
  • For critical data, add a Human-in-the-Loop (HIL) block after extraction. This ensures a human reviews low-likelihood results before they proceed.
  • When processing different document types, use a Classifier block to route each document to the appropriate extraction schema.
  • Before deploying, run your workflow with representative sample documents to catch edge cases.

Example: Invoice Processing Workflow

Here’s a common workflow pattern for processing invoices:
  1. Start block accepts the invoice PDF
  2. Extract block pulls out vendor, amount, date, line items
  3. HIL block flags low-likelihood extractions for human review
  4. Read the verified data from the completed workflow run
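
A sketch of this pattern from the SDK, reusing the run calls above (the workflow and block IDs are placeholders):
from pathlib import Path

# 1. Start the run with the invoice PDF
run = client.workflows.runs.create(
    workflow_id="wf_invoices",  # placeholder workflow ID
    documents={"document-start-id": Path("path/to/invoice.pdf")},
)

# 2-3. Extraction and HIL review happen server-side; run.waiting_for_block_ids
# lists any blocks paused for a reviewer while you poll
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()

# 4. Read the verified data from the completed run
step = client.workflows.runs.steps.get(run.id, "hil-review")  # placeholder block ID
print(step.extracted_data)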

Example: Multi-Document Classification Workflow

For workflows that process mixed document bundles:
  1. Classifier routes documents by category (Invoice, Contract, Receipt)
  2. Each Extract block uses a document-specific schema
  3. Function blocks compute derived fields for each document type
  4. Merge JSON combines results from all branches into a single output
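
To read the combined result, wait for the run and inspect the Merge JSON block's step, or print run.final_outputs (a sketch; the block ID is a placeholder):
run = client.workflows.runs.wait_for_completion(run.id, poll_interval_seconds=1.0)
run.raise_for_status()

# Branches the Classifier never routed to show up as skipped, not failed
for step in client.workflows.runs.steps.list(run.id):
    print(step.block_id, step.status)

# Merged output from the Merge JSON block
merged = client.workflows.runs.steps.get(run.id, "merge-json")  # placeholder block ID
print(merged.extracted_data)

# run.final_outputs aggregates the workflow's terminal outputs as well
print(run.final_outputs)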