What are Workflow Evals?
Workflow evals are saved, block-level regression checks. Each eval freezes the inputs for one workflow block, replays that block against the current workflow draft, and evaluates one assertion against the block output. Use evals when you want to answer questions like:
- Does this Extract block still return the expected invoice total?
- Does this Function block still compute the validation flag correctly?
- Does this Split block still assign pages to the right subdocuments?
- Does this Classifier block still route a file to the expected category?
Supported Blocks
Workflow evals are currently supported for:
| Block | What you usually assert |
|---|---|
| Extract | Extracted JSON fields |
| Function | Returned JSON fields from the function output schema |
| Split | Split manifest quality or produced subdocument file handles |
| Classifier | Produced category file handles |
Eval Shape
A workflow eval stores:
- Target - the block to run, currently { "type": "block", "block_id": "..." }.
- Source - the handle inputs to replay.
- Assertion - one expected condition for one declared output handle.
source.type can be:
| Source | Fields | Use it when |
|---|---|---|
| manual | handle_inputs | You want to provide explicit JSON or file inputs. |
| run_step | run_id, optional step_id | You want to capture the inputs a block received in a real run. |
When the captured block ran more than once in that run, pass step_id so Retab knows which
iteration’s inputs to capture. File inputs are materialized as durable Retab
file references so the eval does not depend on the original browser upload
session.
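As a rough sketch of the shape described above, the following Python builds eval definitions as plain dictionaries, one helper per source type. The target and source fields follow the descriptions in this section. The top-level key names, the keys inside the example assertion (kind, handle, path, expected), the handle name input-json-0, and all IDs are illustrative assumptions, not confirmed field names.

```python
import json


def manual_eval(block_id, handle_inputs, assertion):
    """Eval that replays explicit JSON or file inputs."""
    return {
        "target": {"type": "block", "block_id": block_id},
        "source": {"type": "manual", "handle_inputs": handle_inputs},
        "assertion": assertion,
    }


def run_step_eval(block_id, run_id, assertion, step_id=None):
    """Eval that captures the inputs a block received in a real run."""
    source = {"type": "run_step", "run_id": run_id}
    if step_id is not None:
        # step_id is optional; include it only when one is provided.
        source["step_id"] = step_id
    return {
        "target": {"type": "block", "block_id": block_id},
        "source": source,
        "assertion": assertion,
    }


if __name__ == "__main__":
    # Hypothetical assertion layout, for illustration only.
    assertion = {
        "kind": "equals",
        "handle": "output-json-0",
        "path": "total",
        "expected": 120.5,
    }
    payload = manual_eval("block_123", {"input-json-0": {"total": 120.5}}, assertion)
    print(json.dumps(payload, indent=2))
```

Keeping the two source types in separate helpers makes it harder to mix manual fields with run_step fields in one payload.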
Assertions
An assertion targets a declared output handle and, optionally, a dotted path inside that handle’s JSON payload.
| Block | Output target examples |
|---|---|
| Extract | output-json-0.total, output-json-0.vendor.name |
| Function | output-json-0.is_valid, output-json-0.error_message |
| Split | output-json-splits or a subdocument file handle |
| Classifier | A category file handle |
Supported assertion kinds:
| Kind | Use for |
|---|---|
| exists, not_exists | Presence checks. |
| equals, not_equals | Strict value equality or inequality. |
| contains, not_contains | Substring or list membership checks. |
| number_compare, between | Numeric comparisons. |
| starts_with, ends_with, matches_regex | String pattern checks. |
| object_contains, array_contains | Subset checks for objects and arrays of objects. |
| length_compare | Length checks for strings, arrays, or objects. |
| json_schema_valid | JSON Schema validation for a target subtree. |
| all_items_match, any_item_matches | Nested assertions over array items. |
| similarity_gte | Similarity thresholds. |
| llm_judged_as, llm_not_judged_as | Rubric-based LLM judging. |
| split_iou_gte | Intersection-over-Union for split page assignments. |
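To give a feel for what split_iou_gte measures, here is a minimal sketch of Intersection-over-Union for one subdocument's page assignment. It uses the standard set definition of IoU. How Retab aggregates scores across subdocuments is not stated here, so the function covers a single expected/actual pair only, and the treatment of two empty page sets is a convention chosen for this example.

```python
def page_iou(expected_pages, actual_pages):
    """Intersection-over-Union of two page-number collections."""
    expected, actual = set(expected_pages), set(actual_pages)
    union = expected | actual
    if not union:
        # Both sides empty: treated as full agreement (assumed convention).
        return 1.0
    return len(expected & actual) / len(union)


def iou_gte(expected_pages, actual_pages, threshold):
    """True when the page assignment meets the IoU threshold."""
    return page_iou(expected_pages, actual_pages) >= threshold
```

For example, expected pages 1 to 3 against actual pages 2 to 4 share two pages out of four distinct pages, so the IoU is 0.5.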
number_compare and length_compare use op values gt, gte, lt, lte,
eq, or neq.
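The sketch below shows one way to reproduce a dotted-path lookup and an op comparison locally, for example when sanity-checking an expected value before saving an eval. It is not Retab's implementation: it assumes the target string is the handle name followed by dot-separated object keys, and it does not handle array indexes. The function names are hypothetical.

```python
import operator

# The six op values listed above, mapped to Python comparisons.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "neq": operator.ne,
}


def resolve_target(handle_outputs, target):
    """Look up 'handle.path.to.field' inside a dict of handle outputs."""
    handle, _, path = target.partition(".")
    value = handle_outputs[handle]
    if path:
        for key in path.split("."):
            value = value[key]
    return value


def number_compare(actual, op, expected):
    """Compare a numeric value against an expected number."""
    return OPS[op](actual, expected)


def length_compare(actual, op, expected):
    """Compare the length of a string, array, or object."""
    return OPS[op](len(actual), expected)
```

With outputs such as {"output-json-0": {"total": 120.5}}, resolve_target(outputs, "output-json-0.total") returns 120.5, which can then be passed to number_compare with any of the six op values.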
Running Evals
You can run one eval, every eval for one block, or every eval in a workflow. In the API, create a parent eval run with workflow_id. scope is optional:
| Scope | Meaning |
|---|---|
| omitted or null | Run every saved eval in the workflow. |
| { "type": "workflow" } | Run every saved eval in the workflow. |
| { "type": "block" } | Run every saved eval for one block. |
| { "type": "single" } | Run one saved eval by eval_id. |
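The scope options in the table could be assembled into a request body along these lines. This is a sketch under assumptions: workflow_id, scope, and the three type values come from this section, while placing eval_id inside the scope object and passing a block_id for block scope are guesses about the field layout. The helper name is hypothetical.

```python
VALID_SCOPES = {"workflow", "block", "single"}


def eval_run_body(workflow_id, scope_type=None, **scope_fields):
    """Build the JSON body for creating a parent eval run."""
    body = {"workflow_id": workflow_id}
    if scope_type is None:
        # Omitted scope: run every saved eval in the workflow.
        return body
    if scope_type not in VALID_SCOPES:
        raise ValueError(f"unknown scope type: {scope_type!r}")
    if scope_type == "single" and "eval_id" not in scope_fields:
        raise ValueError("single scope needs an eval_id")
    body["scope"] = {"type": scope_type, **scope_fields}
    return body
```

For instance, eval_run_body("wf_1", "single", eval_id="ev_9") yields a body whose scope is {"type": "single", "eval_id": "ev_9"}, which would then be sent to /v1/workflows/evals/runs.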
For each selected eval, Retab:
- Loads the current workflow draft and current block configuration.
- Rebuilds the saved handle inputs into normal runtime inputs.
- Executes only the selected block.
- Stores the block artifact, handle outputs, routing decisions, warnings, and timing.
- Resolves the assertion target from the handle outputs.
- Records the assertion outcome and the result verdict.
Poll GET /v1/workflows/evals/runs/{run_id} until the parent run reaches a terminal
lifecycle, then read child rows from
GET /v1/workflows/evals/results?run_id={run_id}.
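A polling loop for the parent run might look like the sketch below. To stay independent of any HTTP client, it takes a caller-supplied fetch_run function, which would issue the GET request on the run endpoint and return the parsed JSON. The response key lifecycle and the choice of completed, error, and cancelled as the terminal values are assumptions, as are the default interval and timeout.

```python
import time

# Assumed terminal lifecycle values for a parent eval run.
TERMINAL = {"completed", "error", "cancelled"}


def wait_for_run(fetch_run, run_id, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run(run_id) until the run reaches a terminal lifecycle."""
    deadline = clock() + timeout
    while True:
        run = fetch_run(run_id)
        if run["lifecycle"] in TERMINAL:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['lifecycle']!r} after {timeout}s")
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without real waiting. Once the function returns, the child rows can be read from the results endpoint with the same run_id.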
Results
A parent WorkflowEvalRun has a lifecycle:
| Lifecycle | Meaning |
|---|---|
| pending | The run was created but execution has not started. |
| queued | The run is waiting for a worker. |
| running | One or more eval results are executing. |
| completed | The run finished; inspect outcome counts or rows. |
| error | The run failed before normal completion. |
| cancelled | The run was cancelled. |
The counts field separates lifecycle from assertion outcomes. Each
WorkflowEvalResult row has its own lifecycle and, once completed, a
verdict of passed, failed, or blocked. The nested
assertion_result.outcome uses the same three outcome values and includes
actual_value, expected_value, optional score/threshold fields, and failure
details when the assertion cannot pass.
Result rows also include the saved handle_inputs, produced handle_outputs,
workflow/block fingerprints, the execution artifact, routing decisions,
warnings, and timing.
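Once the result rows are fetched, a small helper can tally verdicts and pull out the expected and actual values of failed assertions. The sketch below uses the verdict, assertion_result, expected_value, and actual_value fields named above. The eval_id key on each row and the no_verdict label for rows that have not completed are assumptions made for this example.

```python
from collections import Counter


def summarize_results(rows):
    """Return (verdict counts, details of failed rows) for a list of result rows."""
    # Rows that have not completed carry no verdict yet.
    verdicts = Counter(row.get("verdict") or "no_verdict" for row in rows)
    failures = []
    for row in rows:
        if row.get("verdict") != "failed":
            continue
        result = row.get("assertion_result", {})
        failures.append({
            "eval_id": row.get("eval_id"),
            "expected": result.get("expected_value"),
            "actual": result.get("actual_value"),
        })
    return dict(verdicts), failures
```

Blocked rows are counted but not listed as failures, since a blocked verdict is a distinct outcome from failed.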
Freshness and Drift
Workflow evals are tied to the block inputs and output schema that existed when the eval was created or last updated. When the workflow draft changes, Retab reports several freshness signals:
| Field | Values | Meaning |
|---|---|---|
| schema_drift | none, partial, drifted, unknown | Whether the assertion target still resolves. |
| assertion_drift_status | valid, drifted, broken | Whether the saved assertion is still usable. |
| freshness.status | fresh, stale, unknown | Whether the latest run matches the baseline. |
| drift.status | none, drifted, broken, unknown | Artifact-level drift summary. |
Each eval also exposes latest_run_summary, latest_passing_run_summary, and
latest_failing_run_summary. Each summary separates run lifecycle (status)
from assertion outcome (outcome).
Staleness does not automatically mean the workflow is broken. It means the eval
should be rerun or recaptured before you rely on its latest result.
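One way to turn these signals into a review list is sketched below: it keeps evals whose freshness.status is stale and orders them so that broken assertions come first. The nesting of status under a freshness object is inferred from the field name in the table, and the ordering is a choice made for this example, not documented behavior.

```python
# Lower rank sorts first: broken assertions need attention before drifted ones.
DRIFT_RANK = {"broken": 0, "drifted": 1, "valid": 2}


def review_queue(evals):
    """Return stale evals, most urgent assertion drift first."""
    stale = [
        e for e in evals
        if e.get("freshness", {}).get("status") == "stale"
    ]
    return sorted(
        stale,
        key=lambda e: DRIFT_RANK.get(e.get("assertion_drift_status"), len(DRIFT_RANK)),
    )
```

Because sorted is stable, evals with the same drift status keep their original relative order.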
Recommended Workflow
- Run the workflow on representative documents.
- Open the Evals page and create an eval from a completed run, or create one with explicit manual inputs.
- Pick the block output field or handle you want to protect.
- Define one assertion for the expected behavior.
- Run the eval after changing schemas, prompts, code, categories, or split definitions.
- Use stale evals as a review queue before publishing workflow changes.
API Reference
| Action | Endpoint |
|---|---|
| Create/list | /v1/workflows/evals |
| Get/update/delete | /v1/workflows/evals/{eval_id} |
| Run evals | /v1/workflows/evals/runs |
| Read results | /v1/workflows/evals/results |