Workflow Evals API - Retab Docs

Workflow evals let you freeze inputs for one workflow block and assert something about that block’s output on a later run. This page documents the API shapes for eval definitions, assertions, parent eval runs, and result rows. For the conceptual workflow and dashboard flow, see Evals.

Eval Definition

A WorkflowEval is a saved assertion plus the inputs needed to replay one block.

{
  "id": "wfnodeeval_...",
  "workflow_id": "wf_...",
  "target": { "type": "block", "block_id": "block_extract_invoice" },
  "source": {
    "type": "manual",
    "handle_inputs": {
      "input-json-0": { "type": "json", "data": { "invoice_id": "INV-001" } }
    }
  },
  "name": "invoice total",
  "assertion": {
    "id": "assert_xyz",
    "target": { "output_handle_id": "output-json-0", "path": "total" },
    "condition": { "kind": "equals", "expected": 1234.56 },
    "label": null
  },
  "schema_drift": "none",
  "assertion_drift_status": "valid",
  "freshness": { "status": "fresh", "reasons": [] },
  "latest_run_summary": null,
  "latest_passing_run_summary": null,
  "latest_failing_run_summary": null
}

`target`

`target` is a discriminated union keyed by `type`.

`type`	Fields	Meaning
`block`	`block_id`	Replay one block in the current workflow.

The API shape is a union so workflow-level targets can be added later without renaming the field.

`source`

`source` is also a discriminated union keyed by `type`.

`type`	Fields	Meaning
`manual`	`handle_inputs`	Explicit handle inputs supplied by the caller.
`run_step`	`run_id`, optional `step_id`	Capture the inputs a block received during a previous workflow run. Use `step_id` for loop rows.

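Each value in a manual `handle_inputs` map is typed: it carries a `type` field (`json` or `file` in the examples here) alongside the matching payload: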

{
  "input-json-0": { "type": "json", "data": { "amount": 1234.56 } },
  "input-document-0": {
    "type": "file",
    "document": {
      "id": "file_invoice_q1",
      "filename": "invoice.pdf",
      "mime_type": "application/pdf"
    }
  }
}

When an eval is created from a run_step, Retab snapshots the inputs at create time. File handles are materialized as durable Retab file references so later eval runs do not depend on the original upload session.
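The two `source` variants can be assembled with small helpers. This is a minimal sketch, not part of any Retab SDK: the function names are hypothetical, and leaving `step_id` out of the payload when it is not supplied is an assumption about how the optional field is expressed.

```python
from typing import Any, Dict, Optional


def json_input(data: Any) -> Dict[str, Any]:
    """Typed JSON handle input, matching the manual example above."""
    return {"type": "json", "data": data}


def file_input(file_id: str, filename: str, mime_type: str) -> Dict[str, Any]:
    """Typed file handle input that points at an existing file reference."""
    return {
        "type": "file",
        "document": {"id": file_id, "filename": filename, "mime_type": mime_type},
    }


def manual_source(handle_inputs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """`manual` source: the caller supplies every handle input explicitly."""
    return {"type": "manual", "handle_inputs": handle_inputs}


def run_step_source(run_id: str, step_id: Optional[str] = None) -> Dict[str, Any]:
    """`run_step` source: capture inputs from a previous workflow run."""
    source: Dict[str, Any] = {"type": "run_step", "run_id": run_id}
    if step_id is not None:
        # Assumption: the optional field is omitted rather than sent as null.
        source["step_id"] = step_id
    return source
```

For example, `manual_source({"input-json-0": json_input({"amount": 1234.56})})` reproduces the first entry of the sample above.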

`assertion`

Workflow evals use one assertion per eval. An assertion targets one output handle and an optional path inside that handle’s payload.

{
  "target": { "output_handle_id": "output-json-0", "path": "vendor.name" },
  "condition": { "kind": "contains", "expected": "Acme" },
  "label": "vendor includes Acme"
}
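To make the target/condition split concrete, the sketch below resolves a path against a handle payload and applies four of the condition kinds locally. It is illustrative only: the dot-separated path syntax is inferred from the examples (`total`, `vendor.name`), and the matching semantics are assumptions, not Retab's evaluator.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_path(payload: Any, path: Optional[str]) -> Tuple[bool, Any]:
    """Walk a dot-separated path through nested dicts.

    Returns (found, value). An empty or missing path targets the whole payload.
    """
    current = payload
    if not path:
        return True, current
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return False, None
    return True, current


def check(condition: Dict[str, Any], payload: Any, path: Optional[str]) -> bool:
    """Apply a small subset of condition kinds (assumed semantics)."""
    found, value = resolve_path(payload, path)
    kind = condition["kind"]
    if kind == "exists":
        return found
    if kind == "not_exists":
        return not found
    if not found:
        return False
    if kind == "equals":
        return value == condition["expected"]
    if kind == "contains":
        return condition["expected"] in value
    raise NotImplementedError(f"kind not covered by this sketch: {kind}")


data = {"total": 1234.56, "vendor": {"name": "Acme Inc"}}
vendor_ok = check({"kind": "contains", "expected": "Acme"}, data, "vendor.name")
```

With the payload above, the `contains` check on `vendor.name` succeeds because the resolved value is the string `"Acme Inc"`.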

The current condition.kind values are:

Kind	Required fields
`exists`, `not_exists`	none
`equals`, `not_equals`	`expected`
`contains`, `not_contains`	`expected`
`number_compare`	`op`, `expected`
`between`	`lower`, `upper`, optional `inclusive`
`starts_with`, `ends_with`	`expected`
`matches_regex`	`pattern`
`object_contains`	`expected` object
`array_contains`	`expected` object
`length_compare`	`op`, `expected`
`json_schema_valid`	`schema`
`all_items_match`, `any_item_matches`	nested `condition`
`similarity_gte`	`reference`, `threshold`, optional `method`
`llm_judged_as`, `llm_not_judged_as`	`rubric`, optional `expected_label`
`split_iou_gte`	`expected`, optional `threshold`

`number_compare` and `length_compare` support the `op` values `gt`, `gte`, `lt`, `lte`, `eq`, and `neq`.
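The required-fields table lends itself to a client-side check before an eval is submitted. The following is a sketch that encodes the table as data; `validate_condition` is a hypothetical helper, and the server may enforce additional rules (types, value ranges) that are not modelled here.

```python
from typing import Any, Dict, List

COMPARE_OPS = {"gt", "gte", "lt", "lte", "eq", "neq"}

# Required fields per condition kind, transcribed from the table above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "exists": [], "not_exists": [],
    "equals": ["expected"], "not_equals": ["expected"],
    "contains": ["expected"], "not_contains": ["expected"],
    "number_compare": ["op", "expected"],
    "between": ["lower", "upper"],
    "starts_with": ["expected"], "ends_with": ["expected"],
    "matches_regex": ["pattern"],
    "object_contains": ["expected"],
    "array_contains": ["expected"],
    "length_compare": ["op", "expected"],
    "json_schema_valid": ["schema"],
    "all_items_match": ["condition"], "any_item_matches": ["condition"],
    "similarity_gte": ["reference", "threshold"],
    "llm_judged_as": ["rubric"], "llm_not_judged_as": ["rubric"],
    "split_iou_gte": ["expected"],
}


def validate_condition(condition: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    kind = condition.get("kind")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown kind: {kind!r}"]
    required = REQUIRED_FIELDS[kind]
    problems = [f"{kind}: missing {name!r}" for name in required if name not in condition]
    # Only number_compare and length_compare take an op.
    if "op" in required and "op" in condition and condition["op"] not in COMPARE_OPS:
        problems.append(f"{kind}: unsupported op {condition['op']!r}")
    # all_items_match / any_item_matches wrap a nested condition; check it too.
    nested = condition.get("condition")
    if "condition" in required and isinstance(nested, dict):
        problems.extend(validate_condition(nested))
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a nested condition at once.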

Runs

Create a parent eval run with:

POST /v1/workflows/evals/runs

The request body requires `workflow_id`; `scope` is optional:

{
  "workflow_id": "wf_abc123xyz",
  "scope": { "type": "single", "eval_id": "wfnodeeval_hsLEQiM61ez9Piv147MWk" }
}

Scope variants:

`scope.type`	Fields	Meaning
omitted/null	none	Run every saved eval in the workflow.
`workflow`	none	Run every saved eval in the workflow.
`block`	`block_id`	Run every eval for one block.
`single`	`eval_id`	Run one saved eval.
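The scope variants map onto a small request-body builder. This is a sketch with a hypothetical function name; it leaves `scope` out of the body to run every eval in the workflow, which the table lists as equivalent to the `workflow` scope.

```python
from typing import Any, Dict, Optional


def build_run_request(
    workflow_id: str,
    *,
    block_id: Optional[str] = None,
    eval_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the body for POST /v1/workflows/evals/runs."""
    if block_id is not None and eval_id is not None:
        raise ValueError("pass at most one of block_id or eval_id")
    body: Dict[str, Any] = {"workflow_id": workflow_id}
    if eval_id is not None:
        body["scope"] = {"type": "single", "eval_id": eval_id}
    elif block_id is not None:
        body["scope"] = {"type": "block", "block_id": block_id}
    # No scope key: every saved eval in the workflow runs.
    return body
```

Calling `build_run_request("wf_abc123xyz", eval_id="wfnodeeval_hsLEQiM61ez9Piv147MWk")` yields the sample request shown above.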

A WorkflowEvalRun is the parent resource for one batch:

{
  "id": "wfevalrun_q1z2",
  "workflow_id": "wf_abc123xyz",
  "workflow_version_id": "draft_2026_05_18",
  "trigger": { "type": "api" },
  "lifecycle": { "status": "pending" },
  "timing": {
    "created_at": "2026-05-18T10:00:00Z",
    "started_at": null,
    "completed_at": null,
    "duration_ms": null
  },
  "target": { "type": "block", "block_id": "block_extract_invoice" },
  "eval_id": "wfnodeeval_hsLEQiM61ez9Piv147MWk",
  "total_evals": 1,
  "counts": {
    "lifecycle_counts": {
      "pending": 1,
      "queued": 0,
      "running": 0,
      "completed": 0,
      "error": 0,
      "cancelled": 0
    },
    "outcome": {
      "passed": 0,
      "failed": 0,
      "blocked": 0
    }
  }
}

Run lifecycle values are pending, queued, running, completed, error, and cancelled.
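A caller that creates a run usually needs to wait for it to settle. The sketch below polls until the run reaches a final state. Two things here are assumptions: that `completed`, `error`, and `cancelled` are the terminal lifecycle values (the list above does not say which are final), and `fetch_run`, a caller-supplied function that performs the Get Run request and returns the parsed JSON.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these lifecycle values are final; the others are in progress.
TERMINAL_STATUSES = frozenset({"completed", "error", "cancelled"})


def is_terminal(run: Dict[str, Any]) -> bool:
    return run["lifecycle"]["status"] in TERMINAL_STATUSES


def finished_evals(run: Dict[str, Any]) -> int:
    """Count child evals that reached a terminal state, from lifecycle_counts."""
    counts = run["counts"]["lifecycle_counts"]
    return sum(counts.get(status, 0) for status in TERMINAL_STATUSES)


def wait_for_run(
    fetch_run: Callable[[str], Dict[str, Any]],
    run_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Poll fetch_run(run_id) until the run is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run(run_id)
        if is_terminal(run):
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['lifecycle']['status']}")
        time.sleep(interval_s)
```

Injecting `fetch_run` keeps the loop independent of any particular HTTP client and makes it straightforward to test with a stub.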

Results

After a parent run finishes, list child rows with:

GET /v1/workflows/evals/results?run_id={run_id}

Each WorkflowEvalResult is the immutable record for one eval execution:

{
  "id": "wfresult_a",
  "workflow_eval_run_id": "wfevalrun_q1z2",
  "eval_id": "wfnodeeval_a",
  "workflow_id": "wf_abc123xyz",
  "block_id": "block_extract_invoice",
  "block_type": "extract",
  "lifecycle": { "status": "completed" },
  "timing": {
    "created_at": "2026-05-18T10:00:00Z",
    "started_at": "2026-05-18T10:00:02Z",
    "completed_at": "2026-05-18T10:00:20Z",
    "duration_ms": 18221
  },
  "handle_inputs": {
    "input-document-0": {
      "type": "file",
      "document": {
        "id": "file_invoice_q1",
        "filename": "invoice.pdf",
        "mime_type": "application/pdf"
      }
    }
  },
  "handle_outputs": {
    "output-json-0": {
      "type": "json",
      "data": { "total": 1234.56, "vendor": { "name": "Acme Inc" } }
    }
  },
  "assertion_result": {
    "assertion_id": "assert_xyz",
    "condition_kind": "equals",
    "outcome": "passed",
    "actual_value": 1234.56,
    "expected_value": 1234.56,
    "failure": null
  },
  "verdict": "passed",
  "verdict_summary": {
    "passed": true,
    "assertions_passed": 1,
    "assertions_failed": 0,
    "blocked_assertions": 0,
    "failed_assertion_ids": []
  }
}

Result lifecycle values match the parent run lifecycle values. `verdict` and `assertion_result.outcome` are assertion outcomes: `passed`, `failed`, or `blocked`. Execution errors are reported through `lifecycle.status = "error"` and the accompanying lifecycle error details, not as a fourth verdict value.
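Because lifecycle and verdict are separate axes, a summary over result rows should check the lifecycle first and only then read the verdict. The following is a minimal sketch under one assumption: that a row whose lifecycle is not `completed` carries no meaningful verdict, so it is bucketed under its lifecycle status instead.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def classify(result: Dict[str, Any]) -> str:
    """Bucket a result row: lifecycle status unless completed, then the verdict."""
    status = result["lifecycle"]["status"]
    if status != "completed":
        # Covers "error" as well as rows that never finished (assumption).
        return status
    return result["verdict"]


def tally(results: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count rows per bucket, e.g. passed / failed / blocked / error."""
    return dict(Counter(classify(row) for row in results))


rows = [
    {"lifecycle": {"status": "completed"}, "verdict": "passed"},
    {"lifecycle": {"status": "completed"}, "verdict": "failed"},
    {"lifecycle": {"status": "error"}, "verdict": None},
]
summary = tally(rows)
```

Here `summary` counts one `passed`, one `failed`, and one `error` row, keeping the execution error out of the verdict buckets.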

Freshness and Drift

Drift is not stored once and reused: every read response recomputes it against the current workflow draft.

Field	Values	Meaning
`schema_drift`	`none`, `partial`, `drifted`, `unknown`	Whether the assertion target still resolves.
`assertion_drift_status`	`valid`, `drifted`, `broken`	Whether the saved assertion is still usable.
`freshness.status`	`fresh`, `stale`, `unknown`	Whether the latest run matches the baseline.
`drift.status`	`none`, `drifted`, `broken`, `unknown`	Artifact-level drift summary.

Latest-run summaries expose lifecycle and outcome separately:

{
  "run_record_id": "wfresult_a",
  "status": "completed",
  "outcome": "passed",
  "started_at": "2026-05-18T10:00:02Z",
  "completed_at": "2026-05-18T10:00:20Z",
  "assertions_passed": 1,
  "assertions_failed": 0,
  "blocked_assertions": 0
}
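The drift, freshness, and latest-run fields can be combined into a single "needs attention" check. The policy below is an illustrative assumption, not a rule from the API: it flags any eval whose fields differ from `none` / `valid` / `fresh`, or whose latest run errored or did not pass. `attention_reasons` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def attention_reasons(workflow_eval: Dict[str, Any]) -> List[str]:
    """List the reasons an eval may need review; empty means it looks healthy."""
    reasons: List[str] = []
    schema_drift = workflow_eval.get("schema_drift")
    if schema_drift != "none":
        reasons.append(f"schema_drift={schema_drift}")
    assertion_drift = workflow_eval.get("assertion_drift_status")
    if assertion_drift != "valid":
        reasons.append(f"assertion_drift_status={assertion_drift}")
    freshness = (workflow_eval.get("freshness") or {}).get("status")
    if freshness != "fresh":
        reasons.append(f"freshness={freshness}")
    latest = workflow_eval.get("latest_run_summary")
    if latest is not None:
        # Lifecycle and outcome are separate fields, so check both.
        if latest["status"] == "error":
            reasons.append("latest run errored")
        elif latest["status"] == "completed" and latest["outcome"] != "passed":
            reasons.append(f"latest outcome={latest['outcome']}")
    return reasons
```

An eval that has never run (`latest_run_summary` is null) is not flagged by this sketch; a stricter policy could treat that case as a reason too.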

Endpoint Map

Method	Path	Purpose
`POST`	`/v1/workflows/evals`	Create
`GET`	`/v1/workflows/evals?workflow_id={workflow_id}`	List
`GET`	`/v1/workflows/evals/{eval_id}`	Get
`PATCH`	`/v1/workflows/evals/{eval_id}`	Update
`DELETE`	`/v1/workflows/evals/{eval_id}`	Delete
`POST`	`/v1/workflows/evals/runs`	Create Run
`GET`	`/v1/workflows/evals/runs`	List Runs
`GET`	`/v1/workflows/evals/runs/{run_id}`	Get Run
`POST`	`/v1/workflows/evals/runs/{run_id}/cancel`	Cancel Run
`GET`	`/v1/workflows/evals/results?run_id={run_id}`	List Results
`GET`	`/v1/workflows/evals/results/{result_id}`	Get Result
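The endpoint map can be kept as data in a client so that methods and paths live in one place. This sketch transcribes the table; the operation names and the `endpoint` helper are hypothetical, and the base URL and authentication are deliberately left out because this page does not specify them.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode

# (method, path template) per operation, transcribed from the table above.
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "create": ("POST", "/v1/workflows/evals"),
    "list": ("GET", "/v1/workflows/evals"),
    "get": ("GET", "/v1/workflows/evals/{eval_id}"),
    "update": ("PATCH", "/v1/workflows/evals/{eval_id}"),
    "delete": ("DELETE", "/v1/workflows/evals/{eval_id}"),
    "create_run": ("POST", "/v1/workflows/evals/runs"),
    "list_runs": ("GET", "/v1/workflows/evals/runs"),
    "get_run": ("GET", "/v1/workflows/evals/runs/{run_id}"),
    "cancel_run": ("POST", "/v1/workflows/evals/runs/{run_id}/cancel"),
    "list_results": ("GET", "/v1/workflows/evals/results"),
    "get_result": ("GET", "/v1/workflows/evals/results/{result_id}"),
}


def endpoint(
    name: str, query: Optional[Dict[str, str]] = None, **path_params: str
) -> Tuple[str, str]:
    """Return (method, path) with path parameters escaped and query appended."""
    method, template = ENDPOINTS[name]
    escaped = {key: quote(value, safe="") for key, value in path_params.items()}
    path = template.format(**escaped)
    if query:
        path += "?" + urlencode(query)
    return method, path
```

For instance, `endpoint("list_results", query={"run_id": "wfevalrun_q1z2"})` returns the `GET` method with the results path and its `run_id` query string.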

MCP Tools

The eval API is also exposed through MCP tools:

workflows_evals_create
workflows_evals_list
workflows_evals_get
workflows_evals_update
workflows_evals_delete
workflows_evals_runs_create
workflows_evals_runs_get
workflows_evals_results_list
workflows_evals_results_get

The tool input schemas match the request bodies above. See the MCP page for setup.
