Workflow evals let you freeze inputs for one workflow block and assert something about that block’s output on a later run. This page documents the API shapes for eval definitions, assertions, parent eval runs, and result rows. For the conceptual workflow and dashboard flow, see Evals.

Eval Definition

A WorkflowEval is a saved assertion plus the inputs needed to replay one block.
{
  "id": "wfnodeeval_...",
  "workflow_id": "wf_...",
  "target": { "type": "block", "block_id": "block_extract_invoice" },
  "source": {
    "type": "manual",
    "handle_inputs": {
      "input-json-0": { "type": "json", "data": { "invoice_id": "INV-001" } }
    }
  },
  "name": "invoice total",
  "assertion": {
    "id": "assert_xyz",
    "target": { "output_handle_id": "output-json-0", "path": "total" },
    "condition": { "kind": "equals", "expected": 1234.56 },
    "label": null
  },
  "schema_drift": "none",
  "assertion_drift_status": "valid",
  "freshness": { "status": "fresh", "reasons": [] },
  "latest_run_summary": null,
  "latest_passing_run_summary": null,
  "latest_failing_run_summary": null
}

target

target is a discriminated union by type.
| type | Fields | Meaning |
| --- | --- | --- |
| block | block_id | Replay one block in the current workflow. |
The API shape is a union so workflow-level targets can be added later without renaming the field.

source

source is also a discriminated union by type.
| type | Fields | Meaning |
| --- | --- | --- |
| manual | handle_inputs | Explicit handle inputs supplied by the caller. |
| run_step | run_id, optional step_id | Capture the inputs a block received during a previous workflow run. Use step_id for loop rows. |
Manual handle_inputs values are typed:
{
  "input-json-0": { "type": "json", "data": { "amount": 1234.56 } },
  "input-document-0": {
    "type": "file",
    "document": {
      "id": "file_invoice_q1",
      "filename": "invoice.pdf",
      "mime_type": "application/pdf"
    }
  }
}
When an eval is created from a run_step, Retab snapshots the inputs at create time. File handles are materialized as durable Retab file references so later eval runs do not depend on the original upload session.
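The two source variants can be assembled with small helpers, sketched below. This is illustrative only: the helper names (manual_source, run_step_source, build_eval_body) and the run_... identifiers are hypothetical, and the sketch assumes the create request body mirrors the writable fields of the WorkflowEval shown above (workflow_id, target, source, name, assertion), which this page does not spell out.

```python
from typing import Any, Optional


def manual_source(handle_inputs: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Build a manual source from typed handle inputs."""
    for handle_id, value in handle_inputs.items():
        # Manual inputs are typed, e.g. {"type": "json", ...} or {"type": "file", ...}.
        if "type" not in value:
            raise ValueError(f"handle {handle_id!r} is missing a 'type' tag")
    return {"type": "manual", "handle_inputs": handle_inputs}


def run_step_source(run_id: str, step_id: Optional[str] = None) -> dict[str, Any]:
    """Build a run_step source; step_id is only included when provided."""
    source: dict[str, Any] = {"type": "run_step", "run_id": run_id}
    if step_id is not None:
        source["step_id"] = step_id
    return source


def build_eval_body(
    workflow_id: str,
    block_id: str,
    source: dict[str, Any],
    name: str,
    assertion: dict[str, Any],
) -> dict[str, Any]:
    """Assemble an eval body (assumed shape, see the note above)."""
    return {
        "workflow_id": workflow_id,
        "target": {"type": "block", "block_id": block_id},
        "source": source,
        "name": name,
        "assertion": assertion,
    }


body = build_eval_body(
    workflow_id="wf_abc123xyz",
    block_id="block_extract_invoice",
    source=manual_source(
        {"input-json-0": {"type": "json", "data": {"invoice_id": "INV-001"}}}
    ),
    name="invoice total",
    assertion={
        "target": {"output_handle_id": "output-json-0", "path": "total"},
        "condition": {"kind": "equals", "expected": 1234.56},
        "label": None,
    },
)
```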

assertion

Workflow evals use one assertion per eval. An assertion targets one output handle and an optional path inside that handle’s payload.
{
  "target": { "output_handle_id": "output-json-0", "path": "vendor.name" },
  "condition": { "kind": "contains", "expected": "Acme" },
  "label": "vendor includes Acme"
}
The current condition.kind values are:
| Kind | Required fields |
| --- | --- |
| exists, not_exists | none |
| equals, not_equals | expected |
| contains, not_contains | expected |
| number_compare | op, expected |
| between | lower, upper, optional inclusive |
| starts_with, ends_with | expected |
| matches_regex | pattern |
| object_contains | expected object |
| array_contains | expected object |
| length_compare | op, expected |
| json_schema_valid | schema |
| all_items_match, any_item_matches | nested condition |
| similarity_gte | reference, threshold, optional method |
| llm_judged_as, llm_not_judged_as | rubric, optional expected_label |
| split_iou_gte | expected, optional threshold |
number_compare and length_compare support op values gt, gte, lt, lte, eq, and neq.
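To make the shape of a condition concrete, here is a local approximation of a few kinds. It is not Retab's evaluator: the dotted-path traversal, the use of Python's == and in for equals and contains, and the function names (resolve_path, check) are all assumptions made for illustration.

```python
import operator
from typing import Any, Optional

# The six op values accepted by number_compare and length_compare.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "neq": operator.ne,
}

_MISSING = object()  # sentinel: distinguishes "absent" from a stored null


def resolve_path(payload: Any, path: Optional[str]) -> Any:
    """Walk a dotted path such as 'vendor.name' (assumed path syntax)."""
    if not path:
        return payload
    current = payload
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return _MISSING
    return current


def check(condition: dict[str, Any], payload: Any, path: Optional[str] = None) -> bool:
    """Evaluate a small subset of condition kinds against a payload."""
    value = resolve_path(payload, path)
    kind = condition["kind"]
    if kind == "exists":
        return value is not _MISSING
    if kind == "not_exists":
        return value is _MISSING
    if value is _MISSING:
        return False  # every remaining kind needs a value to inspect
    if kind == "equals":
        return value == condition["expected"]
    if kind == "not_equals":
        return value != condition["expected"]
    if kind == "contains":
        return condition["expected"] in value
    if kind == "number_compare":
        return OPS[condition["op"]](value, condition["expected"])
    if kind == "length_compare":
        return OPS[condition["op"]](len(value), condition["expected"])
    raise NotImplementedError(f"kind {kind!r} is not covered by this sketch")


data = {"total": 1234.56, "vendor": {"name": "Acme Inc"}}
vendor_ok = check({"kind": "contains", "expected": "Acme"}, data, "vendor.name")
total_ok = check({"kind": "number_compare", "op": "gt", "expected": 1000}, data, "total")
```

A sketch like this can help sanity-check an assertion against a sample payload before saving it; the server remains the source of truth for actual outcomes.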

Runs

Create a parent eval run with:
POST /v1/workflows/evals/runs
The request body requires workflow_id. scope is optional:
{
  "workflow_id": "wf_abc123xyz",
  "scope": { "type": "single", "eval_id": "wfnodeeval_hsLEQiM61ez9Piv147MWk" }
}
Scope variants:
| scope.type | Fields | Meaning |
| --- | --- | --- |
| omitted/null | none | Run every saved eval in the workflow. |
| workflow | none | Run every saved eval in the workflow. |
| block | block_id | Run every eval for one block. |
| single | eval_id | Run one saved eval. |
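The scope variants map naturally onto a small request builder. The following is a minimal sketch; build_run_request is a hypothetical helper name, and the validation it performs is client-side convenience rather than documented server behavior.

```python
from typing import Any, Optional


def build_run_request(
    workflow_id: str,
    scope_type: Optional[str] = None,
    *,
    block_id: Optional[str] = None,
    eval_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build the body for POST /v1/workflows/evals/runs."""
    body: dict[str, Any] = {"workflow_id": workflow_id}
    if scope_type is None:
        # Omitting scope runs every saved eval in the workflow.
        return body
    if scope_type == "workflow":
        body["scope"] = {"type": "workflow"}
    elif scope_type == "block":
        if block_id is None:
            raise ValueError("scope 'block' requires block_id")
        body["scope"] = {"type": "block", "block_id": block_id}
    elif scope_type == "single":
        if eval_id is None:
            raise ValueError("scope 'single' requires eval_id")
        body["scope"] = {"type": "single", "eval_id": eval_id}
    else:
        raise ValueError(f"unknown scope type: {scope_type!r}")
    return body


single = build_run_request(
    "wf_abc123xyz", "single", eval_id="wfnodeeval_hsLEQiM61ez9Piv147MWk"
)
```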
A WorkflowEvalRun is the parent resource for one batch:
{
  "id": "wfevalrun_q1z2",
  "workflow_id": "wf_abc123xyz",
  "workflow_version_id": "draft_2026_05_18",
  "trigger": { "type": "api" },
  "lifecycle": { "status": "pending" },
  "timing": {
    "created_at": "2026-05-18T10:00:00Z",
    "started_at": null,
    "completed_at": null,
    "duration_ms": null
  },
  "target": { "type": "block", "block_id": "block_extract_invoice" },
  "eval_id": "wfnodeeval_hsLEQiM61ez9Piv147MWk",
  "total_evals": 1,
  "counts": {
    "lifecycle_counts": {
      "pending": 1,
      "queued": 0,
      "running": 0,
      "completed": 0,
      "error": 0,
      "cancelled": 0
    },
    "outcome": {
      "passed": 0,
      "failed": 0,
      "blocked": 0
    }
  }
}
Run lifecycle values are pending, queued, running, completed, error, and cancelled.
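Because a new run starts as pending, callers typically poll until the run settles. The sketch below assumes completed, error, and cancelled are the terminal statuses, which the names suggest but this page does not state. wait_for_run is a hypothetical helper, and fetch_run stands in for whatever client call performs GET /v1/workflows/evals/runs/{run_id}.

```python
import time
from typing import Any, Callable

# Assumption: these three statuses never transition further.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def wait_for_run(
    fetch_run: Callable[[str], dict[str, Any]],
    run_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll fetch_run until the run reaches a terminal lifecycle status."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run(run_id)
        if run["lifecycle"]["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout_s}s")
        sleep(interval_s)


# Usage with a stand-in fetcher that simulates three successive reads.
_statuses = iter(["pending", "running", "completed"])


def fake_fetch(run_id: str) -> dict[str, Any]:
    return {"id": run_id, "lifecycle": {"status": next(_statuses)}}


finished = wait_for_run(fake_fetch, "wfevalrun_q1z2", sleep=lambda _seconds: None)
```

Passing the fetcher and sleep function as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.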

Results

After a parent run finishes, list child rows with:
GET /v1/workflows/evals/results?run_id={run_id}
Each WorkflowEvalResult is the immutable record for one eval execution:
{
  "id": "wfresult_a",
  "workflow_eval_run_id": "wfevalrun_q1z2",
  "eval_id": "wfnodeeval_a",
  "workflow_id": "wf_abc123xyz",
  "block_id": "block_extract_invoice",
  "block_type": "extract",
  "lifecycle": { "status": "completed" },
  "timing": {
    "created_at": "2026-05-18T10:00:00Z",
    "started_at": "2026-05-18T10:00:02Z",
    "completed_at": "2026-05-18T10:00:20Z",
    "duration_ms": 18221
  },
  "handle_inputs": {
    "input-document-0": {
      "type": "file",
      "document": {
        "id": "file_invoice_q1",
        "filename": "invoice.pdf",
        "mime_type": "application/pdf"
      }
    }
  },
  "handle_outputs": {
    "output-json-0": {
      "type": "json",
      "data": { "total": 1234.56, "vendor": { "name": "Acme Inc" } }
    }
  },
  "assertion_result": {
    "assertion_id": "assert_xyz",
    "condition_kind": "equals",
    "outcome": "passed",
    "actual_value": 1234.56,
    "expected_value": 1234.56,
    "failure": null
  },
  "verdict": "passed",
  "verdict_summary": {
    "passed": true,
    "assertions_passed": 1,
    "assertions_failed": 0,
    "blocked_assertions": 0,
    "failed_assertion_ids": []
  }
}
Result lifecycle values match the parent run lifecycle values. verdict and assertion_result.outcome are assertion outcomes: passed, failed, or blocked. Execution errors are reported through lifecycle.status = "error" and the lifecycle error details, not as a fourth verdict value.
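Since lifecycle and verdict are separate axes, client code should check the lifecycle first and read the verdict only for completed rows. The following sketch shows one way to do that; classify_result and the "lifecycle:" / "verdict:" prefixes are hypothetical conventions, and the shape of non-completed rows beyond lifecycle.status is assumed.

```python
from collections import Counter
from typing import Any


def classify_result(result: dict[str, Any]) -> str:
    """Return a label that keeps execution state and assertion outcome apart."""
    status = result["lifecycle"]["status"]
    if status != "completed":
        # Covers error, cancelled, and rows that are still in flight.
        return f"lifecycle:{status}"
    return f"verdict:{result['verdict']}"


def summarize(results: list[dict[str, Any]]) -> Counter:
    """Count result rows per label."""
    return Counter(classify_result(row) for row in results)


rows = [
    {"id": "wfresult_a", "lifecycle": {"status": "completed"}, "verdict": "passed"},
    {"id": "wfresult_b", "lifecycle": {"status": "completed"}, "verdict": "failed"},
    {"id": "wfresult_c", "lifecycle": {"status": "error"}, "verdict": None},
]
summary = summarize(rows)
```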

Freshness and Drift

Read responses recompute drift against the current workflow draft.
| Field | Values | Meaning |
| --- | --- | --- |
| schema_drift | none, partial, drifted, unknown | Whether the assertion target still resolves. |
| assertion_drift_status | valid, drifted, broken | Whether the saved assertion is still usable. |
| freshness.status | fresh, stale, unknown | Whether the latest run matches the baseline. |
| drift.status | none, drifted, broken, unknown | Artifact-level drift summary. |
Latest-run summaries expose lifecycle and outcome separately:
{
  "run_record_id": "wfresult_a",
  "status": "completed",
  "outcome": "passed",
  "started_at": "2026-05-18T10:00:02Z",
  "completed_at": "2026-05-18T10:00:20Z",
  "assertions_passed": 1,
  "assertions_failed": 0,
  "blocked_assertions": 0
}
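One possible use of these fields is a triage helper that flags evals needing review. The policy below (anything other than none, valid, or fresh is worth a look) is an assumption for illustration, not a documented rule, and attention_reasons is a hypothetical name.

```python
from typing import Any

# Assumed "healthy" value for each top-level drift field.
HEALTHY = {"schema_drift": "none", "assertion_drift_status": "valid"}


def attention_reasons(eval_def: dict[str, Any]) -> list[str]:
    """List the drift and freshness signals that deviate from the healthy values."""
    reasons = []
    for field, healthy in HEALTHY.items():
        value = eval_def.get(field)
        if value != healthy:
            # Missing fields show up as None and are reported as well.
            reasons.append(f"{field}={value}")
    freshness = (eval_def.get("freshness") or {}).get("status")
    if freshness != "fresh":
        reasons.append(f"freshness.status={freshness}")
    return reasons


healthy_eval = {
    "schema_drift": "none",
    "assertion_drift_status": "valid",
    "freshness": {"status": "fresh", "reasons": []},
}
stale_eval = {
    "schema_drift": "none",
    "assertion_drift_status": "broken",
    "freshness": {"status": "stale", "reasons": []},
}
```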

Endpoint Map

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /v1/workflows/evals | Create |
| GET | /v1/workflows/evals?workflow_id={workflow_id} | List |
| GET | /v1/workflows/evals/{eval_id} | Get |
| PATCH | /v1/workflows/evals/{eval_id} | Update |
| DELETE | /v1/workflows/evals/{eval_id} | Delete |
| POST | /v1/workflows/evals/runs | Create Run |
| GET | /v1/workflows/evals/runs | List Runs |
| GET | /v1/workflows/evals/runs/{run_id} | Get Run |
| POST | /v1/workflows/evals/runs/{run_id}/cancel | Cancel Run |
| GET | /v1/workflows/evals/results?run_id={run_id} | List Results |
| GET | /v1/workflows/evals/results/{result_id} | Get Result |
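For client code, the endpoint map can be captured as a lookup table. In the sketch below the operation keys (create, list_results, and so on) and the resolve helper are hypothetical; only the methods and paths come from this page. The base URL and authentication are deliberately left out because they are not covered here.

```python
from typing import Optional
from urllib.parse import urlencode

# Operation key -> (HTTP method, path template). Keys are illustrative names.
ENDPOINTS = {
    "create": ("POST", "/v1/workflows/evals"),
    "list": ("GET", "/v1/workflows/evals"),
    "get": ("GET", "/v1/workflows/evals/{eval_id}"),
    "update": ("PATCH", "/v1/workflows/evals/{eval_id}"),
    "delete": ("DELETE", "/v1/workflows/evals/{eval_id}"),
    "create_run": ("POST", "/v1/workflows/evals/runs"),
    "list_runs": ("GET", "/v1/workflows/evals/runs"),
    "get_run": ("GET", "/v1/workflows/evals/runs/{run_id}"),
    "cancel_run": ("POST", "/v1/workflows/evals/runs/{run_id}/cancel"),
    "list_results": ("GET", "/v1/workflows/evals/results"),
    "get_result": ("GET", "/v1/workflows/evals/results/{result_id}"),
}


def resolve(
    operation: str,
    path_params: Optional[dict[str, str]] = None,
    query: Optional[dict[str, str]] = None,
) -> tuple[str, str]:
    """Return (method, path) with path parameters filled and query appended."""
    method, template = ENDPOINTS[operation]
    path = template.format(**(path_params or {}))
    if query:
        path += "?" + urlencode(query)
    return method, path


list_results = resolve("list_results", query={"run_id": "wfevalrun_q1z2"})
cancel_run = resolve("cancel_run", {"run_id": "wfevalrun_q1z2"})
```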

MCP Tools

The eval API is also exposed through MCP tools:
  • workflows_evals_create
  • workflows_evals_list
  • workflows_evals_get
  • workflows_evals_update
  • workflows_evals_delete
  • workflows_evals_runs_create
  • workflows_evals_runs_get
  • workflows_evals_results_list
  • workflows_evals_results_get
The tool input schemas match the request bodies above. See the MCP page for setup.