Eval Definition
A WorkflowEval is a saved assertion plus the inputs needed to replay one
block.
target
target is a discriminated union by type.
| type | Fields | Meaning |
|---|---|---|
| block | block_id | Replay one block in the current workflow. |
source
source is also a discriminated union by type.
| type | Fields | Meaning |
|---|---|---|
| manual | handle_inputs | Explicit handle inputs supplied by the caller. |
| run_step | run_id, optional step_id | Capture the inputs a block received during a previous workflow run. Use step_id for loop rows. |
handle_inputs values are typed.
For run_step sources, Retab snapshots the inputs at create
time. File handles are materialized as durable Retab file references so later
eval runs do not depend on the original upload session.
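
As a rough illustration of the two unions above, the sketch below builds target and source as plain Python dictionaries. The helper names (block_target, manual_source, run_step_source) and the placeholder ids are hypothetical and not part of any Retab SDK; only the type discriminators and field names come from the tables. The shape of the typed values inside handle_inputs is not covered here, so the example passes it through untouched.

```python
from typing import Any, Optional


def block_target(block_id: str) -> dict[str, Any]:
    """Target union, type "block": replay one block in the current workflow."""
    return {"type": "block", "block_id": block_id}


def manual_source(handle_inputs: dict[str, Any]) -> dict[str, Any]:
    """Source union, type "manual": the caller supplies explicit handle inputs."""
    return {"type": "manual", "handle_inputs": handle_inputs}


def run_step_source(run_id: str, step_id: Optional[str] = None) -> dict[str, Any]:
    """Source union, type "run_step": capture inputs from a previous run."""
    source: dict[str, Any] = {"type": "run_step", "run_id": run_id}
    if step_id is not None:
        # step_id is optional; the table says to use it for loop rows.
        source["step_id"] = step_id
    return source


target = block_target("block_123")               # placeholder id
source = run_step_source("run_456", "step_7")    # placeholder ids
```

Keeping the discriminator in one helper per variant makes it harder to send, say, a run_id alongside type "manual".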
assertion
Workflow evals use one assertion per eval. An assertion targets one output
handle and an optional path inside that handle’s payload.
condition.kind values are:
| Kind | Required fields |
|---|---|
| exists, not_exists | none |
| equals, not_equals | expected |
| contains, not_contains | expected |
| number_compare | op, expected |
| between | lower, upper, optional inclusive |
| starts_with, ends_with | expected |
| matches_regex | pattern |
| object_contains | expected object |
| array_contains | expected object |
| length_compare | op, expected |
| json_schema_valid | schema |
| all_items_match, any_item_matches | nested condition |
| similarity_gte | reference, threshold, optional method |
| llm_judged_as, llm_not_judged_as | rubric, optional expected_label |
| split_iou_gte | expected, optional threshold |
number_compare and length_compare support op values gt, gte, lt,
lte, eq, and neq.
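
The required-field table can be turned into a small client-side check before an eval is saved. The following is a minimal sketch, not the server's validation logic. It assumes the fields sit flat next to kind in the condition object, and it uses a hypothetical key name, condition, for the nested condition of all_items_match and any_item_matches, since the table does not name that field.

```python
from typing import Any

# Required fields per condition.kind, transcribed from the table above.
# "condition" for the nested-condition kinds is an assumed key name.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "exists": (), "not_exists": (),
    "equals": ("expected",), "not_equals": ("expected",),
    "contains": ("expected",), "not_contains": ("expected",),
    "number_compare": ("op", "expected"),
    "between": ("lower", "upper"),
    "starts_with": ("expected",), "ends_with": ("expected",),
    "matches_regex": ("pattern",),
    "object_contains": ("expected",),
    "array_contains": ("expected",),
    "length_compare": ("op", "expected"),
    "json_schema_valid": ("schema",),
    "all_items_match": ("condition",), "any_item_matches": ("condition",),
    "similarity_gte": ("reference", "threshold"),
    "llm_judged_as": ("rubric",), "llm_not_judged_as": ("rubric",),
    "split_iou_gte": ("expected",),
}

COMPARE_OPS = {"gt", "gte", "lt", "lte", "eq", "neq"}


def condition_problems(condition: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    kind = condition.get("kind")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown kind: {kind!r}"]
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[kind]
        if name not in condition
    ]
    # Only number_compare and length_compare take an op.
    if kind in ("number_compare", "length_compare") and "op" in condition:
        if condition["op"] not in COMPARE_OPS:
            problems.append(f"unsupported op: {condition['op']!r}")
    return problems
```

For example, condition_problems({"kind": "between", "lower": 1}) reports that upper is missing.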
Runs
Create a parent eval run with POST /v1/workflows/evals/runs and a workflow_id. scope is optional:
| scope.type | Fields | Meaning |
|---|---|---|
| omitted/null | none | Run every saved eval in the workflow. |
| workflow | none | Run every saved eval in the workflow. |
| block | block_id | Run every eval for one block. |
| single | eval_id | Run one saved eval. |
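
The scope variants can be sketched as a request-body builder. This is an illustrative helper under stated assumptions: build_run_request is a hypothetical name, and the body layout (workflow_id at the top level, scope as a nested object keyed by type) is inferred from the table rather than confirmed by a request schema.

```python
from typing import Any, Optional


def build_run_request(
    workflow_id: str,
    scope_type: Optional[str] = None,
    *,
    block_id: Optional[str] = None,
    eval_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build a body for creating a parent eval run (assumed layout)."""
    body: dict[str, Any] = {"workflow_id": workflow_id}
    if scope_type is None:
        # Omitted scope: run every saved eval in the workflow.
        return body
    if scope_type == "workflow":
        body["scope"] = {"type": "workflow"}
    elif scope_type == "block":
        if block_id is None:
            raise ValueError("scope type 'block' requires block_id")
        body["scope"] = {"type": "block", "block_id": block_id}
    elif scope_type == "single":
        if eval_id is None:
            raise ValueError("scope type 'single' requires eval_id")
        body["scope"] = {"type": "single", "eval_id": eval_id}
    else:
        raise ValueError(f"unknown scope type: {scope_type!r}")
    return body
```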
WorkflowEvalRun is the parent resource for one batch.
Its status is one of pending, queued, running, completed, error,
or cancelled.
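
A caller typically polls the parent run until it stops changing. The sketch below shows one way to do that with an injected status getter, so it stays independent of any HTTP client. Treating completed, error, and cancelled as the terminal states is an assumption based on their names; wait_for_run and its parameters are hypothetical.

```python
import time
from typing import Callable

# Assumed terminal states; pending, queued, and running are treated as in-flight.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def wait_for_run(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until a terminal status is seen, then return it."""
    for attempt in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"run still not terminal after {max_polls} polls")
```

In practice get_status would wrap a call to GET /v1/workflows/evals/runs/{run_id} and extract the run's status from the response.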
Results
After a parent run finishes, list child rows with GET /v1/workflows/evals/results?run_id={run_id}.
WorkflowEvalResult is the immutable record for one eval execution.
verdict and
assertion_result.outcome are assertion outcomes: passed, failed, or
blocked. Execution errors are represented in lifecycle.status = "error" and
the lifecycle error details, not as a fourth verdict value.
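
Because execution errors live in the lifecycle rather than in the verdict, a summary should check lifecycle.status before counting verdicts. The following sketch tallies a list of result records represented as plain dictionaries. The nesting (a lifecycle object with a status key next to a top-level verdict) is assumed from the dotted names above, and summarize_results is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_results(results: Iterable[dict[str, Any]]) -> Counter:
    """Count passed / failed / blocked verdicts, with execution errors kept apart."""
    summary: Counter = Counter()
    for result in results:
        lifecycle = result.get("lifecycle") or {}
        if lifecycle.get("status") == "error":
            # The eval did not execute cleanly, so there is no meaningful verdict.
            summary["execution_error"] += 1
        else:
            summary[result["verdict"]] += 1
    return summary
```

A blocked verdict and an execution error are then reported separately, which matches the distinction drawn above.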
Freshness and Drift
Read responses recompute drift against the current workflow draft.
| Field | Values | Meaning |
|---|---|---|
| schema_drift | none, partial, drifted, unknown | Whether the assertion target still resolves. |
| assertion_drift_status | valid, drifted, broken | Whether the saved assertion is still usable. |
| freshness.status | fresh, stale, unknown | Whether the latest run matches the baseline. |
| drift.status | none, drifted, broken, unknown | Artifact-level drift summary. |
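
One possible way to act on these fields is to flag records that need a second look. The sketch below is only an illustration: it assumes all four fields are readable from a single dictionary, with freshness and drift as nested objects, and the choice of which values count as needing review is a judgment call rather than documented behavior. review_reasons is a hypothetical helper.

```python
from typing import Any


def review_reasons(record: dict[str, Any]) -> list[str]:
    """Return the reasons a saved eval may need attention (empty list if none)."""
    reasons: list[str] = []
    if record.get("schema_drift") in ("partial", "drifted"):
        reasons.append("assertion target may no longer resolve")
    if record.get("assertion_drift_status") in ("drifted", "broken"):
        reasons.append("saved assertion may no longer be usable")
    if (record.get("freshness") or {}).get("status") == "stale":
        reasons.append("latest run does not match the baseline")
    if (record.get("drift") or {}).get("status") in ("drifted", "broken"):
        reasons.append("artifact-level drift reported")
    return reasons
```

Values of unknown are deliberately not flagged here; a stricter policy could treat them as needing review too.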
Endpoint Map
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/workflows/evals | Create |
| GET | /v1/workflows/evals?workflow_id={workflow_id} | List |
| GET | /v1/workflows/evals/{eval_id} | Get |
| PATCH | /v1/workflows/evals/{eval_id} | Update |
| DELETE | /v1/workflows/evals/{eval_id} | Delete |
| POST | /v1/workflows/evals/runs | Create Run |
| GET | /v1/workflows/evals/runs | List Runs |
| GET | /v1/workflows/evals/runs/{run_id} | Get Run |
| POST | /v1/workflows/evals/runs/{run_id}/cancel | Cancel Run |
| GET | /v1/workflows/evals/results?run_id={run_id} | List Results |
| GET | /v1/workflows/evals/results/{result_id} | Get Result |
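
The endpoint map can be carried into client code as a lookup table. The sketch below resolves a method and path from a short operation name; the operation keys and the endpoint helper are hypothetical, the paths are copied from the table, and no base URL or authentication scheme is assumed.

```python
from urllib.parse import quote

# (method, path template) per operation, copied from the endpoint map.
ENDPOINTS: dict[str, tuple[str, str]] = {
    "create": ("POST", "/v1/workflows/evals"),
    "list": ("GET", "/v1/workflows/evals?workflow_id={workflow_id}"),
    "get": ("GET", "/v1/workflows/evals/{eval_id}"),
    "update": ("PATCH", "/v1/workflows/evals/{eval_id}"),
    "delete": ("DELETE", "/v1/workflows/evals/{eval_id}"),
    "create_run": ("POST", "/v1/workflows/evals/runs"),
    "list_runs": ("GET", "/v1/workflows/evals/runs"),
    "get_run": ("GET", "/v1/workflows/evals/runs/{run_id}"),
    "cancel_run": ("POST", "/v1/workflows/evals/runs/{run_id}/cancel"),
    "list_results": ("GET", "/v1/workflows/evals/results?run_id={run_id}"),
    "get_result": ("GET", "/v1/workflows/evals/results/{result_id}"),
}


def endpoint(name: str, **params: str) -> tuple[str, str]:
    """Return (method, path) with percent-encoded parameters filled in."""
    method, template = ENDPOINTS[name]
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return method, template.format(**encoded)
```

A missing parameter raises KeyError from str.format, which surfaces mistakes such as calling endpoint("get_run") without a run_id.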
MCP Tools
The eval API is also exposed through MCP tools:
- workflows_evals_create
- workflows_evals_list
- workflows_evals_get
- workflows_evals_update
- workflows_evals_delete
- workflows_evals_runs_create
- workflows_evals_runs_get
- workflows_evals_results_list
- workflows_evals_results_get