Workflow Tables

Workflow tables are shared CSV-backed resources that workflows can query, validate, download, and mount into Function blocks. They are useful for stable reference data such as carrier codes, port mappings, product catalogs, tax rules, country lists, customer aliases, and reconciliation thresholds. Tables are scoped to the current environment, and any workflow in that environment can reference a table by its ID.

What tables are for

Use workflow tables when your workflow needs external reference data that should be maintained separately from the workflow graph:

map extracted names to internal IDs
fuzzy-match vendors, ports, products, or locations
validate extracted values against an approved list
enrich extraction results with metadata from a catalog
keep business rules editable without changing Function block code

Tables are designed for lookup and reference data. They are not a row-by-row transactional database. To change table contents, upload a replacement CSV.

Data model

A table stores:

Field	Description
`id`	Stable table ID, for example `tbl_ports`.
`name`	Human-readable table name.
`filename`	Original CSV filename.
`row_count`	Number of parsed data rows.
`columns`	Inferred or overridden column schemas.
`sample_rows`	First parsed rows for quick previews.
`source_file_id`	Stored original CSV file.
`snapshot_file_id`	Internal parquet snapshot used for fast reads.
`metadata`	User-defined table metadata.
`created_at`, `updated_at`	Table timestamps.

Each column has:

Field	Description
`name`	Header name from the CSV.
`json_schema`	Inferred or overridden JSON schema.
`sample_values`	Up to three sample non-empty values.
`required`	Whether the column is considered required.
`unique`	Whether the column is considered unique.
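
If you handle table objects in typed Python code, the two field lists above can be mirrored as type hints. This is a minimal sketch, not an official client model: the field names come from the tables above, while the value types (string timestamps, string file IDs, one dict per sample row) and the `required_columns` helper are assumptions made for illustration.

```python
from typing import Any, TypedDict


class Column(TypedDict):
    name: str                     # header name from the CSV
    json_schema: dict[str, Any]   # inferred or overridden JSON schema
    sample_values: list[Any]      # up to three non-empty samples
    required: bool
    unique: bool


class Table(TypedDict):
    id: str                       # for example "tbl_ports"
    name: str
    filename: str
    row_count: int
    columns: list[Column]
    sample_rows: list[dict[str, Any]]   # assumed: one dict per row
    source_file_id: str                 # assumed: file IDs are strings
    snapshot_file_id: str
    metadata: dict[str, Any]
    created_at: str                     # assumed: timestamps arrive as strings
    updated_at: str


def required_columns(table: Table) -> list[str]:
    """Hypothetical helper: names of the columns flagged as required."""
    return [column["name"] for column in table["columns"] if column["required"]]
```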

CSV upload rules

Create and replace operations accept a multipart/form-data upload with a file field containing CSV bytes. Current upload guardrails:

Rule	Behavior
Maximum file size	`20 MiB` hard cap. Larger files return `413`.
Empty files	Rejected.
Encodings	`utf-8-sig`, `utf-8`, and `latin-1` are supported.
Delimiters	Comma, semicolon, tab, and pipe are detected automatically.
Header row	Required. At least one named header must exist.
Header names	Trimmed before storage. Duplicate names are rejected.
Unnamed columns	Allowed only if every cell in that unnamed column is empty.
Blank cells	Trimmed and stored as `null`.

Trailing empty spreadsheet columns are ignored, but unnamed columns with data are rejected because there is no stable column name to expose downstream.
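
To catch the most common rejections before uploading, you can approximate these rules locally. The sketch below is an assumption-laden pre-flight check, not the server's implementation: the `preflight` function and its messages are hypothetical, and it takes the delimiter as an argument instead of detecting it automatically.

```python
import csv
import io

MAX_BYTES = 20 * 1024 * 1024  # 20 MiB cap from the table above


def preflight(data: bytes, delimiter: str = ",") -> list[str]:
    """Return a list of problems; an empty list means no rule was tripped."""
    if not data:
        return ["file is empty"]
    if len(data) > MAX_BYTES:
        return ["file exceeds 20 MiB"]

    # Try the documented encodings in order. latin-1 accepts any byte
    # sequence, so the loop always ends with decoded text.
    for encoding in ("utf-8-sig", "utf-8", "latin-1"):
        try:
            text = data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue

    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    headers = [header.strip() for header in rows[0]]  # headers are trimmed
    named = [header for header in headers if header]
    problems = []
    if not named:
        problems.append("no named header")
    duplicates = sorted({header for header in named if named.count(header) > 1})
    if duplicates:
        problems.append(f"duplicate headers: {', '.join(duplicates)}")

    # An unnamed column is acceptable only when every cell in it is empty.
    for index, header in enumerate(headers):
        if header:
            continue
        if any(index < len(row) and row[index].strip() for row in rows[1:]):
            problems.append(f"unnamed column {index + 1} contains data")
    return problems
```

For example, `preflight(open("ports.csv", "rb").read())` returns an empty list when none of these checks fail. The upload can still be rejected for reasons this sketch does not cover.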

Schema inference

When you upload a CSV without schema overrides, Retab infers each column’s JSON schema from its non-empty values. Inference recognizes:

CSV values	Inferred schema
`true`, `false`	`{ "type": "boolean" }`
`2026-01-01`	`{ "type": "string", "format": "date" }`
`2026-01-01T12:30:00Z`	`{ "type": "string", "format": "date-time" }`
`12:30` or `12:30:00`	`{ "type": "string", "format": "time" }`
Integers	`{ "type": "integer" }`
Decimal or scientific numbers	`{ "type": "number" }`
JSON objects	`{ "type": "object" }`
JSON arrays	`{ "type": "array" }`
Mixed or unrecognized values	`{ "type": "string" }`

If a column contains blanks, the inferred type becomes nullable, for example:

{ "type": ["number", "null"] }

Long integer-looking values and values with leading zeroes are treated as identifier-like strings instead of numbers. This prevents IDs such as 0012345 or 18-digit account numbers from losing precision or formatting.
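
The rules above can be approximated in a few lines of Python, which is handy for predicting what a CSV will infer to before you upload it. Treat this as a rough sketch under stated assumptions rather than Retab's actual algorithm: `classify`, `infer_column`, and the `MAX_INT_DIGITS` cutoff are hypothetical, dates and times are matched by shape only, and any mix of types (including integers with decimals) falls back to string because the mixing rules are not spelled out here.

```python
import json
import re

MAX_INT_DIGITS = 15  # assumed cutoff for "long" integers; not documented

_PATTERNS = [
    ("date-time", re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})?")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("time", re.compile(r"\d{2}:\d{2}(:\d{2})?")),
]
_INTEGER = re.compile(r"[+-]?\d+")
_NUMBER = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def classify(value: str) -> tuple:
    """Classify one non-empty cell as (json_type, format_or_None)."""
    if value in ("true", "false"):
        return ("boolean", None)
    for fmt, pattern in _PATTERNS:
        if pattern.fullmatch(value):
            return ("string", fmt)
    if _INTEGER.fullmatch(value):
        digits = value.lstrip("+-")
        # Leading zeroes or very long digit runs look like identifiers.
        looks_like_id = (len(digits) > 1 and digits[0] == "0") or len(digits) > MAX_INT_DIGITS
        return ("string", None) if looks_like_id else ("integer", None)
    if _NUMBER.fullmatch(value):
        return ("number", None)
    if value.startswith(("{", "[")):
        try:
            parsed = json.loads(value)
        except ValueError:
            return ("string", None)
        return ("object" if isinstance(parsed, dict) else "array", None)
    return ("string", None)


def infer_column(cells: list) -> dict:
    """Infer a JSON schema for one column from its raw cell strings."""
    values = [cell.strip() for cell in cells]
    non_empty = [value for value in values if value]
    kinds = {classify(value) for value in non_empty}
    # Mixed kinds, or a column with no values at all, fall back to string.
    type_name, fmt = kinds.pop() if len(kinds) == 1 else ("string", None)
    schema = {"type": type_name}
    if fmt:
        schema["format"] = fmt
    if len(non_empty) < len(values):
        schema["type"] = [type_name, "null"]  # blanks make the column nullable
    return schema
```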

Schema overrides

You can override inferred column types during create or replace by sending a column_schema_overrides multipart form field. The value is a JSON array of objects with name and json_schema.

curl https://api.retab.com/v1/tables \
  -H "Authorization: Bearer $RETAB_API_KEY" \
  -F "name=Ports" \
  -F "file=@./ports.csv" \
  -F 'column_schema_overrides=[
    {"name":"unlocode","json_schema":{"type":"string"}},
    {"name":"latitude","json_schema":{"type":["number","null"]}},
    {"name":"opened_on","json_schema":{"type":"string","format":"date"}}
  ]'

Override rules:

supported base types are string, integer, number, boolean, object, and array
nullable schemas must be a single base type plus null
supported string formats are date, date-time, and time
override names must match CSV headers after trimming
duplicate overrides are rejected

Rows are coerced to the selected schema. If a value cannot be coerced, the upload fails with a validation error for that column.
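
If you generate overrides programmatically, it can help to check them against the override rules before sending the request. The following is a minimal sketch of such a check: `check_overrides` and `to_form_field` are hypothetical helpers, the error messages are invented, and the server remains the source of truth for what is accepted.

```python
import json

BASE_TYPES = {"string", "integer", "number", "boolean", "object", "array"}
STRING_FORMATS = {"date", "date-time", "time"}


def check_overrides(overrides: list, headers: list) -> list:
    """Return a list of rule violations; empty means the overrides look fine."""
    problems = []
    known = {header.strip() for header in headers}  # names match trimmed headers
    seen = set()
    for item in overrides:
        name, schema = item["name"], item["json_schema"]
        if name not in known:
            problems.append(f"{name}: no matching CSV header")
        if name in seen:
            problems.append(f"{name}: duplicate override")
        seen.add(name)

        # A nullable schema is exactly one base type plus "null".
        declared = schema.get("type")
        types = declared if isinstance(declared, list) else [declared]
        base = [t for t in types if t != "null"]
        if len(types) > 2 or len(base) != 1 or base[0] not in BASE_TYPES:
            problems.append(f"{name}: unsupported type {declared!r}")

        fmt = schema.get("format")
        if fmt is not None and (base != ["string"] or fmt not in STRING_FORMATS):
            problems.append(f"{name}: unsupported format {fmt!r}")
    return problems


def to_form_field(overrides: list) -> str:
    """Serialize overrides for the column_schema_overrides form field."""
    return json.dumps(overrides)
```

Running `check_overrides` on the three overrides from the curl example, with the headers of `ports.csv`, should return an empty list.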

Create a table

Use POST /v1/tables to create a table from a CSV:

curl https://api.retab.com/v1/tables \
  -H "Authorization: Bearer $RETAB_API_KEY" \
  -F "name=Carriers" \
  -F "file=@./carriers.csv"

See Create Table for the full API reference.

Replace table contents

Tables use a CSV-as-database write model. There are no row, column, or cell mutation endpoints. To change data, replace the full CSV:

curl -X PUT https://api.retab.com/v1/tables/tbl_carriers \
  -H "Authorization: Bearer $RETAB_API_KEY" \
  -F "file=@./carriers.csv"

Replacing a table updates the original CSV, regenerates the internal snapshot, updates the schema and sample rows, and preserves the table ID. See Replace Table CSV for the full API reference.

Update metadata

Use PATCH /v1/tables/{table_id} to rename a table or update metadata. This does not change rows, columns, cells, or the backing CSV.

curl -X PATCH https://api.retab.com/v1/tables/tbl_carriers \
  -H "Authorization: Bearer $RETAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"Carrier Codes","metadata":{"owner":"ops"}}'

See Update Table for the full API reference.

Query rows

Use POST /v1/tables/{table_id}/query to read rows. Queries are read-only.

curl -X POST https://api.retab.com/v1/tables/tbl_carriers/query \
  -H "Authorization: Bearer $RETAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filters": [
      {"column":"country","operator":"eq","value":"FR"}
    ],
    "sort": [
      {"column":"name","direction":"asc"}
    ],
    "select": ["name", "code", "country"],
    "offset": 0,
    "limit": 100
  }'

The maximum query limit is 500. Supported filter operators:

Operator	Meaning
`eq`, `ne`	Equals / not equals.
`gt`, `gte`, `lt`, `lte`	Numeric or ordered comparisons.
`contains`, `not_contains`	Substring matching.
`starts_with`, `ends_with`	Prefix or suffix matching.
`in`, `not_in`	Membership checks.
`between`	Range check.
`is_empty`, `is_not_empty`	Empty string checks.
`is_null`, `is_not_null`	Null checks.

Query requests also support:

Field	Purpose
`search`	Search text across all or selected columns.
`case_sensitive`	Toggle case-sensitive filters/search.
`select`	Return only selected columns.
`distinct`	Return distinct values for one column.
`group_by`	Group results by one or more columns.
`aggregations`	Compute `count`, `count_distinct`, `min`, `max`, `sum`, or `avg`.
`sample`	Return a random sample, size `1` to `500`.
`tail`	Return last rows, size `1` to `500`.
`count_only`	Return counts without row data.
`include_explain`	Include normalized query details in the response.
`viewer_mode`	Use `windowed` for dashboard-style virtual scrolling.

See Query Table for the full API reference.
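
When you build query bodies in code, a small helper can enforce the documented limit and operator list and split a large read into pages. This is a sketch under assumptions: `build_query` and `page_bodies` are hypothetical helpers, sending the request is left out, and `total_rows` is assumed to come from somewhere you trust, such as the table's `row_count` or a `count_only` query.

```python
MAX_LIMIT = 500  # maximum query limit stated above
OPERATORS = {
    "eq", "ne", "gt", "gte", "lt", "lte", "contains", "not_contains",
    "starts_with", "ends_with", "in", "not_in", "between",
    "is_empty", "is_not_empty", "is_null", "is_not_null",
}


def build_query(filters=(), select=None, sort=(), offset=0, limit=100):
    """Build a query body, rejecting unknown operators and oversized limits."""
    for item in filters:
        if item["operator"] not in OPERATORS:
            raise ValueError(f"unsupported operator: {item['operator']}")
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    body = {"filters": list(filters), "sort": list(sort), "offset": offset, "limit": limit}
    if select is not None:
        body["select"] = list(select)
    return body


def page_bodies(body, total_rows, page_size=MAX_LIMIT):
    """Yield one query body per page needed to cover total_rows rows."""
    for offset in range(0, total_rows, page_size):
        yield {**body, "offset": offset, "limit": page_size}
```

Each yielded body can be posted to the query endpoint in turn. Adding a `sort` keeps page boundaries stable between requests.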

Inspect and validate

Use these endpoints to inspect table shape:

Endpoint	Purpose
`GET /v1/tables`	List tables in the current environment.
`GET /v1/tables/{table_id}`	Get metadata, schema, and sample rows.
`GET /v1/tables/{table_id}/schema`	Get column schemas.
`GET /v1/tables/{table_id}/profile`	Get row counts, null counts, distinct counts, ranges, and samples.
`GET /v1/tables/{table_id}/download`	Download the backing CSV.
`POST /v1/tables/{table_id}/validate`	Validate required columns, column types, non-empty rules, and uniqueness.

Example validation request:

curl -X POST https://api.retab.com/v1/tables/tbl_ports/validate \
  -H "Authorization: Bearer $RETAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "required_columns": ["unlocode", "name"],
    "columns": {
      "unlocode": {"type":"string", "is_not_empty": true},
      "opened_on": {"type":"string", "format":"date"}
    },
    "unique": [["unlocode"]]
  }'

The response contains a diagnostics field and a has_errors flag.
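
In a deployment script you may want to stop as soon as validation reports errors. The sketch below shows one way to gate on the parsed response. It assumes `has_errors` is a boolean and treats `diagnostics` as opaque JSON, because the shape of individual diagnostics is not described here. `TableValidationError` and `ensure_valid` are hypothetical names.

```python
import json


class TableValidationError(Exception):
    """Raised when a validation response reports errors."""


def ensure_valid(response: dict) -> list:
    """Return the diagnostics, or raise if has_errors is truthy."""
    diagnostics = response.get("diagnostics") or []
    if response.get("has_errors"):
        # Include the raw diagnostics in the message so the failure is easy to read in logs.
        raise TableValidationError(json.dumps(diagnostics, indent=2))
    return diagnostics
```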

Mount tables in Function blocks

Function blocks can mount workflow tables as CSV files in the sandbox. Use this when code needs to join, search, or fuzzy-match against table data.

{
  "language": "Python",
  "mounts": {
    "tables": [
      {
        "table_id": "tbl_ports",
        "path": "/tmp/data/ports.csv",
        "format": "csv"
      }
    ]
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "unlocode": { "type": ["string", "null"] }
    }
  }
}

Mount paths must be absolute and should live under /tmp or /data. The format field defaults to csv, which is the supported table mount format for Function blocks.

Legacy table_refs configs with mount_path are still normalized, but new workflow configs should use mounts.tables with path.
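
If you generate workflow configs from code, you can apply the path rules and the legacy-to-new mapping before saving. This is an illustrative sketch only: `mount_problems` and `normalize_table_ref` are hypothetical helpers, and the legacy entry shape (a `table_id` plus `mount_path`) is an assumption based on the note above.

```python
import posixpath

RECOMMENDED_ROOTS = ("/tmp", "/data")


def mount_problems(path: str) -> list:
    """Return the reasons a mount path breaks the rules above, if any."""
    if not posixpath.isabs(path):
        return ["path must be absolute"]
    # Collapse ".." segments so "/tmp/../etc/x" is judged by where it really points.
    normalized = posixpath.normpath(path)
    if not any(normalized.startswith(root + "/") for root in RECOMMENDED_ROOTS):
        return ["path should live under /tmp or /data"]
    return []


def normalize_table_ref(ref: dict) -> dict:
    """Convert an assumed legacy entry with mount_path to a mounts.tables entry."""
    return {
        "table_id": ref["table_id"],
        "path": ref.get("path", ref.get("mount_path")),
        "format": ref.get("format", "csv"),  # csv is the default format
    }
```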

Function lookup example

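Once mounted, the table is just a CSV file. You can use duckdb, pandas, the standard csv module, or string matching libraries such as rapidfuzz. The example below assumes a carriers table mounted at /tmp/data/carriers.csv with name and code columns.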

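from models import Input, Output
import duckdb

def transform(input_data: Input) -> Output:
    # Load the mounted CSV into an in-memory DuckDB table.
    db = duckdb.connect(":memory:")
    db.execute("CREATE TABLE carriers AS SELECT * FROM read_csv('/tmp/data/carriers.csv')")
    # Rank carriers by Jaro-Winkler similarity to the extracted name and keep
    # the closest one. No similarity threshold is applied, so the closest row
    # is returned even when it is a poor match.
    row = db.execute(
        """
        SELECT code
        FROM carriers
        ORDER BY jaro_winkler_similarity(name, ?) DESC
        LIMIT 1
        """,
        [input_data.carrier_name],
    ).fetchone()
    # fetchone() returns None when the table has no rows.
    return Output(carrier_code=row[0] if row else None)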
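
If you prefer to avoid third-party dependencies, a similar lookup can be sketched with the standard csv and difflib modules. This is an alternative under assumptions, not the recommended method: `best_match` is a hypothetical helper, it uses difflib's ratio-based matching instead of Jaro-Winkler, it assumes a comma-delimited UTF-8 file, and the 0.6 cutoff is simply difflib's default.

```python
import csv
import difflib


def best_match(csv_path, query, key_column="name", value_column="code", cutoff=0.6):
    """Return value_column for the row whose key_column is closest to query."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    # Index rows by lowercased key so matching ignores case.
    by_key = {row[key_column].lower(): row for row in rows if row.get(key_column)}
    hits = difflib.get_close_matches(query.lower(), list(by_key), n=1, cutoff=cutoff)
    # Unlike an unconditional "closest row" query, a miss returns None.
    return by_key[hits[0]][value_column] if hits else None
```

Inside a Function block, `best_match("/tmp/data/carriers.csv", input_data.carrier_name)` would play the role of the DuckDB query, with the cutoff deciding when no carrier is close enough.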

Best practices

Keep tables focused on stable reference data.
Use clear, unique header names.
Prefer string schemas for IDs, postal codes, account numbers, and other identifier-like values.
Validate required columns before relying on a table in production workflows.
Replace the whole CSV when changing contents, and keep source CSVs in version control when business-critical.
Keep mounted paths predictable, for example /tmp/data/carriers.csv.
Use profile to check null counts and distinct counts after upload.
