Formulas - Retab Docs

Formulas allow you to define computed fields in your extraction schema—fields whose values are calculated from other extracted data rather than directly pulled from the document. Use Formulas to create totals, apply business rules, validate consistency, and derive new values from your extracted data.

Overview

When extracting structured data from documents, you often need values that aren’t explicitly stated but can be computed from other fields. For example:

Line item totals: quantity * unit_price
Invoice totals: Sum of all line item totals
Reconciliation checks: Verify that computed totals match stated totals
Conditional values: Apply different logic based on field values

Formulas use a formula-based expression language that supports:

Arithmetic operations: +, -, *, /
Mathematical functions: ABS, SQRT, LOG, POW
Aggregation functions: SUM, AVERAGE, MIN, MAX, COUNT
Array operations: FILTER, FLATTEN, LOOKUP, MATCH
Logical operations: AND, OR, NOT, IF
Comparison operators: =, !=, >, >=, <, <=
String operations: CONCAT, TRIM, UPPER, LOWER, REPLACE, LEN
String matching and extraction: STARTS_WITH, ENDS_WITH, CONTAINS, BEFORE, AFTER, SPLIT_PART, REGEX_EXTRACT

Defining Computed Fields

Computed fields are defined in the project section of the dashboard (schema editor/formulas).

Expression Syntax

Paths and Field References

Use dot notation to reference fields in your extracted data:

Concept	Syntax Example	Description
Field path	`order.total`	Field `total` inside object `order`
Array field	`transactions`	Array of transaction objects
Wildcard element	`transactions.*`	”Each element” of `transactions`
Field per element	`transactions.*.amount`	`amount` field for every element in `transactions`

Wildcards for Arrays

When working with arrays, use * to reference each element:

# Sum the 'amount' field from each transaction
SUM(transactions.*.amount)

# Equivalent syntax using field projection
SUM(transactions, "amount")

Arithmetic Operators

All arithmetic operators work on single values (scalars) and arrays (element-wise).

Operator	Syntax	Description	Example
Add	`a + b`	Add numbers	`price + tax`
Subtract	`a - b`	Subtract `b` from `a`	`total - discount`
Multiply	`a * b`	Multiply numbers	`quantity * unit_price`
Divide	`a / b`	Divide `a` by `b` (null if b = 0)	`distance / time`

Example: Line Item Total

quantity * unit_price

Example: Discounted Price

price * (1 - discount_rate)

Example: Element-wise Array Operation

# Add a $5 fee to each transaction amount
transactions.*.amount + 5

Mathematical Functions

Function	Syntax	Description	Example
ABS	`ABS(x)`	Absolute value	`ABS(-10)` → `10`
SQRT	`SQRT(x)`	Square root (x must be non-negative)	`SQRT(25)` → `5`
LOG	`LOG(x)` or `LOG(x, base)`	Natural or custom-base logarithm	`LOG(100, 10)` → `2`
POW	`POW(base, exponent)`	Raise base to exponent power	`POW(2, 3)` → `8`

Example: Euclidean Distance

SQRT(POW(x2 - x1, 2) + POW(y2 - y1, 2))

Example: Compound Interest

principal * POW(1 + rate, years)

Aggregation Functions

Aggregation functions reduce arrays to single values.

Function	Syntax	Description	Example
SUM	`SUM(array, "field")`	Sum all values	`SUM(items, "price")`
AVERAGE	`AVERAGE(array, "field")`	Arithmetic mean	`AVERAGE(scores, "points")`
MIN	`MIN(array, "field")`	Smallest value	`MIN(items, "price")`
MAX	`MAX(array, "field")`	Largest value	`MAX(items, "rating")`
COUNT	`COUNT(array, value)`	Count occurrences	`COUNT(tasks.*.status, "done")`

Two equivalent syntaxes:

# Using wildcard path
SUM(items.*.price)

# Using field projection (recommended)
SUM(items, "price")

Example: Invoice Total

SUM(line_items, "total")

Example: Average Order Value

AVERAGE(orders, "amount")

Example: Count Completed Tasks

COUNT(tasks.*.status, "completed")

Array Functions

Function	Syntax	Description	Example
FILTER	`FILTER(array, "field", comparator, value)`	Keep elements matching condition	`FILTER(items, "active", "=", true)`
FLATTEN	`FLATTEN(nestedArray)` or `FLATTEN(array, "field")`	Merge nested arrays into one	`FLATTEN(groups, "items")`
LOOKUP	`LOOKUP(array, "field", comparator, value, [proj])`	Find first match, optionally project	`LOOKUP(users, "id", "=", 5, "name")`
MATCH	`MATCH(array, "field", comparator, value)`	Get index of first match (-1 if none)	`MATCH(items, "status", "=", "pending")`

FILTER

Keep only elements that satisfy a condition. Comparators: "=", "!=", ">", ">=", "<", "<=", "contains"

# All products over $100
FILTER(products, "price", ">", 100)

# All active users
FILTER(users, "status", "=", "active")

# Products containing "Pro" in name
FILTER(products, "name", "contains", "Pro")

Example: Sum Only Active Items

SUM(FILTER(items, "active", "=", true), "price")

FLATTEN

Merge nested arrays into a single flat array.

# Flatten array of arrays
FLATTEN([[1, 2], [3, 4]])  # → [1, 2, 3, 4]

# Flatten array field from objects
FLATTEN(orders, "items")
# If orders = [{items: [1,2]}, {items: [3,4]}]
# Result: [1, 2, 3, 4]

LOOKUP

Find the first matching element in an array.

# Get full user object with id = 5
LOOKUP(users, "id", "=", 5)

# Get just the name of user with id = 5
LOOKUP(users, "id", "=", 5, "name")

MATCH

Get the index (position) of the first matching element.

# Index of first pending task (0-based, -1 if not found)
MATCH(tasks, "status", "=", "pending")

Comparison Operators

Operator	Syntax	Function Form	Description	Example
Equal	`a = b`, `a == b`	`EQ(a, b)`	True if equal	`status = "active"`
Not Equal	`a != b`	—	True if different	`role != "admin"`
Greater	`a > b`	`GT(a, b)`	True if `a > b`	`price > 100`
Greater/Equal	`a >= b`	`GTE(a, b)`	True if `a >= b`	`quantity >= 10`
Less	`a < b`	`LT(a, b)`	True if `a < b`	`age < 18`
Less/Equal	`a <= b`	`LTE(a, b)`	True if `a <= b`	`score <= 100`

Inside array functions, use comparator strings:

FILTER(users, "role", "=", "admin")
FILTER(products, "price", ">", 100)
FILTER(items, "name", "contains", "Pro")

Logical Operators

Function	Syntax	Description	Example
AND	`AND(cond1, cond2, ...)`	True if all conditions true	`AND(user.active, user.verified)`
OR	`OR(cond1, cond2, ...)`	True if any condition true	`OR(priority = "high", overdue)`
NOT	`NOT(cond)` or `!cond`	Invert boolean value	`NOT(user.active)` or `!verified`

Example: Complex Condition

AND(user.active, user.verified, user.age >= 18)

Conditional Logic (IF)

The IF function returns different values based on a condition. Syntax: IF(condition, value_if_true, value_if_false)

# Label based on price
IF(price > 100, "expensive", "affordable")

# Default for missing field
IF(discount != null, discount, 0)

# Tiered pricing
IF(quantity >= 100, price * 0.8, IF(quantity >= 50, price * 0.9, price))

Array Behavior: When the condition is an array, IF returns an array of results:

# For each item, check if price > 50
IF(items.*.price > 50, "premium", "standard")
# Returns: ["premium", "standard", "premium", ...]

String Functions

Function	Syntax	Description	Example
CONCAT	`CONCAT(part1, part2, ..., partN)`	Join values into a string	`CONCAT("Hello ", name, "!")` → `"Hello Alice!"`
TRIM	`TRIM(text)`	Remove leading and trailing whitespace	`TRIM(" Alice ")` → `"Alice"`
UPPER	`UPPER(text)`	Convert text to uppercase	`UPPER(code)` → `"FE50"`
LOWER	`LOWER(text)`	Convert text to lowercase	`LOWER(email)`
REPLACE	`REPLACE(text, from, to)`	Replace all literal matches	`REPLACE(email, ".xpo", "")`
LEN	`LEN(text)`	Return string length	`LEN(invoice_number)`
STARTS_WITH	`STARTS_WITH(text, prefix)`	Case-sensitive prefix check	`STARTS_WITH(email, "FE50")`
ENDS_WITH	`ENDS_WITH(text, suffix)`	Case-sensitive suffix check	`ENDS_WITH(email, "@workflows.retab.com")`
CONTAINS	`CONTAINS(text, needle)`	Case-sensitive substring check	`CONTAINS(email, ".xpo@")`
BEFORE	`BEFORE(text, delimiter)`	Return text before the first delimiter	`BEFORE(email, "@")`
AFTER	`AFTER(text, delimiter)`	Return text after the first delimiter	`AFTER(email, "@")`
SPLIT_PART	`SPLIT_PART(text, delimiter, index)`	Return the 0-based token at `index`	`SPLIT_PART(email, ".", 0)`
REGEX_EXTRACT	`REGEX_EXTRACT(text, pattern, [group])`	Extract a regex capture group	`REGEX_EXTRACT(email, "^([^.@]+)")`

String Function Semantics

String helpers coerce scalar non-null inputs to strings.
null string inputs return null, except CONCAT, where null becomes an empty string.
Arrays and objects are invalid string inputs and return null.
REPLACE is literal, case-sensitive, and replaces all non-overlapping matches.
STARTS_WITH, ENDS_WITH, and formula CONTAINS are case-sensitive.
BEFORE and AFTER use the first delimiter occurrence.
SPLIT_PART is 0-based.
REGEX_EXTRACT defaults to capture group 1. Pass 0 to return the full match.
REGEX_EXTRACT does not support regex flags.
Missing delimiter, out-of-range split index, invalid regex, missing regex match, or invalid capture group all return null.

The formula function CONTAINS(text, needle) is case-sensitive. This is different from the "contains" comparator used by FILTER, LOOKUP, and MATCH, which performs case-insensitive substring matching.

Non-string scalar values are automatically converted when used with string helpers. For CONCAT, null values become empty strings.

# Build order reference
CONCAT(order.prefix, "-", order.number)
# e.g., "ORD-12345"

# Format full name
CONCAT(first_name, " ", last_name)

# Extract the sender code from an email-like identifier
BEFORE(BEFORE(email, "@"), ".")
# "FE50.xpo@workflows.retab.com" -> "FE50"

# Equivalent regex-based extraction
REGEX_EXTRACT(email, "^([^.@]+)")

Null Handling

Situation	Behavior
Numeric op with `null`	Treated as `0` (e.g., `5 + null` → `5`)
Logical op with `null`	Treated as `false`
Missing field	Returns `null`; use `IF` for safe defaults
String extraction miss	Returns `null`

Safe default pattern:

IF(field != null, field, 0)

Real-World Examples

Example 1: Invoice Reconciliation

Verify that the sum of all property totals matches the check amount, within 1% tolerance.

ABS(SUM(properties, "total_property_value") - check.amount) 
<= 0.01 * check.amount

Breakdown:

SUM(properties, "total_property_value") — Sum all property totals
- check.amount — Subtract the check amount
ABS(...) — Get absolute difference (error)
<= 0.01 * check.amount — Check if error is within 1%

Example 2: Filtered Aggregation

Sum only deposits from a transaction list:

SUM(FILTER(transactions, "type", "=", "deposit"), "amount")

Breakdown:

FILTER(transactions, "type", "=", "deposit") — Get only deposit transactions
SUM(..., "amount") — Sum their amounts

Example 3: Line Item Validation

For each line item, check if owner_value matches computed value (within 1%):

IF(
    line_items.*.owner_value <= 0,
    true,
    ABS(
        line_items.*.property_value * line_items.*.distribution_interest
        - line_items.*.owner_value
    ) <= line_items.*.owner_value * 0.01
)

Breakdown:

Skip validation for non-positive values (taxes, deductions)
Compute expected value: property_value * distribution_interest
Compare to actual owner_value with 1% tolerance

Example 4: Nested Array Aggregation

Sum values across all nested arrays:

SUM(FLATTEN(orders, "items"), "price")

Breakdown:

FLATTEN(orders, "items") — Collect all items from all orders
SUM(..., "price") — Sum all prices

Example 5: Conditional Labeling

Categorize orders by size:

IF(
    total >= 10000,
    "enterprise",
    IF(total >= 1000, "business", "personal")
)

Example 6: Extract a Sender Code from an Email

Extract FE50 from FE50.xpo@workflows.retab.com:

BEFORE(BEFORE(email, "@"), ".")

Or with regex:

REGEX_EXTRACT(email, "^([^.@]+)")

Best Practices

1. Use Field Projection Syntax

Prefer SUM(items, "price") over SUM(items.*.price) for clarity.

2. Handle Nulls Explicitly

# Bad: may fail on null
quantity * unit_price

# Good: safe default
IF(quantity != null, quantity, 0) * IF(unit_price != null, unit_price, 0)

3. Use Tolerance for Comparisons

Floating-point arithmetic can cause tiny differences. Use tolerance checks:

# Bad: exact comparison
computed_total = stated_total

# Good: tolerance comparison
ABS(computed_total - stated_total) <= stated_total * 0.01

4. Build Complex Expressions Incrementally

Define intermediate computed fields for readability:

# In line_item schema:
line_total: "quantity * unit_price"

# In order schema:
subtotal: "SUM(line_items, \"line_total\")"
tax: "subtotal * tax_rate"
total: "subtotal + tax"

5. Use FILTER for Conditional Aggregation

# Sum only active items
SUM(FILTER(items, "active", "=", true), "price")

# Count items by status
COUNT(FILTER(items, "status", "=", "pending"), "_")

6. Prefer Delimiter Helpers Before Regex

If your input has a stable delimiter structure, BEFORE, AFTER, and SPLIT_PART are easier to read and maintain than regex:

# Preferred when structure is stable
BEFORE(BEFORE(email, "@"), ".")

# Use regex only when the format is less regular
REGEX_EXTRACT(email, "^([^.@]+)")

Common Patterns

Header vs. Detail Reconciliation

# Check: sum of line items = header total
ABS(SUM(line_items, "amount") - header_total) <= header_total * 0.01

Cross-Reference Validation

# Check: computed value matches stated value
ABS(price * quantity * interest - owner_value) <= owner_value * 0.01

Conditional Processing

# Different calculation based on type
IF(
    type = "percentage",
    base_amount * rate / 100,
    rate
)

Aggregate with Multiple Conditions

# Sum only active, taxable items over $100
SUM(
    FILTER(
        FILTER(items, "active", "=", true),
        "price", ">", 100
    ),
    "price"
)

Quick Reference

Arithmetic

Operation	Example	Result
Add	`5 + 3`	`8`
Subtract	`10 - 4`	`6`
Multiply	`6 * 7`	`42`
Divide	`15 / 3`	`5`

Math Functions

Function	Example	Result
ABS	`ABS(-10)`	`10`
SQRT	`SQRT(16)`	`4`
POW	`POW(2, 3)`	`8`
LOG	`LOG(100, 10)`	`2`

Aggregation

Function	Example	Description
SUM	`SUM(items, "price")`	Total of all prices
AVERAGE	`AVERAGE(scores, "value")`	Mean of all values
MIN	`MIN(items, "price")`	Lowest price
MAX	`MAX(items, "rating")`	Highest rating
COUNT	`COUNT(tasks.*.status, "done")`	Count of “done”

Array Functions

Function	Example	Description
FILTER	`FILTER(items, "active", "=", true)`	Keep active items
FLATTEN	`FLATTEN(orders, "items")`	Merge all items
LOOKUP	`LOOKUP(users, "id", "=", 5)`	Find user by id
MATCH	`MATCH(items, "status", "=", "pending")`	Index of first match

String Functions

Function	Example	Description
CONCAT	`CONCAT(first_name, " ", last_name)`	Join values into a string
TRIM	`TRIM(name)`	Remove surrounding whitespace
UPPER / LOWER	`LOWER(email)`	Normalize text casing
REPLACE	`REPLACE(email, ".xpo", "")`	Replace literal substrings
LEN	`LEN(invoice_number)`	Get string length
STARTS_WITH / ENDS_WITH / CONTAINS	`ENDS_WITH(email, "@workflows.retab.com")`	Case-sensitive string matching
BEFORE / AFTER	`AFTER(email, "@")`	Extract around first delimiter
SPLIT_PART	`SPLIT_PART(email, ".", 0)`	Extract the 0-based token at an index
REGEX_EXTRACT	`REGEX_EXTRACT(email, "^([^.@]+)")`	Extract a regex capture group

Comparators for Array Functions

Comparator	Meaning	Example
`"="`	Equal	`FILTER(users, "role", "=", "admin")`
`"!="`	Not equal	`FILTER(users, "role", "!=", "admin")`
`">"`	Greater than	`FILTER(items, "price", ">", 100)`
`">="`	Greater or equal	`FILTER(orders, "total", ">=", 1000)`
`"<"`	Less than	`FILTER(items, "priority", "<", 3)`
`"<="`	Less or equal	`FILTER(scores, "value", "<=", 50)`
`"contains"`	Substring match	`FILTER(products, "name", "contains", "Pro")`

Go Further

Extraction - Learn how to extract structured data
Reasoning - Add step-by-step reasoning for complex calculations
Schema - Design your extraction schemas

Overview

Core Concepts

Consensus

Workflows

Projects

Evals

​Overview

​Defining Computed Fields

​Expression Syntax

​Paths and Field References

​Wildcards for Arrays

​Arithmetic Operators

​Mathematical Functions

​Aggregation Functions

​Array Functions

​FILTER

​FLATTEN

​LOOKUP

​MATCH

​Comparison Operators

​Logical Operators

​Conditional Logic (IF)

​String Functions

​String Function Semantics

​Null Handling

​Real-World Examples

​Example 1: Invoice Reconciliation

​Example 2: Filtered Aggregation

​Example 3: Line Item Validation

​Example 4: Nested Array Aggregation

​Example 5: Conditional Labeling

​Example 6: Extract a Sender Code from an Email

​Best Practices

​1. Use Field Projection Syntax

​2. Handle Nulls Explicitly

​3. Use Tolerance for Comparisons

​4. Build Complex Expressions Incrementally

​5. Use FILTER for Conditional Aggregation

​6. Prefer Delimiter Helpers Before Regex

​Common Patterns

​Header vs. Detail Reconciliation

​Cross-Reference Validation

​Conditional Processing

​Aggregate with Multiple Conditions

​Quick Reference

​Arithmetic

​Math Functions

​Aggregation

​Array Functions

​String Functions

​Comparators for Array Functions

​Go Further

Overview

Defining Computed Fields

Expression Syntax

Paths and Field References

Wildcards for Arrays

Arithmetic Operators

Mathematical Functions

Aggregation Functions

Array Functions

FILTER

FLATTEN

LOOKUP

MATCH

Comparison Operators

Logical Operators

Conditional Logic (IF)

String Functions

String Function Semantics

Null Handling

Real-World Examples

Example 1: Invoice Reconciliation

Example 2: Filtered Aggregation

Example 3: Line Item Validation

Example 4: Nested Array Aggregation

Example 5: Conditional Labeling

Example 6: Extract a Sender Code from an Email

Best Practices

1. Use Field Projection Syntax

2. Handle Nulls Explicitly

3. Use Tolerance for Comparisons

4. Build Complex Expressions Incrementally

5. Use FILTER for Conditional Aggregation

6. Prefer Delimiter Helpers Before Regex

Common Patterns

Header vs. Detail Reconciliation

Cross-Reference Validation

Conditional Processing

Aggregate with Multiple Conditions

Quick Reference

Arithmetic

Math Functions

Aggregation

Array Functions

String Functions

Comparators for Array Functions

Go Further