Skip to main content
Formulas allow you to define computed fields in your extraction schema—fields whose values are calculated from other extracted data rather than directly pulled from the document. Use Formulas to create totals, apply business rules, validate consistency, and derive new values from your extracted data.

Overview

When extracting structured data from documents, you often need values that aren’t explicitly stated but can be computed from other fields. For example:
  • Line item totals: quantity * unit_price
  • Invoice totals: Sum of all line item totals
  • Reconciliation checks: Verify that computed totals match stated totals
  • Conditional values: Apply different logic based on field values
Formulas use a formula-based expression language that supports:
  • Arithmetic operations: +, -, *, /
  • Mathematical functions: ABS, SQRT, LOG, POW
  • Aggregation functions: SUM, AVERAGE, MIN, MAX, COUNT
  • Array operations: FILTER, FLATTEN, LOOKUP, MATCH
  • Logical operations: AND, OR, NOT, IF
  • Comparison operators: =, !=, >, >=, <, <=
  • String operations: CONCAT, TRIM, UPPER, LOWER, REPLACE, LEN
  • String matching and extraction: STARTS_WITH, ENDS_WITH, CONTAINS, BEFORE, AFTER, SPLIT_PART, REGEX_EXTRACT

Defining Computed Fields

Computed fields are defined in the project section of the dashboard (schema editor/formulas).

Expression Syntax

Paths and Field References

Use dot notation to reference fields in your extracted data:
ConceptSyntax ExampleDescription
Field pathorder.totalField total inside object order
Array fieldtransactionsArray of transaction objects
Wildcard elementtransactions.*”Each element” of transactions
Field per elementtransactions.*.amountamount field for every element in transactions

Wildcards for Arrays

When working with arrays, use * to reference each element:
# Sum the 'amount' field from each transaction
SUM(transactions.*.amount)

# Equivalent syntax using field projection
SUM(transactions, "amount")

Arithmetic Operators

All arithmetic operators work on single values (scalars) and arrays (element-wise).
OperatorSyntaxDescriptionExample
Adda + bAdd numbersprice + tax
Subtracta - bSubtract b from atotal - discount
Multiplya * bMultiply numbersquantity * unit_price
Dividea / bDivide a by b (null if b = 0)distance / time
Example: Line Item Total
quantity * unit_price
Example: Discounted Price
price * (1 - discount_rate)
Example: Element-wise Array Operation
# Add a $5 fee to each transaction amount
transactions.*.amount + 5

Mathematical Functions

FunctionSyntaxDescriptionExample
ABSABS(x)Absolute valueABS(-10)10
SQRTSQRT(x)Square root (x must be non-negative)SQRT(25)5
LOGLOG(x) or LOG(x, base)Natural or custom-base logarithmLOG(100, 10)2
POWPOW(base, exponent)Raise base to exponent powerPOW(2, 3)8
Example: Euclidean Distance
SQRT(POW(x2 - x1, 2) + POW(y2 - y1, 2))
Example: Compound Interest
principal * POW(1 + rate, years)

Aggregation Functions

Aggregation functions reduce arrays to single values.
FunctionSyntaxDescriptionExample
SUMSUM(array, "field")Sum all valuesSUM(items, "price")
AVERAGEAVERAGE(array, "field")Arithmetic meanAVERAGE(scores, "points")
MINMIN(array, "field")Smallest valueMIN(items, "price")
MAXMAX(array, "field")Largest valueMAX(items, "rating")
COUNTCOUNT(array, value)Count occurrencesCOUNT(tasks.*.status, "done")
Two equivalent syntaxes:
# Using wildcard path
SUM(items.*.price)

# Using field projection (recommended)
SUM(items, "price")
Example: Invoice Total
SUM(line_items, "total")
Example: Average Order Value
AVERAGE(orders, "amount")
Example: Count Completed Tasks
COUNT(tasks.*.status, "completed")

Array Functions

FunctionSyntaxDescriptionExample
FILTERFILTER(array, "field", comparator, value)Keep elements matching conditionFILTER(items, "active", "=", true)
FLATTENFLATTEN(nestedArray) or FLATTEN(array, "field")Merge nested arrays into oneFLATTEN(groups, "items")
LOOKUPLOOKUP(array, "field", comparator, value, [proj])Find first match, optionally projectLOOKUP(users, "id", "=", 5, "name")
MATCHMATCH(array, "field", comparator, value)Get index of first match (-1 if none)MATCH(items, "status", "=", "pending")

FILTER

Keep only elements that satisfy a condition. Comparators: "=", "!=", ">", ">=", "<", "<=", "contains"
# All products over $100
FILTER(products, "price", ">", 100)

# All active users
FILTER(users, "status", "=", "active")

# Products containing "Pro" in name
FILTER(products, "name", "contains", "Pro")
Example: Sum Only Active Items
SUM(FILTER(items, "active", "=", true), "price")

FLATTEN

Merge nested arrays into a single flat array.
# Flatten array of arrays
FLATTEN([[1, 2], [3, 4]])  # → [1, 2, 3, 4]

# Flatten array field from objects
FLATTEN(orders, "items")
# If orders = [{items: [1,2]}, {items: [3,4]}]
# Result: [1, 2, 3, 4]

LOOKUP

Find the first matching element in an array.
# Get full user object with id = 5
LOOKUP(users, "id", "=", 5)

# Get just the name of user with id = 5
LOOKUP(users, "id", "=", 5, "name")

MATCH

Get the index (position) of the first matching element.
# Index of first pending task (0-based, -1 if not found)
MATCH(tasks, "status", "=", "pending")

Comparison Operators

OperatorSyntaxFunction FormDescriptionExample
Equala = b, a == bEQ(a, b)True if equalstatus = "active"
Not Equala != bTrue if differentrole != "admin"
Greatera > bGT(a, b)True if a > bprice > 100
Greater/Equala >= bGTE(a, b)True if a >= bquantity >= 10
Lessa < bLT(a, b)True if a < bage < 18
Less/Equala <= bLTE(a, b)True if a <= bscore <= 100
Inside array functions, use comparator strings:
FILTER(users, "role", "=", "admin")
FILTER(products, "price", ">", 100)
FILTER(items, "name", "contains", "Pro")

Logical Operators

FunctionSyntaxDescriptionExample
ANDAND(cond1, cond2, ...)True if all conditions trueAND(user.active, user.verified)
OROR(cond1, cond2, ...)True if any condition trueOR(priority = "high", overdue)
NOTNOT(cond) or !condInvert boolean valueNOT(user.active) or !verified
Example: Complex Condition
AND(user.active, user.verified, user.age >= 18)

Conditional Logic (IF)

The IF function returns different values based on a condition. Syntax: IF(condition, value_if_true, value_if_false)
# Label based on price
IF(price > 100, "expensive", "affordable")

# Default for missing field
IF(discount != null, discount, 0)

# Tiered pricing
IF(quantity >= 100, price * 0.8, IF(quantity >= 50, price * 0.9, price))
Array Behavior: When the condition is an array, IF returns an array of results:
# For each item, check if price > 50
IF(items.*.price > 50, "premium", "standard")
# Returns: ["premium", "standard", "premium", ...]

String Functions

FunctionSyntaxDescriptionExample
CONCATCONCAT(part1, part2, ..., partN)Join values into a stringCONCAT("Hello ", name, "!")"Hello Alice!"
TRIMTRIM(text)Remove leading and trailing whitespaceTRIM(" Alice ")"Alice"
UPPERUPPER(text)Convert text to uppercaseUPPER(code)"FE50"
LOWERLOWER(text)Convert text to lowercaseLOWER(email)
REPLACEREPLACE(text, from, to)Replace all literal matchesREPLACE(email, ".xpo", "")
LENLEN(text)Return string lengthLEN(invoice_number)
STARTS_WITHSTARTS_WITH(text, prefix)Case-sensitive prefix checkSTARTS_WITH(email, "FE50")
ENDS_WITHENDS_WITH(text, suffix)Case-sensitive suffix checkENDS_WITH(email, "@workflows.retab.com")
CONTAINSCONTAINS(text, needle)Case-sensitive substring checkCONTAINS(email, ".xpo@")
BEFOREBEFORE(text, delimiter)Return text before the first delimiterBEFORE(email, "@")
AFTERAFTER(text, delimiter)Return text after the first delimiterAFTER(email, "@")
SPLIT_PARTSPLIT_PART(text, delimiter, index)Return the 0-based token at indexSPLIT_PART(email, ".", 0)
REGEX_EXTRACTREGEX_EXTRACT(text, pattern, [group])Extract a regex capture groupREGEX_EXTRACT(email, "^([^.@]+)")

String Function Semantics

  • String helpers coerce scalar non-null inputs to strings.
  • null string inputs return null, except CONCAT, where null becomes an empty string.
  • Arrays and objects are invalid string inputs and return null.
  • REPLACE is literal, case-sensitive, and replaces all non-overlapping matches.
  • STARTS_WITH, ENDS_WITH, and formula CONTAINS are case-sensitive.
  • BEFORE and AFTER use the first delimiter occurrence.
  • SPLIT_PART is 0-based.
  • REGEX_EXTRACT defaults to capture group 1. Pass 0 to return the full match.
  • REGEX_EXTRACT does not support regex flags.
  • Missing delimiter, out-of-range split index, invalid regex, missing regex match, or invalid capture group all return null.
The formula function CONTAINS(text, needle) is case-sensitive. This is different from the "contains" comparator used by FILTER, LOOKUP, and MATCH, which performs case-insensitive substring matching.
Non-string scalar values are automatically converted when used with string helpers. For CONCAT, null values become empty strings.
# Build order reference
CONCAT(order.prefix, "-", order.number)
# e.g., "ORD-12345"

# Format full name
CONCAT(first_name, " ", last_name)

# Extract the sender code from an email-like identifier
BEFORE(BEFORE(email, "@"), ".")
# "FE50.xpo@workflows.retab.com" -> "FE50"

# Equivalent regex-based extraction
REGEX_EXTRACT(email, "^([^.@]+)")

Null Handling

SituationBehavior
Numeric op with nullTreated as 0 (e.g., 5 + null5)
Logical op with nullTreated as false
Missing fieldReturns null; use IF for safe defaults
String extraction missReturns null
Safe default pattern:
IF(field != null, field, 0)

Real-World Examples

Example 1: Invoice Reconciliation

Verify that the sum of all property totals matches the check amount, within 1% tolerance.
ABS(SUM(properties, "total_property_value") - check.amount) 
<= 0.01 * check.amount
Breakdown:
  1. SUM(properties, "total_property_value") — Sum all property totals
  2. - check.amount — Subtract the check amount
  3. ABS(...) — Get absolute difference (error)
  4. <= 0.01 * check.amount — Check if error is within 1%

Example 2: Filtered Aggregation

Sum only deposits from a transaction list:
SUM(FILTER(transactions, "type", "=", "deposit"), "amount")
Breakdown:
  1. FILTER(transactions, "type", "=", "deposit") — Get only deposit transactions
  2. SUM(..., "amount") — Sum their amounts

Example 3: Line Item Validation

For each line item, check if owner_value matches computed value (within 1%):
IF(
    line_items.*.owner_value <= 0,
    true,
    ABS(
        line_items.*.property_value * line_items.*.distribution_interest
        - line_items.*.owner_value
    ) <= line_items.*.owner_value * 0.01
)
Breakdown:
  1. Skip validation for non-positive values (taxes, deductions)
  2. Compute expected value: property_value * distribution_interest
  3. Compare to actual owner_value with 1% tolerance

Example 4: Nested Array Aggregation

Sum values across all nested arrays:
SUM(FLATTEN(orders, "items"), "price")
Breakdown:
  1. FLATTEN(orders, "items") — Collect all items from all orders
  2. SUM(..., "price") — Sum all prices

Example 5: Conditional Labeling

Categorize orders by size:
IF(
    total >= 10000,
    "enterprise",
    IF(total >= 1000, "business", "personal")
)

Example 6: Extract a Sender Code from an Email

Extract FE50 from FE50.xpo@workflows.retab.com:
BEFORE(BEFORE(email, "@"), ".")
Or with regex:
REGEX_EXTRACT(email, "^([^.@]+)")

Best Practices

1. Use Field Projection Syntax

Prefer SUM(items, "price") over SUM(items.*.price) for clarity.

2. Handle Nulls Explicitly

# Bad: may fail on null
quantity * unit_price

# Good: safe default
IF(quantity != null, quantity, 0) * IF(unit_price != null, unit_price, 0)

3. Use Tolerance for Comparisons

Floating-point arithmetic can cause tiny differences. Use tolerance checks:
# Bad: exact comparison
computed_total = stated_total

# Good: tolerance comparison
ABS(computed_total - stated_total) <= stated_total * 0.01

4. Build Complex Expressions Incrementally

Define intermediate computed fields for readability:
# In line_item schema:
line_total: "quantity * unit_price"

# In order schema:
subtotal: "SUM(line_items, \"line_total\")"
tax: "subtotal * tax_rate"
total: "subtotal + tax"

5. Use FILTER for Conditional Aggregation

# Sum only active items
SUM(FILTER(items, "active", "=", true), "price")

# Count items by status
COUNT(FILTER(items, "status", "=", "pending"), "_")

6. Prefer Delimiter Helpers Before Regex

If your input has a stable delimiter structure, BEFORE, AFTER, and SPLIT_PART are easier to read and maintain than regex:
# Preferred when structure is stable
BEFORE(BEFORE(email, "@"), ".")

# Use regex only when the format is less regular
REGEX_EXTRACT(email, "^([^.@]+)")

Common Patterns

Header vs. Detail Reconciliation

# Check: sum of line items = header total
ABS(SUM(line_items, "amount") - header_total) <= header_total * 0.01

Cross-Reference Validation

# Check: computed value matches stated value
ABS(price * quantity * interest - owner_value) <= owner_value * 0.01

Conditional Processing

# Different calculation based on type
IF(
    type = "percentage",
    base_amount * rate / 100,
    rate
)

Aggregate with Multiple Conditions

# Sum only active, taxable items over $100
SUM(
    FILTER(
        FILTER(items, "active", "=", true),
        "price", ">", 100
    ),
    "price"
)

Quick Reference

Arithmetic

OperationExampleResult
Add5 + 38
Subtract10 - 46
Multiply6 * 742
Divide15 / 35

Math Functions

FunctionExampleResult
ABSABS(-10)10
SQRTSQRT(16)4
POWPOW(2, 3)8
LOGLOG(100, 10)2

Aggregation

FunctionExampleDescription
SUMSUM(items, "price")Total of all prices
AVERAGEAVERAGE(scores, "value")Mean of all values
MINMIN(items, "price")Lowest price
MAXMAX(items, "rating")Highest rating
COUNTCOUNT(tasks.*.status, "done")Count of “done”

Array Functions

FunctionExampleDescription
FILTERFILTER(items, "active", "=", true)Keep active items
FLATTENFLATTEN(orders, "items")Merge all items
LOOKUPLOOKUP(users, "id", "=", 5)Find user by id
MATCHMATCH(items, "status", "=", "pending")Index of first match

String Functions

FunctionExampleDescription
CONCATCONCAT(first_name, " ", last_name)Join values into a string
TRIMTRIM(name)Remove surrounding whitespace
UPPER / LOWERLOWER(email)Normalize text casing
REPLACEREPLACE(email, ".xpo", "")Replace literal substrings
LENLEN(invoice_number)Get string length
STARTS_WITH / ENDS_WITH / CONTAINSENDS_WITH(email, "@workflows.retab.com")Case-sensitive string matching
BEFORE / AFTERAFTER(email, "@")Extract around first delimiter
SPLIT_PARTSPLIT_PART(email, ".", 0)Extract the 0-based token at an index
REGEX_EXTRACTREGEX_EXTRACT(email, "^([^.@]+)")Extract a regex capture group

Comparators for Array Functions

ComparatorMeaningExample
"="EqualFILTER(users, "role", "=", "admin")
"!="Not equalFILTER(users, "role", "!=", "admin")
">"Greater thanFILTER(items, "price", ">", 100)
">="Greater or equalFILTER(orders, "total", ">=", 1000)
"<"Less thanFILTER(items, "priority", "<", 3)
"<="Less or equalFILTER(scores, "value", "<=", 50)
"contains"Substring matchFILTER(products, "name", "contains", "Pro")

Go Further

  • Extraction - Learn how to extract structured data
  • Reasoning - Add step-by-step reasoning for complex calculations
  • Schema - Design your extraction schemas