likelihoods) for each field.
Type-Specific Strategies
The algorithm uses different strategies based on value type:
| Type | Strategy | Example |
|---|---|---|
| Boolean | Majority vote | [True, True, False] → True |
| Number | Clustering | [100, 101, 200] → 100.5 |
| String | Semantic clustering | ["Acme Corp", "Acme Corporation"] → "Acme Corp" |
| Object | Recursive by field | Each field processed independently |
| Array | Element-wise (after alignment) | Each item processed independently |
Boolean Consensus
Simple majority voting:
| Input | Result | Confidence |
|---|---|---|
| [True, True, True] | True | 1.00 |
| [True, True, False] | True | 0.67 |
| [True, False, None] | True | 0.33 |
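A minimal sketch of this voting rule, assuming None counts toward the total but not as a vote (the function name is illustrative, not the library's API):

```python
from collections import Counter
from typing import Optional

def boolean_consensus(values: list[Optional[bool]]) -> tuple[Optional[bool], float]:
    """Majority vote over booleans; confidence = winner_count / total."""
    votes = Counter(v for v in values if v is not None)  # None is not a vote
    if not votes:
        return None, 0.0
    winner, count = votes.most_common(1)[0]
    return winner, count / len(values)  # total includes None entries

print(boolean_consensus([True, True, True]))   # (True, 1.0)
print(boolean_consensus([True, True, False]))  # (True, 0.666...), shown as 0.67 above
```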
Numeric Consensus
Numbers are clustered with a 3% tolerance, then the largest cluster wins.
Clustering Rule
Two numbers a and b are in the same cluster if |a − b| ≤ 0.03 × max(|a|, |b|).
Example 1: Clear winner
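For instance (values invented for illustration):

```python
values = [100, 101, 200]
# |100 - 101| = 1   <= 0.03 * 101 = 3.03 -> same cluster
# |100 - 200| = 100 >  0.03 * 200 = 6.00 -> different cluster
# Clusters: {100, 101} (size 2) and {200} (size 1)
# Largest cluster wins: consensus = mean(100, 101) = 100.5, confidence = 2/3
```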
Example 2: With None values
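A sketch with a missing value, assuming None is excluded from clustering but still counts toward the total (consistent with the boolean table above):

```python
values = [100, 102, None]
# None is dropped before clustering: [100, 102]
# |100 - 102| = 2 <= 0.03 * 102 = 3.06 -> one cluster {100, 102}
# Consensus = mean(100, 102) = 101.0, confidence = 2/3
```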
Why 3%?
| a | b | Difference | 3% of max | Same cluster? |
|---|---|---|---|---|
| 100 | 103 | 3 | 3.09 | ✅ Yes |
| 100 | 104 | 4 | 3.12 | ❌ No |
| 1.00 | 1.02 | 0.02 | 0.03 | ✅ Yes |
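The rule in the table can be written directly as a check; this is a sketch, and `same_cluster` is an illustrative name rather than the engine's API:

```python
def same_cluster(a: float, b: float, tol: float = 0.03) -> bool:
    """True if |a - b| is within tol (3%) of the larger absolute value."""
    return abs(a - b) <= tol * max(abs(a), abs(b))

print(same_cluster(100, 103))    # True  (3 <= 3.09)
print(same_cluster(100, 104))    # False (4 > 3.12)
print(same_cluster(1.00, 1.02))  # True  (0.02 <= 0.0306)
```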
String Consensus
Strings use semantic clustering with embeddings.
How it works
- Compute pairwise similarity using embeddings
- Group similar strings into clusters
- Pick the medoid (most central value) of the largest cluster
Example
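An illustrative run (the strings are invented for demonstration):

```python
values = ["Acme Corp", "Acme Corporation", "ACME Corp.", "Beta Inc"]
# 1. Pairwise embedding similarity groups the three "Acme" variants together.
# 2. "Beta Inc" forms its own small cluster.
# 3. The medoid of the largest cluster (the value most similar to the rest),
#    e.g. "Acme Corp", is returned.
# Confidence combines dominance (3 of 4 values in the winning cluster)
# with cohesion (how tightly those 3 values agree).
```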
Handling variations
The engine compares strings under multiple “views”:
| View | "Product ABC-123" becomes |
|---|---|
| Original | "Product ABC-123" |
| Digits only | "123" |
| Letters only | "productabc" |
| Sorted tokens | "abc-123 product" |
"ABC-123"vs"ABC 123"(same digits)"ProductName"vs"PRODUCTNAME"(same letters)
Nested Objects
Objects are processed recursively, field by field (see the sketch below).
Note: Strings like "Acme" vs "ACME" are treated as identical because multi-view comparison includes an alpha-only view that normalizes to lowercase.
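A sketch of per-field recursion (field names and values are invented):

```python
sources = [
    {"company": "Acme", "active": True,  "revenue": 100},
    {"company": "ACME", "active": True,  "revenue": 101},
    {"company": "Acme", "active": False, "revenue": 200},
]
# Each field gets its own consensus, using the strategy for its type:
#   company: "Acme"/"ACME" normalize to the same alpha-only view -> "Acme", ~1.0
#   active:  majority vote [True, True, False]                   -> True,   0.67
#   revenue: 3% clustering  [100, 101, 200]                      -> 100.5,  0.67
```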
Arrays (After Alignment)
Arrays are processed element by element after alignment:
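A sketch of element-wise processing once the arrays are aligned (values invented):

```python
sources = [
    ["red", "green", "blue"],
    ["red", "green", "teal"],
    ["red", "green", "blue"],
]
# Position i of the result is the consensus of position i across sources:
#   index 0: ["red", "red", "red"]       -> "red",   confidence 1.0
#   index 1: ["green", "green", "green"] -> "green", confidence 1.0
#   index 2: ["blue", "teal", "blue"]    -> "blue",  confidence ~0.67
```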
Reading Confidence Scores
The likelihoods object mirrors the structure of your extracted data:
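For example, for a three-field extraction the scores might look like this (field names and the exact output shape are hypothetical; see the API reference for the real response format):

```python
consensus = {
    "company": "Acme Corp",
    "active": True,
    "revenue": 100.5,
}
likelihoods = {  # same shape as the data: one score per field
    "company": 0.92,
    "active": 0.67,
    "revenue": 0.67,
}
```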
Interpretation Guide
| Score | Meaning | Action |
|---|---|---|
| 1.0 | All sources agreed exactly | ✅ High confidence |
| 0.8-0.99 | Minor variations, strong consensus | ✅ Generally reliable |
| 0.6-0.79 | Some disagreement | ⚠️ Review recommended |
| 0.4-0.59 | Significant disagreement | ⚠️ Flag for human review |
| < 0.4 | Major disagreement | ❌ Likely ambiguous |
Full Example
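A hedged end-to-end illustration tying the strategies together; the document and output values are invented:

```python
sources = [  # three extraction runs over the same invoice
    {"vendor": "Acme Corp",        "total": 1299.0, "paid": True,  "items": ["ABC-123", "XYZ-9"]},
    {"vendor": "Acme Corporation", "total": 1299.0, "paid": True,  "items": ["ABC 123", "XYZ-9"]},
    {"vendor": "Acme Corp",        "total": 1305.0, "paid": False, "items": ["ABC-123", "XYZ-9"]},
]
# Expected consensus, per the rules above:
#   vendor: semantic clustering               -> "Acme Corp" (dominance x cohesion)
#   total:  |1299 - 1305| = 6 <= 0.03 * 1305  -> one cluster, mean = 1301.0, confidence 1.0
#   paid:   majority vote [True, True, False] -> True, confidence 0.67
#   items:  element-wise; "ABC-123" and "ABC 123" match on the digits-only view
```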
Summary
| Type | Method | Confidence Formula |
|---|---|---|
| Boolean | Majority vote | winner_count / total |
| Number | 3% clustering | cluster_size / total |
| String | Semantic clustering | dominance × cohesion |
| Object | Recursive | Per-field confidence |
| Array | Element-wise | Per-element confidence |
Special Case: n=2 (Similarity Mode)
When you have exactly 2 sources, the system can operate in two modes:
Consensus Mode (default)
Same as n > 2: merge values, output a single result with likelihoods.
Note: If values are in different clusters (e.g., [10, 20]), tie-breaking picks one value (preferring larger absolute values), not the mean.
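For instance (illustrative):

```python
values = [10, 20]
# |10 - 20| = 10 > 0.03 * 20 = 0.6 -> two singleton clusters (a tie)
# Tie-breaking prefers the larger absolute value -> consensus = 20, not 15.0
# Confidence reflects the split: 1/2 = 0.5
```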
Similarity Mode (for evaluation)
Instead of merging, compute how similar each field is between the two sources. This is useful when comparing a model’s extraction against a ground truth.
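A sketch of what similarity mode reports (field names and scores are illustrative):

```python
prediction   = {"vendor": "Acme Corp",        "total": 1299.0, "paid": True}
ground_truth = {"vendor": "Acme Corporation", "total": 1299.0, "paid": False}

# Instead of a merged value, each field gets a similarity score:
#   vendor: semantically close -> high score (e.g. ~0.9)
#   total:  exact match        -> 1.0
#   paid:   mismatch           -> 0.0
```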
When to Use Which?
| Use Case | Mode | Output |
|---|---|---|
| Multiple model runs (n_consensus=3) | Consensus | Merged value + confidence |
| Compare extraction vs ground truth | Similarity | Per-field similarity score |
| A/B test two models | Similarity | Per-field similarity to reference |
| Quality evaluation | Similarity | Total similarity score |
Key difference: Consensus produces a merged value. Similarity produces a score (0-1) measuring how close the values are.