Entity Resolution for KYB: The Complete Guide

Entity resolution is the technical foundation of effective Know Your Business (KYB) verification. It’s the process of determining when different records refer to the same real-world business: connecting “GTL Services LLC” in Delaware’s registry to “Green Thumb Landscaping” on a merchant application to “GREEN THUMB LANDSCAPE” in payment processor records.

Without entity resolution, KYB verification fails at scale. With good resolution, legitimate businesses sail through verification while fraudulent applications get flagged. This guide explains how entity resolution works, why it matters for KYB, and what separates basic matching from production-grade resolution.

Why Can’t You Just Match on Name?

Business information exists across thousands of sources, each with its own format, naming conventions, and data quality:

State filing: “GTL Services LLC”
Trade name filing: “Green Thumb Landscaping”
Payment processor: “GREEN THUMB LANDSCAPE”
Google Business Profile: “Green Thumb Landscaping & Lawn Care”
Credit bureau: “GTL SERVICES”

These are all the same business. But how does a system know that?

Why Names Don’t Match

Legal names vs. trade names: Businesses register legally as “XYZ Holdings LLC” but operate publicly as “Joe’s Pizza”

Abbreviations: “Corporation” becomes “Corp” or “Co”; “Limited Liability Company” becomes “LLC” or “L.L.C.”

Stylization: “McDonald’s” vs “McDonalds” vs “MCDONALDS”

Typos and data entry errors: “Acme” becomes “Acmee” or “Acne”

Evolution: Business names change through rebranding, acquisition, or legal restructuring

Beyond Names

Name is just one attribute. Entity resolution must also handle:

Address variations:

“123 Main Street, Suite 100” vs “123 Main St Ste 100” vs “123 Main St #100”
Registered agent addresses vs. operating addresses vs. mailing addresses
Businesses that relocate

Identifier inconsistencies:

Not all sources include EIN or registration numbers
Different identifier types across jurisdictions
Missing or incorrect identifiers

Corporate structures:

Parent-subsidiary relationships
Franchises (same brand, different legal entities)
DBAs that span multiple entities

Why Entity Resolution Matters for KYB

Verification Accuracy

The core KYB question is: “Is this business legitimate?” Answering it requires matching the application to authoritative records.

Consider a business applying for a merchant account as “Green Thumb Landscaping” at “456 Main St, Columbus OH.” The Secretary of State record shows “GTL Services LLC” registered at “1209 Orange St, Wilmington DE.”

Without entity resolution: no match found, so the application goes to manual review or rejection.

With entity resolution: match found with high confidence (trade name filing links to legal entity, operating address differs from registered address as expected), and the application routes to auto-approval.

Entity resolution determines whether legitimate businesses pass verification or get stuck in manual review queues.

Straight-Through Processing

Straight-through processing (STP) rates measure how many applications resolve automatically without human intervention. Entity resolution directly impacts STP:

Resolution Quality	Typical STP Rate
Exact match only	30-40%
Basic fuzzy matching	50-60%
Advanced multi-attribute	70-80%
Graph-based with enrichment	80-90%

The difference between 40% and 80% STP is the difference between a sustainable operation and one buried in manual review backlogs.

Risk Detection

Entity resolution reveals patterns invisible to record-by-record analysis:

Shell company detection: Multiple businesses at the same registered agent address, sharing the same formation date and officer, despite claiming to be independent

Fraud rings: Applications with different business names but connected through shared addresses, phones, or beneficial owners

Sanctions evasion: An entity using a slightly misspelled name that would otherwise match a sanctioned party exactly

Serial fraud: An individual appearing as the beneficial owner of multiple failed businesses

Beneficial Ownership Verification

Tracing ownership requires connecting entities through ownership chains:

Application: "Green Thumb Landscaping"
    ↓ resolution
Legal Entity: "GTL Services LLC" (Delaware)
    ↓ ownership lookup
Parent: "Smith Holdings LLC" (Wyoming)
    ↓ ownership lookup
Beneficial Owner: "Jane Smith" (person)

Without entity resolution, ownership verification stops at the legal entity name on the application.
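
The chain above can be walked programmatically once the application name has been resolved to a legal entity. The following is a minimal sketch, assuming hypothetical in-memory tables (`TRADE_NAMES`, `OWNERS`) and a single owner per entity; real ownership data usually has several owners with percentage stakes.

```python
# Hypothetical output of the resolution step: trade name -> legal entity.
TRADE_NAMES = {"Green Thumb Landscaping": "GTL Services LLC"}

# Hypothetical ownership data: entity -> (owner, owner_type).
OWNERS = {
    "GTL Services LLC": ("Smith Holdings LLC", "company"),
    "Smith Holdings LLC": ("Jane Smith", "person"),
}


def trace_beneficial_owner(application_name, max_depth=10):
    """Follow ownership links from an application name toward a person."""
    entity = TRADE_NAMES.get(application_name, application_name)
    chain = [entity]
    for _ in range(max_depth):  # depth limit guards against circular ownership
        if entity not in OWNERS:
            return chain, None  # chain ends without reaching a person
        owner, owner_type = OWNERS[entity]
        chain.append(owner)
        if owner_type == "person":
            return chain, owner
        entity = owner
    return chain, None
```

If the first lookup fails because the trade name was never resolved, the chain contains only the application name and no owner is returned, which is the failure mode described above.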

Entity Resolution Techniques

Deterministic Matching

Match records using exact values of unique identifiers:

Identifiers used:

EIN (Employer Identification Number)
State registration number
DUNS number
LEI (Legal Entity Identifier)

Example:

Application EIN: 12-3456789
Registry EIN: 12-3456789
→ Exact match

Strengths:

Near-100% precision (false positives occur only when an identifier itself is wrong)
Fast execution
Simple implementation

Limitations:

Requires identifier presence in both records
Many records lack standardized identifiers
Typos in identifiers cause false negatives
Different identifier types don’t cross-match

Deterministic matching is the starting point, but it’s insufficient alone. It typically matches only 20-40% of records.
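
A minimal sketch of deterministic matching on EIN follows. The helper names are hypothetical, and the only validation assumed is that an EIN has nine digits; formatting differences such as the hyphen are removed before comparison.

```python
import re


def normalize_ein(raw):
    """Return a 9-digit EIN string, or None if the value does not look like an EIN."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def deterministic_match(app_ein, registry_ein):
    """Exact match on normalized EIN. Returns None when either side is unusable."""
    a, b = normalize_ein(app_ein), normalize_ein(registry_ein)
    if a is None or b is None:
        return None  # cannot decide; fall through to probabilistic matching
    return a == b
```

Returning `None` rather than `False` for a missing identifier keeps "no evidence" separate from "evidence of a mismatch", so later tiers can still attempt a match.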

Probabilistic (Fuzzy) Matching

Compare multiple attributes using similarity algorithms and weighted scoring:

Name similarity algorithms:

Edit distance (Levenshtein): The number of single-character edits needed to transform one string into another
Phonetic matching (Soundex, Metaphone): Match names that sound alike
Token-based: Compare word sets regardless of order
TF-IDF: Weight uncommon terms higher than common ones

Example:

Application name: "Green Thumb Landscaping LLC"
Registry name: "GTL Services LLC"
Trade name: "Green Thumb Landscaping"

Name similarity: Low (0.3)
Trade name similarity: High (0.95)
Address similarity: Medium (0.7)
→ Weighted score: 0.82 → Match
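
Two of the name similarity measures listed above, edit distance and token-based comparison, can be sketched in a few lines. The function names are illustrative, and the scores in the example above are themselves illustrative, so this code is not expected to reproduce them exactly.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_similarity(a, b):
    """Jaccard overlap of lowercase word sets, ignoring word order."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0
```

Edit distance handles typos such as "Acme" versus "Acmee" (distance 1), while token overlap handles reordered words. Neither connects "GTL Services" to "Green Thumb Landscaping", which is why the trade name link matters.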

Address standardization:

Parse addresses into components (street, city, state, zip)
Standardize abbreviations (St to Street, Ste to Suite)
Handle unit number variations
Compare individual components
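
The abbreviation step above can be sketched as a token-level rewrite. This is a simplified illustration with a small hypothetical expansion table; it skips component parsing, and a production system would normally rely on a dedicated address parser.

```python
import re

# Small illustrative table (an assumption, not a complete postal standard).
# Note that "st" can also mean "saint"; this sketch ignores that ambiguity.
EXPANSIONS = {"st": "street", "ste": "suite", "#": "suite", "ave": "avenue"}


def standardize_address(raw):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    text = raw.lower().replace("#", " # ")  # separate "#100" into "#" and "100"
    text = re.sub(r"[.,]", " ", text)
    tokens = [EXPANSIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

Under these assumptions, “123 Main Street, Suite 100”, “123 Main St Ste 100”, and “123 Main St #100” all standardize to the same string, so they compare as equal.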

Weighted scoring:

Match score =
  (name_sim × 0.35) +
  (address_sim × 0.25) +
  (city_state × 0.15) +
  (identifier × 0.25)

Threshold tuning is critical: a threshold set too low creates false positives, and one set too high creates false negatives.
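
The weighted formula above translates directly into code. The weights come from the formula; the 0.75 threshold and the function names are assumptions made for illustration.

```python
WEIGHTS = {"name_sim": 0.35, "address_sim": 0.25, "city_state": 0.15, "identifier": 0.25}


def match_score(signals):
    """Weighted sum of similarity signals, each expected in the range 0..1."""
    # A missing signal counts as 0.0, which penalizes records without identifiers.
    return sum(WEIGHTS[key] * signals.get(key, 0.0) for key in WEIGHTS)


def is_match(signals, threshold=0.75):
    # The 0.75 threshold is an illustrative assumption, not a recommended value.
    return match_score(signals) >= threshold
```

With `name_sim` 0.95, `address_sim` 0.7, `city_state` 1.0, and `identifier` 1.0, the score is roughly 0.91. Without an identifier the same record scores roughly 0.66, which shows how much a fixed weighting depends on which attributes are present.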

Machine Learning Approaches

Train models on labeled match/non-match pairs to learn complex patterns:

Supervised learning:

Training data: Human-labeled pairs (match/non-match)
Features: Similarity scores across multiple attributes
Models: Random forests, gradient boosting, neural networks
Output: Match probability

Benefits:

Captures non-obvious patterns
Adapts to specific data characteristics
Can improve over time with feedback

Challenges:

Requires labeled training data
Model explainability for compliance
Ongoing model maintenance
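
To make the supervised setup concrete, here is a minimal sketch that fits a logistic regression on similarity vectors using only the standard library. The training pairs are invented toy data, and in practice a library implementation of one of the models listed above would replace this hand-written loop.

```python
import math


def predict(weights, bias, x):
    """Match probability for one comparison vector."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic regression on similarity vectors with batch gradient descent."""
    n_features = len(pairs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(pairs, labels):
            error = predict(weights, bias, x) - y
            grad_b += error
            for k in range(n_features):
                grad_w[k] += error * x[k]
        bias -= lr * grad_b / len(pairs)
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy labeled pairs: (name_sim, address_sim), label 1 = match.
X = [(0.95, 0.9), (0.9, 0.7), (0.85, 0.95), (0.3, 0.2), (0.4, 0.1), (0.2, 0.5)]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X, y)
```

A linear model like this one keeps the learned weights inspectable, which helps with the explainability challenge noted above.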

Graph-Based Resolution

Connect records through relationships, not just attribute similarity:

Relationship types:

Shared address links records
Same registered agent links records
Common officers/directors link records
Same phone number links records
Ownership connections link records

Transitive connections:

Record A shares address with Record B
Record B shares officer with Record C
→ A may be related to C (transitive link)

Graph analysis:

Connected components (all records that link together)
Centrality measures (identify hub entities like formation agents)
Community detection (clusters of related businesses)

Graph-based resolution excels at:

Revealing corporate structures
Detecting shell company networks
Identifying formation agents and registered agent patterns
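
The transitive linking and connected-components ideas above can be sketched with plain dictionaries. The records and attribute values are invented for illustration; record B links A and C even though A and C share nothing directly.

```python
from collections import defaultdict

# Hypothetical records; attribute values are made up for illustration.
RECORDS = {
    "A": {"address": "1209 Orange St", "officer": "J. Doe"},
    "B": {"address": "1209 Orange St", "officer": "R. Roe"},
    "C": {"address": "77 Elm Ave", "officer": "R. Roe"},
    "D": {"address": "5 Oak Rd", "officer": "M. Poe"},
}


def build_graph(records):
    """Link any two records that share an attribute value."""
    by_value = defaultdict(set)
    for rec_id, attrs in records.items():
        for key, value in attrs.items():
            by_value[(key, value)].add(rec_id)
    graph = {rec_id: set() for rec_id in records}
    for members in by_value.values():
        for rec_id in members:
            graph[rec_id] |= members - {rec_id}
    return graph


def connected_components(graph):
    """Group records reachable from one another through any chain of links."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components
```

The number of neighbors per node (`len(graph[node])`) is a simple degree centrality. A registered agent address shared by thousands of records would stand out as a hub and usually should not be treated as evidence of a relationship by itself.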

Building Entity Resolution for KYB

Architecture Components

Data Ingestion

Connect to authoritative sources (Secretary of State APIs, business registries)
Ingest application data
Handle various formats and schemas

Normalization

Standardize names (remove punctuation, normalize case)
Parse and standardize addresses
Clean and validate identifiers
Extract structured data from unstructured fields
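
A minimal sketch of name standardization follows. Removing punctuation and normalizing case come from the list above; stripping trailing legal-form suffixes is an added assumption, and the suffix list here is deliberately short.

```python
import re

# Illustrative suffix list (an assumption); real lists are longer and jurisdiction-specific.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "corporation", "co", "ltd"}


def normalize_name(raw):
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    text = raw.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9\s]", "", text)  # "L.L.C." becomes "llc"
    tokens = text.split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

Keep the raw name alongside the normalized form: the legal suffix is discarded for comparison here, but it remains useful evidence elsewhere in verification.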

Blocking/Indexing

Group records that might match (blocking keys)
Avoid comparing every record to every other record
Common blocks: first N characters of name, zip code, phonetic codes
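
Blocking can be sketched as grouping records by a cheap key and generating pairs only within each group. The record fields (`id`, `name`, `zip`) and the choice of key are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record, prefix_len=3):
    """Key on the first characters of the name plus the zip code."""
    return (record["name"].lower()[:prefix_len], record["zip"])


def candidate_pairs(records):
    """Yield only pairs of record ids that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    for ids in blocks.values():
        yield from combinations(ids, 2)
```

A single key misses pairs such as “GTL Services” and “Green Thumb Landscaping”, so systems commonly run several blocking passes with different keys and take the union of the candidate pairs.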

Comparison

Apply similarity algorithms to candidate pairs
Calculate weighted match scores
Capture comparison vectors for classification

Classification

Determine match/non-match/maybe
Apply thresholds or ML models
Handle edge cases
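
The three-way outcome above is commonly implemented with two thresholds. In this sketch both cutoffs are placeholder values, not recommendations.

```python
def classify(score, lower=0.60, upper=0.85):
    """Map a match score to match / maybe / non-match using two thresholds."""
    # Both cutoffs are illustrative assumptions and need calibration on real data.
    if score >= upper:
        return "match"
    if score >= lower:
        return "maybe"  # route to manual review
    return "non-match"
```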

Clustering

Group matched records into clusters
Handle transitive closure (if A=B and B=C, then A=C)
Create unified entity records
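
Transitive closure over pairwise matches is a natural fit for a union-find structure. This is a minimal sketch with a hypothetical `cluster` function that takes record ids and the pairs classified as matches.

```python
def cluster(record_ids, matched_pairs):
    """Merge pairwise matches into clusters using union-find."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]  # path halving keeps trees shallow
            rid = parent[rid]
        return rid

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(clusters.values())
```

Because closure merges everything connected by any chain of matches, one wrong pairwise match can join two unrelated clusters, so large clusters are worth auditing.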

API/Output

Expose resolution as a service
Return match results with confidence scores
Provide audit trails for compliance

Tuning for KYB

False positive vs. false negative tradeoffs:

Scenario	False Positive Risk	False Negative Risk
Auto-approve legitimate	✓ Safe	✗ Lost customer
Auto-approve fraud	✗ Fraud loss	✓ Caught in review
Reject legitimate	✗ Lost customer	✓ Safe
Reject fraud	✓ Prevented	✗ Fraud approved

For KYB, false negatives (missing legitimate matches) are often more costly than false positives (flagging matches that need review). Tune accordingly, but monitor both.

Threshold calibration:

Start conservative (higher threshold, more manual review)
Analyze manual review outcomes
Gradually adjust based on observed precision/recall
Different thresholds for different risk tiers
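
The calibration loop above can be sketched as a search over candidate thresholds using manual review outcomes. The function names, the data shape, and the precision target are all assumptions for illustration.

```python
def precision_at(threshold, reviewed):
    """Precision among pairs scored at or above the threshold.

    `reviewed` is a list of (score, is_true_match) taken from manual review outcomes.
    """
    accepted = [is_match for score, is_match in reviewed if score >= threshold]
    return sum(accepted) / len(accepted) if accepted else None


def lowest_safe_threshold(reviewed, target_precision, candidates):
    """Return the lowest candidate threshold that still meets the precision target."""
    for threshold in sorted(candidates):
        precision = precision_at(threshold, reviewed)
        if precision is not None and precision >= target_precision:
            return threshold
    return None
```

Running the search with a stricter `target_precision` for higher-risk tiers gives one threshold per tier from the same review data.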

Handling Edge Cases

New businesses: Recently formed entities may not appear in all data sources yet. Use formation documents plus initial signals.

Sole proprietors may have no state filing at all. Match on individual identity plus business signals (trade name if registered, address, web presence).

Franchises share a brand but are different legal entities. Match to the correct franchisee entity, not the franchisor.

Name changes create ghost records. Historical names may still appear in some sources; maintain name history and match against all known names.

International entities involve different identifier types, character sets, and registry structures. Resolution needs to be jurisdiction-aware.

Measuring Resolution Quality

Precision and Recall

Precision: Of records the system says match, what percentage actually match?

Precision = True Positives / (True Positives + False Positives)

Recall: Of records that actually match, what percentage does the system find?

Recall = True Positives / (True Positives + False Negatives)

F1 Score: Harmonic mean balancing precision and recall

F1 = 2 × (Precision × Recall) / (Precision + Recall)
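
The three formulas above, computed from raw counts. The function name is illustrative, and returning zero when a denominator is zero is a convention chosen for this sketch.

```python
def resolution_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, 90 true positives, 10 false positives, and 30 false negatives give precision 0.90, recall 0.75, and F1 of about 0.82.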

KYB-Specific Metrics

STP Rate: Percentage of applications resolved without manual review

Match Rate: Percentage of applications successfully matched to authoritative records

Review Yield: Percentage of manual reviews that result in different decisions than the automated suggestion

Time to Decision: How long from application submission to verification decision

Entity Resolution in Agentic Workflows

As AI agents take on business verification tasks, entity resolution becomes a critical dependency in the agent pipeline rather than a pre-processing step handled elsewhere.

Why disambiguation must be explicit

When a human analyst processes a KYB application, they can apply judgment to ambiguous matches, recognizing that a name variation looks like a trade-name relationship, or that an address discrepancy is consistent with how Delaware-registered operating companies are typically structured. An AI agent operating at scale can’t apply this judgment unless disambiguation is an explicit, structured step in the workflow.

Entity resolution for agentic workflows should surface:

The candidate records considered
The similarity signals evaluated for each candidate
The confidence score on the selected match
A flag when confidence falls below the threshold for automatic resolution

Without these signals, an agent has no basis for routing uncertain cases to human review.
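
One way to carry these four signals is a structured result object. The class and field names below are hypothetical, and the 0.85 auto-resolution threshold is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    record_id: str
    signals: dict  # similarity signals, e.g. {"name_sim": 0.95, "address_sim": 0.7}
    score: float


@dataclass
class ResolutionResult:
    query: str
    candidates: list = field(default_factory=list)
    auto_threshold: float = 0.85  # placeholder value, not a recommendation

    @property
    def selected(self):
        """Highest-scoring candidate, or None if nothing was retrieved."""
        return max(self.candidates, key=lambda c: c.score, default=None)

    @property
    def needs_review(self):
        """True when there is no candidate or confidence is below the threshold."""
        best = self.selected
        return best is None or best.score < self.auto_threshold
```

An agent can branch on `needs_review` and attach the full candidate list to the review ticket, so the analyst sees which alternatives were considered.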

Latency requirements change the architecture

Human-reviewed KYB workflows tolerate batch entity resolution; records can be pre-resolved before the analyst’s queue is populated. Agentic workflows often require real-time resolution at query time: the agent receives a business name on an application and needs to resolve it to an entity identity in milliseconds before proceeding with downstream verification steps.

This shifts entity resolution from an ETL concern to an API design concern. Resolution at agent speed requires pre-indexed blocking structures for fast candidate retrieval, cached results for previously seen inputs, tiered resolution paths (deterministic before probabilistic, with fast-exit on high-confidence matches), and confidence scores in the response payload so the agent can make routing decisions without a second query.
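
A minimal sketch of a tiered, cached resolution path follows. The indexes are hypothetical in-memory stand-ins (an exact name lookup substitutes for real fuzzy candidate retrieval), and assigning confidence 1.0 to an identifier hit is a simplification.

```python
from functools import lru_cache
from typing import NamedTuple, Optional

# Hypothetical pre-built indexes; a real service would back these with a datastore.
EIN_INDEX = {"123456789": "entity-001"}
NAME_INDEX = {"green thumb landscaping": ("entity-001", 0.91)}


class Resolution(NamedTuple):
    entity_id: Optional[str]
    confidence: float
    tier: str


@lru_cache(maxsize=10_000)
def resolve(name, ein=None):
    """Try the deterministic tier first, then fall back to the probabilistic tier."""
    if ein is not None:
        digits = "".join(ch for ch in ein if ch.isdigit())
        if digits in EIN_INDEX:  # fast exit on an identifier hit
            return Resolution(EIN_INDEX[digits], 1.0, "deterministic")
    hit = NAME_INDEX.get(name.strip().lower())
    if hit:
        entity_id, score = hit
        return Resolution(entity_id, score, "probabilistic")
    return Resolution(None, 0.0, "none")
```

The result is an immutable tuple so cached values cannot be altered by callers, and it carries the confidence and tier so the agent can route without a second query.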

Confidence propagation

A confidence score from entity resolution should propagate through every downstream step in the agent’s workflow. If the agent resolved the input entity with 0.62 confidence, the agent’s downstream conclusions should carry that uncertainty, and its final decision should reflect it.

Agents that reach confident verification decisions on top of low-confidence entity matches are producing false precision. Match uncertainty is material and should be surfaced, not suppressed.
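
A minimal sketch of confidence propagation follows. Multiplying confidences assumes the steps are independent, which is a simplification, and the step names and approval cutoff are hypothetical.

```python
def propagate(match_confidence, step_confidences):
    """Combine match confidence with downstream step confidences by multiplication."""
    # Multiplication treats the steps as independent, which is a simplifying assumption.
    combined = match_confidence
    for value in step_confidences.values():
        combined *= value
    return combined


def decide(match_confidence, step_confidences, approve_at=0.80):
    """Return a routing decision together with the combined confidence behind it."""
    # approve_at is a placeholder cutoff, not a recommended value.
    combined = propagate(match_confidence, step_confidences)
    return ("approve" if combined >= approve_at else "review"), combined
```

With a 0.62 entity match, even near-certain downstream checks cannot lift the combined confidence above 0.62, so the case routes to review instead of producing a confident decision on an uncertain match.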

For more on how entity resolution fits into the broader agentic KYB stack, see The Business Identity Stack for Agentic AI.

Key Takeaways

Entity resolution is the foundation of accurate business verification; without it, KYB fails
Names don’t match across sources; resolution handles variations through multiple techniques
Deterministic matching is precise but limited; probabilistic matching handles variation; graph-based resolution reveals structure
Resolution quality directly impacts STP rates. Better resolution means more automation
Tune for your risk profile: balance false positives and false negatives based on cost
Measure and iterate. Track precision, recall, and business metrics to improve over time
In agentic workflows, disambiguation must be explicit and confidence must propagate. Agents routing on implicit matches produce false precision
