Building a KYC Sanctions Triage Agentic System for Financial Services
Post #1 in my series on agentic AI use cases across Financial Services and TMT.
I'm spending the next stretch of weeks shipping small, real agentic AI prototypes across FS and TMT sub-domains — wealth, capital markets, payments, KYC/AML, telco, media, ad-tech, gaming, and more. The goal is breadth with depth: build the thing, learn the domain, share what's actually useful for leaders trying to figure out where agentic AI fits.
First up: KYC Sanctions Alert Triage.
I spent a couple of hours this week building a small agentic AI prototype to teach me something real about the domain and to stress-test where agentic AI actually fits in regulated workflows.
Why sanctions screening
What it is. When you send a wire, open an account, or onboard a vendor at a US financial institution, your name gets checked against government watchlists of individuals, entities, vessels, and aircraft that US persons are prohibited from transacting with.
The most well-known list is the Specially Designated Nationals (SDN) list, maintained by OFAC — the US Treasury's Office of Foreign Assets Control. OFAC administers sanctions against foreign governments, terrorists, narcotics traffickers, and others who threaten US foreign policy or national security. The list updates almost daily.
Who actually does the screening. Not OFAC. The financial institutions themselves are legally required to screen every transaction and customer, and to block or report anything that matches. Banks, broker-dealers, fintechs, crypto exchanges: all of them. They run vendor screening engines against consolidated lists and generate an alert whenever a possible match is found.
An alert is not a conviction. It is "this transaction might involve a sanctioned party — a human needs to review and decide." Compliance analysts then investigate each one and disposition it: escalate if the match looks real, clear if it's a false positive, or gather more information if it's ambiguous.
Why it's so expensive. Tier-1 banks generate hundreds of thousands of alerts a month, with industry false-positive rates routinely above 95%. Each alert flows through tiered analyst review — L1 triage, L2 investigation, L3 escalation — adding up to nine-figure annual operations. The regulatory posture — OCC, FinCEN, BSA examination standards — expects every alert to be reviewed by a qualified human and every disposition to be documented and defensible. One missed true positive can become a consent order.
The economics push hard toward automation. The regulatory posture pushes hard the other way. That tension is exactly where agentic AI deserves a careful look, and exactly where overclaiming is dangerous.
What I built
Sanctions Triage Agentic System Architecture
A three-agent workflow built in LangGraph and running on Gemini:
A Name Match Agent that combines deterministic fuzzy matching with LLM judgment on match strength and common-name risk
A Context Agent that weighs corroborating evidence — DOB, country, sanctions program, entity type
A Rationale Agent that synthesizes the evidence into a disposition (escalate, clear, or needs more info) with a compliance-style audit rationale
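To make the "deterministic fuzzy matching" half of the Name Match Agent concrete, here is a minimal sketch using only the Python standard library. It is not the code from the repo: the function names, the normalization steps, and the choice of difflib's similarity ratio are all my assumptions for illustration. A production screening engine would use purpose-built name-matching algorithms.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Iterable


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort tokens so word order is ignored."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def name_match_score(candidate: str, listed_names: Iterable[str]) -> float:
    """Best similarity (0.0 to 1.0) between the candidate and any listed name or alias."""
    target = normalize_name(candidate)
    scores = (SequenceMatcher(None, target, normalize_name(n)).ratio() for n in listed_names)
    return max(scores, default=0.0)


# Hypothetical example: a customer name checked against a listed name plus one alias.
score = name_match_score("José Garcia", ["GARCIA, Jose", "J. Garcia Lopez"])
print(round(score, 2))
```

The score is only an input. In the design described here, the LLM still judges match strength and common-name risk on top of it.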
Worth being precise on language here: this is an agentic AI system, not a single agent. The distinction matters. A single agent is one model with tools doing a task end-to-end. An agentic system is a structured workflow of specialized agents, each with a narrow job, orchestrated together — with deterministic guardrails at the seams. In regulated work, that structure isn't optional. It's how you get auditability, scoped reasoning, and the ability to swap or harden any single component without rebuilding the whole thing.
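A rough sketch of what "deterministic guardrails at the seams" can look like, written in plain Python rather than LangGraph so it runs without dependencies. Everything here is hypothetical: the TriageState fields, the 0.95 threshold, the stub stages standing in for the three agents, and the override rule itself. The point is only that the final disposition is set by code, not by the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

STRONG_MATCH = 0.95  # hypothetical threshold, not a regulatory or repo value


@dataclass
class TriageState:
    alert_id: str
    name_score: float = 0.0             # written by the name-match stage
    disconfirming: bool = False         # written by the context stage
    proposed: str = "needs_more_info"   # written by the rationale stage (model output)
    final: str = "needs_more_info"      # written only by the guardrail
    trail: List[str] = field(default_factory=list)


Stage = Callable[[TriageState], TriageState]


def guardrail(state: TriageState) -> TriageState:
    """The model may not clear a strong name match that has no disconfirming evidence."""
    if state.proposed == "clear" and state.name_score >= STRONG_MATCH and not state.disconfirming:
        state.final = "escalate"
        state.trail.append("guardrail: overrode clear -> escalate")
    else:
        state.final = state.proposed
        state.trail.append("guardrail: accepted " + state.proposed)
    return state


def run_pipeline(state: TriageState, stages: List[Stage]) -> TriageState:
    """Run each stage in order, record it in the audit trail, then apply the guardrail."""
    for stage in stages:
        state = stage(state)
        state.trail.append("ran " + stage.__name__)
    return guardrail(state)


# Stub stages standing in for the three agents.
def name_match(state: TriageState) -> TriageState:
    state.name_score = 0.98
    return state

def context(state: TriageState) -> TriageState:
    state.disconfirming = False
    return state

def rationale(state: TriageState) -> TriageState:
    state.proposed = "clear"  # simulate an over-eager model
    return state

result = run_pipeline(TriageState("A-001"), [name_match, context, rationale])
print(result.final, result.trail)
```

Because each stage only reads and writes a shared state object, any one of them can be swapped or hardened without touching the others, and the trail gives a reviewer the order of operations.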
The test set was five synthetic alerts modeled on the public OFAC SDN schema, covering exact matches, alias matches, common-name false positives, entity-name overlaps, and a genuinely ambiguous case with missing DOB.
What I learned
Two things, both more about the domain than the technology.
First, "no information" and "disconfirming information" are opposite signals — and easy to collapse. My first version treated a common name with no DOB the same as a common name with the wrong DOB. Both got routed to needs more info. But in real compliance work, a common name with disconfirming evidence is the strongest false-positive signal there is. Fixing that one prompt rule took deflection from 1-in-3 to 2-in-3, with zero false negatives.
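One way to keep those two signals from collapsing is to model evidence as three states instead of a boolean. The sketch below is illustrative only: the Evidence enum, the function names, and the routing for the common-name case are my assumptions, and it assumes each date is a single ISO string, whereas real list entries can carry partial or multiple dates.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CONFIRMING = "confirming"
    DISCONFIRMING = "disconfirming"
    MISSING = "missing"


def compare_dob(customer_dob: Optional[str], listed_dob: Optional[str]) -> Evidence:
    """Classify DOB evidence; a missing value on either side is MISSING, never a mismatch."""
    if not customer_dob or not listed_dob:
        return Evidence.MISSING
    return Evidence.CONFIRMING if customer_dob == listed_dob else Evidence.DISCONFIRMING


def route_common_name(dob_evidence: Evidence) -> str:
    """Proposed disposition for a common-name hit, given only the DOB evidence."""
    if dob_evidence is Evidence.DISCONFIRMING:
        return "clear"            # wrong DOB on a common name: strong false-positive signal
    if dob_evidence is Evidence.MISSING:
        return "needs_more_info"  # no DOB tells us nothing either way
    return "escalate"             # matching DOB on a matching name


print(route_common_name(compare_dob(None, "1970-01-01")))          # needs_more_info
print(route_common_name(compare_dob("1985-06-30", "1970-01-01")))  # clear
```

The bug in my first version amounts to returning needs_more_info for both of the printed cases.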
Second, jurisdiction matters more than I'd internalized. A country mismatch is a meaningful disconfirming signal for Russia-program designations (RUSSIA-EO14024 expects a Russian nexus), but it tells you almost nothing for global terrorism designations (SDGT — the party can transact from anywhere). That kind of program-level nuance is the difference between a prototype that looks right and one a compliance lead would actually engage with.
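That program-level nuance can live in a small lookup table outside the prompt. The sketch below is an assumption-heavy illustration: the weights are placeholders I made up to show the shape of the rule, not calibrated values, and the function name and default are hypothetical.

```python
from typing import Optional

# Placeholder weights: how strongly a country mismatch counts as disconfirming
# evidence for each sanctions program. Values are illustrative, not calibrated.
COUNTRY_MISMATCH_WEIGHT = {
    "RUSSIA-EO14024": 0.8,  # designation expects a Russian nexus
    "SDGT": 0.1,            # party can transact from anywhere
}
DEFAULT_WEIGHT = 0.4        # hypothetical fallback for programs not in the table


def country_mismatch_signal(program: str,
                            customer_country: Optional[str],
                            listed_country: Optional[str]) -> float:
    """Return a 0.0 to 1.0 disconfirming weight; missing or matching countries give 0.0."""
    if not customer_country or not listed_country:
        return 0.0  # missing data is not disconfirming evidence
    if customer_country.upper() == listed_country.upper():
        return 0.0
    return COUNTRY_MISMATCH_WEIGHT.get(program, DEFAULT_WEIGHT)


print(country_mismatch_signal("RUSSIA-EO14024", "BR", "RU"))  # 0.8
print(country_mismatch_signal("SDGT", "BR", "RU"))            # 0.1
```

Keeping the table in code means a compliance lead can review and change it without anyone editing a prompt.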
Making it production ready
To make this system production ready, you'd need ingestion of the official OFAC lists with daily refresh, broader and adversarial evaluation, structured outputs, full audit logging, deterministic policy rules outside the prompt layer, reviewer workflows, monitoring, model risk management under SR 11-7, and proper governance.
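As one small example of the "structured outputs" item, the model's response can be validated before anything downstream trusts it. This is a minimal standard-library sketch under my own assumptions: the JSON shape with disposition and rationale keys, the function name, and the fail-safe of routing invalid output to an analyst are all hypothetical.

```python
import json

ALLOWED_DISPOSITIONS = {"escalate", "clear", "needs_more_info"}


def parse_disposition(raw: str) -> dict:
    """Validate model output; anything malformed falls back to analyst review."""
    fallback = {
        "disposition": "needs_more_info",
        "rationale": "Model output failed validation; routed to analyst.",
        "valid": False,
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    disposition = data.get("disposition")
    rationale = data.get("rationale")
    if not isinstance(disposition, str) or disposition not in ALLOWED_DISPOSITIONS:
        return fallback
    if not isinstance(rationale, str) or not rationale.strip():
        return fallback
    return {"disposition": disposition, "rationale": rationale.strip(), "valid": True}


print(parse_disposition('{"disposition": "clear", "rationale": "DOB mismatch on common name."}'))
print(parse_disposition("Sure! I think this one is fine."))
```

The design choice worth noting is the direction of the failure: invalid output never becomes a clear, it becomes more work for a human.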
The framing I'd offer leaders
The honest opportunity for agentic AI in regulated workflows isn't auto-disposition. It's analyst augmentation: structuring the investigation, gathering evidence, applying deterministic controls, producing auditable rationale, and keeping humans on every high-risk decision. Any vendor pitch that auto-clears regulated alerts deserves hard pushback.
One last note on how I built this: I used an agentic engineering workflow — Claude, Codex, and similar tools — to move fast on implementation, review, and iteration. The domain framing, the architecture decisions, the prompt rules that fixed the disconfirming-evidence bug — those stayed human-led. That's the pattern I keep seeing work: AI-native tooling accelerates the build, but the strategy lens and the domain judgment are still the job.
Repo is here if you want to dig into the code: github.com/Oladotun/FS_kyc_aml_alert_triage
Following along? I'm publishing one of these every week — a hands-on agentic AI build across Financial Services and TMT, with the domain lessons that actually matter. Next up rotates into TMT. Hit follow if you want the rest of the series, and drop a comment if there's a sub-domain you want me to tackle.