New frontiers in AI and finance: Kepler built verifiable AI for financial services with Claude

Kepler

Kepler, the verifiable AI platform for financial research, expanded on the architecture documented in a recent Anthropic customer profile, “How Kepler built verifiable AI for financial services with Claude.” The profile details a system built on a single thesis: language models alone cannot meet the rigor financial workflows require, and traditional financial software cannot match the flexibility analysts need from AI. Kepler’s architecture combines them. That combination is what makes the answers defensible in an IC memo or under audit.

Financial firms operate in a heavily regulated environment where reporting has to be auditable and accountable. Every figure in a regulatory filing, deal pitch, or research report needs to be verifiable against source documents.

The tools the financial industry has traditionally relied on can pull data, but they still require analysts for that verification process. An analytics system can’t interpret a freeform question, decompose it into steps, or work out that a single metric requires pulling three different line items across specific fiscal periods. AI systems can do that interpretation, but they handle it in the same step as the computation, so the numbers they produce are generated by the model, which can make mistakes.

Vinoo Ganesh and John McRaven spent years at Palantir building data systems for defense, energy, and financial firms. That work shaped how they think about trust in environments where answers must be verifiable. Before founding Kepler, they spoke with 147 financial firms, including private equity firms, hedge funds, and investment banks, and heard the same thing at nearly all of them: everyone wanted to use AI for research, but nobody trusted the output. As one managing director told them, “How am I supposed to trust something I can’t audit?”

The duo’s answer was to build deterministic infrastructure that serves as a trust and verification layer for AI. That infrastructure, together with Claude as the reasoning and interpretation layer, powers Kepler Finance: a research platform for financial services used by analysts to ask questions in plain English and receive instantly verifiable answers. 

Handling long, multi-step tasks and flagging ambiguity

Financial analysis involves complex, multi-step calculations, dense data, and overloaded terminology, and has no tolerance for error. With that in mind, Kepler needed a model that could hold a long plan together without drift and flag ambiguity. 

For example, if an analyst asks for a company’s inventory days outstanding over the last eight quarters, the model has to work out what the answer requires: the right formula, the correct fiscal periods, and any restatements that might affect the numbers.
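To make the inventory days example concrete, here is a minimal sketch of the calculation under one common definition (average inventory divided by cost of goods sold, scaled by the days in the period). The class name, field names, and period label format are hypothetical, and the source does not say which formula variant Kepler uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarterFigures:
    """Inputs for one fiscal quarter. All field names are hypothetical."""
    fiscal_period: str      # e.g. "FY2024-Q1" (assumed label format)
    inventory_begin: float  # inventory at the start of the quarter
    inventory_end: float    # inventory at the end of the quarter
    cogs: float             # cost of goods sold for the quarter
    days_in_period: int     # calendar days in the fiscal quarter


def days_inventory_outstanding(q: QuarterFigures) -> float:
    """Average inventory / COGS * days in period (one common definition)."""
    if q.cogs <= 0:
        raise ValueError(f"COGS must be positive for {q.fiscal_period}")
    average_inventory = (q.inventory_begin + q.inventory_end) / 2
    return average_inventory * q.days_in_period / q.cogs


# Illustrative numbers only, not taken from any filing.
example = QuarterFigures("FY2024-Q1", 100.0, 120.0, 330.0, 90)
print(days_inventory_outstanding(example))  # 30.0
```

Running this over eight quarters would mean building one `QuarterFigures` per fiscal period, which is where period alignment and restatements come into play.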

The team benchmarked across all frontier models. They found that on straightforward queries, models performed comparably. But when it came to long, multi-step plans with interdependencies, all but Claude started taking shortcuts or losing track of constraints by the fourth or fifth step. “On our workloads, Claude was the model that consistently held the plan together,” Ganesh says. “Other models would start strong and then quietly drop a constraint by step five.”

Engineering the context around Claude

The Kepler team found that Claude produced better results when given precisely defined tasks enhanced with structured domain knowledge, definitions, and hard boundaries on what to resolve versus what to escalate. “In finance, the model can’t be the whole system. We treat it as one stage in a pipeline whose job is to hand the model exactly what it needs to succeed at exactly that stage,” says McRaven. “Prompt engineering optimizes a call while context engineering optimizes the system around it.”

The clearest difference was how each model handled uncertainty and kept humans in the loop. For example, in situations where one term can have two different meanings, most models picked one meaning and kept going. Claude stopped and asked the analyst to decide. “That behavior matters more than any benchmark score,” Ganesh says. “One wrong assumption early in a financial analysis breaks everything downstream.”
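The stop-and-ask behavior described above can be sketched as a resolver that returns either a single definition or an explicit escalation. This is a hypothetical illustration of the pattern, not Kepler's implementation: the glossary entries, class names, and `resolve_term` function are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical glossary: each term maps to its candidate definitions.
GLOSSARY: Dict[str, List[str]] = {
    "inventory days outstanding": ["average_inventory / cogs * days_in_period"],
    "net debt": [
        "total_debt - cash",
        "total_debt + lease_liabilities - cash",
    ],
}


@dataclass(frozen=True)
class Resolved:
    term: str
    definition: str


@dataclass(frozen=True)
class NeedsAnalyst:
    term: str
    options: Tuple[str, ...]  # empty when the term is unknown


def resolve_term(
    term: str, glossary: Dict[str, List[str]] = GLOSSARY
) -> Union[Resolved, NeedsAnalyst]:
    """Resolve only when exactly one definition exists; otherwise escalate."""
    candidates = glossary.get(term.lower().strip(), [])
    if len(candidates) == 1:
        return Resolved(term, candidates[0])
    # Two or more meanings, or none at all: do not guess, ask the analyst.
    return NeedsAnalyst(term, tuple(candidates))


print(resolve_term("Net Debt"))
```

Returning a distinct `NeedsAnalyst` type, rather than silently picking the first candidate, forces downstream steps to handle the ambiguity before any number is computed.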

Scaling with Claude

Kepler Finance has indexed more than 26 million SEC filings across 14,000+ companies, 50M+ public documents, and 1M+ private documents spanning 27 global markets. Claude makes that volume of unstructured data usable, interpreting questions against the entire corpus and reconciling differences in terminology across companies and time periods. Kepler’s retrieval layer then pulls figures from verified SEC filings, computes the result, and assembles the output into the desk’s Excel template, where with a single click analysts can trace each number back to its exact line item highlighted in the source document.
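One way to picture a number that traces back to its line item is a data structure in which every value carries its source and every computed result keeps its inputs. The sketch below assumes hypothetical names throughout (`SourcedFigure`, `ComputedFigure`, `gross_margin`, and the placeholder filing identifier); the source does not describe Kepler's actual data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourcedFigure:
    """A number plus where it came from. All fields are hypothetical."""
    value: float
    filing: str     # placeholder identifier, not a real filing reference
    page: int
    line_item: str


@dataclass(frozen=True)
class ComputedFigure:
    """A derived number that keeps its formula and its sourced inputs."""
    value: float
    formula: str
    inputs: Tuple[SourcedFigure, ...]


def gross_margin(revenue: SourcedFigure, cogs: SourcedFigure) -> ComputedFigure:
    """Deterministic arithmetic: same inputs always give the same output."""
    if revenue.value == 0:
        raise ValueError("revenue must be non-zero")
    return ComputedFigure(
        value=(revenue.value - cogs.value) / revenue.value,
        formula="(revenue - cogs) / revenue",
        inputs=(revenue, cogs),
    )


# Illustrative values only, not taken from any filing.
revenue = SourcedFigure(1000.0, "EXAMPLE-10-K", 45, "Total revenue")
cogs = SourcedFigure(600.0, "EXAMPLE-10-K", 45, "Cost of goods sold")
result = gross_margin(revenue, cogs)
for source in result.inputs:
    print(source.line_item, "->", source.filing, "page", source.page)
```

Because the arithmetic lives in ordinary code rather than in model output, calling `gross_margin` twice with the same inputs yields equal results, and each input can be followed back to its page and line item.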

For the buy-side analyst, that architecture changes three things in the daily workflow. Every figure in a Kepler answer clicks through to its filing, page, and line item, so a number in an IC memo can be defended at the level of the underlying 10-K. Calculations are explicit and reproducible, so running the same query twice returns the same number. Outputs are auditable end-to-end, so compliance reviews and examiner requests do not require redoing the work. Buy-side analysts at private equity firms, hedge funds, and investment banks use Kepler today. The platform was built in under three months, informed by interviews with 147 financial firms. The analysts Kepler serves want answers they can put their name on. The company is continuing to scale across the buy-side research stack, with expansion into private credit underway.