Anthropic's Financial Services Reference Repo: Building Production AI for Finance

A guided tour of Anthropic's open-source reference implementation for building Claude-powered financial applications — covering tool use, compliance-aware prompting, RAG document analysis, and why each pattern matters in a regulated industry.

Estimated time: ~12 min
Difficulty: intermediate
Sources: 5 sources

A hedge fund analyst asks an AI assistant: “Does our portfolio breach any VaR limits today?” A general-purpose LLM hallucinates a number. A properly engineered financial AI calls your risk system, reads the actual data, and responds with a cited answer that a compliance officer can audit. The gap between those two outcomes is what this repository is designed to close.

What the Repository Actually Is

The anthropics/financial-services repository is an open-source reference implementation published by Anthropic. ^{[anthropics/financial-services — GitHub]} It is not a product, a managed service, or a model. It is a collection of working code patterns that show developers how to integrate Claude into financial-sector applications — the kind of applications where a wrong number, an unattributed claim, or a missing disclosure can carry regulatory and legal consequences.

Think of it as an opinionated starter kit that already handles the hard parts: connecting Claude to live data sources, keeping answers within regulatory guardrails, producing auditable tool-call chains, and structuring outputs that downstream systems can parse reliably.

Definition: reference implementation

A reference implementation is a fully functional, working codebase that demonstrates a set of design patterns — not meant to be run as-is in production, but to be read, understood, adapted, and integrated. It trades production hardening (authentication, scaling, full error handling) for clarity of pattern.

Who this is for

The primary audience is a software engineer or ML engineer at a bank, asset manager, hedge fund, fintech, or insurance company who has been asked to build a Claude-powered feature and needs to know the right architectural patterns from the start — not just “does Claude work?” but “how do we build this in a way our legal and compliance teams will accept?”

Check your understanding

What is the anthropics/financial-services repository?

The Core Pattern: Tool Use for Live Financial Data

The most important pattern the repository demonstrates is tool use (also called function calling). Here is why it matters.

Claude, like every LLM, has a training data cutoff. It does not know today’s stock prices, your portfolio’s current composition, your firm’s proprietary risk metrics, or last quarter’s earnings. If you ask it these questions without tool use, it guesses — and guessing about financial data is dangerous.

Tool use changes this. You register a set of functions with the Claude API. When Claude determines it needs live data to answer the question honestly, it returns a structured function call instead of a prose answer. Your backend executes that function, returns the result, and Claude synthesises the final answer grounded in real data.

The anatomy of a tool call

The user asks: “What is the current yield on 10-year US Treasuries, and how does it compare to last quarter?”

Without tool use, Claude either refuses or guesses. With tool use:

1. Claude returns a tool_use block (abridged): { "type": "tool_use", "name": "get_bond_yield", "input": { "instrument": "US10Y", "compare_to": "last_quarter" } }
2. Your backend fetches the actual yield from your data provider and sends it back as a tool result.
3. Claude receives the result and writes a response: “The 10-year Treasury is yielding 4.31%, up 18bps from last quarter’s 4.13%.”

Every number is sourced from your data layer — none invented by the model.
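
The three steps above can be sketched in Python without any network call. The tool definition follows the shape of the Anthropic Messages API `tools` parameter (`name`, `description`, `input_schema`), and the Messages API names the argument payload `input`. Everything else is an assumption made for illustration: the `get_bond_yield` backend, its stub data, `TOOL_REGISTRY`, and the `handle_tool_use` helper are hypothetical and stand in for your own data layer.

```python
import json
from typing import Optional

# Tool definition in the shape used by the Anthropic Messages API `tools` parameter.
GET_BOND_YIELD_TOOL = {
    "name": "get_bond_yield",
    "description": "Return the current yield for a bond instrument, "
                   "optionally compared with an earlier period.",
    "input_schema": {
        "type": "object",
        "properties": {
            "instrument": {"type": "string"},
            "compare_to": {"type": "string", "enum": ["last_quarter"]},
        },
        "required": ["instrument"],
    },
}

# Hypothetical stand-in for a market-data provider, using the example figures above.
_STUB_YIELDS = {"US10Y": {"current": 4.31, "last_quarter": 4.13}}


def get_bond_yield(instrument: str, compare_to: Optional[str] = None) -> dict:
    """Look up a yield in the stub data and, if asked, the change in basis points."""
    data = _STUB_YIELDS[instrument]
    result = {"instrument": instrument, "yield_pct": data["current"]}
    if compare_to is not None:
        previous = data[compare_to]
        result["previous_yield_pct"] = previous
        result["change_bps"] = round((data["current"] - previous) * 100)
    return result


# Only functions listed here can ever be executed on Claude's behalf.
TOOL_REGISTRY = {"get_bond_yield": get_bond_yield}


def handle_tool_use(block: dict) -> dict:
    """Execute one tool_use content block and build the matching tool_result block."""
    func = TOOL_REGISTRY[block["name"]]
    output = func(**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": json.dumps(output),
    }


# A tool_use content block as Claude might return it (the id is illustrative).
example_block = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "get_bond_yield",
    "input": {"instrument": "US10Y", "compare_to": "last_quarter"},
}
tool_result = handle_tool_use(example_block)
```

In a real integration the `tool_result` block would be sent back to Claude in the next user message, and Claude would then write the final prose answer from those figures.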

The full cycle runs user → Claude → tool → result → answer. Claude's decision to call a tool is explicit and can be logged, which makes it the auditable core of the pattern.

Common misconception

Claude looks up financial data automatically once you connect it to the internet.

What's actually true

Claude does not have autonomous internet access. Tool use is explicit and developer-controlled. You define exactly which functions Claude may call, with which argument shapes, and your backend executes them. Claude never makes an HTTP request itself — it returns a structured request that you fulfill. This means the data access layer remains entirely under your control, which is what compliance and security teams require.
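
One way to make that control concrete is an explicit allowlist in front of the dispatcher, with every request recorded whether or not it is executed. This is a sketch under stated assumptions: `ALLOWED_TOOLS`, `audit_log`, `dispatch`, and the `get_var_limit_status` tool are hypothetical names, and an in-memory list stands in for a real append-only audit store.

```python
from datetime import datetime, timezone


def get_var_limit_status(portfolio_id: str) -> dict:
    # Hypothetical stub for a risk-system lookup.
    return {"portfolio_id": portfolio_id, "breach": False}


# The only functions the model is allowed to trigger.
ALLOWED_TOOLS = {"get_var_limit_status": get_var_limit_status}

# Stand-in for an append-only audit store.
audit_log = []


def dispatch(name: str, tool_input: dict) -> dict:
    """Run an allowlisted tool and record the request either way."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": name,
        "input": tool_input,
    }
    if name not in ALLOWED_TOOLS:
        # Requests for unknown tools are logged and refused, never executed.
        record["status"] = "rejected"
        audit_log.append(record)
        return {"error": f"tool '{name}' is not permitted"}
    output = ALLOWED_TOOLS[name](**tool_input)
    record["status"] = "executed"
    record["output"] = output
    audit_log.append(record)
    return output
```

Recording rejected requests as well as executed ones gives a reviewer the full sequence of what the model asked for, not only what the backend agreed to do.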

Check your understanding

Why does tool use matter more for financial applications than for a general-purpose assistant?

Compliance-Aware Prompting: The Regulatory Layer

Every regulated financial institution operates under a web of rules: MiFID II in Europe, FINRA and SEC regulations in the US, and internal compliance policies layered on top. The reference repo shows how to encode these constraints into the system prompt and output structure so Claude’s behaviour stays within the regulatory envelope by default. ^{[FINRA Report on Artificial Intelligence (AI) in the Securities Industry (2020)]}

The three main mechanisms are:

1. System-prompt constraints: instructions that prevent Claude from offering personalised investment advice (a licensed activity), speculating on future prices, or omitting legally required disclosures. These are written as explicit rules rather than left to the model’s default behaviour.

2. Structured output schemas: financial reports and compliance-sensitive responses are generated as typed JSON that downstream systems can validate. If Claude produces an invalid field (say, a malformed ISIN), validation catches it before it reaches the user.

3. Retrieval grounding: answers are required to cite source passages, and Claude is instructed to refuse to make claims it cannot ground in the provided documents. This reduces the room for hallucination, because an unsupported claim has no passage to cite.
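
As an illustration of the second mechanism, here is a minimal sketch of downstream validation for one structured record. The ISIN check follows the standard format (two letters, nine alphanumerics, one check digit) and the Luhn check-digit rule. The record shape and the `validate_holding` helper are hypothetical, not taken from the repository. Note that a check digit only catches malformed identifiers. A well-formed ISIN that points at the wrong security still needs a lookup against your security master.

```python
import re

ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def is_valid_isin(isin: str) -> bool:
    """Check ISIN format and its Luhn check digit."""
    if not ISIN_PATTERN.fullmatch(isin):
        return False
    # Letters expand to two digits (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_holding(record: dict) -> list:
    """Return a list of validation errors for one hypothetical holding record."""
    errors = []
    isin = record.get("isin")
    if not isinstance(isin, str) or not is_valid_isin(isin):
        errors.append("isin: missing or fails format/check-digit validation")
    quantity = record.get("quantity")
    if isinstance(quantity, bool) or not isinstance(quantity, (int, float)):
        errors.append("quantity: must be a number")
    if not record.get("source_passage_id"):
        errors.append("source_passage_id: a citation is required")
    return errors
```

A record that returns a non-empty error list would be rejected or routed to review rather than shown to the user.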

Characteristic	Naïve integration	Reference repo patterns
Data currency	Training-time knowledge only	Live via tool calls
Citation	No sourcing of claims	Mandatory source passage citation
Investment advice guardrail	Depends on model defaults	Explicit system-prompt rule
Audit trail	Text log at best	Structured tool-call chain with inputs/outputs
PII handling	Uncontrolled	Masked before context window; schema prevents surfacing
Output schema	Free-form prose	Typed JSON, downstream-validated

Compliance posture: standard LLM integration vs. the reference repo's patterns
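
The "masked before context window" entry in the PII row can be pictured as a redaction pass that runs before any text is placed in a prompt. The sketch below is deliberately simple and rests on assumptions: the two regular expressions and the `mask_pii` helper are illustrative only, and a production system would rely on a vetted PII-detection component rather than hand-written patterns.

```python
import re

# Illustrative patterns only: e-mail addresses and 8-12 digit account numbers.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ACCOUNT = re.compile(r"\b\d{8,12}\b")


def mask_pii(text: str) -> str:
    """Replace matches with placeholders before the text enters the prompt."""
    text = EMAIL.sub("[EMAIL]", text)
    return ACCOUNT.sub("[ACCOUNT]", text)


masked = mask_pii("Client jane.doe@example.com, account 12345678, holds 500 shares")
```

Masking before prompt construction means the raw values never reach the model, so they cannot reappear in its output.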

The trade-off is real: more constraint means more guardrails to maintain, more tokens in the system prompt, and slightly higher latency.


The reference repo’s compliance patterns are engineering patterns, not legal compliance. They significantly reduce risk but do not substitute for review by qualified legal counsel. Every firm’s regulatory environment is different. The patterns are a starting point.

Check your understanding

A developer wants to skip the system-prompt compliance constraints to get faster, richer responses. What is the key risk?

RAG for Financial Documents: Grounding Answers in Your Corpus

Retrieval-Augmented Generation (RAG) is the pattern that lets Claude answer questions grounded in your firm’s proprietary documents (10-K filings, earnings call transcripts, internal research, compliance manuals) without those documents being used to train the model and without the corpus as a whole leaving your infrastructure; only the passages retrieved for a given question are sent to the model. ^{[anthropics/financial-services — GitHub]}

The pipeline has six stages:

flowchart LR
  Q[User query] --> E[Embed query]
  E --> V[Vector search corpus]
  V --> R[Rerank passages]
  R --> A[Augment prompt]
  A --> G[Claude generates]
  G --> ANS[Cited answer]
  style G fill:#1d4ed8,color:#fff
  style ANS fill:#166534,color:#fff

The RAG pipeline for financial document Q&A

The key insight is that Claude never needs to be fine-tuned on your documents. The documents stay in your vector store, and Claude sees only the top-k most relevant passages for the specific question being asked. This is crucial for firms handling material non-public information (MNPI): the model is never trained on it, and at request time it reads only what your retrieval layer decides to surface.
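
The retrieve-then-augment core of the pipeline can be sketched in a few lines. This is a toy under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the corpus is three invented passages, the rerank stage is omitted, and the function names are hypothetical rather than taken from the repository.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict, k: int = 2) -> list:
    """Return the k (passage_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        corpus.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, passages: list) -> str:
    """Augment the prompt with only the retrieved passages, each tagged with its ID."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return (
        "Answer using only the passages below and cite passage IDs in brackets. "
        "If the passages do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Invented toy corpus; passage IDs are illustrative.
corpus = {
    "10k-p12": "Net revenue growth was driven by fixed income trading.",
    "10k-p47": "Share repurchases continued during fourth quarter.",
    "call-p3": "Management expects headcount to remain flat next year.",
}
query = "What drove net revenue growth?"
top_passages = retrieve(query, corpus, k=2)
prompt = build_prompt(query, top_passages)
```

Only `prompt` is sent to the model. The third passage never leaves the retrieval layer for this question, which is the property the paragraph above describes.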

A query moves through embedding, vector search, reranking, prompt augmentation, and grounded generation, and every claim in the final answer maps back to a specific passage.
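
The claim-to-passage mapping can be checked mechanically after generation. The sketch below assumes answers cite passages as bracketed IDs such as `[10k-p12]`. That citation format, the `check_citations` helper, and the sample answer are all hypothetical. It verifies only that citations exist and refer to retrieved passages, not that a passage actually supports the sentence citing it.

```python
import re

CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_citations(answer: str, retrieved_ids: set) -> dict:
    """Flag cited IDs that were never retrieved and sentences with no citation."""
    cited = set(CITATION.findall(answer))
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()
    ]
    uncited = [s for s in sentences if not CITATION.search(s)]
    return {
        "unknown_ids": sorted(cited - retrieved_ids),
        "uncited_sentences": uncited,
    }


# Invented sample answer for illustration.
report = check_citations(
    "Net revenue growth was driven by fixed income trading [10k-p12]. "
    "Headcount will fall [call-p9]. Margins improved.",
    retrieved_ids={"10k-p12", "10k-p47"},
)
```

An answer with any unknown ID or uncited sentence could be blocked or sent for human review before it reaches the user.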

Why domain-specific embeddings matter for finance

General-purpose embedding models (trained on web text) encode words like “carry trade,” “basis risk,” “duration,” and “convexity” imprecisely — these terms appear rarely in general web text and often mean something different in a financial context. Domain-specific embedding models trained on financial corpora (Voyage’s voyage-finance-2 is one example) produce vectors where financial jargon clusters correctly. The practical result is that vector similarity search returns more relevant passages, which means Claude has better grounding material and produces more accurate answers. The reference repo notes this distinction and recommends evaluating domain-specific embeddings before deploying to production.
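
One simple way to run such an evaluation is recall@k over a small set of labelled query-to-passage pairs, computed once per candidate embedding model. The harness below is a sketch: `recall_at_k` and `keyword_embed` are hypothetical, the two passages are invented, and in practice `embed` would wrap a call to each embedding model under comparison.

```python
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(embed, labelled_queries: dict, corpus: dict, k: int) -> float:
    """Fraction of queries whose labelled passage appears in the top-k results."""
    doc_vecs = {pid: embed(text) for pid, text in corpus.items()}
    hits = 0
    for query, relevant_id in labelled_queries.items():
        q = embed(query)
        top = sorted(
            doc_vecs, key=lambda pid: cosine(q, doc_vecs[pid]), reverse=True
        )[:k]
        hits += relevant_id in top
    return hits / len(labelled_queries)


# Toy embedder that only "understands" two finance terms (illustrative).
TERMS = ["duration", "basis"]


def keyword_embed(text: str) -> list:
    lowered = text.lower()
    return [1.0 if term in lowered else 0.0 for term in TERMS]


corpus = {
    "p1": "Duration measures bond price sensitivity to rates.",
    "p2": "Basis risk arises when a hedge and its exposure diverge.",
}
labelled_queries = {"What is duration?": "p1", "Explain basis risk": "p2"}
score = recall_at_k(keyword_embed, labelled_queries, corpus, k=1)
```

Running the same labelled set through a general-purpose model and a finance-specific one gives a like-for-like number to compare before committing to either.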

Check your understanding

In the RAG pattern, where does Claude store your proprietary financial documents for future answers?

When to Use This Repo — and When Not To

The reference implementation is the right starting point when:

You are building a Claude-powered feature for a regulated financial institution.
Your use case involves live or proprietary data (not just reasoning over public knowledge).
You need a structured audit trail for compliance purposes.
Your output will be consumed by downstream systems that need typed, validated data.

It is overkill or the wrong fit when:

You are building an internal, low-stakes productivity tool (document summariser for internal memos, meeting note taker) where hallucination consequences are low and regulatory scrutiny is absent.
Your primary use case is reasoning over well-known public information (explaining a concept, answering general economics questions) where grounding is not required.
You need a rapid prototype to test if Claude can understand your domain — use the raw API first, then layer in the reference patterns once you’re confident.

Use case	Reference repo patterns needed?	Rationale
Client portfolio Q&A chatbot	Yes	Live data + compliance guardrails + audit trail all required
Earnings call transcript summariser	Partial (RAG only)	RAG pattern helps; compliance guardrails depend on distribution
Internal research assistant	Partial	RAG useful; lighter compliance constraints may suffice
Explaining compound interest to a student	No	No live data, no compliance risk, no proprietary corpus
Automated SEC filing drafting	Yes	High-stakes output; structured schema + human review essential

Use-case fit matrix

Common misconception

Using the reference repo makes your application automatically compliant with financial regulations.

What's actually true

The repository provides engineering patterns that support compliance — they do not confer regulatory compliance. Whether your application is compliant depends on your specific regulatory context, your firm’s legal review, and how you deploy and operate the patterns. The repo is a head start, not a guarantee.

Check your understanding

What does Claude return when it decides to use a tool?