Anthropic's Financial Services Reference Repo: Building Production AI for Finance
A guided tour of Anthropic's open-source reference implementation for building Claude-powered financial applications — covering tool use, compliance-aware prompting, RAG document analysis, and why each pattern matters in a regulated industry.
Estimated time: ~12 min
Difficulty: intermediate
Sources: 5
A hedge fund analyst asks an AI assistant: “Does our portfolio breach any VaR limits today?” A general-purpose LLM hallucinates a number. A properly engineered financial AI calls your risk system, reads the actual data, and responds with a cited answer that a compliance officer can audit. The gap between those two outcomes is what this repository is designed to close.
What the Repository Actually Is
The anthropics/financial-services repository is an open-source reference implementation published by Anthropic.
[anthropics/financial-services — GitHub]
It is not a product, a managed service, or a model. It is a collection of working code patterns that show developers how to integrate Claude into financial-sector applications — the kind of applications where a wrong number, an unattributed claim, or a missing disclosure can carry regulatory and legal consequences.
Think of it as an opinionated starter kit that already handles the hard parts: connecting Claude to live data sources, keeping answers within regulatory guardrails, producing auditable tool-call chains, and structuring outputs that downstream systems can parse reliably.
A reference implementation is a fully functional, working codebase that demonstrates a set of design patterns — not meant to be run as-is in production, but to be read, understood, adapted, and integrated. It trades production hardening (authentication, scaling, full error handling) for clarity of pattern.
Who this is for
The primary audience is a software engineer or ML engineer at a bank, asset manager, hedge fund, fintech, or insurance company who has been asked to build a Claude-powered feature and needs to know the right architectural patterns from the start — not just “does Claude work?” but “how do we build this in a way our legal and compliance teams will accept?”
Check your understanding
What is the anthropics/financial-services repository?
The Core Pattern: Tool Use for Live Financial Data
The most important pattern the repository demonstrates is function calling (also called tool use). Here is why it matters.
Claude, like every LLM, has a training data cutoff. It does not know today’s stock prices, your portfolio’s current composition, your firm’s proprietary risk metrics, or last quarter’s earnings. If you ask it these questions without tool use, it either declines to answer or guesses, and guessing about financial data is dangerous.
Tool use changes this. You register a set of functions with the Claude API. When Claude determines it needs live data to answer the question honestly, it returns a structured function call instead of a prose answer. Your backend executes that function, returns the result, and Claude synthesises the final answer grounded in real data.
The anatomy of a tool call
The user asks: “What is the current yield on 10-year US Treasuries, and how does it compare to last quarter?”
Without tool use, Claude either refuses or guesses. With tool use:
- Claude returns a structured tool call: { "type": "tool_use", "name": "get_bond_yield", "input": { "instrument": "US10Y", "compare_to": "last_quarter" } }
- Your backend fetches the actual yield from your data provider.
- Claude receives the result and writes a response: “The 10-year Treasury is yielding 4.31%, up 18bps from last quarter’s 4.13%.”
Every number is sourced from your data layer — none invented by the model.
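The round trip above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: `get_bond_yield`, its placeholder figures, and the `HANDLERS` registry are hypothetical. What does follow the Anthropic Messages API is the shape of the tool definition (`name`, `description`, `input_schema`), the `tool_use` block, whose arguments arrive under the `input` key, and the `tool_result` block sent back.

```python
import json

# Tool definition in the shape the Messages API expects for its `tools` parameter:
# a name, a description, and a JSON Schema describing the input.
GET_BOND_YIELD_TOOL = {
    "name": "get_bond_yield",
    "description": "Return the current yield for a bond instrument, "
                   "optionally with a comparison period.",
    "input_schema": {
        "type": "object",
        "properties": {
            "instrument": {"type": "string"},
            "compare_to": {"type": "string"},
        },
        "required": ["instrument"],
    },
}


def get_bond_yield(instrument, compare_to=None):
    """Hypothetical backend lookup; a real one would query your data provider."""
    # Placeholder figures copied from the example in the text.
    result = {"instrument": instrument, "yield_pct": 4.31}
    if compare_to == "last_quarter":
        result["previous_yield_pct"] = 4.13
    return result


# Hypothetical registry mapping tool names to backend functions.
HANDLERS = {"get_bond_yield": get_bond_yield}


def run_tool(tool_use_block):
    """Execute one tool_use block and build the tool_result block to send back."""
    handler = HANDLERS[tool_use_block["name"]]
    output = handler(**tool_use_block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_block["id"],
        "content": json.dumps(output),
    }
```

In a real integration, `GET_BOND_YIELD_TOOL` would be passed in the `tools` list of the request, and the `tool_result` block would go back to Claude in the next user message so it can write the final answer.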
Common misconception
Claude looks up financial data automatically once you connect it to the internet.
What's actually true
Claude does not have autonomous internet access. Tool use is explicit and developer-controlled. You define exactly which functions Claude may call, with which argument shapes, and your backend executes them. Claude never makes an HTTP request itself — it returns a structured request that you fulfill. This means the data access layer remains entirely under your control, which is what compliance and security teams require.
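One way to picture "developer-controlled" is an allowlist check that runs before any tool executes, with each call recorded for audit. The sketch below is an assumption about how that might look, not the repository's code: `ALLOWED_TOOLS`, `check_call`, and `audited_call` are hypothetical names, and the audit log is a plain in-memory list.

```python
from datetime import datetime, timezone

# Hypothetical allowlist: tool name -> permitted argument names.
ALLOWED_TOOLS = {
    "get_bond_yield": {"required": {"instrument"}, "optional": {"compare_to"}},
}


def check_call(name, args):
    """Raise ValueError unless the requested call matches the allowlist."""
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        raise ValueError(f"tool not allowed: {name}")
    keys = set(args)
    missing = spec["required"] - keys
    unexpected = keys - spec["required"] - spec["optional"]
    if missing or unexpected:
        raise ValueError(
            f"bad arguments for {name}: "
            f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )


def audited_call(name, args, handler, audit_log):
    """Validate the call, execute it, and append a record of inputs and outputs."""
    check_call(name, args)
    output = handler(**args)
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": name,
        "input": args,
        "output": output,
    })
    return output
```

A request for a tool that is not on the list, or one carrying an unexpected argument, fails before any backend system is touched. A production system would write the records to durable, tamper-evident storage rather than a list.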
Check your understanding
Why does tool use matter more for financial applications than for a general-purpose assistant?
Compliance-Aware Prompting: The Regulatory Layer
Every regulated financial institution operates under a web of rules: MiFID II in Europe, FINRA and SEC regulations in the US, and internal compliance policies layered on top. The reference repo shows how to encode these constraints into the system prompt and output structure so Claude’s behaviour stays within the regulatory envelope by default. [FINRA Report on Artificial Intelligence (AI) in the Securities Industry (2020)]
The three main mechanisms are:
1. System-prompt constraints: Instructions that prevent Claude from offering personalised investment advice (a licensed activity), speculating on future prices, or omitting legally required disclosures. These are written as explicit rules rather than left to the model's default behaviour.
2. Structured output schemas: Financial reports and compliance-sensitive responses are generated as typed JSON that downstream systems can validate. If Claude produces an invalid field (say, a malformed ISIN), validation catches it before it reaches the user.
3. Retrieval grounding: Answers are required to cite source passages. Claude is instructed to refuse to make claims it cannot ground in the provided documents. This leaves far less room for hallucination.
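The first two mechanisms, plus the citation requirement from the third, can be sketched as follows. Everything here is illustrative: the `SYSTEM_PROMPT` wording, the report fields (`summary`, `isin`, `citations`), and the function names are assumptions, not taken from the repository. The ISIN check itself follows the standard format (two letters, nine alphanumerics, one check digit) and the Luhn check-digit rule.

```python
import re

# Hypothetical system-prompt rules; the wording is illustrative only.
SYSTEM_PROMPT = """You are a research assistant at a regulated financial firm.
Rules:
1. Do not give personalised investment advice.
2. Do not predict future prices.
3. Support every factual claim with a citation to a provided source passage.
4. If the sources do not support an answer, say so instead of guessing."""

ISIN_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")


def isin_is_valid(isin):
    """Check the ISIN format and its Luhn check digit."""
    if not ISIN_PATTERN.match(isin):
        return False
    # Letters expand to two digits (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def validate_report(report):
    """Return a list of problems found in a model-produced report dict."""
    errors = []
    summary = report.get("summary")
    if not isinstance(summary, str) or not summary:
        errors.append("summary must be a non-empty string")
    if not isin_is_valid(str(report.get("isin", ""))):
        errors.append("isin is malformed or fails its check digit")
    citations = report.get("citations")
    if not isinstance(citations, list) or not citations:
        errors.append("at least one citation is required")
    return errors
```

Note the limit of this kind of check: it rejects malformed identifiers and uncited answers, but a well-formed ISIN that refers to the wrong security would still pass, so a lookup against a securities master would be the natural next step.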
| Characteristic | Naïve integration | Reference repo patterns |
|---|---|---|
| Data currency | Training-time knowledge only | Live via tool calls |
| Citation | No sourcing of claims | Mandatory source passage citation |
| Investment advice guardrail | Depends on model defaults | Explicit system-prompt rule |
| Audit trail | Text log at best | Structured tool-call chain with inputs/outputs |
| PII handling | Uncontrolled | Masked before context window; schema prevents surfacing |
| Output schema | Free-form prose | Typed JSON, downstream-validated |
The trade-off is real: tighter constraints mean more rules to maintain, more tokens in the system prompt, and slightly higher latency.
The reference repo’s compliance patterns are engineering patterns, not legal compliance. They significantly reduce risk but do not substitute for review by qualified legal counsel. Every firm’s regulatory environment is different. The patterns are a starting point.
Check your understanding
A developer wants to skip the system-prompt compliance constraints to get faster, richer responses. What is the key risk?
RAG for Financial Documents: Grounding Answers in Your Corpus
Retrieval-Augmented Generation (RAG) is the pattern that lets Claude answer questions grounded in your firm’s proprietary documents (10-K filings, earnings call transcripts, internal research, compliance manuals) without those documents ever being used to train the model. The corpus itself stays in your infrastructure; only the passages retrieved for a given question are passed to Claude. [anthropics/financial-services — GitHub]
The pipeline has six stages:
User query → Embed query → Vector search corpus → Rerank passages → Augment prompt → Claude generates → Cited answer
The key insight is that Claude never needs to be fine-tuned on your documents. The documents stay in your vector store. Claude only ever sees the top-k most relevant passages for the specific question being asked, nothing more. This is crucial for firms handling material non-public information (MNPI): the model is never trained on it and reads only what your retrieval layer decides to surface.
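The retrieve-then-augment part of the pipeline can be sketched without any external services. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the corpus is an in-memory dict rather than a vector store, the reranking stage is skipped, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k passages most similar to the query as (doc_id, text) pairs."""
    query_vec = embed(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(query_vec, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Augment the question with labelled passages the model is told to cite."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite them by id. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )
```

Only the passages returned by `retrieve` end up in the prompt; every other document in the corpus is never shown to the model for that question.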
Why domain-specific embeddings matter for finance
General-purpose embedding models (trained on web text) encode words like “carry trade,” “basis risk,” “duration,” and “convexity” imprecisely — these terms appear rarely in general web text and often mean something different in a financial context. Domain-specific embedding models trained on financial corpora (Voyage’s voyage-finance-2 is one example) produce vectors where financial jargon clusters correctly. The practical result is that vector similarity search returns more relevant passages, which means Claude has better grounding material and produces more accurate answers. The reference repo notes this distinction and recommends evaluating domain-specific embeddings before deploying to production.
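Evaluating one embedding model against another usually comes down to a retrieval metric computed on a small labelled set of queries. A minimal sketch of recall@k follows; the function names are hypothetical, and each retriever is assumed to be any callable that takes a query and `k` and returns a list of document ids.

```python
def recall_at_k(retriever, labelled_queries, k=3):
    """Fraction of queries whose known-relevant doc id appears in the top k.

    `labelled_queries` is a list of (query, relevant_doc_id) pairs.
    """
    if not labelled_queries:
        raise ValueError("labelled_queries must not be empty")
    hits = 0
    for query, relevant_id in labelled_queries:
        if relevant_id in retriever(query, k):
            hits += 1
    return hits / len(labelled_queries)


def compare_retrievers(retrievers, labelled_queries, k=3):
    """Score several retrievers (name -> callable) on the same labelled set."""
    return {
        name: recall_at_k(fn, labelled_queries, k)
        for name, fn in retrievers.items()
    }
```

Running `compare_retrievers` with one retriever built on a general-purpose model and one built on a finance-specific model, over queries written by your own analysts, gives a like-for-like number before committing to either.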
Check your understanding
In the RAG pattern, where does Claude store your proprietary financial documents for future answers?
When to Use This Repo — and When Not To
The reference implementation is the right starting point when:
- You are building a Claude-powered feature for a regulated financial institution.
- Your use case involves live or proprietary data (not just reasoning over public knowledge).
- You need a structured audit trail for compliance purposes.
- Your output will be consumed by downstream systems that need typed, validated data.
It is overkill or the wrong fit when:
- You are building an internal, low-stakes productivity tool (document summariser for internal memos, meeting note taker) where hallucination consequences are low and regulatory scrutiny is absent.
- Your primary use case is reasoning over well-known public information (explaining a concept, answering general economics questions) where grounding is not required.
- You need a rapid prototype to test if Claude can understand your domain — use the raw API first, then layer in the reference patterns once you’re confident.
| Use case | Reference repo patterns needed? | Why |
|---|---|---|
| Client portfolio Q&A chatbot | Yes | Live data + compliance guardrails + audit trail all required |
| Earnings call transcript summariser | Partial (RAG only) | RAG pattern helps; compliance guardrails depend on distribution |
| Internal research assistant | Partial | RAG useful; lighter compliance constraints may suffice |
| Explaining compound interest to a student | No | No live data, no compliance risk, no proprietary corpus |
| Automated SEC filing drafting | Yes | High-stakes output; structured schema + human review essential |
Common misconception
Using the reference repo makes your application automatically compliant with financial regulations.
What's actually true
The repository provides engineering patterns that support compliance — they do not confer regulatory compliance. Whether your application is compliant depends on your specific regulatory context, your firm’s legal review, and how you deploy and operate the patterns. The repo is a head start, not a guarantee.
Check your understanding
What does Claude return when it decides to use a tool?