Running an LLM Locally vs. Using ChatGPT or Claude: A Privacy-First Guide

When you type into ChatGPT, your words travel to a server in another company's data center. For most tasks that's fine — but for private financial documents, medical records, HR files, or legal strategy, it's a dealbreaker. This lesson explains what actually changes when you run a model on your own machine, walks through four concrete scenarios, and gives you an honest picture of where local models win and where cloud still leads.

Estimated time: ~30 min
Difficulty: intro
Sources: 9

Not professional legal or financial advice

This lesson explains technical and practical concepts around data privacy and AI tools. It does not constitute legal, medical, financial, or compliance advice. Before deploying any AI system on sensitive organizational data, consult qualified legal counsel, your compliance team, and your information security officer. Regulatory requirements (HIPAA, GDPR, attorney-client privilege, SEC rules) vary significantly by jurisdiction and context.

Imagine handing a folder of your company’s unannounced deal documents to an assistant who works for a different company — one with its own terms of service, its own logging policies, and its own financial incentives. That is, more or less, what happens every time you paste sensitive text into ChatGPT or Claude. Running a model locally closes that door entirely.

What Actually Happens When You Use a Cloud LLM

When you type a message into ChatGPT, Claude, or Gemini, here is the literal sequence of events:

  1. Your text leaves your device and travels over the internet to a server owned by OpenAI, Anthropic, or Google.
  2. Their server runs your query through the model and generates a response.
  3. The response travels back to your screen.

Your original text — the part containing whatever you typed — lives on their infrastructure for some period of time. How long, and what happens to it, depends on their terms of service and your account type. Enterprise plans typically offer stronger data-isolation guarantees than free or consumer plans, but even then, your data is processed on hardware you don’t control.
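
To make the sequence above concrete, here is a minimal Python sketch of what a cloud request carries. The endpoint URL, model name, and payload field names are hypothetical placeholders, not any vendor's real API. The only point is that your full text sits inside the request body that leaves your device. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical endpoint and field names, for illustration only.
CLOUD_URL = "https://api.example-llm-vendor.invalid/v1/chat"


def build_cloud_request(user_text: str) -> urllib.request.Request:
    """Build (but do not send) the kind of HTTP request a cloud chat client might issue."""
    body = json.dumps({"model": "example-model", "input": user_text}).encode("utf-8")
    return urllib.request.Request(
        CLOUD_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_cloud_request("Q3 revenue was $4.2M, unannounced.")
print(req.host)                  # the vendor's server, not your machine
print(req.data.decode("utf-8"))  # your text, verbatim, inside the payload
```

Whatever the vendor's retention policy says, those bytes have to reach the vendor's server before the model can read them.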


Analogy: a cloud LLM is like sending mail through a copying service

Using a cloud LLM is like sending a letter through a copying service: your message reaches its destination, but the service reads it, possibly logs it, and keeps a copy for some period. For most letters, this doesn’t matter. For your lawyer’s strategy memo, it does.

When a local model runs instead, the model files (several gigabytes of weights) are downloaded once and stored on your machine. A query you send never leaves that machine: the computation happens on your CPU or GPU, the response appears on your screen, and nothing touches the internet.

Check your understanding

When you use ChatGPT on a free account, where does your input text go?

Four Use Cases Where Cloud Is a Dealbreaker

The question isn’t “is cloud AI bad?” — for most tasks it’s excellent and appropriate. The question is: does your specific use case involve data that must not leave your controlled environment?

Here are four categories where the answer is clearly yes.


1. Private financial statements and M&A documents

A corporate development team is analyzing a potential acquisition target. The documents include unpublished revenue figures, cost structures, and valuation models. This information is material non-public information (MNPI) in most jurisdictions. Sending it to any third-party server — even under a privacy policy — creates real legal risk. A local model can summarize the documents, extract key ratios, and draft memos without any of this data touching an external server. [SEC Regulation FD — Selective Disclosure and Insider Trading (17 CFR Part 243)]

2. Patient medical records

Healthcare providers in the US operate under HIPAA, which restricts where Protected Health Information (PHI) can be sent and who can process it. [HIPAA and cloud computing guidance (HHS)] Consumer cloud AI products do not come with a Business Associate Agreement (BAA) by default, so uploading patient records to a standard ChatGPT account likely violates HIPAA. A local model can help physicians summarize notes, generate structured data from clinical text, or answer queries about a patient’s history, all without the record leaving the hospital’s network.

3. Internal HR documents

Performance improvement plans, compensation bands, disciplinary records, reduction-in-force planning, and union-sensitive communications are all highly confidential. Employee privacy laws in many countries (especially EU GDPR) place strict obligations on how personal data is processed. Running a local model lets HR teams draft communications, summarize large policy documents, and answer employee-handbook queries without routing confidential employee data through a third-party service. [GDPR Article 28 — Processor obligations (gdpr-info.eu)]

4. Legal contracts and litigation strategy

Attorney-client privilege is a foundational protection in legal systems: communications between a lawyer and client for the purpose of legal advice are confidential. Transmitting privileged communications to a third-party cloud service without appropriate legal controls can constitute a waiver of that privilege. [ABA Formal Opinion 477R — Securing Communication of Protected Client Information (2017)] A local model can assist with contract redlining, clause extraction, and document comparison — tasks that make up the bulk of legal AI use today — without the documents leaving the firm’s infrastructure.

Common misconception

Enterprise plans from OpenAI or Anthropic make my data completely safe and private.

What's actually true

Enterprise plans do typically include stronger data protections — often a Data Processing Agreement (DPA), promises not to train on your data, and retention limits. But your data is still processed on their infrastructure. “We promise not to look” and “it never left your machine” are categorically different guarantees. For regulated industries, the distinction often determines compliance.

What Changes When the Model Runs on Your Machine

Running a local LLM means downloading a model file — typically between 4 GB and 40 GB depending on model size — and running software that loads it. The most beginner-friendly tool today is Ollama, which reduces setup to a two-command install. [ollama — run LLMs locally]
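
Once Ollama is running, it listens on a local HTTP port, so a query from your own scripts is a request to your own machine. The sketch below assumes Ollama's default address (localhost:11434), its /api/generate endpoint, and a model named "llama3" that has already been pulled. The helper names are illustrative and not part of Ollama.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Ollama's default local address. "llama3" is an assumed model name that
# must already be downloaded onto this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_local_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a request aimed at the Ollama server on this machine."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_local_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


def stays_on_this_machine(req: urllib.request.Request) -> bool:
    """True when the request targets a loopback address."""
    return urlparse(req.full_url).hostname in {"localhost", "127.0.0.1", "::1"}
```

Calling ask_local("Summarize this memo: ...") only works with an Ollama server running and the model pulled. stays_on_this_machine is a cheap guard you could run before sending anything sensitive.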

What you get:

  • Zero data egress. Nothing leaves your machine. Full stop.
  • Offline capability. Works on a plane, in a secure facility, or anywhere without internet.
  • No per-query cost. Once you have the model, there is no per-token fee; what remains is the cost of hardware and electricity. For high-volume use cases (processing thousands of documents), the savings compound quickly.
  • Auditability. You can log exactly what was sent, what was returned, and when — without relying on a vendor’s audit trail.
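
The auditability point can be sketched as a small append-only log that you own. This is one possible design, not a compliance-grade audit trail: the field names and file layout are assumptions, and the sketch stores SHA-256 hashes rather than full text so that the log does not become a second copy of the sensitive material. Storing the full prompt and response is equally possible if your policy calls for it.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(prompt: str, response: str, model: str) -> str:
    """Return one JSON line describing a single query (hashes, not content)."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    })


def append_audit(path: str, prompt: str, response: str, model: str) -> None:
    """Append one record to a local log file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(audit_record(prompt, response, model) + "\n")
```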

What you give up:

  • Raw reasoning power. The best local models today — Llama 3, Mistral, Qwen, Gemma — are excellent. But GPT-4o and Claude Sonnet still outperform them on complex multi-step reasoning, coding, and nuanced writing tasks. The gap is real, though it has narrowed rapidly in 2024–2025.
  • Speed (on modest hardware). Cloud providers run models on custom hardware clusters. A local model on a laptop runs slower. On a modern machine with a GPU (NVIDIA, Apple Silicon), performance is usable; on a CPU-only machine, it can be slow for large models.
  • Context length. Cloud models often handle 128K or more tokens (roughly 100,000 words) in a single request. Most local models that run comfortably on consumer hardware top out at 8K–32K tokens, so large document sets require chunking.
  • Maintenance. You are responsible for updates, version management, and (if you expose the model to others) running a server.
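
Chunking, mentioned under context length, can be as simple as splitting a document into overlapping pieces that each fit the model's window. The sketch below counts words as a rough stand-in for tokens. By the lesson's own ratio (128K tokens is roughly 100,000 words), an 8K-token window holds about 6,000 words, so the default of 1,500 words is an assumed, conservative budget that leaves room for instructions and the response. A real pipeline would count tokens with the model's own tokenizer.

```python
def chunk_words(text: str, max_words: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the document
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of processing a few words twice.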

The hardware question

The single most common question: “What machine do I need?”

  • Apple Silicon Mac (M1/M2/M3/M4): Excellent local model performance out of the box. Because of the unified memory architecture, machines with 16–24 GB of RAM run models at practical speeds. Most people starting here run a 7B–13B parameter model comfortably.
  • PC with a modern NVIDIA GPU (8+ GB VRAM): Similar capability. The GPU handles the math; more VRAM allows larger models.
  • CPU-only laptop or desktop: Possible, but slow. A 7B model on a modern Intel/AMD CPU generates roughly 3–10 tokens per second — readable but not fast. Adequate for batch processing overnight.
  • Server (for a team): A single machine with a powerful GPU can serve a small team via an Ollama or vLLM server endpoint. The model stays inside your network, and multiple people can query it.
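
A rough way to answer the hardware question is to estimate how much memory the weights alone need: parameter count times bits per weight, divided by 8. The sketch below is a rule of thumb under stated assumptions. It ignores the context cache and runtime overhead, and the 4-bit figure assumes a quantized model, which the lesson does not specify. The second helper turns the tokens-per-second range quoted above into a wait time.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in gigabytes (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response of the given length at a steady rate."""
    return output_tokens / tokens_per_second


print(weights_gb(7, 4))             # 3.5   (7B model, 4-bit weights)
print(weights_gb(7, 16))            # 14.0  (same model, 16-bit weights)
print(weights_gb(13, 4))            # 6.5   (13B model, 4-bit weights)
print(generation_seconds(500, 5))   # 100.0 (500-token answer at 5 tokens/s)
```

If the estimate is close to your total RAM or VRAM, expect to need a smaller model or more aggressive quantization, since the operating system and the context also need memory.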

Check your understanding

A law firm wants to use an LLM to extract key clauses from client contracts. The contracts contain privileged communications. Which setup is most consistent with maintaining attorney-client privilege?

Honest Tradeoffs: Where Local Wins, Where Cloud Still Leads

Neither option dominates. The right choice depends on what you are trying to do and what constraints bind you.

Dimension | Cloud LLM (GPT-4o, Claude) | Local LLM (Llama 3, Mistral)
Raw reasoning quality | Excellent (frontier models) | Good for most tasks; gaps on complex reasoning
Privacy / data control | Data leaves your machine | Data never leaves your machine
Speed | Fast (dedicated inference hardware) | Slower on consumer hardware; fast on GPU
Cost (high volume) | Per-token billing adds up quickly | Effectively free after hardware
Context window (large docs) | Up to 200K tokens on frontier models | Typically 8K–32K; improving fast
Offline / air-gapped use | Requires internet connection | Works fully offline
Setup complexity | Sign up, API key, done | Download model, install Ollama, configure
Latest knowledge | Updated regularly by vendor | Fixed at training cutoff; requires upgrade

Cloud vs Local LLM: honest comparison

The decision heuristic: Start with the data. If the documents you want to process could plausibly appear in a regulatory filing, a legal brief, or a news article that would damage your organization — and you haven’t signed a DPA with your cloud vendor and verified compliance — local is the default-safe choice. If the task involves no sensitive data and you need maximum reasoning quality, cloud wins.
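
The heuristic can be written down as a few lines of Python. This is a sketch of the rule as stated, not a compliance tool: the function and parameter names are illustrative, and the lesson does not settle every combination, so the remaining cases are returned as "review" (an assumption of this sketch) rather than guessed.

```python
def recommend(sensitive: bool, dpa_signed_and_verified: bool, needs_max_reasoning: bool) -> str:
    """Apply the start-with-the-data heuristic and return 'local', 'cloud', or 'review'."""
    if sensitive and not dpa_signed_and_verified:
        return "local"   # default-safe choice for sensitive data
    if not sensitive and needs_max_reasoning:
        return "cloud"   # nothing to protect, quality matters most
    # Cases the heuristic does not decide (for example, sensitive data under a
    # verified DPA) go to a human; this fallback is the sketch's assumption.
    return "review"


print(recommend(sensitive=True, dpa_signed_and_verified=False, needs_max_reasoning=True))   # local
print(recommend(sensitive=False, dpa_signed_and_verified=False, needs_max_reasoning=True))  # cloud
print(recommend(sensitive=True, dpa_signed_and_verified=True, needs_max_reasoning=False))   # review
```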

What about hybrid approaches?

Several organizations use a hybrid architecture: a local model handles document ingestion and sensitive extraction, and the result (a non-sensitive summary or structured output) is optionally sent to a cloud model for additional refinement. This keeps raw sensitive data local while still benefiting from cloud reasoning quality on the sanitized output. Some enterprise vendors also offer dedicated deployment models where the LLM runs on cloud infrastructure that only your organization can access — effectively a private cloud. Examples include Azure OpenAI (with VNet integration), AWS Bedrock, and on-premises enterprise offerings from Mistral and Llama-based vendors.
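
The hybrid flow can be sketched as a pipeline in which the cloud step only ever receives the local step's output. All three callables below are placeholders you would supply yourself: local_extract would wrap a local model, is_safe_to_send stands for whatever review your organization requires, and cloud_refine would wrap a cloud API. The stub check used in the demo (looking for a dollar sign) is purely illustrative and nowhere near sufficient for real sanitization.

```python
from typing import Callable


def hybrid_pipeline(
    document: str,
    local_extract: Callable[[str], str],
    is_safe_to_send: Callable[[str], bool],
    cloud_refine: Callable[[str], str],
) -> str:
    """Summarize locally, then refine in the cloud only if the summary passes the check."""
    summary = local_extract(document)   # raw document stays on this machine
    if not is_safe_to_send(summary):
        return summary                  # keep the local result, send nothing
    return cloud_refine(summary)        # only the sanitized summary is sent


# Demo with stubs standing in for real models.
seen_by_cloud = []

def fake_local(doc: str) -> str:
    return "Summary: revenue grew year over year."

def fake_check(text: str) -> bool:
    return "$" not in text

def fake_cloud(text: str) -> str:
    seen_by_cloud.append(text)
    return text.upper()

result = hybrid_pipeline("Revenue: $4.2M (unannounced)", fake_local, fake_check, fake_cloud)
print(result)         # SUMMARY: REVENUE GREW YEAR OVER YEAR.
print(seen_by_cloud)  # ['Summary: revenue grew year over year.']
```

Passing the steps in as functions keeps the boundary explicit: the raw document is an argument to local_extract and nothing else.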


Your Ownable Artifact: The Data-Sensitivity Decision Card

Before using any LLM — cloud or local — run through these four questions about the documents you are about to paste in:

  1. Who created this data? If a client, patient, or employee, it’s likely protected by contract, regulation, or both.
  2. What would happen if a journalist had this text? If the answer is “headline risk,” it stays local.
  3. What is the simplest task I actually need? Summarize, extract, compare, draft? Most of these are well within local model capability.
  4. What hardware do I have? If you have an Apple Silicon Mac or any modern NVIDIA GPU machine, local is practical today with Ollama and a 7B–13B model.

Write these four questions on a card. Keep it next to your keyboard for every AI task that involves non-public information.


Check your understanding

A doctor wants to use an LLM to summarize patient discharge notes. Which option is most compliant with HIPAA?