The five things UK businesses keep pasting into ChatGPT

Every incident review tells the same story

When we speak to UK firms after a near-miss — or a regulator letter — the details differ but the payload does not. Staff paste structured business data into a public chat box because the model is good at the task and nobody stopped them at the keyboard. You do not need a forensic engagement to guess what went in. These five categories cover the majority of real-world leaks we see in professional services, finance, property, and healthcare admin.

1. Client and customer names tied to context

Not just “John Smith,” but “John Smith, the insolvency we are advising on, deadline Thursday, opposing counsel is…” Once a name is paired with matter context, it is personal data about an identifiable individual and often confidential client information. The model may not retain the prompt in the way people fear, but the text still crossed into a US provider’s environment, and your DPA almost certainly required you to minimise that.

What to watch for: matter titles in the first line, “our client X,” CRM exports pasted as bullet lists.

2. Draft contracts, letters, and board packs

Whole clauses, NDAs, employment agreements, and investment memos go in because “rewrite this more clearly” works. The risk is not only personal data — it is unpublished commercial terms, pricing, and counterparties. For law firms this can touch legal professional privilege; for everyone else it is straight confidentiality.

What to watch for: PDF text dumps, tracked-change paragraphs, “here is the LOI we received.”

3. Spreadsheets: payroll, pipeline, and performance data

Finance and ops love pasting tables. Copying a CSV of employee names, salaries, and NI numbers into a chat box is a single action. So is copying the sales pipeline with deal values and competitor notes. Spreadsheets feel less “serious” than a database export; to a classifier they are worse, because everything is explicit and labelled.

What to watch for: tab-separated rows, “column A is revenue,” bonus calculations, redundancy lists.
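
As a rough illustration of how a pasted table can be spotted before it is sent, here is a minimal sketch. The function name looks_tabular and its thresholds are our own assumptions for this example, not a description of any particular product. It flags text where most non-empty lines share the same number of tabs or commas.

```python
from collections import Counter


def looks_tabular(text: str, min_rows: int = 3) -> bool:
    """Heuristic: True if most lines share the same delimiter count.

    Hypothetical helper for illustration; the 80% threshold is an assumption.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_rows:
        return False
    for delim in ("\t", ","):
        # How many lines contain 1 delimiter, 2 delimiters, and so on.
        counts = Counter(ln.count(delim) for ln in lines)
        counts.pop(0, None)  # lines without the delimiter are not table rows
        if not counts:
            continue
        _, rows = counts.most_common(1)[0]
        if rows >= min_rows and rows >= 0.8 * len(lines):
            return True
    return False
```

A check like this will miss tables pasted as free text and may flag comma-heavy lists, so it is best treated as a prompt for closer inspection rather than a verdict.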

4. UK identifiers the regex layer was built for

National Insurance numbers, UK postcodes in address blocks, company registration numbers, VAT IDs, sort codes and account numbers, NHS numbers in admin notes — these are machine-detectable and should never reach a public model unredacted. They appear constantly in HR tickets, KYC packs, and property completions because staff assume the model “needs the real number to be useful.”

What to watch for: onboarding forms, AML packs, “validate this address,” patient admin copied from the PAS.
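
To show why these identifiers count as machine-detectable, here is a hedged sketch of a regex pass. The patterns are deliberately simplified assumptions (the postcode pattern in particular is looser than the full format), the function names are hypothetical, and bank account numbers are left out because a bare eight-digit match is too noisy without context. The NHS check uses the standard modulus 11 check digit.

```python
import re

# Simplified patterns, assumed for illustration only.
NI_NUMBER = re.compile(
    r"\b(?!BG|GB|NK|KN|TN|NT|ZZ)[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]"
    r"\s?(?:\d{2}\s?){3}[A-D]\b"
)
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")
SORT_CODE = re.compile(r"\b\d{2}-\d{2}-\d{2}\b")
NHS_CANDIDATE = re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b")


def nhs_checksum_ok(candidate: str) -> bool:
    """Validate a 10-digit NHS number with the modulus 11 check digit."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 10:
        return False
    # Weights 10 down to 2 apply to the first nine digits.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    return check != 10 and check == digits[9]


def redact_uk_identifiers(text: str) -> str:
    """Replace matched identifiers with labelled placeholders."""
    text = NI_NUMBER.sub("[NI_NUMBER]", text)
    # Only redact 10-digit runs that pass the NHS checksum.
    text = NHS_CANDIDATE.sub(
        lambda m: "[NHS_NUMBER]" if nhs_checksum_ok(m.group()) else m.group(),
        text,
    )
    text = SORT_CODE.sub("[SORT_CODE]", text)
    text = POSTCODE.sub("[POSTCODE]", text)
    return text
```

The placeholders keep the sentence readable, so the model can still rewrite or summarise the text without ever seeing the real number.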

5. Your own client list and internal codenames

Generic NER misses the data only you care about: Project Falcon, client ref 8842, the Acme renewal we cannot lose. That list lives in your CRM, not in a public training set. One paste teaches the session everything a competitor would pay for. Custom rule packs exist precisely because this category is firm-specific.

What to watch for: “here are our top 20 accounts,” sprint names, deal codenames in subject lines.
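
A firm-specific rule can be as simple as a block list compiled into one pattern. The sketch below assumes the terms arrive as plain strings from a CRM or matter export; the function names and the [CLIENT_TERM] placeholder are hypothetical choices for this example.

```python
import re


def build_term_pattern(terms):
    """Compile a case-insensitive pattern from firm-specific terms.

    Returns None when the list is empty. Hypothetical helper for illustration.
    """
    # Longest terms first so "Project Falcon" wins over a shorter overlap.
    cleaned = sorted({t.strip() for t in terms if t.strip()}, key=len, reverse=True)
    if not cleaned:
        return None
    alternation = "|".join(re.escape(t) for t in cleaned)
    # Lookarounds stop matches inside longer words or numbers.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def redact_terms(text, pattern, placeholder="[CLIENT_TERM]"):
    """Replace every block-list hit with a placeholder."""
    if pattern is None:
        return text
    return pattern.sub(placeholder, text)
```

Exact matching like this will not catch misspellings or abbreviations, which is one reason a list needs to grow as staff flag new terms.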

What to do Monday morning

You will not fix this with another policy PDF. Short, practical steps:

  • Run a 30-minute workshop with team leads using these five headings — ask “which did we almost do last month?”
  • Put inspection on the path staff already use — not a separate portal — so redaction happens before send.
  • Seed rules from your client/matter export on day one; expand when someone clicks “add to block list.”
  • Log sessions so when the board asks, you show prevention counts — not a ban that nobody obeyed.
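
For the logging step, one possible shape for an audit entry is a count per category with no raw values stored. This is a sketch under our own assumptions: the field names and the audit_record function are hypothetical, and a real log would need retention and access rules of its own.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(session_id, redactions):
    """Build one JSON log line from a list of redaction category labels.

    Only categories and counts are recorded, never the redacted values.
    """
    counts = Counter(redactions)
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "redactions": dict(sorted(counts.items())),
        "total": sum(counts.values()),
    })
```

Summing the total field across sessions gives the prevention count the board will ask about.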

The goal is not to scare people off AI. It is to stop these five payloads crossing the line while your team keeps the productivity win. Recognise the patterns early and you will not be the firm explaining to the ICO why a whole spreadsheet lived in a chat log for thirty seconds — which was thirty seconds too long.

Book a demo

See the appliance, the redaction pipeline, and the audit log — 20 minutes, no slide deck.
