A Tool That Audits All of Your Customer Data

Most customer data is quietly broken. I built a tool that catches it before you waste a day on a wrong report.

May 26, 2026


May 25, 2026 · Build log

Calligrapher's workshop with an unfinished audit on the page

The short version

I built a new tool. It checks customer data before you try to build anything with it. If the data is broken, the tool stops you. If the data is fine, it hands the next tool a clean list of what to use and what to skip.

That's it. The rest of this post explains why it matters and how it works.

First: what's a customer dossier?

Think of it like a one-page report for a single customer. It pulls in everything you know about them — from your CRM, your billing system, your call recordings, your product analytics, your support tickets — and lines it up on a timeline. Then it tells you the truth about that account in plain English. Are they active? Are they about to churn? Who do they actually talk to?

I have a tool called Dossier Builder that makes these reports. Point it at a customer's data, and it gives you the dossier.

But Dossier Builder only works if the data going in is good. And the data is almost never good.

Why dossiers usually break

Three things that wreck dossiers: bad joins, duplicate revenue columns, identity confusion

Three problems show up in almost every customer's data. Let me explain each one in plain English.

Problem 1: Data that doesn't actually match

Most companies have multiple systems. A CRM. A billing tool. A call recorder. These systems are supposed to talk to each other. The CRM has a customer ID. The billing tool has a customer ID. Match them up, and you can see one customer across both systems.

Except the matching is often wrong. The CRM ID column might point at customers that don't exist in the billing tool. Or the matching rule might be "same email address," which breaks down at a big company, where unrelated contacts can end up sharing one address.

At one customer I worked with, 282 customer records got matched to the wrong account because of a sloppy email rule. Nobody noticed. Every report built on top of that data was wrong.
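
A quick way to spot a sloppy matching rule is to count how many different accounts each join key points at. The snippet below is a minimal sketch of that idea, not the auditor's actual code. The record layout and the field names `email` and `account_id` are assumptions made up for the example.

```python
from collections import defaultdict


def ambiguous_keys(records, key="email", target="account_id"):
    """Return join keys that point at more than one distinct target.

    `records` is a list of dicts. The field names are hypothetical
    placeholders for whatever your CRM export actually uses.
    """
    targets_by_key = defaultdict(set)
    for row in records:
        value = (row.get(key) or "").strip().lower()
        if value:  # blank keys cannot be matched, so skip them
            targets_by_key[value].add(row[target])
    # Keep only the keys that map to two or more accounts.
    return {k: sorted(v) for k, v in targets_by_key.items() if len(v) > 1}


# Made-up sample data: one shared address used by two accounts.
contacts = [
    {"email": "info@bigco.example", "account_id": "A-1"},
    {"email": "INFO@bigco.example", "account_id": "A-2"},
    {"email": "pat@smallco.example", "account_id": "B-1"},
]
print(ambiguous_keys(contacts))
# {'info@bigco.example': ['A-1', 'A-2']}
```

Any key that comes back from this check is one where "same email" cannot decide which account a record belongs to.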

Problem 2: Too many columns saying the same thing

Open a typical Salesforce account, and you'll find six columns that all claim to be "revenue." Annual Revenue. MRR. Monthly Revenue Summary. Reported Revenue. Subscription Revenue. Annual Revenue Range.

They don't agree with each other. Most of them haven't been updated in 18 months. None of them are labeled as the "real" one.

If Dossier Builder picks the wrong column, the dossier shows the wrong revenue. The CEO closes the report. You lose.

Problem 3: The same company shows up three times

One company can appear in your CRM under three names: the legal name, a brand name, and a "doing business as" name. Each one looks like a separate account. Each one shows half the activity. The real picture is split across all three.

Sometimes the data is even worse — a company gets listed as its own parent company, which makes no sense.
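
A self-parent is the shortest possible loop in an account hierarchy, and the same check that finds it also finds longer loops. The function below is a rough sketch under the assumption that the hierarchy is a simple mapping from account ID to parent ID. The name `find_parent_loops` and the sample accounts are invented for illustration.

```python
def find_parent_loops(parent_of):
    """Return the set of account IDs that sit on a parent loop.

    `parent_of` maps account ID -> parent account ID (or None).
    A self-parent (A -> A) is the shortest possible loop.
    """
    looped = set()
    for start in parent_of:
        seen = []
        node = start
        # Walk up the hierarchy until we run out of parents or revisit a node.
        while node is not None and node not in seen:
            seen.append(node)
            node = parent_of.get(node)
        if node is not None:
            # We walked back onto a node we had already visited:
            # everything from that node onward is part of the loop.
            looped.update(seen[seen.index(node):])
    return looped


# Made-up hierarchy: "acme" is listed as its own parent.
hierarchy = {
    "acme": "acme",
    "acme-eu": "acme-holdings",
    "acme-holdings": None,
}
print(sorted(find_parent_loops(hierarchy)))
# ['acme']
```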

The fix: check the data first

A checker stands at the gate, refusing to let the builder pass until the data is clean

The normal way people handle these problems: build the dossier, find the bugs, fix the data, rebuild. That wastes a whole day.

My new tool reverses the order. It runs first. It looks for all three problems. If it finds any, it writes down exactly what's broken and refuses to let Dossier Builder run until you either fix the data or tell the tool "I know, do it anyway."

I call it the Dossier Data Auditor.

The auditor and the builder talk to each other through a single file. The auditor writes the file. The builder reads it. No human has to copy anything by hand.
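
To make the handoff concrete, here is a minimal sketch of what a single-file gate could look like. The file layout, the field names, and the `override` flag are all assumptions for illustration, and the sketch blocks only on a hard fail, which is a simplification. The real file format may differ.

```python
import json
import tempfile
from pathlib import Path


def write_audit(path, findings, use_columns, skip_columns):
    """Auditor side: record what is broken and what is safe to use."""
    blocked = any(f["severity"] == "hard_fail" for f in findings)
    manifest = {
        "status": "blocked" if blocked else "clear",
        "findings": findings,
        "use_columns": use_columns,
        "skip_columns": skip_columns,
    }
    Path(path).write_text(json.dumps(manifest, indent=2))
    return manifest


def load_for_build(path, override=False):
    """Builder side: refuse to run on a blocked audit unless overridden."""
    manifest = json.loads(Path(path).read_text())
    if manifest["status"] == "blocked" and not override:
        raise RuntimeError("Audit is blocked; fix the data or pass override=True")
    return manifest


# Hypothetical run: one hard failure on the CRM-to-billing join.
audit_path = Path(tempfile.mkdtemp()) / "audit.json"
write_audit(
    audit_path,
    findings=[{"check": "crm_to_billing_join", "severity": "hard_fail"}],
    use_columns=["MRR"],
    skip_columns=["Monthly Revenue Summary"],
)

try:
    load_for_build(audit_path)
except RuntimeError as err:
    print(err)

# The "I know, do it anyway" path.
manifest = load_for_build(audit_path, override=True)
print(manifest["use_columns"])
```

The point of the design is that the builder never decides for itself whether the data is good. It only reads what the auditor wrote.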

What it caught when I tested it

Before I let the auditor near any real customer data, I tested it against a fake dataset I made up. I deliberately put broken stuff in the fake data. Here's what it caught:

60% of the matches were wrong. A column that was supposed to connect two tables only worked 40% of the time. The auditor labeled this as a hard failure. Without the audit, Dossier Builder would have built every dossier on top of those broken matches.
One company was listed as its own parent. The auditor caught the loop and flagged that one row for repair.
A do-not-use revenue column was skipped. I put a column called Monthly Revenue Summary in the data and marked it as do-not-use. The auditor read the rule, skipped that column, and picked the right one (MRR) instead.

Each catch is the kind of thing that would have broken a real dossier.
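
The revenue catch is the easiest one to picture in code. Below is a small sketch of a column picker that honors a do-not-use list and checks that the chosen column actually has data. The function name, the preference order, and the `min_fill` threshold are assumptions, not the auditor's real settings.

```python
def pick_revenue_column(rows, preferred, do_not_use, min_fill=0.5):
    """Pick the first preferred column that is allowed and populated.

    rows: list of dicts (one per account)
    preferred: candidate column names, best first
    do_not_use: column names that must never be picked
    min_fill: minimum share of non-empty values (an assumed threshold)
    """
    for column in preferred:
        if column in do_not_use:
            continue  # the rule wins even if the column looks well populated
        filled = sum(1 for r in rows if r.get(column) not in (None, ""))
        if rows and filled / len(rows) >= min_fill:
            return column
    return None  # nothing usable: let the caller treat this as a failure


# Made-up accounts for illustration.
accounts = [
    {"Monthly Revenue Summary": 1200, "MRR": 1000},
    {"Monthly Revenue Summary": 900, "MRR": 950},
    {"Monthly Revenue Summary": 400, "MRR": None},
]
picked = pick_revenue_column(
    accounts,
    preferred=["Monthly Revenue Summary", "MRR", "Annual Revenue"],
    do_not_use={"Monthly Revenue Summary"},
)
print(picked)
# MRR
```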

The rule that makes it trustworthy

Math goes on the bottom. AI reasoning sits on top of the numbers

The auditor does math first, then asks the AI to make a call. Never the other way around.

Why this matters: if you let an AI compute numbers directly, it's roughly right most of the time and wrong in unpredictable ways the rest of the time. You can't tell which is which.

So the auditor does it like this. Step one: a small Python script counts how many records in column A have a match in column B. It writes the answer to a file. The answer is a real number. You can check it by hand.
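
Step one can be as small as the sketch below. It is an illustration of the idea, not the script the auditor ships with. The ID lists, the output file name, and the result field names are assumptions.

```python
import json
import tempfile
from pathlib import Path


def match_rate(left_ids, right_ids):
    """Share of non-empty left-side IDs that exist on the right side."""
    left = [i for i in left_ids if i]
    right = set(right_ids)
    matched = sum(1 for i in left if i in right)
    return {
        "left_total": len(left),
        "matched": matched,
        "match_rate": round(matched / len(left), 4) if left else 0.0,
    }


# Made-up IDs: the CRM points at five billing IDs, only two exist.
crm_billing_ids = ["b1", "b2", "b9", "b8", "b7"]
billing_ids = ["b1", "b2", "b3"]

result = match_rate(crm_billing_ids, billing_ids)

# Write the number to disk so a person (or a later step) can check it.
out_path = Path(tempfile.mkdtemp()) / "join_crm_billing.json"
out_path.write_text(json.dumps(result, indent=2))
print(result)
# {'left_total': 5, 'matched': 2, 'match_rate': 0.4}
```

Nothing in this step involves a model, so the same input always gives the same number.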

Step two: an AI reads that file. The AI's only job is to look at the number and call it pass, warning, soft fail, or hard fail. The AI is allowed to have an opinion. It is not allowed to invent the number.

This is the load-bearing rule. The math is on disk. The AI is making a judgment call. You can audit either one.
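
One way to enforce that rule in code is to accept the AI's reply only if the label is one of the four allowed values and the number it quotes is the one already on disk. The sketch below assumes the model replies in JSON with `verdict`, `match_rate`, and `reason` fields. That reply format is an assumption for the example, and no model is called here.

```python
import json

ALLOWED_VERDICTS = {"pass", "warning", "soft_fail", "hard_fail"}


def accept_verdict(metric, raw_reply):
    """Accept a model's verdict only if it leaves the number alone.

    metric: dict computed by the deterministic step, e.g. {"match_rate": 0.4}
    raw_reply: JSON text returned by the model (the format is an assumption)
    """
    reply = json.loads(raw_reply)
    if reply.get("verdict") not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {reply.get('verdict')!r}")
    if reply.get("match_rate") != metric["match_rate"]:
        raise ValueError("reply quotes a number that is not the one on disk")
    # The stored record keeps the computed number, never the model's copy.
    return {"verdict": reply["verdict"], "reason": reply.get("reason", ""), **metric}


metric = {"match_rate": 0.4}
good_reply = '{"verdict": "hard_fail", "match_rate": 0.4, "reason": "most joins miss"}'
accepted = accept_verdict(metric, good_reply)
print(accepted["verdict"])
# hard_fail
```

A reply that rounds the number, changes it, or invents a fifth label gets rejected instead of silently stored.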

The order matters too

The auditor checks things in a strict sequence. Each step has to pass before the next one starts.

1. Domains first. Figure out the real website for each company. Is it the brand site or the corporate site? Both? Neither? Some companies own multiple domains. The auditor uses a stack of tools to ground-truth this.
2. Joins second. Check the matches between systems. Throw out matches that don't survive a real test.
3. Revenue third. Pick the canonical revenue column. Skip the do-not-use ones. Make sure the picked column actually has data in it.
4. Sanity last. Look for combinations that shouldn't happen: customers with positive revenue but zero calls in six months, columns nobody has updated in years, self-referencing parent companies.

You can't flip this order. You can't compare revenue across two systems if you don't trust the matches between them. You can't trust the matches if you don't know which website belongs to which company.
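
The strict ordering above can be sketched as a loop that stops at the first failure and marks everything after it as not run. The stage names mirror the four steps. The check functions and the 0.9 join threshold are placeholders made up for the example.

```python
STAGES = ["domains", "joins", "revenue", "sanity"]


def run_audit(checks, data):
    """Run checks in the fixed order and stop at the first failure.

    checks: dict mapping stage name -> function(data) returning True/False
    """
    results = {}
    for stage in STAGES:
        passed = checks[stage](data)
        results[stage] = "pass" if passed else "fail"
        if not passed:
            # Later stages depend on this one, so mark them as not run.
            for later in STAGES[STAGES.index(stage) + 1:]:
                results[later] = "not_run"
            break
    return results


# Placeholder checks: only the join check looks at the data here.
checks = {
    "domains": lambda d: True,
    "joins": lambda d: d["match_rate"] >= 0.9,  # assumed threshold
    "revenue": lambda d: True,
    "sanity": lambda d: True,
}
print(run_audit(checks, {"match_rate": 0.4}))
# {'domains': 'pass', 'joins': 'fail', 'revenue': 'not_run', 'sanity': 'not_run'}
```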

What it costs to run

The math part is free. No AI tokens, no API calls. Just Python running on your laptop.

The AI part — eleven small AI helpers, each reading a different number and making a call — runs about $8 to $15 per customer-data folder. A real dossier build costs many times that. So the audit pays for itself the first time it catches anything.

— Jordan

Written with Claude Opus 4.7

Below is the geeky version. Copy it into Claude Code and rebuild the whole thing yourself.

Or don't. Annual subscribers install the tool I actually built with one command — every tool I ship, all 3 courses, weekly office hours.


On the Edge by Blueprint