The Phase That Wasn't There
Five working pipeline phases. A conspicuous hole at the front. Here's what I found when I went looking.
March 24, 2026 · Build log
Every vertical I've built this year started the same dumb way. Fire departments. Chiropractic software buyers. 3PL logistics. Each time I'd sit down and hunt for data sources by hand. Score them in my head. Pick an anchor. Hope I picked right.
Three builds. Three different patterns. Zero shared code.
My pipeline had six phases on paper. Five of them worked. The sixth — the one at the very front, source discovery — was an empty file and some comments in a doc. The most important decision in the entire system was getting made on vibes.
I didn't want to just build it. I wanted to see the hole from the outside first.
So I asked Claude to explain the whole pipeline back to me in plain English. The six-phase arc. The dual-file schema — canonical for salespeople, enriched for everything else. The sub-agents. Then the question that mattered: what's not built?
Phase 0. No scorer. No registry. No build planner. Nothing.
Here's what bugged me. I had done source discovery. Three times. It just lived in my head and in three different codebases that didn't talk to each other. The fire department work ran through ten phases and about eighty-nine scripts, all anchored to a national incident registry. The chiropractic build used Google Maps plus a federal provider registry with address-based fuzzy matching. The logistics work needed multi-attempt domain resolution — fuzzy first, then Serper, then Exa, then OpenWebNinja — because single-pass resolution shits the bed on messy company names.
Same shape every time. A scored anchor source, a universal entry point, a plan. I just kept rebuilding it from scratch like an idiot.
I sent three agents in parallel to read the three old builds and tell me what they had in common. Patterns came back fast. Every successful build had the same implicit structure — I'd just never externalized it.
So I built Phase 0 that afternoon. A schema module for the constants — source categories, access methods, pipeline phases. A source-discoverer sub-agent. A scorer with a four-criteria rubric: coverage, website presence, data quality, cost. Plus a registry, a field mapper, a build planner, a research adapter, and an init.
Four verification tests. All four passed on the first real run.
The national fire incident registry scored 70 and ranked first — which is exactly right, because that's the anchor I'd actually used when I built that vertical by hand. The scorer matched my intuition on the one build I knew cold. That's the only way you trust a scorer the first time you turn it on: you make it explain a build you already understand, and you check whether it agrees with you.
Here's the part that actually mattered, and I almost missed it.
My first instinct was to treat Phase 0 as a one-time thing — the step you run when you start a new vertical. That's wrong. This needs to run for everything. Every re-run. Every refresh. Every time the data landscape shifts under a vertical I already built. Data sources disappear. New ones show up. Costs change. Coverage changes. If Phase 0 is only the kickoff, you've frozen the pipeline to whatever was true the day you launched it.
The source registry isn't a kickoff step. It's the universal front door. Updated all four architecture docs to say so before I closed the laptop.
Next up: Phase 1 ingestion. Pull the winning anchor into the canonical companies file and kick off the enrichment waterfall. But that's tomorrow.
The thing I keep re-learning: the build that matters is usually the one that turns a pattern you keep repeating into a thing that repeats itself.
— Written by Claude Opus 4.7, Approved by Jordan
Below is the geeky version. Copy it into Claude Code and rebuild the whole thing yourself.
Or don't. Annual subscribers install the tool I actually built with one command — every tool I ship, all 3 courses, weekly office hours.
→ Go annual — $2,499/yr · Start at $50/mo (most readers start here)


