On the Edge by Blueprint

On the Edge by Blueprint

Daily Builds

The $148 Scrape That Found 34,677 Customers

And the $0 one that found 80 Jira tenants. Why they can't swap.

Jordan Crawford's avatar
Jordan Crawford
Apr 22, 2026
∙ Paid

April 22, 2026 · Build log

Last year I ran the same question against two SaaS vendors: who uses this thing?

Jira: 80+ confirmed customer tenants. Three hours. $0.

Jane App, a patient-portal EHR: 34,677 practice domains across sixteen platforms. One afternoon. $148 all in.

Different cost. Different tools. Different technique entirely. And if I'd tried Jira's technique on Jane App, I'd have gotten zero. Vice versa.

That's the most expensive lesson in vendor-customer discovery, so learn it off my dime: silver bullets are vendor-architecture-dependent. Before you write a scraper, you have to diagnose what kind of vendor you're hunting.

Family A — the embed trick. EHR vendors ship JavaScript and iframes that practices paste into their own websites. The embed is the product. Jane App's value to a practice is the booking widget that lives on yourclinic.com/book — the practice doesn't move to janeapp.com, they integrate Jane's code into their site.

That means every Jane App customer's website contains Jane's code in its HTML source. Every one. PublicWWW indexes 516 million pages of HTML. Query: "cdn.janeapp.com". Get back every site that loads Jane's script. Thirty queries across sixteen EHR platforms cost $148 and returned 34,677 domains at 80%+ precision.

Family B — the tenant trick. Jira Cloud doesn't embed in customer sites. Every Jira customer gets their own subdomain: acme.atlassian.net, beta-corp.atlassian.net, fifty thousand of them. PublicWWW finds zero of those tenants. It finds Atlassian's own marketing site and some blog posts that link to acme.atlassian.net. Not customers.

What works: Common Crawl has crawled the web 121 times. Each crawl writes a CDX index listing every URL it hit. Query all 121 indices for **.atlassian.net*. You get a noisy list of candidate subdomains.

Then the dagger. Atlassian returns a specific HTTP header on 404s for tenants that don't exist: atl-missing-tcs: true. For tenants that DO exist, that header is absent. Run a HEAD probe against each candidate. Binary 1/0 confirmation. No false positives, no false negatives. Three hours, 80+ confirmed tenants. $0.

Family C — the logo wall. Most enterprise B2B vendors display customer logos on a /customers or /case-studies page. Near-100% precision because the vendor's legal team approved every logo. Crawl the page once, extract names from alt-tags, resolve to domains. Marketing teams can't help themselves.

Here's the asymmetry. PublicWWW query for "atlassian.net" returns Atlassian's own marketing pages. Not customers. Customers don't embed Atlassian's HTML — they ARE Atlassian, at a subdomain Atlassian itself hosts. Family B HEAD probes against Jane App sweep for **.janeapp.com* subdomains Jane doesn't use. Generic 404 with no signature header. Zero. Family C scrape on a vendor without a customer page — zero.

The scraper isn't broken. The scraper is matched to the wrong architecture. The fix isn't more queries, better regex, or a more expensive tool. The fix is diagnosis first.

That's the whole lesson. Three families. Ask which one applies before you write anything. Yes to "does it embed in customer HTML?" → Family A. Yes to "does it host customers at subdomains with a tell?" → Family B. Yes to "does it publish a customer page?" → Family C. No to all three → you need a swarm, not a silver bullet.

Diagnose first. Scrape second.

— Written by Claude Opus 4.7, Approved by Jordan


Below is the geeky version. Copy it into Claude Code and build the whole thing yourself.

Or don't. Annual subscribers don't build it. They type the vendor name into Claude Code and the tool shows up.

→ Go annual — $2,499/yr · Start at $50/mo (most readers start here)


This post is for paid subscribers

Already a paid subscriber? Sign in
© 2026 Jordan Crawford · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture