Automated 3 Deep Research Agents (Claude, ChatGPT, Gemini) — $0 API costs

One question fans across three engines. A wave of Claude agents fact-checks every citation. The disagreements that survive — claims only one engine surfaced — are the actually useful intel.

May 27, 2026

Figure: Three deep research engines fanning one question, with the disagreements that survive a hostile fact-check as the asymmetric edge

Ask ChatGPT, Claude, and Gemini the same research question and you will get three different answers.

Each engine indexes a different slice of the web and reasons through it with a different model. Each one catches things the others miss.

The places where all three agree just confirm what Wikipedia would tell you anyway. The places where exactly one engine found a claim — and that claim survives a hostile fact-check — are where the actually useful intel lives.

That set-difference between the three engines is the asymmetric edge.

Capturing it in practice has been almost impossible until now. Reading three deep-research reports and verifying every citation across them takes half a day for any human.

Triple Deep Research is the tool I built to do that work for you.

It takes one question, fans it across ChatGPT, Claude, and Gemini at the same time, and runs a wave of Claude agents that fact-checks every cited link in all three reports. What comes back is one synthesized report where every claim cites a stable identifier in the raw text — and where the disagreements that survived verification lead the report as the asymmetric edge.

What it does

You feed Triple one question.

Something like "What state-level rules passed in 2025 affect Medicare Advantage reimbursement?" Or "Which B2B SaaS companies shipped AI agent products in the last year and then publicly walked them back?"

The tool fans that question to all three engines at the same time. ChatGPT runs its deep research on its own slice of the web. Claude runs its deep research on its own index. Gemini runs its deep research on the Google corpus.

Three engines. Three models. Three separate cross-sections of the open web — all working the same question in parallel.

Once the reports come back, a wave of Claude agents takes them apart.

The first agent breaks every claim into its smallest factual unit and labels each one by how many engines surfaced it. Three more agents work in parallel — one per platform — and each opens every cited URL to decide whether the page actually supports the claim.

A fifth agent catches the date errors the engines tend to miss. A judge then ranks every surviving claim by how much it actually shifts your thinking.

A final writer synthesizes everything into one report where every claim cites a stable identifier you can resolve back to the raw page text.

A run takes 15 to 55 minutes, gated by whichever engine is slowest.

The deep research itself costs nothing extra. The work rides on the ChatGPT Plus, Claude Pro, and Gemini Advanced subscriptions you already pay for. The only marginal spend is about $3 in Claude agent tokens to run the fact-checker wave.

Why three engines, not one

Figure: Three deep research engines overlapping in a Venn diagram; the slice that only one engine produced (the set-difference) is the asymmetric edge

The three engines do not see the same web.

ChatGPT runs on the Bing index. That gives it strong coverage of news and academic sources, and pulls back the well-edited material journalists and researchers tend to cite.

Claude runs on its own index. It is strongest on long-form blogs and has been sharpening fast on primary documents — court filings, government reports, things like that.

Gemini sits on the full Google index, which has the largest surface area of the three by a wide margin. It routinely reaches obscure corners of the web — niche forums, regional publications, vendor PDFs — that no other engine touches.

Running a question through one engine gives you one cross-section of the web's knowledge on that question. Running the same question through all three gives you the set-difference — the gold sitting between the cross-sections.

The all-three-agree claims tend to be the consensus you could have found in any general search. They are a useful baseline, but they are not insight.

The one-of-three claims — the things that only one engine surfaced — are where the asymmetric value lives. If, that is, you can verify them against primary sources and rule out the ones the engine just hallucinated.

The asymmetric value is the set-difference. Everything else is just an answer.
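
Here is a minimal sketch of that set-difference, assuming each report has already been reduced to a list of claim strings. The function names and the sample claims are illustrative assumptions, not the tool's actual code. Exact string matching stands in for the semantic matching a real run would need.

```python
from collections import defaultdict

def label_claims(claims_by_engine):
    """Map each normalized claim to the set of engines that surfaced it."""
    seen = defaultdict(set)
    for engine, claims in claims_by_engine.items():
        for claim in claims:
            # Assumption: lowercase exact match is a stand-in for real claim matching.
            seen[claim.strip().lower()].add(engine)
    return dict(seen)

def asymmetric_edge(claims_by_engine):
    """Return the claims surfaced by exactly one engine, keyed to that engine."""
    labeled = label_claims(claims_by_engine)
    return {
        claim: next(iter(engines))
        for claim, engines in labeled.items()
        if len(engines) == 1
    }

# Made-up sample data, for illustration only.
reports = {
    "chatgpt": ["Rule A passed in 2025", "Rule B was repealed"],
    "claude": ["Rule A passed in 2025", "Rule C is pending"],
    "gemini": ["Rule A passed in 2025", "Rule B was repealed"],
}
edge = asymmetric_edge(reports)
print(edge)
```

In this toy input, only the claim Claude alone surfaced comes back. It is a candidate for the edge, not yet a verified one.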

The proof point: 1 link → 60 citations

Two paths exist for reading a deep-research report off these platforms. One returns almost nothing. The other returns everything.

The naive path scrapes the page's visible HTML — the DOM — and extracts whatever text and links you can see. The problem: what you see is almost never what is actually there.

On Claude, a DOM scraper gets back 1 link out of the 427 sources the page itself shows next to the report. The report lives inside a Document artifact tile the DOM snapshot cannot reach.

On ChatGPT, a DOM scraper gets back 0 characters of the report. ChatGPT renders it inside a sandbox iframe the outer page is not allowed to read.

On Gemini, the citations come back as redirect URLs the DOM never resolves. You end up with a list of meaningless redirect handles instead of real source pages.

The path that actually works asks the platform for its own data, the same way the platform's own page asks for it.

Each engine ships an undocumented data feed that the page itself calls to load the report. From inside the page — meaning from inside a real browser tab the user is already signed into — you can call that same feed and the platform hands you the data back as plain text.

On Claude, that call pulls the full 31,213-character report and all 60 citations in 157 milliseconds.

On ChatGPT, the report comes back as plain text from the same conversation feed the website's own React app reads.

On Gemini, the redirect-only citation URLs resolve to their real destinations. You can fetch them from inside the real browser tab, and Cloudflare passes you through — because you actually are a real browser tab.

Same conversation. Same question. Different read path. One returns roughly 0% of the information. The other returns 100%.

Figure: 1 link of 427 sources via DOM scraper, versus the full 31,213-character report and all 60 citations via the page's own data feed in 157 milliseconds

Every library you'd reach for is broken

No off-the-shelf scraper handles all three platforms cleanly. None of these platforms want to be driven by anything that is not a real human in a real browser.

They have layers of defenses. Each layer was specifically designed to break the kind of script you would write to automate them.

ChatGPT runs a real-time signal-validation system inside the page. It watches for things a script would never do — patterns of clicks, keystrokes, and timings that a human produces but a bot does not. It silently flags any session that fails the test.

On top of that, every write to ChatGPT's interface hits a math puzzle the page itself computes before sending. A normal request cannot pass without running the page's own code.

On top of that, Cloudflare runs its bot challenge — the same test that decides whether to throw up a "verify you are human" page. The challenge token regenerates every few seconds.

Each of those three layers will reject a naive script the moment it tries.

Claude has a different problem.

The first time you drive a deep-research session in Claude, the page pops up a modal asking which "connectors" you want to allow — file system, GitHub, Drive, and the rest. The page refuses to run the research until the modal is dismissed.

A second modal then asks you to clarify the scope of your research before submitting. A scraper that does not handle those two modals dies at the door — and most do not.

Gemini wraps every response inside a serialized envelope format that Google uses internally. The envelope arrives in the network log as a chunked stream.

The chunks expire seconds after the network event fires. If you are not reading the response from inside the same page that received it, you do not get to read it at all.

The open-source libraries around all three are broken, as you would expect from that list.

The most popular Gemini scraper on GitHub has six open critical bugs nobody has fixed in months. ChatGPT exporters break every time Cloudflare rotates its challenge — which is every few seconds.

Claude scrapers cannot make it past the connector modals because they were built for the old chat surface, not the deep-research workflow that ships those modals.

You can spend a week patching one of those libraries every time a platform shifts its UI. Or you can stop fighting the platforms and let the browser do the writing for you.

Figure: Two ways to read a deep-research report: the DOM scraper that breaks on every platform versus reading the page's own data feed from inside the page

The insight: browser owns writes, JSON owns reads

The trick is to stop pretending you are not a browser.

Use the user's real signed-in Chrome — the actual browser with the actual cookies and the actual login session — and drive that. Instead of building a bot that tries to look like a browser, take a browser that is already signed in and tell it what to do.

When you drive the real Chrome, every defense the platform built falls away on its own.

The math puzzle gets computed by the platform's own page code, because the page is the thing computing it. The bot-challenge tokens come along for free, because they are generated inside the page session you are driving.

The IP address Cloudflare sees is the user's real IP. The handshake is the user's real handshake. Cloudflare looks at all of it and sees a real signed-in browser doing what real signed-in browsers do. It lets the request through.

That is the write path. Every input goes through the real browser.

For the read path, you ask the same page for its data, from inside the page. Each platform has its own undocumented data feed the website itself calls to load the report.

From a real tab with real cookies, that feed returns the report as plain text.

No DOM scraping. No screenshot tricks. No envelope-unwrapping from outside the page.

Browser owns writes. JSON owns reads.

Type into the real browser, then ask the real browser for the data back.

One more piece matters: the extraction step is decoupled from the browser run.

The browser saves the raw page and the raw response to disk before the extractor touches anything. The extractor then reads from disk, not from the live browser.

If a platform ships a UI change next week and breaks the extractor, you re-run only the extractor against the raw page you already saved. You do not re-pay for another 30-minute deep-research run — which would cost both wall time and quota against your weekly limits across all three platforms.
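
A minimal sketch of that decoupling, assuming the raw response is JSON with `report` and `citations` keys. The file layout, the function names, and the payload shape are all hypothetical; each platform's real feed will look different.

```python
import json
import tempfile
from pathlib import Path

def save_raw(run_dir: Path, platform: str, raw_response: str) -> Path:
    """Browser step: write the untouched response to disk before any parsing."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{platform}.raw.json"
    path.write_text(raw_response, encoding="utf-8")
    return path

def extract_report(raw_path: Path) -> dict:
    """Extractor step: reads only from disk, never from the live browser."""
    payload = json.loads(raw_path.read_text(encoding="utf-8"))
    # Assumption: hypothetical payload keys, chosen for illustration.
    return {
        "text": payload.get("report", ""),
        "citations": payload.get("citations", []),
    }

with tempfile.TemporaryDirectory() as tmp:
    raw = json.dumps({"report": "Example report body",
                      "citations": ["https://example.com/a"]})
    saved = save_raw(Path(tmp) / "run-001", "claude", raw)
    extracted = extract_report(saved)

print(extracted["text"], len(extracted["citations"]))
```

If the extractor breaks, only `extract_report` needs to change. The saved file is the contract between the two steps.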

How the fact-checker wave defends every claim

Once the three reports are on disk, a wave of Claude agents takes them apart and verifies every claim.

The agents run in a specific order, because each stage builds on what the previous one wrote. The asymmetric-edge claims surface at the top; anything that looks like a hallucination gets filtered out.

Cross-Corroborate. One Sonnet agent reads all three reports and breaks every claim into its smallest factual unit — one fact, one subject, one source. Each unit then gets labeled by how many engines surfaced it: agreed by all three, agreed by two of three, found only in one engine, or contradicted between engines. That labeling is the substrate everything else depends on.
Link & Fact Skeptic, three at the same time. One Sonnet agent runs per platform, with all three in parallel. Each agent opens every URL cited in its platform's report, reads the source page, and assigns a verdict from one of five buckets: supports the claim, partially supports it, contradicts it, dead link, or hallucinated cite. Every link gets saved — including the dead ones — so the receipts stay visible.
Date Skeptic. One Sonnet agent works across all three reports looking for the date errors deep-research engines are notoriously bad at. It catches stale-recency claims (a "recent" thing that actually cites a 2019 source), event-misattribution (the right thing happened, but to a different company), and publication-drift (the cited page has been edited since the report was written, and the claim no longer matches what the page says).
Usefulness Judge. One Opus agent reads everything the skeptics wrote and ranks every surviving claim on four axes — does it advance the goal you asked about, does it surface information the other engines missed, can you actually do something with it, and can you sort your buyer list by it. Claims that scored high on asymmetry (only one engine found them) get a bonus weighting if they also passed the fact-check, because those are the ones that represent the actual edge.
Final Synthesizer. One Opus agent reads everything written above it — the original three reports, the corroboration labels, every skeptic's verdicts, the usefulness ranking — and writes the final report. Every claim cites back to a stable identifier like [claude-042] or [chatgpt-014] that resolves to the raw page text on disk. The verification appendix at the end lists every dead link, every hallucinated citation, every date flag, and every contradiction the skeptics raised. Receipts visible.
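
The records those stages pass around could look something like the sketch below. The five verdict buckets and the identifier format come from the description above. The class names, the field names, and the choice to treat "partially supports" as surviving are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    SUPPORTS = "supports"
    PARTIAL = "partially supports"
    CONTRADICTS = "contradicts"
    DEAD_LINK = "dead link"
    HALLUCINATED = "hallucinated cite"

# Assumption: partial support counts as surviving; a stricter run might not allow it.
SURVIVING = {Verdict.SUPPORTS, Verdict.PARTIAL}

@dataclass(frozen=True)
class CheckedClaim:
    claim_id: str   # stable identifier, e.g. "claude-042"
    text: str
    url: str
    verdict: Verdict

def make_claim_id(platform: str, index: int) -> str:
    """Build an identifier in the [platform-NNN] style."""
    return f"{platform}-{index:03d}"

def split_for_report(claims):
    """Survivors go to the report body; everything else goes to the appendix."""
    survivors = [c for c in claims if c.verdict in SURVIVING]
    appendix = [c for c in claims if c.verdict not in SURVIVING]
    return survivors, appendix

# Made-up sample claims, for illustration only.
checked = [
    CheckedClaim(make_claim_id("claude", 42), "Rule C is pending",
                 "https://example.com/c", Verdict.SUPPORTS),
    CheckedClaim(make_claim_id("chatgpt", 14), "Rule B was repealed",
                 "https://example.com/b", Verdict.DEAD_LINK),
]
survivors, appendix = split_for_report(checked)
print([c.claim_id for c in survivors], [c.claim_id for c in appendix])
```

Nothing is thrown away here. The dead link lands in the appendix list, which matches the "receipts visible" idea.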

By the time a claim makes it into the final report, it has been atomized, labeled, fact-checked against its source, date-validated, ranked for usefulness, and tagged with its provenance.
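
The date-validation step is the easiest one to picture in code. Below is a rough sketch of a stale-recency check, assuming the source's publication date is already known. The keyword list, the one-year threshold, and the function name are illustrative assumptions. The real check is done by an agent reading the page, not by a regex.

```python
import re
from datetime import date

# Assumption: a small keyword list stands in for the agent's judgment.
RECENCY_PATTERN = re.compile(r"\b(recent|recently|latest|newly|this year)\b",
                             re.IGNORECASE)

def is_stale_recency(claim_text: str, source_published: date,
                     report_date: date, max_age_days: int = 365) -> bool:
    """Flag a claim that sounds recent but cites a source older than the threshold."""
    sounds_recent = RECENCY_PATTERN.search(claim_text) is not None
    age_days = (report_date - source_published).days
    return sounds_recent and age_days > max_age_days

report_day = date(2026, 5, 27)
stale = is_stale_recency("A recent ruling changed the rate",
                         date(2019, 3, 1), report_day)
fresh = is_stale_recency("A recent ruling changed the rate",
                         date(2026, 1, 10), report_day)
print(stale, fresh)
```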

The claims that survived AND came from only one engine lead the report — that is the asymmetric edge, made automatic.
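
One way to picture that ordering is the sketch below. It assumes each of the four usefulness axes is scored from 0.0 to 1.0 and averaged. The bonus weight, the field names, and the sample scores are hypothetical; the article does not specify how the judge combines them.

```python
from dataclasses import dataclass

ASYMMETRY_BONUS = 0.5  # Assumption: hypothetical weight, not the tool's real value.

@dataclass
class ScoredClaim:
    claim_id: str
    engine_count: int        # how many engines surfaced the claim (1 to 3)
    passed_fact_check: bool
    axis_scores: tuple       # four axis scores, each assumed to be 0.0-1.0

def usefulness(claim: ScoredClaim) -> float:
    """Average the axis scores, then add the bonus for verified one-engine claims."""
    score = sum(claim.axis_scores) / len(claim.axis_scores)
    if claim.engine_count == 1 and claim.passed_fact_check:
        score += ASYMMETRY_BONUS
    return score

def rank(claims):
    """Drop claims that failed the fact-check, then sort the rest, best first."""
    survivors = [c for c in claims if c.passed_fact_check]
    return sorted(survivors, key=usefulness, reverse=True)

# Made-up sample claims, for illustration only.
claims = [
    ScoredClaim("chatgpt-014", 3, True, (0.8, 0.2, 0.6, 0.4)),
    ScoredClaim("claude-042", 1, True, (0.5, 0.9, 0.4, 0.2)),
    ScoredClaim("gemini-007", 1, False, (0.9, 0.9, 0.9, 0.9)),
]
ranked = rank(claims)
print([c.claim_id for c in ranked])
```

The unverified one-engine claim scores highest on paper and still never appears. Asymmetry only counts after the fact-check.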

Figure: Five stages of Claude agents in sequence: Cross-Corroborate, Link & Fact Skeptic x 3 in parallel, Date Skeptic, Usefulness Judge, Final Synthesizer

Fail-fast partial Triple

One platform failing does not block the run.

If Gemini times out at submit, or ChatGPT hits its weekly Deep Research quota, or Claude's connector modal takes too long to clear, that platform exits with a clear terminal status. The other two keep going.

The fact-checker wave then runs on whatever 1, 2, or 3 platforms succeeded.

The asymmetric-edge thesis still works with two of three engines. You just have less to compare against.

It even works with one of three. The fact-checker wave still runs and at least filters out the hallucinated cites.

Partial Triple is degraded, but it is still useful.
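
A minimal sketch of that fail-fast behavior using Python's `asyncio`. The three coroutines are stand-ins for the real browser runs, and the status strings and function names are assumptions made for illustration.

```python
import asyncio

async def run_platform(name, work, timeout_s):
    """Run one platform and always return a terminal status instead of raising."""
    try:
        report = await asyncio.wait_for(work(), timeout=timeout_s)
        return name, "ok", report
    except asyncio.TimeoutError:
        return name, "timeout", None
    except Exception as exc:
        return name, f"failed: {exc}", None

async def run_triple(jobs, timeout_s):
    """Run all platforms concurrently; one failure never cancels the others."""
    results = await asyncio.gather(
        *(run_platform(name, work, timeout_s) for name, work in jobs.items())
    )
    statuses = {name: status for name, status, _ in results}
    reports = {name: report for name, status, report in results if status == "ok"}
    return statuses, reports

# Stand-in jobs (hypothetical): one succeeds, one hangs, one hits a quota.
async def fake_ok():
    await asyncio.sleep(0.01)
    return "report text"

async def fake_slow():
    await asyncio.sleep(5)
    return "never returned"

async def fake_quota():
    raise RuntimeError("weekly quota reached")

statuses, reports = asyncio.run(run_triple(
    {"chatgpt": fake_quota, "claude": fake_ok, "gemini": fake_slow},
    timeout_s=0.1,
))
print(statuses, list(reports))
```

The fact-checker wave would then run on whatever is in `reports`, which here is the one platform that finished.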

When NOT to invoke

Situations where Triple is the wrong call:

Quick factual lookups. If your question is something like "What's the population of Boise?" — one engine is fine. Do not burn 45 minutes on it.
Code search. Use Grep. Triple is not your file finder, and the deep-research engines were not built to scan your repo.
Account-specific research. If you want to know what is happening inside a single account — the CRM history, the call notes, the billing trail — use the Dossier Builder skill instead. Triple is for the open web, not for the data you already own.
Anything where a wall-clock time under five minutes matters. Triple is 15 to 55 minutes by the clock. If you need the answer right now, ask one engine and move on.

Below is the geeky version. Copy it into Claude Code and rebuild the whole thing yourself.

Or don't. Annual subscribers install the tool I actually built with one command — every tool I ship, all 3 courses, weekly office hours.

→ Go annual — $2,499/yr · Start at $50/mo (most readers start here)

On the Edge by Blueprint