810 Prompts in 17 Days: How a Non-Engineer Runs Claude Code

Every quote machine-verified. Where my instincts beat Anthropic's docs — and where the docs beat me.

Jun 15, 2026

Claude Code running a dynamic workflow — 35 agents across six phases on Opus 4.8

Thirty-five agents, six phases, one screen — Claude Code running a dynamic workflow on Opus 4.8 (the last phase is "Verify & Report"). This is what "a non-engineer runs Claude Code" looks like at full tilt. Source: Anthropic, "Introducing dynamic workflows in Claude Code," May 28 2026.

The first number the machine gave me was 20,000.

I'd asked Claude to count every prompt I'd typed into it over the last two and a half weeks. The plan was a forensic audit of how I actually talk to AI — run by the AI being audited. Twenty thousand prompts in 17 days sounded heroic. It also sounded fake. Nobody types 20,000 of anything in 17 days.

So I ran the rule I run on every number Claude hands me: prove it. Show me where it came from.

The real count was 810. The transcript logs every action the machine takes — every file it reads, every script it runs — right next to my typing, and the first pass had counted all of it as me. Off by 25x, in a report about how carefully I check the machine's work.

My own rule caught my own audit. The 810 verified prompts are what this article is built on.

The short version: I pulled every prompt I typed into Claude Code over 17 days (810 of them, across 159 sessions and 16 projects) and sent agents to mine them for patterns. Every quote was machine-checked against the original transcript before it could appear in the findings. Ten habits fell out. The three biggest are below, set against what Anthropic's official guidance says. The docs agree with me more than I expected. They also convict me on four counts, and the charges are printed in full near the bottom.

What 810 prompts actually look like

I can't read code. The whole business — client market maps, contracts, published reports — runs through Claude Code anyway, by conversation.

The data is humbling in its smallness. A third of my prompts are under 60 characters. 41% contain a question. And each one triggered about 21 machine actions — I say one sentence, Claude does 21 things. The transcript reads like Slack messages to a very fast coworker: short, conversational, constant.
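For anyone who wants to reproduce those three figures on their own corpus, here is a rough Python sketch. It is not the script behind this article. It assumes "contains a question" simply means the prompt includes a question mark, and that you already have a total count of machine actions; both the function name and the sample values are hypothetical.

```python
def describe_prompts(prompts, action_count):
    """Summarize a list of prompt strings.

    ASSUMPTION: a prompt "contains a question" if it includes a "?".
    action_count is the total number of machine actions in the same sessions.
    """
    total = len(prompts)
    if total == 0:
        raise ValueError("no prompts to describe")
    short = sum(1 for p in prompts if len(p) < 60)
    questions = sum(1 for p in prompts if "?" in p)
    return {
        "share_under_60_chars": short / total,
        "share_with_question": questions / total,
        "actions_per_prompt": action_count / total,
    }


# Hypothetical sample: two short prompts, one long one, 63 logged actions.
sample = ["Still going?", "how are we doing gents?", "x" * 80]
print(describe_prompts(sample, action_count=63))
```

Voice-typed prompts often drop the question mark, so the question share from a check like this is probably an undercount.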

I voice-type almost everything, so the record is full of typos, and the typos turned out to be the least interesting finding in the corpus. The model decodes bad spelling fine. What kills it is ambiguous pointing — "fix the grease thing" said to a fresh session that has never heard of the grease thing.

A non-engineer dictating a long, run-on prompt into Claude Code's voice mode

Claude Code's voice mode: a non-engineer holds space and dictates a rambling, run-on prompt — typos, tangents and all — and the agent runs with it. Source: XDA Developers, March 2026.

How do you audit yourself with the tool being audited?

Claude Code keeps a transcript of every session in a folder on your machine. A script pulled every record where a human actually typed — dropping the machine noise that inflated the first count — and kept a record of the exact file and line each prompt came from.
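For readers who do read code, a minimal Python sketch of that extraction step might look like the block below. It is not the script used for this article. It assumes the transcripts are JSON Lines files and that a typed prompt is a record whose `type` is `"user"` and whose `message.content` is a plain string. Those field names are assumptions, so check them against your own files before trusting the count.

```python
import json
from pathlib import Path


def extract_typed_prompts(root):
    """Collect human-typed prompts with the file and line each came from.

    ASSUMPTION: transcripts are .jsonl files, one JSON record per line, and a
    typed prompt looks like {"type": "user", "message": {"content": "..."}}.
    Records whose content is not a plain string (for example tool results
    stored as lists) are treated as machine noise and skipped.
    """
    prompts = []
    for path in sorted(Path(root).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip damaged lines rather than guess
                if not isinstance(record, dict) or record.get("type") != "user":
                    continue
                message = record.get("message")
                content = message.get("content") if isinstance(message, dict) else None
                if isinstance(content, str) and content.strip():
                    prompts.append(
                        {"text": content, "file": str(path), "line": line_no}
                    )
    return prompts
```

Keeping the file and line next to every prompt is the part that matters: it is what lets a later check re-open the source and prove a number instead of asserting it.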

Then eight mining agents each took a slice of the corpus and went hunting for patterns, under one rule that made the output trustworthy: every quote in their findings had to be an exact, word-for-word slice of a real prompt, because a separate script would re-open the transcripts and check. 274 quotes survived the mining. All 274 passed verification.
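The word-for-word rule is simple enough to sketch in a few lines of Python. This is an illustration of the idea, not the verification script itself, and `verify_quotes` is a hypothetical name. The check is a plain case-sensitive substring match with no cleanup, so a quote with its typos "fixed" fails on purpose.

```python
def verify_quotes(quotes, prompts):
    """Split quotes into those found verbatim in some prompt and those not.

    The match is an exact, case-sensitive substring test with no whitespace
    or spelling normalization. Empty quotes are rejected.
    """
    verified, rejected = [], []
    for quote in quotes:
        if quote and any(quote in prompt for prompt in prompts):
            verified.append(quote)
        else:
            rejected.append(quote)
    return verified, rejected


prompts = ["print the actual damn eamils so I can read them"]
verified, rejected = verify_quotes(
    ["actual damn eamils", "actual damn emails"], prompts
)
print(verified)  # ['actual damn eamils']
print(rejected)  # ['actual damn emails']
```

The tidied-up spelling is the one that gets rejected, which is the point: a quote that has been improved is no longer a quote.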

A second team of agents did the outside research — Anthropic's documentation, the practitioners worth reading — under the same rule. No claim without a URL and the verbatim sentence it came from.

The habit I'd teach first: make Claude write its own next prompt

Every long session dies the same death. The context fills with half-finished attempts and stale corrections, and the model gets dumber about your specific problem. Anthropic's engineers have a name for this — context rot — and their measurements show recall degrading as the conversation fills.

My corpus shows 55 sightings of the same response to it:

"You keep messing up, so I'm just going to open a new agent. Just write me a brand new prompt for a new session and copy it on my clipboard."

The dying session knows everything about the problem — what failed, what's confirmed, where the files live. So before I kill it, it writes the briefing for its replacement. Paste, fresh start, nothing re-explained.

Anthropic diagram comparing compaction against a fresh session opened with a hand-written brief

The fork Anthropic now draws: a lossy summary, or a fresh session with your brief. Source: the Claude blog.

"I just want you to give me a prompt for a new session that will avoid all the pitfalls that we've already found"

I got there through irritation. Anthropic got there through research — their own blog now tells users to have Claude write the handoff itself, and describes it as "a message to the previous iteration of Claude from its future self." Same destination.

The habit I use most: run a team, not a chat

The single most common pattern in the corpus — 68 sightings — is me refusing to accept one opinion from one machine.

"Can you have three different narrative agents critique this?"

"All right, I'm gonna have Codex check your work."

That second one is Claude being told its work will be checked by Codex, OpenAI's competing agent. A model trained by different people on different data has different blind spots, which makes it a real reviewer instead of an echo.

OpenAI's Codex reviewing Claude Code's diff and returning a ranked table of issues

"I'm gonna have Codex check your work" — OpenAI's Codex reviewing Claude Code's own diff and filing a ranked P1/P2 bug table, inside the terminal. Source: Nathan Onn, April 2026.

Anthropic put a number on the team idea: a lead agent directing a team of parallel agents beat their best single agent by 90.2% on their internal research eval. Their honest caveat matches my experience — teams pay off on work that splits into independent pieces, like research and review and enrichment, and much less on tightly-coupled builds. And their newest guidance lands on something I'd learned by feel: a fresh agent reviewing cold outperforms a model critiquing its own output. The agent that wrote the thing should never be the one grading it.

Claude Code's agent view showing eleven agents grouped by status

Eleven agents on one screen, grouped by what needs you, what's working, and what's done. Run a team, not a chat. Source: Anthropic, "Agent view in Claude Code," May 2026.

The habit that saved this article: demand receipts

53 sightings of me refusing to believe the machine.

"print the actual damn eamils so I can read them"

"Whhy do you say "~$254,000" why the ~ ?"

Both verbatim, typos preserved. The first is the artifact demand — show me the actual output, never the summary of it. The second is the number interrogation — a tilde in front of a dollar figure means somebody estimated, and I want to know who and from what.

This habit has the strongest official backing of anything in the corpus. Boris Cherny, who created Claude Code, calls giving Claude a way to verify its work "probably the most important thing to get great results." Anthropic's docs say to have Claude show evidence rather than asserting success. And the newest model guidance goes further than either: Anthropic now publishes an instruction telling the model to audit its own progress claims against actual results, because on long runs models fabricate status reports. Their testing says the instruction "nearly eliminated" them — which tells you fabricated reports were real enough to need eliminating.

Claude Code running typecheck and tests to verify its own changes before claiming done

Claude Code showing its receipts: after the edit it runs typecheck and lint, reports "The typecheck passed," then runs the tests — before it says done. A runnable check turns "done" into evidence. Source: Boris Cherny, creator of Claude Code.

The 20,000-prompt estimate at the top of this article is what the habit looks like when it pays off.

The other seven

Ranked by sightings: paste the exact bad sentence and state the rule you want (55). Turn every fix into a standing rule the machine remembers (43). Say who's reading the output and where it lands — a text message gets four sentences, a Slack post gets formatting (37). Name the specific tool you want used instead of correcting bad output afterward (36). Point Claude at the raw source — the email, the transcript, the PDF — instead of summarizing from your own lossy memory (31). Ask for the plan first and approve it in five words (27). Fence off what must not be touched (12).

All ten live in the full reference guide, with the feature walkthroughs and what the tools cost: the field guide to talking to Claude. This article is the story; that page is the manual.

The models changed underneath me twice in 17 days

The corpus runs May 25 to June 10. Claude Opus 4.8 shipped May 28. Fable 5 — Anthropic's new top model — shipped June 9. And on June 12, while this article was being written, a US government export-control directive ordered access to Fable 5 suspended. Anthropic says it's complying, that it disagrees with the order, and that it's working to restore access. Eighteen days, two new frontier models, one recall.

So I went deep on what Anthropic actually published about prompting this generation, and the documents describe a model that has been moving steadily toward the way a non-engineer already talks.

Opus 4.8 benchmark table vs Opus 4.7, GPT-5.5, and Gemini 3.1 Pro

Opus 4.8, May 28. Look at the terminal-coding row: GPT-5.5 edges it, 78.2 to 74.6 — a green cell conceding a column to OpenAI, on Anthropic's own card. The frontier is contested, not owned. Source: Anthropic.

Benchmark table from the Claude Fable 5 launch comparing it to Opus 4.8, GPT-5.5, and Gemini 3.1 Pro

Twelve days later, June 9: Fable 5 resets the top row again. Three days after that, a US export-control directive ordered access suspended — Anthropic is complying while disputing it. The card you're reading is for a model you currently can't call. Source: Anthropic.

Three findings worth your time.

The models got literal. Starting with Opus 4.7 and deeper in 4.8, the official guidance says the model "does not silently generalize an instruction from one item to another, and it does not infer requests you didn't make." Your exact words are the spec now. The docs' own example is brutal: if an old review prompt says "only report high-severity issues," the model finds the bugs, judges them below your stated bar, and stays quiet. Your stale hedges get obeyed against you.

METR chart showing the length of tasks AI agents complete at 50% reliability doubling every few months

The era in one chart: the task length an agent finishes at 50% reliability keeps doubling — lately every four to five months. Newest model plotted is Claude Opus 4.5. Source: METR, "Time Horizon 1.1," January 2026.

Old instructions became poison. The Fable 5 guide says prompts and skills written for earlier models are "often too prescriptive" and "can degrade output quality" — and recommends deleting them. Years of accumulated prompt scaffolding is now a liability the vendor tells you to remove.

The docs converged on conversation. Fable 5's guidance says you can "steer most behaviors with a brief instruction rather than enumerating each behavior by name," and one of its section headings is "Give the reason, not only the request" — attaching the why is now a documented performance technique. Plain sentences, the reason included, boundaries stated. That's how you'd brief a colleague — which is the only way I ever knew how to do it.

Where the docs convict me

Four charges, none softened.

I shout. When Claude ignored a rule, my correction came in capital letters. The official guidance now instructs the opposite — "dial back any aggressive language," with the exact before-and-after: replace "CRITICAL: You MUST use this tool when..." with "Use this tool when...". Newer models over-obey emphasis. What actually fixed things in my corpus was pasting the offending sentence and stating the rule. The pasted line did the work. The caps were theater.

I bundle. My prompts routinely carry four or five unrelated asks. Benchmark research on instruction-following keeps finding the same shape: every ask you stack lowers the odds that all of them get done, and the misses are silent. Front-loading context is good; front-loading five jobs is how the third one vanishes. One ask per prompt, with everything it needs attached.
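The arithmetic behind stacked asks can be sketched with a toy model in Python. This is a simplification and not taken from any benchmark: it assumes every ask succeeds independently with the same probability, and the 0.95 figure is made up purely for illustration.

```python
def chance_all_done(per_ask_success, asks):
    """Probability that every ask in one prompt gets done.

    ASSUMPTION: each ask succeeds independently with the same probability,
    so the combined chance is per_ask_success raised to the number of asks.
    """
    if not 0.0 <= per_ask_success <= 1.0:
        raise ValueError("per_ask_success must be between 0 and 1")
    if asks < 0:
        raise ValueError("asks must not be negative")
    return per_ask_success ** asks


# Illustrative only: 0.95 is an invented per-ask success rate.
for asks in (1, 3, 5):
    print(asks, round(chance_all_done(0.95, asks), 3))
# 1 0.95
# 3 0.857
# 5 0.774
```

Even with a generous per-ask rate, five bundled asks leave roughly a one-in-four chance that something is dropped under this model, and nothing in the output tells you which one.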

2026 chart of frontier models' instruction-following accuracy falling as constraints per prompt rise

Even the 2026 frontier eventually cracks under stacked instructions — GPT-5.5, Opus 4.7 and Gemini 3.1 all bend toward the floor as the count climbs past the old ceiling. The instructions that drop out fail silently, which is exactly why bundling burns you. Source: Arize AI, May 2026, extending the IFScale benchmark.

I babysit. Around 60 prompts in the corpus are me checking on running jobs. "Still going?" "how are we doing gents?" Claude Code has a feature that watches a job and re-checks it on a schedule — /loop — and the corpus shows me using it exactly zero times. Sixty prompts of me doing a timer's job.

I hoard rules. My standing-rules habit — 43 sightings, every correction saved to memory — has a documented failure mode I'd been feeding. Anthropic's guidance, exclamation mark theirs: "Bloated CLAUDE.md files cause Claude to ignore your actual instructions!" That's the rules file Claude reads at the start of every session, and mine only ever grew. The docs say prune it like code; the practitioner standard is that every line must trace back to a real mistake. My best habit, left uncurated, became my fourth failure.

Run it on yourself

Your transcripts are sitting on your machine right now — every session, every prompt, in a folder Claude can read. The audit that produced this article is one paragraph of instructions.

What Claude needs to know:

Read every session transcript under ~/.claude/projects/. Keep only the messages I actually typed — drop tool results, system output, slash commands, and anything under 10 characters. Count what's left and tell me where the count came from. Then mine the prompts for my five most common habits and my three worst ones, quoting my real prompts back to me word for word. Before showing me any quote, verify it appears verbatim in a transcript. Show the file and line for every number.

Fair warning from experience: the first count will be inflated, because the machine logs its own actions next to your words. Make it prove the number. That's the audit auditing itself — and if that habit isn't already yours, it's the first one worth stealing.

Claude Fable 5 wrote this article inside Claude Code on June 12, 2026 — three days after the model shipped, and the same day a US government directive ordered its access suspended. It worked the way the article says to work: it planned first and asked its questions before writing, fanned research agents out through Exa and Parallel, and verified every quote against its source before printing it. Jordan quoted the bad lines, pointed arrows at them, and demanded the receipts.

What Annual Adds

This one was free. Paid gets the build. Annual gives you the tools that run it.

Every tool I ship. Edge Copilot installs to your Claude Code — talk to all my knowledge, every method, every data source. Current: Edge Copilot, AutoClaygent, Agent 7, Who to Target and What to Say, Blueprint Cloud, Technology Finder, Video List Extractor, Competitor Monitor, LinkedIn Engagement, Domain & LinkedIn Finder, Dossier Builder, PDF Contact Finder, TAM Contact Harvester, Find a Rep. Whatever ships next is included.
All 3 courses: Who to Target and What to Say, Agent 7, AutoClaygent.
Weekly office hours.

Run /edge install <slug> for any tool I've shipped — they all install the same way.

License key hits your email.

→ Go annual — $2,499/yr · Start at $50/mo (most readers start here)

On the Edge by Blueprint
