Don't write your CLAUDE.md yourself

Erik Wistrand · July 2026 · structuring-agent-docs

Don't write your CLAUDE.md yourself. I don't know why people think they should, but the idea is everywhere: either you hand-write the file like it's a craft, or you paste in the "ultimate template" from some insanely long blog post and hope. My colleagues do both. Both miss what the file is. It's the agent's working memory, and the agent should be the one writing it.

The hand-writers do have a study on their side: generated context files make agents worse. However, the files it measured were one-shot /init dumps that restated what the agent could already find in the repo, and the official fix, refining the dump by hand, only fails slower: the human refines once and stops, and the file drifts as soon as the code moves. What matters is who maintains the file, not who drafts it.

I got here from a repo that had no chance of fitting in one CLAUDE.md, working in plan-implement-fix loops: the agent writes a plan doc, implements against it, fixes what broke, and the plan gets promoted to an architecture doc once it's real. I never wrote those docs. The agent wrote them for its own future sessions, which start with no memory of anything. The surprise was how well that loop worked. The docs stayed current because the same process that changed the code changed them, and reviewing a doc diff takes less time than writing the doc.

So the useful question is not what you should write. It's what rules the agent needs so the docs it keeps stay usable. That's what I put in the skill:

project/
├── README.md      # humans: what, why, quick start
├── CLAUDE.md      # agent entry point (AGENTS.md symlinks here)
└── agent_docs/    # one topic per file, loaded on demand
    ├── architecture.md
    └── gotchas.md

Every agent_docs/ file is linked from the entry point with a plain markdown link, one level deep. The agent loads the skill when it sets the docs up and every time it touches them afterwards.

Most of the rules are obvious once stated. One isn't, and it decides where everything goes: when a fact moves out of the always-loaded entry point into a linked file, context is only saved if the agent opens that file. If it doesn't, the fact is gone and nothing tells you. A useless fact kept inline costs a few dozen tokens. A critical fact moved out and skipped produces an edit that runs on the wrong assumption and looks fine in review. So you don't split by size, you split by what a miss costs: anything whose absence silently corrupts an edit stays inline.

I benchmarked this, because it sounded true and sounding true is worth nothing. A synthetic repo, a planted fact ("scheduler delays are in centiseconds"), a task that quietly depends on it, and the fact placed five ways:

placement	fact honored
inline in the entry point	100%, every model, every prompt
behind an `@`-import	100%, but costs the same context as inline
one link away	0 to 100%, depending on the harness
two links away	missed too, even when the harness urged reading
absent	0%

The link rows are the interesting part: a weak model never opened a link, and the strongest model in the set skipped a well-labeled one unless the system prompt pushed it to read files. The descriptive labels and per-link hints I had written into the skill turned out to be nearly irrelevant; what makes an agent open a doc is the harness telling it to read. Mainstream harnesses do, so a capable model usually follows links there, but subagents, headless runs, and cheap models get no such cover. You don't control the harness. You control what stays inline.

The benchmark also caught the skill causing the exact failure it warns about: an agent given the skill and asked to write the docs filed the centiseconds fact in a tidy agent_docs file, keeping the entry point lean, and a consumer that didn't follow links missed it every time. I wasn't surprised. Agents make mistakes, the same way they skip links, and it's why blast radius is now the loudest rule in the skill. The harness itself broke twice along the way; twice I assumed a bug rather than bad luck, and twice that was correct. What survived the fixes is in the findings: inline wins, and an author given the skill points at source instead of hand-copying values, so its docs don't rot when a constant changes.

That last result points at the real question: do we need most of these docs at all? The code must be able to carry this, right? Mostly it can. A rule that can be a type or a test goes in code. Anything derivable from source gets generated by a build target, since a hand-copied value is stale the day after someone edits the constant. What's left is what nothing can recover later: why a constraint exists, and the trap that ate an afternoon before it was understood. The agent was there when that happened, which is one more reason it should be the one writing it down.

So: stop hand-writing your CLAUDE.md, and stop pasting templates into it. Give the agent the rules, review its doc diffs like any other diffs, and let it keep its own memory.

Postscript: does this apply to humans? The findings, read against linguistics and cognitive science. Most of it transfers, and older fields paid more to learn it.