What to do with your plans when the code is done

Abe Gong

02 Mar 2026 — 4 min read

Spec-driven development is having a moment. In a nutshell: coding agents are much more likely to build the right thing if you write a detailed plan upfront.

This has worked beautifully for me, and it shows up in the data. Over the last couple of months, I've gone from shipping 3-5 PRs in a typical engineering day to roughly double that, including a few explosively productive days with more than 20 meaningful PRs created and merged. The size and scope of these PRs are harder to measure, but I'm confident that the overall trend is toward bigger PRs that manage more complexity.

However, spec-driven development at this pace creates some practical problems. The most obvious one is a pile of completed planning artifacts (usually markdown files) scattered across different parts of the repo, Linear, Google Docs, etc.

Iteration 1

The naive solution is to delete the spec when you're done with it.

For a while, this was my process:

1. Create spec
2. Have AI do the work, possibly with manual testing and feedback along the way
3. Open a draft PR
4. Review
5. Delete the spec
6. Promote to full PR, get peer review

This cleaned up the mess, and maps to the way I would have done this before AI.

However, this approach has a few flaws:

First, it deletes context too early. A decent fraction of the time, code review would kick off a new round of questions or changes. For many of these conversations, the spec would have been exactly the right context to share. And of course, that doesn't work if you delete the spec before the PR.
Second, it deletes context, period. Some of the content in the spec is valuable as long-term documentation. It's only a small fraction, since most of the plan becomes irrelevant once the PR is accepted, but that fraction is important.

Iteration 2

For a while, we dealt with these problems by moving specs into an old-planning-docs/ directory just before submitting a PR. I knew this was a band-aid, but it solved both problems from Iteration 1: the spec stuck around through code review, and potentially useful content wasn't destroyed. Also, it let us build up a library of specs to learn from.

Here's what we learned.

Not surprisingly, AI agents sometimes treated old specs as authoritative. This didn't happen as often as I expected—calling the folder old-planning-docs/ probably helped.
Bots ran into another failure mode: they would sometimes try to bring the old docs up to date. Reverting/accepting changes to stale docs was annoying—although it did occasionally give me some insight into whether the bot understood the problem we were working on.
Also not surprisingly, extracting useful information from stale docs was time-consuming, and required a lot of context about how the codebase had changed since the spec was written. This was rarely worth the effort.
We mostly ignored others' docs in old-planning-docs/. The only person who would ever look at an old spec was the person who had written it. In other words, old-planning-docs/ didn't really solve a tacit knowledge problem: it just moved tacit knowledge from being exclusively in one person's head to being in their head plus a doc.

Keeping these old docs was a useful experiment in understanding how context and tacit knowledge rot over time. If people are interested, I might share more detailed notes in a future post.

Iteration 3

Just this week, we rolled out a new way of handling specs. We've included them directly in the codebase as a mix of documentation and skills, available to agents. Most of the text in this section is taken directly from those markdown files.

First, we now distinguish between two types of docs. (GitHub's Spec Kit makes a similar distinction.)

Spec ({slug}.spec.md) — Describes a feature or architecture change: the problem, the design, domain model and architecture decisions. Answers what and why.
Plan ({slug}.plan.md) — Step-by-step implementation guide for a spec. Answers how. A plan always references a spec. A spec can exist without a plan.
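
To make the naming convention concrete, here is a minimal Python sketch that flags plans with no matching spec. It assumes a plan references its spec by sharing the same slug, which is an assumption layered on top of the file naming above; the function name `orphan_plans` is hypothetical and not part of our tooling.

```python
SPEC_SUFFIX = ".spec.md"
PLAN_SUFFIX = ".plan.md"


def orphan_plans(filenames):
    """Return the slugs of plans that have no spec with the same slug.

    Assumption: a plan "references" a spec by sharing its slug.
    A spec without a plan is allowed, so it is never reported.
    """
    names = list(filenames)  # allow any iterable, including generators
    spec_slugs = {n[: -len(SPEC_SUFFIX)] for n in names if n.endswith(SPEC_SUFFIX)}
    plan_slugs = {n[: -len(PLAN_SUFFIX)] for n in names if n.endswith(PLAN_SUFFIX)}
    return sorted(plan_slugs - spec_slugs)


if __name__ == "__main__":
    # Hypothetical contents of a specs/ directory.
    example = ["auth.spec.md", "auth.plan.md", "billing.plan.md", "search.spec.md"]
    print(orphan_plans(example))
```

Template files such as _SPEC_TEMPLATE.md are ignored because they don't end in either suffix.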

Here's our new flow:

1. Write the spec. Start from _SPEC_TEMPLATE.md. Focus on the problem, design, and open questions. Status: planning.
2. Resolve open questions. Move resolutions into the Design section. When no open questions remain, the spec is ready for a plan.
3. Write the plan. Start from _PLAN_TEMPLATE.md. Reference the spec. Break the work into phases and steps. Status stays planning until work begins.
4. Implement. Status: implementing. Update the plan as you go: mark phases done, note deviations.
5. Ship and verify. Status: done.
6. Graduate content into permanent docs (see below). The spec and plan stay in specs/ as historical record.
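
The status values in the flow above (planning, implementing, done) can be sketched as a tiny state machine. This is an illustration only: the `Status` enum and `advance` function are hypothetical names, and the rule that a spec only ever moves forward one step at a time is an assumption, not something our skill files enforce.

```python
from enum import Enum


class Status(Enum):
    PLANNING = "planning"
    IMPLEMENTING = "implementing"
    DONE = "done"


# Assumption: statuses only move forward, one step at a time.
ALLOWED = {
    Status.PLANNING: {Status.IMPLEMENTING},
    Status.IMPLEMENTING: {Status.DONE},
    Status.DONE: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    status = Status.PLANNING
    status = advance(status, Status.IMPLEMENTING)  # work begins
    status = advance(status, Status.DONE)          # shipped and verified
    print(status.value)
```

Reaching `Status.DONE` is the point at which graduation kicks in.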

IMO, graduating content is the most interesting part of this process. Here's the main content of the skill:

## Graduating content

When a spec reaches **done**, pull its relevant content into permanent documentation. The spec itself stays in `specs/` as a historical record of the design process, including context and rejected alternatives that don't belong in reference docs.

Content graduates into three places — see [How We Document](how-we-document.md) for what belongs in each:

- **`AGENTS.md`** — New conventions, required patterns, gotchas introduced by the feature
- **Developer docs** — Architecture (domain model changes, new subsystems, glossary terms), guides (new recurring tasks), QA scripts
- **User docs** — Feature descriptions and workflows (can wait until UX stabilizes)

### Graduation checklist

When moving a spec to **done**:

- [ ] Update `AGENTS.md` files in affected directories with new conventions/rules
- [ ] Update or create developer docs (architecture, guides, QA) as needed
- [ ] File a follow-up for user docs if the feature is user-facing
- [ ] Move the spec to the "Done" section in [specs/SUMMARY.md](/specs/SUMMARY.md)
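
(Not part of the skill file.) A checklist like the one above is easy to check mechanically. Here is a minimal Python sketch that lists the items still unchecked in a markdown document. The function name `unchecked_items` is hypothetical, and it assumes the usual `- [ ]` / `- [x]` task-list syntax.

```python
import re

# Matches task-list lines such as "- [ ] item" or "- [x] item".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def unchecked_items(markdown_text):
    """Return the text of every task-list item that is not yet checked."""
    items = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


if __name__ == "__main__":
    sample = (
        "- [x] Update `AGENTS.md` files in affected directories\n"
        "- [ ] File a follow-up for user docs if the feature is user-facing\n"
    )
    for item in unchecked_items(sample):
        print("TODO:", item)
```

An empty result would mean every graduation step has been ticked off.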

Notes:

Specs are team documents; plans usually aren't. We sometimes review specs together. Plans are owned by whoever owns the PR and are rarely of interest to anyone else.
Separating "why" from "how" up front makes cleanup at the end much easier. The "why" (design rationale, rejected alternatives, domain model decisions) tends to stay useful beyond the immediate work; the "how" mostly doesn't.
A plan is written for an audience that stops existing: "us, before the refactor." Once the PR lands, that audience is gone. Graduation is the act of rewriting for the people who come after, with totally different needs for sense-making.

Closing thought: Cognitive debt

A recent rockoder blog post argues that AI dramatically increases "cognitive debt": the gap between how much code you've shipped and how well you understand it.

I agree that cognitive debt is a real problem, and that AI could make it worse.

However, my experience is that AI is at least as good at writing docs as code—which means that teams that use AI intelligently will probably be able to reduce cognitive debt.

We're already seeing this in our codebase, and I expect that Iteration 3 will make it even better.

Feedback welcome!