Externalize State: The Source-of-Truth Discipline

Advanced

After this you can run an AI agent across resets without losing the thread. You keep durable state in files the agent re-reads, treat its own recall as untrusted, and police the line between what gets persisted and what is only mentioned, so the next session starts from fact instead of reconstruction.

Understand

An agent has no memory. For the length of a session the context window is all it has to work from, and the moment that window resets, by compaction, a new session, a /clear, or a crash, everything not written down is gone. This is easy to forget because a long session feels like continuity. It is not. It is one window being topped up until something flushes it, and then a blank one.

The failure this creates has a specific shape, and you have probably watched it happen. The agent comes back, re-reads the code it already understood an hour ago, re-derives a decision you settled yesterday, asks again a question you already answered, and quietly re-opens a choice everyone had moved on from. Nothing in the fresh window told it those things were settled, so it treats them as open. The workspace these lessons come from names the cure in one line at the top of its operating instructions: the database and files are the source of truth, never your memory of them. The point of that line is that the model's recollection is not a backup copy of the truth. There is no copy. A fact that was not written down is not anywhere once the window flushes.

So the move is to externalize. Anything that has to outlive the window goes into a file or a database the agent reads on the way back in, and the window holds only what the current task needs. But "save your state" is the beginner version, and stopping there is what leaves operators stuck. The advanced half is knowing where each kind of fact lives, because not everything that shows up in a session is the same kind of thing.

Where each fact livesa single incoming fact routed by what it is, not where it surfaced, so each kind lands in exactly one home.

The trap that catches people who have already learned to persist state is letting the wrong surface hold the truth. Say one agent finishes some work and tells another "I locked the column naming to snake_case." It is tempting to let that sentence, sitting in a message or a chat log, be the record. Now the decision lives in a place nobody treats as authoritative. Someone reads the canonical doc, sees nothing about naming, and re-decides. The message channel has quietly become a second, competing source of truth, and the two will drift. The discipline is to put the decision in the canonical doc and let the message carry only the pointer: "FYI, naming is locked, see the doc." The channel coordinates. The doc persists. The moment a coordination surface starts holding truth, it has become a shadow database, and you now have two answers to every question.

The shadow-database anti-patternthe same decision, logged two ways, and why one of them rots into a second unauthoritative source.

One more distinction separates a state store that stays trustworthy from one that rots. Not every file you keep is the same. Some files you author by hand and defend as truth. Some are derived, regenerated from the authored ones, and you never hand-edit them because the next regen overwrites you. Some are transient, a scratchpad you drain to zero every cycle so stale notes never masquerade as current. And some are append-only history you add to but never rewrite. Labeling each file by which of these it is keeps you from the slow-motion corruption where a derived view gets hand-patched, disagrees with its source, and you can no longer tell which one to believe.

Four roles in one state storethe same set of files sorted by how each is allowed to change, which is what keeps them from contradicting each other.

Where it breaks

Externalizing state is not free, and the failure modes are real enough that over-applying it has its own cost. The first is ceremony. If closing a session means writing four files and reconciling three of them, a five-minute task can cost more in bookkeeping than it saved. Match the weight of the persistence to the weight of the work. A throwaway exploration does not need a handoff doc.

The second is staleness. A persisted fact is only true as of when you wrote it, and nothing in a flat file notices when the world moves on. A pinned decision from last week may already be wrong, and the agent will read it with the same confidence as a fresh one. Files give you durability, not freshness, and you have to supply the freshness yourself by re-checking load-bearing facts rather than trusting age alone.

The third is the comprehension gap. You can build a structural check that confirms all your files agree with each other, and it will pass while the state they agree on is wrong. Structure is checkable. Meaning is not. A green light on "everything is consistent" tells you nothing about whether the consistent thing is correct.

And externalizing only protects what you actually wrote down before the reset. There is usually no continuous checkpoint. If the window flushes mid-task, everything since your last deliberate save is gone, which means the discipline is only as good as how often you persist, not how good your files are once written.

Do it now

Two artifacts. The first is a routing rule you keep where you start tasks, so a fact lands in the right home by reflex instead of by wherever it happened to come up. The second is a close-out block you paste at the end of any session worth continuing.

Paste this

STATE ROUTING — where a fact goes by what it is, not where it surfaced:
- A decision that constrains future work  → the canonical doc (the source of truth)
- Notice that a decision was made          → a message/note that POINTS at the doc, holds no truth itself
- Something that must get done             → the task tracker
- Thinking-out-loud, half-formed           → a scratch buffer you delete before you stop
Rule: exactly one home per fact. If it's truth, it lives in the doc; everything else only points at it.

Paste this

BEFORE YOU STOP — write this to a file the next session will open first:

## State as of <date>
- What changed this session:
- What is now true (decisions that are settled — do NOT re-open these):
- What to do next:
- What is deliberately out of scope / already rejected:
- Load-bearing facts to re-verify (may have gone stale):

Then in your next session, read this file before doing anything else.

The second block is the one that kills the amnesiac restart. The "do NOT re-open these" line is easy to omit and does the most work, because it is what stops the next session from re-litigating decisions you already closed.

Worked example

Illustrative

Illustrative. A constructed pair to show the discipline, not a real session.

A long refactor settles a naming convention partway through. Without externalized state, the window later compacts:

You: Continue the refactor.

Assistant: I notice the columns use snake_case and camelCase inconsistently. I'll standardize on camelCase across the new tables...

You: No — we decided snake_case two hours ago.

The decision was real, but it lived only in the flushed window, so the agent re-opened it and guessed the opposite. The same session with one externalized line:

You: Continue the refactor. First read decisions.md.

Assistant: decisions.md says: "DECIDED 2026-06-02 — DB columns are snake_case; reject camelCase in review." Continuing in snake_case, and I'll flag any camelCase I find as a fix rather than a new choice.

Same agent, same reset, one file of differencewhy the second run holds the line and the first re-opens it.

The file did not have to be elaborate. One dated line, placed where the agent was told to read first, carried the decision across the reset intact. Durability came from two things together: the decision living on disk, and the agent being pointed at it on the way back in. A perfect handoff file the next session never opens protects nothing.