Memory: What Persists and What Doesn't

Intermediate

After this you can place any fact where it will actually survive, knowing which things vanish at the next reset and which you have to write outside the window for the model to ever see again.

Understand

The model has no memory. Almost everything that feels like memory is something the harness re-sends, not something the model retained. When you send turn five of a conversation, the model is not recalling turns one through four. The harness re-sends them, glued together into one block of text, and the model reads the whole thing fresh as if it had never seen any of it. "Continuity" is just the same context getting re-supplied every turn. There is no running mind behind the chat that accumulates what you said. There is a stateless function that gets handed a transcript and predicts the next tokens, then forgets it ever ran.

That reframing matters the moment a reset happens. A reset is any event that breaks the chain of re-supplied context: a new session, a /clear, a crash, or the quieter one most people never notice, auto-compaction, where the harness silently summarizes and discards older turns to free up room. After any of these, the transcript the model gets is different. Whatever was only "remembered" because it sat in the window is now gone, and the model will not know it is gone. It will answer confidently from whatever it does have, which is the trap. An empty memory announces itself. A partial memory does not.

So the real question is never "does the model remember this?" The model remembers nothing. The question is: when the next reset hits, does this fact live somewhere the model will be re-shown? If it lives only in the conversation, the answer is no. If it lives in a file, a database row, or a note the harness reloads, the answer is yes. Memory, operationally, is not a property of the model. It is a property of where you put the fact.

That turns persistence into a placement decision, and the skill is matching a fact to its half-life. A piece of throwaway coordination like "do the auth module before the tests" has a half-life of minutes and belongs in the live turn. A decision you do not want re-argued next week, say "we settled on Postgres, not Mongo, because of the join patterns," has a half-life of the whole project and belongs in a durable source-of-truth file the next session reads first. Structured entity data that other tools query belongs in a database. A universal lesson you want to apply across every future project belongs in long-term memory the harness loads at every start. Put the fact at the wrong altitude and one of two things happens. You either bloat the window with junk that should have been a file, or you lose a decision you needed because it was only ever spoken.

Place a fact by its half-lifefour destinations ranked by how long a fact must outlive the live turn, and which ones survive a reset.

Not all survival is equal among things written outside the window. Compaction is the reset most people miss, because the session simply keeps going. What it actually preserves is narrow and specific, and operators who assume it is all still there get burned in the exact spot where it is not.

Where it breaks

Compaction does not preserve the window. It preserves a curated subset, and the curation is not yours. In a tooling environment like Claude Code, an auto-compaction keeps the root project config file, roughly the last handful of files you touched, and your restored task list, then drops most of the rest of the conversation into a lossy summary. The failure shows up at the seams. A config file scoped to a subdirectory, loaded earlier when you were working in that folder, does not come back. A plan or handoff you wrote into the chat but never saved to disk does not come back, because from the harness's view it was just more conversation to summarize. People lose exactly the artifact they were most relying on, the careful plan they typed out, precisely because typing it into the window felt like saving it, and it wasn't.

The second break is subtler and it is about trust, not loss. A summary of code is not the code. After a compaction, the model holds a paraphrase of the file it was editing, not the file, and a paraphrase is enough to sound confident and wrong. The fix is to re-read the actual file rather than trusting the post-compaction recap. The model cannot tell you what it no longer has.

The third break is on the other side, over-externalizing. Not every passing thought earns a file. If you write durable notes for ephemeral coordination, your source-of-truth file bloats, stops getting read, and a source of truth no one reads is no source of truth at all. The half-life judgment cuts both ways. Under-persist and you lose decisions. Over-persist and you drown the ones that mattered.

Do it now

Two assets. The first is a check-line you paste at the start of any longer session, before context is at risk, so a reset can't quietly eat your plan:

Paste this

Before we start: write the plan, the goal, and any decision we lock
to <a file on disk>, not just into this chat. Anything only in the
conversation will be lost at the next compaction or reset. After each
locked decision, append it to that file in one line. If you're unsure
whether something survives a reset, assume it does not.

The second is a routing rule you run whenever a fact surfaces and you're not sure where it goes. Paste it, answer the four questions, and the fact lands at the right altitude:

Paste this

A fact just came up. Route it before moving on:

1. Is this throwaway coordination for the next few minutes?
   -> leave it in the conversation. Do not write it anywhere.
2. Is this a decision or constraint I don't want re-argued later?
   -> append one line to the project source-of-truth file.
3. Is this structured data another tool will query (an entity, a status, a record)?
   -> write it to the database / structured store, not a prose file.
4. Is this a lesson true across every future project, not just this one?
   -> put it in long-term memory that loads at every session start.

If it matches none, it's probably (1). When two seem to fit, pick the
longest half-life — a thing written too durably is cheap to delete; a
thing lost at a reset is gone.

Most facts are (1) and need no home. The operator move is catching the (2)s and (3)s as they happen, while the context is cheap, instead of trying to reconstruct them after a reset has already swallowed them.

What a compaction keeps and what it dropsthe specific survival line in a tooling environment — why a chat-only plan is the thing most likely to vanish.

Worked example

Illustrative

Illustrative. A constructed before/after to show the placement habit, not a real session.

A developer is three hours into a debugging session. Mid-way, they reason out loud with the model and settle something real: the retry logic has to be idempotent because the payment webhook can fire twice, so every handler keys on the event ID. They type a tidy summary of this into the chat and keep going. An hour later, auto-compaction fires. The session continues, so nothing looks wrong.

The amnesiac restart:

You: Pick up where we left off on the webhook handler.

Assistant: Sure. To handle duplicate webhook deliveries, we should add an idempotency key — I'd suggest keying on the event ID so a repeated delivery is a no-op. Want me to add that?

You: We already decided that. Two hours ago.

The decision was real, but it lived only in the conversation, so compaction summarized it into vapor. The model is not being dense. It is reasoning correctly from a transcript that no longer contains the choice, and because it half-remembers the shape of the problem it sounds confident while re-deriving a settled answer. Worse cases re-litigate the other direction and quietly undo the decision.

The continuity restart, same point, one habit changed. When the decision was made, it went to a file, not just the chat:

You: Pick up where we left off on the webhook handler.

Assistant: Reading the project notes first. Locked decision on record: retry logic is idempotent, every handler keys on the event ID, because the payment webhook can fire twice. Continuing from there — the refund handler doesn't key on event ID yet, so that's the gap. Want me to fix it?

Amnesiac restart vs. continuity restartthe same reset, two outcomes — a chat-only decision forces re-derivation, a filed decision loads in one read.

The two sessions ran the same model through the same compaction and ended in opposite places. The second orients in one read because the decision survived where the conversation did not. Nothing about the model changed. The load-bearing fact was placed somewhere a reset could not reach, and the model was pointed at it on the way back in.