Context Engineering: Curating the Window

Intermediate

After this you can run the context window as a curated working set. You decide what the model actually needs to see and keep everything else out, so it stays sharp on a hard task instead of slowly degrading as you pile material in.

Understand

The instinct when a task is going badly is to add more. More background, the whole document, the entire thread so far. Context engineering is the habit of doing the opposite. The context window is the fixed amount of text the model can see at once, and everything shares it at the same time. Your instructions, the conversation history, every file you pasted, the tool definitions, and the model's own replies all draw from the same fixed allowance. Treat it as a budget you spend, not a bucket you fill.

Budget, not bucketone fixed window shared by every component at once, so a pasted file that keeps growing steals room and attention from the real question.

What makes this more than a size limit is the part beginners never see. A model does not read a full window with even attention. Every token you add competes with every other token for the model's focus, so a 40-page doc pasted "just in case" is not sitting there harmlessly. It dilutes the attention available for the one paragraph that actually mattered. The effect shows up long before the window is anywhere near full. On long inputs, models lose accuracy even on easy tasks, and the drop holds even when the relevant fact is definitely in there and the model is pointed straight at it. Length itself is a tax. Attention also is not even across the window. A model leans on what sits at the very start and the very end and sags through the middle, an effect known as lost in the middle, so where a fact sits changes whether the model actually uses it, not just whether it is present.

Attention by positionthe U-shaped curve — strong attention at the start and end of the window, weakest in the middle, which is why a buried instruction gets ignored.

This is why the advertised window and the usable window are different sizes. A model sold on a 200K or million-token context often does its most reliable work well under that ceiling, and many practitioners treat something like the first 32K to 64K of well-chosen tokens as the zone to stay inside for precision work. The headline number is insurance, not a target. An operator treats the effective window as the smaller, real one and curates to fit it.

So the real skill is subtractive. Beginners ask what to put into the context. Operators ask what to keep out, and they start from the fact that the window is never empty when they arrive. The system prompt, a long instructions file, and every tool definition are already loaded and already spending the budget before the first real question, sometimes a large share of it. That is why the operator reflex is to audit what is already in the window and cut the dead weight before adding anything new. Loading the minimum sufficient context, and pushing everything else to a file the model can reopen on demand, is the single most useful habit in this whole spine.

Where it breaks

Curation only helps when the problem is crowding. If the model is wrong on a short, clean prompt, context is not your issue. That is a reasoning or knowledge gap, and trimming a near-empty window will not fix it. The opposite failure is just as real and easier to fall into once "less is more" lodges in your head: starving the model. Cut a constraint it actually needed and it will not stop to ask. It will guess instead, and the answer will look just as confident as a correct one, so you will not notice until it is already wrong. And there is no gauge for any of this. No meter tells you the moment the budget tips from healthy to crowded. You read it from symptoms instead, the model forgetting an instruction or answering as if it never saw what you just pasted. Those symptoms mean the window is already crowded, and recovering a session that has drifted that far is its own skill, which the next lesson on context rot covers. This lesson is about not getting there.

Do it now

Curate the window before a hard task instead of after it goes wrong. Open a fresh chat, paste this block, fill the four lines, and let the last line do the safety work:

Paste this

Task: <the one outcome you want>
Constraints that bind: <the 2-4 that actually matter, e.g. budget, deadline, format, reader>
Context attached: <only the slices that bear on this task, quote the paragraph not the file>
Out of scope: <what you are deliberately not asking about>

Before answering: if anything above is missing a detail you need, ask for it. Do not fill a gap with a guess.

The block is doing two things at once. The first three lines force you to curate, because you cannot fill "context attached" with a whole document without noticing you are dumping. The last line covers the failure on the other side, so a window you trimmed too hard surfaces as a question instead of a confident wrong answer.

One habit makes the rest automatic: paste the slice, not the document. Quote the one section the model needs rather than the whole 40-page file, so the part that matters gets full attention instead of being buried under pages it will never use. When a single instruction absolutely cannot be lost, put it at the very top or the very bottom of what you send, never in the middle where attention sags.

Worked example

Illustrative

Illustrative. A constructed exchange to show the habit, not a real session.

One test is failing in a large module. The dump move is to hand over everything and hope:

You: [pastes the entire 800-line file and the full test-run log] One test is failing. Fix it.

Assistant: I see a deprecation warning in the log, let me update that call first...

It goes after the wrong thing. The one failing assertion is buried under 800 lines of working code and a log full of passing tests and warnings, so the model latches onto the most visible thing rather than the thing that matters. Curated, you quote only the failing test and the function beneath it:

You: This assertion fails: expect(total).toBe(60) returns 59. Here is the function it tests: [pastes the 12-line sumRange function]. What is wrong?

Assistant: The loop runs i < end, so it stops one short of the last value. For an inclusive range use i <= end, which gives 60.

Dump versus curatethe same model on the same bug — the full dump latches onto the wrong thing, the curated slice finds the real one.

Same model, same bug. The second works because the failing assertion and the code under it are the only things in the window, sitting where the model's attention actually lands.