Build Primitives, Not Prompts

Advanced

After this you can tell when a piece of AI work has earned the right to become a reusable primitive, and how to build it so it carries its own guarantees. You stop re-typing the same instructions and start invoking named units that behave the same way every time, compose with each other, and run correctly even when nothing in the session remembers why they exist.

Understand

Most people accumulate AI workflows the same way. You write a good prompt for a task, it works, and a week later you need almost the same thing, so you write it again from memory. It comes out a little different. The third time, different again. Each copy is plausible, none is canonical, and when one of them turns out to have a flaw you fix it in the copy in front of you and the others keep the bug. This is the natural failure of prompts as your unit of reuse. A prompt is a thing you retype, and retyping is how copies quietly diverge.

A primitive is the opposite stance. It is one named, stored unit of work that you invoke rather than rewrite. The version you run today is byte-for-byte the version you ran last month, because there is only one. The point is not tidiness. It is that a single canonical copy is the only thing you can attach guarantees to. You can validate its shape against a schema. You can test it. You can fix it once and have the fix apply everywhere it is used. None of that is possible for a prompt that lives in your head and gets re-typed each time.

Prompt versus primitivethe same task as a retyped prompt and as an invoked primitive, and where the guarantees can attach.

The version of this worth learning from is a real one. The workspace behind these lessons runs roughly sixty of these primitives, and it treats each as a file the agent executes verbatim, written so that an agent arriving with zero memory of the project still runs it correctly. That last constraint is what makes them more than saved prompts. A primitive is authored for a reader who knows nothing, which forces every assumption out of your head and into the file. The shape is consistent on purpose. Each one reads its own state from disk before doing anything, pauses once for human approval at the single point where a human actually needs to decide, does the work, verifies its own output, and handles the few error cases that real inputs produce. Those sections are not a document outline. They are a control structure. "Read state first" is there so the primitive never acts on a stale assumption. "Verify at the end" is there so it can tell you whether it actually worked rather than just finishing.

A primitive is a control structurethe anatomy of a primitive labeled by what each part is *for*, not what it contains.

Primitives become powerful when they compose. One primitive can invoke another, and the interesting design question is what happens to the human approval step when it does. If you are running a primitive by hand, it should pause and ask you. But if another primitive is calling it as one step of a larger job you already approved, pausing again is just friction. The workspace solves this with an explicit handoff: the caller passes a flag that says "you are being composed, I already gated this," and the called primitive swaps its human pause for a single logged line instead. The call chain gets recorded in the commit so you can later see which primitive invoked which. The detail worth stealing is not the mechanism, it is the rule they enforce about it. That "you are being composed" convention is stated in exactly one place, and every primitive points at that one statement. The earlier version restated the same paragraph inside every primitive that could be composed, which meant the same rule lived in dozens of places, and dozens of places is dozens of chances to update some and forget others. Stated once and pointed at from everywhere, a shared rule has exactly one place it can go stale, which is the one place you then have to keep current.

Composing two primitiveshow the "I already approved this" signal travels so the inner primitive skips its pause and the call is still auditable.

Where it breaks

The strongest signal that someone has half-learned this lesson is premature primitivization. You do a task once, it felt important, and you immediately wrap it in a reusable unit with a schema and a test. The problem is you do not yet know the right shape, because one run does not reveal which parts are essential and which were incidental to that one case. You end up maintaining a rigid abstraction over a thing you understand poorly, and changing it is now expensive. The honest rule the same workspace holds against itself is to defer this until repetition has actually shown you the contract. Build the primitive when you have felt the pain of retyping it, not before.

That workspace is also a useful warning about over-building, because it did it to itself and says so. It built a whole apparatus for heavyweight task environments, a skill, several schemas, and a scaffold script, and across eleven active projects that apparatus has been instantiated exactly zero times. The machinery is real, it works, and nobody ever needed it. Structure provisioned ahead of demonstrated need is not foresight, it is inventory you now have to maintain.

There is a standing tax, too. Once you have a shared anatomy across all your primitives, any change to that anatomy is a migration across every one of them. The consistency that makes them reliable is the same consistency that makes spine-level changes costly, so you want the shared shape to settle before you have fifty things conforming to it. And the subtlest failure is reading a described guarantee as a live one. Some of what a mature primitive system describes is enforced by tooling, and some is enforced only by you remembering to follow it. The composed-call trace in that workspace is recorded but not actually checked. A primitive can be bypassed and the only evidence is a missing line nobody is watching for. Know which of your guarantees are enforced and which are merely conventions you are trusting yourself to keep.

Do it now

Before you wrap anything in a primitive, run it past the gate below. It is written so a "no" is the common, correct answer.

Paste this

SHOULD THIS BE A PRIMITIVE YET? — all four must be yes:
[ ] I have actually done this task more than twice (not "I imagine I'll need it")
[ ] The shape has stopped changing run-to-run (I know what's essential vs incidental)
[ ] There is a real guarantee worth enforcing (a schema, a check, a fixed output shape)
[ ] Fixing it once and having the fix apply everywhere is worth the cost of maintaining it
If any box is empty, keep it as a prompt. Primitivize on repetition, not on anticipation.

When it does pass, author it for a zero-context reader using this skeleton, and resist filling sections that do not earn their place:

Paste this

NAME: <verb-led, says what it does>
WHEN TO USE / WHEN TO SKIP: <so it's invoked for the right task and not the wrong one>
READ FIRST: <the state this must load from disk before acting>
GATE: <the one point a human must approve — or "none" if fully deterministic>
STEPS: <the work, written so someone with no context runs it right>
VERIFY: <how it confirms it actually worked, not just finished>
ERRORS: <only the cases real inputs actually produce>

Worked example

Illustrative

Illustrative. A constructed pair, not a real artifact.

Imagine a "summarize this meeting transcript" task. As a prompt, the third time you need it you type:

Summarize this transcript. Pull out decisions and action items.

It works, mostly. But last month's version also said "attribute each action item to a person" and "ignore small talk," and you have forgotten both, so this summary is shaped differently from the others and you do not notice until someone asks why half the summaries name owners and half do not.

As a primitive, you invoke it by name. The stored unit already encodes every constraint you have learned: decisions and action items, each owner named, small talk dropped, output in a fixed shape your downstream tooling expects. When you later discover it should also flag unresolved disagreements, you add that once, to the one unit, and every future summary has it.

The primitive is no smarter than the prompt was. It simply holds the constraints you learned the hard way in a file, so they apply on every run whether or not you remember them that day. The prompt's quality lived in your recollection of how you phrased it last time, and that recollection is the first thing to go. Building a primitive is worth its cost once a task has proven it recurs, which is exactly where this lesson began, and not a run before.