Workflow & Playbook Templates
After this you can pull a proven workflow off the shelf and adapt it to your own task instead of rebuilding the orchestration from scratch every time. You leave with a small set of durable, reusable units and the discipline to tell which one is safe to copy as-is and which one will break the moment it leaves the workspace it was born in.
Reusable templates are the highest-leverage asset in this whole catalog and the one almost nobody bothers to write down. A good prompt gets copied around; a good workflow — the sequence of steps, the gate between them, the file the model writes to — usually lives in one person's head and dies when they move on. The reason it goes uncatalogued is that it does not look like a thing. It looks like "how I happen to do this," and only after the third time you run the same five steps does it occur to you that you have been hand-rolling a template all along.
The trap on the way in is the trap that runs through the rest of the Toolkit: you accumulate instead of subtract. You collect a folder of clever slash commands, a starter config someone posted, a playbook screenshot from a thread, and treat the size of the collection as capability. It is not. A template only earns its place when it encodes a process you have actually run and watched succeed, so adapt a template to a proven process rather than the reverse. The failure here is specific. A template that quietly bakes in one workspace's idiosyncrasy — a folder layout, a naming scheme, a coordination rule that only makes sense given five other rules you cannot see — works beautifully where it was made and produces confident nonsense everywhere else. Lift-and-shift is the move that bites: copy a playbook whole, run it against a process that does not match its hidden assumptions, and the steps execute fine while the outcome drifts wrong.
So the operator habit is to read a template for its shape before trusting its content, and to generalize the idiosyncratic parts out before reuse. The entries below each carry the thing, the reason it earns its place, and the signal to read before you lift it.
A browsable shelf of workflow and playbook templates worth keeping. Each one is something you can run, not just admire. Read the "Where it breaks" line on each before you trust it.
The harness reusable-template anatomy
Anthropic's "Claude Code: Best practices for agentic coding" names four durable, shareable units that live inside a working harness. These are the templates that compound. Treated as a set they are the most portable thing in this section.
- A CLAUDE.md (or project memory file). A refined, context-consuming prompt that loads automatically at the start of every session. It holds the conventions, the source-of-truth rules, and the do-not list the model should never have to be re-told. It earns its place because it converts "things I keep re-explaining" into a one-time write. Where it breaks: it is itself a context tax. Everything in it is loaded on every turn before your real question arrives, so a bloated CLAUDE.md quietly steals the budget the actual task needs. Keep it lean and treat length as a cost.
- Custom slash commands stored as markdown in .claude/commands/. Each file is a reusable prompt template invoked by name. The PKOS workspace this Handbook is built in runs more than sixty of them, and the shape is consistent and copyable. A command declares its trigger in frontmatter, names the primitives it composes, and lays out its steps as numbered sections. Where it breaks: a command that hard-codes one project's folder paths or naming scheme is the textbook idiosyncrasy trap. The skeleton travels; the baked-in paths do not.
- Subagents to preserve the main context window. A worker spawned to do a bounded piece of work (read a large file, run a search, draft one section) and return only a short result, so the bulk of the material never enters your main thread. It earns its place by keeping the orchestrator's window clean across long runs. Where it breaks: a subagent cannot see your conversation. Anything you have not written into its brief, it does not know, and it will guess rather than ask.
- A project convention-check script. A small deterministic check that verifies the codebase or workspace still obeys its own rules. It earns its place because deterministic code beats asking the model to "remember" a convention. Where it breaks: it only catches what you taught it to check; a green run means "the rules I encoded still hold," not "the work is correct."
Demote the PKOS specifics here to what they are: one live workspace as a case study, not a universal standard. The principle travels everywhere — a loaded memory file, named reusable commands, context-preserving workers, a deterministic checker. The exact filenames are local color.
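A minimal sketch of the fourth unit, the convention-check script, assuming a workspace with a CLAUDE.md at its root and command files under .claude/commands/. The 200-line budget and the absolute-path pattern are hypothetical choices made for illustration; they do not come from the Anthropic guide or from any particular workspace.

```python
import re
from pathlib import Path

# Hypothetical budget: treat memory-file length as a cost (illustrative number).
MAX_MEMORY_LINES = 200

# Hypothetical pattern for machine-specific absolute paths (/Users/..., /home/..., C:\...).
ABSOLUTE_PATH = re.compile(r"(?:/Users/|/home/|[A-Za-z]:\\)\S+")


def check_memory_file(text: str, max_lines: int = MAX_MEMORY_LINES) -> list[str]:
    """Return a list of problems for the memory file; empty means the rule holds."""
    line_count = len(text.splitlines())
    if line_count > max_lines:
        return [f"memory file has {line_count} lines (budget {max_lines})"]
    return []


def check_command_file(name: str, text: str) -> list[str]:
    """Flag hard-coded absolute paths, the baked-in detail that does not travel."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if ABSOLUTE_PATH.search(line):
            problems.append(f"{name}:{number}: hard-coded absolute path")
    return problems


def check_workspace(root: Path) -> list[str]:
    """Run both checks over a workspace directory and collect every problem."""
    problems = []
    memory = root / "CLAUDE.md"
    if memory.exists():
        problems += check_memory_file(memory.read_text(encoding="utf-8"))
    commands_dir = root / ".claude" / "commands"
    if commands_dir.is_dir():
        for path in sorted(commands_dir.glob("*.md")):
            problems += check_command_file(path.name, path.read_text(encoding="utf-8"))
    return problems
```

An empty list from check_workspace means only that these two encoded rules still hold, which is exactly the limit described in the "Where it breaks" note for this unit.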
A copy-pasteable slash-command skeleton
This is the durable shape behind those sixty-plus commands, stripped of any one project's specifics. Save a workflow you have run more than twice as a file like this and you stop re-typing it.
---
description: "TRIGGER: <when to reach for this>. SKIP: <when not to, and what to use instead>."
composes: [/<leaf-skill-it-calls>, ...]
---
# <Command Name>
**Argument:** $ARGUMENTS — <what the caller passes; what happens if it's empty>
## 1.0 Read state first
Read <the files/inputs this needs> before doing anything. Do not act from memory.
## 2.0 Plan & confirm
State the plan in one block. Pause for approval.
(If invoked by another command, log one line instead of pausing.)
## 3.0 Execute
<the ordered steps — atomic, each independently re-runnable>
## 4.0 Verify
<the check that proves it worked — not "looks done">
## 5.0 Return
<the summary the caller gets back>
The load-bearing parts are the frontmatter trigger and the read-state-first / verify bookends. The trigger line is the whole API of the command: it decides whether the right thing fires. Read-state-first stops the command acting on a stale picture, and the verify step stops it reporting success it did not earn.
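The skeleton's shape can itself be checked deterministically. The following is a rough sketch, assuming command files follow the layout shown above exactly (a frontmatter block fenced by `---` lines, a description containing TRIGGER and SKIP, and five numbered `##` sections). The function name validate_command is hypothetical, and this is a shape check, not a real YAML parser.

```python
import re

# Section titles taken from the skeleton above, in order.
REQUIRED_SECTIONS = ["Read state first", "Plan & confirm", "Execute", "Verify", "Return"]


def validate_command(text: str) -> list[str]:
    """Return the shape problems found in a command file; empty means it matches."""
    lines = [line.strip() for line in text.splitlines()]

    # Frontmatter is the block between the first two '---' lines.
    if not lines or lines[0] != "---":
        return ["missing frontmatter"]
    try:
        end = lines.index("---", 1)
    except ValueError:
        return ["unterminated frontmatter"]

    problems = []
    frontmatter = "\n".join(lines[1:end])
    match = re.search(r"^description:[ \t]*(.+)$", frontmatter, re.MULTILINE)
    if match is None:
        problems.append("no description line")
    else:
        # The trigger line is the command's API, so both halves must be present.
        for keyword in ("TRIGGER:", "SKIP:"):
            if keyword not in match.group(1):
                problems.append(f"description lacks {keyword}")

    # Section headings look like '## 1.0 Read state first'.
    headings = re.findall(r"^## +\d+\.\d+ +(.+?) *$", text, re.MULTILINE)
    if headings != REQUIRED_SECTIONS:
        problems.append(f"sections out of shape: {headings}")
    return problems
```

A check like this only proves the bookends exist. Whether the verify step actually proves anything is still a judgment call for whoever wrote the command.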
The high-autonomy safety triad
Three moves that make a high-autonomy agent run survivable, sequenced. The recurring practice among people who run CLI agents daily reduces to this: plan mode first, commit often, read the diff. Enter plan mode before any non-trivial task so the agent reasons through the approach before it touches anything. Commit frequently so rollback is cheap. Read every diff because the agent will confidently do the wrong thing some of the time, and the read is where you catch it.
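One way to keep "commit often, read the diff" honest is to measure the pending change before reviewing it. The sketch below parses the tab-separated output of `git diff --numstat` (added, deleted, path, with "-" in both count columns for binary files) and flags changes too large to read carefully. The thresholds of 5 files and 300 lines are hypothetical, and the helper names are illustrative.

```python
import subprocess


def summarize_numstat(numstat: str) -> dict:
    """Total the files and changed lines in `git diff --numstat` output."""
    files = 0
    changed = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        added, deleted, _path = parts
        files += 1
        # Binary files report "-" for both counts; count the file, not lines.
        if added.isdigit() and deleted.isdigit():
            changed += int(added) + int(deleted)
    return {"files": files, "changed_lines": changed}


def needs_split(summary: dict, max_files: int = 5, max_lines: int = 300) -> bool:
    """True when the pending change exceeds the (hypothetical) review budget."""
    return summary["files"] > max_files or summary["changed_lines"] > max_lines


def staged_summary() -> dict:
    """Summarize what is staged in the current repository (requires git)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        capture_output=True, text=True, check=True,
    )
    return summarize_numstat(result.stdout)
```

If needs_split returns True, the cheaper move is usually to commit what is already understood and ask for the rest in a smaller step.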
The triad earns its place because the cost of being wrong at high autonomy is asymmetric. When an agent is right it saves you an hour; when it is wrong across a dozen files you can spend two hours reviewing and fixing, more than if you had done the work yourself. The triad is what shifts that distribution back in your favor. Where it breaks: it assumes you actually read the diff rather than skim it. The failure mode it does not protect against is rubber-stamping — a long, plausible diff that you approve without comprehension, which is exactly the "subtly wrong across twelve files" case the triad exists to catch. The higher the autonomy, the more the burden of specifying the task well shifts onto you, because vague instructions produce vague results, amplified.
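The asymmetry can be put in rough numbers. This sketch assumes a simplified payoff read off the paragraph above: a correct run saves one hour and a wrong run costs a net two hours. Both figures and the function names are illustrative assumptions, not measurements.

```python
def expected_hours_saved(p_right: float, saved_when_right: float = 1.0,
                         lost_when_wrong: float = 2.0) -> float:
    """Expected net hours saved per run under the stated payoffs."""
    return p_right * saved_when_right - (1.0 - p_right) * lost_when_wrong


def break_even_probability(saved_when_right: float = 1.0,
                           lost_when_wrong: float = 2.0) -> float:
    """Success rate at which delegating neither gains nor loses time."""
    return lost_when_wrong / (saved_when_right + lost_when_wrong)
```

Under those assumed payoffs the agent has to be right about two times in three just to break even. Cheap rollback and a careful diff read shrink lost_when_wrong; if a wrong run costs half an hour instead of two, the break-even rate drops to one in three.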
The signal-tier playbook template
A reusable triage shape for any "which leads deserve attention now" decision, drawn from signal-based selling. Rather than personalizing a long cold list, you tier incoming signals by how strongly each correlates with the action you want, and you act fastest on the strongest. One GTM engineer's working taxonomy: Tier 1 high-intent signals run 10–20x baseline conversion — a prospect on your pricing page, a G2 comparison-page visit, a job posting for a role your product supports — and the play is to reach out within roughly two hours, while the signal is hot. Lower tiers are weaker and decay fast; a generic "this account is showing surging interest" score sits near the bottom and is oversold.
It earns its place because it is the rare playbook with a number attached to the discipline: signal-based systems in that account reported 9.1% reply rates against a 2.3% cold-outbound average, a roughly 4x lift attributable to timing, not copy. The template generalizes past sales — it is a priority-by-decay-rate rule for any inbound stream. Where it breaks: the tiers are only as good as the correlation behind them, and most "intent data" is noise. Copy the structure, but re-derive your own Tier 1 from signals you have watched actually convert. Borrowing someone else's tier definitions is the idiosyncrasy trap in playbook form.
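As a priority-by-decay-rate rule, the template might be sketched like this. Only the Tier 1 window of roughly two hours comes from the account above; the lower-tier windows, the Signal fields, and the function names are hypothetical placeholders to be replaced with tiers you have derived yourself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Signal:
    account: str
    kind: str
    tier: int            # 1 = strongest; tier assignments are yours to derive
    seen_at: datetime


# Tier 1 window is from the text; tiers 2 and 3 are hypothetical placeholders.
RESPONSE_WINDOW = {1: timedelta(hours=2), 2: timedelta(hours=24), 3: timedelta(days=7)}


def triage(signals: list[Signal], now: datetime) -> list[Signal]:
    """Drop signals past their window, then order by tier and nearest deadline."""
    live = [s for s in signals if now - s.seen_at <= RESPONSE_WINDOW[s.tier]]
    return sorted(live, key=lambda s: (s.tier, s.seen_at + RESPONSE_WINDOW[s.tier]))


def lift(observed_rate: float, baseline_rate: float) -> float:
    """Ratio of an observed rate to its baseline."""
    return observed_rate / baseline_rate
```

Calling lift(9.1, 2.3) gives about 3.96, which is the "roughly 4x" quoted above. The expiry filter is the part that generalizes: a signal past its window is dropped rather than queued, because acting late on a decayed signal is just cold outreach.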
The base → enriched → agnostic de-vendoring transform
The repeatable three-stage move for turning a workspace-specific lesson or template into something reusable without hollowing it out. This is the answer to the idiosyncrasy problem the Understand section names, run as a process.
- Stage 1 — base. Write it concrete, with your real examples and your real machinery. This is the honest version, full of specifics, and it is also where all the idiosyncrasy lives.
- Stage 2 — enriched. Keep the same spine. Add the glossary callouts, the worked cost example, the side-by-side good/bad comparison — the layer that makes it teachable rather than merely correct.
- Stage 3 — agnostic. Hold the spine and the depth, but swap the vendor-specific parts for neutral equivalents. The reusable shell.
In the build that grounds this Handbook, that transform ran across a full teaching guide: a base draft full of real workspace examples, an enriched pass that held the same heading spine while adding beginner callouts and a worked cost example, and an agnostic rewrite that renamed the workspace's project-truth.md / working-notes / task-queue files to neutral overview.md / notes.md / todos.md and swapped the real examples for a generic persona. It earns its place because it makes de-vendoring a procedure instead of an act of taste. Where it breaks: the agnostic stage trades concreteness for portability. The neutral examples read lighter than the real ones, so the credible move for a published asset is to pull structure and ordering from the agnostic version and re-inject real specifics where the reader needs to see the thing actually work. Agnostic for the skeleton, concrete for the flesh.
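The mechanical part of the agnostic stage, the renames, can be scripted. This sketch uses the file stems named above as its mapping; the function names are hypothetical, and swapping the real examples for a generic persona remains editorial work that no script does for you.

```python
import re

# Stems from the renames described above; extend with your own workspace's names.
NEUTRAL_NAMES = {
    "project-truth": "overview",
    "working-notes": "notes",
    "task-queue": "todos",
}


def devendor(text: str, names: dict[str, str] = NEUTRAL_NAMES) -> str:
    """Swap workspace-specific file stems for neutral ones, leaving the rest intact."""
    # Longest stems first so a short stem never pre-empts a longer one.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(old) for old in ordered))
    return pattern.sub(lambda match: names[match.group(0)], text)


def leftover_specifics(text: str, names: dict[str, str] = NEUTRAL_NAMES) -> list[str]:
    """List any workspace-specific stems still present after the transform."""
    return [old for old in names if old in text]
```

Running leftover_specifics on the output is a cheap verify step for the transform: an empty list means the known vendor names are gone, not that every hidden assumption is.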
Worked example
Illustrative. A constructed walkthrough of one template applied to one decision, not a real run.
You have a recurring job. Every week you pull three feeds, dedupe the items, and write a short brief. You have done it four or five times by hand. You reach for the slash-command skeleton above to turn it into a named command rather than re-typing the steps each week.
The naive version is a single instruction: "summarize this week's feeds." That bakes nothing in and re-decides everything every run. Filled against the skeleton instead, the durable version comes out like this:
---
description: "TRIGGER: weekly feed roundup. SKIP: one-off single-source reads (just ask in chat)."
composes: [/crawl-feed]
---
# Weekly Feed Brief
**Argument:** $ARGUMENTS — the week's date range. If empty, default to the last 7 days.
## 1.0 Read state first
Read last week's brief so the new one does not repeat already-covered items.
## 2.0 Plan & confirm
List the three feeds and the date window. Pause for approval.
## 3.0 Execute
Pull each feed, dedupe against last week, group by theme.
## 4.0 Verify
Every item in the brief traces to a real feed entry. No invented links.
## 5.0 Return
The brief plus a one-line count of items kept vs deduped.
Run it and the difference shows up at the bookends, not the body. Read-state-first means it stops re-surfacing last week's stories, which the bare "summarize the feeds" prompt repeated every single time. The verify step means a fabricated link gets caught before the brief ships rather than after.
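The two bookends are simple enough to back with deterministic code instead of trusting the model to remember them. A minimal sketch, assuming each feed item is a dict with a "url" key and that last week's brief can be reduced to a set of URLs; both shapes and both function names are hypothetical.

```python
def dedupe(items: list[dict], already_covered: set[str]) -> tuple[list[dict], int]:
    """Keep items whose URL is new this week; return them plus the dropped count."""
    kept = []
    seen = set(already_covered)  # copy so the caller's set is not modified
    for item in items:
        if item["url"] in seen:
            continue
        seen.add(item["url"])
        kept.append(item)
    return kept, len(items) - len(kept)


def untraceable(brief_urls: list[str], feed_items: list[dict]) -> list[str]:
    """Return brief links that match no pulled feed entry (possible fabrications)."""
    known = {item["url"] for item in feed_items}
    return [url for url in brief_urls if url not in known]
```

The dropped count from dedupe feeds the "kept vs deduped" line in step 5.0, and a non-empty result from untraceable is the signal to stop before the brief ships.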
Then comes the part the Understand section warns about. You want to hand this command to a teammate, but the composes: [/crawl-feed] line and "read last week's brief" assume your file layout and your crawl tool. That is the idiosyncrasy. Before handing it over you run the de-vendoring move in miniature: keep the five-section spine and the read-then-verify discipline, but replace the workspace-specific composed tool and the "last week's brief" path with a neutral "your previous output, wherever you keep it." Generalizing those hard-coded parts out is the step that turns a personal command into a shareable one, so run it before the handoff, while you still remember which paths are yours and which are universal.