Agent & Harness Starter Configs

Intermediate

After this you can stand up the durable parts of an agent harness from working starter configs — a project instruction file, a slash-command skeleton, a subagent definition, a convention-check script, and an MCP wiring stub — and adapt each to your own stack instead of pasting someone else's assumptions.

Understand

The reusable, shareable units of orchestration practice are not prompts. They are the harness files that sit around the model: a CLAUDE.md that loads on every session, the slash commands stored as markdown in .claude/commands/, the subagents you delegate to, and the small scripts that check your own conventions. Anthropic's Claude Code best-practices guide treats exactly these as the durable assets, and they are the least-catalogued asset class precisely because they encode accumulated judgment rather than clever phrasing. A starter config is worth more than a starter prompt because it is the thing that survives between sessions and between people.

The trap is treating a starter config the way beginners treat MCP servers — collect it, paste it whole, trust that breadth equals capability. A config you lift wholesale carries the originating stack's assumptions: its folder layout, its naming, its enforcement mechanisms, its concurrency model. When those assumptions do not match your situation, the config costs more than it saves, because now you are debugging someone else's architecture inside your own. The starter is a skeleton to adapt, not a finished system to install. The same discipline that governs MCP servers governs configs: take the few you understand, adapt them to your situation, and leave the rest.

This is where the framework-versus-building-blocks question lives too. Frameworks like LangChain promise to abstract orchestration away, and that abstraction is genuinely useful while your needs stay simple and match the framework's shape. It becomes a liability the moment your use case diverges. Octomind documented this directly when they ripped LangChain out of their AI agents. The high-level abstractions hid the prompts being sent to the model, so every tweak meant digging through layers, and the framework stopped composing once their needs deviated. They replaced it with direct API calls plus a handful of utility functions and got a smaller, more debuggable codebase. The starter configs below sit at the building-block layer for the same reason, because a thin orchestration layer you fully understand beats a heavy one you adopt before you understand the problem. Reach for machinery when a real run proves you need it, not in anticipation. The system these examples come from even carries its own corrective for this drift, defer complexity until tests prove the need, and it still over-builds and then flags the over-build.

Adopt a framework, or wire building blocksthe decision that routes you to a heavy framework versus a thin custom layer, gated on whether your use case matches the framework's shape.

Where it breaks

The configs help when you adapt them and break when you install them. The most common failure is framework-first orchestration: reaching for LangChain or CrewAI before you understand the problem, then fighting the abstraction exactly when stakes are highest and you need to see what is actually being sent to the model. The second failure is adopting heavyweight coordination machinery ahead of demonstrated need. The starter set below references a single-writer-per-file rule and a parallel "wave-mode" for concurrent sessions; those solve real torn-write problems at multi-agent concurrency, but on a single-operator setup that rarely runs many concurrent sessions, that machinery is provisioned ahead of the pressure that justifies it. Copy the convention-check script and the slash-command skeleton freely. Treat the concurrency rules as a case study of what to build when your runs actually start colliding, not as table stakes for your first harness.

Do it now

Each block below is a starting point, not a finished file. They are grounded in a real workspace harness — over sixty slash commands, a live git hook, a working convention script — so the shapes are load-bearing rather than invented. Read each entry's why it earns its place and the where it breaks signal before you trust it. Adapt the names, paths, and enforcement to your stack.

1. A starter `CLAUDE.md`

Why it earns its place: this file loads into context on every session, so it is the single highest-leverage place to put the conventions you are tired of restating. It is also a context-consuming prompt, which is the constraint that shapes it.

Where it breaks: every line here is spent on every turn, so a bloated CLAUDE.md taxes precision on real work. Keep it to durable rules a fresh agent needs to act correctly, not a wiki. Cut anything the agent can read on demand from a file instead.

Paste thismarkdown

# <Project> — Agent Entry Point

You are an orchestration agent for <project>. The files and the database are
the source of truth — never your memory of them. Read before acting.

## Architecture
- Canonical data: <where structured state lives, e.g. Postgres / Supabase>
- Working documents: <where markdown/working files live on disk>
- Entry points: <the 2-3 commands or files a new session should read first>

## Conventions
- <naming rule, e.g. task titles use Title Case>
- <commit rule, e.g. commit to main; no feature branches>
- <a real constraint you are tired of repeating>

## Workflow
- Plan before any non-trivial (3+ step) change; stop and re-plan if it drifts.
- Never mark work done without proving it (tests run, output checked).
- No file or DB writes without a preview and explicit approval.

2. A slash-command (skill) skeleton

A slash command in this harness is a markdown file at .claude/commands/{slug}.md that the agent executes. It is not documentation about a workflow; it is the workflow, written so an agent with zero conversation context runs it correctly. The non-obvious part is that each § section is a control structure, not a doc-outline heading. The transferable principle is to give reusable commands a fixed, role-bearing section order; the exact §-numbering below is this workspace's own schema convention, not a universal one.

Why it earns its place: this is the unit that turns a one-off prompt into a repeatable primitive. §1.0 forces read-state-first discipline. §2.0 is the one place the command pauses for a human. §6.0 scopes error handling to the common-input failures only, which is an explicit anti-bloat rule rather than an exhaustive enumeration.

Where it breaks: the spine is uniform on purpose, but do not cargo-cult sections you do not need. A command that produces objective, non-destructive output can drop the §2.0 human gate entirely and renumber — a pause that adds friction without protecting anything is overhead.

Paste thismarkdown

---
description: <one line — what it does. TRIGGER when …. SKIP when … (use /other-skill).>
composed_by: []
composes: []
---

# <Command Name>

**Argument:** $ARGUMENTS (<what it accepts, and the fallback if omitted>)

## Preamble
### Runtime Inputs
Parse $ARGUMENTS once. Resolve <slug/path> or halt with §6.0.

## 1.0 Context Capture
Read the current state from disk/DB first. Reconstruct truth from files,
never from memory of them.

## 2.0 Plan & Confirm
Show the plan + the exact diff/writes. Pause for approval.
(If invoked by another command, log one line instead of pausing.)

## 3.0 Execute
Do the work. One atomic block per step.

## 4.0 Verify   (optional)
Prove it: re-read the written file, run the check, diff against intent.

## 5.0 Summary
Report what changed in 1-3 lines + paths touched.

## 6.0 Error Handling
Common-input failures only (bad slug, missing parent). Skip universal rows.

3. A subagent definition skeleton

Why it earns its place: a named subagent is addressable by subagent_type, which means you delegate to it the same way every time instead of re-describing the role in each spawn prompt. It also preserves the main context window — the subagent reads heavy material and returns a compact summary, so the orchestrator stays sharp.

Where it breaks: a subagent has none of your conversation context. It can read files and run commands, but it will miss nuance a human in the loop would catch, and in this harness it cannot directly edit pre-existing canonical files — write-heavy delegation has to route through a command that uses temp-file patterns. Delegate reading, research, and parallel analysis; keep judgment-heavy writes in the main thread.

Paste thismarkdown

---
name: <slug>
description: <when to spawn this agent, and what it returns>
tools: [Read, Bash, Grep]   # least-privilege — add Write only if it must produce files
---

# <Agent Name>

## Role
You are a <focused role>. One task per spawn. You have no conversation context —
everything you need is in this prompt and the files it names.

## Inputs
- <the paths / scope you are given>

## Steps
1. Read <the named sources> firsthand before writing any claim.
2. <do the focused work>
3. Return: <the exact compact shape the orchestrator expects back>.

## Out of scope
- <what you must NOT touch — e.g. do not edit canonical files; report instead>

4. A convention-check script stub

Why it earns its place: this is the cheapest enforcement you can buy. Every project drifts during iterative development, and a small script that fails on the drift patterns you have actually hit turns a convention from a hope into a check. The real version of this stub runs five checks, each tied to a specific drift caught during a codebase audit.

Where it breaks: it only catches what you encode. It is a guardrail, not a corrective engine — when it fires one or two small items per run, that is healthy specialization, not a broken upstream. Add a check when a drift bites you, not speculatively.

Paste thisjavascript

#!/usr/bin/env node
// convention-check — fails (exit 1) on drift patterns we have actually hit.
// Each check corresponds to a real failure, not a hypothetical one.

import { readFileSync, readdirSync } from 'fs'
import { join } from 'path'

let violations = []
const fail = (check, msg) => violations.push({ check, msg })

// Check 1: <real drift, e.g. shared constants not imported from lib/constants>
// Check 2: <real drift, e.g. a banned inline style / magic number>
// ...add a check the day a drift bites you, not before.

if (violations.length) {
  for (const v of violations) console.error(`✗ ${v.check}: ${v.msg}`)
  process.exit(1)
}
console.log('✓ conventions clean')

5. An `.mcp.json` wiring starter

Why it earns its place: this is where you connect the model's hands to the outside world, and the right default is an empty object. You add a server only when a task needs it.

Where it breaks: every connected server's tool definitions are loaded into context before the agent does any work — with dozens of servers, the definitions alone can consume tens of thousands of tokens up front, and Anthropic reports cases where shifting to load-on-demand cut tool-related context by over 90%. A server is also third-party code running with your credentials, and its tool descriptions are an untrusted injection surface. So the starter is deliberately minimal, and the rule is to wire the fewest servers that cover the task, each scoped to least privilege.

Paste thisjson

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/scoped/dir"]
    }
  }
}

Paste this before connecting any new server — the one-line vet check: Who maintains it, when did it last ship, does it scope credentials, and does it return shaped output or raw 50KB JSON? If you can't answer all four, don't wire it in.

The case-study caveat (read before copying the enforcement)

These configs come from a single-operator workspace that runs each command as a byte-identical pair — one runtime copy and one source-of-truth mirror — kept honest by a live git pre-commit hook that fails any commit where the two diverge. That mirror-parity gate is the cleanest example here of enforcement beating instruction: a machine refuses to let the divergence land, rather than a comment asking you not to. It is also idiosyncratic. The transferable principle is to convert your highest-value convention into a deterministic gate the moment instruction-level discipline stops holding. The specific two-file mirror, the wave-mode reconciliation, the single-writer queue — treat those as one operator's answer, not a starting requirement.

How strong is each guarantee, reallythe enforcement spectrum from a hope to a deterministic gate, so you know which of your conventions actually hold under pressure.

Worked example

Illustrative

Illustrative. A constructed fill of the slash-command skeleton, not a real session.

The task: you keep manually checking which open PRs have failed CI before standup, and you want it as a repeatable command. The skeleton above adapts to this directly. Note that this command produces a read-only report, so it drops the §2.0 human gate and renumbers — there is nothing destructive to approve.

Paste thismarkdown

---
description: List open PRs with failing CI for the current repo. TRIGGER before standup or when triaging the PR queue. SKIP for a single named PR (use `gh pr checks <n>` directly).
composed_by: []
composes: []
---

# Failing-PR Triage

**Argument:** $ARGUMENTS (optional `--repo=<owner/name>`; defaults to the current repo)

## Preamble
### Runtime Inputs
Resolve REPO from --repo= or `gh repo view --json nameWithOwner`. Halt §5.0 if neither resolves.

## 1.0 Context Capture
List open PRs and their check states from the live source — never from memory:
`gh pr list --repo $REPO --state open --json number,title,statusCheckRollup`

## 2.0 Execute
Filter to PRs where any check rollup state is FAILURE. For each, collect the
failing check name + its log URL.

## 3.0 Verify
Re-query each flagged PR's checks once to confirm the failure is current, not a
stale/in-progress state.

## 4.0 Summary
Print one line per failing PR: `#<num> <title> — <failing check>`. If none, say so.

## 5.0 Error Handling
Bad repo slug, or gh not authenticated → report and stop.

Run against a repo mid-sprint, the realistic output is short and decision-ready:

Paste this

#412 Add rate-limit middleware — failing: e2e-tests
#418 Refactor auth context — failing: typecheck
2 of 7 open PRs have failing CI.

The skeleton did the load-bearing work before any prompting cleverness. §1.0 pulled state from the live gh query instead of trusting a remembered list, the verify step caught checks that were merely in-progress rather than truly failed, and dropping the unnecessary approval gate kept a read-only command from pausing for nothing. The control structure underneath is what lets a command like this run unattended without surprising you.