
Multi-Agent Orchestration: When One Isn't Enough

After this you can judge whether a task actually benefits from multiple agents, fan it out without the results collapsing into incoherence, and recognize that synthesis, not spawning, is where the real work lives.

Understand

The reflex when one agent is struggling is to add another. It rarely helps, and the reason is structural rather than accidental. The public argument about multi-agent systems is loud and split in two. One camp says swarms of agents will replace whole teams, and the other says never to build them at all. Both are describing real experiences, and both have taken a single task shape and turned it into a universal law. The operator read is that the two famous positions do not actually disagree. Anthropic reported a multi-agent research system that clearly beat a single agent on broad, read-heavy research. Cognition argued, just as credibly, that for building software a single coherent thread beats a swarm almost every time. Put them on the same axes and the contradiction dissolves. Multi-agent wins when the subtasks are independent and read-heavy, and it loses when they are coupled and write-heavy.

The debate, reconciled: the two famous positions plotted on the axes that actually predict the outcome, so they stop looking like a contradiction.

The cost side is what turns this from an architecture question into an economic one. Anthropic's multi-agent system used roughly 15 times the tokens of a single chat, and in their analysis of the BrowseComp evaluation, token usage alone explained about 80% of the performance variance. That reframes everything. You are not asking whether the design is elegant but whether this task is worth a 15x bill, and for routine work the honest answer is no. Multi-agent buys you two things, and neither of them is raw intelligence. It buys coverage, because five workers explore five branches at once, and it buys wall-clock time, because they do it in the time of one. If the task does not split into independent branches whose results you can simply union, you are paying 15x for work you will have to throw away and redo sequentially anyway.
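The trade can be put into rough arithmetic. This is a back-of-the-envelope sketch, assuming the ~15x figure applies as a flat multiplier on a baseline run's tokens (a simplification, since the reported number is an average from one system) and that independent branches all run at once. `estimate` and its parameters are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FanOutEstimate:
    single_tokens: float
    multi_tokens: float
    sequential_s: float  # one agent walking every branch in turn
    parallel_s: float    # slowest branch plus the orchestrator's synthesis pass
    speedup: float


def estimate(branch_seconds: list[float], single_run_tokens: float,
             token_multiplier: float = 15.0, synthesis_s: float = 0.0) -> FanOutEstimate:
    """Pay a token multiple, buy wall-clock time.

    Assumes the branches are independent, so they can all run at once.
    token_multiplier is a flat, hypothetical stand-in for the ~15x figure.
    """
    if not branch_seconds:
        raise ValueError("need at least one branch")
    sequential = sum(branch_seconds)
    parallel = max(branch_seconds) + synthesis_s
    return FanOutEstimate(
        single_tokens=single_run_tokens,
        multi_tokens=single_run_tokens * token_multiplier,
        sequential_s=sequential,
        parallel_s=parallel,
        speedup=sequential / parallel,
    )
```

With five one-minute branches and a 30-second synthesis pass, `estimate([60] * 5, 20_000, synthesis_s=30)` gives 300,000 tokens against a 20,000-token baseline and a speedup of about 3.3x. The token multiple is paid regardless, while the speedup exists only because the branches were independent.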

Should you add an agent? The decision walked as three gates, where any single failed gate routes you back to one agent or a fixed workflow.

When it does fit, the shape that ships is orchestrator-workers. A lead agent breaks the task into self-contained briefs, hands each to a worker with its own clean context, and then reconciles what comes back, which is the part everyone underestimates. Spawning is the easy ten percent. The other ninety percent is synthesis. The orchestrator never sees the workers' raw context, only their returned outputs, so the whole thing holds together only when each worker returns a structured artifact in a known shape instead of a wall of prose the orchestrator has to re-read and re-decide. Delegation here is really task-brief authoring. A worker handed "research X" with no scope does redundant or divergent work, the same way a new hire given a vague instruction does.

Fan out, then synthesize: the orchestrator-worker pattern with synthesis marked as the load-bearing step, where structured returns are what make the fan-in mechanical.
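The orchestrator-worker loop can be sketched in plain Python. This is a minimal sketch, not any framework's API: `run_worker` is a hypothetical stand-in for a model call, threads stand in for parallel agents, and the `Brief` and `Finding` types are invented for illustration. What it shows is the shape of the three steps: disjoint briefs go out, same-shaped findings come back, and the fan-in is a union plus a coverage check rather than a re-reading of prose.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Brief:
    objective: str
    scope: tuple[str, ...]  # items this worker owns; sibling scopes are disjoint


@dataclass(frozen=True)
class Finding:
    item: str
    summary: str


def run_worker(brief: Brief) -> list[Finding]:
    # Hypothetical stand-in for a model call with its own clean context:
    # the worker sees only its brief, never the orchestrator's transcript.
    return [Finding(item=x, summary=f"{brief.objective}: {x}") for x in brief.scope]


def orchestrate(items: list[str], objective: str, n_workers: int) -> list[Finding]:
    # 1. Decompose into self-contained briefs with non-overlapping scopes.
    briefs = [Brief(objective, tuple(items[i::n_workers])) for i in range(n_workers)]
    # 2. Fan out: each brief runs independently of the others.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        batches = list(pool.map(run_worker, briefs))
    # 3. Fan in: every return has the same shape, so synthesis is a union plus a check.
    merged = [finding for batch in batches for finding in batch]
    covered = [f.item for f in merged]
    if sorted(covered) != sorted(items):
        raise ValueError("workers missed or duplicated items")
    return merged
```

If the check in step 3 ever fires, the fault is in the decomposition, not the synthesis: two briefs overlapped or one was left out.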

Where it breaks

The signature failure is decision fragmentation. Hand two workers two halves of the same artifact and each makes a locally reasonable choice that is globally inconsistent. Two halves of a UI built in parallel come back with different button styles, different spacing, and different assumptions, and the merge is incoherent in a way that costs more to untangle than the parallelism ever saved. Isolation is free when workers are reading and expensive when they are writing, because writing embeds decisions that have to stay consistent across the whole. The second failure is quieter: the orchestrator itself fills up. Every worker summary it holds eats into its own context budget until it starts losing the thread, which is the exact problem the fan-out was supposed to solve, reappearing one level up. The way out is to keep the orchestrator lean. Each worker writes its findings to its own file, and the orchestrator reopens those files one at a time, reading only the fields it needs for synthesis instead of carrying twenty full responses in its window at once. The most common failure is the plainest one: reaching for a swarm at all when a single agent with one tool call would have done the job for a fifteenth of the cost.
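A minimal sketch of the lean-orchestrator pattern, assuming workers emit JSON and that `name` and `risk` are the fields synthesis needs (both hypothetical). A Python process has no context window, so treat this as an analogy: each worker's full record is read, trimmed, and dropped before the next file is opened, and only the trimmed rows accumulate.

```python
import json
import tempfile
from pathlib import Path


def worker_write(out_dir: Path, worker_id: int, findings: dict) -> Path:
    """Each worker persists its full output, long notes and all, to its own file."""
    path = out_dir / f"worker_{worker_id}.json"
    path.write_text(json.dumps(findings))
    return path


def lean_synthesis(paths: list[Path], needed: list[str]) -> list[dict]:
    """Reopen one worker file at a time and carry forward only the needed fields."""
    rows = []
    for path in paths:
        record = json.loads(path.read_text())
        rows.append({key: record[key] for key in needed})
        # The full record is replaced on the next iteration; only `rows` grows.
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        paths = [
            worker_write(out, 1, {"name": "A", "risk": "low", "notes": "..." * 1000}),
            worker_write(out, 2, {"name": "B", "risk": "high", "notes": "..." * 1000}),
        ]
        print(lean_synthesis(paths, ["name", "risk"]))
```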

Do it now

Two artifacts. First, the gate, which you paste before you fan anything out:

Add agents only if ALL are true:
[ ] The subtasks are independent — none needs another's output mid-flight
[ ] The work is read-heavy (gathering, analysis), not writing one shared artifact
[ ] The result is worth ~10-15x the tokens of a single run
If any box is unchecked → one agent, or a fixed workflow. Do not fan out.
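The gate is mechanical enough to encode. A minimal sketch with hypothetical names (`TaskShape`, `should_fan_out`): the three booleans are the three boxes, and the first unchecked one short-circuits to a single agent, with the reason attached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskShape:
    independent: bool       # no subtask needs another's output mid-flight
    read_heavy: bool        # gathering/analysis, not writing one shared artifact
    worth_the_tokens: bool  # result justifies ~10-15x the tokens of a single run


# Gates in checklist order, each paired with the reason it fails.
GATES = (
    ("independent", "subtasks are coupled"),
    ("read_heavy", "workers would write one shared artifact"),
    ("worth_the_tokens", "result is not worth the token multiple"),
)


def should_fan_out(task: TaskShape) -> tuple[bool, str]:
    """All gates must pass; the first failure routes back to one agent."""
    for attr, reason in GATES:
        if not getattr(task, attr):
            return False, f"one agent or a fixed workflow: {reason}"
    return True, "fan out"
```

Returning the reason alongside the verdict keeps the decision auditable: you can see which box stopped the fan-out.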

Second, the worker brief. If you cannot fill one of these out per worker, you are not ready to delegate yet:

Worker objective: <the one self-contained outcome>
Scope and boundaries: <what's in, what's explicitly out — keeps siblings from overlapping>
Facts you need: <the minimum context; do NOT inherit the whole parent transcript>
Return exactly this shape: <the fields or schema the orchestrator will fan back in>
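The brief can also be treated as data, which makes "if you cannot fill one out, you are not ready" checkable. A minimal sketch: `WorkerBrief`, `missing`, `render`, and `ready_to_delegate` are hypothetical names, and "filled out" here only means non-blank, not well-scoped.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkerBrief:
    objective: str                 # the one self-contained outcome
    scope: str                     # what's in, what's explicitly out
    facts: str                     # minimum context, not the parent transcript
    return_shape: tuple[str, ...]  # field names the orchestrator will fan back in

    def missing(self) -> list[str]:
        """Names of template lines left blank (empty or whitespace-only)."""
        blank = []
        for f in fields(self):
            value = getattr(self, f.name)
            if not value or (isinstance(value, str) and not value.strip()):
                blank.append(f.name)
        return blank

    def render(self) -> str:
        """The brief as the four-line template text."""
        return "\n".join([
            f"Worker objective: {self.objective}",
            f"Scope and boundaries: {self.scope}",
            f"Facts you need: {self.facts}",
            f"Return exactly this shape: {', '.join(self.return_shape)}",
        ])


def ready_to_delegate(briefs: list[WorkerBrief]) -> bool:
    # One unfillable brief is enough to hold back the whole fan-out.
    return all(not b.missing() for b in briefs)
```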

The return-shape line is what makes synthesis mechanical instead of a second-guessing game. Structured returns fan in cleanly. Free prose forces the orchestrator to re-decide everything the workers already decided.
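Here is what mechanical fan-in can look like, assuming workers return JSON and using the market-scan fields as a hypothetical shape. Conforming returns become rows; prose or incomplete returns are rejected with a reason the orchestrator can send back, instead of being interpreted.

```python
import json

# Hypothetical return shape for illustration: field name -> expected type.
SHAPE = {"name": str, "pricing": str, "positioning": str, "risk": str}


def parse_return(raw: str, shape: dict = SHAPE) -> dict:
    """Accept a worker's output only if it matches the agreed shape."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not structured: {exc.msg}") from exc
    if not isinstance(obj, dict):
        raise ValueError("not structured: expected a JSON object")
    missing = [k for k in shape if k not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = [k for k, t in shape.items() if not isinstance(obj[k], t)]
    if wrong:
        raise ValueError(f"wrong types: {wrong}")
    return {k: obj[k] for k in shape}  # drop anything outside the shape


def fan_in(raws: list[str]) -> tuple[list[dict], list[tuple[int, str]]]:
    """Keep conforming rows; return the rest by index with a reason."""
    rows, rejects = [], []
    for i, raw in enumerate(raws):
        try:
            rows.append(parse_return(raw))
        except ValueError as exc:
            rejects.append((i, str(exc)))
    return rows, rejects
```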

Worked example


Illustrative. Two constructed runs to show the fit, not real sessions.

The same instinct, "this is big, split it," applied to two different task shapes.

A market scan across 30 competitors fans out cleanly. Five workers take six competitors each, each one reads public pages and returns the same shape (name, pricing, positioning, and one risk), and the orchestrator stacks the 30 rows into a table. The branches never touched each other, so nothing conflicts, and the only real work left is reading the union.

You: Scan these 30 competitors.
(fans out 5 workers, 6 each, fixed return shape)
Orchestrator: 30 of 30 returned. Merged to one table. 4 flagged for a closer look.
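The market scan's fan-out and fan-in can be sketched in a few lines. Assumptions, all hypothetical: `worker` stands in for an agent reading public pages, rows are dicts carrying the four fields from the return shape, and the flag rule is simply a risk other than "none". Nothing needs reconciling because nothing could conflict: the batches are disjoint and the merge is concatenation plus a coverage check.

```python
from typing import Callable

Row = dict[str, str]


def chunk(items: list[str], size: int) -> list[list[str]]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def market_scan(competitors: list[str], worker: Callable[[list[str]], list[Row]],
                per_worker: int = 6) -> list[Row]:
    """Fan out disjoint batches, then stack the fixed-shape rows into one table."""
    batches = chunk(competitors, per_worker)  # 30 names at 6 each -> 5 workers
    table = [row for batch in batches for row in worker(batch)]
    names = [row["name"] for row in table]
    if sorted(names) != sorted(competitors):
        raise ValueError("a batch came back short or duplicated")
    return table


def flagged(table: list[Row]) -> list[str]:
    # Hypothetical flag rule: any row whose risk field is not "none".
    return [row["name"] for row in table if row["risk"] != "none"]
```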

Now the same move on "build the pricing page." Three workers take the hero, the table, and the FAQ. Each picks a reasonable button style, a reasonable heading scale, a reasonable tone. They come back and nothing lines up: three blue-ish buttons that do not match, two heading systems, a voice that lurches mid-page. Stitching it into something coherent takes longer than building it in one pass would have, because every worker quietly made layout decisions that needed to agree and never did.

You: Build the pricing page.
(fans out 3 workers by section)
Orchestrator: 3 of 3 returned. Merge is inconsistent: conflicting styles, mismatched headings. Reconciling…
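A sketch of why that merge fails, assuming (hypothetically) that each section worker also reports the shared decisions it made as key/value pairs; the lesson does not prescribe such a convention. Detecting fragmentation is then mechanical. Resolving it is not, because someone still has to pick one button style and rework every section that chose differently.

```python
from collections import defaultdict


def find_conflicts(decisions: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """For each decision key, group workers by the value they chose.

    Only keys where workers disagree (more than one distinct value) are returned.
    """
    by_key: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for worker, chosen in decisions.items():
        for key, value in chosen.items():
            by_key[key][value].append(worker)
    return {key: dict(values) for key, values in by_key.items() if len(values) > 1}
```

Fed three hypothetical section returns with slightly different button colors, heading scales, and tones, it reports a conflict on every shared key, which is the reading-versus-writing asymmetry in miniature: the market-scan rows share no decision keys, so there is nothing for this check to find.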

The split that worked and the split that backfired ran on the identical mechanism. What separated them was never the number of agents but whether the pieces could be decided independently. Reading divides. Writing toward one artifact does not. The orchestration is rarely the bottleneck. Putting the pieces back together is, and you only get to skip that bill when the pieces never needed to agree in the first place.