Parallel Extraction and Dedicated Synthesis

After this you can take a job that spans many sources and run it as one agent per source in parallel, each returning a compact summary, with a separate step that does nothing but reconcile those summaries. You get the breadth of reading everything without drowning a single context window in it, and you stop asking one agent to both gather and make sense of more than it can hold.

Understand

Put twenty documents in front of a model and ask your question across all of them, and the thoroughness is mostly an illusion. A single window holding twenty sources is exactly the crowded context that degrades attention, and that degradation sets in well before the window hits any hard length limit. The model does not read that window evenly. It leans on whichever source is loudest or sits at the edges of the window, and the careful point buried in source eleven gets little or no weight. You did the reading and threw away most of its value by stacking it into one place where most of it cannot be attended to.

The structure that works splits the job in two along a line most people never draw. Gathering and sense-making are different tasks with different shapes, and trying to do both at once is what overloads the window. So you separate them. Each source gets its own agent, in its own fresh context, whose only job is to read that one source fully and return a short summary of what it holds. Twenty sources become twenty isolated reads happening at once, none of them competing for room with the others, each producing maybe a page of distilled findings instead of ten pages of raw material. Then, and only then, a second step takes those twenty summaries and does the actual synthesis, the cross-source reconciling, the noticing that sources four and twelve disagree. The reading is parallel and wide. The synthesis is one focused pass over already-compressed material.
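
The two-phase shape fits in a few lines of Python. This is a minimal sketch that assumes hypothetical `extract` and `synthesize` functions standing in for real model calls. Here they are offline stubs so the example runs. What matters is the structure: one isolated call per source, all running concurrently, and a synthesis step that starts only after every summary is back and sees nothing but the summaries.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(source: str) -> str:
    """Hypothetical extractor. In practice: one model call, in a fresh context,
    that reads `source` fully and returns a compact summary. This offline stub
    just keeps the first sentence so the sketch runs."""
    first_sentence = source.split(".")[0].strip()
    return f"summary: {first_sentence}"


def synthesize(summaries: list[str]) -> str:
    """Hypothetical synthesizer: one pass over the summaries, never the sources."""
    return f"{len(summaries)} summaries reconciled:\n" + "\n".join(summaries)


def fan_out_then_synthesize(sources: list[str], max_workers: int = 8) -> str:
    # Phase 1: one isolated extraction per source, all running concurrently.
    # pool.map returns results in input order, so summary i matches source i.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(extract, sources))
    # Phase 2: runs only after every extraction has returned.
    return synthesize(summaries)


if __name__ == "__main__":
    docs = ["Alpha is fast. Details.", "Beta is slow. More.", "Gamma is new."]
    print(fan_out_then_synthesize(docs))
```

Threads suit this because a real extraction spends most of its time waiting on a network call. `pool.map` also keeps the summaries in source order, which makes citing by position straightforward.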

Isolation beats stacking: the same sources processed as one crowded window versus N isolated reads, each returning a compact summary.

Anthropic's own multi-agent research system shows the gap is not just tidiness but measured performance. Its sub-agents are built around the idea that the essence of search is compression, and each returns roughly a thousand to two thousand tokens of summary rather than its raw findings. On Anthropic's internal research evaluation, the system outperformed a single strong agent by about ninety percent. When Anthropic analyzed what predicted performance, token usage alone explained around eighty percent of the variance. The catch is in the same write-up: the system costs roughly fifteen times the tokens of a normal chat, and it only wins when the subtasks are genuinely independent. That last condition is the whole game, and the next section is about what happens when you ignore it.

The synthesis step has to stay genuinely separate, and the reason is structural. If your synthesizing agent reads all twenty full source documents to do its job, you have rebuilt the crowded window one level up, and the orchestrator now rots the way a single overloaded agent would. The fix is to make the extractors write their summaries to disk and have the synthesizer read only what it needs, which decouples how many sources you process from how much the synthesizer has to hold at once. The workspace behind these lessons runs research this way as a standard move. It spawns a set of cluster agents in parallel, each working a slice of the question against a search tool, and then hands their structured outputs to one dedicated synthesis agent whose entire job is to reconcile across clusters, including resolving the citation collisions where two clusters found the same source and numbered it differently. The synthesis is not a courtesy pass at the end. It is its own agent with its own task.
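
Writing summaries to disk can look like the sketch below. It assumes a hypothetical layout of one JSON file per source, with `source` and `claims` keys. The synthesizer first lists what exists, which is a cheap index, and then loads only the summaries it decides it needs. That way the number of files on disk does not dictate how much lands in its context.

```python
import json
import tempfile
from pathlib import Path


def write_summary(out_dir: Path, source_id: str, claims: list[str]) -> Path:
    """Extractor side: persist one compact summary per source (hypothetical schema)."""
    path = out_dir / f"{source_id}.json"
    path.write_text(json.dumps({"source": source_id, "claims": claims}))
    return path


def list_summaries(out_dir: Path) -> list[str]:
    """Synthesizer side: an index of available summaries, without reading their contents."""
    return sorted(p.stem for p in out_dir.glob("*.json"))


def load_summaries(out_dir: Path, wanted: list[str]) -> list[dict]:
    """Load only the summaries the synthesizer asks for."""
    return [json.loads((out_dir / f"{sid}.json").read_text()) for sid in wanted]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        write_summary(d, "src01", ["fast on synthetic benchmarks"])
        write_summary(d, "src06", ["write amplification above ~2TB"])
        print(list_summaries(d))
        print(load_summaries(d, ["src06"]))
```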

Fan out, then a dedicated reconcile: the captain pattern, where parallel workers each return a structured artifact and one synthesis agent merges them.

It helps to know which flavor of parallel you are running, because they reconcile differently. Sometimes each agent owns a distinct slice and the pieces simply add up, so the merge is mostly concatenation and de-duplication. Other times you run the same task several times to cross-check it, and the merge is a vote or a judgment about which answer to trust. Both are parallel, but the first reconciles by assembling and the second by adjudicating, and naming which one you are doing tells you what the synthesis step actually needs to be.
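
The two merges differ clearly in code. In this minimal sketch the function names are hypothetical. `assemble` concatenates distinct slices and drops exact duplicates. `adjudicate` uses a plain majority vote as a stand-in for whatever judgment you would actually apply when repeated runs disagree.

```python
from collections import Counter


def assemble(slices: list[list[str]]) -> list[str]:
    """Distinct-slice fan-out: merge by concatenating and dropping exact
    duplicates, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for findings in slices:
        for item in findings:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def adjudicate(answers: list[str]) -> tuple[str, float]:
    """Repeated-task fan-out: merge by majority vote, returning the winning
    answer and the share of runs that agreed with it."""
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)
```

A low agreement share from `adjudicate` is a signal in its own right. It means the repeated runs disagree enough that someone should look before you trust the winner.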

Two shapes of parallel, two kinds of merge: splitting distinct subtasks versus repeating one task for cross-check, and how each is reconciled.

Where it breaks

The condition that makes this work, independent subtasks, is also the cliff you fall off when it is missing. Reading is separable because one source's content does not depend on another's. Writing usually is not. Hand two agents two halves of the same feature and they make locally sensible choices that clash when joined, because each was blind to the decisions the other was making. The same goes for debugging, where the bug often lives in the interaction between parts no single agent saw whole. When subtasks are coupled, fanning out does not buy you parallelism; it buys you a reconciliation fight, and you would have been faster with one agent holding the whole thing. The roughly fifteenfold token cost is real whether or not the parallelism paid off, so spending it on coupled work is the worst case: you pay the premium and get incoherence.
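
You can make the independence check explicit before fanning out: write down which subtasks share decisions, then group them. This sketch assumes you supply the coupling pairs yourself, because deciding that two pieces of work are coupled is a judgment the code cannot make. `independent_groups` is a hypothetical helper built on a small union-find, and each group it returns would go to a single agent.

```python
def independent_groups(tasks: list[str], couplings: list[tuple[str, str]]) -> list[list[str]]:
    """Group tasks that are coupled, directly or transitively. Each group goes to
    one agent; only separate groups are safe to run in parallel."""
    parent = {t: t for t in tasks}

    def find(t: str) -> str:
        # Walk to the group's root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Union the two tasks of every coupled pair into one group.
    for a, b in couplings:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect tasks by root, preserving the original task order.
    groups: dict[str, list[str]] = {}
    for t in tasks:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())
```

For example, `independent_groups(["read A", "read B", "api handler", "db schema"], [("api handler", "db schema")])` keeps the two reads as separate groups and puts the coupled pair together. If everything collapses into one group, the work is coupled and a single agent is the better choice.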

Even on genuinely independent work, the architecture has a lossy seam. The synthesizer never sees the sources, only the summaries, which means anything an extractor dropped is invisible from then on and unrecoverable. If an extraction agent decides a detail is minor and leaves it out, no amount of careful synthesis downstream can use it, because to the synthesizer it never existed. Fidelity leaks at every hand-off, and the more hops between raw source and final answer, the more has quietly gone missing. Write your extraction instructions knowing the summary is all that survives.

A smaller failure hides inside the synthesis step itself. A synthesizer holding twenty summaries is still subject to the same uneven attention as any long context, so a crucial finding sitting in the middle of the pile can be the one it overlooks. Ordering the inputs and keeping the synthesis input lean matters as much here as anywhere. And the cost point deserves repeating as a gate, not a footnote: this is an expensive pattern reserved for high-value, breadth-first work. For a question one good agent can answer from two sources, fanning out is pure overhead.
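
Once each summary has some priority, you can order them mechanically. This sketch assumes a hypothetical numeric `priority` per summary, and how you score it is up to you. It places the highest-priority summaries at the start and end of the synthesis input and pushes the least important toward the middle, where long-context attention tends to be weakest.

```python
def edge_order(summaries: list[str], priority: list[float]) -> list[str]:
    """Put the highest-priority summaries at both edges of the synthesis input
    and the lowest-priority ones in the middle."""
    # Highest priority first; sorted() is stable, so ties keep input order.
    ranked = [s for _, s in sorted(zip(priority, summaries), key=lambda p: -p[0])]
    front: list[str] = []
    back: list[str] = []
    # Alternate: 1st, 3rd, 5th... fill the front; 2nd, 4th... fill the back.
    for i, s in enumerate(ranked):
        (front if i % 2 == 0 else back).append(s)
    # Reverse the back half so the 2nd-highest item ends up last.
    return front + back[::-1]
```

With summaries `a` to `e` at priorities `1, 5, 3, 4, 2`, the top item `b` lands first, the runner-up `d` lands last, and the lowest, `a`, sits in the middle.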

There is a real tension worth holding about the model you give each role. Mechanical, single-source extraction is grunt work you can run on a cheaper, faster model, and economizing there is sensible. But genuine multi-source synthesis, where the value is in connections that only appear across sources, is exactly the work cheaper models do worst, and saving tokens on the synthesizer tends to cost you in missed links and re-runs. Spend on the step that draws the connections, economize on the step that only reads.
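
Putting the split in configuration keeps it from drifting. In this sketch the model names and token caps are placeholders, not real product identifiers or recommended values. Substitute whatever tiers your provider offers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleConfig:
    model: str              # placeholder identifier, not a real model name
    max_output_tokens: int  # illustrative cap, tune to your own summaries


# Economize on the step that only reads; spend on the step that connects.
ROLES = {
    "extractor": RoleConfig(model="small-fast-model", max_output_tokens=600),
    "synthesizer": RoleConfig(model="large-reasoning-model", max_output_tokens=4000),
}


def config_for(role: str) -> RoleConfig:
    """Look up the model settings for one role; raises KeyError for unknown roles."""
    return ROLES[role]
```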

Do it now

You need two prompts that never run together. One goes to each extraction agent, shaped so every summary comes back in the same form for the synthesizer to line up. The other is the synthesis pass itself, run once over the collected summaries.

Paste this
EXTRACTION BRIEF (one per source) — paste, fill the source, send N of these in parallel:
Read this one source FULLY: <source>
Return ONLY a summary in this exact shape, nothing else:
- Source: <id/title>
- Core claims (3–6 bullets, each with the specific number/name/quote that backs it):
- Anything that disagrees with or qualifies common belief on <topic>:
- What this source does NOT cover:
Keep it under ~300 words. Omit nothing load-bearing; the synthesis step sees only this, not the source.
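
The synthesis step sees only the summary, so it can pay to check each reply's shape before it moves on. This minimal sketch assumes the extractor echoes the brief's field labels verbatim. Real replies may vary in spacing or wording, so loosen this strict string match if needed. `check_summary` is a hypothetical helper. A non-empty result means re-running that one extraction rather than passing a gap downstream.

```python
# Field labels copied from the extraction brief; a reply missing any of them
# has dropped something the synthesizer will never see.
REQUIRED_FIELDS = (
    "- Source:",
    "- Core claims",
    "- Anything that disagrees",
    "- What this source does NOT cover:",
)


def check_summary(summary: str, max_words: int = 300) -> list[str]:
    """Return the problems with one extractor's reply; an empty list means it
    can go to the synthesis step as-is."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in summary]
    n_words = len(summary.split())
    if n_words > max_words:
        problems.append(f"too long: {n_words} words")
    return problems
```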
Paste this
SYNTHESIS PROMPT (run once, as its own step, over all the summaries):
Here are <N> source summaries on <question>. You will NOT see the sources, only these.
1. Where do sources agree? (claim + which sources)
2. Where do they conflict? (state both sides; do not average them away)
3. What does the weight of evidence support, and how confident should I be?
4. What's missing — a question none of the summaries answered?
Cite by source id. If two summaries cite the same source under different labels, merge them.
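
The last line of the synthesis prompt leaves citation merging to the model. If you would rather merge deterministically before the synthesis runs, a sketch like this assigns one global label per source. It assumes each summary's citations can be keyed by URL. It treats two URLs as the same source when they differ only in scheme, a leading `www.`, query string, fragment, or trailing slash. That normalization rule is an assumption, and it will wrongly merge pages distinguished only by their query string.

```python
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Normalize a URL so trivially different spellings of one source collide:
    lowercase the host, and drop the scheme, 'www.', query, fragment and
    trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def merge_citations(
    cluster_refs: dict[str, dict[str, str]],
) -> tuple[dict[str, str], dict[tuple[str, str], str]]:
    """cluster_refs maps cluster -> {local label -> url}. Returns
    (global label -> url, (cluster, local label) -> global label)."""
    key_to_label: dict[str, str] = {}
    global_refs: dict[str, str] = {}
    remap: dict[tuple[str, str], str] = {}
    for cluster, refs in cluster_refs.items():
        for local, url in refs.items():
            key = canonical_key(url)
            if key not in key_to_label:
                # First sighting of this source: mint the next global label.
                label = f"S{len(key_to_label) + 1}"
                key_to_label[key] = label
                global_refs[label] = url
            remap[(cluster, local)] = key_to_label[key]
    return global_refs, remap
```

If two clusters each cite `https://www.example.com/db-review/` and `http://example.com/db-review` as their source 1, both local labels remap to the same `S1`, and the summaries can be relabeled before the synthesis prompt sees them.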

The discipline that makes this pay off is resisting the urge to let one agent do both jobs. The extraction briefs go out in parallel, the summaries come back, and only then does the synthesis prompt run. Collapsing the two back into one request is how you end up with the crowded window you were trying to avoid.

Worked example

Illustrative. A constructed comparison, not a real run.

You need to answer whether a team should adopt a particular database, drawing on eight write-ups of varying quality. The stacked approach:

You: [pastes all eight articles] Should we adopt this database?

Assistant: Based on the benchmarks, it's extremely fast and a clear yes...

It anchored on the one source full of benchmark charts, because that source was the most attention-grabbing in a crowded window, and never weighed the two write-ups describing operational pain at scale that sat lower in the paste. Run as parallel extraction instead, each of the eight gets its own agent and comes back as a fixed-shape summary. One of those summaries reads: "Core claims: fast on synthetic benchmarks; Disagrees-with-common-belief: author notes real-world write amplification erased the benchmark advantage above ~2TB." Now the synthesis step sees that qualification sitting beside the benchmark enthusiasm and produces a genuinely different answer:

Synthesis: Evidence splits. Synthetic benchmarks favor adoption; two operational reports describe write amplification past a few terabytes that negates the speed win. Confidence: moderate. Missing: nobody tested your actual workload size. Recommend a scoped trial at your data volume before committing.

Both runs had the same eight sources. Stacked into one window, most were skimmed for whatever surfaced loudest, and the two write-ups describing real operational pain lost out to the source full of benchmark charts. Read in parallel, each source got enough attention to register on its own before anything weighed it against the others, which is why the write-amplification qualification reached the synthesis step at full strength and changed the answer.