Multi-Agent Orchestration: The Orchestrator-Worker Pattern
The Orchestrator-Worker Pattern
At the largest scale, subagents power true multi-agent systems. Anthropic's own multi-agent research system uses the orchestrator-worker pattern: a lead agent analyzes the query, develops a strategy, and spawns specialized subagents that explore different aspects in parallel — then synthesizes their findings into a final answer. This is the architecture behind Claude's advanced research capabilities.
Orchestrator-worker: an Opus lead agent fans out parallel Sonnet subagents over different aspects, then synthesizes.
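The plan, fan-out, and synthesize flow described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: `plan`, `run_subagent`, and `synthesize` are hypothetical stand-ins that return canned strings where a real system would call a model, and a thread pool is assumed as the way to run workers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(query: str) -> list[str]:
    """Hypothetical planner: split the query into independent aspects."""
    aspects = ("background", "recent work", "open problems")
    return [f"{query}: {aspect}" for aspect in aspects]


def run_subagent(task: str) -> str:
    """Stand-in for a subagent; a real one would call a model with tools."""
    return f"findings for [{task}]"


def synthesize(query: str, findings: list[str]) -> str:
    """Stand-in for the lead agent's synthesis step."""
    bullets = "\n".join(f"- {finding}" for finding in findings)
    return f"Answer to '{query}':\n{bullets}"


def orchestrate(query: str) -> str:
    """Lead agent: plan, fan out to parallel workers, then synthesize."""
    tasks = plan(query)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # pool.map runs the workers concurrently and preserves task order.
        findings = list(pool.map(run_subagent, tasks))
    return synthesize(query, findings)


print(orchestrate("solid-state batteries"))
```

Each worker sees only its own task string and returns only its findings, which mirrors how a subagent's intermediate work stays out of the lead agent's context.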
The Numbers
This isn't a marginal improvement. In Anthropic's published results, a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on their internal research eval. The advantage was strongest for breadth-first queries that benefit from pursuing many directions at once.
| Metric | Finding |
|---|---|
| Multi-agent vs single-agent | +90.2% on Anthropic's internal research eval |
| Token usage (multi-agent) | ~15× more tokens than a chat interaction |
| Token usage (standard agent) | ~4× more tokens than chat |
| What explains performance | Token usage alone explains ~80% of the performance variance (Anthropic's BrowseComp analysis) |
| Parallel tool calling | Agents using 3+ tools at once cut research time by up to 90% |
The headline figures from Anthropic's multi-agent research system — big gains, but at a real token cost.
Performance comes from spending tokens
That ~80% figure is the crucial nuance: most of the multi-agent advantage comes from simply doing more work (more tokens, more parallel exploration). This is why multi-agent isn't a free win — it's a deliberate trade of cost for capability on tasks where that trade pays off.
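To make the cost side of that trade concrete, here is a rough back-of-the-envelope calculation using the ~4× and ~15× multipliers from the table. The 2,000-token chat baseline is an assumed number for illustration only, and real usage will vary widely by task.

```python
# Approximate multipliers relative to a chat interaction, from the table above.
AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def estimate_tokens(chat_tokens: int) -> dict[str, int]:
    """Scale a chat-sized token count to the agent and multi-agent cases."""
    return {
        "chat": chat_tokens,
        "single_agent": chat_tokens * AGENT_MULTIPLIER,
        "multi_agent": chat_tokens * MULTI_AGENT_MULTIPLIER,
    }


# Assumed baseline: a chat interaction that uses 2,000 tokens.
estimate = estimate_tokens(2_000)
print(estimate)
# {'chat': 2000, 'single_agent': 8000, 'multi_agent': 30000}

# Multi-agent relative to a single agent: 15 / 4 = 3.75x.
print(estimate["multi_agent"] / estimate["single_agent"])  # 3.75
```

On these assumptions, moving from a single agent to a multi-agent system costs roughly 3.75 times as many tokens, so the task has to be valuable enough to absorb that.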
When Multi-Agent Is Worth It
The 15× token cost means multi-agent orchestration only makes sense for the right tasks. Anthropic's guidance maps cleanly onto everything you've learned about delegation and the decision rule:
| Multi-agent is worth it for | Multi-agent is a poor fit for |
|---|---|
| High-value tasks that justify the token cost | Tasks needing shared context across all agents |
| Heavy parallelization (independent directions) | Heavy interdependencies between steps |
| Information exceeding a single context window | Real-time coordination between agents |
| Complex tool interfaces | Most coding tasks |
Multi-agent shines on parallel, high-value research; it struggles where agents must share context or coordinate tightly — including most coding.
The same decision rule, scaled up
Notice this is the 'does the intermediate work matter?' rule at system scale. Parallel research over independent directions (journey doesn't matter) → multi-agent wins. Tightly coupled work where steps depend on each other (journey matters) → keep it coordinated in one place. Most coding is the latter, which is why multi-agent is a poor fit there.
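One way to read the fit table and the decision rule together is as a small checklist. The sketch below is an illustrative encoding, not an official rubric: the `TaskProfile` fields and the `recommend` function are hypothetical names, and the yes/no answers are judgment calls you would make per task.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical yes/no answers about a task, mirroring the table's rows."""
    high_value: bool
    independent_directions: bool
    exceeds_one_context_window: bool
    needs_shared_context: bool
    interdependent_steps: bool
    needs_realtime_coordination: bool


def recommend(task: TaskProfile) -> str:
    """Return 'multi-agent' or 'single-agent' under this sketch's assumptions."""
    # Any tight-coupling signal means the intermediate work matters,
    # so the work should stay coordinated in one place.
    if (task.needs_shared_context
            or task.interdependent_steps
            or task.needs_realtime_coordination):
        return "single-agent"
    # Otherwise the token cost must be justified by value plus breadth.
    if task.high_value and (task.independent_directions
                            or task.exceeds_one_context_window):
        return "multi-agent"
    return "single-agent"


research = TaskProfile(True, True, True, False, False, False)
refactor = TaskProfile(True, False, False, True, True, False)
print(recommend(research))  # multi-agent
print(recommend(refactor))  # single-agent
```

The coupling checks come first on purpose: a high-value task that still needs shared context or step-by-step dependencies falls back to a single agent, which matches the "most coding" row of the table.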
Prompt-Engineering an Orchestrator
Building a good orchestrator is mostly about teaching it to delegate well — which is exactly the description-and-output-format discipline from earlier, applied to the lead agent:
- Give each subagent a detailed task: objective, output format, tool guidance, and clear boundaries, so it knows exactly what to return.
- Embed scaling rules: a simple query might need one agent making 3-10 tool calls; complex research warrants 10+ subagents.
- Use parallel tool calling: agents that use 3+ tools simultaneously cut research time by up to 90%.
- Start broad, then progressively narrow; give heuristics rather than rigid rules.
- Keep humans in the loop for evaluation: automation misses edge cases such as source-selection bias and hallucinations on unusual queries.
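The first two points in the list above lend themselves to a small template. The sketch below assumes a hypothetical `SubagentBrief` structure and `scaling_hint` helper. Neither is part of any Anthropic API; they only show how the four task fields and the scaling rules might be turned into prompt text.

```python
from dataclasses import dataclass


@dataclass
class SubagentBrief:
    """The four pieces of a detailed subagent task (hypothetical structure)."""
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: str

    def to_prompt(self) -> str:
        """Render the brief as plain text to hand to a subagent."""
        return (
            f"Objective: {self.objective}\n"
            f"Output format: {self.output_format}\n"
            f"Tool guidance: {self.tool_guidance}\n"
            f"Boundaries: {self.boundaries}"
        )


def scaling_hint(complexity: str) -> str:
    """Map a coarse complexity label to the scaling rules from the list above."""
    hints = {
        "simple": "Use one agent making 3-10 tool calls.",
        "complex": "Use 10+ subagents.",
    }
    if complexity not in hints:
        raise ValueError(f"unknown complexity: {complexity!r}")
    return hints[complexity]


brief = SubagentBrief(
    objective="Summarize recent work on solid-state battery electrolytes.",
    output_format="Five bullet points, each with a source.",
    tool_guidance="Prefer web search; run independent searches in parallel.",
    boundaries="Ignore pricing and market forecasts.",
)
print(brief.to_prompt())
print(scaling_hint("simple"))
```

Keeping the brief as structured fields makes it harder to forget one of them, and the scaling hints can be embedded in the lead agent's prompt as heuristics rather than enforced as hard limits.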
Next
You've now seen subagents from a single helper all the way to a multi-agent research system that scored 90.2% higher than a single agent on Anthropic's internal research eval. The final lesson consolidates everything and gives you exam-focused pointers.
Key Takeaways
- ✓ The orchestrator-worker pattern: a lead agent plans, spawns specialized subagents to explore aspects in parallel, then synthesizes. This is the architecture behind Anthropic's multi-agent research system.
- ✓ Anthropic's result: a multi-agent system (Opus 4 lead + Sonnet 4 subagents) outperformed single-agent Opus 4 by 90.2% on their internal research eval, especially for breadth-first queries.
- ✓ Token cost is real: multi-agent uses ~15× the tokens of a chat (standard agents ~4×), and token usage alone explains ~80% of performance variance.
- ✓ Multi-agent is worth it for high-value, heavily parallel tasks with info exceeding one context window; it's a poor fit for shared-context, interdependent, or real-time work, including most coding.
- ✓ This is the 'does the intermediate work matter?' decision rule applied at system scale: independent parallel work wins, tightly coupled work doesn't.
- ✓ Orchestrator prompt-engineering = detailed per-subagent tasks (objective/format/tools/boundaries), scaling rules, parallel tool calling, broad→narrow, and humans in the loop.
Check Your Understanding
Test what you learned in this lesson.
Q1. What is the orchestrator-worker pattern?
Q2. By how much did Anthropic's multi-agent system outperform single-agent Claude Opus 4 on their internal research eval?
Q3. What is the key cost nuance of multi-agent systems?
Q4. For which kind of task is multi-agent orchestration a POOR fit?