Error Recovery & Retry Patterns
Advanced · Implement structured error responses for MCP tools · Difficulty 3/5
error-recovery · retry · resilience · error-handling
Effective error recovery requires distinguishing error types and applying the right recovery strategy for each.
Retryable vs Non-Retryable Errors
Returning structured metadata with isRetryable prevents wasted retry attempts:
| Error Type | Retryable | Recovery Strategy |
|------------|-----------|-------------------|
| Timeout / service unavailable | Yes | Retry with exponential backoff |
| Rate limit exceeded | Yes | Wait for reset window, then retry |
| Invalid input format | No | Fix input parameters, then retry |
| Policy violation (e.g., refund > $500) | No | Inform user, escalate, or suggest alternative |
| Permission denied | No | Escalate to human or request credentials |
| File corruption | No | Report failure, do not retry |
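A minimal sketch of how the table above could be encoded in a tool's error payload. Only `isRetryable` comes from the text; the category names, the `make_error` helper, and the other field names (`isError`, `errorCategory`, `retryAfterSeconds`) are assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical category names; the grouping follows the table above.
RETRYABLE = {"timeout", "service_unavailable", "rate_limited"}
NON_RETRYABLE = {
    "invalid_input",
    "policy_violation",
    "permission_denied",
    "file_corruption",
}


def make_error(category: str, message: str,
               retry_after_seconds: Optional[float] = None) -> dict:
    """Build a structured error payload for a tool response (illustrative)."""
    if category not in RETRYABLE | NON_RETRYABLE:
        raise ValueError(f"unknown error category: {category}")
    payload = {
        "isError": True,
        "errorCategory": category,
        "isRetryable": category in RETRYABLE,
        "message": message,
    }
    # Only retryable errors carry a wait hint, e.g. a rate-limit reset window.
    if payload["isRetryable"] and retry_after_seconds is not None:
        payload["retryAfterSeconds"] = retry_after_seconds
    return payload


rate_limited = make_error("rate_limited", "Rate limit exceeded",
                          retry_after_seconds=30)
policy = make_error("policy_violation",
                    "Refunds over $500 need manual approval")
```

A caller can then branch on `isRetryable` alone instead of parsing the message text, which is what keeps it from retrying a policy violation.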
Business Rule Violations
For business errors, include:
- retriable: false to prevent pointless retries
Local vs Propagated Recovery
Subagents should handle transient failures locally and propagate to the coordinator only when local recovery has been exhausted.
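One way a subagent might retry locally with exponential backoff before handing the error to the coordinator is sketched below. `TransientError`, `call_with_local_retry`, the attempt limit, the delays, and the shape of the returned dictionaries are all hypothetical; the injectable `sleep` parameter exists only so the example can be exercised without real waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts and the like)."""


def call_with_local_retry(operation, max_attempts=3, base_delay=0.5,
                          sleep=time.sleep):
    """Retry a transient failure locally; report upward only when exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "ok", "result": operation()}
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Delay doubles on each attempt: 0.5s, 1.0s, 2.0s, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Local recovery is exhausted, so hand the coordinator a structured error.
    return {
        "status": "error",
        "isRetryable": True,
        "attempts": max_attempts,
        "message": str(last_error),
    }
```

Other exception types are deliberately not caught here, so non-transient failures surface immediately instead of being retried.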
Access Failures vs Valid Empty Results
This distinction is critical and commonly confused:
Treating an access failure as an empty result means missing data. Treating an empty result as a failure wastes retries and may cause incorrect escalation.
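The difference can be made explicit in the tool's return value, as in this sketch. The `search_orders` function, the `fetch` callable, and the result fields are hypothetical; the point is only that a successful query with zero rows and a failed query produce different, unambiguous shapes.

```python
def search_orders(customer_id, fetch):
    """Query orders, keeping 'could not reach the data' apart from 'no data'."""
    try:
        rows = list(fetch(customer_id))
    except (ConnectionError, TimeoutError) as exc:
        # Access failure: the query never completed, so nothing is known yet.
        return {
            "status": "error",
            "isRetryable": True,
            "message": f"order lookup failed: {exc}",
        }
    # The query succeeded; an empty list is a valid answer, not an error.
    return {"status": "ok", "results": rows, "count": len(rows)}
```

With this shape, a caller retries only when `status` is `"error"` and accepts `count == 0` as a final answer.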
Key Takeaways
- ✓ Structured isRetryable metadata prevents wasted retry attempts on non-retryable errors
- ✓ Business rule violations need retriable: false plus customer-friendly explanations
- ✓ Subagents should exhaust local recovery before propagating errors to the coordinator
- ✓ Never confuse access failures (retry) with valid empty results (accept)