Structured Error Response Design

Core

Implement structured error responses for MCP tools · Difficulty 3/5

error-handling · mcp · structured-errors · recovery

Structured error responses enable agents to make intelligent recovery decisions instead of failing generically or retrying blindly.

The Problem with Generic Errors

Returning "Operation failed" for every error prevents the agent from:

  • Knowing whether to retry (transient) or not (validation)
  • Communicating the right message to users (business rule vs system error)
  • Deciding whether to escalate or try an alternative approach

MCP isError Flag

The MCP isError flag pattern communicates tool failures back to the agent, distinguishing errors from successful empty results.

Error Categories

| Category | Examples | Retryable | Agent Action |
|----------|----------|-----------|--------------|
| Transient | Timeout, service unavailable | Yes | Retry with backoff |
| Validation | Invalid input, bad format | No | Fix input and retry |
| Business | Policy violation, limit exceeded | No | Inform user, suggest alternative |
| Permission | Unauthorized, forbidden | No | Escalate or request credentials |
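
The category table can be read as a lookup from error category to recovery action. The sketch below is one possible agent-side encoding, not a prescribed implementation: RECOVERY_POLICY, choose_action, the action labels, and the "escalate" fallback for unknown categories are all illustrative assumptions.

```python
# Hypothetical policy table mirroring the four categories in this section.
RECOVERY_POLICY = {
    "transient": {"retryable": True, "action": "retry_with_backoff"},
    "validation": {"retryable": False, "action": "fix_input_and_retry"},
    "business": {"retryable": False, "action": "inform_user_suggest_alternative"},
    "permission": {"retryable": False, "action": "escalate_or_request_credentials"},
}


def choose_action(error: dict) -> str:
    """Pick a recovery action from a structured error payload."""
    policy = RECOVERY_POLICY.get(error.get("errorCategory"))
    if policy is None:
        # Assumption: an unknown or missing category is escalated, not retried.
        return "escalate"
    # Prefer the tool's explicit isRetryable flag over the table default.
    retryable = error.get("isRetryable", policy["retryable"])
    if policy["action"] == "retry_with_backoff" and not retryable:
        return "escalate"
    return policy["action"]
```

Keeping the policy in data rather than in branching code makes it easy to review against the table and to extend if further categories are introduced.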

Structured Error Metadata

    {
      "isError": true,
      "errorCategory": "business",
      "isRetryable": false,
      "message": "Refund exceeds $500 policy limit",
      "userFriendlyMessage": "This refund requires manager approval. Let me escalate this for you."
    }
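
On the tool side, a small helper can keep every error response in this shape. The following is a minimal sketch under stated assumptions: build_error_payload, to_tool_result, and DEFAULT_RETRYABLE are hypothetical names, and serializing the metadata as JSON text inside the result's content list is one possible way to carry it, not the only one.

```python
import json

# Default retryability per category, following the categories in this section.
DEFAULT_RETRYABLE = {
    "transient": True,
    "validation": False,
    "business": False,
    "permission": False,
}


def build_error_payload(category: str, message: str, user_friendly_message: str) -> dict:
    """Build the structured error metadata shown above."""
    if category not in DEFAULT_RETRYABLE:
        raise ValueError(f"unknown error category: {category!r}")
    return {
        "isError": True,
        "errorCategory": category,
        "isRetryable": DEFAULT_RETRYABLE[category],
        "message": message,
        "userFriendlyMessage": user_friendly_message,
    }


def to_tool_result(payload: dict) -> dict:
    """Wrap the payload as a tool result with isError set (assumed layout)."""
    return {
        "isError": True,
        "content": [{"type": "text", "text": json.dumps(payload)}],
    }


payload = build_error_payload(
    "business",
    "Refund exceeds $500 policy limit",
    "This refund requires manager approval. Let me escalate this for you.",
)
result = to_tool_result(payload)
```

Rejecting unknown categories at construction time means a typo in a category name fails loudly in the tool rather than silently degrading the agent's recovery decision.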

Access Failures vs Valid Empty Results

Critical distinction:

  • Access failure: Database timeout -- needs retry, do not treat as "no results"
  • Valid empty result: Query succeeded, 0 matches found -- accept and move on

Confusing these leads to either missed data (treating failures as empty) or wasted retries (retrying successful empty queries).
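
A short sketch of a tool that keeps the two cases apart. The names search_orders and run_query are hypothetical, and the assumption is that the data layer raises TimeoutError when it cannot be reached.

```python
def search_orders(query: str, run_query) -> dict:
    """Run a lookup and report failures separately from empty results.

    run_query is a hypothetical callable that returns a list of rows
    or raises TimeoutError when the backend does not respond.
    """
    try:
        rows = run_query(query)
    except TimeoutError as exc:
        # Access failure: the query never completed, so "no results" would be wrong.
        return {
            "isError": True,
            "errorCategory": "transient",
            "isRetryable": True,
            "message": f"Database timeout: {exc}",
        }
    # Success, possibly with zero matches: a valid answer, not an error.
    return {"isError": False, "results": rows, "count": len(rows)}
```

An agent receiving count 0 with isError false can accept the answer and move on, while the same empty-looking outcome with isError true signals that a retry is warranted.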

Local Error Recovery

Subagents should handle transient failures locally (retries, fallbacks). Propagate to the coordinator only those errors that cannot be resolved locally, along with:

  • What was attempted
  • Partial results obtained
  • The specific failure that could not be resolved
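
The pattern above might be sketched as a retry loop that reports back only after local recovery is exhausted. Everything here is an illustrative assumption: TransientError, run_with_local_recovery, the report field names, and the exponential backoff schedule.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, unavailability)."""


def run_with_local_recovery(task, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry task locally; on exhaustion, return a report for the coordinator.

    task is a callable that receives a list it may append partial results to.
    """
    attempted = []
    partial_results = []
    last_error = None
    for attempt in range(1, attempts + 1):
        attempted.append(f"attempt {attempt}")
        try:
            return {"isError": False, "result": task(partial_results)}
        except TransientError as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Local recovery failed: propagate with context instead of a bare failure.
    return {
        "isError": True,
        "errorCategory": "transient",
        "attempted": attempted,                 # what was attempted
        "partialResults": partial_results,      # partial results obtained
        "message": f"Unresolved after {attempts} attempts: {last_error}",
    }
```

Only TransientError is caught, so validation, business, and permission failures surface immediately rather than being retried. The sleep parameter is injectable so the loop can be exercised in tests without real delays.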

Key Takeaways

  • Use the MCP isError flag to distinguish tool failures from successful empty results
  • Include errorCategory, isRetryable, and human-readable descriptions in error responses
  • Generic 'Operation failed' errors prevent intelligent agent recovery
  • Handle transient errors locally in subagents; propagate only unresolvable errors with context