Context Management & Reliability

Manage conversation context across long interactions, design escalation and ambiguity resolution, implement error propagation, handle codebase exploration, design human review workflows, and preserve information provenance.

15% of exam6 task statements15 concepts57 questions
Domain Mastery
0%
This domain is 15% of the Claude SA exam — see full Exam Guide

Manage conversation context to preserve critical information across long interactions

Techniques for managing context windows, mitigating position effects, and preserving critical details across long conversations.

Knowledge of:

  • Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries
  • The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections
  • How tool results accumulate in context and consume tokens disproportionately to their relevance (e.g., 40+ fields per order lookup when only 5 are relevant)
  • The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence

Skills in:

  • Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history
  • Extracting and persisting structured issue data (order IDs, amounts, statuses) into a separate context layer for multi-issue sessions
  • Trimming verbose tool outputs to only relevant fields before they accumulate in context (e.g., keeping only return-relevant fields from order lookups)
  • Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects
  • Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis
  • Modifying upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains when downstream agents have limited context budgets

Design effective escalation and ambiguity resolution patterns

Defining when to escalate to humans and how to handle ambiguous tool results safely.

Knowledge of:

  • Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps (not just complex cases), and inability to make meaningful progress
  • The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward
  • Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity
  • How multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection

Skills in:

  • Adding explicit escalation criteria with few-shot examples to the system prompt demonstrating when to escalate versus resolve autonomously
  • Honoring explicit customer requests for human agents immediately without first attempting investigation
  • Acknowledging frustration while offering resolution when the issue is within the agent's capability, escalating only if the customer reiterates their preference
  • Escalating when policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching when policy only addresses own-site adjustments)
  • Instructing the agent to ask for additional identifiers when tool results return multiple matches, rather than selecting based on heuristics

Implement error propagation strategies across multi-agent systems

Structured error handling, failure type distinction, and graceful degradation in multi-agent architectures.

Knowledge of:

  • Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions
  • The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches)
  • Why generic error statuses ("search unavailable") hide valuable context from the coordinator
  • Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns

Skills in:

  • Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery
  • Distinguishing access failures from valid empty results in error reporting so the coordinator can make appropriate decisions
  • Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve, including what was attempted and partial results
  • Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources

Manage context effectively in large codebase exploration

Strategies for managing context degradation, scratchpad persistence, and crash recovery during extended codebase exploration.

Knowledge of:

  • Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
  • The role of scratchpad files for persisting key findings across context boundaries
  • Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding
  • Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume

Skills in:

  • Spawning subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while the main agent preserves high-level coordination
  • Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation
  • Summarizing key findings from one exploration phase before spawning sub-agents for the next phase, injecting summaries into initial context
  • Designing crash recovery using structured agent state exports (manifests) that the coordinator loads on resume and injects into agent prompts
  • Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output

Design human review workflows and confidence calibration

Stratified sampling, confidence calibration, and routing strategies for human review of AI outputs.

Knowledge of:

  • The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields
  • Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns
  • Field-level confidence scores calibrated using labeled validation sets for routing review attention
  • The importance of validating accuracy by document type and field segment before automating high-confidence extractions

Skills in:

  • Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection
  • Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review
  • Having models output field-level confidence scores, then calibrating review thresholds using labeled validation sets
  • Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity

Preserve information provenance and handle uncertainty in multi-source synthesis

Maintaining source attribution, handling conflicting data, and preserving temporal context through synthesis pipelines.

Knowledge of:

  • How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings
  • The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings
  • How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value
  • Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions

Skills in:

  • Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis
  • Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations and methodological context
  • Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis
  • Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation
  • Rendering different content types appropriately in synthesis outputs -- financial data as tables, news as prose, technical findings as structured lists -- rather than converting everything to a uniform format