Context Management & Reliability
Manage conversation context across long interactions, design escalation and ambiguity-resolution patterns, implement error-propagation strategies, manage context during codebase exploration, design human review workflows, and preserve information provenance.
Manage conversation context to preserve critical information across long interactions
Techniques for managing context windows, mitigating position effects, and preserving critical details across long conversations.
Knowledge of:
- Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries
- The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections
- How tool results accumulate in context and consume tokens disproportionately to their relevance (e.g., 40+ fields per order lookup when only 5 are relevant)
- The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence
Skills in:
- Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history
- Extracting and persisting structured issue data (order IDs, amounts, statuses) into a separate context layer for multi-issue sessions
- Trimming verbose tool outputs to only relevant fields before they accumulate in context (e.g., keeping only return-relevant fields from order lookups)
- Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects
- Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis
- Modifying upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains when downstream agents have limited context budgets
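The first few skills above can be sketched in a few lines: trim a verbose order lookup to its return-relevant fields, merge the exact values into a persistent case-facts block, and include that block verbatim in every prompt, outside the lossy summary. The field names and prompt layout are illustrative assumptions, not a real tool schema.

```python
# Illustrative assumption, not a real tool schema: only these order-lookup
# fields matter for a return workflow.
RETURN_RELEVANT_FIELDS = {"order_id", "amount", "status", "order_date",
                          "return_window_ends"}

def trim_tool_output(order_record: dict) -> dict:
    """Drop the dozens of irrelevant fields before they accumulate in context."""
    return {k: v for k, v in order_record.items() if k in RETURN_RELEVANT_FIELDS}

def update_case_facts(case_facts: dict, trimmed: dict) -> dict:
    """Merge exact amounts, dates, and statuses into the persistent block."""
    merged = dict(case_facts)
    merged.update(trimmed)
    return merged

def build_prompt(case_facts: dict, summarized_history: str, user_message: str) -> str:
    """The case-facts block travels verbatim in every prompt, outside the
    lossy conversation summary."""
    facts = "\n".join(f"- {k}: {v}" for k, v in sorted(case_facts.items()))
    return (f"CASE FACTS (authoritative, do not paraphrase):\n{facts}\n\n"
            f"CONVERSATION SUMMARY:\n{summarized_history}\n\n"
            f"CUSTOMER:\n{user_message}")

raw = {"order_id": "A-1001", "amount": 129.99, "status": "delivered",
       "order_date": "2024-11-02", "return_window_ends": "2024-12-02",
       "warehouse_bin": "7F", "carrier_scan_count": 14}  # plus ~35 more fields
facts = update_case_facts({}, trim_tool_output(raw))
prompt = build_prompt(facts, "Customer wants to return a jacket.",
                      "Can I still return it?")
```

The key design choice is that the case-facts block is never summarized: it is rebuilt from structured data on every turn, so amounts and dates survive however aggressively the history is compressed.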
Context Windows & Provision Strategies
✎ Core: Context window = input tokens + output tokens combined
Lost in the Middle & Position Effects
✎ Core: Models attend best to the beginning and end of context, less to the middle
Progressive Summarization Risks
✎ Core: Extract critical transactional facts into a persistent block outside summarized history
Context Token Management & Caching
✓ Advanced: Trim verbose tool outputs to only relevant fields before they accumulate in context
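One concrete way to mitigate position effects when aggregating subagent output is to lead with a key-findings summary and mark each detail section with an explicit header, so critical facts sit in the well-attended start of the context. The layout below is one possible convention, not a prescribed format.

```python
def aggregate_results(key_findings: list[str], sections: dict[str, str]) -> str:
    """Place the findings summary first, then detail sections under explicit
    headers, so critical facts avoid the poorly-attended middle."""
    parts = ["KEY FINDINGS:"]
    parts += [f"- {finding}" for finding in key_findings]
    for title, body in sections.items():
        parts.append(f"\n=== {title.upper()} ===\n{body}")
    return "\n".join(parts)

doc = aggregate_results(
    ["Refund approved for order A-1001", "Policy excludes final-sale items"],
    {"order lookup": "...trimmed order record...",
     "policy check": "...relevant policy excerpts..."},
)
```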
Design effective escalation and ambiguity resolution patterns
Defining when to escalate to humans and how to handle ambiguous tool results safely.
Knowledge of:
- Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps (not just complex cases), and inability to make meaningful progress
- The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward
- Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity
- Why tool results with multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection
Skills in:
- Adding explicit escalation criteria with few-shot examples to the system prompt demonstrating when to escalate versus resolve autonomously
- Honoring explicit customer requests for human agents immediately without first attempting investigation
- Acknowledging frustration while offering resolution when the issue is within the agent's capability, escalating only if the customer reiterates their preference
- Escalating when policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching when policy only addresses own-site adjustments)
- Instructing the agent to ask for additional identifiers when tool results return multiple matches, rather than selecting based on heuristics
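The triggers above can be condensed into a decision sketch. The boolean inputs are assumptions standing in for the agent's own assessments; note that sentiment and self-reported confidence scores are deliberately absent from the inputs.

```python
from dataclasses import dataclass

@dataclass
class EscalationDecision:
    escalate: bool
    reason: str

def decide(customer_demands_human: bool, policy_covers_request: bool,
           multiple_customer_matches: bool, progress_stalled: bool) -> EscalationDecision:
    """Escalation triggers in priority order. Inputs are hypothetical flags
    derived from the agent's own assessment of the case."""
    if customer_demands_human:
        # Honor explicit requests immediately, without investigating first.
        return EscalationDecision(True, "customer requested a human")
    if multiple_customer_matches:
        # Not an escalation: ask for another identifier instead of guessing.
        return EscalationDecision(False, "ask for an additional identifier")
    if not policy_covers_request:
        # Policy gap, e.g. competitor price matching when policy only
        # addresses own-site adjustments.
        return EscalationDecision(True, "policy is silent or ambiguous")
    if progress_stalled:
        return EscalationDecision(True, "no meaningful progress possible")
    return EscalationDecision(False, "resolve autonomously")
```

In a real system these checks would live in the system prompt as explicit criteria with few-shot examples; the sketch just makes the priority ordering visible.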
Escalation Criteria & Patterns
✎ Core: Escalate for genuine policy gaps, not just complexity
Ambiguous Result Handling
✎ Core: Ask for clarification on ambiguous results rather than guessing
Implement error propagation strategies across multi-agent systems
Structured error handling, failure type distinction, and graceful degradation in multi-agent architectures.
Knowledge of:
- Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions
- The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches)
- Why generic error statuses ("search unavailable") hide valuable context from the coordinator
- Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns
Skills in:
- Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery
- Distinguishing access failures from valid empty results in error reporting so the coordinator can make appropriate decisions
- Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve, including what was attempted and partial results
- Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources
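A structured error result along these lines gives the coordinator enough context to recover intelligently, and distinguishes a valid empty result from an access failure. The `FailureType` names and the retry policy are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

class FailureType(Enum):
    ACCESS_FAILURE = "access_failure"  # timeout/auth: coordinator may retry
    EMPTY_RESULT = "empty_result"      # query succeeded, nothing matched

@dataclass
class SubagentResult:
    ok: bool
    failure_type: Optional[FailureType] = None
    attempted_query: Optional[str] = None
    partial_results: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)

def search_with_local_recovery(query: str, search_fn: Callable,
                               retries: int = 2) -> SubagentResult:
    """Retry transient failures locally; propagate structured context only
    for errors the subagent cannot resolve itself."""
    for _ in range(retries + 1):
        try:
            hits = search_fn(query)
        except TimeoutError:
            continue  # transient: recover locally before involving the coordinator
        if not hits:
            # A valid empty result is NOT an error; report it as success.
            return SubagentResult(ok=True, failure_type=FailureType.EMPTY_RESULT,
                                  attempted_query=query)
        return SubagentResult(ok=True, partial_results=hits)
    return SubagentResult(ok=False, failure_type=FailureType.ACCESS_FAILURE,
                          attempted_query=query,
                          alternatives=["retry against a cached index",
                                        "narrow the date range"])
```

Returning what was attempted and which alternatives remain is what lets the coordinator choose between retrying, rerouting, or degrading gracefully, instead of receiving an opaque "search unavailable".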
Error Propagation in Multi-Agent Systems
✎ Core: Handle errors at the lowest level capable of resolving them
Graceful Degradation with Transparency
✎ Core: Continue operating with partial data but annotate gaps transparently
Manage context effectively in large codebase exploration
Strategies for managing context degradation, scratchpad persistence, and crash recovery during extended codebase exploration.
Knowledge of:
- Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
- The role of scratchpad files for persisting key findings across context boundaries
- Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding
- Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume
Skills in:
- Spawning subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while the main agent preserves high-level coordination
- Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation
- Summarizing key findings from one exploration phase before spawning subagents for the next phase, injecting summaries into initial context
- Designing crash recovery using structured agent state exports (manifests) that the coordinator loads on resume and injects into agent prompts
- Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output
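Crash recovery via structured state exports might look like the sketch below: each agent writes findings to a known location, and the coordinator loads a manifest on resume to inject into agent prompts. The directory layout and JSON schema are assumptions.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a writable state directory known to all agents.
STATE_DIR = Path(tempfile.mkdtemp()) / "exploration_state"

def export_agent_state(agent_id: str, findings: dict) -> Path:
    """Each agent exports its state to the known location after each phase."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    path = STATE_DIR / f"{agent_id}.json"
    path.write_text(json.dumps(findings, indent=2))
    return path

def load_manifest() -> dict:
    """On resume after a crash, the coordinator loads every exported state
    file and injects the resulting manifest into agent prompts."""
    return {p.stem: json.loads(p.read_text()) for p in STATE_DIR.glob("*.json")}

export_agent_state("test-mapper", {"test_files": ["tests/test_refund.py"]})
export_agent_state("refund-tracer", {"entry_point": "billing/refund.py"})
manifest = load_manifest()
```

The same files double as scratchpads during normal operation: because findings live outside the context window, they survive both /compact and outright crashes.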
Context Budget Management & Upstream Reduction
✎ Core: Reduce data volume at the source rather than trying to handle large inputs downstream
Codebase Exploration Context Strategies
✎ Core: Use scratchpad files to externalize findings beyond the context window
Design human review workflows and confidence calibration
Stratified sampling, confidence calibration, and routing strategies for human review of AI outputs.
Knowledge of:
- The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields
- Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns
- Field-level confidence scores calibrated using labeled validation sets for routing review attention
- The importance of validating accuracy by document type and field segment before automating high-confidence extractions
Skills in:
- Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection
- Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review
- Having models output field-level confidence scores, then calibrating review thresholds using labeled validation sets
- Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity
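The sampling and calibration skills above can be sketched as two helpers. The stratum keys (`doc_type`, `field`), the 98% target, and the threshold sweep are all illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(extractions: list[dict], per_stratum: int,
                      seed: int = 0) -> list[dict]:
    """Sample high-confidence extractions within each (doc_type, field)
    stratum, so a 97% aggregate cannot hide a weak segment."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for e in extractions:
        strata[(e["doc_type"], e["field"])].append(e)
    sample = []
    for items in strata.values():
        sample += rng.sample(items, min(per_stratum, len(items)))
    return sample

def calibrate_threshold(validation: list[dict],
                        target_accuracy: float = 0.98) -> float:
    """Lowest confidence threshold whose auto-accepted pool on a labeled
    validation set still meets the target accuracy."""
    for threshold in [round(t * 0.01, 2) for t in range(50, 100)]:
        pool = [v for v in validation if v["confidence"] >= threshold]
        if pool and sum(v["correct"] for v in pool) / len(pool) >= target_accuracy:
            return threshold
    return 1.0  # nothing auto-accepted; everything routes to human review
```

Calibrating against labeled data rather than intuition means the threshold moves when the model or document mix changes, which is exactly when intuition fails.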
Human Review Workflow Design
✎ Core: Aggregate accuracy metrics can mask poor performance on specific segments
Confidence Calibration & Review Thresholds
✓ Advanced: Calibrate confidence thresholds using labeled validation sets, not intuition
Preserve information provenance and handle uncertainty in multi-source synthesis
Maintaining source attribution, handling conflicting data, and preserving temporal context through synthesis pipelines.
Knowledge of:
- How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings
- The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings
- How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value
- Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions
Skills in:
- Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis
- Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations and methodological context
- Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis
- Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation
- Rendering different content types appropriately in synthesis outputs -- financial data as tables, news as prose, technical findings as structured lists -- rather than converting everything to a uniform format
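A minimal sketch of claim-source mappings, assuming a simple `Claim` record: conflicting values are grouped and flagged with attribution and dates rather than one value being silently chosen. The example figures and sources are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    statement: str
    value: str
    source: str   # URL or document name
    excerpt: str  # relevant excerpt preserved through synthesis
    as_of: str    # publication/collection date, required

def merge_claims(claims: list[Claim]) -> dict:
    """Group claims by statement; conflicting values are kept side by side
    with attribution instead of one being arbitrarily selected."""
    merged: dict[str, dict] = {}
    for c in claims:
        entry = merged.setdefault(c.statement, {"values": [], "conflict": False})
        entry["values"].append({"value": c.value, "source": c.source,
                                "excerpt": c.excerpt, "as_of": c.as_of})
        distinct = {v["value"] for v in entry["values"]}
        # Different as_of dates may explain differing values; flag the
        # conflict for the coordinator, never resolve it here.
        entry["conflict"] = len(distinct) > 1
    return merged

report = merge_claims([
    Claim("global EV market share", "14%", "iea.org",
          "...14% of new car sales...", "2023-04"),
    Claim("global EV market share", "18%", "bloomberg.com",
          "...reached 18% of sales...", "2024-01"),
])
```

Because the `as_of` dates ride along with each value, the synthesis step can recognize this as growth over time rather than a contradiction between sources.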
Handling Conflicting Data Sources
✎ Core: Preserve conflicting data with source attribution; don't choose one
Information Provenance & Claim-Source Mappings
✎ Core: Require structured claim-source mappings from subagents to preserve attribution