Context Management & Reliability
Manage conversation context across long interactions, design escalation and ambiguity-resolution patterns, implement error-propagation strategies, manage context during codebase exploration, design human review workflows, and preserve information provenance.
Manage conversation context to preserve critical information across long interactions
Techniques for managing context windows, mitigating position effects, and preserving critical details across long conversations.
Knowledge of:
- Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries
- The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections
- How tool results accumulate in context and consume tokens disproportionately to their relevance (e.g., 40+ fields per order lookup when only 5 are relevant)
- The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence
Skills in:
- Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history
- Extracting and persisting structured issue data (order IDs, amounts, statuses) into a separate context layer for multi-issue sessions
- Trimming verbose tool outputs to only relevant fields before they accumulate in context (e.g., keeping only return-relevant fields from order lookups)
- Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects
- Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis
- Modifying upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains when downstream agents have limited context budgets
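The first few skills above can be sketched in a few lines: trim a verbose order lookup to its return-relevant fields, merge the exact values into a persistent case-facts block, and include that block verbatim in every prompt, outside the lossy summary. The field names and prompt layout are illustrative assumptions, not a real tool schema.

```python
# Illustrative assumption, not a real tool schema: only these order-lookup
# fields matter for a return workflow.
RETURN_RELEVANT_FIELDS = {"order_id", "amount", "status", "order_date",
                          "return_window_ends"}

def trim_tool_output(order_record: dict) -> dict:
    """Drop the dozens of irrelevant fields before they accumulate in context."""
    return {k: v for k, v in order_record.items() if k in RETURN_RELEVANT_FIELDS}

def update_case_facts(case_facts: dict, trimmed: dict) -> dict:
    """Merge exact amounts, dates, and statuses into the persistent block."""
    merged = dict(case_facts)
    merged.update(trimmed)
    return merged

def build_prompt(case_facts: dict, summarized_history: str, user_message: str) -> str:
    """The case-facts block travels verbatim in every prompt, outside the
    lossy conversation summary."""
    facts = "\n".join(f"- {k}: {v}" for k, v in sorted(case_facts.items()))
    return (f"CASE FACTS (authoritative, do not paraphrase):\n{facts}\n\n"
            f"CONVERSATION SUMMARY:\n{summarized_history}\n\n"
            f"CUSTOMER:\n{user_message}")

raw = {"order_id": "A-1001", "amount": 129.99, "status": "delivered",
       "order_date": "2024-11-02", "return_window_ends": "2024-12-02",
       "warehouse_bin": "7F", "carrier_scan_count": 14}  # plus ~35 more fields
facts = update_case_facts({}, trim_tool_output(raw))
prompt = build_prompt(facts, "Customer wants to return a jacket.",
                      "Can I still return it?")
```

The key design choice is that the case-facts block is never summarized: it is rebuilt from structured data on every turn, so amounts and dates survive however aggressively the history is compressed.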
Context Windows & Provision Strategies
✎ Core: Context window = input tokens + output tokens combined
Lost in the Middle & Position Effects
✎ Core: Models attend best to the beginning and end of context, less to the middle
Progressive Summarization Risks
✎ Core: Extract critical transactional facts into a persistent block outside summarized history
Context Token Management & Caching
✓ Advanced: Trim verbose tool outputs to only relevant fields before they accumulate in context
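One concrete way to mitigate position effects when aggregating subagent output is to lead with a key-findings summary and mark each detail section with an explicit header, so critical facts sit in the well-attended start of the context. The layout below is one possible convention, not a prescribed format.

```python
def aggregate_results(key_findings: list[str], sections: dict[str, str]) -> str:
    """Place the findings summary first, then detail sections under explicit
    headers, so critical facts avoid the poorly-attended middle."""
    parts = ["KEY FINDINGS:"]
    parts += [f"- {finding}" for finding in key_findings]
    for title, body in sections.items():
        parts.append(f"\n=== {title.upper()} ===\n{body}")
    return "\n".join(parts)

doc = aggregate_results(
    ["Refund approved for order A-1001", "Policy excludes final-sale items"],
    {"order lookup": "...trimmed order record...",
     "policy check": "...relevant policy excerpts..."},
)
```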
Design effective escalation and ambiguity resolution patterns
Defining when to escalate to humans and how to handle ambiguous tool results safely.
Knowledge of:
- Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps (not just complex cases), and inability to make meaningful progress
- The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward
- Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity
- Why tool results with multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection
Skills in:
- Adding explicit escalation criteria with few-shot examples to the system prompt demonstrating when to escalate versus resolve autonomously
- Honoring explicit customer requests for human agents immediately without first attempting investigation
- Acknowledging frustration while offering resolution when the issue is within the agent's capability, escalating only if the customer reiterates their preference
- Escalating when policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching when policy only addresses own-site adjustments)
- Instructing the agent to ask for additional identifiers when tool results return multiple matches, rather than selecting based on heuristics
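The triggers above can be condensed into a decision sketch. The boolean inputs are assumptions standing in for the agent's own assessments; note that sentiment and self-reported confidence scores are deliberately absent from the inputs.

```python
from dataclasses import dataclass

@dataclass
class EscalationDecision:
    escalate: bool
    reason: str

def decide(customer_demands_human: bool, policy_covers_request: bool,
           multiple_customer_matches: bool, progress_stalled: bool) -> EscalationDecision:
    """Escalation triggers in priority order. Inputs are hypothetical flags
    derived from the agent's own assessment of the case."""
    if customer_demands_human:
        # Honor explicit requests immediately, without investigating first.
        return EscalationDecision(True, "customer requested a human")
    if multiple_customer_matches:
        # Not an escalation: ask for another identifier instead of guessing.
        return EscalationDecision(False, "ask for an additional identifier")
    if not policy_covers_request:
        # Policy gap, e.g. competitor price matching when policy only
        # addresses own-site adjustments.
        return EscalationDecision(True, "policy is silent or ambiguous")
    if progress_stalled:
        return EscalationDecision(True, "no meaningful progress possible")
    return EscalationDecision(False, "resolve autonomously")
```

In a real system these checks would live in the system prompt as explicit criteria with few-shot examples; the sketch just makes the priority ordering visible.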
Escalation Criteria & Patterns
✎ Core: Escalate for genuine policy gaps, not just complexity
Ambiguous Result Handling
✎ Core: Ask for clarification on ambiguous results rather than guessing
Implement error propagation strategies across multi-agent systems
Structured error handling, failure type distinction, and graceful degradation in multi-agent architectures.
Knowledge of:
- Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions
- The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches)
- Why generic error statuses ("search unavailable") hide valuable context from the coordinator
- Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns
Skills in:
- Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery
- Distinguishing access failures from valid empty results in error reporting so the coordinator can make appropriate decisions
- Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve, including what was attempted and partial results
- Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources
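A structured error result along these lines gives the coordinator enough context to recover intelligently, and distinguishes a valid empty result from an access failure. The `FailureType` names and the retry policy are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

class FailureType(Enum):
    ACCESS_FAILURE = "access_failure"  # timeout/auth: coordinator may retry
    EMPTY_RESULT = "empty_result"      # query succeeded, nothing matched

@dataclass
class SubagentResult:
    ok: bool
    failure_type: Optional[FailureType] = None
    attempted_query: Optional[str] = None
    partial_results: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)

def search_with_local_recovery(query: str, search_fn: Callable,
                               retries: int = 2) -> SubagentResult:
    """Retry transient failures locally; propagate structured context only
    for errors the subagent cannot resolve itself."""
    for _ in range(retries + 1):
        try:
            hits = search_fn(query)
        except TimeoutError:
            continue  # transient: recover locally before involving the coordinator
        if not hits:
            # A valid empty result is NOT an error; report it as success.
            return SubagentResult(ok=True, failure_type=FailureType.EMPTY_RESULT,
                                  attempted_query=query)
        return SubagentResult(ok=True, partial_results=hits)
    return SubagentResult(ok=False, failure_type=FailureType.ACCESS_FAILURE,
                          attempted_query=query,
                          alternatives=["retry against a cached index",
                                        "narrow the date range"])
```

Returning what was attempted and which alternatives remain is what lets the coordinator choose between retrying, rerouting, or degrading gracefully, instead of receiving an opaque "search unavailable".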
Error Propagation in Multi-Agent Systems
✎ Core: Handle errors at the lowest level capable of resolving them
Graceful Degradation with Transparency
✎ Core: Continue operating with partial data but annotate gaps transparently
Manage context effectively in large codebase exploration
Strategies for managing context degradation, scratchpad persistence, and crash recovery during extended codebase exploration.
Knowledge of:
- Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
- The role of scratchpad files for persisting key findings across context boundaries
- Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding
- Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume
Skills in:
- Spawning subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while the main agent preserves high-level coordination
- Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation
- Summarizing key findings from one exploration phase before spawning subagents for the next phase, injecting summaries into initial context
- Designing crash recovery using structured agent state exports (manifests) that the coordinator loads on resume and injects into agent prompts
- Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output
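Crash recovery via structured state exports might look like the sketch below: each agent writes findings to a known location, and the coordinator loads a manifest on resume to inject into agent prompts. The directory layout and JSON schema are assumptions.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a writable state directory known to all agents.
STATE_DIR = Path(tempfile.mkdtemp()) / "exploration_state"

def export_agent_state(agent_id: str, findings: dict) -> Path:
    """Each agent exports its state to the known location after each phase."""
    STATE_DIR.mkdir(parents=True, exist_ok=True)
    path = STATE_DIR / f"{agent_id}.json"
    path.write_text(json.dumps(findings, indent=2))
    return path

def load_manifest() -> dict:
    """On resume after a crash, the coordinator loads every exported state
    file and injects the resulting manifest into agent prompts."""
    return {p.stem: json.loads(p.read_text()) for p in STATE_DIR.glob("*.json")}

export_agent_state("test-mapper", {"test_files": ["tests/test_refund.py"]})
export_agent_state("refund-tracer", {"entry_point": "billing/refund.py"})
manifest = load_manifest()
```

The same files double as scratchpads during normal operation: because findings live outside the context window, they survive both /compact and outright crashes.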
Context Budget Management & Upstream Reduction
✎ Core: Reduce data volume at the source rather than trying to handle large inputs downstream
Codebase Exploration Context Strategies
✎ Core: Use scratchpad files to externalize findings beyond the context window
Design human review workflows and confidence calibration
Stratified sampling, confidence calibration, and routing strategies for human review of AI outputs.
Knowledge of:
- The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields
- Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns
- Field-level confidence scores calibrated using labeled validation sets for routing review attention
- The importance of validating accuracy by document type and field segment before automating high-confidence extractions
Skills in:
- Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection
- Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review
- Having models output field-level confidence scores, then calibrating review thresholds using labeled validation sets
- Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity
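The sampling and calibration skills above can be sketched as two helpers. The stratum keys (`doc_type`, `field`), the 98% target, and the threshold sweep are all illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(extractions: list[dict], per_stratum: int,
                      seed: int = 0) -> list[dict]:
    """Sample high-confidence extractions within each (doc_type, field)
    stratum, so a 97% aggregate cannot hide a weak segment."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for e in extractions:
        strata[(e["doc_type"], e["field"])].append(e)
    sample = []
    for items in strata.values():
        sample += rng.sample(items, min(per_stratum, len(items)))
    return sample

def calibrate_threshold(validation: list[dict],
                        target_accuracy: float = 0.98) -> float:
    """Lowest confidence threshold whose auto-accepted pool on a labeled
    validation set still meets the target accuracy."""
    for threshold in [round(t * 0.01, 2) for t in range(50, 100)]:
        pool = [v for v in validation if v["confidence"] >= threshold]
        if pool and sum(v["correct"] for v in pool) / len(pool) >= target_accuracy:
            return threshold
    return 1.0  # nothing auto-accepted; everything routes to human review
```

Calibrating against labeled data rather than intuition means the threshold moves when the model or document mix changes, which is exactly when intuition fails.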
Human Review Workflow Design
✎ Core: Aggregate accuracy metrics can mask poor performance on specific segments
Confidence Calibration & Review Thresholds
✓ Advanced: Calibrate confidence thresholds using labeled validation sets, not intuition
Preserve information provenance and handle uncertainty in multi-source synthesis
Maintaining source attribution, handling conflicting data, and preserving temporal context through synthesis pipelines.
Knowledge of:
- How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings
- The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings
- How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value
- Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions
Skills in:
- Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis
- Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations and methodological context
- Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis
- Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation
- Rendering different content types appropriately in synthesis outputs -- financial data as tables, news as prose, technical findings as structured lists -- rather than converting everything to a uniform format
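A minimal sketch of claim-source mappings, assuming a simple `Claim` record: conflicting values are grouped and flagged with attribution and dates rather than one value being silently chosen. The example figures and sources are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    statement: str
    value: str
    source: str   # URL or document name
    excerpt: str  # relevant excerpt preserved through synthesis
    as_of: str    # publication/collection date, required

def merge_claims(claims: list[Claim]) -> dict:
    """Group claims by statement; conflicting values are kept side by side
    with attribution instead of one being arbitrarily selected."""
    merged: dict[str, dict] = {}
    for c in claims:
        entry = merged.setdefault(c.statement, {"values": [], "conflict": False})
        entry["values"].append({"value": c.value, "source": c.source,
                                "excerpt": c.excerpt, "as_of": c.as_of})
        distinct = {v["value"] for v in entry["values"]}
        # Different as_of dates may explain differing values; flag the
        # conflict for the coordinator, never resolve it here.
        entry["conflict"] = len(distinct) > 1
    return merged

report = merge_claims([
    Claim("global EV market share", "14%", "iea.org",
          "...14% of new car sales...", "2023-04"),
    Claim("global EV market share", "18%", "bloomberg.com",
          "...reached 18% of sales...", "2024-01"),
])
```

Because the `as_of` dates ride along with each value, the synthesis step can recognize this as growth over time rather than a contradiction between sources.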
Handling Conflicting Data Sources
✎ Core: Preserve conflicting data with source attribution; don't choose one
Information Provenance & Claim-Source Mappings
✎ Core: Require structured claim-source mappings from subagents to preserve attribution