CCA-F
CCA-F CourseConcept LibraryExam GuidePractice QuestionsMock TestsCode LabsCheat SheetsStudy Plan
Start Learning
Start Free
← Back to Quick Reference

Domain 1: Agentic Architecture & Orchestration

27% of exam
ts-1.1

Design and implement agentic loops for autonomous task execution

Key Points

  • The agentic loop lifecycle: send request, inspect stop_reason, execute tools, append results, repeat until end_turn.
  • stop_reason is the sole authoritative signal for loop control -- not text parsing, not iteration counts.
  • Tool results must be appended to conversation history so Claude can reason about the next action.
  • Model-driven tool selection (Claude decides which tool based on context) is the default; pre-configured sequences are for strict compliance.
  • Each iteration should include the full conversation context so Claude maintains coherent reasoning.

Decision Rules

When: stop_reason === 'tool_use'

→Execute the requested tool(s), append results to messages, and call Claude again.

When: stop_reason === 'end_turn'

→Terminate the loop and present the final response to the user.

When: You need a safety guardrail against runaway loops

→Add a max iteration count as a backstop, but keep stop_reason as the primary control signal.

✗ Anti-Patterns to Reject

  • Parsing response text for phrases like 'I've completed' to determine loop termination instead of using stop_reason.
  • Using an arbitrary iteration cap as the primary stopping mechanism rather than a safety backstop.
ts-1.2

Orchestrate multi-agent systems with coordinator-subagent patterns

Key Points

  • Hub-and-spoke: coordinator manages all inter-subagent communication, error handling, and information routing.
  • Subagents operate with isolated context -- they do NOT inherit the coordinator's conversation history.
  • The coordinator is responsible for task decomposition, delegation, result aggregation, and deciding which subagents to invoke.
  • Overly narrow task decomposition by the coordinator leads to incomplete coverage of broad topics.
  • Route all communication through the coordinator for observability, consistent error handling, and controlled information flow.

Decision Rules

When: Multiple specialized capabilities are needed (search, analysis, synthesis)

→Use coordinator-subagent pattern; coordinator delegates to specialized agents and aggregates results.

When: A subagent's output needs to reach another subagent

→Route through the coordinator -- never allow direct agent-to-agent communication.

When: Research output is missing entire topic areas

→Check the coordinator's task decomposition first -- it likely defined subtasks too narrowly.

✗ Anti-Patterns to Reject

  • Allowing direct agent-to-agent communication that bypasses the coordinator, breaking observability and error handling.
  • Having the coordinator always route through the full pipeline instead of dynamically selecting which subagents to invoke.
ts-1.3

Configure subagent invocation, context passing, and spawning

Key Points

  • The Task tool is the mechanism for spawning subagents; allowedTools must include 'Task' for the coordinator.
  • Subagent context must be explicitly provided in the prompt -- subagents do NOT automatically inherit parent context.
  • AgentDefinition configures descriptions, system prompts, and tool restrictions per subagent type.
  • Use fork-based session management to explore divergent approaches from a shared analysis baseline.
  • Spawn parallel subagents by emitting multiple Task tool calls in a single coordinator response.

Decision Rules

When: A subagent needs data from a prior agent's output

→Include the complete findings directly in the subagent's prompt via the coordinator.

When: You need parallel research across multiple source types

→Emit multiple Task tool calls in a single coordinator turn to spawn parallel subagents.

When: Coordinator prompts lead to rigid subagent behavior

→Specify research goals and quality criteria rather than step-by-step procedural instructions.

✗ Anti-Patterns to Reject

  • Assuming subagents inherit the coordinator's context or share memory between invocations.
  • Writing step-by-step procedural coordinator prompts instead of goal-oriented ones that allow subagent adaptability.
ts-1.4

Implement multi-step workflows with enforcement and handoff patterns

Key Points

  • Programmatic enforcement (hooks, prerequisite gates) provides deterministic guarantees; prompt instructions are probabilistic.
  • When deterministic compliance is required (e.g., identity verification before financial ops), prompts alone have a non-zero failure rate.
  • For multi-concern requests, decompose into distinct items, investigate each in parallel using shared context, then synthesize.
  • Structured handoff summaries (customer ID, root cause, refund amount, recommended action) are essential for human escalation.

Decision Rules

When: A specific tool sequence is required for critical business logic (e.g., verify customer before refund)

→Use programmatic prerequisites that block downstream tools until prior steps complete.

When: Customer sends a multi-concern message

→Decompose into distinct concerns, investigate in parallel with shared context, then synthesize a unified resolution.

When: Agent escalates to a human who lacks access to the conversation transcript

→Compile a structured handoff summary with customer ID, root cause, amounts, and recommended action.

✗ Anti-Patterns to Reject

  • Relying solely on prompt instructions to enforce required tool ordering for operations with financial consequences.
  • Processing multiple customer concerns sequentially, re-fetching shared context for each one.
ts-1.5

Apply Agent SDK hooks for tool call interception and data normalization

Key Points

  • PostToolUse hooks intercept tool results for transformation BEFORE the model processes them.
  • Hook patterns can also intercept outgoing tool calls to enforce compliance rules (e.g., block refunds above a threshold).
  • Hooks provide deterministic guarantees; prompt instructions provide only probabilistic compliance.
  • Use PostToolUse to normalize heterogeneous data formats: Unix timestamps, ISO 8601, numeric status codes.

Decision Rules

When: Tools return heterogeneous formats (Unix timestamps, ISO dates, numeric codes) and the agent misinterprets them

→Implement a PostToolUse hook to normalize all outputs before agent processing.

When: Business rules require guaranteed compliance (e.g., refunds > $500 must be escalated)

→Use a hook to intercept and block policy-violating tool calls, redirecting to the appropriate workflow.

When: Third-party MCP tools return data you cannot modify at the source

→Use PostToolUse hooks as a centralized normalization layer rather than prompt instructions.

✗ Anti-Patterns to Reject

  • Adding format documentation to the system prompt instead of using hooks when deterministic normalization is required.
  • Creating a separate normalize_data tool the agent must remember to call, instead of automatic hook-based transformation.
ts-1.6

Design task decomposition strategies for complex workflows

Key Points

  • Use fixed sequential pipelines (prompt chaining) for predictable multi-aspect reviews; dynamic decomposition for open-ended investigation.
  • Splitting large reviews into per-file local analysis plus a separate cross-file integration pass avoids attention dilution.
  • Adaptive investigation plans generate subtasks based on what is discovered at each step.
  • For open-ended tasks, first map the structure, identify high-impact areas, then create a prioritized plan.

Decision Rules

When: A single-pass review of 14+ files produces inconsistent depth and contradictory findings

→Split into per-file analysis passes plus a separate cross-file integration pass.

When: The task is predictable with known steps (e.g., multi-aspect code review)

→Use prompt chaining: a fixed sequential pipeline.

When: The task is exploratory with unknown scope (e.g., 'add tests to a legacy codebase')

→Use dynamic decomposition: map first, identify high-impact areas, then create a prioritized adaptive plan.

✗ Anti-Patterns to Reject

  • Reviewing all files in a large PR in a single pass, leading to attention dilution and contradictory feedback.
  • Using a fixed pipeline for an open-ended investigation task where subtasks depend on intermediate findings.
ts-1.7

Manage session state, resumption, and forking

Key Points

  • Use --resume <session-name> to continue named investigation sessions across work sessions.
  • fork_session creates independent branches from a shared analysis baseline for exploring divergent approaches.
  • When resuming after code modifications, inform the agent about specific file changes for targeted re-analysis.
  • Starting a new session with a structured summary is more reliable than resuming with stale tool results.

Decision Rules

When: Prior context is mostly valid and you want to continue an investigation

→Use --resume with the session name; inform Claude about any file changes since last session.

When: Prior tool results are stale (significant code changes since last session)

→Start a new session with an injected summary of prior findings instead of resuming.

When: You want to compare two refactoring approaches from the same analysis baseline

→Use fork_session to create parallel exploration branches.

✗ Anti-Patterns to Reject

  • Resuming a session after significant code changes without informing the agent, leading to stale context reasoning.
  • Re-exploring the entire codebase from scratch instead of informing a resumed session about targeted changes.

Domain 2: Tool Design & MCP Integration

18% of exam
ts-2.1

Design effective tool interfaces with clear descriptions and boundaries

Key Points

  • Tool descriptions are the PRIMARY mechanism LLMs use for tool selection -- minimal descriptions lead to unreliable selection.
  • Include input formats, example queries, edge cases, and boundaries explaining when to use a tool vs similar alternatives.
  • Ambiguous or overlapping descriptions (e.g., analyze_content vs analyze_document) cause misrouting.
  • Keyword-sensitive system prompt instructions can override well-written tool descriptions, creating unintended tool associations.
  • Rename tools and update descriptions to eliminate functional overlap (e.g., analyze_content -> extract_web_results).

Decision Rules

When: Agent consistently selects the wrong tool among similar options

→Review and expand tool descriptions FIRST -- include input formats, example queries, and boundary explanations.

When: Two tools have near-identical names/descriptions causing misrouting

→Rename the tools and rewrite descriptions to clearly distinguish each tool's purpose.

When: Tool descriptions are clear but the agent still misroutes based on keywords like 'account'

→Review the system prompt for keyword-sensitive instructions that create unintended tool associations.

✗ Anti-Patterns to Reject

  • Writing minimal descriptions like 'Retrieves customer information' without specifying inputs, outputs, or boundaries.
  • Adding a routing layer or classifier as the first step instead of improving tool descriptions.
ts-2.2

Implement structured error responses for MCP tools

Key Points

  • Use the MCP isError flag to communicate tool failures back to the agent.
  • Distinguish error categories: transient (timeouts), validation (bad input), business (policy violations), permission errors.
  • Return structured metadata: errorCategory, isRetryable boolean, and human-readable descriptions.
  • Uniform 'Operation failed' errors prevent the agent from making appropriate recovery decisions.
  • Distinguish access failures (needing retries) from valid empty results (successful queries with no matches).

Decision Rules

When: A tool encounters a transient failure (timeout, service unavailable)

→Return isError: true with errorCategory: 'transient', isRetryable: true, and what was attempted.

When: A business rule is violated (e.g., refund exceeds policy limit)

→Return isError: true with errorCategory: 'business', isRetryable: false, and a customer-friendly explanation.

When: A query returns zero results but executed successfully

→Return a success response (isError: false) with empty results -- do NOT treat this as an error.

✗ Anti-Patterns to Reject

  • Returning generic 'Operation failed' for all error types, preventing intelligent agent recovery decisions.
  • Treating valid empty results (0 matches) the same as access failures (timeouts), causing unnecessary retries.
ts-2.3

Distribute tools appropriately across agents and configure tool choice

Key Points

  • Too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability by increasing decision complexity.
  • Agents with tools outside their specialization tend to misuse them (e.g., synthesis agent doing web searches).
  • Apply principle of least privilege: give each agent only tools needed for its role, plus limited cross-role tools for high-frequency needs.
  • tool_choice options: 'auto' (default), 'any' (must call a tool), forced selection ({'type': 'tool', 'name': '...'}).

Decision Rules

When: A specialized agent misuses tools outside its role (e.g., doc analysis agent doing web searches)

→Replace generic tools with purpose-specific constrained alternatives (e.g., fetch_url -> load_document).

When: 85% of a subagent's verification needs are simple fact-checks with 15% complex

→Give a scoped verify_fact tool for simple lookups; route complex cases through the coordinator.

When: You need to guarantee the model calls a specific tool first in a sequence

→Use tool_choice: {'type': 'tool', 'name': 'extract_metadata'} for the first turn, then switch to 'auto'.

✗ Anti-Patterns to Reject

  • Giving all agents access to all tools, leading to cross-specialization misuse and unreliable selection.
  • Giving the synthesis agent full web search tools when a scoped verify_fact tool handles 85% of its needs.
ts-2.4

Integrate MCP servers into Claude Code and agent workflows

Key Points

  • Project-scoped .mcp.json for shared team tooling; user-scoped ~/.claude.json for personal/experimental servers.
  • Use environment variable expansion (${GITHUB_TOKEN}) in .mcp.json for credential management without committing secrets.
  • Tools from all configured MCP servers are discovered at connection time and available simultaneously.
  • MCP resources expose content catalogs (issue summaries, database schemas) to reduce exploratory tool calls.
  • Prefer community MCP servers for standard integrations (Jira, GitHub); build custom servers only for team-specific workflows.

Decision Rules

When: Team needs shared MCP tooling with per-developer credentials

→Use project-scoped .mcp.json with ${ENV_VAR} expansion for tokens; document required vars in README.

When: A developer wants to experiment with a personal MCP server

→Configure it in user-scoped ~/.claude.json so it does not affect teammates.

When: A standard integration exists (GitHub, Jira) and you are considering a custom server

→Use the existing community MCP server; reserve custom implementations for team-specific workflows.

✗ Anti-Patterns to Reject

  • Building custom MCP server wrappers when native env var expansion in .mcp.json already handles credential injection.
  • Having each developer configure the MCP server in user scope instead of using a shared project-scoped .mcp.json.
ts-2.5

Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

Key Points

  • Grep for content search: finding function names, error messages, import statements within file contents.
  • Glob for path pattern matching: finding files by name or extension (e.g., **/*.test.tsx).
  • Read/Write for full file operations; Edit for targeted modifications using unique text matching.
  • When Edit fails due to non-unique text matches, fall back to Read + Write for reliable file modifications.
  • Build codebase understanding incrementally: Grep to find entry points, then Read to follow imports and trace flows.

Decision Rules

When: You need to find all callers of a specific function across the codebase

→Use Grep to search file contents for the function name.

When: You need to find all test files regardless of directory location

→Use Glob with pattern **/*.test.tsx to match by naming convention.

When: Edit fails because the anchor text appears multiple times in the file

→Use Read to load full contents, then Write the modified version as a fallback.

✗ Anti-Patterns to Reject

  • Reading all files upfront to understand a codebase instead of incrementally tracing from entry points via Grep.
  • Using Bash for file search/content operations when dedicated Grep and Glob tools are available.

Domain 3: Claude Code Configuration & Workflows

20% of exam
ts-3.1

Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization

Key Points

  • Hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), directory-level (subdirectory CLAUDE.md).
  • User-level settings apply only to that user and are NOT shared via version control.
  • Use .claude/rules/ directory for topic-specific rule files as an alternative to a monolithic CLAUDE.md.
  • Use @import syntax to reference external files and keep CLAUDE.md modular.
  • New team members not receiving guidelines? Check if instructions are in user-level (~/) rather than project-level (.claude/).

Decision Rules

When: A guideline must apply to all team members (current and future)

→Place it in project-level .claude/CLAUDE.md or .claude/rules/, NOT in user-level ~/.claude/CLAUDE.md.

When: CLAUDE.md exceeds 400+ lines mixing multiple concerns

→Split into topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md).

When: A new team member is not receiving project guidelines

→Verify the guideline exists in project-level config, not just in existing developers' user-level config.

✗ Anti-Patterns to Reject

  • Putting team-wide guidelines in ~/.claude/CLAUDE.md (user-level) instead of project-level, so new members miss them.
  • Using README.md files as instruction sources -- only CLAUDE.md and .claude/rules/ are recognized by Claude Code.
ts-3.2

Create and configure custom slash commands and skills

Key Points

  • Project-scoped commands in .claude/commands/ (shared via version control); user-scoped in ~/.claude/commands/ (personal).
  • Skills in .claude/skills/ with SKILL.md support frontmatter: context: fork, allowed-tools, argument-hint.
  • context: fork runs the skill in an isolated sub-agent context, preventing output from polluting the main conversation.
  • Project skills take precedence over personal skills with the same name; use a different name for personal variants.
  • Skills are on-demand (invoked via slash command); CLAUDE.md is always-loaded for universal standards.

Decision Rules

When: A skill produces verbose output that causes Claude to lose track of the original task

→Add context: fork to the skill's frontmatter to run in an isolated sub-agent context.

When: A developer wants a personal variant of a team skill without affecting teammates

→Create a personal skill in ~/.claude/skills/ with a DIFFERENT name (project skills shadow same-named personal ones).

When: Context is only useful for a specific workflow (e.g., endpoint generation) and not general work

→Create a skill with the exemplar code; invoke on-demand via slash command instead of putting it in CLAUDE.md.

✗ Anti-Patterns to Reject

  • Creating a personal skill with the same name as a project skill -- the project version shadows it.
  • Putting task-specific workflow guidance in CLAUDE.md (always loaded) instead of a skill (on-demand).
ts-3.3

Apply path-specific rules for conditional convention loading

Key Points

  • Use .claude/rules/ files with YAML frontmatter paths field containing glob patterns for conditional rule activation.
  • Path-scoped rules load only when editing matching files, reducing irrelevant context and token usage.
  • Glob patterns apply conventions by file type regardless of directory location (e.g., **/*.test.tsx for all test files).
  • Path-specific rules are better than subdirectory CLAUDE.md files when conventions span multiple directories.

Decision Rules

When: Different coding conventions apply to different file types (React components vs API handlers vs tests)

→Create .claude/rules/ files with YAML frontmatter paths glob patterns for each file type.

When: Test files are spread throughout the codebase alongside source files

→Use path-specific rules with **/*.test.tsx glob rather than subdirectory CLAUDE.md files.

When: You want conventions to apply to terraform files in any directory

→Use paths: ['terraform/**/*'] in rule frontmatter instead of a terraform/CLAUDE.md file.

✗ Anti-Patterns to Reject

  • Relying on Claude to infer which conventions apply by putting all rules in a single root CLAUDE.md.
  • Using subdirectory CLAUDE.md files for cross-cutting concerns like test conventions that span multiple directories.
ts-3.4

Determine when to use plan mode vs direct execution

Key Points

  • Plan mode: complex tasks with multiple valid approaches, architectural decisions, multi-file changes, unfamiliar domains.
  • Direct execution: simple, well-scoped changes with a clear implementation path (e.g., single-file bug fix).
  • The Explore subagent isolates verbose discovery output and returns summaries, preserving main conversation context.
  • Combine plan mode for investigation with direct execution for implementation (e.g., plan migration, then execute).

Decision Rules

When: Task involves ambiguous requirements with multiple valid integration approaches (e.g., adding Slack support)

→Enter plan mode to explore options and architectural implications before implementing.

When: Task is a well-understood change with clear scope (e.g., bug fix with a clear stack trace)

→Use direct execution -- no need for plan mode.

When: Discovery phase generates verbose output that fills the context window

→Use the Explore subagent to isolate verbose output and return a concise summary to the main conversation.

✗ Anti-Patterns to Reject

  • Starting direct execution on an ambiguous architectural task without exploring trade-offs first.
  • Using plan mode for a simple, well-scoped change that has an obvious implementation.
ts-3.5

Apply iterative refinement techniques for progressive improvement

Key Points

  • Concrete input/output examples are the most effective way to communicate transformations when prose is interpreted inconsistently.
  • Test-driven iteration: write test suites first, then iterate by sharing test failures to guide improvement.
  • The interview pattern: have Claude ask questions to surface design considerations before implementing in unfamiliar domains.
  • Address multiple interacting issues in a single message when fixes interact; use sequential iteration for independent issues.

Decision Rules

When: Claude interprets prose requirements differently each iteration, producing inconsistent output structure

→Provide 2-3 concrete input/output examples showing the expected transformation.

When: You are implementing in an unfamiliar domain and want to surface edge cases

→Use the interview pattern: have Claude ask about design considerations before implementing.

When: Multiple bugs interact with each other

→Describe all interacting issues in a single message rather than fixing them sequentially.

✗ Anti-Patterns to Reject

  • Continuing to refine prose descriptions when Claude consistently misinterprets them -- provide examples instead.
  • Fixing interacting bugs one at a time, leading to regressions when each fix invalidates the others.
ts-3.6

Integrate Claude Code into CI/CD pipelines

Key Points

  • Use the -p (or --print) flag for non-interactive mode in automated pipelines -- prevents hanging on interactive input.
  • Use --output-format json with --json-schema for enforced structured output in CI contexts.
  • CLAUDE.md provides project context (testing standards, review criteria) to CI-invoked Claude Code.
  • A second independent Claude instance reviewing code is more effective than self-review -- eliminates confirmation bias.
  • Include prior review findings in context when re-running after new commits to avoid duplicate comments.

Decision Rules

When: Running Claude Code in an automated CI pipeline

→Use the -p flag for non-interactive mode; use --output-format json with --json-schema for structured output.

When: The same Claude session generated code and you need a review

→Use a second, independent Claude instance without access to the generator's reasoning context.

When: Re-running review after developer pushes fixes, and getting duplicate findings on already-fixed code

→Include prior review findings in context, instructing Claude to only report new or still-unaddressed issues.

✗ Anti-Patterns to Reject

  • Running claude without -p flag in CI, causing the job to hang waiting for interactive input.
  • Asking Claude to self-review its own generated code in the same session -- confirmation bias persists.

Domain 4: Prompt Engineering & Structured Output

20% of exam
ts-4.1

Design prompts with explicit criteria to improve precision and reduce false positives

Key Points

  • Explicit criteria ('flag comments only when claimed behavior contradicts actual code') beat vague instructions ('check that comments are accurate').
  • General instructions like 'be conservative' or 'only report high-confidence findings' fail to improve precision.
  • High false positive rates in some categories undermine trust in ALL categories -- developers dismiss everything.
  • Define explicit severity criteria with concrete code examples for each severity level to achieve consistent classification.

Decision Rules

When: Automated review produces high false positive rates that erode developer trust

→Temporarily disable high false-positive categories; keep only high-precision categories while improving prompts.

When: Severity ratings are inconsistent across similar issues

→Add explicit severity criteria with concrete code examples for each level, not general 'be conservative' instructions.

When: A prompt instruction is vague (e.g., 'check comments are accurate')

→Replace with explicit criteria defining exactly what constitutes a problem (e.g., 'flag only when claimed behavior contradicts code').

✗ Anti-Patterns to Reject

  • Adding confidence scores alongside findings and expecting developers to self-triage -- they will not trust self-reported scores.
  • Keeping high false-positive categories enabled while 'improving prompts over the coming weeks' -- trust erodes immediately.
ts-4.2

Apply few-shot prompting to improve output consistency and quality

Key Points

  • Few-shot examples are the most effective technique when detailed instructions alone produce inconsistent results.
  • Target 2-4 examples at ambiguous scenarios showing reasoning for why one action was chosen over alternatives.
  • Few-shot examples enable generalization to novel patterns, not just matching pre-specified cases.
  • For extraction tasks, few-shot examples reduce hallucination by showing how to handle varied document structures.

Decision Rules

When: Detailed format instructions produce variable output quality (sometimes detailed, sometimes vague)

→Add 3-4 few-shot examples showing the exact desired format with issue, location, and specific fix.

When: Agent misroutes between tools on ambiguous requests

→Add 4-6 few-shot examples targeting ambiguous scenarios, each showing reasoning for the tool choice.

When: Agent handles individual concerns well (94%) but fails on multi-concern messages (58%)

→Add few-shot examples demonstrating correct reasoning and tool sequencing for multi-concern requests.

✗ Anti-Patterns to Reject

  • Further refining abstract instructions when instructions have already failed -- examples are more reliable than rules.
  • Grouping few-shot examples by tool instead of showing comparative reasoning across tools for ambiguous cases.
ts-4.3

Enforce structured output using tool use and JSON schemas

Key Points

  • tool_use with JSON schemas is the most reliable approach for guaranteed schema-compliant structured output.
  • tool_choice: 'auto' (may return text), 'any' (must call a tool), forced selection (must call a specific tool).
  • Strict JSON schemas via tool use eliminate syntax errors but do NOT prevent semantic errors (values in wrong fields, line items not summing).
  • Design schema fields as optional (nullable) when source documents may not contain the information, preventing hallucinated values.

Decision Rules

When: You need guaranteed structured output with no JSON syntax errors

→Define an extraction tool with JSON schema as input parameters; extract data from the tool_use response.

When: Multiple extraction schemas exist and the document type is unknown

→Set tool_choice: 'any' to guarantee a tool call while letting the model choose which extraction schema.

When: Source documents may not contain all required fields

→Design those schema fields as optional (nullable) to prevent the model from fabricating values.

✗ Anti-Patterns to Reject

  • Relying on prompt instructions to produce JSON instead of using tool_use for guaranteed schema compliance.
  • Making all schema fields required when source documents may lack the data, causing the model to hallucinate values.
ts-4.4

Implement validation, retry, and feedback loops for extraction quality

Key Points

  • Retry-with-error-feedback: append specific validation errors to the prompt on retry to guide the model toward correction.
  • Retries are ineffective when required information is simply absent from the source document (vs format or structural errors).
  • Track which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns.
  • Semantic validation (values don't sum, wrong field placement) requires separate validation logic -- tool use only prevents syntax errors.

Decision Rules

When: Extraction output has format or structural errors (wrong nesting, bad date format)

→Retry with the original document, the failed extraction, and specific validation errors appended.

When: Required data simply does not exist in the source document

→Do NOT retry -- retries cannot conjure missing information. Accept null/empty or flag for human review.

When: Developers frequently dismiss automated findings and you want to improve accuracy

→Add detected_pattern fields to structured findings to track which constructs produce false positives.

✗ Anti-Patterns to Reject

  • Retrying extraction when the source document does not contain the required information.
  • Using generic retry prompts like 'try again' without including the specific validation errors that triggered the retry.
ts-4.5

Design efficient batch processing strategies

Key Points

  • Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA.
  • Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation).
  • The batch API does NOT support multi-turn tool calling within a single request -- breaks iterative workflows.
  • Use custom_id fields for correlating batch request/response pairs and handling failures.

Decision Rules

When: Workflow is latency-sensitive and blocks developers (pre-merge checks)

→Use synchronous API calls, NOT batch processing.

When: Workflow is scheduled and latency-tolerant (overnight reports, weekly audits, nightly test generation)

→Use Message Batches API for 50% cost savings.

When: Workflow requires iterative tool calling (analyze file, request related files, continue analysis)

→Do NOT use batch processing -- it cannot execute tools mid-request and return results.

✗ Anti-Patterns to Reject

  • Using batch processing for blocking pre-merge checks where developers are waiting for results.
  • Attempting to use batch processing for iterative tool-calling workflows that require mid-request tool execution.
ts-4.6

Design multi-instance and multi-pass review architectures

Key Points

  • Self-review limitation: a model retains reasoning context from generation, making it less likely to question its own decisions.
  • Independent review instances (without prior reasoning context) catch subtle issues that self-review and extended thinking miss.
  • Multi-pass review: split into per-file local analysis passes plus cross-file integration passes to avoid attention dilution.
  • Include reasoning and confidence assessments inline with each finding to speed up developer triage.

Decision Rules

When: Claude-generated code has subtle issues that only surface during human peer review

→Use a second, independent Claude instance to review without access to the generator's reasoning.

When: Single-pass review of many files produces inconsistent depth and contradictory feedback

→Split into per-file local passes plus a separate cross-file integration pass.

When: Developers spend too much time investigating each finding to decide if it is real

→Require Claude to include reasoning and confidence assessment inline with each finding.

✗ Anti-Patterns to Reject

  • Asking Claude to self-review its own output in the same session -- confirmation bias means it rationalizes the same way.
  • Using extended thinking as a substitute for independent review -- the same session context still biases the review.

Domain 5: Context Management & Reliability

15% of exam
ts-5.1

Manage conversation context to preserve critical information across long interactions

Key Points

  • Progressive summarization loses precise details: amounts, percentages, dates get condensed into vague phrases.
  • The 'lost in the middle' effect: models reliably process the beginning and end of long inputs but may omit middle sections.
  • Tool results accumulate tokens disproportionate to their relevance (e.g., 40+ fields when only 5 are relevant).
  • Place key findings summaries at the beginning of aggregated inputs; organize detailed results with explicit section headers.

Decision Rules

When: Customer references specific amounts ('the 15% discount I mentioned') that were summarized away

→Extract transactional facts (amounts, dates, order numbers) into a persistent 'case facts' block outside summarized history.

When: Synthesis agent omits critical findings from the middle of 75K+ token aggregated input

→Place a key findings summary at the beginning; organize the rest with explicit section headers.

When: Tool outputs return 40+ fields per lookup when only 5 are relevant

→Trim verbose tool outputs to only relevant fields before they accumulate in context.

✗ Anti-Patterns to Reject

  • Relying on progressive summarization to preserve exact numerical values and dates from early in a conversation.
  • Increasing the summarization threshold (e.g., 70% to 85%) instead of extracting critical facts into a persistent block.
ts-5.2

Design effective escalation and ambiguity resolution patterns

Key Points

  • Appropriate escalation triggers: customer explicitly requests human, policy exceptions/gaps, inability to make meaningful progress.
  • Escalate immediately when customer explicitly demands a human -- do not first attempt investigation.
  • Sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity.
  • When multiple customer matches are returned, ask for an additional identifier (email, phone, order number) rather than guessing.

Decision Rules

When: Policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching)

→Escalate to a human for policy interpretation -- do not fabricate a policy.

When: get_customer returns multiple matches and the agent guesses wrong 15% of the time

→Instruct the agent to ask for an additional identifier before taking any customer-specific action.

When: The issue is straightforward but the customer explicitly asks for a human agent

→Escalate immediately -- honor the explicit request without attempting to resolve first.

✗ Anti-Patterns to Reject

  • Using heuristics (most recent order, conversational context clues) to guess the right customer from multiple matches.
  • Implementing sentiment analysis or self-reported confidence scores as escalation triggers.
ts-5.3

Implement error propagation strategies across multi-agent systems

Key Points

  • Structured error context (failure type, attempted query, partial results, alternative approaches) enables intelligent coordinator recovery.
  • Distinguish access failures (timeouts needing retry decisions) from valid empty results (successful queries with no matches).
  • Silently suppressing errors (returning empty as success) or terminating on single failures are both anti-patterns.
  • Subagents should handle transient failures locally and only propagate errors they cannot resolve, with partial results.

Decision Rules

When: A subagent encounters a timeout (transient failure)

→Attempt local recovery; if it fails, propagate structured error context (failure type, what was attempted, partial results) to the coordinator.

When: A subagent encounters a corrupted file (permanent failure)

→Return the error with context to the coordinator -- do NOT retry (corruption is permanent).

When: Some source categories succeed while others fail in a multi-source research task

→Proceed with available data; annotate synthesis output with coverage gaps indicating which sources were unavailable.

✗ Anti-Patterns to Reject

  • Returning empty results marked as 'success' when a timeout occurred, hiding the failure from the coordinator.
  • Terminating the entire research workflow when one source fails, discarding all successful results.
ts-5.4

Manage context effectively in large codebase exploration

Key Points

  • Context degradation in extended sessions: models start referencing 'typical patterns' instead of specific classes discovered earlier.
  • Scratchpad files persist key findings across context boundaries, countering degradation.
  • Subagent delegation isolates verbose exploration output while the main agent coordinates high-level understanding.
  • Structured state persistence: each agent exports state to a known location; the coordinator loads a manifest on resume.

Decision Rules

When: Discovery phase generates verbose output that fills the main context window

→Use the Explore subagent or context: fork to isolate verbose output; return a concise summary.

When: Extended exploration session shows signs of context degradation (vague references instead of specifics)

→Have agents maintain scratchpad files recording key findings; use /compact to reduce context usage.

When: Multi-phase task needs to persist findings across context boundaries

→Summarize key findings from one phase before spawning sub-agents for the next; inject summaries into initial context.

✗ Anti-Patterns to Reject

  • Continuing all phases in the main conversation using /compact repeatedly -- lossy compression discards important details.
  • Re-exploring the entire codebase from scratch instead of persisting findings in scratchpad files.
ts-5.5

Design human review workflows and confidence calibration

Key Points

  • Aggregate accuracy metrics (97% overall) may mask poor performance on specific document types or fields.
  • Use stratified random sampling to measure error rates in high-confidence extractions and detect novel patterns.
  • Field-level confidence scores should be calibrated using labeled validation sets for routing review attention.
  • Validate accuracy by document type AND field segment before automating high-confidence extractions.

Decision Rules

When: Overall accuracy is 97% but you suspect some document types perform poorly

→Analyze accuracy by document type and field to identify hidden poor-performing segments.

When: You want to reduce human review overhead on high-confidence extractions

→Implement stratified random sampling of high-confidence outputs; only reduce review after validating by segment.

When: Model outputs field-level confidence scores but they do not correlate with actual accuracy

→Calibrate confidence thresholds using labeled validation sets rather than trusting raw model scores.

✗ Anti-Patterns to Reject

  • Trusting aggregate accuracy metrics without breaking down performance by document type and field.
  • Automating all high-confidence extractions without validating that confidence correlates with actual accuracy per segment.
ts-5.6

Preserve information provenance and handle uncertainty in multi-source synthesis

Key Points

  • Source attribution is lost during summarization if claim-source mappings are not preserved.
  • Conflicting statistics from credible sources should be annotated with source attribution, not arbitrarily resolved.
  • Require publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions.
  • Render different content types appropriately: financial data as tables, news as prose, technical findings as structured lists.

Decision Rules

When: Two credible sources report conflicting statistics on a key metric

→Include both values with explicit source attribution; let the coordinator decide how to reconcile before synthesis.

When: Subagent outputs are compressed and downstream agents lose track of which claims came from where

→Require subagents to output structured claim-source mappings (source URLs, document names, excerpts).

When: Data from different time periods appears contradictory

→Require publication/collection dates in structured outputs to enable correct temporal interpretation.

✗ Anti-Patterns to Reject

  • Applying source credibility heuristics to select one value over another -- this oversteps the subagent's role.
  • Converting all content types to a uniform format (e.g., all prose) instead of rendering each type appropriately.