Domain 1: Agentic Architecture & Orchestration

27% of exam

ts-1.1

Design and implement agentic loops for autonomous task execution

Key Points

The agentic loop lifecycle: send request, inspect stop_reason, execute tools, append results, repeat until end_turn.
stop_reason is the sole authoritative signal for loop control -- not text parsing, not iteration counts.
Tool results must be appended to conversation history so Claude can reason about the next action.
Model-driven tool selection (Claude decides which tool based on context) is the default; pre-configured sequences are for strict compliance.
Each iteration should include the full conversation context so Claude maintains coherent reasoning.

Decision Rules

When: stop_reason === 'tool_use'

→Execute the requested tool(s), append results to messages, and call Claude again.

When: stop_reason === 'end_turn'

→Terminate the loop and present the final response to the user.

When: You need a safety guardrail against runaway loops

→Add a max iteration count as a backstop, but keep stop_reason as the primary control signal.

✗ Anti-Patterns to Reject

Parsing response text for phrases like 'I've completed' to determine loop termination instead of using stop_reason.
Using an arbitrary iteration cap as the primary stopping mechanism rather than a safety backstop.

ts-1.2

Orchestrate multi-agent systems with coordinator-subagent patterns

Key Points

Hub-and-spoke: coordinator manages all inter-subagent communication, error handling, and information routing.
Subagents operate with isolated context -- they do NOT inherit the coordinator's conversation history.
The coordinator is responsible for task decomposition, delegation, result aggregation, and deciding which subagents to invoke.
Overly narrow task decomposition by the coordinator leads to incomplete coverage of broad topics.
Route all communication through the coordinator for observability, consistent error handling, and controlled information flow.

Decision Rules

When: Multiple specialized capabilities are needed (search, analysis, synthesis)

→Use coordinator-subagent pattern; coordinator delegates to specialized agents and aggregates results.

When: A subagent's output needs to reach another subagent

→Route through the coordinator -- never allow direct agent-to-agent communication.

When: Research output is missing entire topic areas

→Check the coordinator's task decomposition first -- it likely defined subtasks too narrowly.

✗ Anti-Patterns to Reject

Allowing direct agent-to-agent communication that bypasses the coordinator, breaking observability and error handling.
Having the coordinator always route through the full pipeline instead of dynamically selecting which subagents to invoke.

ts-1.3

Configure subagent invocation, context passing, and spawning

Key Points

The Task tool is the mechanism for spawning subagents; allowedTools must include 'Task' for the coordinator.
Subagent context must be explicitly provided in the prompt -- subagents do NOT automatically inherit parent context.
AgentDefinition configures descriptions, system prompts, and tool restrictions per subagent type.
Use fork-based session management to explore divergent approaches from a shared analysis baseline.
Spawn parallel subagents by emitting multiple Task tool calls in a single coordinator response.

Decision Rules

When: A subagent needs data from a prior agent's output

→Include the complete findings directly in the subagent's prompt via the coordinator.

When: You need parallel research across multiple source types

→Emit multiple Task tool calls in a single coordinator turn to spawn parallel subagents.

When: Coordinator prompts lead to rigid subagent behavior

→Specify research goals and quality criteria rather than step-by-step procedural instructions.

✗ Anti-Patterns to Reject

Assuming subagents inherit the coordinator's context or share memory between invocations.
Writing step-by-step procedural coordinator prompts instead of goal-oriented ones that allow subagent adaptability.

ts-1.4

Implement multi-step workflows with enforcement and handoff patterns

Key Points

Programmatic enforcement (hooks, prerequisite gates) provides deterministic guarantees; prompt instructions are probabilistic.
When deterministic compliance is required (e.g., identity verification before financial ops), prompts alone have a non-zero failure rate.
For multi-concern requests, decompose into distinct items, investigate each in parallel using shared context, then synthesize.
Structured handoff summaries (customer ID, root cause, refund amount, recommended action) are essential for human escalation.

Decision Rules

When: A specific tool sequence is required for critical business logic (e.g., verify customer before refund)

→Use programmatic prerequisites that block downstream tools until prior steps complete.

When: Customer sends a multi-concern message

→Decompose into distinct concerns, investigate in parallel with shared context, then synthesize a unified resolution.

When: Agent escalates to a human who lacks access to the conversation transcript

→Compile a structured handoff summary with customer ID, root cause, amounts, and recommended action.

✗ Anti-Patterns to Reject

Relying solely on prompt instructions to enforce required tool ordering for operations with financial consequences.
Processing multiple customer concerns sequentially, re-fetching shared context for each one.

ts-1.5

Apply Agent SDK hooks for tool call interception and data normalization

Key Points

PostToolUse hooks intercept tool results for transformation BEFORE the model processes them.
Hook patterns can also intercept outgoing tool calls to enforce compliance rules (e.g., block refunds above a threshold).
Hooks provide deterministic guarantees; prompt instructions provide only probabilistic compliance.
Use PostToolUse to normalize heterogeneous data formats: Unix timestamps, ISO 8601, numeric status codes.

Decision Rules

When: Tools return heterogeneous formats (Unix timestamps, ISO dates, numeric codes) and the agent misinterprets them

→Implement a PostToolUse hook to normalize all outputs before agent processing.

When: Business rules require guaranteed compliance (e.g., refunds > $500 must be escalated)

→Use a hook to intercept and block policy-violating tool calls, redirecting to the appropriate workflow.

When: Third-party MCP tools return data you cannot modify at the source

→Use PostToolUse hooks as a centralized normalization layer rather than prompt instructions.

✗ Anti-Patterns to Reject

Adding format documentation to the system prompt instead of using hooks when deterministic normalization is required.
Creating a separate normalize_data tool the agent must remember to call, instead of automatic hook-based transformation.

ts-1.6

Design task decomposition strategies for complex workflows

Key Points

Use fixed sequential pipelines (prompt chaining) for predictable multi-aspect reviews; dynamic decomposition for open-ended investigation.
Splitting large reviews into per-file local analysis plus a separate cross-file integration pass avoids attention dilution.
Adaptive investigation plans generate subtasks based on what is discovered at each step.
For open-ended tasks, first map the structure, identify high-impact areas, then create a prioritized plan.

Decision Rules

When: A single-pass review of 14+ files produces inconsistent depth and contradictory findings

→Split into per-file analysis passes plus a separate cross-file integration pass.

When: The task is predictable with known steps (e.g., multi-aspect code review)

→Use prompt chaining: a fixed sequential pipeline.

When: The task is exploratory with unknown scope (e.g., 'add tests to a legacy codebase')

→Use dynamic decomposition: map first, identify high-impact areas, then create a prioritized adaptive plan.

✗ Anti-Patterns to Reject

Reviewing all files in a large PR in a single pass, leading to attention dilution and contradictory feedback.
Using a fixed pipeline for an open-ended investigation task where subtasks depend on intermediate findings.

ts-1.7

Manage session state, resumption, and forking

Key Points

Use --resume <session-name> to continue named investigation sessions across work sessions.
fork_session creates independent branches from a shared analysis baseline for exploring divergent approaches.
When resuming after code modifications, inform the agent about specific file changes for targeted re-analysis.
Starting a new session with a structured summary is more reliable than resuming with stale tool results.

Decision Rules

When: Prior context is mostly valid and you want to continue an investigation

→Use --resume with the session name; inform Claude about any file changes since last session.

When: Prior tool results are stale (significant code changes since last session)

→Start a new session with an injected summary of prior findings instead of resuming.

When: You want to compare two refactoring approaches from the same analysis baseline

→Use fork_session to create parallel exploration branches.

✗ Anti-Patterns to Reject

Resuming a session after significant code changes without informing the agent, leading to stale context reasoning.
Re-exploring the entire codebase from scratch instead of informing a resumed session about targeted changes.

Domain 2: Tool Design & MCP Integration

18% of exam

ts-2.1

Design effective tool interfaces with clear descriptions and boundaries

Key Points

Tool descriptions are the PRIMARY mechanism LLMs use for tool selection -- minimal descriptions lead to unreliable selection.
Include input formats, example queries, edge cases, and boundaries explaining when to use a tool vs similar alternatives.
Ambiguous or overlapping descriptions (e.g., analyze_content vs analyze_document) cause misrouting.
Keyword-sensitive system prompt instructions can override well-written tool descriptions, creating unintended tool associations.
Rename tools and update descriptions to eliminate functional overlap (e.g., analyze_content -> extract_web_results).

Decision Rules

When: Agent consistently selects the wrong tool among similar options

→Review and expand tool descriptions FIRST -- include input formats, example queries, and boundary explanations.

When: Two tools have near-identical names/descriptions causing misrouting

→Rename the tools and rewrite descriptions to clearly distinguish each tool's purpose.

When: Tool descriptions are clear but the agent still misroutes based on keywords like 'account'

→Review the system prompt for keyword-sensitive instructions that create unintended tool associations.

✗ Anti-Patterns to Reject

Writing minimal descriptions like 'Retrieves customer information' without specifying inputs, outputs, or boundaries.
Adding a routing layer or classifier as the first step instead of improving tool descriptions.

ts-2.2

Implement structured error responses for MCP tools

Key Points

Use the MCP isError flag to communicate tool failures back to the agent.
Distinguish error categories: transient (timeouts), validation (bad input), business (policy violations), permission errors.
Return structured metadata: errorCategory, isRetryable boolean, and human-readable descriptions.
Uniform 'Operation failed' errors prevent the agent from making appropriate recovery decisions.
Distinguish access failures (needing retries) from valid empty results (successful queries with no matches).

Decision Rules

When: A tool encounters a transient failure (timeout, service unavailable)

→Return isError: true with errorCategory: 'transient', isRetryable: true, and what was attempted.

When: A business rule is violated (e.g., refund exceeds policy limit)

→Return isError: true with errorCategory: 'business', isRetryable: false, and a customer-friendly explanation.

When: A query returns zero results but executed successfully

→Return a success response (isError: false) with empty results -- do NOT treat this as an error.

✗ Anti-Patterns to Reject

Returning generic 'Operation failed' for all error types, preventing intelligent agent recovery decisions.
Treating valid empty results (0 matches) the same as access failures (timeouts), causing unnecessary retries.

ts-2.3

Distribute tools appropriately across agents and configure tool choice

Key Points

Too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability by increasing decision complexity.
Agents with tools outside their specialization tend to misuse them (e.g., synthesis agent doing web searches).
Apply principle of least privilege: give each agent only tools needed for its role, plus limited cross-role tools for high-frequency needs.
tool_choice options: 'auto' (default), 'any' (must call a tool), forced selection ({'type': 'tool', 'name': '...'}).

Decision Rules

When: A specialized agent misuses tools outside its role (e.g., doc analysis agent doing web searches)

→Replace generic tools with purpose-specific constrained alternatives (e.g., fetch_url -> load_document).

When: 85% of a subagent's verification needs are simple fact-checks with 15% complex

→Give a scoped verify_fact tool for simple lookups; route complex cases through the coordinator.

When: You need to guarantee the model calls a specific tool first in a sequence

→Use tool_choice: {'type': 'tool', 'name': 'extract_metadata'} for the first turn, then switch to 'auto'.

✗ Anti-Patterns to Reject

Giving all agents access to all tools, leading to cross-specialization misuse and unreliable selection.
Giving the synthesis agent full web search tools when a scoped verify_fact tool handles 85% of its needs.

ts-2.4

Integrate MCP servers into Claude Code and agent workflows

Key Points

Project-scoped .mcp.json for shared team tooling; user-scoped ~/.claude.json for personal/experimental servers.
Use environment variable expansion (${GITHUB_TOKEN}) in .mcp.json for credential management without committing secrets.
Tools from all configured MCP servers are discovered at connection time and available simultaneously.
MCP resources expose content catalogs (issue summaries, database schemas) to reduce exploratory tool calls.
Prefer community MCP servers for standard integrations (Jira, GitHub); build custom servers only for team-specific workflows.

Decision Rules

When: Team needs shared MCP tooling with per-developer credentials

→Use project-scoped .mcp.json with ${ENV_VAR} expansion for tokens; document required vars in README.

When: A developer wants to experiment with a personal MCP server

→Configure it in user-scoped ~/.claude.json so it does not affect teammates.

When: A standard integration exists (GitHub, Jira) and you are considering a custom server

→Use the existing community MCP server; reserve custom implementations for team-specific workflows.

✗ Anti-Patterns to Reject

Building custom MCP server wrappers when native env var expansion in .mcp.json already handles credential injection.
Having each developer configure the MCP server in user scope instead of using a shared project-scoped .mcp.json.

ts-2.5

Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

Key Points

Grep for content search: finding function names, error messages, import statements within file contents.
Glob for path pattern matching: finding files by name or extension (e.g., **/*.test.tsx).
Read/Write for full file operations; Edit for targeted modifications using unique text matching.
When Edit fails due to non-unique text matches, fall back to Read + Write for reliable file modifications.
Build codebase understanding incrementally: Grep to find entry points, then Read to follow imports and trace flows.

Decision Rules

When: You need to find all callers of a specific function across the codebase

→Use Grep to search file contents for the function name.

When: You need to find all test files regardless of directory location

→Use Glob with pattern **/*.test.tsx to match by naming convention.

When: Edit fails because the anchor text appears multiple times in the file

→Use Read to load full contents, then Write the modified version as a fallback.

✗ Anti-Patterns to Reject

Reading all files upfront to understand a codebase instead of incrementally tracing from entry points via Grep.
Using Bash for file search/content operations when dedicated Grep and Glob tools are available.

Domain 3: Claude Code Configuration & Workflows

20% of exam

ts-3.1

Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization

Key Points

Hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), directory-level (subdirectory CLAUDE.md).
User-level settings apply only to that user and are NOT shared via version control.
Use .claude/rules/ directory for topic-specific rule files as an alternative to a monolithic CLAUDE.md.
Use @import syntax to reference external files and keep CLAUDE.md modular.
New team members not receiving guidelines? Check if instructions are in user-level (~/) rather than project-level (.claude/).

Decision Rules

When: A guideline must apply to all team members (current and future)

→Place it in project-level .claude/CLAUDE.md or .claude/rules/, NOT in user-level ~/.claude/CLAUDE.md.

When: CLAUDE.md exceeds 400+ lines mixing multiple concerns

→Split into topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md).

When: A new team member is not receiving project guidelines

→Verify the guideline exists in project-level config, not just in existing developers' user-level config.

✗ Anti-Patterns to Reject

Putting team-wide guidelines in ~/.claude/CLAUDE.md (user-level) instead of project-level, so new members miss them.
Using README.md files as instruction sources -- only CLAUDE.md and .claude/rules/ are recognized by Claude Code.

ts-3.2

Create and configure custom slash commands and skills

Key Points

Project-scoped commands in .claude/commands/ (shared via version control); user-scoped in ~/.claude/commands/ (personal).
Skills in .claude/skills/ with SKILL.md support frontmatter: context: fork, allowed-tools, argument-hint.
context: fork runs the skill in an isolated sub-agent context, preventing output from polluting the main conversation.
Project skills take precedence over personal skills with the same name; use a different name for personal variants.
Skills are on-demand (invoked via slash command); CLAUDE.md is always-loaded for universal standards.

Decision Rules

When: A skill produces verbose output that causes Claude to lose track of the original task

→Add context: fork to the skill's frontmatter to run in an isolated sub-agent context.

When: A developer wants a personal variant of a team skill without affecting teammates

→Create a personal skill in ~/.claude/skills/ with a DIFFERENT name (project skills shadow same-named personal ones).

When: Context is only useful for a specific workflow (e.g., endpoint generation) and not general work

→Create a skill with the exemplar code; invoke on-demand via slash command instead of putting it in CLAUDE.md.

✗ Anti-Patterns to Reject

Creating a personal skill with the same name as a project skill -- the project version shadows it.
Putting task-specific workflow guidance in CLAUDE.md (always loaded) instead of a skill (on-demand).

ts-3.3

Apply path-specific rules for conditional convention loading

Key Points

Use .claude/rules/ files with YAML frontmatter paths field containing glob patterns for conditional rule activation.
Path-scoped rules load only when editing matching files, reducing irrelevant context and token usage.
Glob patterns apply conventions by file type regardless of directory location (e.g., **/*.test.tsx for all test files).
Path-specific rules are better than subdirectory CLAUDE.md files when conventions span multiple directories.

Decision Rules

When: Different coding conventions apply to different file types (React components vs API handlers vs tests)

→Create .claude/rules/ files with YAML frontmatter paths glob patterns for each file type.

When: Test files are spread throughout the codebase alongside source files

→Use path-specific rules with **/*.test.tsx glob rather than subdirectory CLAUDE.md files.

When: You want conventions to apply to terraform files in any directory

→Use paths: ['terraform/**/*'] in rule frontmatter instead of a terraform/CLAUDE.md file.

✗ Anti-Patterns to Reject

Relying on Claude to infer which conventions apply by putting all rules in a single root CLAUDE.md.
Using subdirectory CLAUDE.md files for cross-cutting concerns like test conventions that span multiple directories.

ts-3.4

Determine when to use plan mode vs direct execution

Key Points

Plan mode: complex tasks with multiple valid approaches, architectural decisions, multi-file changes, unfamiliar domains.
Direct execution: simple, well-scoped changes with a clear implementation path (e.g., single-file bug fix).
The Explore subagent isolates verbose discovery output and returns summaries, preserving main conversation context.
Combine plan mode for investigation with direct execution for implementation (e.g., plan migration, then execute).

Decision Rules

When: Task involves ambiguous requirements with multiple valid integration approaches (e.g., adding Slack support)

→Enter plan mode to explore options and architectural implications before implementing.

When: Task is a well-understood change with clear scope (e.g., bug fix with a clear stack trace)

→Use direct execution -- no need for plan mode.

When: Discovery phase generates verbose output that fills the context window

→Use the Explore subagent to isolate verbose output and return a concise summary to the main conversation.

✗ Anti-Patterns to Reject

Starting direct execution on an ambiguous architectural task without exploring trade-offs first.
Using plan mode for a simple, well-scoped change that has an obvious implementation.

ts-3.5

Apply iterative refinement techniques for progressive improvement

Key Points

Concrete input/output examples are the most effective way to communicate transformations when prose is interpreted inconsistently.
Test-driven iteration: write test suites first, then iterate by sharing test failures to guide improvement.
The interview pattern: have Claude ask questions to surface design considerations before implementing in unfamiliar domains.
Address multiple interacting issues in a single message when fixes interact; use sequential iteration for independent issues.

Decision Rules

When: Claude interprets prose requirements differently each iteration, producing inconsistent output structure

→Provide 2-3 concrete input/output examples showing the expected transformation.

When: You are implementing in an unfamiliar domain and want to surface edge cases

→Use the interview pattern: have Claude ask about design considerations before implementing.

When: Multiple bugs interact with each other

→Describe all interacting issues in a single message rather than fixing them sequentially.

✗ Anti-Patterns to Reject

Continuing to refine prose descriptions when Claude consistently misinterprets them -- provide examples instead.
Fixing interacting bugs one at a time, leading to regressions when each fix invalidates the others.

ts-3.6

Integrate Claude Code into CI/CD pipelines

Key Points

Use the -p (or --print) flag for non-interactive mode in automated pipelines -- prevents hanging on interactive input.
Use --output-format json with --json-schema for enforced structured output in CI contexts.
CLAUDE.md provides project context (testing standards, review criteria) to CI-invoked Claude Code.
A second independent Claude instance reviewing code is more effective than self-review -- eliminates confirmation bias.
Include prior review findings in context when re-running after new commits to avoid duplicate comments.

Decision Rules

When: Running Claude Code in an automated CI pipeline

→Use the -p flag for non-interactive mode; use --output-format json with --json-schema for structured output.

When: The same Claude session generated code and you need a review

→Use a second, independent Claude instance without access to the generator's reasoning context.

When: Re-running review after developer pushes fixes, and getting duplicate findings on already-fixed code

→Include prior review findings in context, instructing Claude to only report new or still-unaddressed issues.

✗ Anti-Patterns to Reject

Running claude without -p flag in CI, causing the job to hang waiting for interactive input.
Asking Claude to self-review its own generated code in the same session -- confirmation bias persists.

Domain 4: Prompt Engineering & Structured Output

20% of exam

ts-4.1

Design prompts with explicit criteria to improve precision and reduce false positives

Key Points

Explicit criteria ('flag comments only when claimed behavior contradicts actual code') beat vague instructions ('check that comments are accurate').
General instructions like 'be conservative' or 'only report high-confidence findings' fail to improve precision.
High false positive rates in some categories undermine trust in ALL categories -- developers dismiss everything.
Define explicit severity criteria with concrete code examples for each severity level to achieve consistent classification.

Decision Rules

When: Automated review produces high false positive rates that erode developer trust

→Temporarily disable high false-positive categories; keep only high-precision categories while improving prompts.

When: Severity ratings are inconsistent across similar issues

→Add explicit severity criteria with concrete code examples for each level, not general 'be conservative' instructions.

When: A prompt instruction is vague (e.g., 'check comments are accurate')

→Replace with explicit criteria defining exactly what constitutes a problem (e.g., 'flag only when claimed behavior contradicts code').

✗ Anti-Patterns to Reject

Adding confidence scores alongside findings and expecting developers to self-triage -- they will not trust self-reported scores.
Keeping high false-positive categories enabled while 'improving prompts over the coming weeks' -- trust erodes immediately.

ts-4.2

Apply few-shot prompting to improve output consistency and quality

Key Points

Few-shot examples are the most effective technique when detailed instructions alone produce inconsistent results.
Target 2-4 examples at ambiguous scenarios showing reasoning for why one action was chosen over alternatives.
Few-shot examples enable generalization to novel patterns, not just matching pre-specified cases.
For extraction tasks, few-shot examples reduce hallucination by showing how to handle varied document structures.

Decision Rules

When: Detailed format instructions produce variable output quality (sometimes detailed, sometimes vague)

→Add 3-4 few-shot examples showing the exact desired format with issue, location, and specific fix.

When: Agent misroutes between tools on ambiguous requests

→Add 4-6 few-shot examples targeting ambiguous scenarios, each showing reasoning for the tool choice.

When: Agent handles individual concerns well (94%) but fails on multi-concern messages (58%)

→Add few-shot examples demonstrating correct reasoning and tool sequencing for multi-concern requests.

✗ Anti-Patterns to Reject

Further refining abstract instructions when instructions have already failed -- examples are more reliable than rules.
Grouping few-shot examples by tool instead of showing comparative reasoning across tools for ambiguous cases.

ts-4.3

Enforce structured output using tool use and JSON schemas

Key Points

tool_use with JSON schemas is the most reliable approach for guaranteed schema-compliant structured output.
tool_choice: 'auto' (may return text), 'any' (must call a tool), forced selection (must call a specific tool).
Strict JSON schemas via tool use eliminate syntax errors but do NOT prevent semantic errors (values in wrong fields, line items not summing).
Design schema fields as optional (nullable) when source documents may not contain the information, preventing hallucinated values.

Decision Rules

When: You need guaranteed structured output with no JSON syntax errors

→Define an extraction tool with JSON schema as input parameters; extract data from the tool_use response.

When: Multiple extraction schemas exist and the document type is unknown

→Set tool_choice: 'any' to guarantee a tool call while letting the model choose which extraction schema.

When: Source documents may not contain all required fields

→Design those schema fields as optional (nullable) to prevent the model from fabricating values.

✗ Anti-Patterns to Reject

Relying on prompt instructions to produce JSON instead of using tool_use for guaranteed schema compliance.
Making all schema fields required when source documents may lack the data, causing the model to hallucinate values.

ts-4.4

Implement validation, retry, and feedback loops for extraction quality

Key Points

Retry-with-error-feedback: append specific validation errors to the prompt on retry to guide the model toward correction.
Retries are ineffective when required information is simply absent from the source document (vs format or structural errors).
Track which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns.
Semantic validation (values don't sum, wrong field placement) requires separate validation logic -- tool use only prevents syntax errors.

Decision Rules

When: Extraction output has format or structural errors (wrong nesting, bad date format)

→Retry with the original document, the failed extraction, and specific validation errors appended.

When: Required data simply does not exist in the source document

→Do NOT retry -- retries cannot conjure missing information. Accept null/empty or flag for human review.

When: Developers frequently dismiss automated findings and you want to improve accuracy

→Add detected_pattern fields to structured findings to track which constructs produce false positives.

✗ Anti-Patterns to Reject

Retrying extraction when the source document does not contain the required information.
Using generic retry prompts like 'try again' without including the specific validation errors that triggered the retry.

ts-4.5

Design efficient batch processing strategies

Key Points

Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA.
Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation).
The batch API does NOT support multi-turn tool calling within a single request -- breaks iterative workflows.
Use custom_id fields for correlating batch request/response pairs and handling failures.

Decision Rules

When: Workflow is latency-sensitive and blocks developers (pre-merge checks)

→Use synchronous API calls, NOT batch processing.

When: Workflow is scheduled and latency-tolerant (overnight reports, weekly audits, nightly test generation)

→Use Message Batches API for 50% cost savings.

When: Workflow requires iterative tool calling (analyze file, request related files, continue analysis)

→Do NOT use batch processing -- it cannot execute tools mid-request and return results.

✗ Anti-Patterns to Reject

Using batch processing for blocking pre-merge checks where developers are waiting for results.
Attempting to use batch processing for iterative tool-calling workflows that require mid-request tool execution.

ts-4.6

Design multi-instance and multi-pass review architectures

Key Points

Self-review limitation: a model retains reasoning context from generation, making it less likely to question its own decisions.
Independent review instances (without prior reasoning context) catch subtle issues that self-review and extended thinking miss.
Multi-pass review: split into per-file local analysis passes plus cross-file integration passes to avoid attention dilution.
Include reasoning and confidence assessments inline with each finding to speed up developer triage.

Decision Rules

When: Claude-generated code has subtle issues that only surface during human peer review

→Use a second, independent Claude instance to review without access to the generator's reasoning.

When: Single-pass review of many files produces inconsistent depth and contradictory feedback

→Split into per-file local passes plus a separate cross-file integration pass.

When: Developers spend too much time investigating each finding to decide if it is real

→Require Claude to include reasoning and confidence assessment inline with each finding.

✗ Anti-Patterns to Reject

Asking Claude to self-review its own output in the same session -- confirmation bias means it rationalizes the same way.
Using extended thinking as a substitute for independent review -- the same session context still biases the review.

Domain 5: Context Management & Reliability

15% of exam

ts-5.1

Manage conversation context to preserve critical information across long interactions

Key Points

Progressive summarization loses precise details: amounts, percentages, dates get condensed into vague phrases.
The 'lost in the middle' effect: models reliably process the beginning and end of long inputs but may omit middle sections.
Tool results accumulate tokens disproportionate to their relevance (e.g., 40+ fields when only 5 are relevant).
Place key findings summaries at the beginning of aggregated inputs; organize detailed results with explicit section headers.

Decision Rules

When: Customer references specific amounts ('the 15% discount I mentioned') that were summarized away

→Extract transactional facts (amounts, dates, order numbers) into a persistent 'case facts' block outside summarized history.

When: Synthesis agent omits critical findings from the middle of 75K+ token aggregated input

→Place a key findings summary at the beginning; organize the rest with explicit section headers.

When: Tool outputs return 40+ fields per lookup when only 5 are relevant

→Trim verbose tool outputs to only relevant fields before they accumulate in context.

✗ Anti-Patterns to Reject

Relying on progressive summarization to preserve exact numerical values and dates from early in a conversation.
Increasing the summarization threshold (e.g., 70% to 85%) instead of extracting critical facts into a persistent block.

ts-5.2

Design effective escalation and ambiguity resolution patterns

Key Points

Appropriate escalation triggers: customer explicitly requests human, policy exceptions/gaps, inability to make meaningful progress.
Escalate immediately when customer explicitly demands a human -- do not first attempt investigation.
Sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity.
When multiple customer matches are returned, ask for an additional identifier (email, phone, order number) rather than guessing.

Decision Rules

When: Policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching)

→Escalate to a human for policy interpretation -- do not fabricate a policy.

When: get_customer returns multiple matches and the agent guesses wrong 15% of the time

→Instruct the agent to ask for an additional identifier before taking any customer-specific action.

When: The issue is straightforward but the customer explicitly asks for a human agent

→Escalate immediately -- honor the explicit request without attempting to resolve first.

✗ Anti-Patterns to Reject

Using heuristics (most recent order, conversational context clues) to guess the right customer from multiple matches.
Implementing sentiment analysis or self-reported confidence scores as escalation triggers.

ts-5.3

Implement error propagation strategies across multi-agent systems

Key Points

Structured error context (failure type, attempted query, partial results, alternative approaches) enables intelligent coordinator recovery.
Distinguish access failures (timeouts needing retry decisions) from valid empty results (successful queries with no matches).
Silently suppressing errors (returning empty as success) or terminating on single failures are both anti-patterns.
Subagents should handle transient failures locally and only propagate errors they cannot resolve, with partial results.

Decision Rules

When: A subagent encounters a timeout (transient failure)

→Attempt local recovery; if it fails, propagate structured error context (failure type, what was attempted, partial results) to the coordinator.

When: A subagent encounters a corrupted file (permanent failure)

→Return the error with context to the coordinator -- do NOT retry (corruption is permanent).

When: Some source categories succeed while others fail in a multi-source research task

→Proceed with available data; annotate synthesis output with coverage gaps indicating which sources were unavailable.

✗ Anti-Patterns to Reject

Returning empty results marked as 'success' when a timeout occurred, hiding the failure from the coordinator.
Terminating the entire research workflow when one source fails, discarding all successful results.

ts-5.4

Manage context effectively in large codebase exploration

Key Points

Context degradation in extended sessions: models start referencing 'typical patterns' instead of specific classes discovered earlier.
Scratchpad files persist key findings across context boundaries, countering degradation.
Subagent delegation isolates verbose exploration output while the main agent coordinates high-level understanding.
Structured state persistence: each agent exports state to a known location; the coordinator loads a manifest on resume.

Decision Rules

When: Discovery phase generates verbose output that fills the main context window

→Use the Explore subagent or context: fork to isolate verbose output; return a concise summary.

When: Extended exploration session shows signs of context degradation (vague references instead of specifics)

→Have agents maintain scratchpad files recording key findings; use /compact to reduce context usage.

When: Multi-phase task needs to persist findings across context boundaries

→Summarize key findings from one phase before spawning sub-agents for the next; inject summaries into initial context.

✗ Anti-Patterns to Reject

Continuing all phases in the main conversation using /compact repeatedly -- lossy compression discards important details.
Re-exploring the entire codebase from scratch instead of persisting findings in scratchpad files.

ts-5.5

Design human review workflows and confidence calibration

Key Points

Aggregate accuracy metrics (97% overall) may mask poor performance on specific document types or fields.
Use stratified random sampling to measure error rates in high-confidence extractions and detect novel patterns.
Field-level confidence scores should be calibrated using labeled validation sets for routing review attention.
Validate accuracy by document type AND field segment before automating high-confidence extractions.

Decision Rules

When: Overall accuracy is 97% but you suspect some document types perform poorly

→Analyze accuracy by document type and field to identify hidden poor-performing segments.

When: You want to reduce human review overhead on high-confidence extractions

→Implement stratified random sampling of high-confidence outputs; only reduce review after validating by segment.

When: Model outputs field-level confidence scores but they do not correlate with actual accuracy

→Calibrate confidence thresholds using labeled validation sets rather than trusting raw model scores.

✗ Anti-Patterns to Reject

Trusting aggregate accuracy metrics without breaking down performance by document type and field.
Automating all high-confidence extractions without validating that confidence correlates with actual accuracy per segment.

ts-5.6

Preserve information provenance and handle uncertainty in multi-source synthesis

Key Points

Source attribution is lost during summarization if claim-source mappings are not preserved.
Conflicting statistics from credible sources should be annotated with source attribution, not arbitrarily resolved.
Require publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions.
Render different content types appropriately: financial data as tables, news as prose, technical findings as structured lists.

Decision Rules

When: Two credible sources report conflicting statistics on a key metric

→Include both values with explicit source attribution; let the coordinator decide how to reconcile before synthesis.

When: Subagent outputs are compressed and downstream agents lose track of which claims came from where

→Require subagents to output structured claim-source mappings (source URLs, document names, excerpts).

When: Data from different time periods appears contradictory

→Require publication/collection dates in structured outputs to enable correct temporal interpretation.

✗ Anti-Patterns to Reject

Applying source credibility heuristics to select one value over another -- this oversteps the subagent's role.
Converting all content types to a uniform format (e.g., all prose) instead of rendering each type appropriately.