Prompt Engineering & Structured Output

Design prompts with explicit criteria, apply few-shot patterns, enforce structured output via JSON schemas, implement validation loops, design batch processing strategies, and architect multi-instance reviews.

20% of exam6 task statements14 concepts37 questions
Domain Mastery
0%
This domain is 20% of the Claude SA exam — see full Exam Guide

Design prompts with explicit criteria to improve precision and reduce false positives

Writing specific, categorical prompt criteria that improve precision and reduce false positive rates.

Knowledge of:

  • The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate")
  • How general instructions like "be conservative" or "only report high-confidence findings" fail to improve precision compared to specific categorical criteria
  • The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories

Skills in:

  • Writing specific review criteria that define which issues to report (bugs, security) versus skip (minor style, local patterns) rather than relying on confidence-based filtering
  • Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
  • Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification

Apply few-shot prompting to improve output consistency and quality

Using targeted few-shot examples to achieve consistent formatting, handle ambiguous cases, and reduce hallucination.

Knowledge of:

  • Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results
  • The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps)
  • How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases
  • The effectiveness of few-shot examples for reducing hallucination in extraction tasks (e.g., handling informal measurements, varied document structures)

Skills in:

  • Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives
  • Including few-shot examples that demonstrate specific desired output format (issue, severity, suggested fix) to achieve consistency
  • Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization
  • Using few-shot examples to demonstrate correct handling of varied document structures (inline citations vs bibliographies, methodology sections vs embedded details)
  • Adding few-shot examples showing correct extraction from documents with varied formats to address empty/null extraction of required fields

Enforce structured output using tool use and JSON schemas

Using tool_use with JSON schemas for guaranteed structured output, understanding tool_choice options, and schema design.

Knowledge of:

  • Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
  • The distinction between tool_choice: "auto" (model may return text instead of calling a tool), "any" (model must call a tool but can choose which), and forced tool selection (model must call a specific named tool)
  • That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total, values in wrong fields)
  • Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories

Skills in:

  • Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
  • Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
  • Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values to satisfy required fields
  • Adding enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
  • Including format normalization rules in prompts alongside strict output schemas to handle inconsistent source formatting

Implement validation, retry, and feedback loops for extraction quality

Designing retry-with-error-feedback loops, identifying when retries will succeed vs fail, and building systematic feedback mechanisms.

Knowledge of:

  • Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction
  • The limits of retry: retries are ineffective when the required information is simply absent from the source document (vs format or structural errors)
  • Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns
  • The difference between semantic validation errors (values don't sum, wrong field placement) and schema syntax errors (eliminated by tool use)

Skills in:

  • Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction
  • Identifying when retries will be ineffective (e.g., information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors)
  • Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings
  • Designing self-correction validation flows: extracting "calculated_total" alongside "stated_total" to flag discrepancies, adding "conflict_detected" booleans for inconsistent source data

Design efficient batch processing strategies

Matching API approach to workflow latency requirements, handling batch failures, and optimizing batch submission.

Knowledge of:

  • The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA
  • Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation) and inappropriate for blocking workflows (pre-merge checks)
  • The batch API does not support multi-turn tool calling within a single request (cannot execute tools mid-request and return results)
  • custom_id fields for correlating batch request/response pairs

Skills in:

  • Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis
  • Calculating batch submission frequency based on SLA constraints (e.g., 4-hour windows to guarantee 30-hour SLA with 24-hour batch processing)
  • Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications (e.g., chunking documents that exceeded context limits)
  • Using prompt refinement on a sample set before batch-processing large volumes to maximize first-pass success rates and reduce iterative resubmission costs

Design multi-instance and multi-pass review architectures

Using independent review instances and multi-pass strategies to catch issues that self-review misses.

Knowledge of:

  • Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session
  • Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking
  • Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution and contradictory findings

Skills in:

  • Using a second independent Claude instance to review generated code without the generator's reasoning context
  • Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis
  • Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing