Prompt Engineering & Structured Output
Design prompts with explicit criteria, apply few-shot patterns, enforce structured output via JSON schemas, implement validation loops, design batch processing strategies, and architect multi-instance reviews.
Design prompts with explicit criteria to improve precision and reduce false positives
Writing specific, categorical prompt criteria that improve precision and reduce false positive rates.
Knowledge of:
- The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate")
- How general instructions like "be conservative" or "only report high-confidence findings" fail to improve precision compared to specific categorical criteria
- The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories
Skills in:
- Writing specific review criteria that define which issues to report (bugs, security) versus skip (minor style, local patterns) rather than relying on confidence-based filtering
- Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
- Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification
Explicit Criteria over Vague Instructions
Core: Replace vague goals with specific, categorical criteria the model can apply deterministically
Prompt Specificity & Precision
Core: Replace vague goals with specific, actionable criteria
Classification Consistency & False Positive Reduction
Core: Use absolute criteria with concrete examples for each classification level
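A minimal sketch of the difference in practice, assuming a hypothetical code-review task: the prompt names the categories to report and to skip, and anchors each severity level with a concrete example, instead of asking the model to "be conservative".

```python
# Hypothetical review prompt: categorical criteria instead of vague instructions.
# The categories, severity definitions, and inline examples are illustrative only.
REVIEW_PROMPT = """Review the diff below.

Report ONLY these categories:
- Bug: the code's behavior contradicts its comments, docs, or obvious intent
  (e.g., a comment says "retries 3 times" but the loop runs once).
- Security: unsanitized user input reaches SQL, shell, or file-path APIs.

Do NOT report:
- Style, naming, or formatting preferences.
- Patterns that match local conventions already used in the same file.

Severity definitions (classify every finding as exactly one):
- high: wrong results or exploitable (e.g., price * qty summed at the wrong loop level)
- medium: incorrect only on unusual inputs (e.g., empty list, negative id)
- low: correct but misleading (e.g., a comment describing a removed parameter)

Diff:
{diff}
"""
```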
Apply few-shot prompting to improve output consistency and quality
Using targeted few-shot examples to achieve consistent formatting, handle ambiguous cases, and reduce hallucination.
Knowledge of:
- Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results
- The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps)
- How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases
- The effectiveness of few-shot examples for reducing hallucination in extraction tasks (e.g., handling informal measurements, varied document structures)
Skills in:
- Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives
- Including few-shot examples that demonstrate specific desired output format (issue, severity, suggested fix) to achieve consistency
- Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization
- Using few-shot examples to demonstrate correct handling of varied document structures (inline citations vs bibliographies, methodology sections vs embedded details)
- Adding few-shot examples showing correct extraction from documents with varied formats to address empty/null extraction of required fields
Few-Shot Prompting Techniques
Core: Few-shot examples are more reliable than instructions for consistent formatting
Concrete Input-Output Examples
Core: Concrete examples eliminate ambiguity that prose descriptions create
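A short sketch of a few-shot block, assuming a hypothetical review format of issue, severity, and suggested fix. The first example pins the output format; the second shows the reasoning for not flagging an acceptable pattern, which is what lets the model generalize rather than pattern-match.

```python
# Hypothetical few-shot examples appended to a review prompt: one positive finding
# that fixes the output format, and one acceptable pattern that is deliberately not flagged.
FEW_SHOT_EXAMPLES = """Examples:

Input:
    def get_user(user_id):
        return db.query(f"SELECT * FROM users WHERE id = {user_id}")
Output:
    issue: SQL injection via f-string interpolation of user_id
    severity: high
    suggested_fix: use a parameterized query, e.g. db.query("... WHERE id = ?", (user_id,))

Input:
    MAX_RETRIES = 3  # constant defined once and documented in the module docstring
Output:
    issue: none
    reasoning: a documented module-level constant is an accepted local pattern, not a defect
"""
```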
Enforce structured output using tool use and JSON schemas
Using tool_use with JSON schemas for guaranteed structured output, understanding tool_choice options, and designing schemas.
Knowledge of:
- Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
- The distinction between tool_choice: "auto" (model may return text instead of calling a tool), "any" (model must call a tool but can choose which), and forced tool selection (model must call a specific named tool)
- That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total, values in wrong fields)
- Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories
Skills in:
- Defining extraction tools whose JSON input schema describes the desired fields, and extracting structured data from the tool_use response
- Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
- Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
- Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values to satisfy required fields
- Adding enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
- Including format normalization rules in prompts alongside strict output schemas to handle inconsistent source formatting
Structured Output via Tool Use & JSON Schemas
Core: Tool use with JSON schemas eliminates syntax errors but not semantic errors
tool_choice Options & Forced Tool Selection
Core: tool_choice 'auto' may return text; 'any' guarantees a tool call; forced selection guarantees a specific tool
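A sketch of the pattern using the Python SDK, under these assumptions: the record_invoice tool, its fields, and the invoice use case are hypothetical, and the model id is a placeholder. The tool's input schema is the output contract; forcing the tool guarantees the extraction runs, while tool_choice 'any' would be the right setting if several extraction tools were registered and the document type were unknown.

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical extraction tool: its input_schema is the structured output we want back.
invoice_tool = {
    "name": "record_invoice",
    "description": "Record structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "other"]},
            "currency_detail": {"type": ["string", "null"],
                                "description": "Set only when currency is 'other'."},
            "due_date": {"type": ["string", "null"],
                         "description": "Null when the document does not state one."},
        },
        "required": ["vendor", "total", "currency"],
    },
}

def extract_invoice(document_text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder: use whichever model you run
        max_tokens=1024,
        tools=[invoice_tool],
        # Force this specific tool so the extraction step always produces structured data.
        tool_choice={"type": "tool", "name": "record_invoice"},
        messages=[{"role": "user",
                   "content": f"Extract the invoice fields:\n\n{document_text}"}],
    )
    # The structured data is the tool call's input, already schema-compliant.
    return next(block.input for block in response.content if block.type == "tool_use")
```

Schema compliance here covers syntax only; a returned total that does not match the line items still needs the semantic validation described in the next objective.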
Implement validation, retry, and feedback loops for extraction quality
Designing retry-with-error-feedback loops, identifying when retries will succeed vs fail, and building systematic feedback mechanisms.
Knowledge of:
- Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction
- The limits of retry: retries are ineffective when the required information is simply absent from the source document (vs format or structural errors)
- Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns
- The difference between semantic validation errors (values don't sum, wrong field placement) and schema syntax errors (eliminated by tool use)
Skills in:
- Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction
- Identifying when retries will be ineffective (e.g., information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors)
- Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings
- Designing self-correction validation flows: extracting "calculated_total" alongside "stated_total" to flag discrepancies, adding "conflict_detected" booleans for inconsistent source data
Retry-with-Error-Feedback Pattern
Core: Append specific validation errors to the retry prompt, not just 'try again'
Feedback Loop Design & Dismissal Pattern Analysis
Advanced: Add detected_pattern fields to enable systematic analysis of false positive patterns
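A sketch of the retry loop, reusing the hypothetical extraction tool from the previous objective; validate stands in for whatever semantic checks apply (line items that sum to the total, values in the right fields) and returns a list of error strings.

```python
def extract_with_feedback(client, tool, document_text, validate, max_retries=2):
    """Retry extraction, feeding the failed output and specific errors back to the model."""
    prompt = f"Extract the fields defined by the tool schema:\n\n{document_text}"
    extraction = None
    for _ in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder model id
            max_tokens=1024,
            tools=[tool],
            tool_choice={"type": "tool", "name": tool["name"]},
            messages=[{"role": "user", "content": prompt}],
        )
        extraction = next(b.input for b in response.content if b.type == "tool_use")
        errors = validate(extraction)  # hypothetical semantic checks, e.g. totals that sum
        if not errors:
            return extraction
        # Feed back the document, the failed extraction, and the specific errors,
        # not just "try again".
        prompt = (
            f"Extract the fields defined by the tool schema:\n\n{document_text}\n\n"
            f"A previous attempt produced:\n{extraction}\n\n"
            "It failed these checks:\n- " + "\n- ".join(errors) +
            "\n\nCorrect these specific problems and re-extract."
        )
    return extraction  # still failing: the information is likely absent, so retries won't help
```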
Design efficient batch processing strategies
Matching API approach to workflow latency requirements, handling batch failures, and optimizing batch submission.
Knowledge of:
- The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA
- Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation) and inappropriate for blocking workflows (pre-merge checks)
- The batch API does not support multi-turn tool calling within a single request (cannot execute tools mid-request and return results)
- custom_id fields for correlating batch request/response pairs
Skills in:
- Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis
- Calculating batch submission frequency from SLA constraints (e.g., submitting every 4 hours so that up to 4 hours of queueing plus the 24-hour batch window still fits within a 30-hour SLA)
- Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications (e.g., chunking documents that exceeded context limits)
- Using prompt refinement on a sample set before batch-processing large volumes to maximize first-pass success rates and reduce iterative resubmission costs
Batch Processing Strategy & API Selection
Core: Batch API saves 50% but has up to 24-hour processing with no latency SLA
Batch Failure Handling & Constraints
Core: Resubmit only failed documents identified by custom_id, not the entire batch
Batch Cost Optimization Strategies
Advanced: 50% batch savings are reduced by resubmission costs; maximize first-pass success
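A sketch of an overnight batch job with the Message Batches API; the document ids, prompt, and model id are placeholders. The custom_id on each request is what makes selective resubmission of failures possible.

```python
import anthropic

client = anthropic.Anthropic()

documents = {"doc-001": "...", "doc-002": "..."}  # hypothetical corpus for a nightly job

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": doc_id,  # correlates each result with its source document
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model id
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarize:\n\n{text}"}],
            },
        }
        for doc_id, text in documents.items()
    ]
)

# Once the batch's processing_status is "ended", collect only the failures and
# resubmit those documents (chunked first if they exceeded context limits).
failed_ids = [
    result.custom_id
    for result in client.messages.batches.results(batch.id)
    if result.result.type != "succeeded"
]
```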
Design multi-instance and multi-pass review architectures
Using independent review instances and multi-pass strategies to catch issues that self-review misses.
Knowledge of:
- Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session
- Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking
- Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution and contradictory findings
Skills in:
- Using a second independent Claude instance to review generated code without the generator's reasoning context
- Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis
- Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing
Self-Critique Limitations & Independent Review
Core: Self-review in the same context suffers from confirmation bias; the model retains its generation reasoning
Multi-Pass Review Architecture
Core: Split large reviews into per-file local passes plus cross-file integration passes
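A sketch of an independent second-instance review; the review criteria, confidence field, and model id are illustrative assumptions. The essential property is that this call shares no conversation history with the session that generated the code, so the reviewer has none of the generator's reasoning to confirm.

```python
import anthropic

client = anthropic.Anthropic()

def independent_review(generated_code: str) -> str:
    """Review code in a fresh context, without the generator's reasoning."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                "Review the following code. Report only logic bugs, security issues, "
                "and incorrect error handling; skip style comments. For each finding, "
                "give issue, severity, confidence (high/medium/low), and a suggested fix.\n\n"
                + generated_code
            ),
        }],
    )
    return response.content[0].text
```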