Prompt Engineering & Structured Output

20% of exam

Design prompts with explicit criteria, apply few-shot patterns, enforce structured output via JSON schemas, implement validation loops, design batch processing strategies, and architect multi-instance reviews.

6

task statements

14

concepts

37

practice questions

Domain Mastery

0%
See full Exam Guide
ts-4.1

Design prompts with explicit criteria to improve precision and reduce false positives

Writing specific, categorical prompt criteria that improve precision and reduce false positive rates.

Knowledge of

  • The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate")
  • How general instructions like "be conservative" or "only report high-confidence findings" fail to improve precision compared to specific categorical criteria
  • The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories

Skills in

  • Writing specific review criteria that define which issues to report (bugs, security) versus skip (minor style, local patterns) rather than relying on confidence-based filtering
  • Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
  • Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification

Concepts

ts-4.2

Apply few-shot prompting to improve output consistency and quality

Using targeted few-shot examples to achieve consistent formatting, handle ambiguous cases, and reduce hallucination.

Knowledge of

  • Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results
  • The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps)
  • How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases
  • The effectiveness of few-shot examples for reducing hallucination in extraction tasks (e.g., handling informal measurements, varied document structures)

Skills in

  • Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives
  • Including few-shot examples that demonstrate specific desired output format (issue, severity, suggested fix) to achieve consistency
  • Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization
  • Using few-shot examples to demonstrate correct handling of varied document structures (inline citations vs bibliographies, methodology sections vs embedded details)
  • Adding few-shot examples showing correct extraction from documents with varied formats to address empty/null extraction of required fields

Concepts

ts-4.3

Enforce structured output using tool use and JSON schemas

Using tool_use with JSON schemas for guaranteed structured output, understanding tool_choice options, and schema design.

Knowledge of

  • Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
  • The distinction between tool_choice: "auto" (model may return text instead of calling a tool), "any" (model must call a tool but can choose which), and forced tool selection (model must call a specific named tool)
  • That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total, values in wrong fields)
  • Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories

Skills in

  • Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
  • Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
  • Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values to satisfy required fields
  • Adding enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
  • Including format normalization rules in prompts alongside strict output schemas to handle inconsistent source formatting

Concepts

ts-4.4

Implement validation, retry, and feedback loops for extraction quality

Designing retry-with-error-feedback loops, identifying when retries will succeed vs fail, and building systematic feedback mechanisms.

Knowledge of

  • Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction
  • The limits of retry: retries are ineffective when the required information is simply absent from the source document (vs format or structural errors)
  • Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns
  • The difference between semantic validation errors (values don't sum, wrong field placement) and schema syntax errors (eliminated by tool use)

Skills in

  • Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction
  • Identifying when retries will be ineffective (e.g., information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors)
  • Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings
  • Designing self-correction validation flows: extracting "calculated_total" alongside "stated_total" to flag discrepancies, adding "conflict_detected" booleans for inconsistent source data

Concepts

ts-4.5

Design efficient batch processing strategies

Matching API approach to workflow latency requirements, handling batch failures, and optimizing batch submission.

Knowledge of

  • The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA
  • Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation) and inappropriate for blocking workflows (pre-merge checks)
  • The batch API does not support multi-turn tool calling within a single request (cannot execute tools mid-request and return results)
  • custom_id fields for correlating batch request/response pairs

Skills in

  • Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis
  • Calculating batch submission frequency based on SLA constraints (e.g., 4-hour windows to guarantee 30-hour SLA with 24-hour batch processing)
  • Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications (e.g., chunking documents that exceeded context limits)
  • Using prompt refinement on a sample set before batch-processing large volumes to maximize first-pass success rates and reduce iterative resubmission costs

Concepts

ts-4.6

Design multi-instance and multi-pass review architectures

Using independent review instances and multi-pass strategies to catch issues that self-review misses.

Knowledge of

  • Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session
  • Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking
  • Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution and contradictory findings

Skills in

  • Using a second independent Claude instance to review generated code without the generator's reasoning context
  • Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis
  • Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing

Concepts