4.1 Prompts with Explicit Criteria

4.1.1 Why 'Be Careful' Doesn't Work

Domain 4 is about the craft of the prompt itself — getting Claude to produce precise, reliable output. Task Statement 4.1 starts with the most common prompting mistake: vague instructions. You tell Claude to 'be conservative' or 'only report high-confidence issues,' and you get inconsistent results — sometimes too many false alarms, sometimes missed problems. Why?

Because instructions like 'be careful' or 'be conservative' give Claude no actual DECISION BOUNDARY. Imagine training a new reviewer by telling them only 'flag the important stuff.' Important by whose definition? They'll guess, and guess differently each time. Now imagine instead handing them a checklist: 'Flag a comment ONLY when the claimed behavior contradicts what the code actually does. Do NOT flag style preferences.' That's a crisp, verifiable rule — anyone applying it lands in the same place. The difference between vague and explicit instructions is the difference between guessing and following a rule.

This is the core of 4.1: replace vague qualitative instructions with EXPLICIT CATEGORICAL criteria — concrete statements of exactly what to flag and what to skip. Explicit criteria are how you make Claude's judgment consistent and verifiable. The lesson also covers a counterintuitive trick for rescuing trust when one category goes wrong, and why 'just be more confident' isn't the fix people think it is.

ℹ️

The one idea to hold onto

Vague instructions ('be conservative', 'only high-confidence') give Claude no decision boundary and produce inconsistent results. Replace them with EXPLICIT categorical criteria that state exactly what to flag and what to skip.

4.1.2 Writing Explicit Criteria

An explicit criterion names the category and the precise condition. Instead of 'check that comments are accurate' (vague), write 'flag a comment only when the claimed behavior contradicts the actual code behavior' (explicit). The second version is verifiable — you can look at any comment and definitively say whether the rule applies. That verifiability is what makes the output consistent across runs and across cases.
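As a minimal sketch of that difference, the snippet below puts the vague instruction next to an explicit one and renders the explicit rule into prompt text. The `Criterion` class, its field names, and `render_criteria` are hypothetical names chosen for illustration; they are not part of any Claude API, and the exact wording of the rule is an assumption you would adapt to your own review.

```python
from dataclasses import dataclass

# Vague: no decision boundary, so each run has to guess what "accurate" means.
VAGUE_INSTRUCTION = "Check that comments are accurate."


@dataclass(frozen=True)
class Criterion:
    """One explicit rule: a category plus a precise, checkable condition."""
    category: str
    flag_when: str     # condition under which a finding IS reported
    do_not_flag: str   # the nearby case that must NOT be reported


def render_criteria(criteria: list[Criterion]) -> str:
    """Turn criteria into plain prompt text, one rule block per category."""
    lines = ["Review the code using ONLY the rules below."]
    for c in criteria:
        lines.append(f"[{c.category}]")
        lines.append(f"  Flag ONLY when: {c.flag_when}")
        lines.append(f"  Do NOT flag: {c.do_not_flag}")
    return "\n".join(lines)


# Illustrative rule taken from the lesson's comment-review example.
COMMENT_RULE = Criterion(
    category="comment accuracy",
    flag_when="the behavior claimed in the comment contradicts what the code actually does",
    do_not_flag="style preferences, wording, or missing comments",
)

EXPLICIT_INSTRUCTION = render_criteria([COMMENT_RULE])
print(EXPLICIT_INSTRUCTION)
```

Keeping each rule as structured data rather than free prose makes it easy to review the criteria themselves, and later to switch a single category off without rewriting the prompt.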

The same applies to deciding WHAT to report at all. Rather than 'report important issues,' specify the categories to report (bugs, security vulnerabilities) versus the categories to skip (minor style nits, local stylistic patterns). And for severity, don't describe levels in prose ('high severity means serious') — show actual CODE PATTERNS for each level, so 'high' has a concrete, consistent meaning. Explicit criteria turn fuzzy judgment calls into rule-following.
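One way to picture this is a prompt builder that states the report and skip categories outright and anchors each severity level to a code pattern. This is only a sketch: the constant names, the `build_review_prompt` helper, and the three sample patterns (including which severity each one gets) are assumptions for illustration, and you would substitute patterns drawn from your own codebase.

```python
# Categories are stated outright instead of "report important issues".
REPORT = ["bugs", "security vulnerabilities"]
SKIP = ["minor style nits", "local stylistic patterns"]

# Each severity level is anchored to a concrete code pattern, not an adjective.
# These patterns and their levels are illustrative assumptions only.
SEVERITY_EXAMPLES = {
    "high": 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)  # SQL built from user input',
    "medium": "for i in range(len(items) + 1): total += items[i]  # reads one past the end",
    "low": "except Exception: pass  # error silently swallowed",
}


def build_review_prompt() -> str:
    """Assemble the category rules and severity patterns into prompt text."""
    parts = [
        "Report findings ONLY in these categories: " + ", ".join(REPORT) + ".",
        "Do NOT report: " + ", ".join(SKIP) + ".",
        "Assign severity by matching the finding against these patterns:",
    ]
    for level, pattern in SEVERITY_EXAMPLES.items():
        parts.append(f"- {level}: {pattern}")
    return "\n".join(parts)


print(build_review_prompt())
```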

Vague (✗)	Explicit criteria (✓)
"Check that comments are accurate"	"Flag only when the comment's claim contradicts the code"
"Report important issues"	"Report bugs and security; skip minor style"
"Use good judgment on severity"	Show concrete code patterns for each severity level
"Be conservative"	Define exactly which categories to flag vs skip

Explicit criteria name the category and the precise, verifiable condition — replacing fuzzy adjectives with rules anyone (or any run) applies the same way.

⭐

4.1.2 — Key Concept

Write specific, verifiable criteria: name which categories to report (bugs, security) vs skip (style), and define conditions precisely ('flag only when the claim contradicts the code'). For severity, show concrete code patterns per level rather than prose adjectives.

4.1.3 The False-Positive Trust Problem

Here's a subtle dynamic the exam tests. Suppose your review system has several categories, and ONE of them — say 'documentation mismatch' — produces a lot of false positives (40% of its flags are wrong). What's the damage? It's not just that one noisy category. Once developers see that category crying wolf, they start ignoring ALL the system's findings, including the accurate ones. A high false-positive rate in one category erodes trust in EVERY category.

The fix is counterintuitive, and that's exactly why it's tested. You might think 'tighten that category's confidence threshold.' Instead, the right move is to temporarily DISABLE the problematic category entirely while you refine its criteria with explicit examples — then re-enable it once it's reliable. Pulling the noisy category restores developers' trust in the rest immediately, and you fix the bad category offline rather than letting it poison the whole system in production. Trust is the asset you're protecting; a single noisy category can bankrupt it.
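The disable-then-refine move can be sketched as a small feedback loop: measure each category's false-positive rate from developer feedback, then stop surfacing the noisy category until its criteria are fixed. The feedback records, the `MAX_FP_RATE` tolerance, and the function names below are all hypothetical. Note that the rate here is measured against human feedback on past flags, which is different from thresholding the model's self-reported confidence.

```python
from collections import Counter

# Hypothetical developer feedback: (category, was_the_flag_correct).
feedback = (
    [("documentation mismatch", False)] * 4
    + [("documentation mismatch", True)] * 6
    + [("bugs", True)] * 9
    + [("bugs", False)] * 1
)

MAX_FP_RATE = 0.25  # assumed tolerance; choose your own


def false_positive_rates(records):
    """Share of flags in each category that reviewers marked as wrong."""
    total, wrong = Counter(), Counter()
    for category, correct in records:
        total[category] += 1
        if not correct:
            wrong[category] += 1
    return {c: wrong[c] / total[c] for c in total}


def enabled_categories(rates, max_fp_rate=MAX_FP_RATE):
    """Categories that stay live; the rest are disabled pending refinement."""
    return sorted(c for c, r in rates.items() if r <= max_fp_rate)


rates = false_positive_rates(feedback)
active = enabled_categories(rates)

# Only findings from active categories are shown to developers.
findings = [
    {"category": "bugs", "message": "possible None dereference"},
    {"category": "documentation mismatch", "message": "comment may be stale"},
]
visible = [f for f in findings if f["category"] in active]

print(rates)
print(active)
print(visible)
```

With this toy data, 'documentation mismatch' sits at a 40% false-positive rate and is pulled, while 'bugs' stays live. The disabled category's criteria can then be refined with explicit examples offline and added back once its measured rate is acceptable.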

⭐

4.1.3 — Key Concept

A high false-positive rate in ONE category erodes trust in ALL categories. The counterintuitive fix: temporarily DISABLE the problematic category while you refine its criteria with explicit examples — restoring trust in the rest — rather than just nudging a confidence threshold.

4.1.4 Why Confidence Filtering Isn't the Answer

A tempting shortcut for precision is to have the model rate its own confidence and filter out low-confidence findings. The exam wants you to know why this is a weak primary strategy: model self-confidence is POORLY CALIBRATED. The model can be confidently wrong and hesitantly right, so its confidence score is not a reliable proxy for correctness. Filtering on it lets confident errors through and discards uncertain-but-correct findings.

So the ordering matters: get the EXPLICIT CRITERIA right first — that's what actually improves precision — and treat confidence-based routing as a secondary refinement, not the main lever. Whenever an exam answer offers 'add a confidence threshold' or 'tell it to be more confident' as the fix for inconsistent precision, be suspicious: the real fix is almost always sharper criteria (and, when a category is noisy, disabling and refining it).
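To see what poor calibration does to a threshold filter, here is a toy check against labeled data. The confidence values and correctness labels are invented purely for illustration (they are not measurements of any model), and `THRESHOLD`, `split_by_threshold`, and `accuracy` are hypothetical names.

```python
# Invented labeled findings: (self_reported_confidence, is_actually_correct).
labeled = [
    (0.95, False), (0.92, True), (0.90, False), (0.91, True),
    (0.55, True), (0.50, True), (0.45, True), (0.40, False),
]

THRESHOLD = 0.8  # assumed cutoff for the confidence filter


def split_by_threshold(records, threshold):
    """Return correctness labels for findings kept vs dropped by the filter."""
    kept = [ok for conf, ok in records if conf >= threshold]
    dropped = [ok for conf, ok in records if conf < threshold]
    return kept, dropped


def accuracy(outcomes):
    """Fraction of findings that were actually correct."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


kept, dropped = split_by_threshold(labeled, THRESHOLD)

print("accuracy of kept findings:   ", accuracy(kept))
print("accuracy of dropped findings:", accuracy(dropped))
print("confident errors let through:", kept.count(False))
print("correct findings discarded:  ", dropped.count(True))
```

In this made-up sample the filter keeps two confident errors and throws away three correct findings, which is the failure mode described above. Running the same comparison on your own labeled findings is one way to decide whether confidence is usable even as a secondary signal.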

⚠️

4.1.4 — Key Concept

Model self-confidence is poorly calibrated, so confidence-based filtering is a weak PRIMARY strategy — it lets confident errors through. Fix precision with explicit criteria first; use confidence routing only as a secondary refinement (and calibrate it against labeled data — see 5.5).

4.1.5 The Exam Traps

The 4.1 traps revolve around reaching for vague instructions or confidence thresholds where explicit criteria are the real fix. The signature scenario: a review category has a 40% false-positive rate; what do you do?

• Vague instruction as a fix. ✗ 'Tell it to be more conservative' to reduce false positives. ✓ Write explicit categorical criteria for what to flag vs skip.
• Confidence threshold as the cure. ✗ 'Raise the confidence threshold' for a noisy category. ✓ Disable the category, refine its criteria with examples, re-enable.
• Keeping a noisy category live. ✗ Leaving the 40%-FP category active while you tweak it. ✓ Disable it to restore trust in the others, fix it offline.
• Trusting self-confidence. ✗ Filtering findings purely on the model's self-rated confidence. ✓ Explicit criteria first; confidence only as a calibrated secondary signal.

⚠️

4.1.5 — Exam Trap

For a precision/false-positive problem: ✗ 'be conservative', ✗ raise a confidence threshold, ✗ rely on self-reported confidence. ✓ Write explicit categorical criteria; for a high-FP category, temporarily DISABLE it and refine with examples while keeping the rest trustworthy.

4.1.6 Put It Together: Make Judgment Consistent

You now know why vague instructions fail, how to write explicit criteria, the false-positive trust dynamic, and why confidence filtering is secondary. The exercise has you turn a vague review prompt into a precise one and feel the consistency improve.

✨

4.1.6 — Build Exercise (30 min)

(1) Write a vague review prompt ('flag anything important / be conservative') and run it several times; note inconsistent flags. (2) Rewrite it with explicit categorical criteria (which categories to report vs skip; precise flag conditions; concrete code patterns per severity) and re-run; measure the consistency gain. (3) Simulate a category with a high false-positive rate, disable it, refine its criteria with examples, then re-enable — and observe how disabling it restores trust in the other categories. (4) Try a confidence-threshold filter and note where confidently-wrong findings slip through.
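For steps (1) and (2), "measure the consistency gain" needs a number. One simple option, offered here as an assumption rather than a prescribed metric, is the mean pairwise Jaccard overlap between the sets of flags produced by repeated runs. The run data below is invented, and flags are represented as hypothetical `file:line` strings.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two flag sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0  # two empty runs agree completely
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(runs: list[set]) -> float:
    """Average Jaccard overlap across every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0  # a single run cannot disagree with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented results from three runs of each prompt on the same diff.
vague_runs = [
    {"a.py:10", "a.py:42"},
    {"a.py:10", "b.py:7"},
    {"a.py:42", "b.py:7"},
]
explicit_runs = [
    {"a.py:10", "a.py:42"},
    {"a.py:10", "a.py:42"},
    {"a.py:10", "a.py:42"},
]

print("vague prompt agreement:   ", mean_pairwise_agreement(vague_runs))
print("explicit prompt agreement:", mean_pairwise_agreement(explicit_runs))
```

Agreement only measures consistency, not correctness, so pair it with a spot check that the flags the explicit prompt agrees on are the ones you actually want.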

Explicit criteria make judgment consistent through clear rules. The next lesson, 4.2, covers the most powerful technique when even explicit instructions aren't enough: few-shot examples — showing Claude what you want.

ℹ️

Where this shows up on the exam

4.1 questions describe a high false-positive rate or inconsistent precision. The answer is explicit criteria (and disabling-then-refining a noisy category) — not 'be conservative', not a confidence threshold, not sentiment or self-confidence.

Key Takeaways

✓ Vague instructions ('be conservative', 'only high-confidence') give no decision boundary and produce inconsistent output; replace them with explicit categorical criteria.
✓ Explicit criteria name the category and a precise, verifiable condition ('flag only when the comment's claim contradicts the code') and specify what to report vs skip.
✓ For severity, show concrete code patterns per level rather than prose adjectives, so each level has a consistent meaning.
✓ A high false-positive rate in ONE category erodes trust in ALL categories: false positives are a trust problem, not just a noise problem.
✓ Counterintuitive fix for a noisy category: temporarily DISABLE it and refine its criteria with examples (restoring trust in the rest), rather than nudging a confidence threshold.
✓ Model self-confidence is poorly calibrated, so confidence-based filtering is a weak primary strategy. Fix precision with explicit criteria first, and use confidence routing only as a secondary, calibrated signal.
✓ Exam reflex: for precision/false-positive problems choose explicit criteria, not 'be conservative' or confidence thresholds.

Check Your Understanding

Test what you learned in this lesson.

Q1. Your code-review agent's 'documentation mismatch' category has a 40% false-positive rate, and developers are starting to ignore ALL the agent's findings. What's the most effective response?

Q2. Which instruction gives Claude a usable decision boundary for a code-comment review?

Q3. Why is filtering findings by the model's self-reported confidence a weak primary strategy for precision?

Q4. What's the best first step to improve precision when a review prompt produces inconsistent flags?
