4.2 Few-Shot Prompting
4.2.1 When Telling Isn't Enough, Show
Lesson 4.1 made your instructions explicit. But sometimes even careful, explicit instructions still produce inconsistent results — the format drifts, the judgment on tricky cases wobbles, or extraction misses information that's clearly present. Task Statement 4.2 introduces the single most effective tool for these problems: few-shot prompting, which means giving Claude a few worked EXAMPLES of input and the exact output you want.
It's the difference between describing a dance and demonstrating it. You can write paragraphs about the footwork, but one demonstration conveys what words can't — the timing, the feel, the exact form. Few-shot examples are that demonstration for Claude: instead of telling it what good output looks like, you SHOW it two to four times, and it picks up the pattern far more reliably than from instructions alone. When detailed instructions aren't getting you consistent, actionable output, few-shot is the upgrade.
Crucially, few-shot is THE most effective technique for consistency — more effective than adding yet more instructions, than confidence thresholds, or than fiddling with temperature. The exam tests that you reach for examples (not those other levers) when consistency is the problem. Let's cover when to use it and how to construct examples that actually teach.
Instructions describe; examples demonstrate. Two-to-four worked examples teach the pattern more reliably than any amount of additional instruction — the most effective consistency technique.
The one idea to hold onto
Few-shot prompting — showing 2–4 worked input/output examples — is THE most effective technique for consistent, actionable output, more so than more instructions, confidence thresholds, or temperature changes.
4.2.2 The Three Triggers for Few-Shot
How do you know it's time to reach for few-shot rather than another technique? Three symptoms signal it, and recognizing them is what the exam tests.
- 1.Inconsistent FORMATTING. Your detailed instructions are clear, yet the output's structure varies run to run. Examples pin down the exact format (the location/issue/severity/fix shape you want).
- 2.Inconsistent JUDGMENT on ambiguous cases. The model decides borderline cases differently each time — which tool to use for an ambiguous request, whether a subtle coverage gap counts. Examples show how to judge the ambiguous ones.
- 3.Empty/null extraction of info that EXISTS. The model returns nothing for a field whose value is actually present, just expressed unusually (a measurement buried in narrative text, a date in prose). Examples teach it to recognize the information in varied forms.
| Symptom | Few-shot fixes it by… |
|---|---|
| Output format varies run to run | Showing the exact desired output shape |
| Judgment on borderline cases wobbles | Demonstrating how to decide ambiguous cases |
| Returns null for info that's actually present | Showing extraction from varied document structures |
Three triggers for few-shot: inconsistent formatting, inconsistent judgment, and empty extraction of present-but-unusually-expressed info.
4.2.2 — Key Concept
Reach for few-shot when (1) formatting is inconsistent despite detailed instructions, (2) judgment on ambiguous cases wobbles, or (3) extraction returns empty/null for information that actually exists in the source. Examples solve all three.
4.2.3 Constructing Examples That Teach
Not all examples are equally useful. The construction matters, and getting it right is the difference between examples that teach generalization and examples the model just copies superficially.
Three construction rules. First, use 2 to 4 TARGETED examples — enough to establish the pattern, focused on the scenarios that are actually failing (don't pad with easy cases). Second, and most important: each example should show the REASONING, not just the answer. An example that explains WHY one tool was chosen over a plausible alternative teaches the model to generalize the judgment to new cases; an example that shows only the final output invites surface pattern-matching. Third, cover the FAILING scenarios specifically — if extraction breaks on narrative-style documents, include a narrative-style example.
This is the deepest point in the lesson, and the exam tests it: few-shot examples are NOT about literal pattern-matching. Done well, they teach the model to GENERALIZE judgment to novel patterns it hasn't seen. That's why showing reasoning matters — you're teaching the decision process, not memorizing input-output pairs. The same mechanism reduces hallucination in extraction (show what real data looks like in varied forms) and reduces false positives (show what to flag AND what to deliberately ignore).
4.2.3 — Key Concept
Construct 2–4 targeted examples that each show the REASONING (why this choice over a plausible alternative) and cover the failing scenarios. Examples teach the model to GENERALIZE judgment to novel cases — they are not literal pattern-matching.
4.2.4 Few-Shot vs the Other Techniques
Domain 4 has several tools, and the exam loves to test whether you pick the RIGHT one for a symptom. Few-shot is powerful but not universal — here's how it slots in against the others you'll meet.
| Symptom | Best technique |
|---|---|
| Inconsistent formatting / judgment | Few-shot examples (this lesson) |
| Malformed JSON / syntax errors | tool_use with a schema (4.3) |
| Model fabricates values for absent info | Nullable/optional schema fields (4.3) |
| Wrong tool selected among similar tools | Better tool descriptions, then few-shot (2.1) |
| Line items don't sum to the total | Validation-retry loop (4.4) |
Few-shot is the answer for consistency of format and judgment; other symptoms (syntax errors, fabrication, math errors) have their own dedicated tools in 4.3 and 4.4.
The discriminator: few-shot fixes CONSISTENCY and JUDGMENT problems. It is NOT the fix for guaranteed JSON structure (that's tool_use, 4.3), for stopping fabrication of absent data (nullable fields, 4.3), or for catching arithmetic errors (validation-retry, 4.4). When the symptom is 'it's inconsistent' or 'it judges ambiguous cases differently each time,' few-shot; when it's a structural or semantic guarantee, look to the next lessons.
4.2.4 — Key Concept
Few-shot is the go-to for inconsistent formatting and judgment, but match technique to symptom: tool_use+schema for JSON syntax (4.3), nullable fields for fabrication (4.3), and validation-retry for math/semantic errors (4.4) — not few-shot.
4.2.5 The Exam Traps
The 4.2 traps test whether you reach for examples on consistency problems, and whether you understand examples teach generalization (not copying). The recurring distractor is 'add more instructions.'
- •More instructions for inconsistency. ✗ Adding yet another paragraph of detail when output is inconsistent. ✓ Provide 2–4 examples — they outperform more prose.
- •Confidence/temperature for consistency. ✗ Reaching for confidence thresholds or temperature to fix wobbling judgment. ✓ Few-shot examples showing how to decide.
- •Treating examples as literal patterns. ✗ Believing the model only matches the exact examples shown. ✓ Well-built examples (with reasoning) teach it to generalize to novel cases.
- •Few-shot for the wrong symptom. ✗ Using few-shot to guarantee JSON structure or stop fabrication. ✓ That's tool_use + nullable schema (4.3).
4.2.5 — Exam Trap
For inconsistent formatting/judgment, choose few-shot examples — reject 'add more instructions', confidence thresholds, and temperature. Remember examples teach GENERALIZATION (show reasoning), and don't misapply few-shot to syntax/fabrication problems (those are 4.3).
4.2.6 Put It Together: Teach by Example
You now know when to use few-shot, the three triggers, how to construct examples that teach generalization, and how few-shot fits among the other Domain 4 techniques. The exercise has you fix each trigger with examples and prove that reasoning-rich examples generalize.
4.2.6 — Build Exercise (30 min)
(1) Take a task where detailed instructions still give inconsistent FORMAT; add 2–4 examples showing the exact output shape and watch consistency improve. (2) Take an extraction that returns null for info present in narrative text; add a narrative-style example and confirm it now extracts. (3) Build two versions of a judgment example — one showing only the answer, one showing the REASONING — and test both on novel cases; observe that the reasoning version generalizes while the answer-only version pattern-matches. (4) Resist the urge to 'add more instructions' and confirm examples win.
Few-shot makes output consistent by demonstration. The next lesson, 4.3, tackles a different guarantee: STRUCTURE — using tool_use and JSON schemas to make output reliably machine-parseable.
Where this shows up on the exam
4.2 questions describe inconsistent output and ask the most effective fix — it's few-shot (2–4 reasoning-rich examples), not more instructions or confidence thresholds. Watch for the 'examples = literal matching' misconception (they teach generalization).
Key Takeaways
- ✓Few-shot prompting (2–4 worked input/output examples) is THE most effective technique for consistent, actionable output — more so than more instructions, confidence thresholds, or temperature.
- ✓Three triggers: inconsistent formatting despite detailed instructions, inconsistent judgment on ambiguous cases, and empty/null extraction of information that actually exists.
- ✓Construct 2–4 targeted examples that each show the REASONING (why this choice over a plausible alternative) and cover the specific failing scenarios.
- ✓Few-shot examples teach the model to GENERALIZE judgment to novel patterns — they are NOT literal pattern-matching, which is why showing reasoning matters.
- ✓Few-shot reduces hallucination in extraction (show varied real forms) and false positives (show what to flag AND what to ignore).
- ✓Match technique to symptom: few-shot for consistency/judgment; tool_use+schema for JSON syntax (4.3); nullable fields for fabrication (4.3); validation-retry for math/semantic errors (4.4).
- ✓The recurring distractor is 'add more instructions' — for inconsistency, examples beat more prose.
Check Your Understanding
Test what you learned in this lesson.
Q1.You've written detailed instructions but Claude's extraction output is still formatted inconsistently across runs. What's the most effective fix?
Q2.An extraction task returns null for a measurement that IS present in the document, just expressed in narrative prose. Which technique best addresses this?
Q3.Why should each few-shot example show the REASONING behind the chosen output, not just the output?
Q4.Which problem should NOT be solved with few-shot examples?
Practice This Lesson