Classification Consistency & False Positive Reduction

Core

Design prompts with explicit criteria to improve precision and reduce false positives · Difficulty 3/5

Tags: classification · consistency · false-positives · trust

When Claude classifies or categorizes items (such as assigning severity ratings), inconsistency is a common problem. High false-positive rates in some categories erode trust across ALL categories.

Root Causes of Inconsistency

  • Ambiguous category definitions
  • No concrete examples for each category
  • Relative rather than absolute criteria

Solution: Explicit Criteria with Examples

  • Clear definition for each classification level
  • Concrete code/content examples for each level
  • Absolute criteria, not relative to other items in the batch (see the prompt sketch below)
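A minimal sketch of how such a prompt might be structured, using the Anthropic Python SDK. The rubric wording, per-level examples, and model name are illustrative assumptions, not taken from this lesson; the key property is that every level gets an absolute definition plus a concrete example, applied identically to every item.

```python
import anthropic

# Hypothetical severity rubric: the level definitions and examples below are
# illustrative assumptions. Each level has an absolute definition and a
# concrete example, so ratings do not depend on the rest of the batch.
SEVERITY_RUBRIC = """Assign exactly one severity to the code review finding.
Use these ABSOLUTE criteria; never rate relative to other findings.

CRITICAL: exploitable security flaw or data loss.
  Example: SQL query built by concatenating unsanitized user input.
HIGH: incorrect behavior on realistic inputs.
  Example: off-by-one loop bound that drops the last array element.
MEDIUM: correct today, but fragile or misleading.
  Example: a function that silently swallows exceptions and returns None.
LOW: cosmetic only, no behavioral impact.
  Example: inconsistent variable naming within a single file.

Respond with exactly one word: CRITICAL, HIGH, MEDIUM, or LOW."""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify_finding(finding: str) -> str:
    """Classify one finding against the fixed, absolute rubric above."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name; substitute your own
        max_tokens=10,
        system=SEVERITY_RUBRIC,
        messages=[{"role": "user", "content": finding}],
    )
    return response.content[0].text.strip()

# Each finding is rated independently against the same rubric, so similar
# issues receive the same severity across PRs and batches.
print(classify_finding("Password is logged in plain text at login."))
```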

False Positive Trust Erosion

When automated review produces high false-positive rates in certain categories (e.g., style at 52%, docs at 48%), developers start dismissing even accurate findings. The fix:

  • Temporarily disable high false-positive categories
  • Keep high-precision categories running (security and correctness, each at 8% false positives)
  • Improve prompts for disabled categories
  • Re-enable only when precision meets a defined threshold (see the gating sketch below)
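A minimal sketch of precision-based gating, assuming precision is measured from developer feedback (accepted vs. dismissed findings). The 85% threshold and the measured values are illustrative assumptions; the numbers mirror the false-positive rates above, treating precision as 1 minus the false-positive rate.

```python
# Illustrative sketch: threshold and measured precision values are assumptions.
PRECISION_THRESHOLD = 0.85  # re-enable a category only at or above this

category_precision = {
    "security": 0.92,     # 8% false positives: keep running
    "correctness": 0.92,  # 8% false positives: keep running
    "style": 0.48,        # 52% false positives: disable, improve prompt
    "docs": 0.52,         # 48% false positives: disable, improve prompt
}

def enabled_categories(precision: dict[str, float]) -> set[str]:
    """Return only the categories whose measured precision meets the bar."""
    return {name for name, p in precision.items() if p >= PRECISION_THRESHOLD}

# Only high-precision findings reach developers; the noisy categories stay
# off until their prompts are improved and their precision recovers.
assert enabled_categories(category_precision) == {"security", "correctness"}
```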

Anti-patterns

  • "Rate severity relative to other issues" (causes inconsistency across batches)
  • Confidence scores (developers who lost trust won't trust self-reported confidence)
  • Uniform strictness reduction (hurts high-precision categories unnecessarily)

Key Takeaways

  • Use absolute criteria with concrete examples for each classification level
  • Disable high false-positive categories immediately to stop trust erosion across all categories
  • Confidence scores do not fix the root cause; explicit categorical criteria do

Test Yourself

Your automated code review system shows inconsistent severity ratings: similar issues receive different severities in different PRs. What's the most effective way to improve severity consistency?