Self-Critique Limitations & Independent Review

Core

Design multi-instance and multi-pass review architectures · Difficulty 3/5

self-critique · review · confirmation-bias

The Evaluator-Optimizer Pattern uses a second pass to evaluate and improve the first pass's output. However, self-review in the same context has fundamental limitations.
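For concreteness, a minimal sketch of that same-context loop is shown below, using the Anthropic Python SDK; the model id, prompts, and function name are illustrative assumptions rather than a prescribed implementation.

    # A minimal sketch of the Evaluator-Optimizer pattern in a single conversation:
    # the same context generates, then critiques and revises its own output.
    # Model id, prompts, and the function name are assumptions for illustration.
    import anthropic

    client = anthropic.Anthropic()          # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-sonnet-4-20250514"      # assumed model id; substitute your own

    def evaluator_optimizer(task: str) -> str:
        # Pass 1: generate a first draft
        history = [{"role": "user", "content": task}]
        draft = client.messages.create(model=MODEL, max_tokens=2048, messages=history)
        history.append({"role": "assistant", "content": draft.content[0].text})

        # Pass 2: critique and improve in the SAME conversation.
        # The model still sees its own generation turn, which is the source of the
        # confirmation bias discussed in the next section.
        history.append({
            "role": "user",
            "content": "Critique the answer above, then produce an improved version.",
        })
        revised = client.messages.create(model=MODEL, max_tokens=2048, messages=history)
        return revised.content[0].text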

Self-Review Limitations

When Claude generates code or analysis, it retains reasoning context from generation. This makes it less likely to question its own decisions in the same session -- a form of confirmation bias. The model may rationalize edge cases it considered and dismissed during generation.

Independent Review Instances

A second Claude instance evaluating WITHOUT the generator's reasoning context provides a fresh perspective:

  • Catches issues the first instance rationalized away
  • More effective than self-review instructions or extended thinking
  • Mirrors the benefits of human peer review

Pattern

    Instance 1 (Generator): Produce code/analysis
    Instance 2 (Reviewer): Review output WITHOUT seeing Instance 1's reasoning
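By contrast, a minimal sketch of the two-instance version is shown below, again using the Anthropic Python SDK; the model id and prompts are illustrative assumptions. The key point is that review() starts a fresh conversation that sees only the artifact, never the generator's reasoning or history.

    # A minimal sketch of independent review: a second instance with a fresh context
    # reviews only the artifact. Model id and prompts are illustrative assumptions.
    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-sonnet-4-20250514"  # assumed model id

    def generate(task: str) -> str:
        """Instance 1 (Generator): produce the code or analysis."""
        response = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            messages=[{"role": "user", "content": task}],
        )
        return response.content[0].text

    def review(artifact: str) -> str:
        """Instance 2 (Reviewer): a fresh conversation that never sees the
        generator's reasoning or conversation history -- only its output."""
        prompt = (
            "Review the following output produced by another model. "
            "List concrete defects, risky assumptions, and missed edge cases.\n\n"
            + artifact
        )
        response = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    draft = generate("Write a Python function that merges two sorted lists.")
    findings = review(draft)  # independent pass: no shared context with generate()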

Verification Passes with Confidence

Run verification passes where the model self-reports confidence alongside each finding. This enables calibrated review routing, as sketched below:

  • High-confidence findings go directly to the developer
  • Low-confidence findings get additional review passes
  • Confidence calibration improves over time with feedback data
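One way to wire up that routing is sketched below; the Finding shape, the 0.8 threshold, and the example findings are assumptions for illustration, and the threshold would be tuned against the feedback data mentioned above.

    # A minimal sketch of confidence-based routing for review findings.
    # The Finding shape, threshold, and example data are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Finding:
        description: str
        confidence: float  # self-reported by the reviewing model, in [0.0, 1.0]

    CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune against feedback data

    def route(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
        """Split findings into those sent straight to the developer and
        those queued for an additional review pass."""
        to_developer = [f for f in findings if f.confidence >= CONFIDENCE_THRESHOLD]
        needs_review = [f for f in findings if f.confidence < CONFIDENCE_THRESHOLD]
        return to_developer, needs_review

    findings = [
        Finding("SQL query is vulnerable to injection", confidence=0.95),
        Finding("Retry loop may not back off correctly", confidence=0.55),
    ]
    direct, extra_pass = route(findings)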
Key Takeaways

  • Self-review in the same context suffers from confirmation bias -- the model retains its generation reasoning
  • Independent review instances (without the prior reasoning) catch issues that self-review misses
  • This pattern is more effective than self-review instructions or extended thinking
  • Add confidence self-reporting to findings for calibrated review routing

Test Yourself (1 of 1)

Production metrics show that when your agent resolves complex cases involving billing disputes or multi-order returns, customer satisfaction scores are 15% lower than for simple cases -- even when the resolution is technically correct. Root cause analysis reveals the agent provides accurate resolutions but inconsistently explains the reasoning: sometimes omitting relevant policy details, other times missing timeline information or next steps. The specific context gaps vary by case. You want to improve resolution quality without adding human review overhead. Which approach is most effective?