5.4 Context in Large Codebase Exploration

5.4.1 The Agent That Forgets What It Found

Task Statement 5.4 covers a reliability problem specific to long exploration sessions, and it is NOT about failures or errors. It is about a subtle DEGRADATION that creeps in as an agent works through a large codebase over many turns: it gradually stops referring to the specific things it discovered and gives vague, generic answers instead. Nothing crashes; the quality just quietly erodes.

Here's the telltale symptom. Early in a session, the agent gives precise answers: 'the PaymentProcessor class in billing/processor.py calls validate_card() before charging.' Hours and many turns later, ask a related question and you get: 'in a typical payment system, you'd usually validate the card first...' — generic textbook patterns instead of the specific classes and files it actually found earlier. The agent has drifted from concrete knowledge to vague generalities. It's like a researcher who, after a long day buried in documents, starts answering from general impressions rather than the specific notes they took that morning.

This is context degradation, and the crucial — and most tested — insight is its CAUSE: it is NOT a token-limit problem. The agent didn't run out of room; the quality of its attention to the specifics degraded as verbose output piled up. That distinction dictates the fix, because the obvious 'just use a bigger context window' does NOT solve it. Let's see what actually does.

Context degradation: an agent that started citing specific discovered classes/files drifts to generic 'typical patterns' as verbose output accumulates. The cause is attention quality, NOT a token limit.

ℹ️

The one idea to hold onto

In long exploration sessions, the agent drifts from specific discovered facts to generic 'typical patterns' — context degradation. The cause is attention quality as verbose output accumulates, NOT a token limit, so a bigger context window does NOT fix it.

5.4.2 The Primary Fix: Scratchpad Files

If the problem is that key findings get diluted as conversation context fills with verbose output, the fix is to move those findings OUT of the conversation entirely. That's the scratchpad-file pattern, and it's the primary cure for context degradation. The agent writes its key discoveries — important classes, file paths, dependency chains — to a FILE on disk, outside the conversation context, and refers back to that file when it needs the information later.

Why does this work? Because the file is immune to degradation. The conversation context can fill with noise and the model's attention to in-context details can fade, but a fact written to scratchpad.md is exactly as precise on turn 200 as it was on turn 5 — the agent just re-reads it. It's the codebase-exploration analogue of the case-facts block from Lesson 5.1: in both, you protect the information that must stay accurate by keeping it OUTSIDE the part of context that gets diluted or summarized. The researcher who writes careful notes and consults them, rather than trusting memory after a long day, doesn't drift.
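The scratchpad pattern can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the helper names `record_finding` and `read_findings` and the one-line-per-finding layout are assumptions made for this example, while the file name scratchpad.md and the PaymentProcessor facts come from the text above.

```python
import tempfile
from pathlib import Path


def record_finding(scratchpad: Path, category: str, fact: str) -> None:
    """Append one finding to the scratchpad file on disk (hypothetical helper)."""
    with scratchpad.open("a", encoding="utf-8") as f:
        f.write(f"- [{category}] {fact}\n")


def read_findings(scratchpad: Path) -> list[str]:
    """Re-read every recorded finding exactly as it was written."""
    if not scratchpad.exists():
        return []
    text = scratchpad.read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


# Early in the session: write discoveries down as they are made.
workdir = Path(tempfile.mkdtemp())
scratchpad = workdir / "scratchpad.md"
record_finding(scratchpad, "class", "PaymentProcessor in billing/processor.py")
record_finding(scratchpad, "call", "PaymentProcessor calls validate_card() before charging")

# Many turns later: consult the file instead of relying on diluted context.
findings = read_findings(scratchpad)
print("\n".join(findings))
```

The point of the design is that the later read returns the same text that was written, no matter how much verbose output has passed through the conversation in between.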

⭐

5.4.2 — Key Concept

The primary fix for context degradation is scratchpad files: the agent writes key findings (classes, file paths, dependency chains) to files OUTSIDE conversation context, immune to degradation, and re-reads them later — the codebase analogue of Lesson 5.1's case-facts block.

5.4.3 Subagents, Summaries, and State Manifests

Four more techniques manage context across a long exploration, and each ties back to earlier domains. SUBAGENT DELEGATION: spawn subagents to investigate specific questions ('find all the test files,' 'trace the refund-flow dependencies') while the main agent keeps the high-level coordination. The key insight (from Lesson 1.3) is that the primary benefit here is context ISOLATION, not just parallelization: the subagent's verbose exploration happens in ITS context, and only a clean summary returns to the main agent, so the main context never fills with the noise that causes degradation.
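Context isolation can be illustrated with a toy simulation. This sketch assumes nothing about a real agent framework: `AgentContext` and `run_subagent` are hypothetical names, the "tool output" is a made-up file listing, and the summary is built by simple filtering where a real subagent would write it with the model.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Stand-in for one agent's conversation context (hypothetical)."""
    name: str
    messages: list[str] = field(default_factory=list)


def run_subagent(task: str, raw_tool_output: list[str]) -> str:
    """Explore inside a private context and return only a short summary."""
    sub = AgentContext(name="subagent")
    sub.messages.append(task)
    sub.messages.extend(raw_tool_output)  # the verbose output stays in here
    test_files = sorted(
        path for path in raw_tool_output
        if path.split("/")[-1].startswith("test_")
    )
    return f"{task}: {len(test_files)} test files found: " + ", ".join(test_files)


# A noisy listing: 200 source files plus two test files.
listing = [f"src/module_{i}.py" for i in range(200)]
listing += ["tests/test_refund.py", "tests/test_billing.py"]

main = AgentContext(name="coordinator")
summary = run_subagent("find all the test files", listing)
main.messages.append(summary)  # the coordinator sees one line, not 202

print(len(main.messages))
print(summary)
```

The subagent's context is discarded when the function returns, so the coordinator's context grows by a single summary message regardless of how much the subagent had to read.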

SUMMARY INJECTION between phases: before moving from one exploration phase to the next, summarize the key findings and inject that summary into the next phase's starting context — carrying forward the essentials without the accumulated noise. And use /compact PROACTIVELY: rather than waiting until context is nearly full, compact during a long session to reduce the verbose discovery output before it causes drift. Compaction isn't only an emergency measure at the limit; it's a maintenance habit for long sessions.
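Summary injection between phases might look like the sketch below. It rests on loudly stated assumptions: the `FINDING:` prefix is a convention invented for this example, `summarize_phase` and `start_next_phase` are hypothetical helpers, and in practice the summary would usually be written by the model and not extracted by string matching. The /compact command itself is not modeled here.

```python
FINDING_PREFIX = "FINDING:"  # hypothetical marker for lines worth carrying forward


def summarize_phase(transcript: list[str]) -> list[str]:
    """Keep only the marked findings from a phase transcript."""
    return [
        line[len(FINDING_PREFIX):].strip()
        for line in transcript
        if line.startswith(FINDING_PREFIX)
    ]


def start_next_phase(phase_name: str, carried: list[str]) -> list[str]:
    """Build the next phase's starting context from the summary alone."""
    lines = [f"Phase: {phase_name}", "Key findings carried forward:"]
    lines += [f"- {fact}" for fact in carried]
    return ["\n".join(lines)]


# Phase one transcript: findings interleaved with verbose tool output.
phase_one = [
    "tool output: (long directory listing)",
    "FINDING: PaymentProcessor lives in billing/processor.py",
    "tool output: (full source of billing/processor.py)",
    "FINDING: PaymentProcessor calls validate_card() before charging",
]

carried = summarize_phase(phase_one)
next_context = start_next_phase("trace refund flow", carried)
print(next_context[0])
```

The next phase starts from one compact message that names the specific classes and files, and none of the raw tool output comes along.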

Finally, STRUCTURED STATE MANIFESTS for crash recovery: have each agent export its state (what it explored, what it found, what phase it's in, next steps) to a known location, and on resume the coordinator loads that manifest and injects it. This makes a long, multi-phase exploration RESUMABLE after an interruption without re-exploring everything — the reliability complement to the scratchpad pattern.
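A state manifest can be as simple as a JSON file with the four fields named above. The sketch below is one possible shape under stated assumptions: the `StateManifest` class, its field names, the file name manifest.json, and the sample values are all illustrative and not a format the lesson prescribes.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class StateManifest:
    """What one agent explored, found, and planned next (hypothetical schema)."""
    agent: str
    phase: str
    explored: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def export_manifest(manifest: StateManifest, path: Path) -> None:
    """Write to a temp file, then rename, so a crash mid-write
    leaves any previous manifest intact."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(manifest), indent=2), encoding="utf-8")
    tmp.replace(path)


def load_manifest(path: Path) -> StateManifest:
    """Rebuild the manifest on resume."""
    return StateManifest(**json.loads(path.read_text(encoding="utf-8")))


# Before the interruption: export state to a known location.
location = Path(tempfile.mkdtemp()) / "manifest.json"
manifest = StateManifest(
    agent="explorer-1",
    phase="trace-refund-flow",
    explored=["billing/processor.py"],
    findings=["PaymentProcessor calls validate_card() before charging"],
    next_steps=["trace the refund-flow dependencies"],
)
export_manifest(manifest, location)

# After the crash: the coordinator loads the manifest and injects it.
restored = load_manifest(location)
print(restored.phase, restored.next_steps)
```

On resume, the coordinator can place the restored fields at the start of the new session's context, so the agent continues from `next_steps` without repeating the work listed under `explored`.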

Technique	Role	Ties to
Scratchpad files	Primary fix — findings outside context	5.1 case-facts block
Subagent delegation	Isolate verbose exploration; only summary returns	1.3 context isolation
Summary injection between phases	Carry essentials forward without noise	1.6 multi-pass
/compact proactively	Reduce verbose output before it causes drift	3.6 / context mgmt
State manifests	Crash recovery — resume without re-exploring	1.7 session state

Five context-management techniques for long exploration. Scratchpad files are the primary fix; the others (subagents, summaries, /compact, manifests) keep the main context clean and the work resumable.

ℹ️

5.4.3 — Key Concept

Beyond scratchpad files: delegate to subagents (primary benefit = context ISOLATION, not just parallelization), inject summaries between phases, use /compact PROACTIVELY (not only at the limit), and export structured state manifests so a long exploration is resumable after a crash.

5.4.4 The Exam Traps

The 5.4 traps almost all hinge on the same insight: context degradation is an attention-quality problem, not a capacity one, so capacity-based 'fixes' don't work.

• Bigger window for degradation. ✗ 'Use a larger context window' to stop the drift to generic patterns. ✓ It's attention quality, not capacity; use scratchpad files.
• Delegation = parallelization only. ✗ Thinking subagents only speed things up. ✓ Their primary benefit here is context ISOLATION, keeping the main context clean.
• Restart without saving state. ✗ Crashing and re-exploring everything from scratch. ✓ Export structured state manifests so you can resume.
• /compact only at the limit. ✗ Waiting until context is full to compact. ✓ Compact proactively during long sessions.

⚠️

5.4.4 — Exam Trap

When an agent drifts to 'typical patterns' in a long session: ✗ a bigger context window (it's attention quality, not capacity); ✗ assuming delegation is just for speed; ✗ restarting without saved state; ✗ compacting only at the limit. ✓ scratchpad files (primary), subagent isolation, summaries, proactive /compact, and state manifests.

5.4.5 Put It Together: Keep a Long Exploration Sharp

You now know context degradation and its real cause, the scratchpad-file primary fix, and the supporting techniques (subagent isolation, summary injection, proactive /compact, state manifests). The exercise has you reproduce the drift and cure it.

✨

5.4.5 — Build Exercise (45 min)

(1) Run a long codebase exploration and watch the agent drift from specific class/file references to 'typical patterns'; confirm a bigger context window doesn't fix it.
(2) Add scratchpad files: have the agent record key findings (classes, paths, dependency chains) to disk and re-read them, then confirm the drift stops.
(3) Delegate a verbose sub-investigation ('find all test files') to a subagent and observe that only its summary returns, keeping the main context clean.
(4) Add a state manifest each agent exports, kill the session, and resume by loading the manifest without re-exploring everything.

Scratchpad files and context isolation keep long explorations reliable. The next lesson, 5.5, is about reliability of a different kind — designing human review and confidence calibration so you know WHEN to trust automated output.

ℹ️

Where this shows up on the exam

5.4 questions describe an agent drifting to generic patterns in a long session. The answer is scratchpad files (and subagent isolation / manifests) — NOT a bigger context window, because the cause is attention quality, not a token limit.

Key Takeaways

✓ Context degradation: in long exploration sessions an agent drifts from specific discovered facts (named classes/files) to generic 'typical patterns' as verbose output accumulates.
✓ The cause is attention QUALITY, not a token limit, so a bigger context window does NOT fix it (the most-tested point).
✓ Primary fix: scratchpad files. The agent writes key findings (classes, paths, dependency chains) to disk OUTSIDE conversation context, immune to degradation, and re-reads them (analogue of 5.1's case-facts block).
✓ Subagent delegation's primary benefit here is context ISOLATION (not just parallelization): verbose exploration stays in the subagent's context and only a summary returns.
✓ Inject summaries between phases to carry essentials forward without accumulated noise, and use /compact PROACTIVELY during long sessions, not only at the limit.
✓ Export structured state manifests (explored / findings / phase / next steps) so a long multi-phase exploration is resumable after a crash without re-exploring everything.
✓ All these techniques keep the main context clean so the agent keeps reasoning from specifics rather than generalities.

Check Your Understanding

Test what you learned in this lesson.

Q1. During a long codebase exploration, the agent starts giving inconsistent answers and references 'typical repository patterns' instead of the specific classes it discovered earlier. What's the best fix?

Q2. Why doesn't a larger context window fix context degradation in a long session?

Q3. In a long exploration, what is the PRIMARY benefit of delegating a verbose sub-investigation to a subagent?

Q4. How do you make a long, multi-phase codebase exploration resumable after a crash without re-exploring everything?
