Context Windows & Provision Strategies

Manage conversation context to preserve critical information across long interactions · Difficulty 2/5


The Context Window defines the maximum amount of text (measured in tokens) that Claude can process in a single request. Claude can only reason about information it can see, so providing the right context is essential for accurate responses.

Key Concepts

  • Input tokens: The system prompt, user messages, and any provided context
  • Output tokens: Claude's generated response (controlled by `max_tokens`)
  • Total context: Input + output tokens combined must fit within the model's Context Window (see the usage sketch below)
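To make the split concrete, here is a minimal sketch using the Anthropic Python SDK (the model name is an assumption; substitute whichever model you target). The `usage` field on the response reports both sides of the budget:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name; use the model you actually target
    max_tokens=1024,                   # caps the output side of the budget
    system="You are a concise code reviewer.",
    messages=[{"role": "user", "content": "Summarize the risks of unbounded context growth."}],
)

# Input and output counts are reported per request; their sum must fit the Context Window.
print("input tokens: ", response.usage.input_tokens)
print("output tokens:", response.usage.output_tokens)
```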

Context Provision Methods

  • Direct inclusion: Include relevant files/data directly in the prompt
  • Tool-based retrieval: Give Claude tools to fetch information on demand (see the tool-use sketch after this list)
  • RAG pipeline: Pre-retrieve relevant documents using embeddings/search
  • Prior findings: Include results from previous analysis passes
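As a sketch of the tool-based approach, assuming the Anthropic Python SDK (the `read_file` tool, its schema, and the file path are hypothetical), you declare a tool and let Claude request files on demand instead of pasting everything up front:

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical file-reading tool: Claude supplies a path, our code would return the contents.
tools = [{
    "name": "read_file",
    "description": "Return the contents of a file from the repository.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Repository-relative path"}},
        "required": ["path"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Review tests/test_auth.py for missing edge cases."}],
)

# If Claude needs the file, it emits a tool_use block instead of answering directly;
# the application then runs the tool and sends the result back in a follow-up message.
for block in response.content:
    if block.type == "tool_use" and block.name == "read_file":
        print("Claude requested:", block.input["path"])
```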

Key Principle: Include What Claude Needs to See

Claude has no implicit knowledge of your codebase, test suite, or prior reviews. If you want it to account for existing work, you must explicitly provide that context. If Claude suggests test cases that duplicate existing tests, the solution is simple: include the existing test file in the context.
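A minimal sketch of that direct inclusion (the file paths and the XML-style tags are illustrative, not required by the API):

```python
from pathlib import Path

# Hypothetical paths: the module under review and the tests that already exist for it.
module_source = Path("src/payments.py").read_text()
existing_tests = Path("tests/test_payments.py").read_text()

# Embedding the existing tests is what lets Claude avoid proposing duplicates.
prompt = f"""Here is the module under test:

<source>
{module_source}
</source>

These test cases already exist -- do not repeat them:

<existing_tests>
{existing_tests}
</existing_tests>

Suggest only new test cases that cover gaps in the existing suite."""
```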

Best Practices

  • Monitor token usage: Track input/output tokens to avoid hitting limits
  • Prioritize context: Place the most important information at the beginning and end
  • Summarize when needed: For long conversations, progressively summarize older turns
  • Use prompt caching: Cache static system prompts to reduce costs on repeated calls (see the sketch below)
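A sketch of the caching practice, assuming the Anthropic Python SDK (model name and system text are placeholders): marking a large, static system block with `cache_control` asks the API to reuse that prefix across calls.

```python
import anthropic

client = anthropic.Anthropic()

# A large, unchanging system prompt (style guide, schema, policies) is the natural caching
# candidate; very short prompts fall below the minimum cacheable length.
static_system = [{
    "type": "text",
    "text": "You are a code reviewer. Follow the team style guide: ...",  # imagine thousands of tokens here
    "cache_control": {"type": "ephemeral"},  # marks this block as a cacheable prefix
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=512,
    system=static_system,
    messages=[{"role": "user", "content": "Review this diff: ..."}],
)

# On repeat calls with the same prefix, usage reports cache reads, which are billed at a lower rate.
print(response.usage)
```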

Key Takeaways

  • Context window = input tokens + output tokens combined
  • Claude can only reason about information explicitly provided in context
  • Include existing code/tests/reviews to prevent duplicate suggestions
  • Use progressive summarization for long conversations

Test Yourself

Your synthesis agent processes results from subagents and produces a report. When a second research query on a related topic runs, the synthesis agent suggests findings that duplicate the first report. What's the most effective fix?