Claude Code in Large Codebases: Context Without Chaos

Q: When does Claude Code auto-summarize older context, and does it count against usage?

According to [Anthropic's support documentation](https://support.claude.com), when a conversation on a paid plan approaches the context window limit, Claude automatically summarizes earlier messages to make room. This does not count against usage limits. The full chat history is preserved for reference even after portions are summarized. --- Teams getting consistent results from Claude Code on large codebases are not the ones with the biggest context windows. They're the ones who configured `CLAUDE.md` with genuine architectural boundaries, use subagents for any work involving significant file traversal, and encode domain knowledge as skills rather than front-loading it into every session. The model is the same for everyone. The harness is where the gap opens.

Claude Code now supports a 1M token context window on paid plans, and that number gets cited constantly as the reason it handles large codebases well. The number is real, but the explanation is wrong. Context window size is closer to a ceiling than a solution, and treating it as the primary mechanism leads to workflows that degrade precisely when you need precision most. What actually keeps Claude Code coherent across a 300,000-line monorepo is a set of engineering decisions made outside the model: how you configure retrieval, how you scope what Claude sees, and how you delegate work to isolated subagents. The harness matters more than the headline spec.

A 1M Token Window Does Not Mean You Should Fill It

Anthropic's platform documentation names the core problem directly: "more context isn't automatically better. As token count grows, accuracy and recall degrade, a phenomenon known as context rot." This is not a Claude-specific quirk -- it's a well-documented property of attention mechanisms in large language models. Precision drops when the relevant signal is buried in a large body of irrelevant text.

In practice, context rot shows up in specific ways. Claude starts making assumptions about import paths that don't exist in your project. It hallucinates dependencies that are present in some files it read but not in the module you're actually working on. It suggests refactors that break interfaces defined three files ago. None of this is a model failure in the abstract sense. It's a retrieval failure caused by overloaded context.

Official Claude Code best practices documentation is unusually direct about this: "A single debugging session or codebase exploration might generate and consume tens of thousands of tokens. LLM performance degrades as context fills." Anthropic's recommended response is not to buy a bigger context window. It's to manage what goes in through deliberate tooling choices, and to offload exploratory work to subagents so the main conversation context stays clean for implementation. This is the architectural shift that separates teams getting consistent results from teams complaining that Claude "loses track" of the codebase halfway through a session.

Counterintuitively, a well-configured Claude Code session on a large codebase should have a smaller active context than a naive one, not a larger one.

CLAUDE.md Is a Retrieval Configuration File, Not a README

Most teams who set up CLAUDE.md treat it like a project README -- high-level notes about what the project does, maybe some setup instructions. That's the wrong mental model. CLAUDE.md is closer to a retrieval policy document. It tells Claude which parts of the codebase are relevant to a given type of task, which directories are out of scope, and what architectural boundaries exist between modules.

When Claude Code runs on a monorepo without a configured CLAUDE.md, it tends to pull in broadly scoped files during exploration. Doing so fills a context window with code that is structurally adjacent to the task but semantically irrelevant to it. According to Anthropic's canonical guidance on how Claude Code works, the harness is built from five extension points: CLAUDE.md files, hooks, skills, plugins, and MCP servers. CLAUDE.md is the first layer, and it shapes everything downstream.

A useful CLAUDE.md for a monorepo specifies module ownership explicitly. Something like: "The payments/ directory is owned by the payments team and has no direct imports from catalog/. If a task involves payments, do not traverse catalog modules unless explicitly asked." That single constraint eliminates an entire class of hallucinated cross-module dependencies.

Beyond boundaries, CLAUDE.md should include the project's dependency conventions, any non-obvious path aliasing, and which services are external versus internal. Teams who pre-populate this file with architectural boundaries report fewer hallucinated import paths because they're managing what Claude retrieves before the context window even comes into play. Most articles about Claude Code skip this entirely because it looks like documentation hygiene rather than AI engineering.

For a practical analogy: this is similar in spirit to how negative constraints in prompts outperform purely affirmative instructions. You are not telling Claude what to do; you are narrowing the space it searches.

Subagents Are Context Isolation Units, Not Just Parallelism

Subagents in Claude Code are typically described as a way to parallelize work, and that's accurate but incomplete. A more precise description from Anthropic's own blog: "A subagent is an isolated Claude instance with its own context window that takes a task, does the work, and returns only the final result to the parent." Receiving only the return value is the key part. All of the subagent's exploration, file reads, command outputs, and intermediate reasoning stay in the subagent's context and never pollute the parent session.

In a large codebase, this matters in a specific, non-obvious way. Asking Claude to investigate why a particular API endpoint is slow might involve reading a dozen files, running queries, grepping through logs, and following dependency chains. All of that consumes tokens. If it happens in your main session, you've now spent tens of thousands of tokens on investigation, and the implementation work you do afterward starts in a degraded context. Delegating "use a subagent to investigate the latency in the payments API and return a summary of root causes" means the main context only receives the summary.

An academic analysis of Claude Code's architecture published on arxiv.org describes this as a "five-layer compaction pipeline for context management" combined with "subagent delegation and orchestration." Compaction handles automatic summarization of older context; the subagent mechanism handles isolation of exploratory work. These are separate mechanisms solving different parts of the same problem.

Treat subagents as context isolation units and delegate any task involving significant file traversal or exploratory reasoning before you get to implementation. The instruction syntax is straightforward -- "use subagents to investigate X, then return a structured summary" -- and the payoff is a main session context that has room to hold actual implementation work without degrading.

The Failure Mode Teams Discover Too Late: Skills Load Domain Context On Demand

Subagents and CLAUDE.md address two problems: exploration isolation and retrieval boundaries. Skills address a third problem that's less obvious until you've been running Claude Code on a complex codebase for several weeks.

As Anthropic describes it: "In a large codebase with dozens of task types, not all expertise needs to be present in every session. Skills solve this through progressive disclosure, offloading specialized workflows and domain knowledge that would otherwise compete for context space and loading them only when the task calls for it."

In practice, this means your testing conventions, deployment procedures, database migration patterns, and API versioning rules do not all need to be in the base CLAUDE.md. Each of those domains can be encoded as a skill that loads only when the task explicitly requires it. A session focused on a UI change does not need to load the database migration knowledge. A session focused on a hotfix does not need the full testing playbook.

Teams who don't configure skills end up compensating by putting everything into CLAUDE.md, which grows into a multi-thousand-word document that itself starts to cause context rot. At that point the document competes with the actual code for attention, and Claude's precision on specific tasks degrades as a result. Skills are the mechanism for preventing CLAUDE.md from becoming the problem it was supposed to solve.

This is also where the architectural thinking in Claude Code connects to broader patterns in how AI is changing software development: the discipline is less about prompting and more about system design, specifically about when to load what knowledge into which execution context.

FAQ

Does increasing Claude Code's context window size improve results on large codebases?

Not reliably, and sometimes the opposite. Anthropic's platform documentation explicitly notes that accuracy and recall degrade as token count grows, a phenomenon they call context rot. Controlling what enters the context through CLAUDE.md configuration and subagent delegation is the more precise lever, rather than expanding the window and filling it with broadly scoped files.

What should a CLAUDE.md file actually contain for a monorepo?

At minimum: module ownership boundaries, explicit out-of-scope directories for each task type, non-obvious path aliases, external versus internal service distinctions, and dependency conventions the project uses that differ from standard patterns. It should read less like a README and more like an architectural decision record focused on what Claude should and should not traverse during a given session.

How do subagents in Claude Code differ from just running multiple Claude sessions?

A subagent is an isolated instance managed by the parent Claude Code session. It receives a task, completes it in a separate context window, and returns only the result to the parent. The parent session never accumulates the subagent's file reads, grep outputs, or intermediate reasoning. Running separate sessions manually achieves isolation but not orchestration -- the parent session has no native way to receive structured results from manual sessions and continue implementation seamlessly.

When does Claude Code auto-summarize older context, and does it count against usage?

According to Anthropic's support documentation, when a conversation on a paid plan approaches the context window limit, Claude automatically summarizes earlier messages to make room. This does not count against usage limits. The full chat history is preserved for reference even after portions are summarized.

Teams getting consistent results from Claude Code on large codebases are not the ones with the biggest context windows. They're the ones who configured CLAUDE.md with genuine architectural boundaries, use subagents for any work involving significant file traversal, and encode domain knowledge as skills rather than front-loading it into every session. The model is the same for everyone. The harness is where the gap opens.

Claude Code in Large Codebases: Context Without Chaos

A 1M Token Window Does Not Mean You Should Fill It

CLAUDE.md Is a Retrieval Configuration File, Not a README

Subagents Are Context Isolation Units, Not Just Parallelism

The Failure Mode Teams Discover Too Late: Skills Load Domain Context On Demand

FAQ

…