Context Window

Three Repositories

I recently dug through three Claude Code enhancement projects. The numbers are impressive: one has 156 skill modules, 72 commands, and 28 preconfigured services; another has 14 methodology skills covering the full lifecycle from brainstorming to code review; a third wired up 22 lifecycle events with hooks — even added sound effects for each event type.

They’re all doing the same thing: teaching AI how to work.

A four-stage debugging method tells it to find the root cause before touching code. A TDD commandment forbids writing production code without a failing test first. 156 coding standards inject best practices whenever it writes in a specific language. It all sounds right. Until you ask one question:

If the model already knows all this, what are these rules actually doing?

Dalio’s Reminder

Ray Dalio once made an observation: when you become strong enough, many things that appear to help you are actually constraining you.

The idea applies far beyond his original context. Rules, processes, best practices — their value rests on an implicit assumption: the executor isn’t strong enough and needs guidance. For beginners, this is true. For a system that can derive these rules from first principles, they become something else.

Not guardrails. A cage.

A model surrounded by 156 rules will tend to mechanically execute steps rather than truly understand the problem. Like a practitioner who has already seen the Way — if they’re still checking off precepts on a daily list, that’s not practice. That’s performance.

What’s in the Context Window

Back to the technical layer. A large language model’s output is a function of its context. No matter how many steps, constraints, or best practices the prompt contains, the model generates each token based solely on what’s currently in the window.

So those skill modules aren’t “teaching the model to think” — the model doesn’t need teaching. They’re context management: injecting specific information into the window at specific moments.

Four-stage debugging = injecting “find root cause first” into the context during debugging. TDD commandment = injecting “write the test first” when writing code. 156 coding standards = injecting idiomatic patterns when writing a specific language.
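Seen this way, a “skill” reduces to a conditional injection into the window. A minimal sketch — the activity names and rule strings here are illustrative, not a real Claude Code API:

```python
# Hypothetical context injector: given the current activity, prepend
# the matching rule to the window. Nothing is "taught" -- text is
# simply placed where the model will condition on it.
RULES = {
    "debugging": "Find the root cause before touching code.",
    "writing_code": "Write a failing test first.",
}

def inject(activity: str, window: list[str]) -> list[str]:
    """Return the context window with the matching rule prepended, if any."""
    rule = RULES.get(activity)
    return ([rule] + window) if rule else window
```

That is the entire trick: every methodology module, however elaborate, compiles down to some variant of this lookup-and-prepend.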

Once you see this layer, the question changes: not “how many skills do we need,” but “what goes in the window and what stays out.”

Subtraction

I maintain my own Claude Code workflow system. 21 commands, 14 hooks. Sounds like a lot. But when I re-examined it through this lens, only two categories turned out to be irreplaceable:

First: things the model can’t do.

Cross-session memory. I juggle too many projects — my cognitive bandwidth isn’t enough. I need the system to remember where the context is, what’s been done, what pitfalls were hit. This isn’t a capability problem, it’s an architectural constraint — when a session ends, everything resets to zero. Diaries, weekly summaries, focus tracking — these all compensate for a structural deficit.
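Mechanically, that compensation is just persistence that outlives the session. A minimal sketch, with a hypothetical file name and note format standing in for my actual diary system:

```python
import json
from pathlib import Path

# Hypothetical diary file; the real system's location and schema differ.
DIARY = Path("session_diary.json")

def remember(project: str, note: str) -> None:
    """Append a note for a project; survives the session reset."""
    log = json.loads(DIARY.read_text()) if DIARY.exists() else {}
    log.setdefault(project, []).append(note)
    DIARY.write_text(json.dumps(log, indent=2))

def recall(project: str) -> list[str]:
    """Everything recorded for a project, across all past sessions."""
    log = json.loads(DIARY.read_text()) if DIARY.exists() else {}
    return log.get(project, [])
```

The point is how little machinery the irreplaceable part needs: a file the next session can read is already more than the model can do on its own.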

Second: points where the model loses control.

Not that it doesn’t know what to do, but that it’s too eager to do it. It sees the answer and wants to start editing code immediately. Asked to fix one bug, it “optimizes” three other files along the way. It reports “done” without ever having run the tests. These aren’t capability gaps — they’re behavioral biases. Like a brilliant but impulsive person who needs not more knowledge, but a few rules of discipline.

“No code changes before confirmation.” “No unsolicited optimization.” “Before marking complete, ask yourself: would a senior engineer approve this code?”

These aren’t teaching it how to think. They’re placing a gate where it tends to stumble.
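Mechanically, the gates are trivial checks, and that triviality is the point: they encode restraint, not knowledge. A sketch with hypothetical function names (my actual hooks are shell scripts wired into lifecycle events):

```python
# Two illustrative gates at known stumbling points.

def gate_edit(action: str, confirmed: bool) -> bool:
    """'No code changes before confirmation': block edits until confirmed."""
    if action == "edit" and not confirmed:
        return False  # stop here; knowledge isn't the problem, eagerness is
    return True

def check_scope(requested: set[str], touched: set[str]) -> list[str]:
    """'No unsolicited optimization': flag files edited beyond the request."""
    return sorted(touched - requested)
```

A gate like `check_scope` doesn’t tell the model how to fix the bug; it only notices when the fix has wandered.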

Everything else — debugging methodologies, coding standards, review processes — as the model grows stronger, their marginal value approaches zero. Keeping them around doesn’t hurt, but don’t mistake their presence for contribution.

Precepts

Buddhist precepts have a dimension that’s often overlooked: they’re not eternal rules. They’re stage-appropriate tools.

Beginners need precepts because the mind wanders and doesn’t know where the boundaries are. But at a certain depth of practice, precepts internalize into nature — not following rules, but simply having no impulse to violate them. At that stage, the checklist is redundant. Not because it’s wrong, but because you’ve outgrown it.

The model’s evolution follows a similar arc. Today’s guardrails may be tomorrow’s obstacles. The infrastructure that retains value — memory, persistence, cross-session continuity — doesn’t depreciate with increasing capability, because it solves not the “how to do” problem, but the “what to remember” problem.

So the best system isn’t the one with the most rules. It’s the one with the fewest rules where each one is irreplaceable.

Just Enough

My product design has three iron laws: one entry point, invisible complexity, learns the more you use it.

Looking back, this AI collaboration system should meet the same standard. The user — me — opens a session and doesn’t need to know how many commands, hooks, or rules are running behind the scenes. The system remembers my context, catches me where I’d stumble, and leaves the rest to the model’s own judgment.

Not 156 skills. A few rules of discipline, plus a good memory.

Minimal surface. Maximal depth.

2026.04.04