Where Does the Scaling Law End?

An Experiment

I spent a long time trying to mathematically formalize the I Ching.

384 line texts, 64 hexagrams, 6 positions: it looks like an exquisitely crafted encoding system. I threw everything at it: statistics, information theory, machine learning, permutation tests. The conclusion was clear: structure explains 14% of the variance. The position effect is the only signal; the remaining 86% cannot be derived from structure.

14%. Not more, not less. Six models, three measurement methods, all converging to the same number.
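For concreteness, here is a minimal sketch of one such measurement: an eta-squared “variance explained by position” statistic with a permutation null. The per-line numeric features below are random stand-ins; the author’s actual feature extraction isn’t shown.

```python
# Minimal sketch: how much variance in a per-line feature does the
# six-position structure explain, and is that more than chance?
# The features are random stand-ins for the author's real ones.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=384)           # stand-in per-line features
positions = np.tile(np.arange(6), 64)     # positions 1..6 across 64 hexagrams

def eta_squared(y, groups):
    """Share of total variance captured by the group means."""
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_between = sum(
        (groups == g).sum() * (y[groups == g].mean() - grand) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total

observed = eta_squared(features, positions)

# Permutation null: shuffle the position labels, recompute the statistic.
null = np.array([
    eta_squared(features, rng.permutation(positions)) for _ in range(10_000)
])
print(f"eta^2 = {observed:.3f}, p = {(null >= observed).mean():.4f}")
```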

Then I tried quantifying the information value of the structure using MDL (Minimum Description Length): the six-position structure saves roughly 23–44 bits across all 384 line texts, about 0.06–0.11 bits per line. Real, but tiny.
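A toy version of that MDL comparison: encode all the line texts under one shared token distribution versus six position-specific distributions, and charge a model-cost penalty for the extra sub-models. The token data and the per-model penalty below are placeholders, not the author’s corpus or costing.

```python
# Toy MDL comparison: one shared code vs. six position-specific codes.
import math
from collections import Counter

def shannon_bits(tokens):
    """Code length of a sequence under its own empirical distribution."""
    n, counts = len(tokens), Counter(tokens)
    return -sum(c * math.log2(c / n) for c in counts.values())

# tokens_by_position[p] = all tokens from the 64 line texts at position p
# (placeholder data: each position has some distinctive vocabulary)
tokens_by_position = [[f"t{p}"] * 20 + ["common"] * 20 for p in range(6)]

baseline = shannon_bits([t for pos in tokens_by_position for t in pos])
structured = sum(shannon_bits(pos) for pos in tokens_by_position)
model_cost = 6 * 4.0   # assumed bits to describe the six sub-models

print(f"bits saved by the six-position structure: "
      f"{baseline - structured - model_cost:.1f}")
```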

What does this mean? The I Ching’s structure is real — it’s not random — but structure does far less than intuition suggests.

A Divination

After finishing the experiments, I casually cast a hexagram.

The question was about a conflict with a friend. I got Hexagram 9 (Small Taming), with lines 3, 4, and 5 changing. Line 3: “The cart loses its axle pins; husband and wife turn away from each other.” Line 4: “With sincerity, blood departs and fear emerges — no blame.” Line 5: “Bound in sincerity, sharing wealth with neighbors.” The resulting hexagram was Kui (Opposition): “Small matters, auspicious.”

In plain language: the rift is real, but with sincerity the hurt will pass and the fear underneath will surface; if trust holds, the relationship deepens; don’t try to fix everything at once, handle the small things.

Strikingly accurate. Not “vaguely relevant” accurate. “Husband and wife turn away” precisely describing the current state of affairs accurate.

I ran a permutation test: randomly drawing three of the 384 line texts, the probability of matching or exceeding this relevance score is 1%. The probability of also forming a “conflict → sincerity → repair” narrative arc is 1 in 532.

What made it more interesting: the hexagram was cast first, the question came after. I asked the program to generate a random hexagram, and while it was still computing, I randomly thought of this question. Two completely independent causal chains — one a deterministic transformation of system clock microseconds through a pseudorandom number generator, the other my stream of consciousness. No causal channel connects them. Yet the result was statistically significant.
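The machine side of that chain is easy to make concrete. Below is a sketch assuming the classic three-coin method on top of a clock-seeded PRNG; the author’s actual program isn’t shown.

```python
# Sketch of casting a hexagram: three coins per line, six lines,
# driven by a PRNG seeded from the system clock. Illustrative only.
import random
import time

rng = random.Random(time.time_ns())   # deterministic given the clock reading

def cast_line(rng):
    """Three coins: heads = 3, tails = 2; the sum is 6, 7, 8, or 9."""
    return sum(rng.choice((2, 3)) for _ in range(3))

# 6 = old yin (changing), 7 = young yang, 8 = young yin, 9 = old yang (changing)
lines = [cast_line(rng) for _ in range(6)]            # bottom line first
changing = [i + 1 for i, v in enumerate(lines) if v in (6, 9)]
print("lines:", lines, "changing:", changing)
```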

I can’t explain this. A single observation can’t distinguish between “it’s that 1%” and “an unknown mechanism exists.” But this wasn’t my first time encountering something like this.

The Boundary of Discrimination

This made me revisit a question: what is reasoning, really?

The Yogācāra school of Buddhism divides the mind into eight layers. The first five are the sense consciousnesses. The sixth is mental consciousness, responsible for discrimination, categorization, and judgment: what we call “reasoning.” The seventh is manas, which stamps every experience as “mine.” The eighth is ālayavijñāna, the storehouse of seeds.

The crucial point is the direction. It’s not the world coming in to be processed — it’s seeds projecting outward from within. The ālayavijñāna is the film library, manas is the projectionist who won’t let go of the reel, consciousness is the image on the screen, and the five senses are the audience convinced they’re watching something real.

Buddhism calls what’s on the screen “xiāng” — appearances. All appearances are illusory — not that the world doesn’t exist, but that what you experience is a reconstruction, not the original signal.

And “reasoning” is just operations on appearances. Illusion built on illusion.

What LLMs Consume

Large language models consume text. What is text? The product of humans encoding their experience into words — appearances, encoded.

So what LLMs learn isn’t the world. It’s patterns between appearances. They model on top of illusion, and the model is staggeringly precise — but it never touches what lies beneath.

This isn’t a flaw of LLMs. It’s a precise engineering demonstration of the Buddhist point: operating purely at the level of appearances, never touching reality, can produce astonishingly convincing “intelligence.” The discriminating mind can be infinitely refined, but it forever operates within appearances.

What the Scaling Law Is Scaling

Back to the scaling law. Loss decreases with more parameters, following a power law — each doubling yields a little less improvement. The curve descends but never reaches zero.
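That floor is explicit in the standard fitted form of the curve, for instance the Chinchilla-style parameterization, where loss splits into a reducible power-law term and an irreducible constant:

```latex
L(N) = E + \frac{A}{N^{\alpha}}
```

Here N is the parameter count, A and α are fitted constants, and E is the irreducible loss: the part no amount of scaling removes.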

What is that unreachable floor?

Stack the different frameworks side by side:

- The I Ching: structure explains 14% of the variance; the remaining 86% cannot be derived from it.
- Yogācāra: the discriminating mind can be refined without limit, but it operates only on appearances, never on what lies beneath.
- LLMs: loss falls along a power law toward a floor it never reaches.

It’s the same structure. What the scaling law is scaling is coverage over the totality of appearances.

From 10% to 50% to 90%, each step delivers enormous practical value. Code gets better, writing more fluent, reasoning more convincing. But 100% is unreachable. Not because of insufficient compute, but because some things aren’t in the domain of patterns.

But the Totality of Appearances Is Larger Than We Think

Anthropic’s CEO Dario Amodei recently said the scaling law hasn’t hit a wall and that 2026 will see radical acceleration. He used the rice-on-a-chessboard analogy: “We’re standing on the 40th square. All the shocks from the first 39 squares combined are just a fraction of what the last 24 will bring.”
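The arithmetic behind the analogy, for the record:

```latex
\sum_{i=1}^{39} 2^{\,i-1} = 2^{39} - 1 \approx 5.5 \times 10^{11},
\qquad
2^{63} \approx 9.2 \times 10^{18}
```

Square 40 alone holds as many grains as the first 39 squares combined, and the 64th holds about sixteen million times that.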

Does this contradict the existence of a floor?

No. Because the “wall” people talked about was the wall of text-based appearances — existing human text is nearly exhausted, projected to run out between 2026 and 2028. But text is only a narrow slice of appearances.

The totality of appearances is far larger than text:

- Images, video, and audio: modalities that text barely touches.
- Code execution: not just code as written, but what happens when it runs.
- Software operation: screens, clicks, and the responses of services.
- The physical world: sensor streams from systems that act in it.

Every time a new door opens, an entire new space of appearances floods in. The scaling law isn’t a line hitting a wall; it’s a line switching tracks. On each new track, the curve starts fresh.

The Flywheel

But what truly excites Dario — and equally alarms him — may not be the new tracks themselves, but a flywheel:

In the past, data came from humans. Humans wrote articles, wrote code, took photos. Models learned. Humans were the bottleneck.

Now, agents can act on their own. Claude Code writes code, runs it, reads the output. Computer Use operates software. The MCP protocol connects them to services. Every action generates new data: new appearances.

These new appearances feed back into the model. The model improves. The agent can do more. More appearances are generated.

Agent acts → generates new appearances → trains better model → stronger agent → ...

This is a self-sustaining cycle that no longer depends on human data. Once it starts spinning, the totality of appearances isn’t a fixed mine; it is being continuously generated. As long as agents are interacting with the world, appearances won’t run out, and the scaling law has fuel.
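In schematic terms, the loop is tiny. Everything below is a stand-in, not a real agent or training API:

```python
# Schematic flywheel: agent traces become training data for the next model.
def agent_act(model_version, tasks):
    """The agent works through tasks; its traces are new appearances."""
    return [f"trace(task={t}, model=v{model_version})" for t in tasks]

def train(model_version, traces):
    """Fold the traces back into the model (stub: bump the version)."""
    return model_version + 1

model, corpus = 0, []
for _ in range(5):                                  # no human data enters
    traces = agent_act(model, ["write", "run", "read-output"])
    corpus.extend(traces)                           # appearances accumulate
    model = train(model, traces)                    # stronger agent next turn
print(f"model v{model}, {len(corpus)} self-generated traces")
```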

This also explains why Dario is simultaneously accelerating and desperately researching safety — because once the flywheel starts spinning, it’s not easy to stop.

So

Where does the scaling law end?

Perhaps it doesn’t. As long as new channels of perception keep opening, there are new spaces of appearances to cover. The emergence of agents means it’s no longer just humans opening doors — AI is opening them too.

But it will never reach beyond the boundary of appearances. Loss can fall asymptotically, but it won’t equal zero. Because some things, whether you call them reality, the Dao, or something else, aren’t in the domain of patterns.

384 line texts can’t cover it. A hundred billion parameters can’t cover it.

But perhaps that was never the point.