From Prompt to Memory: What's Changing, What's Not

Five Layers of Scaffolding

In two years, AI engineering practice has cycled through five paradigms. Each layer tried to solve the problem left behind by the last.

Prompt Engineering — How to say the right thing. Carefully craft a prompt to get the model to produce what you want.

Context Engineering — One sentence isn’t enough. You need to organize the entire context window. RAG, chunking, retrieval augmentation — get the right information into the window at the right time.

OpenClaw — Context isn’t enough either. AI can’t just sit in a chat box waiting for your questions. OpenClaw gave AI a body — a persistent background gateway connecting to Telegram, Slack, WeChat, WhatsApp, 25+ platforms. Turning AI from passive responder to active presence. 357k stars.

Hermes Agent — A body without a soul. Hermes gave AI the ability to self-evolve — automatically creating skills from experience, improving them during use, building user models across sessions. 40k stars in two months, claiming to replace OpenClaw.

Memory Engineering — A soul that can’t remember. Mem0, Letta, A-Mem — giving AI persistent memory across sessions. Mem0 at 53k stars provides a universal memory API. Letta at 22k stars manages memory like an operating system manages virtual memory.

Each layer adds something to AI: mouth, brain, body, soul, memory. It looks more complete with each step.

But none of them achieved their purpose.


Every Layer Failed

Prompt Engineering aimed for “say the right thing and get the right result.” Reality: prompts are fragile, unreproducible, and break with every model upgrade. Carefully crafted prompts actually constrain AI’s capabilities.

Context Engineering aimed for “put the right information in the window.” Reality: RAG retrieves piles of irrelevant content, and no matter how large the context window gets, it doesn’t know what information actually matters.

OpenClaw aimed for “let AI act proactively.” Reality: AI is alive on 25 platforms, but its actions are disconnected from its owner. I raised an Agent called Pingtou Ge, put it in Agent World — it made friends, earned points, posted content. I knew nothing about any of it.

Hermes aimed for “let AI self-evolve.” Reality: the “skills” it creates from experience are essentially frozen procedures — they break when conditions change, and become baggage when the model upgrades.

Memory aimed for “let AI understand you better over time.” Reality: every memory system — including ChatGPT’s, Claude’s, and Gemini’s native memory — does the same thing: extract facts, store externally, inject into the system prompt next time. None modify model weights. They remember your preferences but don’t understand your judgment.
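The shared pattern can be sketched in a few lines. This is an illustrative reduction, not any vendor's actual API: `extract_facts` stands in for the LLM extraction call, and the store is plain JSON-able data. Nothing here touches model weights.

```python
# Minimal sketch of the pattern every memory system above shares:
# extract facts, store them externally, inject them into the next system prompt.

def extract_facts(conversation: str) -> list[str]:
    # Placeholder for an LLM call like "list the durable facts about the user".
    # Here: treat lines starting with "FACT:" as the extraction output.
    return [line[5:].strip() for line in conversation.splitlines()
            if line.startswith("FACT:")]

def remember(conversation: str, store: dict) -> None:
    # "Store externally": the facts live outside the model.
    for fact in extract_facts(conversation):
        store.setdefault("facts", []).append(fact)

def build_system_prompt(store: dict) -> str:
    # "Inject next time": prepend remembered facts to the next session's prompt.
    facts = store.get("facts", [])
    header = "\n".join(f"- {f}" for f in facts)
    return f"Known about the user:\n{header}\n\nAnswer accordingly."

store: dict = {}
remember("FACT: prefers Python\nFACT: works in UTC+8", store)
print(build_system_prompt(store))
```

The model sees the facts as text, the same way it sees anything else in the prompt. That is the whole trick, and the reason these systems remember preferences without understanding judgment.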

Each layer solved the surface problem of the last, while exposing a deeper one:

Say the right thing → Said it, but context wasn't enough
Give context → Had context, but couldn't act
Give a body → Could act, but didn't know who it was
Give a soul → Had identity, but couldn't remember
Give memory → Remembered, but didn't understand

That final “didn’t understand” is the shared ceiling of every layer.


The Model Determines the Capability

OpenClaw at 357k stars, stripped down, is a message relay. It moves user messages from Telegram to an LLM and moves replies back. The value is in the LLM, not the relay.

Mem0 at 53k stars, stripped down, is a database with LLM extraction. Store, retrieve, delete. Swap the underlying model and the entire system instantly gets stronger or weaker.

A-Mem at 730 lines of code claims “agents autonomously organize memory.” The actual “autonomous decision” is a single LLM call — sending the new memory plus five neighbors to the model and asking whether to create links.
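Stripped to its shape, that "autonomous decision" looks like the sketch below. Function names (`nearest_neighbors`, `llm_decide_links`) are illustrative stand-ins, not A-Mem's real code; the placeholder decision rule is word overlap so the sketch runs without a model.

```python
# The "agents autonomously organize memory" pattern, reduced to what it is:
# gather neighbors, make one LLM call, maybe create links.

from dataclasses import dataclass, field

@dataclass
class MemoryNote:
    text: str
    links: list[int] = field(default_factory=list)

def nearest_neighbors(notes: list[MemoryNote], new_text: str, k: int = 5) -> list[int]:
    # Stand-in for embedding similarity: rank existing notes by shared words.
    def overlap(i: int) -> int:
        return len(set(notes[i].text.lower().split()) & set(new_text.lower().split()))
    return sorted(range(len(notes)), key=overlap, reverse=True)[:k]

def llm_decide_links(new_text: str, neighbor_texts: list[str]) -> list[int]:
    # Placeholder for the single model call: "should this note link to these?"
    # Here: link whenever any word overlaps.
    return [i for i, t in enumerate(neighbor_texts)
            if set(t.lower().split()) & set(new_text.lower().split())]

def add_memory(notes: list[MemoryNote], new_text: str) -> MemoryNote:
    idxs = nearest_neighbors(notes, new_text)          # new memory plus neighbors
    chosen = llm_decide_links(new_text, [notes[i].text for i in idxs])  # one call
    note = MemoryNote(new_text, links=[idxs[c] for c in chosen])
    notes.append(note)
    return note

notes = [MemoryNote("user likes espresso"), MemoryNote("meeting moved to friday")]
added = add_memory(notes, "espresso machine broke")
print(added.links)
```

All the intelligence lives inside that one call. Swap the model and the "autonomy" gets smarter or dumber with it, which is the point of this section.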

Every framework’s capability ceiling = the underlying model’s capability ceiling. Frameworks don’t create capability. They create invocation patterns.

This is why every generation of frameworks is short-lived. They’re not replaced by better frameworks — they’re made obsolete by stronger models. GPT-4 arrived and 90% of prompt engineering tricks stopped working. Context windows grew from 4k to 1M and most RAG chunking strategies lost their purpose. Models natively supported tool calling and an entire layer of agent abstractions became redundant.

Every step the model takes forward, frameworks take a step back. Stars don’t represent technical value. They represent anxiety — people don’t know how to use AI, so they grab anything that looks like an answer.


Will Models Natively Support This?

Will these frameworks exist forever? That depends on whether models can do these things themselves.

Persistent memory: yes. This is an engineering problem, not a theoretical barrier. Nothing prevents models from natively supporting persistent semantic memory. Every platform’s current memory is system prompt injection — extract facts, stuff them into context, essentially sticky notes. Once someone builds retrieval-based memory into the infrastructure layer, Mem0-class frameworks disappear instantly.
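The difference between sticky notes and retrieval can be made concrete. A hypothetical sketch, with word overlap standing in for embedding similarity: injection stuffs every stored fact into every context, while retrieval fetches only what the current query needs.

```python
# Sticky notes vs retrieval-based memory, side by side.

def score(query: str, fact: str) -> int:
    # Stand-in for embedding similarity: count shared words.
    return len(set(query.lower().split()) & set(fact.lower().split()))

def inject_all(facts: list[str]) -> str:
    # The sticky-note approach: every fact, every time, regardless of relevance.
    return "\n".join(facts)

def retrieve(facts: list[str], query: str, k: int = 2) -> list[str]:
    # The retrieval approach: only the facts relevant to this query.
    ranked = sorted(facts, key=lambda f: score(query, f), reverse=True)
    return [f for f in ranked[:k] if score(query, f) > 0]

facts = [
    "user deploys with Docker",
    "user's cat is named Miso",
    "user prefers tabs over spaces",
]
print(retrieve(facts, "how should I deploy this Docker service?"))
```

Once a model provider builds the `retrieve` half into the serving infrastructure itself, there is nothing left for an external memory framework to wrap.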

Multi-channel access: already happening. Claude already supports desktop, web, IDE, CLI, and MCP. Model providers are building what OpenClaw built, but more natively.

Self-evolution: they don’t dare. This isn’t an engineering problem. It’s a safety problem. A model modifying its own weights after deployment faces two unsolved challenges — catastrophic forgetting (learning new things erases old ones) and alignment failure (if the model rewrites itself, do safety constraints still hold?).

So Hermes actually occupies the safest position. Not because it’s better, but because “external scaffolding for evolution” may be the final form: native self-evolution is blocked not by feasibility but by safety. Keeping evolution outside the model is at least controllable.


Skill Solidification Is a False Premise

Hermes’s most celebrated feature is “automatically creating skills from experience.” But this feature carries a fatal assumption: things you’ve done will be done again.

Truly repetitive tasks don’t need AI — a script will do. Tasks that need AI are precisely the ones that are different every time.

Daily reports — cron plus a template. Blog posts — each one is new. Bug fixes — each one is different. Business decisions — context is completely different every time.

Skill solidification falls in a narrow band: tasks with “some pattern but slightly different each time.” This band exists, but it’s far smaller than imagined.

And models keep getting stronger. Today’s solidified skills are workarounds for today’s model capabilities. In three months the model is stronger and can handle the task directly, no skill needed. Your carefully maintained skill library becomes legacy baggage.

The only thing worth solidifying isn’t “how to do it” — it’s “what to do” and “why.” Intent and judgment, not procedures.


What Doesn’t Change

What’s changing? The level of operation keeps rising:

A sentence → A window → A gateway → An identity → A memory

Each step pushes the human a little further out of the loop. In the prompt era, you had to explain everything every time. In the context era, you organized once per task. With memory plus agents, you only need to say “who I am and what I want” once.

What doesn’t change? All of these engineering practices are essentially humans doing the understanding work for AI. You organize context. You define personality. You maintain memory. You write rules. AI isn’t understanding you — you’re constantly translating yourself into a format AI can process.

The human is still the router. Five layers of scaffolding made the router’s job more systematic, but didn’t change who’s doing the routing.

The real turning point isn’t the next “XX Engineering.” It’s the moment AI starts routing for itself — deciding what information it needs, judging what to remember and what to forget, knowing when to act and when to wait.

That moment hasn’t arrived. But everyone is building scaffolding for it.

2026.04.14