Why Your AI Agents Need Operational Memory, Not Just Conversational Memory

Now that AI agents are moving out of the lab and into the real world, we’re realizing that “memory” isn’t one-size-fits-all. Most people think of agent memory like a personal assistant’s: it remembers your preferences, your travel plans, and the email you composed last week. That is great for a research copilot or a personal coach, where the goal is a long-term relationship. But if you’re deploying agents to handle back-office operations like fixing data pipelines or managing APIs, you don’t need a persona. You need reliability.

I started looking into this because I saw teams struggling to make agents work in high-stakes, repetitive environments. In these cases, the goal isn’t conversational flair; it’s making sure the agent doesn’t have to reinvent the wheel every time it performs a task. By separating how an agent “thinks” from how it “acts,” we can finally scale automation without the massive costs and unpredictability that usually come with it.

Constraints of Infinite Context and the Limits of Modern Transformer Architectures

In a production environment, the "just add more context" strategy hits a wall very quickly. Even though context windows have grown massive, the math behind them is still expensive. Because attention’s computational work grows quadratically with input length, the more information you dump into a prompt, the slower the agent gets. A task that should take one second can balloon into thirty just because the context is too large.

There is also the “lost-in-the-middle” effect to worry about. When prompts get too long, models lose track of information buried in the center. Research suggests that giving an agent small, focused snippets is often twice as accurate as making it read a giant document.

Beyond the technical overhead, there is a structural problem with how agents handle enterprise infrastructure.
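The quadratic scaling above is easy to see with a back-of-envelope calculation. This sketch estimates only the attention score-matrix work; the token counts and model dimension are hypothetical, and real latency depends on many other factors, but the n² shape is what matters.

```python
# Rough illustration of quadratic attention cost.
# Constants are hypothetical; only the n^2 growth matters.

def attention_flops(context_tokens: int, d_model: int = 4096) -> int:
    """Approximate FLOPs for the QK^T score matrix alone: n * n * d."""
    return context_tokens * context_tokens * d_model

small = attention_flops(2_000)    # a focused snippet
large = attention_flops(120_000)  # "just add more context"

# A 60x longer prompt costs ~3600x more attention compute.
print(large / small)  # 3600.0
```

This is why dumping the whole tool catalog and every runbook into the prompt makes each request both slower and more expensive, not just marginally bigger.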
If you have hundreds of internal tools and APIs, describing them all can eat up your prompt’s token budget before the agent even starts its work. Most importantly, today’s architectures don’t allow for organizational learning. Every time an agent runs a task, it treats it as a brand-new puzzle, paying the full "thinking cost" of planning and discovery every single time. Without a way to save and reuse a successful workflow, the system never gets faster or cheaper; it stays stuck in a loop where the 1,000th execution is just as difficult as the first.

The root of this problem is that we are treating the agent’s context window like RAM: a volatile workspace that wipes clean the moment a task is finished. In a traditional computing stack, you wouldn’t reload your entire operating system every time you wanted to open a text file. Yet in the current agent paradigm, we force the model to “re-boot” its understanding of our infrastructure for every single request. This creates a massive Context Tax: a recurring overhead, paid in both latency and tokens, to re-teach the agent things it should already know.

The Attention-Window Workarounds: Current Memory and Context Architectures

Most current approaches to agent memory focus on finding information rather than reusing successful actions. Take Retrieval-Augmented Generation (RAG). It is great at fetching facts to help an agent answer a question, and it helps avoid the "lost-in-the-middle" problem. But RAG is built to find documents, not to remember how a job was actually done. It can find a technical runbook for you; it cannot remember that a specific five-step sequence of API calls fixed a database issue last week. Every time a workflow starts, the agent has to read documentation and plan its steps from scratch.

More advanced agent orchestration frameworks try to fix "catalog bloat" by loading tool definitions only when they are needed.
This saves money on tokens, but it still doesn’t help the organization learn. Other systems focus on "stateful continuity," ensuring an agent remembers who you are and what you like over several months. That is perfect for a personal assistant, but it doesn’t help with operational tasks: if an API is broken or a tool description is confusing, the agent will keep making the same mistake on every run. We also see agents that write code on the fly, which often leads to "workspace chaos": hundreds of one-off scripts that nobody has verified or tracked.

In the end, current memory methods force agents to remain in a state of perpetual improvisation. Because there is no way to “crystallize” a successful solution into a reusable procedure, the organization never develops an institutional learning curve. To really scale in an enterprise, agents need to stop merely managing what is in their immediate context and start building a permanent library of proven procedures, transforming one-off successes into reliable, repeatable assets.

The Context File System: Memory for How Work Gets Done

To fix these structural bottlenecks, we are seeing the rise of what dex calls a Context File System (CFS). You might also hear this more broadly categorized as an Operational Skill Store. This architecture separates the expensive reasoning of a large language model from the actual storage of operational knowledge. It mirrors the way a mature engineering team works: senior staff solve a novel problem once, then document the solution in a runbook so the task becomes routine. By turning successful executions into permanent, reusable procedures, a CFS transforms agents from improvisational tools into reliable enterprise infrastructure.
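The crystallize-then-replay loop can be sketched minimally. The file layout, function names, and tool steps below are hypothetical, not a published CFS specification; the sketch only shows the core idea of paying the planning cost once and replaying a proven procedure afterwards.

```python
# Illustrative sketch of a Context File System: after a successful
# run, the agent's step sequence is saved as a named procedure that
# later runs can replay without re-planning. Layout is hypothetical.

import json
from pathlib import Path

SKILL_DIR = Path("skills")

def crystallize(name: str, steps: list) -> Path:
    """Persist a verified step sequence as a reusable procedure."""
    SKILL_DIR.mkdir(exist_ok=True)
    path = SKILL_DIR / f"{name}.json"
    path.write_text(json.dumps({"name": name, "steps": steps}, indent=2))
    return path

def recall(name: str):
    """Return saved steps if this task has been solved before, else None."""
    path = SKILL_DIR / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())["steps"]
    return None  # novel task: fall back to expensive LLM planning

# First run: pay the full "thinking cost" once, then save the result.
crystallize("fix_stale_dashboard", [
    {"tool": "query_warehouse", "args": {"check": "last_load_time"}},
    {"tool": "restart_pipeline", "args": {"name": "dashboard_etl"}},
])

# Every later run: replay the proven procedure instead of re-planning.
print(recall("fix_stale_dashboard"))
```

The design choice to mirror a runbook is deliberate: because the stored procedure is a plain file, it can be reviewed, versioned, and audited like any other engineering artifact rather than living only inside a model's context window.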