Context Memory and Search: The Secrets to Effective Agentic Work
Why search and memory systems matter more than smarter AI models
What Makes AI Coding Effective
Last weekend, working on performance improvements to my MCP vector search engine, I noticed something. The breakthrough in AI coding isn’t smarter models — it’s information architecture. The tools that actually work aren’t necessarily the ones with the biggest context windows. They’re the ones that find the right context and remember what matters.
Here’s what I mean. I’ve been using search and memory together long enough that I don’t think about them anymore. My prompts have gotten measurably shorter — an analysis of my sessions shows prompts averaging 12-15 words in mid-2025 dropping to 6-8 words now. “Check logs.” “What’s the command to quantize the index?” I just assume the agents will find the context they need. When I stepped back and thought about what changed, it came down to two things: Search and Memory.
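The trend is easy to measure yourself. Here is a minimal sketch of that kind of session analysis; the log format and sample prompts are hypothetical stand-ins for whatever your tooling records:

```python
from datetime import datetime
from statistics import mean

# Hypothetical session log: (ISO date, prompt text) pairs.
sessions = [
    ("2025-06-03", "Can you check the application logs for errors from the last deploy?"),
    ("2025-06-10", "What is the command to rebuild and quantize the vector index?"),
    ("2025-12-01", "Check logs."),
    ("2025-12-04", "Quantize the index."),
]

def avg_prompt_words_by_month(entries):
    """Average prompt length in words, grouped by calendar month."""
    buckets = {}
    for stamp, prompt in entries:
        month = datetime.fromisoformat(stamp).strftime("%Y-%m")
        buckets.setdefault(month, []).append(len(prompt.split()))
    return {month: mean(counts) for month, counts in sorted(buckets.items())}

print(avg_prompt_words_by_month(sessions))
```

A falling curve is the signal: shorter prompts mean the system, not the user, is supplying the context.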
You can see this pattern across successful AI coding tools. Claude MPM consistently outperforms Claude Code on its own — not because the underlying agentic AI differs, but because MPM brings the right context to the agents rather than flooding them with everything. Tools like Augment and Cursor have made similar investments in context retrieval. The winning tools aren’t the ones with the smartest models. They’re the ones that solved information architecture.
Search: Why Bigger Context Windows Aren’t the Answer
The promise of massive context windows is seductive: dump your entire codebase into the AI and let it figure out what’s relevant. The research tells a different story.
Liu et al.’s 2023 paper “Lost in the Middle: How Language Models Use Long Contexts” documented a U-shaped performance curve: models process information well at the beginning and end of long contexts, but performance drops 30% or more when relevant information is buried in the middle. This has been replicated across models since. Feeding a 500K-line codebase into Claude’s context window actually leaves it worse at finding relevant patterns than a targeted search would.
Large context approach: AI gets overwhelmed. Focuses on random details buried in the middle of files. Expensive to run.
Search approach: AI gets exactly what it needs. Finds patterns quickly. Much cheaper to operate.
When you ask for “all authentication code that handles OAuth,” a semantic search returns exactly that — not every file that mentions the word “auth.” The AI gets relevant context, not noise.
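The retrieval logic behind that is simple to sketch. This toy version uses bag-of-words vectors so it runs standalone; a real system like MCP vector search would use a learned embedding model, but the ranking step is the same idea:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words term counts. A real system would use a
    learned embedding model; this stand-in keeps the ranking logic runnable."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query, chunks, k=2):
    """Return the k code chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "def oauth_callback(request): exchange the oauth code for an access token",
    "def render_footer(): return the html footer for every page",
    "def refresh_token(session): rotate the oauth refresh token before expiry",
]
print(search("authentication code that handles oauth", chunks, k=2))
```

The footer-rendering chunk never makes the cut: only what matches the query's meaning reaches the model's context.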
The big AI vendors haven’t solved this yet. OpenAI and Anthropic are focused on the language models themselves. Neither has built search integration into their core products. The reasons are understandable: search infrastructure is genuinely hard to install and configure, and most users don’t work with enough data at once to need it. A simple find command covers most cases. But for serious engineering work on large codebases, the gap is real and growing.
Memory: Building on Previous Work Instead of Starting Over
Without memory that persists between sessions, every interaction starts from zero. The AI relearns your codebase, your patterns, your preferences each time. This isn’t just inconvenient — it’s a fundamental barrier to longer-running, multi-session agentic work.
Both OpenAI and Anthropic have shipped memory systems. They took different approaches.
OpenAI’s approach is user-centric — it remembers across all conversations, coding style, project preferences, common patterns. The interesting part: it includes personalized filtering that adjusts based on what it remembers about you. The downside is that it’s user-wide, not project-specific. Working across very different projects means the memory accumulates conflicting patterns.
Anthropic’s approach is project-based. Memory lives in CLAUDE.md files you can read and edit directly — you know exactly what the AI remembers about your project. The limitation is fading memory as files grow large; when a CLAUDE.md hits context window limits, older memories get pushed out.
Both reveal the same truth: memory isn’t just storage. It’s continuity across complex workflows.
There’s a subtler problem neither addresses well: your understanding evolves. Early assumptions might be wrong. Initial decisions might not hold up. A memory system that weights everything equally anchors the AI to outdated context. This is why I built Kuzu Memory as a graph storage system with temporal decay — more recent memories rank higher than older ones. I’m using it in this writing project, and it makes a real difference on long work streams where your thinking changes over time.
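The principle is easy to illustrate. This is not Kuzu Memory’s actual implementation, just a sketch of exponential temporal decay with an assumed 30-day half-life and invented memory entries:

```python
HALF_LIFE_DAYS = 30.0  # assumption: tune per project

def decayed_score(relevance, age_days, half_life=HALF_LIFE_DAYS):
    """Weight a memory's relevance by exponential temporal decay:
    a memory half_life days old counts half as much as a fresh one."""
    return relevance * 0.5 ** (age_days / half_life)

memories = [
    {"text": "We use SQLite for the index", "relevance": 0.9, "age_days": 180},
    {"text": "Index moved to LanceDB last sprint", "relevance": 0.8, "age_days": 7},
]
ranked = sorted(
    memories,
    key=lambda m: decayed_score(m["relevance"], m["age_days"]),
    reverse=True,
)
print([m["text"] for m in ranked])
```

The six-month-old memory scores higher on raw relevance, but decay demotes it below the recent one, so the AI anchors to your current understanding rather than your first draft of it.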
The market is fragmented right now: memory without search (OpenAI, Anthropic) or search without managed memory (most code tools). The tools that combine both — like MPM with Kuzu and MCP vector search — are ahead of where the mainstream market will be.
What You Can Do Today
If you want to try this yourself:
For search: MCP vector search now includes code review. It finds relevant patterns across your codebase without flooding the AI with irrelevant information. Works with any MCP-supporting framework — Claude Code, Codex, Gemini.
For memory: Kuzu Memory uses graph storage with temporal decay. Recent information ranks higher than older information — crucial for projects where your understanding evolves.
The specific tools matter less than the principle. Agentic workflows are longer-running and more complex than chat. They require building on previous work, not rebuilding context from scratch every session. The AI systems that enable this aren’t necessarily the smartest — they’re the ones that remember what matters and find what’s relevant.
The proof is in the prompts. If your queries to AI are getting shorter over time, your information architecture is working.
Bob Matsuoka is CTO of Duetto and writes about AI-powered engineering at HyperDev.
Related reading:
If Your Coding Agent Can’t Search — Why search capability is the missing piece in most AI coding setups
Why I Built My Own Multi-Agent Framework — The reasoning behind MPM and why delegation-first architecture matters
AI Power Ranking — Tool comparisons and benchmarks for AI practitioners
LinkedIn Newsletter — Strategic AI insights for CTOs and engineering leaders