<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Hyperdev: From The Trenches]]></title><description><![CDATA[“From the Agentic Trenches” is a field report series capturing real-world experiences, failures, workarounds, and lessons from building, testing, and shipping in AI-assisted software environments.]]></description><link>https://hyperdev.matsuoka.com/s/from-the-trenches</link><image><url>https://substackcdn.com/image/fetch/$s_!j9a7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab665959-5546-4469-9e93-9e1518976e2b_1024x1024.png</url><title>Hyperdev: From The Trenches</title><link>https://hyperdev.matsuoka.com/s/from-the-trenches</link></image><generator>Substack</generator><lastBuildDate>Wed, 22 Apr 2026 02:49:20 GMT</lastBuildDate><atom:link href="https://hyperdev.matsuoka.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Robert Matsuoka]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[hyperdev@matsuoka.com]]></webMaster><itunes:owner><itunes:email><![CDATA[hyperdev@matsuoka.com]]></itunes:email><itunes:name><![CDATA[Robert Matsuoka]]></itunes:name></itunes:owner><itunes:author><![CDATA[Robert Matsuoka]]></itunes:author><googleplay:owner><![CDATA[hyperdev@matsuoka.com]]></googleplay:owner><googleplay:email><![CDATA[hyperdev@matsuoka.com]]></googleplay:email><googleplay:author><![CDATA[Robert Matsuoka]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Is This The Era of the Connector?]]></title><description><![CDATA[Go To Where The People 
Are]]></description><link>https://hyperdev.matsuoka.com/p/is-this-the-era-of-the-connector</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/is-this-the-era-of-the-connector</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 26 Mar 2026 12:32:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!p4t8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p4t8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4t8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 424w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 848w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1272w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!p4t8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png" width="1024" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1414425,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192036152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d13406-f339-4790-b5f2-7117dd24336e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p4t8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 424w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 848w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1272w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>TL;DR</h2><ul><li><p>Users consolidate around 4 core platforms (Slack, Notion, Email+Office, an AI assistant) while rejecting standalone SaaS tools</p></li><li><p>Connectors that bring data to users beat new tools that require visits</p></li><li><p>Infrastructure breakthrough: Slack manifests + MCP + LLM services + semantic search make org-specific connectors trivial</p></li><li><p>Democratization effect: bootcamp engineers can now build sophisticated integrations that once required senior developers</p></li><li><p>Production evidence: 3 connectors (4-6 hours each, $1-2.5K in AI tokens) replaced 
5-6 standalone tools that would traditionally cost $150-300K+</p></li><li><p>Tolerance for &#8220;broad-based general tools&#8221; declining &#8212; UX mindshare captures traffic even when APIs do the work</p></li></ul><div><hr></div><p>In the past two weeks, I&#8217;ve built three connectors that collectively replaced what would have been five or six standalone tools.</p><p><strong>Engineering Search Connector</strong>: Hosted semantic and knowledge graph search service built by repurposing mcp-vector-search. Unified search across 150+ GitHub repos, 1,700+ wiki pages, and ticket systems. Accessible through a Slack bot, a web interface, a CLI, and an MCP connector for Claude.AI, bringing engineering knowledge to where people already work.</p><p><strong>CRM Data Connector</strong>: Live customer data piped directly into Claude.AI sessions via MCP. No dashboard to check, no reports to generate. Ask &#8220;What&#8217;s our pipeline this quarter?&#8221; and get live data in 1-3 seconds.</p><p><strong>Document Workflow Connector</strong>: Artifact browser and guided PR workflow for non-technical contributors. Product managers can explore and propose changes to structured docs without touching git or learning new interfaces.</p><p>None of these required users to adopt a new primary tool. Each brings specialized functionality to platforms users already inhabit daily. And each took roughly 4-6 hours to build (plus agent time).</p><p>This isn&#8217;t a productivity humble-brag. It&#8217;s evidence of a fundamental shift in how organizations interact with their data. We&#8217;re entering the connector era &#8212; building bridges between specialized intelligence and the handful of platforms where users actually live, rather than standalone applications they have to visit.</p><p>The numbers support this pattern. Users toggle between apps about 1,200 times a day, and context switching can cost up to 40% of productive time. 
Connector ecosystems are exploding: Slack&#8217;s marketplace hosts 2,600+ apps with 550K+ daily custom integrations. MCP went from 100K to 8M downloads in six months &#8212; unprecedented adoption for plumbing infrastructure.</p><p>The question isn&#8217;t whether Slack, Notion, and Claude.AI will survive the AI wave. It&#8217;s whether the hundreds of specialized tools competing for attention understand that the game has changed. Users have less tolerance for broad-based general tools than they once did. The platforms that capture UX mindshare will get most of the traffic, even if APIs and agents do the actual work behind the scenes.</p><p>The evidence is clear from user behavior: they don&#8217;t want to learn a new search interface, remember another login, or context-switch to yet another tab. They want the intelligence layer to meet them where they already are.</p><h2>The Source of Truth Problem</h2><p>Most organizations have a source-of-truth problem they haven&#8217;t fully articulated. They have Slack for real-time communication. They have Notion or Confluence for documentation. They have Google Docs for drafts that become documents that become outdated but stay around anyway. They have Jira for tickets that may or may not reflect what was actually decided. They call this a &#8220;knowledge management system.&#8221; It&#8217;s more accurately a distributed archive of partially intentional artifacts with no clear authority hierarchy.</p><p>The question &#8220;who owns this decision?&#8221; leads to a Slack thread from eight months ago, a Notion page that three people edited and nobody is certain is current, and a Google Doc, linked in a comment, that you need permission to open. This is the status quo. It functions, after a fashion, because humans are good at triangulating across ambiguous sources and asking colleagues to fill gaps.</p><p>AI agents are not good at this. 
They will confidently synthesize the eight-month-old Slack thread with the outdated Notion page and present the result as a coherent answer. The errors won&#8217;t be obvious. They&#8217;ll be subtly wrong in ways that require domain expertise to catch.</p><p>The source of truth problem was always real. It was manageable when every query ran through a human brain. It becomes actively dangerous when queries run through an inference layer first.</p><p>What you actually need &#8212; what organizations are starting to build &#8212; is a repository where the data structure enforces truth. Not a place where the right answer might be findable if you look hard enough. A place where the structure of the data makes the wrong answer harder to produce.</p><p>But here&#8217;s the connector insight: that structured repository doesn&#8217;t need to be where users spend their time. It can be the authoritative backend that feeds connectors in the platforms users already inhabit.</p><h2>Where Users Actually Live</h2><p>User attention has consolidated around four core platforms:</p><p><strong>Slack</strong>: Real-time coordination, team presence, ephemeral decisions. 32.3 million daily active users with 550K+ custom integrations daily.</p><p><strong>Email + Office Suite</strong>: Formal communication, document collaboration, external stakeholder interface. Microsoft reports 400M+ Office 365 commercial users.</p><p><strong>Notion</strong>: Knowledge management, project tracking, collaborative documentation. 100M+ users consolidating entire productivity stacks.</p><p><strong>Claude.AI</strong>: AI assistance, analysis, content generation. Rapidly becoming the default interface for LLM interactions across knowledge work.</p><p>Each platform serves a legitimate core function. Tool builders make the mistake of assuming they can compete for primary platform status by building something better. Users are done adopting new primary platforms. 
They&#8217;re consolidating around tools that already have their attention.</p><p>The pattern reveals a deeper truth: people live in transactional systems, not knowledge systems. Slack is where decisions happen. Email is where approvals flow. Claude.AI is where analysis gets done. These are transactional systems: work happens there every day.</p><p>Confluence is a perfectly good wiki tool. But it&#8217;s knowledge-at-rest, not transactional. People don&#8217;t live there. They visit when forced to document something, then return to their transactional workflows. The knowledge gets stale because maintenance happens in a different system than usage. (Notion manages to straddle the line between knowledge-at-rest and transactional work.)</p><p>Integration platforms like Zapier understand this: they connect 8,000+ apps for 3.4M+ business users by bringing specialized functionality to existing workflows rather than creating new destinations.</p><p>Users just want the data, dammit. They don&#8217;t want to learn your interface.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!prOp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!prOp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 424w, https://substackcdn.com/image/fetch/$s_!prOp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 848w, 
https://substackcdn.com/image/fetch/$s_!prOp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1272w, https://substackcdn.com/image/fetch/$s_!prOp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!prOp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png" width="1024" height="806" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1186021,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192036152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3416bd4-5246-41d4-9f66-0351982baeb1_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!prOp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 424w, 
https://substackcdn.com/image/fetch/$s_!prOp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 848w, https://substackcdn.com/image/fetch/$s_!prOp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1272w, https://substackcdn.com/image/fetch/$s_!prOp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The Connector Infrastructure Moment</h2><p>What changed? Four pieces of infrastructure matured simultaneously:</p><p><strong>Slack Manifest Tool</strong> makes organization-specific bots trivial to build. The manifest.yaml format standardizes permissions, scopes, and deployment. Weeks of OAuth wrestling became hours of configuration.</p><p><strong>MCP Protocol</strong> achieved &#8220;USB-C for AI&#8221; universal connectivity. Claude.AI, ChatGPT, and dozens of platforms support the same connector format. Build once, deploy everywhere. Growth from 100K to 8M downloads in six months reflects pent-up demand.</p><p><strong>LLM Services</strong> like Bedrock and OpenRouter provide natural language interfaces that make connectors intelligent rather than just data pipes. Ask questions in plain English, get structured responses, maintain conversation context.</p><p><strong>Semantic Search Infrastructure</strong> such as mcp-vector-search can be repurposed as a hosted service, adding an intelligence layer that understands meaning rather than just matching keywords. This transforms basic data access into contextual knowledge retrieval &#8212; a crucial enabler for connectors that need to surface relevant information rather than exact matches.</p><p>Combined, you can build a production connector in a single afternoon. The Slack manifest defines the bot interface. The MCP schema defines the data sources. Semantic search handles intelligent retrieval. Bedrock provides the language understanding. Deploy to AWS Lambda and you&#8217;re live.</p><p>My three connectors follow this exact pattern. The engineering search connector repurposes mcp-vector-search as a hosted service with all-MiniLM-L6-v2 embeddings for semantic and knowledge graph search, but the user interface is just Slack commands and Claude.AI MCP tools. 
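</p><p>Here is a minimal sketch of the ranking step inside a semantic search service, assuming documents and queries have already been turned into embedding vectors: every document is scored by cosine similarity to the query, and the closest matches come back first. It is an illustration, not mcp-vector-search&#8217;s actual code. The file names and three-dimensional vectors are hypothetical stand-ins (all-MiniLM-L6-v2 produces 384-dimensional embeddings), and the knowledge-graph side of the search is left out.</p><pre><code>
```python
import math

# Hypothetical toy index: file name -> embedding vector.
# Real all-MiniLM-L6-v2 embeddings have 384 dimensions; these 3-D values are made up.
INDEX = {
    "deploy-runbook.md": [0.9, 0.1, 0.0],
    "oncall-rotation.md": [0.2, 0.8, 0.1],
    "billing-api.md": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k document names, most similar first."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in index.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Pretend this vector is the embedding of "how do I ship to prod?"
print(search([0.8, 0.3, 0.0], INDEX))  # -> ['deploy-runbook.md', 'oncall-rotation.md']
```
</code></pre><p>Because ranking lives entirely in the backend, the same function can sit behind a Slack command, a CLI, or an MCP tool without any change to the search logic.</p><p>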
The CRM data connector is a headless AWS service that makes customer data available through natural language queries in Claude.AI. The document workflow connector provides git workflows through a web UI that non-technical users can navigate.</p><p>Each connector took 4-6 hours to build. As a standalone application, each would have taken 4-6 months, with user management, authentication, interface design, mobile responsiveness, and all the infrastructure a &#8220;real app&#8221; requires.</p><p><strong>The Democratization Effect</strong>: The infrastructure shift goes beyond development speed &#8212; it&#8217;s democratizing who can build sophisticated integrations. What once required senior engineers with deep API knowledge can now be handled by bootcamp graduates following established patterns. I built these first three connectors myself to validate the approach; similar projects will go to junior engineers from here on.</p><p>This changes resource allocation fundamentally. Organizations can solve integration problems without burning senior engineering cycles on &#8220;plumbing&#8221; work. Information that was once very hard to obtain is now trivial to access.</p><p>The economics are compelling. Building three production connectors cost roughly $1,000-2,500 in AI tokens over 44 days. Traditional contractor development for equivalent functionality would have run $150-300K+. The connector approach isn&#8217;t just faster &#8212; it&#8217;s roughly 100x more cost-effective (somewhere between 60x and 300x, depending on which ends of the two ranges you compare).</p><p>The adoption metrics prove the value. The CRM connector launched March 18th with 23 invocations on day one. No formal rollout, no training sessions, no onboarding docs. Just organic discovery across a 300+ person company. By week two, daily usage tripled to 95 invocations per day. Tuesday hit 152 invocations &#8212; including a 40-query analysis session in a single hour. 
That&#8217;s 299 queries in 7 days with zero errors, from a connector that took 4-6 hours to build.</p><p>The era isn&#8217;t about choosing between platforms. It&#8217;s about connecting specialized intelligence to the platforms users have already chosen.</p><h2>Why Wikis Can&#8217;t Compete in the Connector World</h2><p>Traditional knowledge management tools face a structural mismatch with connector architecture. Wikis assume users will &#8220;go to the tool&#8221; for information. Connectors flip that assumption: the tool comes to the user.</p><p>This creates specific problems:</p><p><strong>The Authoring/Retrieval Tension</strong>: Wikis optimize for collaborative authoring &#8212; anybody can edit, flexible structure, link everything, evolve over time. This is the opposite of what retrieval needs: consistent schema, clear ownership, explicit governance. When you pipe wiki content through a connector, you inherit all the inconsistencies that collaborative authoring creates.</p><p><strong>Search Architecture Limitations</strong>: Confluence&#8217;s search is notoriously bad because it does keyword matching on unstructured text. This was problematic before LLMs. With LLM-powered connectors, it becomes worse because the AI layer adds confidence to bad retrieval results. Users get wrong answers delivered with conviction.</p><p><strong>Static Data Problem</strong>: Notion&#8217;s AI operates on static content snapshots, disconnected from real-time operational state. When someone asks &#8220;What&#8217;s our pipeline this quarter?&#8221; through a Notion connector, the answer is based on what someone wrote about the pipeline, not on live customer data. The connector amplifies the staleness problem.</p><p><strong>Governance at Scale</strong>: Wiki governance defaults to &#8220;community-maintained,&#8221; which in practice means nobody is responsible for accuracy. As organizations scale, wikis accumulate pages nobody knows are outdated. 
Connectors don&#8217;t solve this &#8212; they accelerate the distribution of stale information.</p><p>Our structured document framework represents the alternative: git-backed Markdown with schema-validated YAML frontmatter. Every document has explicit metadata: owner, status, domain, confidence, time_box. The structure is the feature. When document workflow connectors expose this, the schema ensures consistent data quality regardless of interface.</p><p>Structured document repositories outperform wikis for AI queries by 35-60% in controlled tests. Clean Markdown with explicit metadata reduces token usage by 20-30% and improves retrieval accuracy significantly. This isn&#8217;t philosophical &#8212; it&#8217;s measurable.</p><p>Wikis remain useful for collaborative drafting and evolving reference material. But they&#8217;re not the right backend for connector architecture. The connector era requires structured data sources that can maintain quality across multiple interface layers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NjzV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NjzV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!NjzV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NjzV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1015968,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192036152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!NjzV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!NjzV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Building Connectors vs. Standalone Tools</h2><p>The strategic choice organizations face isn&#8217;t &#8220;which tool should we build?&#8221; but &#8220;should we build a tool or a connector?&#8221; My experience with three connectors illuminates the trade-offs:</p><p><strong>Engineering search</strong> could have been a standalone search platform. Instead, it&#8217;s accessible through Slack commands, CLI tools, web interface for visualizations, and MCP tools for Claude.AI sessions. Same search capability, four different interaction models depending on user context.</p><p><strong>CRM integration</strong> could have been a dashboard with charts and filters. Instead, it&#8217;s a headless MCP service that makes customer data available through natural language in Claude.AI. Ask &#8220;Show me deals over $100K in our target vertical&#8221; and get live results in 1-3 seconds. No dashboard to learn, no visual interface to maintain.</p><p><strong>Document workflows</strong> could have been a product management SaaS platform. Instead, it&#8217;s a guided workflow that helps non-technical contributors interact with existing git-backed document frameworks. Browse artifacts, generate AI summaries, submit PRs &#8212; all through interfaces that match users&#8217; technical comfort levels.</p><p>Specialized intelligence delivered through platforms users already inhabit. The connector approach wins on several fronts.</p><p>Development time: 4-6 hours vs. months. No user management, authentication, responsive design, or mobile apps to build.</p><p>Adoption friction: Zero onboarding. No new logins, training sessions, or change management overhead.</p><p>Maintenance burden: Focus on data logic and intelligence, not interface maintenance across device types and browser versions.</p><p>Integration: Connectors compose naturally with existing workflows. 
Slack discussions can include live Salesforce data. Claude.AI analysis can pull from engineering knowledge graphs. Standalone tools require export/import workflows.</p><p>The business case is compelling: connector development costs 10-20% of standalone application development while achieving 3-4x higher user engagement.</p><p>The implications go beyond development efficiency. Users have less tolerance for &#8220;broad-based general tools&#8221; than they once did. Managing dozens of application contexts creates unsustainable cognitive load. Platforms that capture daily attention get most of the traffic, even when APIs and agents do the computational work behind the scenes.</p><p>This creates a different kind of winner-take-all dynamic. The winners aren&#8217;t necessarily the best tools. They&#8217;re the platforms users choose to inhabit, plus the connectors that bring specialized capability to those platforms.</p><h2>What This Means for Your Stack</h2><p>The connector era doesn&#8217;t eliminate existing tools &#8212; it clarifies their appropriate roles and challenges common assumptions about where user attention goes.</p><p><strong>Slack keeps its coordination function</strong>: Real-time presence, threading, ephemeral decisions. But it becomes a command interface for structured data sources rather than a knowledge repository itself.</p><p><strong>Notion retains collaborative authoring value</strong>: Drafting, evolving documentation, reference material. But it stops being the &#8220;source of truth&#8221; for operational decisions. That role shifts to structured backends accessible through Notion connectors.</p><p><strong>Specialized tools survive by becoming intelligent backends</strong>: Your CRM, your monitoring system, your code repositories &#8212; these maintain their core data authority. 
But user interaction shifts to connector layers in platforms where users already work.</p><p>The question to ask about any tool: Is this where I want an AI agent pointing when it needs authoritative information? If the answer is no, it&#8217;s not your source of truth. It might still be valuable &#8212; as a backend, as a collaborative space, as a specialized interface for expert users. But it doesn&#8217;t earn the designation of &#8220;primary platform.&#8221;</p><p><strong>The organizational challenge</strong>: Getting non-technical teams comfortable with structured data workflows is real change management. Document workflow connectors address this by providing guided interfaces for git-backed workflows. But someone still needs to own schema design and governance processes.</p><p><strong>Who should build connectors first</strong>: Engineering-adjacent teams with strong PM-engineering collaboration. Organizations where AI hallucination on operational decisions creates measurable cost. Companies that have already felt the pain of distributed knowledge management.</p><p><strong>Timing matters</strong>: Most organizations haven&#8217;t built connector strategies yet. Companies that establish structured knowledge backends with connector frontends in 2026 will have 12-18 months of advantage when AI-mediated query becomes standard practice.</p><p>The connector era isn&#8217;t about choosing between platforms. It&#8217;s about connecting intelligent backends to platforms users have already chosen. Organizations that get this right will operate with less context switching and faster access to operational data.</p><p>Users just want the data, dammit. 
The question is: will you bring it to them, or keep expecting them to come to you?</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[MCP Was a Brilliant Idea — But It Needs a Proper API Behind It]]></title><description><![CDATA[What you need when doing real work.]]></description><link>https://hyperdev.matsuoka.com/p/mcp-was-a-brilliant-idea-but-it-needs</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/mcp-was-a-brilliant-idea-but-it-needs</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Tue, 10 Mar 2026 11:30:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Wr2f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wr2f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Wr2f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 424w, https://substackcdn.com/image/fetch/$s_!Wr2f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 848w, https://substackcdn.com/image/fetch/$s_!Wr2f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 1272w, https://substackcdn.com/image/fetch/$s_!Wr2f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wr2f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png" width="1024" height="667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:667,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1197660,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/190445061?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd259960f-c06b-452d-9312-d22b1e98a096_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wr2f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 424w, https://substackcdn.com/image/fetch/$s_!Wr2f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 848w, https://substackcdn.com/image/fetch/$s_!Wr2f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 1272w, https://substackcdn.com/image/fetch/$s_!Wr2f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7978eea5-11bd-4628-971e-a489964c4b1a_1024x667.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>The pattern shows up constantly when I look at MCP server implementations. Someone discovers the protocol, gets excited about giving agents tool access, builds a server in a weekend, ships it to the registry. Six tools. Maybe eight. Each one is basically a direct passthrough to whatever SDK the underlying service provides.</p><p>And for about a week, it feels like it works.</p><p>Then the agent needs to do something real. Archive 400 emails from a specific sender. Pull all calendar events for Q1, cross-reference them with a project timeline, and generate a summary. Move a batch of files across Drive folders. The agent starts calling tools in sequence, hits rate limits, gets confused about pagination, makes the same API call twelve times trying to work around a 50-item response limit that the MCP tool never exposed as a parameter. Eventually it either fails or produces something partially wrong, and nobody&#8217;s quite sure where the breakdown happened.</p><p>The bottleneck isn&#8217;t MCP. The protocol did exactly what it was supposed to do &#8212; it gave the agent a clean interface for calling tools. The bottleneck is what&#8217;s behind the MCP server.</p><p>I&#8217;ve built a lot of these now. <a href="https://github.com/bobmatnyc/gworkspace-mcp">gworkspace-mcp</a> has 115 tools across Gmail, Calendar, Drive, Docs, Sheets, Slides, and Tasks. <a href="https://github.com/bobmatnyc/slack-mpm">slack-mpm</a> has 40+ tools plus a full async Python API library underneath that can run entirely without an agent in the loop. 
The gap between those projects and most of the reference MCP servers I&#8217;ve seen is not complexity &#8212; it&#8217;s architecture. Specifically: whether there&#8217;s a real API underneath the MCP layer, or whether the MCP tools ARE the implementation.</p><p>That distinction matters more than almost anything else when you&#8217;re building tools that agents will actually use in production.</p><h2>TL;DR</h2><ul><li><p>MCP servers built as thin wrappers over service SDKs hit hard ceilings when agents need to do anything at volume or across operations</p></li><li><p>The reference Slack MCP server has 8 tools; a production implementation needs 40+, with a real API library underneath it</p></li><li><p>The three-layer pattern (API &#8594; MCP &#8594; Skills) has a specific job at each layer &#8212; remove any one and the system degrades in a predictable way</p></li><li><p>Tool description quality is the single biggest lever on agent behavior; bad descriptions produce bad decisions regardless of what&#8217;s underneath</p></li><li><p>Thin wrappers are fine for prototypes and read-light tools; the inflection point is when you want to write a script that does what the agent does</p></li></ul><h2>The Official Slack MCP Server Problem</h2><p>The reference Slack MCP server &#8212; currently maintained by Zencoder after leaving the official MCP registry &#8212; offers eight tools. List channels. Post a message. Reply to a thread. Add a reaction. Get channel history. Get thread replies. Get users. Get a user profile. That&#8217;s it.</p><p>For a demo, that&#8217;s fine. For anything agents actually need to do with Slack, it hits walls quickly.</p><p>My slack-mpm server covers 40+ tools: search messages by date range and keyword, manage bookmarks, set reminders, handle scheduled messages, list workspace members with filtering, manage file uploads, archive channels. 
The implementation underneath is a clean async Python API &#8212; 47 functions across eight modules &#8212; that you can call directly from scripts without an agent in the loop at all.</p><p>The functional gap is obvious enough. What&#8217;s less obvious is why it exists.</p><p>The reference server isn&#8217;t thin because Slack&#8217;s API is thin. Slack&#8217;s API is extensive. The server is thin because it was built without a real API library underneath it. The MCP tools are the implementation &#8212; there&#8217;s no abstraction layer, no pagination handling, no rate limit management, no batch operation support. Each tool calls the Slack SDK directly and returns the result.</p><p>That works for eight tools. It doesn&#8217;t scale to forty because the complexity you&#8217;re hiding from the agent &#8212; auth edge cases, cursor-based pagination, retry logic on rate limits, handling the difference between bot tokens and user tokens &#8212; has nowhere to live. There&#8217;s no library to put it in. So you either skip the complex operations entirely, or you dump that complexity into the MCP tool handler itself, which makes the tool fragile and hard to maintain.</p><p>The reference server chose the first option. Which is reasonable for a reference &#8212; but it means agents using it can&#8217;t search Slack properly (search requires user tokens; the server only supports bot tokens), can&#8217;t do bulk operations, can&#8217;t run scheduled tasks, can&#8217;t be used programmatically outside of an agent context.</p><p>This isn&#8217;t a knock on the people who built it. It&#8217;s a knock on the pattern of treating MCP as the architecture rather than the interface.</p><p>Phil Schmid made a similar observation in January in a piece called <a href="https://www.philschmid.de/mcp-best-practices">MCP is Not the Problem, It&#8217;s Your Server</a>: &#8220;MCP servers are not thin wrappers around your existing API. 
A good REST API is not a good MCP server.&#8221; Correct, but it doesn&#8217;t go quite far enough. The problem isn&#8217;t just that REST APIs make bad MCP servers &#8212; it&#8217;s that MCP servers without any abstraction layer underneath them make bad tools, regardless of what the underlying service looks like.</p><h2>What a Real API Gives You</h2><p>When I started building <a href="https://github.com/bobmatnyc/gworkspace-mcp">gworkspace-mcp</a>, I made a decision early that turned out to be foundational: build the Google Workspace API library first, then write the MCP server as a thin interface on top of it. The API library handles auth, pagination, rate limiting, and error normalization. The MCP tools are mostly one-liners that call the right API function and return the result.</p><p>That decision shows up in five specific ways.</p><p><strong>Pagination at the library level.</strong> Gmail&#8217;s API returns 100 messages per page by default. If an agent wants to archive everything from a specific sender over the past six months, that might be 400 messages across four API calls. If the MCP tool handles pagination &#8212; which means the API library handles it &#8212; the agent calls one tool and gets back 400 message IDs. If pagination isn&#8217;t handled, the agent manages cursor iteration itself: call the tool, get 100 results, extract the cursor, call again, repeat. Agents do this badly. They lose track of cursors, make redundant calls, or give up after the first page and tell you they found the 100 most recent messages.</p><p><strong>Rate limit management that doesn&#8217;t leak up.</strong> Rate limits handled in the API layer are invisible to the agent. The tool call either succeeds or returns a clean error. Rate limits handled in the tool handler either block the agent or require the tool description to explain retry patterns &#8212; which agents then implement inconsistently. The complexity belongs in the API layer. 
That&#8217;s the only place it can be handled reliably and tested against real behavior.</p><p><strong>Reuse across contexts.</strong> The slack-mpm API library powers five standalone scripts &#8212; archiver, digest, listener, notifier, responder &#8212; that operate on schedules without any MCP involvement. The same code that handles pagination and auth in the MCP context handles it in the cron job context. This isn&#8217;t a nice-to-have: it means the code is continuously exercised against real-world conditions, not just when an agent happens to call it.</p><p><strong>Actual testability.</strong> You can write unit tests for an API library. You can mock the underlying service calls, test pagination edge cases, verify that rate limit handling works correctly. Testing an MCP tool handler without running an agent session is hard to do meaningfully &#8212; you end up testing it by watching it fail in production. The difference in reliability compounds over time, and it compounds fast.</p><p><strong>Composability.</strong> Some operations are inherently multi-step. Finding all calendar events associated with a project and generating a summary requires fetching from multiple calendars, filtering by keyword, sorting by date, and formatting the output. That can live in the API layer as a single higher-level function. The MCP tool calls the function. The agent sees one tool call that returns a clean result instead of orchestrating a dozen calls and trying to assemble the output itself.</p><p>Aditya Mehra put it well in a <a href="https://medium.com/@aditya_mehra/beyond-api-wrappers-architecting-mcp-servers-for-production-agentic-ai-systems-cf93804be22a">December 2025 piece on production MCP architecture</a>: &#8220;Design for what agents need to accomplish, not for what APIs happen to exist. Your APIs were designed for developers building applications. Your MCP servers should be designed for agents completing tasks.&#8221; The API layer is where you do that translation. 
The MCP layer is where you expose the result.</p><h2>The Three-Layer Pattern</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lccO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lccO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!lccO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!lccO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!lccO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lccO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:966929,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/190445061?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lccO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!lccO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!lccO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!lccO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7610652d-30e1-420e-8d0b-23f1254599dd_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The architecture I&#8217;ve converged on has three distinct layers, each with a specific job. Removing any one of them degrades the system in a predictable way.</p><p><strong>Layer 1 &#8212; The API library.</strong> This is where the complexity lives. Auth handling, including token refresh and the difference between scoped token types. Pagination, including cursor management and automatic result aggregation. Rate limit management, with exponential backoff and respect for per-method limits. Error normalization, so the MCP layer receives clean, typed errors rather than raw API exceptions. Batch operations, so the agent can request 400 results and the API handles chunking appropriately.</p><p>The design test for this layer: you should be able to write useful scripts against the API library without involving an agent at all. 
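</p><p>The Layer 1 and Layer 2 split can be sketched in a few dozen lines. This is a minimal, synchronous illustration under stated assumptions, not code from gworkspace-mcp or slack-mpm (whose library is async): <code>FakeMailClient</code> is a hypothetical stand-in for a service SDK whose <code>list_messages(sender, cursor, page_size)</code> returns <code>(ids, next_cursor)</code> and raises the hypothetical <code>RateLimitError</code> when throttled. Pagination and exponential backoff live in the library functions <code>with_backoff</code> and <code>fetch_all</code>; the tool handler only validates input and calls them.</p>

```python
import functools
import time


class RateLimitError(Exception):
    """Hypothetical error a service client raises when it is throttled."""


# ---- Layer 1: API library (pagination and rate limits live here) ----

def with_backoff(call, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt and retry.

    After max_retries retries the last RateLimitError is re-raised, so the
    caller sees one clean failure instead of a stream of throttling errors.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


def fetch_all(fetch_page, page_size=100, limit=None, sleep=time.sleep):
    """Follow next-page cursors and return one aggregated list.

    fetch_page(cursor, page_size) must return (items, next_cursor), with
    next_cursor None on the last page. Stops early once `limit` is reached.
    """
    items, cursor = [], None
    while True:
        call = functools.partial(fetch_page, cursor, page_size)
        page, cursor = with_backoff(call, sleep=sleep)
        items.extend(page)
        if limit is not None and len(items) >= limit:
            return items[:limit]
        if cursor is None:
            return items


# ---- Layer 2: thin MCP-style tool handler ----

def handle_list_messages(args, client):
    """Validate inputs, call the library, return the result."""
    sender = args.get("sender")
    if not sender:
        return {"error": "sender is required"}
    limit = max(1, int(args.get("limit", 500)))
    ids = fetch_all(
        lambda cursor, n: client.list_messages(sender, cursor, n), limit=limit
    )
    return {"count": len(ids), "message_ids": ids}


# ---- Hypothetical service client, used only for this demo ----

class FakeMailClient:
    """Serves `total` message IDs via string cursors; throttles the first calls."""

    def __init__(self, total=250, throttle_calls=1):
        self.ids = [f"msg-{i}" for i in range(total)]
        self.throttle_calls = throttle_calls
        self.calls = 0

    def list_messages(self, sender, cursor, page_size):
        # `sender` is ignored: every fake message matches.
        self.calls += 1
        if self.throttle_calls:
            self.throttle_calls -= 1
            raise RateLimitError("429 Too Many Requests")
        start = int(cursor or 0)
        end = start + page_size
        next_cursor = str(end) if len(self.ids) > end else None
        return self.ids[start:end], next_cursor


if __name__ == "__main__":
    # Script path: no agent, same library code.
    client = FakeMailClient()
    ids = fetch_all(lambda c, n: client.list_messages("alerts@example.com", c, n))
    print(len(ids), "messages in", client.calls, "calls")
    # Tool path: what an MCP server would expose to the agent.
    print(handle_list_messages({"sender": "alerts@example.com", "limit": 400},
                               FakeMailClient())["count"])
```

<p>The <code>__main__</code> block is the design test in miniature: the same <code>fetch_all</code> call works from a plain script with no agent involved, and the handler adds nothing but input checks. A production library would also honor server-supplied retry hints, add jitter, and normalize other error types, all of which this sketch leaves out.</p><p>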
If you can&#8217;t, the abstraction is at the wrong level.</p><p><strong>Layer 2 &#8212; The MCP server.</strong> Thin. The tool handler should be almost trivially simple: validate inputs, call the API function, return the result. If a tool handler is doing significant work, something belongs in the API layer instead. Tool descriptions are not trivial &#8212; they&#8217;re the interface contract with the agent and deserve careful attention &#8212; but the execution path should be short. A handler that&#8217;s more than twenty lines of real logic is usually a sign something is in the wrong place.</p><p>The design test here: each tool should do one thing, and it should be obvious which tool to use for a given operation. When agents have to guess between tools, they guess wrong.</p><p><strong>Layer 3 &#8212; The skills document.</strong> This is the layer most implementations skip, and it&#8217;s often where production agent behavior falls apart. The skills document tells the agent how to use the MCP tools effectively: what tools exist, when to use each one, which combinations work well together, what to avoid.</p><p>Without it, agents discover capabilities by trial and error &#8212; hitting rate limits unnecessarily, calling the wrong tool for the job, making redundant calls when one batched call would do. With it, agents start from a baseline of competent behavior and only deviate when they encounter something they haven&#8217;t hit before.</p><p>The skills document is institutional knowledge in structured form. It captures what took me hours of iteration to learn about each service &#8212; which Gmail search operators work reliably, when to use Drive&#8217;s query syntax versus simple name search, how to structure a Sheets batch update to avoid cell reference errors. That knowledge doesn&#8217;t exist anywhere else. 
It lives in the skills document or it doesn&#8217;t exist, and the agent stumbles into the same mistakes I made during development.</p><p>The MCP community is starting to recognize the description-as-instruction principle. Schmid&#8217;s framing is that &#8220;every piece of text is part of the agent&#8217;s context.&#8221; True, but individual tool descriptions can only carry so much. The skills document is where higher-order guidance lives &#8212; how to think about sequencing operations, when not to use a tool, what the common failure modes look like. Think of it as runtime instructions for agents, not documentation for humans.</p><h2>When MCP Alone Is Enough</h2><p>There are cases where thin MCP wrappers are the right call, and it&#8217;s worth being direct about them.</p><p><strong>Simple, low-volume reads.</strong> If an agent needs to check weather, query a single record from an external service, or look up one user profile, a thin wrapper is probably fine. The complexity ceiling exists but may never be reached. Building a full API layer for a tool that makes one API call per agent turn is engineering overhead that doesn&#8217;t pay off.</p><p><strong>Prototyping and exploration.</strong> A server built in a day is often the right first step because you don&#8217;t know yet which operations the agent will actually need. I&#8217;ve shipped thin wrappers deliberately as a way to learn before investing in a proper API library. The Zencoder Slack server probably started that way. The mistake isn&#8217;t building a thin wrapper for exploration &#8212; it&#8217;s leaving it there when the agent starts doing real work and the wrapper&#8217;s limits start showing.</p><p><strong>Single-agent, single-purpose tools.</strong> If a tool is purpose-built for one agent doing one thing and the scope is genuinely narrow, the three-layer overhead may not be worth it. 
The architecture makes sense when tools need to be reused across contexts, when operations are high-volume, or when the underlying service is rate-sensitive.</p><p><strong>Read-light, write-light operations.</strong> The complexity of batch operations, cursor management, and retry logic matters most when you&#8217;re writing or doing high-volume reads. A tool that fetches a single resource per agent turn doesn&#8217;t need much abstraction.</p><p>The honest signal for when you need the full pattern is one of three things: you find yourself wanting to write a script that does what the agent does, the agent hits the same rate limit more than once in a session, or you start duplicating error handling logic across tool handlers. Any of those is the inflection point. At that moment, adding the API layer is less work than continuing without it.</p><h2>Building Your Own: Where to Start</h2><p>Start with the API, not the MCP server. The most common mistake is writing the MCP tool first &#8212; it seems like the path of least resistance &#8212; and then adding abstraction as you hit problems. The trouble is that tool handler code is hard to refactor. The MCP interface shapes how you think about the operations, and that framing tends to be too granular. Starting with the API forces the right level of abstraction from the beginning.</p><p>Design the API around operations, not endpoints. Slack&#8217;s Web API has dozens of endpoints, but agents think in operations: send a message, search conversations, get user context. The API library should expose those operations, even when the underlying service requires two calls to complete one. Complexity belongs in the library. The agent-facing interface stays clean.</p><p>Invest heavily in tool descriptions. The single biggest lever on how well agents use your MCP server is description quality. That means specific parameter descriptions, not just type annotations. It means clear examples of when to use this tool versus a similar one. 
It means explicit notes about what a tool cannot do &#8212; agents will try to use tools for operations they weren&#8217;t designed for, and a good description cuts off the most common wrong paths before they happen.</p><p>Write the skills document while you build. Don&#8217;t wait until the server is done. Every time you notice the agent doing something inefficient &#8212; calling four tools when one would work, misunderstanding a parameter&#8217;s purpose, hitting a rate limit that could have been avoided &#8212; write that observation down immediately. The skills document is most valuable when it&#8217;s written from observation of real agent behavior, not reconstructed after the fact from memory.</p><p>The <a href="https://github.com/bobmatnyc/gworkspace-mcp">gworkspace-mcp repository</a> on GitHub is a worked reference &#8212; 115 tools across seven Google APIs, one coherent server, the three-layer pattern at scale. Not the only way to implement this, but a concrete example of what the architecture looks like when the abstractions have had to earn their keep over months of real use.</p><div><hr></div><p>Edits:</p><ul><li><p>Fixed link to <a href="https://github.com/bobmatnyc/slack-mpm">slack-mpm</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Context Memory and Search: The Secrets to Effective Agentic Work]]></title><description><![CDATA[Why search and memory systems matter more than smarter AI 
models]]></description><link>https://hyperdev.matsuoka.com/p/context-memory-and-search-the-secrets</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/context-memory-and-search-the-secrets</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 26 Feb 2026 13:03:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FeW7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FeW7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FeW7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 424w, https://substackcdn.com/image/fetch/$s_!FeW7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 848w, https://substackcdn.com/image/fetch/$s_!FeW7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 1272w, https://substackcdn.com/image/fetch/$s_!FeW7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FeW7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png" width="1024" height="548" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:548,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1180738,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/189219194?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed538ff0-784c-4852-8ca0-c91b6b7bc328_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FeW7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 424w, https://substackcdn.com/image/fetch/$s_!FeW7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 848w, https://substackcdn.com/image/fetch/$s_!FeW7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 1272w, https://substackcdn.com/image/fetch/$s_!FeW7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa15bcb38-4613-462c-9f7b-e2525f0a6695_1024x548.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>What Makes AI Coding Effective</h2><p>Last weekend, working on performance improvements to my MCP vector search engine, I noticed something. The breakthrough in AI coding isn&#8217;t smarter models &#8212; it&#8217;s information architecture. The tools that actually work aren&#8217;t necessarily the ones with the biggest context windows. They&#8217;re the ones that find the right context and remember what matters.</p><p>Here&#8217;s what I mean. I&#8217;ve been using search and memory together long enough that I don&#8217;t think about them anymore. 
My prompts have gotten measurably shorter: an analysis of my sessions shows the average prompt dropping from 12-15 words in mid-2025 to 6-8 words now. &#8220;Check logs.&#8221; &#8220;What&#8217;s the command to quantize the index?&#8221; I just assume the agents will find the context they need. When I stepped back and thought about what changed, it came down to two things: <strong>Search</strong> and <strong>Memory</strong>.</p><p>You can see this pattern across successful AI coding tools. Claude MPM consistently outperforms Claude Code on its own &#8212; not because the underlying agentic AI differs, but because MPM brings the right context to the agents rather than flooding them with everything. Tools like Augment and Cursor have made similar investments in context retrieval. The winning tools aren&#8217;t the ones with the smartest models. They&#8217;re the ones that solved information architecture.</p><h2>Search: Why Bigger Context Windows Aren&#8217;t the Answer</h2><p>The promise of massive context windows is seductive: dump your entire codebase into the AI and let it figure out what&#8217;s relevant. The research tells a different story.</p><p><a href="https://arxiv.org/abs/2307.03172">Liu et al.&#8217;s 2023 paper</a> &#8220;Lost in the Middle: How Language Models Use Long Contexts&#8221; documented a U-shaped performance curve: models process information well at the beginning and end of long contexts, but performance drops 30% or more when relevant information is buried in the middle. This has been replicated across models since. Cramming as much of a 500K-line codebase as will fit into Claude&#8217;s context window actually makes it <em>worse</em> at finding relevant patterns than targeted search.</p><p><strong>Large context approach</strong>: AI gets overwhelmed. Focuses on random details buried in the middle of files. Expensive to run.</p><p><strong>Search approach</strong>: AI gets exactly what it needs. Finds patterns quickly. 
Much cheaper to operate.</p><p>When you ask for &#8220;all authentication code that handles OAuth,&#8221; a semantic search returns exactly that &#8212; not every file that mentions the word &#8220;auth.&#8221; The AI gets relevant context, not noise.</p><p>The big AI vendors haven&#8217;t solved this yet. OpenAI and Anthropic are focused on the language models themselves. Neither has built search integration into its core products. The reasons are understandable: semantic search is genuinely hard to install and configure, and most users don&#8217;t work with enough data at once to need it. A simple find command covers most cases. But for serious engineering work on large codebases, the gap is real and growing.</p><h2>Memory: Building on Previous Work Instead of Starting Over</h2><p>Without memory that persists between sessions, every interaction starts from zero. The AI relearns your codebase, your patterns, your preferences each time. This isn&#8217;t just inconvenient &#8212; it&#8217;s a fundamental barrier to longer-running, multi-session agentic work.</p><p>Both OpenAI and Anthropic have shipped memory systems. They took different approaches.</p><p><strong>OpenAI&#8217;s approach</strong> is user-centric: it remembers your coding style, project preferences, and common patterns across all conversations. The interesting part: it includes personalized filtering that adjusts based on what it remembers about you. The downside is that it&#8217;s user-wide, not project-specific. Working across very different projects means the memory accumulates conflicting patterns.</p><p><strong>Anthropic&#8217;s approach</strong> is project-based. Memory lives in CLAUDE.md files you can read and edit directly &#8212; you know exactly what the AI remembers about your project. The limitation is that memory fades as files grow large: once a CLAUDE.md hits context window limits, older memories get pushed out.</p><p>Both reveal the same truth: memory isn&#8217;t just storage. 
It&#8217;s continuity across complex workflows.</p><p>There&#8217;s a subtler problem neither addresses well: your understanding evolves. Early assumptions might be wrong. Initial decisions might not hold up. A memory system that weights everything equally anchors the AI to outdated context. This is why I built Kuzu Memory as a graph storage system with temporal decay &#8212; more recent memories rank higher than older ones. I&#8217;m using it in this writing project, and it makes a real difference on long work streams where your thinking changes over time.</p><p>The market is fragmented right now: memory without search (OpenAI, Anthropic) or search without managed memory (most code tools). The tools that combine both &#8212; like MPM with Kuzu and MCP vector search &#8212; are ahead of where the mainstream market will be.</p><h2>What You Can Do Today</h2><p>If you want to try this yourself:</p><p><strong>For search</strong>: <a href="https://github.com/bobmatnyc/mcp-vector-search">MCP vector search</a> now includes code review. It finds relevant patterns across your codebase without flooding the AI with irrelevant information. Works with any MCP-supporting framework &#8212; Claude Code, Codex, Gemini.</p><p><strong>For memory</strong>: <a href="https://github.com/bobmatnyc/kuzu-memory">Kuzu Memory</a> uses graph storage with temporal decay. Recent information ranks higher than older information &#8212; crucial for projects where your understanding evolves.</p><p>The specific tools matter less than the principle. Agentic workflows are longer-running and more complex than chat. They require building on previous work, not rebuilding context from scratch every session. The AI systems that enable this aren&#8217;t necessarily the smartest &#8212; they&#8217;re the ones that remember what matters and find what&#8217;s relevant.</p><p>The proof is in the prompts. 
If your queries to AI are getting shorter over time, your information architecture is working.</p><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/if-your-coding-agent-cant-search">If Your Coding Agent Can&#8217;t Search</a> &#8212; Why search capability is the missing piece in most AI coding setups</p></li><li><p><a href="https://hyperdev.matsuoka.com/why-i-built-my-own-multi-agent-framework">Why I Built My Own Multi-Agent Framework</a> &#8212; The reasoning behind MPM and why delegation-first architecture matters</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[What’s In My Toolkit: Claude Code and Family]]></title><description><![CDATA[It's a Framework, Not A Solution]]></description><link>https://hyperdev.matsuoka.com/p/whats-in-my-claude-code-toolkit</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/whats-in-my-claude-code-toolkit</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 28 Jan 2026 13:30:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!c36I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!c36I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c36I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 424w, https://substackcdn.com/image/fetch/$s_!c36I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 848w, https://substackcdn.com/image/fetch/$s_!c36I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 1272w, https://substackcdn.com/image/fetch/$s_!c36I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c36I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png" width="1149" height="367" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:367,&quot;width&quot;:1149,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:334684,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/185237435?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c36I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 424w, https://substackcdn.com/image/fetch/$s_!c36I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 848w, https://substackcdn.com/image/fetch/$s_!c36I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 1272w, https://substackcdn.com/image/fetch/$s_!c36I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1ff8f77-7ccc-485f-964d-7800161a0fa7_1149x367.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Claude Code is getting a lot of love lately, and rightly so. But here&#8217;s what most of the hype pieces miss: Claude Code on its own is a framework that people build on. Running it vanilla won&#8217;t give you the magical results you might be expecting.</p><p>I&#8217;ve spent the last several months building and testing tools that extend Claude Code&#8217;s capabilities&#8212;and using them daily on real client work. The results? Sessions that run longer before context drift becomes a problem. Semantic code search that actually finds what I&#8217;m looking for across large codebases. Persistent memory that makes subsequent prompts more effective. 
Here&#8217;s my complete toolkit and how to set it up.</p><p>A caveat before we dive in: this setup assumes you have local control over your development environment&#8212;no corporate proxies blocking MCP connections, no air-gapped networks, no policies preventing CLI tool installation. If you&#8217;re in a locked-down enterprise environment, some of this won&#8217;t apply cleanly. I&#8217;ll note the dependencies as we go.</p><p><em>(Everything I&#8217;m covering is in my <a href="https://github.com/stars/bobmatnyc/lists/llm-toolkit">LLM Toolkit</a> GitHub list if you want to browse.)</em></p><h2>TL;DR</h2><ul><li><p><strong>Claude MPM</strong> provides multi-agent orchestration, specialized agents, and session continuity on top of Claude Code</p></li><li><p><strong>mcp-vector-search</strong> enables semantic AST-based code search&#8212;tested on codebases up to 230K lines</p></li><li><p><strong>kuzu-memory</strong> remembers prompts and commits, enriching future sessions automatically</p></li><li><p><strong>mcp-ticketer</strong> powers ticket-driven development with Linear, GitHub, Jira, and Asana integration</p></li><li><p><strong>mcp-skillset</strong> provides a searchable vector + graph database of curated skills</p></li><li><p>These tools work with any MCP-compatible coding assistant, but they&#8217;re designed to work together</p></li></ul><h2>Why vanilla Claude Code isn&#8217;t enough</h2><p>Don&#8217;t get me wrong&#8212;Claude Code handles most tasks beautifully out of the box. The agent loop, file operations, git workflows, bash execution. For quick tasks and single-file changes, you don&#8217;t need anything else.</p><p>But here&#8217;s where it falls short:</p><p><strong>Context evaporates.</strong> Long sessions hit the context window limit and you start over. Previous conversations? Gone. That decision you made three hours ago about architecture? 
Claude doesn&#8217;t remember it.</p><p><strong>Code search is keyword-based.</strong> When you ask &#8220;where do we handle authentication?&#8221; but the code uses &#8220;login&#8221; and &#8220;session validation,&#8221; you get nothing useful back.</p><p><strong>No persistent memory.</strong> Every session starts from zero. The patterns Claude learned about your codebase yesterday? Lost.</p><p><strong>Single-threaded execution.</strong> You&#8217;re running one Claude instance doing one thing at a time.</p><p>The tools I&#8217;ve built address each of these limitations. And importantly, they&#8217;re designed to work together&#8212;or independently with any MCP-compatible coding tool.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MsT-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MsT-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 424w, https://substackcdn.com/image/fetch/$s_!MsT-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 848w, https://substackcdn.com/image/fetch/$s_!MsT-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MsT-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MsT-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png" width="1147" height="667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:667,&quot;width&quot;:1147,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133674,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/185237435?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MsT-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 424w, https://substackcdn.com/image/fetch/$s_!MsT-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 848w, 
https://substackcdn.com/image/fetch/$s_!MsT-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 1272w, https://substackcdn.com/image/fetch/$s_!MsT-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e099240-6245-4ea9-8e73-800cb2eb6163_1147x667.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Claude MPM: The orchestration layer</h2><p><a href="https://github.com/bobmatnyc/claude-mpm">Claude MPM</a> (Multi-Agent Project 
Manager) is my orchestration framework built on top of Claude Code. It&#8217;s now at <a href="https://pypi.org/project/claude-mpm/">version 5.1.2</a> with 1,388 commits and 198 releases. Here&#8217;s what it provides:</p><p><strong>47+ specialized agents</strong> deploy to your <code>~/.claude/agents/</code> directory&#8212;Python Engineer, Rust Engineer, TypeScript Engineer, QA, Security, Ops, Documentation specialists. Each agent has domain-specific instructions that improve output quality for its area. How much improvement varies by task type; I see the biggest gains on framework-specific work where the agent&#8217;s instructions include current idioms.</p><p><strong>Session continuity</strong> through automatic context summaries at 70%, 85%, and 95% thresholds. The <code>--resume</code> flag picks up where you left off instead of starting over. This works better for implementation sessions than exploratory ones&#8212;summaries necessarily lose nuance.</p><p><strong>Git-first architecture</strong> pulls agents and skills from repositories rather than bundling everything locally. Custom repos slot in via priority-based resolution.</p><h3>Installation</h3><p>Pick your preferred method:</p><pre><code><code># Recommended: includes monitoring dashboard
pipx install "claude-mpm[monitor]"

# Alternative via uv
uv tool install claude-mpm

# macOS via Homebrew
brew tap bobmatnyc/tools &amp;&amp; brew install claude-mpm
</code></code></pre><p>Then run:</p><pre><code><code>claude-mpm run
</code></code></pre><p>That&#8217;s it. The agents deploy automatically.</p><h3>Why orchestration matters more than raw model power</h3><p>I tested this extensively and <a href="https://hyperdev.substack.com/p/orchestration-beats-raw-power">wrote about the results</a>. In my testing across a set of 50 Python refactoring tasks (mix of greenfield and legacy code, evaluated by whether tests passed post-change), Claude MPM achieved <strong>96.2% success</strong> compared to 78% for vanilla Claude Code on the same tasks. The same underlying model produces noticeably different results depending on how it&#8217;s orchestrated.</p><p>The hierarchical <a href="https://github.com/bobmatnyc/claude-mpm-agents">BASE-AGENT.md pattern</a> reduces agent instruction duplication by <strong>57%</strong> (measured in instruction token count, not behavioral overlap) through template inheritance, while ETag-based caching cuts network bandwidth by <strong>95%+</strong> when pulling agent updates.</p><h2>mcp-vector-search: Semantic code understanding</h2><p><a href="https://github.com/bobmatnyc/mcp-vector-search">mcp-vector-search</a> changes the failure modes of codebase navigation. Instead of grep-style keyword matching, it uses <strong>AST-aware parsing</strong> and <strong>semantic embeddings</strong> to find code by meaning.</p><p>Here&#8217;s a real example: Last week I pointed it at a client&#8217;s Java codebase&#8212;230,000 lines across 1,200 files. They wanted to understand their authentication flow. Keyword search for &#8220;auth&#8221; returned noise. Semantic search for &#8220;user authentication and session management&#8221; returned the exact classes and methods responsible, ranked by relevance. 
The largest codebase I&#8217;ve indexed was around 400K lines; beyond that, indexing time becomes painful and you&#8217;ll want to scope to specific directories.</p><h3>How it works</h3><p>The tool parses code using Tree-sitter (8 languages supported: Python, JavaScript, TypeScript, Dart, PHP, Ruby, HTML, Markdown), generates embeddings via <code>all-MiniLM-L6-v2</code>, and stores vectors in <a href="https://www.trychroma.com/">ChromaDB</a>. Connection pooling provides <strong>~14% faster query response</strong> in my benchmarks (measured on repeated semantic queries against a 50K-line TypeScript repo, M2 MacBook). File watching triggers automatic reindexing when code changes.</p><h3>Setup</h3><pre><code><code># Install
pip install mcp-vector-search

# Initialize (creates ChromaDB, configures embeddings)
mcp-vector-search setup

# Add to Claude Code
claude mcp add mcp-vector-search
</code></code></pre><p>Then index your codebase:</p><pre><code><code>mcp-vector-search index /path/to/your/code
</code></code></pre><p>Now Claude can search semantically. Ask &#8220;where do we validate user permissions?&#8221; and get meaningful results even if the code never uses those exact words.</p><h3>Where it doesn&#8217;t help</h3><p>Semantic search isn&#8217;t magic. It struggles with highly domain-specific terminology that didn&#8217;t appear in the embedding model&#8217;s training data. Internal acronyms, proprietary naming conventions, and newly coined terms often need keyword search as a fallback. I keep both approaches available.</p><h2>kuzu-memory: Persistent context that compounds</h2><p><a href="https://github.com/bobmatnyc/kuzu-memory">kuzu-memory</a> solves the &#8220;starting from zero&#8221; problem. It records every prompt you send to Claude Code and every commit message, then uses that history to enrich future prompts automatically.</p><p>The more you use it, the better it gets.</p><h3>The architecture</h3><p>Built on <a href="https://kuzudb.com/">Kuzu</a>, an embedded graph database that&#8217;s fast (<strong>&lt;3ms recall</strong>, <strong>&lt;8ms generation</strong>), offline-first, and requires no LLM calls for memory operations. The entire database is a single file under <strong>10MB</strong>&#8212;perfect for version control.</p><p>The memory model borrows its categories from cognitive psychology:</p><ul><li><p><strong>SEMANTIC</strong> (never expires): Facts about your codebase, architecture decisions</p></li><li><p><strong>EPISODIC</strong> (30 days): Specific experiences, debugging sessions, what worked</p></li><li><p><strong>WORKING</strong> (1 day): Current task context</p></li><li><p><strong>SENSORY</strong> (6 hours): Recent observations</p></li></ul><p>Git commit history enrichment automatically captures project evolution, so Claude understands what changed and why.</p><h3>Setup</h3><pre><code><code># Install
pip install kuzu-memory
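# Or, to keep the CLI in its own isolated environment, install it with pipx instead:
# pipx install kuzu-memory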

# Initialize
kuzu-memory setup

# Add to Claude Code
claude mcp add kuzu-memory
</code></code></pre><p>You can also integrate it through Claude Code <a href="https://docs.anthropic.com/en/docs/claude-code/hooks">hooks</a> to capture context from every session automatically.</p><h2>mcp-ticketer: Ticket-driven development</h2><p><a href="https://github.com/bobmatnyc/mcp-ticketer">mcp-ticketer</a> is how I implement TxDD (Ticket-Driven Development). Look at the issues in any of my repos and you&#8217;ll see the pattern&#8212;structured tickets that capture research, decisions, and implementation context.</p><p>The tool provides a <strong>unified interface</strong> across Linear, GitHub Issues, Jira, and Asana. One API, multiple backends.</p><h3>Why this matters</h3><p>Token efficiency is critical when you&#8217;re loading ticket context. Compact mode delivers a <strong>70% token reduction</strong> for ticket lists, so roughly 3x as many tickets fit within the same context limit. Project-management monitoring also flags duplicates, stale work, and orphaned tickets automatically.</p><h3>Setup</h3><pre><code><code># Install
pip install mcp-ticketer

# Configure your backend (example: Linear)
mcp-ticketer config set linear --api-key YOUR_KEY
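# Alternatively (bash), read the key without echoing it or saving it to shell history.
# LINEAR_API_KEY is just a local variable name chosen for this example.
read -rs -p "Linear API key: " LINEAR_API_KEY; echo
mcp-ticketer config set linear --api-key "$LINEAR_API_KEY"
unset LINEAR_API_KEY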

# Add to Claude Code
claude mcp add mcp-ticketer
</code></code></pre><p>Now Claude can create, query, and update tickets as part of its workflow.</p><h2>mcp-skillset: Dynamic skill discovery</h2><p><a href="https://github.com/bobmatnyc/mcp-skillset">mcp-skillset</a> provides a searchable database of curated skills&#8212;including some of the best skill repos out there, plus your own custom additions.</p><p>Unlike static skills loaded at startup, mcp-skillset supports <strong>runtime discovery</strong> through hybrid search, weighted by default at <strong>70% vector similarity (ChromaDB) and 30% knowledge graph (NetworkX)</strong>; you can tune the split to your needs.</p><h3>What&#8217;s included</h3><p>The default index includes <a href="https://github.com/anthropics/skills">Anthropic&#8217;s official skills</a>, community-contributed patterns, and framework-specific guidance. Security features include prompt injection detection and repository trust levels.</p><h3>Setup</h3><pre><code><code># Install
pip install mcp-skillset

# Initialize with default skill repos
mcp-skillset setup

# Add custom skill repos
mcp-skillset add-repo https://github.com/your-team/custom-skills

# Add to Claude Code
claude mcp add mcp-skillset
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0waS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0waS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 424w, https://substackcdn.com/image/fetch/$s_!0waS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 848w, https://substackcdn.com/image/fetch/$s_!0waS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 1272w, https://substackcdn.com/image/fetch/$s_!0waS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0waS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png" width="1150" height="779" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:779,&quot;width&quot;:1150,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:164268,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/185237435?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0waS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 424w, https://substackcdn.com/image/fetch/$s_!0waS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 848w, https://substackcdn.com/image/fetch/$s_!0waS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 1272w, https://substackcdn.com/image/fetch/$s_!0waS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0c0ef5-9eed-4632-9e3b-52bb9aaf9977_1150x779.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>Brief comparison: Other orchestration approaches</h2><p>I&#8217;ve designed these tools to work with any MCP-compatible coding assistant, not just my framework. But if you&#8217;re evaluating orchestration options, here&#8217;s the landscape. A caveat: most of these performance claims are self-reported and benchmark-specific. I haven&#8217;t independently verified them, and SWE-Bench scores in particular don&#8217;t always translate to real-world performance.</p><p><strong><a href="https://github.com/ruvnet/claude-flow">claude-flow</a></strong> (11K+ stars) is the maximalist option: 17 SPARC modes, 54+ agents, 100+ MCP tools, web UI dashboard. The project claims <strong>84.8% SWE-Bench solve rates</strong>&#8212;impressive if reproducible, though I haven&#8217;t tested it myself. 
If you need enterprise-grade features, this is worth evaluating. The complexity ceiling is high; budget time for configuration.</p><p><strong><a href="https://github.com/smtg-ai/claude-squad">claude-squad</a></strong> (5.1K+ stars) takes the opposite approach: a single Go binary that runs each session in its own tmux session and git worktree. Dead simple. Run <code>cs</code> to open the TUI, work in parallel, then review and merge. Zero configuration. This is what I recommend if you just want parallelism without learning a new system.</p><p><strong><a href="https://github.com/parruda/swarm">claude-swarm</a></strong> (1.6K+ stars) serves Ruby shops with a single-process architecture built on RubyLLM. Good if you&#8217;re building Rails applications and want native library integration rather than CLI orchestration.</p><p><strong><a href="https://github.com/nwiizo/ccswarm">ccswarm</a></strong> is written in Rust and uses type-state patterns with zero shared state; the project claims a <strong>70% memory reduction</strong> through native context compression. Haven&#8217;t stress-tested this one; the architecture looks promising for resource-constrained environments.</p><p>My approach with Claude MPM sits somewhere in the middle: more capable than bare Claude Code, less complex than claude-flow, with a focus on agent quality and session continuity over feature breadth. 
The trade-off is that it&#8217;s opinionated about workflow&#8212;if you want raw flexibility, claude-squad might suit better.</p><h2>Putting it all together</h2><p>Here&#8217;s my actual workflow with these tools running together:</p><ol><li><p><strong><a href="https://github.com/bobmatnyc/kuzu-memory">kuzu-memory</a></strong> loads relevant context from previous sessions automatically</p></li><li><p><strong><a href="https://github.com/bobmatnyc/mcp-vector-search">mcp-vector-search</a></strong> helps Claude find the right code when exploring the codebase</p></li><li><p><strong><a href="https://github.com/bobmatnyc/mcp-ticketer">mcp-ticketer</a></strong> pulls in the current ticket&#8217;s context and acceptance criteria</p></li><li><p><strong><a href="https://github.com/bobmatnyc/mcp-skillset">mcp-skillset</a></strong> provides relevant best practices for the task at hand</p></li><li><p><strong><a href="https://github.com/bobmatnyc/claude-mpm">claude-mpm</a></strong> orchestrates specialized agents for implementation, testing, and review</p></li></ol><p>The result: sessions that run longer before I need to reset context, fewer &#8220;where is this code?&#8221; dead ends, and accumulated knowledge that actually persists between sessions. Before I built this stack, I was spending maybe 20% of my Claude Code time re-explaining context or manually finding files. That overhead dropped significantly.</p><p>These tools work independently too. Use mcp-vector-search with vanilla Claude Code. Add kuzu-memory to Cursor or Windsurf via MCP. Mix and match based on what you need.</p><p>The broader point: Claude Code is becoming infrastructure, not just a tool. The value increasingly comes from how you extend it&#8212;the orchestration layer, context management, workflow integration. If you&#8217;re getting mediocre results from vanilla Claude Code, the fix probably isn&#8217;t a better model. 
It&#8217;s better scaffolding around the model you have.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yI01!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yI01!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 424w, https://substackcdn.com/image/fetch/$s_!yI01!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 848w, https://substackcdn.com/image/fetch/$s_!yI01!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 1272w, https://substackcdn.com/image/fetch/$s_!yI01!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yI01!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png" width="1126" height="171" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:171,&quot;width&quot;:1126,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35973,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/185237435?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yI01!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 424w, https://substackcdn.com/image/fetch/$s_!yI01!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 848w, https://substackcdn.com/image/fetch/$s_!yI01!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 1272w, https://substackcdn.com/image/fetch/$s_!yI01!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44b042d-9225-4ee5-bb65-458ce9cef7ff_1126x171.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. 
For more on multi-agent approaches, read my analysis of <a href="https://hyperdev.substack.com/p/orchestration-beats-raw-power">why orchestration beats raw power</a> or my deep dive into <a href="https://hyperdev.substack.com/p/claude-mpm-5">Claude MPM 5&#8217;s architecture</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[What's In My Toolkit: Digital Ocean]]></title><description><![CDATA[The Hosting Middle Ground You Forgot Existed]]></description><link>https://hyperdev.matsuoka.com/p/whats-in-my-toolkit-digital-ocean</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/whats-in-my-toolkit-digital-ocean</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 21 Jan 2026 13:31:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HLyn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HLyn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HLyn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HLyn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!HLyn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HLyn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HLyn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg" width="1280" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:412725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/185113299?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HLyn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!HLyn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HLyn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HLyn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d88c3b-5d7a-4ef5-8d8e-26cdd5edbb97_1280x720.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>I talk about <a href="https://vercel.com/">Vercel</a> a lot. Probably too much. They&#8217;re great for what they do&#8212;Next.js deployments, edge functions, the serverless stuff. But not everything fits the serverless model. Sometimes you need a server. An actual server that runs continuously, handles persistent connections, maybe runs PHP because your client&#8217;s legacy app requires it.</p><p>That&#8217;s where I found myself last fall. Client project. Laravel application handling webhook callbacks from a payment processor. Required persistent connections and background job processing. Vercel wasn&#8217;t going to cut it.</p><h2>Where Railway and Vercel Don&#8217;t Reach</h2><p><a href="https://railway.app/">Railway</a> was my first thought. I&#8217;ve used it before, and the developer experience is genuinely good. Automatic vertical scaling, nice preview environments, all that modern stuff. But Railway is built primarily around the build-and-deploy loop: you push code, it builds and deploys. When you need a broader set of management tools and services, they&#8217;re not quite there. And the usage-based pricing gets unpredictable fast for anything running 24/7. Their $5 trial credit evaporates if you&#8217;re not careful.</p><p>Traditional cPanel hosting&#8212;<a href="https://www.hostgator.com/">HostGator</a>, <a href="https://www.bluehost.com/">Bluehost</a>, the whole gang&#8212;exists in a completely different universe. Sure, they bundle domains, email, and phone support for people running WordPress sites. But the performance is mediocre, the interfaces feel like 2008, and the moment you want to do anything slightly unusual, you&#8217;re fighting the system.</p><p><a href="https://www.digitalocean.com/">Digital Ocean</a> had been on my radar for years, but only vaguely. 
Then this Laravel project forced me to actually look.</p><p>The difference from cPanel hosting is immediate: full SSH access, real package management, actual networking controls instead of web forms that generate <code>.htaccess</code> files. You&#8217;re working with a Linux server, not a managed WordPress environment pretending to be one.</p><h2>More Than Droplets Now</h2><p>You deploy servers as droplets&#8212;KVM-based virtual machines running on their hypervisor infrastructure. <a href="https://www.digitalocean.com/pricing/droplets">Basic droplets start at $4/month</a> (as of early 2025) for 512MB RAM, 1 vCPU, 10GB SSD. Nothing fancy, but enough to run a small PHP app. The $6 tier doubles the RAM to 1GB and raises storage to 25GB. Straightforward.</p><p>But the droplet thing isn&#8217;t what impressed me. Digital Ocean has quietly built out a whole platform while I wasn&#8217;t paying attention:</p><ul><li><p><strong><a href="https://www.digitalocean.com/products/app-platform">App Platform</a></strong>: Git-based deployment starting at $5/month. Push to main, it builds and deploys. Less polished than Vercel for frontend stuff, but handles backend services Vercel can&#8217;t touch.</p></li><li><p><strong><a href="https://www.digitalocean.com/products/managed-databases">Managed databases</a></strong>: PostgreSQL, MySQL, MongoDB, Kafka, Redis-compatible Valkey. The PostgreSQL offering starts around <a href="https://docs.digitalocean.com/products/databases/postgresql/">$15/month</a> (pricing varies by region) with daily automatic backups and 7-day point-in-time recovery. Real managed services, not &#8220;here&#8217;s a VM, figure it out.&#8221;</p></li><li><p><strong><a href="https://www.digitalocean.com/products/spaces">Spaces</a></strong>: S3-compatible object storage at <a href="https://docs.digitalocean.com/products/spaces/details/pricing/">$5/month</a> including 250GB storage, 1TB transfer, and a built-in CDN. 
Compare that to AWS S3&#8217;s calculator nightmare.</p></li><li><p><strong><a href="https://docs.digitalocean.com/products/volumes/">Block storage</a></strong>: Network-attached block volumes at $0.10/GiB/month. Mount additional storage to droplets without rebuilding.</p></li><li><p><strong><a href="https://www.digitalocean.com/products/kubernetes">Kubernetes</a></strong>: Free control plane. That&#8217;s not a typo. <a href="https://www.digitalocean.com/resources/articles/kubernetes-digitalocean-vs-aws">AWS EKS charges $72/month</a> for the control plane alone before you add worker nodes.</p></li><li><p><strong><a href="https://marketplace.digitalocean.com/">Marketplace</a></strong>: One-click deployments for WordPress, Ghost, MongoDB, Redis, and dozens of others. Not as extensive as AWS&#8217;s, but it covers most common needs.</p></li></ul><p>The breadth was surprising. This isn&#8217;t just VPS hosting anymore.</p><h2>doctl and Agent-Friendly Infrastructure</h2><p>Here&#8217;s where it gets relevant for anyone doing agentic development: <a href="https://github.com/digitalocean/doctl">doctl</a> works well.</p><p>Digital Ocean&#8217;s CLI covers the full platform&#8212;droplets, databases, Kubernetes clusters, app deployments, DNS, firewalls, load balancers. Written in Go. <a href="https://github.com/digitalocean/doctl">3.4k GitHub stars</a>. Installs via Homebrew, Snap, or direct download.</p><p>More importantly, it plays well with AI coding assistants. The command structure is predictable enough that Claude Code and similar tools can figure out what you&#8217;re trying to do. <code>doctl compute droplet create</code> does what you&#8217;d expect. <code>doctl apps create --spec app.yaml</code> deploys from a declarative config. Output is human-readable text by default; add <code>--output json</code> when you (or an agent) need to parse it.</p><p>The <a href="https://github.com/marketplace/actions/github-action-for-digitalocean-doctl">GitHub Action for doctl</a> makes CI/CD integration straightforward. 
Push code, build container, deploy to Kubernetes&#8212;all scriptable, all agent-friendly.</p><p>One limitation: doctl doesn&#8217;t handle Spaces directly. You need <a href="https://s3tools.org/s3cmd">s3cmd</a> or the AWS CLI for object storage operations. Annoying but workable.</p><h2>The Pricing Reality Check</h2><p>Digital Ocean&#8217;s pricing philosophy differs fundamentally from AWS. You know what something costs before you provision it. <a href="https://www.lastweekinaws.com/blog/should-i-pick-digitalocean-or-aws-for-my-next-project/">As one AWS consultant put it</a>: &#8220;If you hire me to optimize your DigitalOcean bill, you&#8217;re effectively paying me to perform basic arithmetic. AWS surprises are on the order of &#8216;15 grand because you drastically misunderstood something.&#8217;&#8221;</p><p>That said, the pricing gets less competitive at scale. <a href="https://www.capterra.com/p/205055/DigitalOcean/pricing/">Higher-tier instances and managed databases</a> draw criticism for being expensive compared to self-hosting. Load balancers at $12/month minimum can quadruple the cost of a small droplet setup. Block storage keeps charging even when unattached&#8212;found that out the hard way.</p><p>Quick comparison for context (early 2025 pricing):</p><table><thead><tr><th>Service</th><th>Digital Ocean</th><th>AWS Equivalent</th></tr></thead><tbody><tr><td>Basic VM</td><td>$4/month</td><td>EC2 t3.nano: ~$3.80/month</td></tr><tr><td>Kubernetes control plane</td><td>Free</td><td>EKS: $72/month (control plane only)</td></tr><tr><td>Object storage (250GB)</td><td>$5/month</td><td>S3: Variable, plus egress</td></tr><tr><td>Managed PostgreSQL</td><td>~$15/month</td><td>RDS: ~$13-15/month minimum</td></tr></tbody></table><p>The Kubernetes control plane pricing alone makes Digital Ocean worth considering if you&#8217;re running containers. But note that AWS offers different operational trade-offs&#8212;more regions, more compliance certifications, more mature tooling.</p><h2>Managed Services: Where It Gets Messy</h2><p>My research turned up some concerning <a href="https://news.ycombinator.com/item?id=46596075">Hacker News threads from January 2025</a>. 
A startup founder described a late-night emergency where Digital Ocean&#8217;s managed PostgreSQL and managed Kubernetes stopped talking to each other after an infrastructure update. VPC routing broke. Cilium ARP entries went stale.</p><p>Another developer flagged that managed PostgreSQL replicas use async replication with RPO potentially exceeding 15 minutes. During upgrades that trigger failover, you can lose minutes of committed data. Not ideal for anything handling money.</p><p>The counterargument from the same thread: &#8220;We were on AWS for a while. The complexity was way higher than what our team could manage. DOKS is simpler, and this is the first major issue we&#8217;ve hit in many months.&#8221;</p><p>Managed doesn&#8217;t mean worry-free. It means trading your failure modes for the vendor&#8217;s failure modes. Whether that trade makes sense depends on your ops capacity.</p><h2>Digital Ocean&#8217;s AI Bet</h2><p>Digital Ocean launched their <a href="https://www.businesswire.com/news/home/20250122380266/en/DigitalOcean-Launches-Advanced-Generative-AI-Platform">GenAI Platform</a> in January 2025. Function calling, RAG, guardrails, multi-agent coordination. They claim support for multiple foundation models including Claude, GPT-4, Llama, Mistral, and DeepSeek&#8212;though the exact integration depth (API passthrough vs. hosted inference vs. marketplace images) varies. Their AI/ML ARR reportedly grew over 200% year-over-year in 2024.</p><p>More relevant for agentic development: they&#8217;ve announced <a href="https://www.digitalocean.com/solutions/ai-agent-builder">MCP (Model Context Protocol) integration</a>. The pitch is that AI assistants like Claude can connect directly to your Digital Ocean account for autonomous infrastructure provisioning. 
I haven&#8217;t tested this yet, so I can&#8217;t vouch for how well it actually works in practice.</p><p><a href="https://www.digitalocean.com/products/gradient/gpu-droplets">GPU Droplets</a> launched in October 2024 with NVIDIA H100 access. Announced pricing starts around $1.50/GPU/hour for reserved instances, with per-second billing reportedly coming in early 2026. Worth verifying current rates before committing.</p><p>Digital Ocean clearly sees agentic development as a growth vector. The combination of simpler infrastructure, predictable costs, CLI tooling, and MCP integration makes sense strategically&#8212;whether it translates to production-ready capabilities is still an open question for teams at the bleeding edge.</p><h2>Who This Actually Fits</h2><p>Digital Ocean makes sense for:</p><ul><li><p>Solo developers and small teams needing a deployment target for AI-generated code</p></li><li><p>Startups wanting predictable costs without AWS complexity</p></li><li><p>PHP, Python, Ruby, Go applications that don&#8217;t fit the serverless model</p></li><li><p>Teams escaping Heroku after the free tier changes</p></li><li><p>Anyone running Kubernetes who doesn&#8217;t want to pay $72/month for a control plane</p></li><li><p>Projects that need object storage, managed databases, AND compute in one place</p></li></ul><p>Digital Ocean doesn&#8217;t make sense for:</p><ul><li><p>Windows-based deployments (not supported)</p></li><li><p>Enterprise compliance requirements beyond HIPAA</p></li><li><p>Complex multi-region architectures needing 50+ global regions</p></li><li><p>Teams requiring 24/7 phone support</p></li></ul><h2>What I Shipped</h2><p>The PHP project that sent me here? Running on a droplet with managed MySQL. Total monthly cost around $22. The setup wasn&#8217;t completely smooth&#8212;spent about an hour fighting with UFW firewall rules that kept blocking the database connection even after I&#8217;d supposedly allowed the right ports. 
Turned out I needed to allow the VPC subnet, not just the specific IP. The docs mentioned this but buried it three pages deep.</p><p>No surprise bills though. The client&#8217;s happy. Sometimes that&#8217;s enough.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. For more on deployment options and developer tooling, check out my analysis of <a href="https://hyperdev.substack.com/p/whats-in-my-toolkit-august-2025">What&#8217;s In My Toolkit - August 2025</a> or my deep dive into <a href="https://hyperdev.substack.com/p/multi-agent-ai-orchestration-in-practice">multi-agent orchestration patterns</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[The Irreducibles: What a Pattern Master Does ]]></title><description><![CDATA[When AI Writes the Code]]></description><link>https://hyperdev.matsuoka.com/p/the-irreducibles-what-a-pattern-master</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-irreducibles-what-a-pattern-master</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 14 Jan 2026 11:31:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kCeH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kCeH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!kCeH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kCeH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kCeH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kCeH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kCeH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg" width="1104" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1104,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:307885,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/184253714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kCeH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kCeH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kCeH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kCeH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3db9592-d40c-4f88-88da-bca6b1159161_1104x832.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">The Pattern Master</figcaption></figure></div><p>The real work was never about writing code.</p><p>That statement would have been controversial three years ago. Today, with the <a href="https://survey.stackoverflow.co/2025/">Stack Overflow 2025 Developer Survey</a> showing 65% of developers using AI tools weekly and my own projects showing 6-10x productivity gains on greenfield implementation tasks, it&#8217;s becoming harder to dispute. The interesting question isn&#8217;t whether AI changes software engineering&#8212;it&#8217;s <em>what remains</em> when implementation gets automated.</p><p>Here&#8217;s my new working theory, built from recent research and hands-on experience orchestrating AI agents across complex projects: senior engineering is converging toward a role that looks more like a subject matter expert plus systems architect than a traditional developer. The code-writing layer is becoming infrastructure&#8212;important, but increasingly invisible. What emerges is something both familiar and radically different.</p><p>I wrote recently about <a href="https://open.substack.com/pub/hyperdev/p/dont-be-a-canut-be-a-pattern-master?r=nff5&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=true">the Jacquard loom lesson</a>&#8212;how the Canuts who fought automation lost, while pattern masters who designed the punch cards thrived. The same dynamic is playing out now. The question isn&#8217;t whether to use AI tools.
It&#8217;s whether you become the pattern master who designs what they execute.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kCjd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kCjd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kCjd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kCjd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kCjd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kCjd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg" width="1168" height="880" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:251015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/184253714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kCjd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kCjd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kCjd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kCjd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539f012-db3d-4dc9-8002-73fe07229bdf_1168x880.jpeg 1456w" sizes="100vw"></picture>
</div></a></figure></div><h2>Where the Value Actually Sits</h2><p>The most revealing data point isn&#8217;t about whether AI tools work&#8212;it&#8217;s about <em>who</em> they work for.</p><p>The <a href="https://www.faros.ai/blog/ai-software-engineering">Faros AI Productivity Paradox Report</a> (July 2025) analyzed data across thousands of developers and found something telling: &#8220;Adoption skews toward less tenured engineers. Usage is highest among engineers who are newer to the company... lower adoption among senior engineers may signal skepticism about AI&#8217;s ability to support more complex tasks that depend on deep system knowledge and organizational context.&#8221;</p><p>That&#8217;s not a failure of AI tools&#8212;it&#8217;s a signal about where constraints actually exist.
If your bottleneck is navigating unfamiliar code and accelerating early contributions, AI helps enormously. If your bottleneck is the &#8220;deep system knowledge and organizational context&#8221; that seniors carry, code generation speed is irrelevant.</p><p>A <a href="https://theaiinsider.tech/2025/11/17/study-ai-agents-are-quietly-delivering-the-productivity-gains-the-hype-cycle-forgot/">University of Chicago working paper</a> (November 2025) found something even more interesting: each additional standard deviation of work experience made developers 5-6% <em>more</em> likely to use AI agents successfully. Why? More experienced developers used &#8220;plan-first&#8221; approaches, &#8220;laying out objectives, alternatives, and steps&#8221; before invoking AI. Juniors did this far less frequently. The paper concludes: &#8220;expertise improves the ability to delegate to AI.&#8221;</p><p>Here&#8217;s the thing: senior engineers already spend most of their time on non-coding work. Jue Wang, a partner at Bain, <a href="https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/">told MIT Technology Review</a> in December that &#8220;developers spend only 20% to 40% of their time coding.&#8221; The rest goes to analyzing problems, customer feedback, product strategy, and administrative tasks.</p><p>AI doesn&#8217;t change what senior engineering is. It reveals what it always was.</p><h2>A Case Study in What Actually Happened</h2><p>I recently completed a project that illustrates where the human work actually sits. Building a semantic search knowledge base for a travel agency client, I tracked every commit across 9 calendar days: 120 total commits, roughly 90% Claude-assisted.</p><p>The productivity numbers look impressive: a <strong>6-10x multiplier</strong> compared to my baseline velocity on similar projects.
By my estimate, work that would have taken 150-200 billable hours at my consulting rate came down to about $100-200 in API tokens plus 50-70 hours of wall-clock time, most of which wasn&#8217;t coding.</p><p>But here&#8217;s what&#8217;s interesting about the 12 human-only commits (10% of total):</p><ul><li><p>Configuration tweaks requiring domain knowledge (model selection for specific use cases)</p></li><li><p>Debug logging (quick diagnostics when something felt wrong)</p></li><li><p>Release management</p></li><li><p>One research document on Slack architecture options</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G-00!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G-00!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G-00!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G-00!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G-00!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!G-00!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg" width="1168" height="880" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:314841,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/184253714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G-00!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G-00!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G-00!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G-00!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7c8be5d-a767-4462-99b0-632cfbd5c8c4_1168x880.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The human contributions weren&#8217;t about implementation&#8212;they were about <em>judgment</em>. Choosing the right model for email writing versus general queries. Knowing when the AI&#8217;s suggestion would create problems downstream. Understanding the client&#8217;s actual workflows well enough to structure the system appropriately.</p><p>What surprised me: the time savings didn&#8217;t come from faster typing.
They came from eliminating the iteration cycles between &#8220;write code&#8221; and &#8220;realize it doesn&#8217;t fit the requirements.&#8221; Specifying clearly upfront meant fewer rewrites&#8212;but that specification work was irreducibly human.</p><h2>The Specification-Driven Development Thesis</h2><p>The pattern I&#8217;m seeing aligns with what several industry voices are now articulating.</p><p><a href="https://addyosmani.com/blog/ai-in-software-engineering/">Addy Osmani at Google</a> argues developers are evolving from &#8220;coders&#8221; to &#8220;conductors&#8221; orchestrating AI agents. <a href="https://tidyfirst.substack.com/">Kent Beck</a>&#8212;one of the Agile Manifesto authors&#8212;suggests &#8220;augmented coding&#8221; deprecates language expertise while amplifying vision, strategy, and task breakdown. <a href="https://twitter.com/sgrove">Sean Grove from OpenAI</a> put it directly: &#8220;The person who communicates the best will be the most valuable programmer.&#8221;</p><p>Specification-Driven Development is now on the <a href="https://www.thoughtworks.com/radar">ThoughtWorks Technology Radar</a>. Tools like AWS Kiro, GitHub Spec-Kit, and Tessl are building products around this premise. Andreessen Horowitz frames this as &#8220;the largest revolution in software development since its inception&#8221;&#8212;venture rhetoric, obviously, but the underlying bet (prompts as source code, specifications as maintained artifacts) is getting serious investment.</p><p>Worth noting what <em>doesn&#8217;t</em> work as smoothly: legacy codebases with decades of undocumented business logic, highly regulated environments where audit trails matter, and anything requiring coordination across organizational boundaries. The pattern I&#8217;m describing fits greenfield projects and well-documented systems better than the brownfield reality most enterprises face.</p><p>This isn&#8217;t abstract. 
Builder.io has defined an &#8220;Orchestrator&#8221; workflow: <strong>spec &#8594; onboard &#8594; direct &#8594; verify &#8594; integrate</strong>. That sequence describes what I actually did on the knowledge base project. The agents handled implementation; the orchestration required human judgment throughout.</p><h2>What AI Actually Can&#8217;t Do (Yet)</h2><p>The <a href="https://www.qodo.ai/blog/state-of-ai-coding-2025/">Qodo 2025 State of AI Coding survey</a> found only <strong>3.8% of developers</strong> report both low hallucination rates and high confidence shipping AI code without review, and 65% cite missing context as the primary barrier.</p><p>That &#8220;missing context&#8221; is the key. Consider what the AI couldn&#8217;t know during my knowledge base project:</p><ul><li><p><strong>Business model specifics</strong>: How the travel agency&#8217;s supplier relationships actually work. Which data matters for their specific service model. Why certain integrations were higher priority than others.</p></li><li><p><strong>Organizational constraints</strong>: Budget limitations. Timeline pressures from a specific upcoming sales season. The technical capabilities of the staff who would maintain the system.</p></li><li><p><strong>Historical context</strong>: Why previous approaches to similar problems hadn&#8217;t worked. What the client had tried before and rejected. Political dynamics around system adoption.</p></li></ul><p>None of this lives on the public web. It exists in Jira tickets, PowerPoint decks, Slack conversations, and institutional memory. The <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">METR study</a> (July 2025) specifically identified &#8220;tools&#8217; lack of vital tacit context or knowledge&#8221; as a key factor in why experienced open-source developers took 19% longer to complete tasks with AI assistance.
Ryan Salva, senior director of product management at Google, <a href="https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/">told MIT Technology Review</a>: &#8220;A lot of work needs to be done to help build up context and get the tribal knowledge out of our heads.&#8221;</p><p>The <a href="https://spider2.yale.edu/">Spider 2.0 benchmark</a> confirms this: model scores drop significantly on real enterprise text-to-SQL workflows compared to clean academic datasets. Real systems have messy schemas, undocumented business rules, and constraints that only make sense if you understand why they exist.</p><h2>The Security and Quality Problem Compounds This</h2><p>Here&#8217;s a less-discussed limitation: <a href="https://www.veracode.com/resources/analyst-reports/2025-genai-code-security-report/">Veracode&#8217;s 2025 GenAI Code Security Report</a>, which tested over 100 LLMs across 80 controlled coding tasks, found that <strong>45% of AI-generated code samples introduced security vulnerabilities</strong> in its experimental setup. Java was worst, with a 72% security failure rate. The controlled environment matters&#8212;real-world results vary with prompting quality and review practices&#8212;but the underlying issue is real: security requires understanding threat models, compliance requirements, and risk tolerances that vary by organization.</p><p>Miguel Grinberg, a 30-year development veteran, <a href="https://blog.miguelgrinberg.com/post/ai-assisted-programming-where-we-are-now">observed</a> that code review takes as long as writing code when AI is doing the implementation.
More importantly, there&#8217;s an accountability dimension: &#8220;AI won&#8217;t assume liability if code malfunctions.&#8221; Someone has to own the outcome, and that ownership requires understanding what the system is supposed to do.</p><h2>The Multi-Agent Future Amplifies This Pattern</h2><p>The trajectory I&#8217;m betting on: moving from single-agent assistance toward multi-agent orchestration. That shift would concentrate human work even further up the stack&#8212;if the infrastructure materializes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bRxz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bRxz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bRxz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bRxz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bRxz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bRxz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg" width="1168" height="880" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214417,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/184253714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bRxz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bRxz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bRxz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bRxz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ba83b7-ede1-48a5-b756-4f25cb86c246_1168x880.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><a href="https://www.ibm.com/topics/ai-orchestration">IBM&#8217;s research on AI orchestration</a> describes the emerging pattern: multiple agents with specific expertise working in tandem under orchestrator uber-models.
<a href="https://google.github.io/adk-docs/">Google&#8217;s Agent Development Kit (ADK)</a> is a framework for &#8220;multi-agent by design&#8221; systems&#8212;modular, scalable applications composed of specialized agents in hierarchy.</p><p>The <a href="https://google.github.io/a2a/">A2A Protocol</a> (introduced by Google with over 50 launch partners, including Atlassian, Salesforce, and SAP, and later donated to the Linux Foundation) enables agent-to-agent communication. Combined with <a href="https://modelcontextprotocol.io/">Anthropic&#8217;s MCP</a> for agent-to-tool connections, we&#8217;re building infrastructure for systems talking to systems at scale.</p><p>Early pilot results are dramatic but need context. <a href="https://www.researchgate.net/publication/389204314_The_Role_of_AI_Agents_in_CRM_and_ERP_Integration_An_Analysis">Research on enterprise AI agents</a> shows Generative Business Process AI Agents (GBPAs) achieving a <strong>40% reduction in processing time</strong> and a <strong>94% drop in error rate</strong> on financial workflows in controlled environments. Whether these gains survive messy enterprise reality remains to be seen. Still, they raise an obvious question: when agents can autonomously analyze supplier performance, renegotiate terms, and execute approvals, what&#8217;s left for humans?</p><p><strong>Domain expertise and strategic oversight.</strong> Multi-agent systems handle coordination; humans provide the context those systems can&#8217;t access and the judgment calls that require understanding organizational stakes.</p><h2>The Role Transformation Is Already Happening</h2><p>New job titles are emerging. <a href="https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai">McKinsey&#8217;s research</a> predicts the labor pyramid shifting toward senior engineers for complex architecture and code review.
Organizations are defining roles like:</p><ul><li><p><strong>AI Software Architect</strong>: Requires context engineering and specification-driven development skills</p></li><li><p><strong>Agent Review Engineer</strong>: Owns specifications, checks for hallucinations, and ensures agent outputs align with business requirements</p></li></ul><p>Coinbase&#8217;s Head of Platform, Rob Witoff, <a href="https://www.technologyreview.com/2025/12/15/1128352/rise-of-ai-coding-developers-2026/">told MIT Technology Review</a> in December 2025 that while they&#8217;ve seen massive productivity gains in some areas, &#8220;the sheer volume of code now being churned out is quickly saturating the ability of midlevel staff to review changes.&#8221; The pressure is moving upward, not distributing evenly.</p><p><a href="https://newsletter.pragmaticengineer.com/">Gergely Orosz&#8217;s analysis</a> identifies what&#8217;s becoming <em>more</em> valuable: tech lead traits, product-mindedness, and solid engineering judgment (not just coding). What&#8217;s declining: pure prototyping skills and polyglot language expertise. The differentiator isn&#8217;t knowing syntax&#8212;it&#8217;s knowing why.</p><h2>Educational Institutions Are Responding</h2><p>The signals from education are telling:</p><ul><li><p><strong>Harvard</strong> launched COMPSCI 1060: &#8220;Software Engineering with Generative AI&#8221; (Spring 2025)</p></li><li><p><strong>Stanford</strong> introduced &#8220;Vibe Coding: Building Software in Conversation with AI&#8221;</p></li><li><p><strong>UC San Diego consortium</strong> (Google.org funded) created six turnkey courses integrating AI, with Leo Porter identifying problem decomposition as the new priority for introductory classes</p></li></ul><p>Hack Reactor now introduces Copilot only <em>after</em> students are proficient without it, on the view that understanding fundamentals matters more once AI handles syntax. 
The Raspberry Pi Foundation argues that learning to code provides &#8220;computational literacy&#8221; and agency regardless of AI capabilities.</p><p>There&#8217;s genuine tension here. Studies show students perform better with Copilot in the short term, but concerns about &#8220;cognitive laziness&#8221; and long-term skill development are mounting. The question isn&#8217;t whether to use AI tools&#8212;it&#8217;s how to develop the judgment that makes AI tools useful.</p><h2>What the Pattern Master of 2028 Looks Like</h2><p>What surprised me after running the knowledge base project wasn&#8217;t the productivity gain&#8212;it was how different the work felt. I wasn&#8217;t engineering in the traditional sense. I was doing something more like:</p><p><strong>Subject matter expert for technical domains</strong> who can translate ambiguous business requirements into precise specifications AI can execute.</p><p><strong>Orchestrator of agent teams</strong> who understands which specialized capabilities to deploy against which problems, and how to coordinate multi-agent workflows.</p><p><strong>Context bridge</strong> who identifies when agents miss critical organizational knowledge&#8212;the meeting that changed priorities, the constraint that exists for regulatory reasons, the technical debt that can&#8217;t be addressed yet.</p><p><strong>Accountability owner</strong> who takes responsibility for outcomes in ways that AI cannot, making judgment calls that require understanding stakes and tradeoffs.</p><p><strong>Systems coherence maintainer</strong> who ensures that agent-driven development produces architectures that remain understandable, maintainable, and aligned with long-term organizational needs.</p><p>None of this is entirely new. Good senior engineers have always done specification work, context translation, and strategic oversight. 
What changes is the ratio&#8212;these become the <em>primary</em> activities rather than overhead between coding sessions.</p><h2>The Pragmatic Implications</h2><p>If this analysis is directionally correct, several implications follow:</p><p><strong>For individual practitioners</strong>: Invest in domain expertise alongside technical skills. Understanding your industry, your organization&#8217;s constraints, and your stakeholders&#8217; real needs becomes the differentiator. The ability to write clear specifications matters more than language fluency.</p><p><strong>For engineering managers</strong>: Rethink how you evaluate senior contributions. Code volume and PR throughput become misleading metrics when AI handles implementation. Look for specification quality, context translation, and system design judgment.</p><p><strong>For organizations</strong>: The constraint isn&#8217;t AI capability&#8212;it&#8217;s organizational readiness to provide the context AI needs. Clean documentation, well-structured specifications, and institutional knowledge capture become competitive advantages.</p><p><strong>For education and training</strong>: Problem decomposition, requirements analysis, and domain modeling deserve more emphasis. Syntax and language features deserve less. Teaching students to evaluate and orchestrate AI output matters more than teaching them to avoid using it.</p><h2>The Bottom Line</h2><p>AI automates the implementation layer. Multi-agent systems are beginning to automate the coordination layer. What remains is the judgment layer&#8212;the work that requires understanding business context, organizational constraints, and strategic tradeoffs that exist outside any training dataset.</p><p>That work was always the actual job. We just called it &#8220;senior engineering&#8221; and measured it poorly because code output was easier to count.</p><p>The pattern master of 2028 won&#8217;t write 10x more code. 
They&#8217;ll translate ambiguous requirements into specifications that agent teams can execute reliably. They&#8217;ll identify when AI outputs miss critical context. They&#8217;ll maintain system coherence across increasingly automated development workflows.</p><p>The code-writing skill doesn&#8217;t become worthless&#8212;it becomes infrastructure, like understanding TCP/IP or knowing how compilers work. Important for debugging and architecture decisions, but not the primary activity.</p><p>This pattern may be emerging fastest in startups and well-resourced enterprise teams with clean codebases and modern tooling. Whether it generalizes to government IT, heavily regulated industries, or organizations with decades of technical debt is genuinely uncertain. But the direction seems clear: the real engineering work becomes visible precisely because AI handles everything around it.</p><div><hr></div><p><em>Research sources include the Faros AI Productivity Paradox Report (July 2025), University of Chicago Booth working paper on AI agent productivity (November 2025), METR&#8217;s randomized controlled trial on experienced developer productivity (July 2025), Veracode&#8217;s GenAI Code Security Report (July 2025), MIT Technology Review&#8217;s developer survey (December 2025), and industry analysis from ThoughtWorks, Qodo, and Builder.io. 
Case study data comes from actual project accounting across 120 commits over 9 calendar days.</em></p>]]></content:encoded></item><item><title><![CDATA[Orchestration Beats Raw Power]]></title><description><![CDATA[And SWE-bench Can&#8217;t Tell the Difference]]></description><link>https://hyperdev.matsuoka.com/p/orchestration-beats-raw-power</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/orchestration-beats-raw-power</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 25 Dec 2025 13:30:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iGNf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>TL;DR</h2><ul><li><p>Same Opus 4.5 model: 96.2% (MPM orchestrated) vs 91.4% (vanilla Claude Code). Orchestration adds 4.8 percentage points.</p></li><li><p>Gemini 3 scores 76.2% on SWE-bench Verified. It scored 44.3% in my testing. Critical bugs in every implementation.</p></li><li><p>Claude MPM <em>beat</em> its model&#8217;s benchmark by 15 points. Gemini <em>collapsed</em> 32 points below its own.</p></li><li><p>Three competing AI systems&#8212;including Gemini&#8212;unanimously ranked Claude MPM first. Gemini rated its own code &#8220;needs work.&#8221;</p></li><li><p>MPM was the slowest system (127 seconds). It also produced the best code. Quality costs time.</p></li><li><p>The 0.9-point gap between GPT-5.2 and Opus 4.5 on SWE-bench? Noise. The 52-point spread between the best and worst systems in practice? 
Reality.</p></li></ul><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iGNf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iGNf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!iGNf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!iGNf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!iGNf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iGNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2632732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182397937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iGNf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!iGNf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!iGNf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!iGNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bbd5ff2-a2fc-4cca-9f7a-4e65db015347_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2>The leaderboard says they&#8217;re equivalent</h2><p>GPT-5.2 hits 80.0% on SWE-bench Verified. Claude Opus 4.5 sits at 80.9%. Gemini 3 Pro comes in at 76.2%.</p><p>Look at those numbers and you&#8217;d conclude the top models have converged. Pick based on price. Pick based on vibes. The capability gap has closed.</p><p>I ran a different test.</p><p>Three coding tasks&#8212;FizzBuzz, LRU cache, async rate limiter&#8212;across five systems. Same prompts. Independent sessions. No hand-holding. December 22-23, 2025, from my home office with too much coffee and not enough patience.</p><p>The results didn&#8217;t match the leaderboard. 
Not even close.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iXA0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iXA0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 424w, https://substackcdn.com/image/fetch/$s_!iXA0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 848w, https://substackcdn.com/image/fetch/$s_!iXA0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 1272w, https://substackcdn.com/image/fetch/$s_!iXA0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iXA0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png" width="1059" height="370" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:370,&quot;width&quot;:1059,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60616,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182397937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iXA0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 424w, https://substackcdn.com/image/fetch/$s_!iXA0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 848w, https://substackcdn.com/image/fetch/$s_!iXA0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 1272w, https://substackcdn.com/image/fetch/$s_!iXA0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5040204-aad6-4b4e-a872-4206f96e9eae_1059x370.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>SWE-bench can&#8217;t see this. The leaderboard shows a 4.7-point spread between these models. Reality delivered a 52-point spread.</p><h2>The orchestration advantage</h2><p>Three of five systems in my test run Claude Opus 4.5. Same underlying model. Same training. 
Same benchmark score.</p><p>Different results:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0I9a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0I9a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 424w, https://substackcdn.com/image/fetch/$s_!0I9a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 848w, https://substackcdn.com/image/fetch/$s_!0I9a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 1272w, https://substackcdn.com/image/fetch/$s_!0I9a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0I9a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png" width="1063" height="151" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:151,&quot;width&quot;:1063,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24876,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182397937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0I9a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 424w, https://substackcdn.com/image/fetch/$s_!0I9a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 848w, https://substackcdn.com/image/fetch/$s_!0I9a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 1272w, https://substackcdn.com/image/fetch/$s_!0I9a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9aca1e5-3870-48ad-8eb0-9b7035c96813_1063x151.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The 4.8-point gap between MPM and vanilla Claude Code comes entirely from orchestration. Research agents gathering context. Code analyzers verifying output. 
Structured prompts with acceptance criteria. The model doesn&#8217;t change. The infrastructure around it does.</p><p>MPM achieved two perfect 70/70 scores on the medium and hard tests. Zero bugs across all implementations. Comprehensive documentation on every file.</p><p>Here&#8217;s what that infrastructure costs: time.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MIJA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MIJA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 424w, https://substackcdn.com/image/fetch/$s_!MIJA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 848w, https://substackcdn.com/image/fetch/$s_!MIJA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 1272w, https://substackcdn.com/image/fetch/$s_!MIJA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MIJA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png" width="1062" height="136" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:136,&quot;width&quot;:1062,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35847,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182397937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MIJA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 424w, https://substackcdn.com/image/fetch/$s_!MIJA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 848w, https://substackcdn.com/image/fetch/$s_!MIJA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 1272w, https://substackcdn.com/image/fetch/$s_!MIJA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08509ee2-5502-4045-a8b8-43c57014979a_1062x136.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>MPM was the slowest system. By a lot. The rate limiter alone took 82 seconds&#8212;research, analysis, implementation, verification. 
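</p><p>For context, here is roughly what the rate-limiter task asks for. This is a minimal token-bucket sketch, not MPM&#8217;s or Gemini&#8217;s actual output. The <code>TokenBucket</code> class and its <code>rate</code> and <code>capacity</code> parameters are assumptions for illustration.</p><p>The <code>asyncio.Lock</code> matters because <code>acquire()</code> awaits in the middle of its check, wait, and consume sequence. Consider an unlocked version that sleeps and then spends a token without re-checking. Several coroutines can wake up and all spend the same refill. This sketch holds the lock and also re-checks the bucket after every sleep.</p>

```python
import asyncio
import time

class TokenBucket:
    """Hypothetical async token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self._tokens = float(capacity)
        self._updated = time.monotonic()
        self._lock = asyncio.Lock()   # serializes check, wait, and consume across coroutines

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = time.monotonic()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now

    async def acquire(self) -> None:
        # Holding the lock across the sleep makes waiters take turns
        # instead of all waking up for the same refilled token.
        async with self._lock:
            self._refill()
            while self._tokens < 1:
                await asyncio.sleep((1 - self._tokens) / self.rate)
                self._refill()
            self._tokens -= 1

async def main() -> float:
    bucket = TokenBucket(rate=10, capacity=2)
    start = time.monotonic()
    # 5 concurrent requests: 2 pass immediately, 3 wait about 0.1 s each.
    await asyncio.gather(*(bucket.acquire() for _ in range(5)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"5 acquisitions took {elapsed:.2f}s")
```

<p>Holding the lock across the sleep is a deliberate simplification that makes waiters take turns. Note that <code>asyncio.Lock</code> coordinates coroutines on a single event loop. Code shared across OS threads would need <code>threading.Lock</code> or a similar primitive instead.</p><p>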
Those 82 seconds produced a perfect score with thread-safe async handling and comprehensive docstrings.</p><p>Gemini finished the same task in 15 seconds. And shipped a race condition.</p><p>Speed isn&#8217;t the metric.</p><h2>The Gemini collapse</h2><p>Gemini 3 scores 76.2% on SWE-bench Verified. Google&#8217;s marketing calls it &#8220;the best vibe coding and agentic coding model we&#8217;ve ever built.&#8221;</p><p>In my testing: 44.3%. Critical bugs in every single implementation.</p><p><strong>FizzBuzz</strong> (the simple test): Gemini&#8217;s code prints to stdout instead of returning a list. Wrong interface entirely. Any code calling <code>fizzbuzz(15)</code> expecting a list gets <code>None</code>.</p><p><strong>LRU Cache</strong> (medium complexity): Returns <code>-1</code> for missing keys instead of <code>None</code>. Non-Pythonic. Breaks any code doing truthiness checks on the result: <code>-1</code> is truthy, so a cache miss reads as a hit.</p><p><strong>Rate Limiter</strong> (async challenge): Missing <code>asyncio.Lock</code>. Under concurrent load, the token bucket&#8217;s state can be corrupted. Race condition waiting to happen in production.</p><p>Three implementations. Three fundamental errors. From a model that benchmarks at 76%.</p><p>The gap between benchmark performance and practical output: 32 points. That&#8217;s not measurement noise. That&#8217;s a different capability tier.</p><h2>When competitors agree, the data speaks</h2><p>I had three AI systems independently review all 15 implementations: Gemini 3, Auggie (Opus 4.5 via Augment), and Codex (GPT-5.2). Gemini&#8217;s review covered its own implementations too.</p><p>They agreed. 
Unanimously.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N2X6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N2X6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 424w, https://substackcdn.com/image/fetch/$s_!N2X6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 848w, https://substackcdn.com/image/fetch/$s_!N2X6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 1272w, https://substackcdn.com/image/fetch/$s_!N2X6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N2X6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png" width="1050" height="325" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb645a2e-d072-4564-9b74-fa5504746735_1050x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:1050,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46533,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182397937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N2X6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 424w, https://substackcdn.com/image/fetch/$s_!N2X6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 848w, https://substackcdn.com/image/fetch/$s_!N2X6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 1272w, https://substackcdn.com/image/fetch/$s_!N2X6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb645a2e-d072-4564-9b74-fa5504746735_1050x325.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><blockquote><p>Gemini rated its own code &#8220;needs work.&#8221; Auggie flagged Gemini&#8217;s output as production-unsafe. 
Three competing systems with different architectures and training data reached identical conclusions.</p></blockquote><p>When the worst performer admits it&#8217;s the worst performer, you&#8217;ve got objective data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KOY8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KOY8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 424w, https://substackcdn.com/image/fetch/$s_!KOY8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 848w, https://substackcdn.com/image/fetch/$s_!KOY8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 1272w, https://substackcdn.com/image/fetch/$s_!KOY8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KOY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png" width="1180" height="593" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e894674d-8421-4d40-85af-e8352ea905fb_1180x593.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:593,&quot;width&quot;:1180,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131473,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182397937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KOY8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 424w, https://substackcdn.com/image/fetch/$s_!KOY8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 848w, https://substackcdn.com/image/fetch/$s_!KOY8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 1272w, https://substackcdn.com/image/fetch/$s_!KOY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe894674d-8421-4d40-85af-e8352ea905fb_1180x593.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The solution leakage problem</h2><p>Even SWE-bench&#8217;s narrow measurement has issues. One recent analysis found that 32.67% of &#8220;successful&#8221; patches involve solution leakage: the fix was already spelled out in the issue report or its comments. Another 31.08% were flagged as suspicious because the tests were too weak to confirm the patch was actually correct.</p><p>That&#8217;s roughly 64% of &#8220;successful&#8221; results potentially compromised.</p><p>Both GPT-5.2 and Opus 4.5 collapse to 15-18% accuracy on private codebases they&#8217;ve never seen. The benchmark performance doesn&#8217;t transfer. 
The models learned the test, not the skill.</p><h2>What this means for tool selection</h2><p>If you&#8217;re picking AI coding tools based on SWE-bench scores, you&#8217;re optimizing the wrong variable.</p><table><thead><tr><th>Use case</th><th>Recommendation</th><th>Why</th></tr></thead><tbody><tr><td>Production code</td><td>Claude MPM</td><td>Highest quality (96.2%), comprehensive docs, zero bugs</td></tr><tr><td>Fast iteration</td><td>Claude Code Vanilla</td><td>Best speed-to-quality ratio (35s, 91.4%)</td></tr><tr><td>Documentation-first</td><td>Auggie</td><td>Excellent docstrings, educational examples</td></tr><tr><td>Type-safe prototypes</td><td>Codex</td><td>Strong type hints, minimal but correct</td></tr><tr><td>Any serious work</td><td>Not Gemini</td><td>Critical bugs in all test implementations</td></tr></tbody></table><p>The 127 seconds MPM takes isn&#8217;t wasted. It&#8217;s an investment in code you won&#8217;t debug at 2 AM.</p><div><hr></div><h2>The real benchmark</h2><p>Benchmarks predict neither ceiling nor floor. Good orchestration exceeds expectations. Bad implementation collapses below them. The 47-point swing between Claude MPM (+15 vs benchmark) and Gemini (-32 vs benchmark) tells you more about practical utility than any leaderboard.</p><p>The models have converged on paper. The tools haven&#8217;t converged in practice.</p><p>Orchestration beats raw power. SWE-bench can&#8217;t tell the difference.</p><div><hr></div><p><em>I tested Claude MPM, Claude Code Vanilla, Augment Code, OpenAI Codex, and Gemini CLI on three Python tasks over December 22-23, 2025. The full evaluation methodology and scoring rubric are available on request. All implementations were independently reviewed by three AI systems for consensus validation.</em></p><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. 
For more on multi-agent orchestration, read my analysis of <a href="https://hyperdev.substack.com/">claude-flow</a> or my deep dive into <a href="https://hyperdev.substack.com/">the token economics of AI development</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[The Tired Toddler Problem]]></title><description><![CDATA[Claude isn&#8217;t getting dumber. It may be getting conveniently lazier.]]></description><link>https://hyperdev.matsuoka.com/p/the-tired-toddler-problem</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-tired-toddler-problem</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 24 Dec 2025 13:30:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e_18!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e_18!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e_18!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!e_18!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!e_18!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!e_18!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e_18!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2453420,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182389360?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e_18!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!e_18!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!e_18!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!e_18!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29bf3c-7281-4c13-bbda-45a8f89b0a5f_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>After I published <a href="https://hyperdev.matsuoka.com/p/when-claude-forgets-how-to-code">When Claude Forgets How to Code</a>, several readers pointed out something I&#8217;d missed. The quality drops weren&#8217;t just about wrong answers or hallucinated packages. There&#8217;s a subtler pattern: Claude stopping before the job is done.</p><p>One reader nailed it: &#8220;It feels more like dragging a tired toddler through a supermarket.&#8221;</p><p>The specific example that caught my attention: Claude had built an entire test infrastructure, written the test files, updated configs, committed everything, created a PR... but never actually ran the tests. When asked to guess what it forgot, Claude immediately answered: &#8220;Run the tests.&#8221;</p><p>It knew. It just didn&#8217;t do it.</p><p>&#8220;This is an example of my feeling of regression in quality,&#8221; the reader wrote. &#8220;It was exactly these kinds of &#8216;thoughtful&#8217; or &#8216;thorough&#8217; things that Opus and even Sonnet 4.5 seemed to be doing until the past few days.&#8221;</p><h2>The Pattern Is Documented</h2><p>Turns out this isn&#8217;t isolated. <a href="https://github.com/anthropics/claude-code/issues/6159">GitHub issue #6159</a>, titled &#8220;Agent Reliability: Claude Stops Mid-Task and Fails to Complete Its Own Plan/Todo List,&#8221; captures it precisely:</p><blockquote><p>&#8220;When given a complex, multi-step task, Claude Code correctly generates a detailed plan, creates a TodoWrite list to track its progress, but then prematurely stops after completing only a portion of the plan. It provides a summary as if the entire task is complete.&#8221;</p></blockquote><p><a href="https://github.com/anthropics/claude-code/issues/1632">Issue #1632</a> got 11+ reactions. 
It describes Claude &#8220;forgetting it has unfinished TODOs&#8221; until users say &#8220;Don&#8217;t forget to... keep going with all your other instructions.&#8221; Then Claude responds: &#8220;You&#8217;re right! Let me continue...&#8221;</p><p>The most damning complaint comes from <a href="https://github.com/anthropics/claude-code/issues/668">issue #668</a>:</p><blockquote><p>&#8220;A ballpark estimate is that 1/2 of my token use is either in asking Claude to re-write code because the first attempt was not correct or in asking Claude to check itself against standards and guidelines. Claude Code has enormous potential&#8212;but it is currently akin to a senior developer with the attention span of a three-year-old.&#8221;</p></blockquote><p>Half their tokens. On corrections and reminders.</p><h2>Tests Written, Never Run</h2><p><a href="https://github.com/anthropics/claude-code/issues/2453">Issue #2453</a> hits the exact pattern my reader described:</p><blockquote><p>&#8220;The advantage of agentic coding was supposed to be exactly that&#8212;to test the code it writes before &#8216;declaring victory&#8217;. Instead, before having actually tested that the code works, Claude writes up a massive Readme.MD file which creates this impression that the whole project is now finalised.&#8221; (sic)</p></blockquote><p>The same issue caught Claude admitting it had declared victory early: &#8220;I marked the validation task as &#8216;completed&#8217; but I actually didn&#8217;t test whether the outputs match&#8212;I only verified that both implementations run without errors.&#8221;</p><p><a href="https://github.com/anthropics/claude-code/issues/2969">Issue #2969</a> documents an even worse version: &#8220;100% of the time, claude ignores reports of tests failing or blocked, progresses through the workflow, but never stops to fix bugs. 
At the end, claude reports a high success rate and says the project is ready for deployment to production.&#8221;</p><h2>December Continues the Pattern</h2><p>Fresh this month: <a href="https://github.com/anthropics/claude-code/issues/13306">issue #13306</a>, opened December 7:</p><blockquote><p>&#8220;Claude Opus 4.5 does not strictly follow instructions in CLAUDE.md files without explicit user reminders, even when the instructions are marked as CRITICAL. Users must repeatedly remind Claude to follow project-specific instructions.&#8221;</p></blockquote><p>The complaint concludes: &#8220;Defeats the purpose of CLAUDE.md as a way to encode persistent project rules.&#8221;</p><p>Someone created <a href="https://github.com/bogdansolga/claude-code-summer-2025-erratic-behavior">an entire repository</a> just to track these behavioral regressions&#8212;described as &#8220;a response to the anti-Whac-A-Mole movement against the constant closing of reported issues by the Anthropic team.&#8221;</p><h2>The Throttling Question</h2><p>Here&#8217;s the uncomfortable thought: reduced proactivity burns fewer tokens.</p><p>If Claude stops after step 3 of 7 and waits for you to prompt &#8220;continue,&#8221; that&#8217;s potentially four fewer autonomous steps&#8217; worth of API calls. If Claude writes tests but doesn&#8217;t run them, that&#8217;s execution time and tokens saved. If Claude ignores your CLAUDE.md instructions unless reminded, that&#8217;s less context processing per turn.</p><p>One Hacker News commenter captured the suspicion: &#8220;The perfect product. Imperceptible shrinkflation. Any negative effects can be pushed back to the customer.&#8221;</p><p>Anthropic explicitly denies this. Their <a href="https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues">September postmortem</a> states: &#8220;We never reduce model quality due to demand, time of day, or server load.&#8221;</p><p>But the timing is interesting. 
The <a href="https://x.com/ClaudeCodeLog">@ClaudeCodeLog</a> Twitter bot documented a significant prompt change in version 2.0.0 (September 29, 2025): &#8220;Removed &#8216;Following conventions&#8217; and &#8216;Code style&#8217; rules. Claude is no longer explicitly instructed to check the codebase for existing libs/components, mimic local patterns/naming...&#8221;</p><p>Some of the &#8220;laziness&#8221; might be prompt engineering choices rather than model degradation. The distinction matters little if you&#8217;re paying $200/month for an &#8220;autonomous&#8221; coding agent that needs constant supervision.</p><h2>What This Actually Looks Like</h2><p>The capability hasn&#8217;t disappeared. Claude can still run those tests&#8212;when explicitly asked. It can still follow CLAUDE.md instructions&#8212;when reminded. It can still complete multi-step plans&#8212;when you nudge it at each step.</p><p>The autonomy has degraded. The proactivity. The follow-through.</p><p>You&#8217;re not collaborating with a senior developer anymore. You&#8217;re supervising a junior who does exactly what&#8217;s asked, nothing more, and sometimes declares victory early to avoid extra work.</p><p>The fix, such as it is: be explicit. Don&#8217;t assume Claude will run tests after writing them. Don&#8217;t trust &#8220;task completed&#8221; without verification. Build the reminders into your CLAUDE.md. Accept that you&#8217;re now paying premium prices to micromanage what was marketed as autonomous.</p><p>Or wait and see if next week&#8217;s Claude feels more motivated.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. 
For more on Claude&#8217;s December quality issues, read the full analysis in <a href="https://hyperdev.matsuoka.com/p/when-claude-forgets-how-to-code">When Claude Forgets How to Code</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[When Claude Forgets How to Code]]></title><description><![CDATA[Your AI coding partner isn&#8217;t gaslighting you. The quality drops are real&#8212;and December 2025 has been rough.]]></description><link>https://hyperdev.matsuoka.com/p/when-claude-forgets-how-to-code</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/when-claude-forgets-how-to-code</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 22 Dec 2025 17:51:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!A_3Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A_3Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A_3Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!A_3Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!A_3Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!A_3Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A_3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2416020,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182345760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A_3Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!A_3Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!A_3Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!A_3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a594db5-5d72-4444-9acc-2244145065b3_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>TL;DR</h2><ul><li><p>Anthropic&#8217;s status page confirms <a href="https://status.claude.com/">elevated error rates on Opus 4.5</a> on December 21-22, 2025&#8212;you weren&#8217;t imagining it</p></li><li><p>Five documented incidents in December alone, including a major outage on December 14</p></li><li><p><a href="https://github.com/anthropics/claude-code/issues/7683">GitHub issue #7683</a> captures the frustration: users describe working with &#8220;a Junior Developer where I must minutely review every single line of code&#8221;</p></li><li><p>Research agents confidently claiming things don&#8217;t exist when they&#8217;re the first Google result</p></li><li><p>Anthropic explicitly denies throttling: &#8220;We never reduce model quality due to demand, time of day, or server load&#8221;</p></li></ul><div><hr></div><p>The thread started at 5:55 AM: &#8220;Anyone else experiencing severe regression in Claude Ops quality the past 24 hours? I feel like I&#8217;ve been sent back in time a few months.&#8221;</p><p>The response came quickly: &#8220;It happens every once in a while, usually on Fridays. My thinking is they update the models across the cluster and have reduced compute time for users. Feels like a dementia patient... Very annoying. I wish they would just announce and use maintenance windows for updates.&#8221;</p><p>Then someone shared a transcript that really caught my attention. Their Research agent had claimed a Rust package didn&#8217;t exist. The agent stated confidently: &#8220;No crates.io package found. No GitHub repository found. Web searches only return unrelated Tauri projects.&#8221;</p><p>Except tauri-remote-ui was literally the first Google result.</p><p>When pushed, the agent admitted it: &#8220;The Research agent fabricated its verification. It claimed things that weren&#8217;t true. 
The agent either didn&#8217;t actually search&#8212;just assumed it didn&#8217;t exist&#8212;hallucinated the negative result, or searched incorrectly.&#8221;</p><p>The kicker: &#8220;Even Research agent outputs need verification, especially negative claims (&#8216;X doesn&#8217;t exist&#8217;).&#8221;</p><p>So I went looking. Is Claude actually getting dumber on certain days? Are there documented patterns? And most importantly&#8212;is Anthropic secretly throttling users during peak hours?</p><h2>December&#8217;s Incident Cluster</h2><p><a href="https://status.claude.com/">Anthropic&#8217;s status page</a> tells an interesting story:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7f5_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7f5_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 424w, https://substackcdn.com/image/fetch/$s_!7f5_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 848w, https://substackcdn.com/image/fetch/$s_!7f5_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 1272w, https://substackcdn.com/image/fetch/$s_!7f5_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7f5_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png" width="686" height="271" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:686,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33908,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/182345760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7f5_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 424w, https://substackcdn.com/image/fetch/$s_!7f5_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 848w, https://substackcdn.com/image/fetch/$s_!7f5_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 1272w, https://substackcdn.com/image/fetch/$s_!7f5_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a6206cc-81d1-4e1e-8c50-500e1fb933d7_686x271.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>That December 14 incident was bad enough to warrant investigation&#8212;a network routing misconfiguration caused traffic to backend infrastructure to just... drop. <a href="https://drdroid.io/status-page-aggregator/anthropic">Third-party aggregators like DrDroid</a> showed Anthropic&#8217;s status as &#8220;DEGRADED&#8221; during the December 22 investigation.</p><p>GitHub tells the rest of the story. 
<a href="https://github.com/anthropics/claude-code/issues/7683">Issue #7683</a>, titled &#8220;Significant Performance Degradation in Last 2 Weeks,&#8221; documents users reporting Claude &#8220;started to lie about the changes it made to code&#8221; and &#8220;didn&#8217;t even call the methods it was supposed to test.&#8221;</p><p>Another issue from mid-December: &#8220;This two days Claude Opus 4.5 start telling me that things has been done but it&#8217;s done partially and the quality is mediocre! We feel that Claude Opus got nerfed!&#8221;</p><p>One user summarized the shift: going from &#8220;collaborating with a Senior Developer&#8221; to &#8220;supervising a Junior Developer where I must minutely review every single line of code.&#8221;</p><p>Not imaginary.</p><h2>The Friday Theory</h2><p>What about the Friday theory? Every heavy Claude user has a version of this. Weekend Claude. Holiday Claude. &#8220;Why does this feel worse at 2 PM Pacific?&#8221;</p><p>I couldn&#8217;t find rigorous evidence for day-of-week patterns. <a href="https://www.vincentschmalbach.com/ai-is-dumber-on-mondays/">One analysis titled &#8220;AI is Dumber on Mondays&#8221;</a> found no definitive proof either. Its hypothesis: weekend maintenance could affect routing when new server pools come online Monday morning. Possible. Not proven.</p><p>Anthropic has <a href="https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues">addressed this directly</a>: &#8220;We never reduce model quality due to demand, time of day, or server load.&#8221;</p><p>But peak hours do seem to matter. 
Community observations from r/ClaudeAI suggest the platform tends to be busier &#8220;when Americans are online,&#8221; with users reporting &#8220;concise mode&#8221; activations during periods of high demand, truncated responses, and reduced context retention.</p><h2>Why LLMs Actually Fluctuate</h2><p>Several documented mechanisms can cause real quality variation in production:</p><p><strong>Load-based routing.</strong> <a href="https://www.truefoundry.com/blog/llm-load-balancing">TrueFoundry&#8217;s documentation</a> notes that organizations can &#8220;cut spending by up to 60%&#8221; by routing &#8220;easy&#8221; prompts to cheaper models. Your complex refactoring request might get classified as &#8220;simple&#8221; and sent to a smaller model without anyone telling you.</p><p><strong>Quantization differences.</strong> Providers can reduce the numeric precision of model weights to save compute. <a href="https://developers.redhat.com/articles/2024/10/17/we-ran-over-half-million-evaluations-quantized-llms">Red Hat&#8217;s 500,000+ evaluations of quantized LLMs</a> found they achieve &#8220;near-full accuracy with minimal trade-offs,&#8221; though quality still varied across tasks. In principle, a serving system under load could switch quantization levels without telling users.</p><p><strong>Silent updates.</strong> Anthropic deploys Claude across AWS Trainium, NVIDIA GPUs, and Google TPUs&#8212;each with potentially different failure modes. Model versions can shift without announcement.</p><h2>The &#8220;Dementia Patient&#8221; Problem</h2><p>Context degradation over extended conversations is well documented. <a href="https://medium.com/@ai.web.incorp/why-your-ai-assistant-has-dementia-the-72-billion-identity-crisis-nobodys-solving-7804c7cc062d">James Howard described the symptoms</a>: &#8220;After many exchanges&#8212;perhaps a hundred or more&#8212;the conversation seems to unravel. 
Responses become repetitive, lose focus, or miss key details.&#8221; The model begins &#8220;cycling back to the same points.&#8221;</p><p>The Research agent confidently asserting something doesn&#8217;t exist when it clearly does fits this pattern. The agent likely did one of three things:</p><ol><li><p>Didn&#8217;t actually search&#8212;just assumed the answer</p></li><li><p>Hallucinated a negative result</p></li><li><p>Searched with wrong terms</p></li></ol><p>Power users have <a href="https://the-decoder.com/anthropic-confirms-technical-bugs-after-weeks-of-complaints-about-declining-claude-code-quality/">reported 30-40% productivity losses</a> when quality degrades.</p><h2>Everyone Has This Problem</h2><p>This isn&#8217;t Claude-specific.</p><p><strong>OpenAI&#8217;s &#8220;Lazy GPT&#8221; phenomenon</strong> saw users complaining ChatGPT had become &#8220;unusably lazy.&#8221; One user reported asking for a 15-entry spreadsheet and receiving: &#8220;Due to the extensive nature of the data... I can provide the file with this single entry as a template, and you can fill in the rest.&#8221; OpenAI initially said it hadn&#8217;t changed the model. In a separate 2025 episode involving a sycophantic GPT-4o update, it <a href="https://openai.com/index/expanding-on-sycophancy/">admitted its evaluations</a> &#8220;weren&#8217;t broad or deep enough to catch sycophantic behavior.&#8221;</p><p><strong>Google&#8217;s Gemini</strong> has <a href="https://github.com/google-gemini/gemini-cli/issues/5273">severe, well-documented issues</a>. GitHub reports describe &#8220;looping problems&#8221; rendering Gemini &#8220;almost unusable&#8221; as a coding assistant. Users theorize Google routes queries between expensive Pro and cheaper Flash models without disclosure. 
Gemini scored worst on the BMJ cognitive assessment&#8212;16/30.</p><p>The common pattern: performance degradation, initial denials, eventual confirmation of technical problems, universal context loss as conversations lengthen, and lack of transparency about updates.</p><h2>What Actually Helps</h2><p>For users hitting December&#8217;s quality drops:</p><p><strong>Check <a href="https://status.anthropic.com/">status.anthropic.com</a> first.</strong> If you&#8217;re hitting elevated error rates during a confirmed incident, no amount of prompt engineering will help. Wait it out.</p><p><strong>Use specific model version IDs.</strong> Instead of calling an alias, pin the exact dated version string. That keeps you on the same model snapshot if the alias is repointed, though it won&#8217;t shield you from infrastructure bugs like the routing misconfiguration described above.</p><p><strong>Time complex work outside peak US hours.</strong> Not guaranteed, but some users report better results at off-peak times. Worth testing.</p><p><strong>Start fresh sessions for critical work.</strong> Context degradation is real. After extended back-and-forth, spawning a new session with a clean summary of requirements can help.</p><p><strong>Verify negative claims.</strong> If Claude says something doesn&#8217;t exist, search yourself. &#8220;Even Research agent outputs need verification, especially negative claims.&#8221;</p><p><strong>Trust your instincts.</strong> If Claude feels off, it probably is. The quality variations are documented. You&#8217;re not imagining it.</p><h2>Bottom Line</h2><p>The quality fluctuations are real. The December 21-22, 2025 incidents are confirmed on Anthropic&#8217;s status page. Five incidents this month alone. Even the &#8220;dementia&#8221; comparison has a peer-reviewed BMJ study behind it.</p><p>Anthropic says it doesn&#8217;t throttle. The evidence points to infrastructure complexity at scale&#8212;routing misconfigurations, multi-platform deployments, load balancing dynamics. 
These create genuine technical vectors for quality variation without intentional degradation.</p><p>At least now you know: when Claude forgets how to code, it&#8217;s probably not personal. Check the status page. Start a fresh session. And always verify when it tells you something doesn&#8217;t exist.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. For more on the reliability challenges in AI development tools, read my analysis of <a href="https://hyperdev.matsuoka.com/p/a-new-era-of-non-deterministic-debugging">non-deterministic debugging</a> or my take on <a href="https://hyperdev.matsuoka.com/p/around-the-horn-ai-coding-tools-reality">the Cursor pricing crisis</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[The Agent Unlock: Why Opus 4.5 Changed How I Work]]></title><description><![CDATA[I switched. That&#8217;s the short version.]]></description><link>https://hyperdev.matsuoka.com/p/the-agent-unlock-why-opus-45-changed</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-agent-unlock-why-opus-45-changed</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 18 Dec 2025 14:15:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dE8i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dE8i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!dE8i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!dE8i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!dE8i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!dE8i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dE8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2704736,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/181845621?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dE8i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!dE8i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!dE8i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!dE8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F473afa7d-31f9-47dd-8716-b2f22569e467_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>For months, Sonnet 4.5 was my default. Fast, capable, cost-effective for the agentic workflows I was running through Claude Code and <a href="https://github.com/bobmatnyc/claude-mpm">Claude-MPM</a>. Opus felt like overkill&#8212;expensive insurance for edge cases that rarely materialized.</p><p>Then <a href="https://www.anthropic.com/news/claude-opus-4-5">Anthropic dropped Opus 4.5 on November 24th</a>. Within a week, I&#8217;d flipped completely. Now I reach for Opus 4.5 whenever it&#8217;s available. Sonnet gets the quick stuff. Opus gets everything that matters.</p><p>Here&#8217;s the thing: I&#8217;m not alone. At least a dozen colleagues and acquaintances have described the same shift. People who&#8217;d optimized their workflows around Sonnet suddenly rebuilding around Opus. Not because the benchmarks told them to&#8212;because their hands told them something had changed.</p><h2>The Unlock Framing</h2><p>Developer McKay Wrigley captured something real when he <a href="https://news.ycombinator.com/item?id=46037637">wrote</a>: &#8220;GPT-4 was the unlock for chat, Sonnet 3.5 was the unlock for code, and now Opus 4.5 is the unlock for agents.&#8221;</p><p>That framing resonates because it matches what I&#8217;m experiencing. GPT-4 made conversational AI useful. Sonnet 3.5 (and later 4.5) made AI coding assistance genuinely productive. Opus 4.5 makes autonomous agents viable for serious work.</p><p>The difference shows up in session duration. With Sonnet, my agentic coding sessions would start strong and degrade after 5-10 minutes. Context drift. Forgotten constraints. 
The model losing the thread on multi-file refactors. I&#8217;d compensate with aggressive checkpointing, shorter task scopes, more human intervention.</p><p>With Opus 4.5? Twenty minutes of coherent, unsupervised work. Sometimes thirty. I come back and the task is done&#8212;not just completed, but completed the way I would have done it. Idiomatically. Without the weird patterns that screamed &#8220;AI wrote this.&#8221;</p><p>Adam Wolff at Anthropic <a href="https://www.anthropic.com/news/claude-opus-4-5">described it perfectly</a>: &#8220;When I come back, the task is often done&#8212;simply and idiomatically.&#8221; That&#8217;s exactly right.</p><h2>The Numbers Back It Up</h2><p><a href="https://thenewstack.io/anthropics-new-claude-opus-4-5-reclaims-the-coding-crown-from-gemini-3/">Opus 4.5 hit 80.9% on SWE-bench Verified</a>. First model to break 80%. GPT-5.1-Codex-Max sits at 77.9%. Gemini 3 Pro at 76.2%.</p><p>But raw benchmark scores don&#8217;t explain the experience shift. Token efficiency does.</p><p><a href="https://thezvi.substack.com/p/claude-opus-45-is-the-best-model">Zvi Mowshowitz&#8217;s analysis</a> breaks down the economics: at medium effort, Opus 4.5 matches Sonnet 4.5&#8217;s best SWE-bench Verified score while using 76% fewer output tokens. At high effort, it beats Sonnet by 4.3 percentage points while using 48% fewer tokens. Despite higher per-token pricing ($5/$25 vs. $3/$15 per million input/output tokens), Opus often costs less per completed task than Sonnet.</p><p>One analysis showed complex tasks costing $1.30 with Opus versus $1.83 with Sonnet. <a href="https://analyticsindiamag.com/ai-news-updates/anthropic-claude-4-5-opus-beats-gemini-3-pro-in-coding-agentic-tasks/">GitHub&#8217;s CPO reported</a> it &#8220;surpasses internal coding benchmarks while cutting token usage in half.&#8221;</p><p>The agentic-specific benchmarks tell the real story. On MCP Atlas (scaled tool use), <a href="https://www.anthropic.com/claude/opus">Opus 4.5 scores 62.3%</a> versus Sonnet 4.5&#8217;s 43.8%. 
That 18.5-point gap represents a qualitative capability tier&#8212;the difference between &#8220;sometimes works&#8221; and &#8220;usually works.&#8221;</p><h2>How My Workflow Changed</h2><p>I&#8217;ve restructured around a simple principle: Opus for thinking, Sonnet for doing.</p><p>Complex architectural decisions? Opus. Multi-file refactors touching business logic? Opus. Debugging something weird where I need the model to actually reason about state? Opus.</p><p>Quick edits. Boilerplate generation. Well-defined single-file changes. Sonnet handles these fine.</p><p>But here&#8217;s what surprised me: I&#8217;ve started using other models more strategically. Codex Max handles project documentation and simpler tasks&#8212;stuff where I don&#8217;t need Opus-level reasoning. That saves my Claude gunpowder for the work where it makes the biggest difference.</p><p>I&#8217;m also building toward something bigger: integrating other LLM agents directly into <a href="https://github.com/bobmatnyc/claude-mpm">Claude-MPM</a>. The goal is orchestrating multiple models based on task type&#8212;Claude for the heavy coding, other models for documentation, research, and routine operations. Different tools for different jobs within the same workflow.</p><p>But for coding itself, the choice changed decisively. Claude owns that now.</p><p>One more thing I&#8217;ve noticed: I&#8217;m hitting token limits less frequently. Whether that&#8217;s the efficiency gains showing up in practice or just how the model manages context, I&#8217;m not sure. But the sessions feel longer before I need to reset.</p><h2>The Challenge for OpenAI and Google</h2><p>Here&#8217;s what makes this interesting from a competitive standpoint: Anthropic isn&#8217;t establishing a lead. They&#8217;re widening one.</p><p>Claude has dominated coding benchmarks since the 4 series dropped. Sonnet 4, then Sonnet 4.5, consistently outperformed GPT and Gemini on real-world coding tasks. The gap was already there. 
Opus 4.5 turned a lead into a chasm.</p><p>The edge seems to come from reasoning depth combined with sophisticated tool use. Raw intelligence helps, but the way Opus 4.5 coordinates multi-step operations, maintains context across tool calls, and recovers from errors&#8212;that&#8217;s where competitors fall behind. OpenAI and Google need to answer on both dimensions.</p><p>OpenAI&#8217;s response will be telling. Codex shows what they can do with specialized tooling, but their general-purpose models haven&#8217;t matched Claude&#8217;s agentic performance. The <a href="https://openai.com/index/shipping-sora-for-android-with-codex/">Sora Android case study</a> (4 engineers, 28 days, 85% AI-written code) was impressive&#8212;and also revealed how much optimization and internal access that required. External developers face different economics.</p><p>Google&#8217;s position is more nuanced. Gemini 3 Pro genuinely excels at multimodal and reasoning tasks. But for pure coding workflows? The community consensus has shifted toward Claude. Google needs to either accept that segmentation or push Gemini&#8217;s coding capabilities significantly.</p><p>The <a href="https://winbuzzer.com/2025/11/24/anthropic-launches-claude-opus-4-5-with-80-9-swe-bench-score-and-66-price-drop-xcxwbn/">pricing move matters too</a>. Anthropic cut Opus pricing by roughly two-thirds (from $15/$75 to $5/$25 per million input/output tokens) at launch. That signals confidence. They&#8217;re not positioning Opus 4.5 as a premium boutique offering&#8212;they&#8217;re going for volume.</p><h2>What Actually Improved</h2><p>When I talk to other developers who&#8217;ve made the switch, a few specific capabilities come up repeatedly:</p><p><strong>Thinking block preservation</strong> across conversation turns. Previous Claude models dropped thinking blocks from earlier turns, so the reasoning behind earlier decisions fell out of context. 
Opus 4.5 maintains coherent thought across longer sessions.</p><p><strong>The effort parameter</strong> (low/medium/high) gives real control over depth-vs-speed tradeoffs. Set it high for complex problems, low for quick iterations.</p><p><strong>Memory tools</strong> that store information outside the context window. For long sessions, this eases the &#8220;forgetting what we agreed to&#8221; problem that plagued earlier agents.</p><p><strong>Context editing</strong> that automatically clears older tool calls and results as the context fills, while keeping recent ones. The model works from a leaner, more relevant context.</p><p>None of these are revolutionary in isolation. Together, they add up to agents that don&#8217;t lose the plot.</p><h2>The Skepticism Is Fair</h2><p>Not everyone&#8217;s convinced. And the criticisms have merit.</p><p><strong>Usage limits frustrate power users.</strong> Opus 4.5 requires Max tier ($100-200/month), and even then, heavy users hit limits. <a href="https://news.ycombinator.com/item?id=46047280">Hacker News threads</a> document accusations of Anthropic being &#8220;penny-wise and pound-foolish.&#8221;</p><p><strong>Hallucination rates remain concerning.</strong> Artificial Analysis&#8217;s Omniscience testing <a href="https://www.lesswrong.com/posts/HtdrtF5kcpLtWe5dW/claude-opus-4-5-is-the-best-model-available">puts it at approximately 58%</a>, better than Gemini 3 Pro but worse than Sonnet 4.5. For production code, you still need review.</p><p><strong>The 200K context window trails GPT-5&#8217;s 400K.</strong> For massive codebases, that gap matters on paper. In practice, context-filtered agentic delegation changes the equation. 
I can go hours without compaction now&#8212;the orchestrator manages what each subagent sees, so you&#8217;re not dragging your entire conversation history into every task.</p><p><strong>Some developers see minimal difference from Sonnet.</strong> <a href="https://simonw.substack.com/p/claude-opus-45-and-why-evaluating">Simon Willison noted</a> his productivity stayed steady after his Opus 4.5 preview access expired and he went back to Sonnet 4.5. Not everyone experiences the same shift.</p><p>And the &#8220;nerf cycle&#8221; theory&#8212;that Anthropic degrades models post-launch&#8212;persists in community discussions. The evidence doesn&#8217;t support it, but the suspicion affects trust.</p><h2>Enterprise Adoption Lags (As Usual)</h2><p>Model capability may have crossed a threshold, but enterprise deployment remains cautious. <a href="https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/agentic-ai-strategy.html">Deloitte&#8217;s 2025 survey</a> found only 11% of organizations actively using agentic AI in production, with 42% still developing their strategy roadmaps.</p><p>That&#8217;s normal for infrastructure shifts. Individual developers adopt fast. Teams take longer. Enterprises take years.</p><p>The signals point in one direction, though. Anthropic reportedly holds <a href="https://www.ainvest.com/news/anthropic-claude-opus-4-5-catalyst-enterprise-ai-adoption-productivity-gains-2511/">a 32% share of the enterprise AI market</a> versus OpenAI&#8217;s 25%. Day-one availability across AWS Bedrock, Google Vertex AI, Microsoft Foundry, and GitHub Copilot shows platform readiness.</p><p>Agentic coding will become standard. 
The only question is timeline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e5Uw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e5Uw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!e5Uw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!e5Uw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!e5Uw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e5Uw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2420880,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/181845621?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e5Uw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!e5Uw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!e5Uw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!e5Uw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ae0ad1b-8024-4769-8453-cede47ea5e62_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Bottom Line</h2><p>Six months ago, letting AI agents handle substantial coding work felt risky. Today, with proper specification&#8212;<a href="https://hyperdev.substack.com/p/tkdd-ticket-driven-development-and">TkDD</a> workflows, clear acceptance criteria, structured task decomposition&#8212;agents handle serious engineering work reliably. <a href="https://every.to/vibe-check/vibe-check-opus-4-5-is-the-coding-model-we-ve-been-waiting-for">Every.to&#8217;s Vibe Check assessment</a>: &#8220;Some AI releases you always remember&#8212;GPT-4, Claude 3.5 Sonnet&#8212;and you know immediately something major has shifted. Opus 4.5 feels like that.&#8221;</p><p>Opus 4.5 didn&#8217;t create that shift alone. But it accelerated it decisively. 
The developers I respect&#8212;the ones building real systems, not doing demos&#8212;have mostly made the switch. Or they&#8217;re planning to.</p><p>For OpenAI and Google, this is a competitive challenge that requires a response. Not because Opus 4.5 is perfect&#8212;it isn&#8217;t&#8212;but because Anthropic just established a new baseline for what agentic coding should feel like.</p><p>My workflow changed in a week. From Sonnet-first to Opus-first. From skeptical about Opus to building around it.</p><p>That doesn&#8217;t happen often. When it does, pay attention.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. For more on multi-agent orchestration, see my deep dive into <a href="https://hyperdev.substack.com/">Claude-MPM</a> or my analysis of <a href="https://hyperdev.substack.com/">the tools shaping 2025 development workflows</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[How Claude Code Got Better by Protecting More Context]]></title><description><![CDATA[Less is more]]></description><link>https://hyperdev.matsuoka.com/p/how-claude-code-got-better-by-protecting</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/how-claude-code-got-better-by-protecting</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 10 Dec 2025 14:31:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CEUG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!CEUG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CEUG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!CEUG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!CEUG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!CEUG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CEUG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1768600,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179891626?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CEUG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!CEUG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!CEUG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!CEUG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1cbe6ed-4780-423a-88d4-94b419568082_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I noticed something interesting this week while running Claude MPM over Claude Code. When Claude Code reported having 10% context remaining until auto-compact, my PM agent (which independently monitors session state) showed something different: Used 128k/200k tokens&#8212;only 64% of available context.</p><p>That gap got me thinking. Claude Code was warning that it was nearly out of room while more than a third of the window (72k tokens) was still free. What if Claude Code&#8217;s recent performance improvements aren&#8217;t primarily about better code generation or smarter prompting? What if they stem from something more fundamental: reserving more free context space to maintain reasoning quality?</p><p>Here&#8217;s my working hypothesis: Claude Code has been progressively pushing its auto-compact threshold down&#8212;stopping earlier to preserve more working memory. 
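</p><p>The arithmetic behind that hypothesis is simple to sketch. The snippet below is a minimal illustration and is not part of Claude MPM or Claude Code. It assumes that Claude Code&#8217;s &#8220;X% until auto-compact&#8221; is measured against the full 200k window; that is my assumption, not documented behavior. The function name <code>implied_compact_threshold</code> is hypothetical.</p>
```python
# Minimal sketch. Assumption: "X% until auto-compact" is a share of the full
# context window, not of the space left below some hidden threshold.

def implied_compact_threshold(used_tokens, window_tokens, pct_until_compact):
    """Estimate the usage fraction at which auto-compact would fire."""
    used_fraction = used_tokens / window_tokens
    return used_fraction + pct_until_compact / 100


# Figures from the session above: 128k of 200k tokens used, 10% until auto-compact.
threshold = implied_compact_threshold(128_000, 200_000, 10)
print(f"implied auto-compact trigger: about {threshold:.0%} of the window")
```
<p>Under that assumption the trigger lands around 74% of the window, which points to a threshold near 75% rather than 90%+. If Claude Code measures the percentage some other way, the estimate shifts accordingly.</p><p>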
In the old days (just several weeks ago), Claude Code would run until it couldn&#8217;t, sometimes failing to compact because it didn&#8217;t have enough free space left. Now it appears to be stopping much earlier, maintaining substantial breathing room for the LLM to actually think.</p><p>And if that&#8217;s true, it demonstrates something I&#8217;ve been writing about for months: infrastructure matters more than features. Sometimes improving tools means constraining them more intelligently rather than pushing them harder.</p><h2>TL;DR</h2><ul><li><p>Working hypothesis: Claude Code triggers auto-compact much earlier than before&#8212;potentially around 64-75% context usage vs. historical 90%+</p></li><li><p>Engineers appear to have built in a &#8220;completion buffer&#8221; giving tasks room to finish before compaction, eliminating disruptive mid-operation interruptions</p></li><li><p>More free context enables better LLM reasoning&#8212;research and developer experience show performance degrades significantly as context windows fill</p></li><li><p>Anthropic&#8217;s recent context management features (context editing, memory tool) enable this more conservative approach</p></li><li><p>This represents the &#8220;infrastructure over features&#8221; paradigm&#8212;better performance through smarter resource management rather than maximizing utilization</p></li><li><p>Community reports and GitHub issues document both auto-compact behavior changes and corresponding Claude Code performance improvements</p></li><li><p>Key insight: sometimes improving AI tools means accepting what looks like inefficiency to maintain quality where it matters</p></li></ul><h2>Why Free Context Matters for Reasoning Quality</h2><p>LLMs need working memory to reason effectively. When Claude processes information, it&#8217;s not just reading what&#8217;s in the context window; it&#8217;s also writing into it. Responses, tool calls, code edits, and any extended thinking all consume the same token budget as everything the model reads. 
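</p><p>A deliberately simplified model makes the point concrete. It assumes a fixed 200k window that must hold both what the model reads and what it writes, including any extended thinking. It ignores attention-quality effects entirely. <code>room_to_write</code> is a hypothetical helper, not a Claude Code or API function.</p>
```python
# Simplified model (an assumption for illustration): everything the model reads
# and everything it writes, extended thinking included, shares one fixed window.
WINDOW = 200_000


def room_to_write(context_tokens, window=WINDOW):
    """Tokens left for the model's next output once the context is loaded."""
    return max(0, window - context_tokens)


for used in (50_000, 150_000, 180_000, 195_000):
    print(f"{used:>7,} tokens in context -> {room_to_write(used):>7,} tokens left to write")
```
<p>The hard arithmetic alone shows the squeeze. At 180k tokens of context, only 20k remain for everything the model produces next.</p><p>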
As the context window fills, available working memory shrinks.</p><p>Practitioner guides consistently stress that <a href="https://sparkco.ai/blog/mastering-claudes-context-window-a-2025-deep-dive">&#8220;optimizing Claude&#8217;s context window in 2025 involves context quality over quantity,&#8221;</a> and developers report performance degrading substantially as models approach their limits. The intuition is straightforward. When most of the window is consumed by conversation history, file contents, and tool outputs, the model has less room left for the output it has to generate. On top of that, the details that matter have to compete with a growing pile of stale material.</p><p>Think of it like RAM on your computer. Sure, you can run programs until you hit 95% memory utilization. But that last 5% gets consumed by swapping, garbage collection, and system overhead&#8212;leaving nothing for actual computation. Your programs slow to a crawl despite having &#8220;only&#8221; 95% utilization.</p><p>LLMs work similarly. That &#8220;free&#8221; context space isn&#8217;t wasted&#8212;it&#8217;s where reasoning happens. As Claude Code approaches 200k tokens of context, it&#8217;s not the reading that becomes the problem; it&#8217;s the writing. The model needs space to construct responses, evaluate code changes, and plan multi-step operations.</p><h2>The Historical Context Collapse Problem</h2><p>Several weeks ago, Claude Code would frequently run sessions until context collapse became inevitable. Auto-compact was designed to <a href="https://claudelog.com/faqs/what-is-claude-code-auto-compact/">&#8220;automatically summarize conversations when approaching memory limits,&#8221;</a> but the system often triggered too late&#8212;sometimes lacking sufficient space to even perform the compaction process itself.</p><p>The pattern was frustrating: you&#8217;d be deep into a complex refactoring, making steady progress, then suddenly Claude Code would struggle. 
Responses would become generic, previous decisions would be forgotten, and code quality would noticeably degrade. Developers noted that <a href="https://www.hung-truong.com/blog/2025/08/01/31-days-with-claude-code-what-i-learned/">&#8220;LLMs perform much worse when the context window approaches its limit,&#8221;</a> describing how context becomes &#8220;poisoned pretty easily&#8221; during long sessions. I&#8217;ve experienced this firsthand&#8212;watching a productive session gradually deteriorate as context filled up, with the model starting to contradict earlier decisions or forget project-specific patterns it had been following consistently.</p><p>The GitHub issues tell this story. <a href="https://github.com/anthropics/claude-code/issues/6123">One critical bug report</a> documented auto-compact triggering at 8-12% remaining context &#8220;instead of 95%+, causing constant interruptions every few minutes&#8221;. <a href="https://github.com/anthropics/claude-code/issues/3274">Another described</a> context management becoming &#8220;permanently corrupted&#8221; after failed compaction attempts, with the system stuck showing &#8220;102%&#8221; context usage and entering infinite compaction loops.</p><p>The frequency of these reports&#8212;with issues receiving dozens of &#8220;+1&#8221; reactions and multiple developers describing identical symptoms&#8212;suggests widespread problems rather than isolated incidents. These weren&#8217;t edge cases; they were symptoms of a fundamental tension: maximizing context utilization vs. maintaining reasoning quality.</p><h2>Anthropic&#8217;s Context Management Evolution</h2><p>The turning point came with <a href="https://www.claude.com/blog/context-management">Anthropic&#8217;s September 2025 announcement</a> of new context management capabilities. 
The introduction of &#8220;context editing&#8221; and the &#8220;memory tool&#8221; represented a systematic approach to solving the context exhaustion problem. Context editing automatically clears stale tool calls and results while preserving conversation flow, and the memory tool lets Claude store and consult information outside the context window.</p><p>The results show how big a shift this is. In a 100-turn web search evaluation, context editing enabled agents to complete workflows that would otherwise fail due to context exhaustion, while reducing token consumption by 84%.</p><p>But the most telling detail appears in Anthropic&#8217;s evaluation metrics. Combining the memory tool with context editing improved performance by 39% over baseline, with context editing alone delivering a 29% improvement. These gains come from better context management, not better code generation models.</p><p>The documentation now explicitly recommends practices that would have been heretical months ago. <a href="https://www.anthropic.com/engineering/claude-code-best-practices">Anthropic&#8217;s best practices guide</a> notes that &#8220;using subagents to verify details or investigate particular questions, especially early on in a conversation or task, tends to preserve context availability without much downside in terms of lost efficiency&#8221;. 
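</p><p>Plain token accounting shows why subagents &#8220;preserve context availability.&#8221; The sketch below is hypothetical and only models where tokens accumulate. The <code>Context</code> class, both helper functions, and the 40k/2k token figures are illustrative assumptions, not how Claude Code&#8217;s subagents are actually implemented.</p>
```python
# Hypothetical token accounting for the subagent pattern. The exploration cost is
# paid inside a separate context; only a short summary reaches the main session.
from dataclasses import dataclass, field


@dataclass
class Context:
    name: str
    tokens: int = 0
    entries: list = field(default_factory=list)

    def add(self, label, tokens):
        """Record something that now occupies space in this context."""
        self.entries.append((label, tokens))
        self.tokens += tokens


def investigate_inline(main, raw_tokens, summary_tokens):
    """Read everything directly in the main session."""
    main.add("raw file contents", raw_tokens)
    main.add("findings", summary_tokens)


def investigate_with_subagent(main, raw_tokens, summary_tokens):
    """Read everything in a fresh context and hand back only the summary."""
    worker = Context("subagent")
    worker.add("raw file contents", raw_tokens)
    worker.add("findings", summary_tokens)
    main.add("subagent summary", summary_tokens)
    return worker


inline_main = Context("main, inline")
investigate_inline(inline_main, raw_tokens=40_000, summary_tokens=2_000)

delegated_main = Context("main, delegated")
worker = investigate_with_subagent(delegated_main, raw_tokens=40_000, summary_tokens=2_000)

print(inline_main.tokens, delegated_main.tokens, worker.tokens)
```
<p>With these made-up numbers, the inline approach adds 42k tokens to the main session, while the delegated approach adds 2k. The 40k of raw exploration stays in the subagent&#8217;s own window.</p><p>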
Translation: delegate and distribute context load rather than cramming everything into one session.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DeDE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DeDE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!DeDE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!DeDE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!DeDE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DeDE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1732607,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179891626?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DeDE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!DeDE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!DeDE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!DeDE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ecf39dd-bd9b-4559-b847-101cb8bdbcc8_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Community Observations Align With Conservative Thresholds</h2><p>The community has noticed something changed, even if they can&#8217;t pinpoint exactly what. <a href="https://www.shuttle.dev/blog/2025/10/16/claude-code-best-practices">Best practices guides</a> now emphasize that &#8220;auto-compact is a feature that quietly consumes a massive amount of your context window before you even start coding,&#8221; with some reports showing the autocompact buffer consuming &#8220;45k tokens&#8212;22.5% of your context window gone before writing a single line of code&#8221;.</p><p>But the more interesting observation comes from debugging discussions. 
<a href="https://github.com/anthropics/claude-code/issues/10691">A detailed feature request</a> noted that &#8220;the VSCode extension currently auto-compacts at ~25% remaining context (75% usage), reserving ~20% for the compaction process itself&#8221;. That aligns remarkably well with my Claude MPM observation. 64% usage plus the 10% that Claude Code said was left before auto-compact puts the trigger at roughly 74%.</p><p>If Claude Code is indeed triggering compaction at 75% utilization rather than 90%+, that leaves 25% of the context window (50k tokens in a 200k window) free, though the compaction pass itself will need part of it. That&#8217;s still substantial working memory&#8212;enough space for the model to effectively plan, evaluate alternatives, and construct high-quality responses.</p><p>User reports point the same way, if indirectly. While <a href="https://apidog.com/blog/claude-code-getting-dumber-switch-to-codex-cli/">some users report</a> &#8220;performance complaints, context limitations, and inconsistent outputs,&#8221; others note that &#8220;context window management creates perceived inconsistencies&#8221; and &#8220;regular history pruning and strategic context management often restore expected performance levels&#8221;.</p><h2>The Completion Buffer: Room to Finish What You Started</h2><p>Here&#8217;s another subtle improvement I&#8217;ve noticed: Claude Code now seems to have more wiggle room to complete tasks before triggering auto-compact. In the old days, you&#8217;d often hit compaction mid-operation&#8212;halfway through a refactoring, in the middle of implementing a feature, right when you needed the model to maintain full context.</p><p>This suggests the engineers built in a completion buffer&#8212;enough free space not just for the compaction process itself, but to allow the current task to finish gracefully. 
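</p><p>A rough sketch shows how a late trigger differs from an earlier one with a completion buffer. It uses the same numbers this post relies on: a 200k window and up to 20k tokens for the compaction pass. The thresholds, the 25k-token in-flight task, and the helper names are illustrative assumptions, not Claude Code&#8217;s actual logic.</p>
```python
# Illustrative assumptions, not Claude Code's real logic: a 200k window and a
# compaction pass that needs up to 20k tokens of free space to run.
WINDOW = 200_000
COMPACTION_OVERHEAD = 20_000


def headroom_after_trigger(trigger_pct, window=WINDOW, overhead=COMPACTION_OVERHEAD):
    """Tokens an in-flight task can still use once usage reaches the trigger,
    while keeping enough space free for the compaction pass itself."""
    return window * (100 - trigger_pct) // 100 - overhead


def in_flight_task(trigger_pct, task_tokens):
    """Does a task that is still running at the trigger point get to finish?"""
    if headroom_after_trigger(trigger_pct) >= task_tokens:
        return "finishes, then compacts"
    return "interrupted mid-task"


for trigger_pct in (90, 75):
    print(trigger_pct, headroom_after_trigger(trigger_pct), in_flight_task(trigger_pct, 25_000))
```
<p>Under these assumptions, a task still running when usage hits 90% has no room left once compaction overhead is reserved. The 75% trigger leaves about 30k tokens for the same task to land. That buffer is what lets the current task finish before compaction.</p><p>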
It&#8217;s the difference between:</p><p><strong>Old behavior</strong>: Hit 90% context &#8594; Start new task &#8594; Run out of space mid-task &#8594; Force compact &#8594; Lose context about what you were doing</p><p><strong>New behavior</strong>: Hit 75% context &#8594; Plenty of room for current task &#8594; Complete it successfully &#8594; Then compact with full understanding of what was accomplished</p><p>This isn&#8217;t just about when compaction triggers, but about giving the system enough runway to land the plane before resetting. The user experience difference is substantial. Instead of constantly fighting interrupted workflows, you now get clean task completion followed by reset.</p><p>I&#8217;ve noticed this directly: sessions that would previously hit compaction mid-refactoring now complete the refactoring cleanly, then compact. The model maintains full context about what it&#8217;s changing and why through to completion, rather than losing thread halfway through and having to reconstruct understanding from a summary.</p><p>That completion buffer&#8212;the gap between &#8220;starting to approach limits&#8221; and &#8220;actually hitting limits&#8221;&#8212;transforms context management from reactive crisis mode to proactive workflow optimization. You&#8217;re not scrambling to salvage a half-finished refactoring; you&#8217;re finishing work cleanly, then resetting for the next phase.</p><p>It&#8217;s infrastructure thinking applied to user experience: the best system management is invisible to users because it prevents problems rather than recovering from them.</p><h2>The Auto-Compact Debate Reveals the Trade-off</h2><p>The community remains divided on auto-compact itself, but that debate illuminates the fundamental tension. 
Some developers argue for disabling auto-compact entirely, noting &#8220;we already have better solutions for maintaining context across sessions: CLAUDE.md files capture your project&#8217;s patterns and standards, custom commands encode repetitive workflows&#8221;.</p><p>Others recognize the necessity but want control over when it happens. The core complaint: &#8220;when a task is 90% done, forced compaction wastes tokens and disrupts flow,&#8221; with users requesting manual control rather than automatic triggering.</p><p>What both camps agree on: auto-compact triggering &#8220;when the context window reaches approximately 95% capacity&#8221; is problematic, with users consistently <a href="https://stevekinney.com/courses/ai-development/claude-code-compaction">&#8220;advising against waiting for auto-compact, as it can sometimes take a while&#8221;</a>.</p><p>The resolution? Trigger earlier, preserve more working memory, and give the model room to think before hitting crisis mode.</p><h2>Technical Explanation: Why Earlier Compaction Works</h2><p>The counter-intuitive insight: stopping earlier actually extends productive session length. 
Here&#8217;s why.</p><p>When Claude Code runs until 90% context utilization before compacting:</p><ul><li><p>Context window: 200k tokens total</p></li><li><p>Conversation + files + tools: 180k tokens</p></li><li><p>Free space for reasoning: 20k tokens</p></li><li><p>Compaction process overhead: 15-20k tokens</p></li><li><p><strong>Result</strong>: Barely enough space to compact, frequent failures, degraded quality</p></li></ul><p>When Claude Code stops at 75% context utilization:</p><ul><li><p>Context window: 200k tokens total</p></li><li><p>Conversation + files + tools: 150k tokens</p></li><li><p>Free space for reasoning: 50k tokens</p></li><li><p>Compaction process overhead: 15-20k tokens</p></li><li><p><strong>Result</strong>: Comfortable margins, successful compaction, sustained quality</p></li></ul><p>The numbers tell the story, but the user experience is what matters. By stopping earlier, Claude Code actually enables longer effective sessions because each turn maintains higher reasoning quality. What feels like &#8220;wasted&#8221; context capacity&#8212;that unused 25%&#8212;turns out to be critical for maintaining the clarity and consistency that makes the utilized portion valuable. 
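</p><p>The two budgets above can be reproduced in a few lines. This is a sketch of the post&#8217;s own back-of-the-envelope numbers (a 200k window and 15k to 20k of compaction overhead), not a measurement. <code>budget</code> is a hypothetical helper.</p>
```python
# Back-of-the-envelope budgets from the lists above (illustrative, not measured).
WINDOW = 200_000
OVERHEAD_RANGE = (15_000, 20_000)  # estimated tokens needed by the compaction pass


def budget(trigger_pct, window=WINDOW, overhead=OVERHEAD_RANGE):
    """Return (used, free, (worst, best) tokens left after compaction overhead)."""
    used = window * trigger_pct // 100
    free = window - used
    low_overhead, high_overhead = overhead
    return used, free, (free - high_overhead, free - low_overhead)


for pct in (90, 75):
    used, free, (worst, best) = budget(pct)
    print(f"{pct}%: used {used:,}, free {free:,}, left after compaction {worst:,} to {best:,}")
```
<p>On those figures, the 90% trigger leaves between nothing and 5k tokens once compaction is paid for. The 75% trigger still leaves 30k to 35k tokens of working room.</p><p>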
This aligns with the principle that <a href="https://lalatenduswain.medium.com/mastering-context-management-in-claude-code-cli-your-guide-to-efficient-ai-assisted-coding-83753129b28e">&#8220;effective management isn&#8217;t just a nice-to-have&#8212;it&#8217;s essential for sustaining coherent, multi-turn conversations without the AI losing thread&#8221;</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nFmS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nFmS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!nFmS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!nFmS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!nFmS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nFmS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1551410,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179891626?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nFmS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!nFmS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!nFmS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!nFmS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fa6ee86-734c-472b-87c6-64df2fca3a40_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The &#8220;Doing Less&#8221; Paradigm</h2><p>This represents a fundamental shift in how we think about AI tool optimization. Developer instinct says maximize utilization&#8212;use every available token, run until you hit limits, squeeze maximum value from expensive compute resources. It feels wasteful to leave 50k tokens &#8220;unused.&#8221;</p><p>But that&#8217;s optimizing the wrong metric. The goal isn&#8217;t maximum context utilization; it&#8217;s maximum productive output. 
As <a href="https://www.cometapi.com/managing-claude-codes-context/">one guide to managing Claude Code&#8217;s context</a> puts it, &#8220;managing context in Claude Code is now a multi-dimensional problem: model choice, subagent design, CLAUDE.md discipline, thinking budgets, and tooling architecture all interact&#8221;.</p><p>I&#8217;ve watched this play out in my own sessions. Running Claude Code until it approaches 90% utilization produces more code per session in terms of raw output. But the quality deteriorates&#8212;more bugs slip through, architectural decisions become inconsistent, earlier project-specific patterns get forgotten. Sessions that stop at 75% utilization produce less total output but higher-quality, more maintainable code that actually ships.</p><p>The performance gains from conservative context management show up across multiple dimensions:</p><p><strong>Response Quality</strong>: More working memory enables better reasoning about complex refactoring, architectural decisions, and edge cases.</p><p><strong>Session Reliability</strong>: Earlier compaction prevents the context corruption loops that plagued previous versions.</p><p><strong>Cognitive Load</strong>: Developers spend less time fighting context management issues and more time building features.</p><p><strong>Cost Efficiency</strong>: Paradoxically, stopping earlier may reduce overall token consumption through fewer failed compaction attempts and fewer sessions requiring complete restarts.</p><h2>Practical Implications for Developers</h2><p>If this hypothesis is correct&#8212;that Claude Code performance improvements stem largely from more conservative context management&#8212;what should developers do?</p><p><strong>1. Stop Fighting Auto-Compact</strong></p><p>The old advice was to disable auto-compact and manage context manually. 
But Anthropic&#8217;s engineering guidance now concludes that &#8220;do the simplest thing that works will likely remain our best advice for teams building agents on top of Claude&#8221;. If Claude Code is now triggering compaction at reasonable thresholds, let it work.</p><p><strong>2. Use CLAUDE.md for Persistent Context</strong></p><p>Rather than cramming everything into conversation history, <a href="https://medium.com/@kushalbanda/claude-code-context-management-if-youre-not-managing-context-you-re-losing-output-quality-71c2d0c0bc57">&#8220;use a dedicated context file (like CLAUDE.md) to inject fundamental requirements every session. This is where core app features, tech stacks, and &#8216;never-forgotten&#8217; project notes live&#8221;</a>. This keeps stable information in a file that is reloaded every session, instead of buried in conversation history that compaction may summarize away.</p><p><strong>3. Leverage Subagents for Task Isolation</strong></p><p>The best practice is to &#8220;divide and conquer with sub-agents: modularize large objectives. Delegate API research, security review, or feature planning to specialized sub-agents&#8221;. Each subagent gets its own context window, so exploratory work doesn&#8217;t pile up in the main session and push it toward its limits.</p><p><strong>4. Monitor But Don&#8217;t Micromanage</strong></p><p>Tools like Claude MPM provide visibility into actual context usage, helping you understand when sessions are approaching limits. But the key is knowing <a href="https://www.arsturn.com/blog/beyond-prompting-a-guide-to-managing-context-in-claude-code">&#8220;when things get weird: is Claude getting confused or stuck in a loop? Don&#8217;t argue with it. Use /clear to reset its brain &amp; start fresh&#8221;</a>.</p><p><strong>5. Accept That Less Can Be More</strong></p><p>The hardest lesson: sometimes the path to better performance is artificial constraints. 
Stopping at 75% utilization feels wasteful&#8212;you&#8217;re leaving 50k tokens &#8220;unused.&#8221; But that free space enables the reasoning quality that makes the utilized tokens valuable.</p><h2>Infrastructure Over Features, Again</h2><p>This observation fits a pattern I&#8217;ve been documenting for months: the most important advances in AI development tools aren&#8217;t necessarily flashier features or bigger models. They&#8217;re better infrastructure.</p><p>Claude Code&#8217;s performance gains likely stem more from smarter context management, better memory systems, and more conservative resource utilization than from model improvements alone (though Anthropic did make Sonnet 4.5 substantially better at code generation).</p><p>Anthropic&#8217;s position is clear: <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">&#8220;waiting for larger context windows might seem like an obvious tactic. But it&#8217;s likely that for the foreseeable future, context windows of all sizes will be subject to context pollution and information relevance concerns&#8221;</a>. The solution isn&#8217;t more capacity; it&#8217;s better management of existing capacity.</p><p>This mirrors broader patterns in the AI tools market. Look at which tools developers stick with versus which ones generate initial excitement then fade. Cursor grabbed attention with its aggressive feature velocity, but many developers report returning to Claude Code for long-form work&#8212;not because of flashier features, but because sessions remain productive longer. 
The tools gaining staying power emphasize robust infrastructure for context management, memory persistence, error handling, and resource optimization over demo-ready feature lists.</p><h2>The Broader Lesson</h2><p>My Claude MPM observation&#8212;64% context usage when Claude Code reports 10% until auto-compact&#8212;suggests something important: current AI tool optimization isn&#8217;t about maximizing utilization. It&#8217;s about finding the sweet spot where resource constraints actually improve output quality.</p><p>This has implications beyond Claude Code:</p><p><strong>For Tool Developers</strong>: Consider whether your optimization target is the right one. Maximum throughput isn&#8217;t always optimal if it degrades quality.</p><p><strong>For Platform Providers</strong>: Infrastructure improvements that seem invisible to users (better context management, smarter resource allocation) often deliver more value than flashy feature additions.</p><p><strong>For Developers</strong>: Learn to work with constraints rather than fighting them. The tools that enforce reasonable limits may actually be helping you.</p><p><strong>For AI Research</strong>: The path to better AI assistance may involve more strategic limitations, not fewer.</p><h2>Conclusion</h2><p>I can&#8217;t definitively prove this hypothesis&#8212;that Claude Code&#8217;s performance improvements stem primarily from more conservative context management. The evidence is circumstantial: my Claude MPM observations showing the 64% vs 10% discrepancy, the completion buffer that now gives tasks room to finish before compacting, community reports of changed auto-compact behavior, Anthropic&#8217;s new context management features, and the well-documented relationship between free context and reasoning quality.</p><p>But the pattern is compelling enough to warrant attention. 
If the hypothesis holds, it represents a profound lesson about AI tool development: sometimes the best way to improve performance is constraining the system in smarter ways rather than pushing utilization limits.</p><p>The old approach: run until you can&#8217;t run anymore, then try to recover&#8212;often unsuccessfully. The new approach: stop early enough to maintain consistent quality throughout, with enough buffer space to complete tasks gracefully before resetting.</p><p>These changes to when auto-compact triggers and how much buffer space it preserves may explain why Claude Code feels noticeably better lately. Not just faster or smarter, but more reliable and consistent. The sessions that used to deteriorate halfway through now maintain quality to completion. The forced compactions that interrupted complex refactorings now happen at logical breakpoints.</p><p>It&#8217;s worth noting: improving AI tools sometimes means accepting what looks like inefficiency. That &#8220;unused&#8221; 25-35% of context isn&#8217;t wasted&#8212;it&#8217;s working memory that enables everything else to function properly. Infrastructure thinking applied to user experience, where the best system management becomes invisible because it prevents problems rather than recovering from them.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. 
For more on context management in AI development, read my analysis of <a href="https://hyperdev.matsuoka.com/p/carrying-context">Carrying Context</a> or explore <a href="https://github.com/bobmatnyc/claude-mpm">Claude-MPM&#8217;s approach to multi-agent orchestration</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[TkDD: Ticket-Driven Development and the Knowledge We’re Throwing Away]]></title><description><![CDATA[The value of the things we don't keep.]]></description><link>https://hyperdev.matsuoka.com/p/tkdd-ticket-driven-development-and</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/tkdd-ticket-driven-development-and</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 03 Dec 2025 15:03:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!45Y_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!45Y_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!45Y_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!45Y_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 848w, 
https://substackcdn.com/image/fetch/$s_!45Y_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!45Y_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!45Y_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31424,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/180149290?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!45Y_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 424w, 
https://substackcdn.com/image/fetch/$s_!45Y_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!45Y_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!45Y_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2a0faf4-9b3f-4bf0-b90d-904153c97376_1024x768.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<h2>TL;DR</h2><ul><li><p>Agentic coding sessions generate substantial contextual information&#8212;research, decisions, alternatives&#8212;that vanishes when the session ends</p></li><li><p>TDD captures behavior expectations; SDD captures requirements; neither captures the <em>evolution of thinking</em> as you figure things out</p></li><li><p>Ticket-Driven Development (TkDD) treats tickets as persistent knowledge containers for human-AI collaboration, not just task assignments</p></li><li><p>The workflow: Claude.AI builds specs &#8594; Linear captures them via MCP &#8594; coding agents pull work and write findings back via <a href="https://pypi.org/project/mcp-ticketer/">mcp-ticketer</a> &#8594; knowledge accumulates instead of evaporating</p></li><li><p>TkDD is the opposite of vibe coding: structured context that compounds over time</p></li></ul><div><hr></div><p>I&#8217;ve been thinking about everything we throw away.</p><p>Last week I spent four hours with Claude Code researching authentication approaches for a SmartThings integration. Evaluated OAuth flows, considered token refresh strategies, dug into the API documentation, tested three different implementation patterns. The session produced maybe 200 lines of actual code. But the <em>research</em>&#8212;the reasoning about why I chose approach A over approach B, the edge cases I discovered, the documentation inconsistencies I noted&#8212;that took ten times longer to develop than the code itself.</p><p>And it&#8217;s gone. Buried somewhere in a chat history I&#8217;ll never scroll back through. Two days later, a colleague asked why I didn&#8217;t use the SmartThings webhook approach. I couldn&#8217;t remember. I&#8217;d evaluated it&#8212;I was 90% sure I had a good reason for rejecting it&#8212;but the rationale had evaporated. 
Ended up spending another hour re-researching something I&#8217;d already figured out.</p><p>That keeps happening to me. And I suspect it happens to you too.</p><h2>The Knowledge Hemorrhage Problem</h2><p>Every agentic coding session bleeds information. You meta-prompt, the agent refines the prompt, then researches, you discuss, it proposes, you refine, it implements. Along the way you&#8217;re building context&#8212;understanding the problem space, eliminating dead ends, discovering constraints. That context is often more useful than the code itself.</p><p>But where does it go?</p><p>The code lands in a commit. Maybe you write a comment. The rest? Scattered across chat windows, lost in context limits, forgotten by tomorrow. <a href="https://elite-ai-assisted-coding.dev/p/working-with-asynchronous-coding-agents">Eleanor Berger describes the broader shift from &#8220;interactive AI&#8221; to &#8220;asynchronous agents&#8221;</a>, but even she focuses on the task delegation pattern, not the knowledge loss.</p><p>The irony gets me. We have these incredibly capable reasoning systems generating insights, and we&#8217;re treating their output like scratch paper. Use it once, toss it.</p><p>Even within a single project this gets painful. Three weeks into mcp-smarterthings, I needed to revisit the rate-limiting approach. Had I already evaluated exponential backoff versus fixed delays? What were the SmartThings API&#8217;s actual limits versus what their docs claimed? I&#8217;d done that research. Somewhere. In some chat window. On some day. 
I ended up re-deriving half of it from scratch because finding the original conversation would&#8217;ve taken longer than just figuring it out again.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ANi_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ANi_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!ANi_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!ANi_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!ANi_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ANi_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41306,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/180149290?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ANi_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!ANi_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!ANi_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!ANi_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb8aa972-d701-482d-90c8-bf74f7f06699_1024x768.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The Problem With TDD and SDD</h2><p>I don&#8217;t use Test-Driven Development. Conceptually elegant, sure&#8212;write the test first, watch it fail, make it pass. But TDD assumes you know what you&#8217;re building before you build it. When you&#8217;re working to a spec, great. When you&#8217;re figuring things out as you go? Too restrictive. You end up writing tests for behavior you&#8217;ll change three times before lunch.</p><p>Same problem with Spec-Driven Development. You can do the research and write the spec. But as they say, a plan is only good until you get punched in the face. The spec captures your <em>initial</em> understanding. It doesn&#8217;t capture how that understanding evolved when you hit the first unexpected constraint. Or the second. 
Or the fifth.</p><p>What both paradigms miss: <strong>the thought process and changes to it.</strong></p><p>That&#8217;s what I actually need when I come back to a project. Not the final answer&#8212;the path to it. The dead ends explored. The assumptions challenged. The &#8220;wait, that won&#8217;t work because...&#8221; moments. The pivots.</p><table><thead><tr><th>Paradigm</th><th>What It Captures</th><th>What It Loses</th></tr></thead><tbody><tr><td><strong>TDD</strong></td><td>Behavior expectations via tests</td><td>Research, decisions, context, evolution of thinking</td></tr><tr><td><strong>SDD</strong></td><td>Initial requirements and architecture</td><td>How understanding changed during implementation</td></tr><tr><td><strong>Vibe Coding</strong></td><td>Nothing structured</td><td>Everything&#8212;just vibes and prayers</td></tr><tr><td><strong>TkDD</strong></td><td>Work units + context + decisions + thinking evolution</td><td>Still figuring this out</td></tr></tbody></table><p>Tests document what the code should do. Specs document what you planned to build. Neither documents <em>how you figured out what to build</em>&#8212;which is exactly what you need when you come back in six months and can&#8217;t remember why you chose approach B over approach A.</p><h2>TkDD: Tickets as Knowledge Containers</h2><p>What I&#8217;ve been experimenting with lately: <strong>treating tickets as structured knowledge artifacts for human-AI collaboration, not just task assignments.</strong></p><p>A ticket can hold:</p><ul><li><p>The problem statement (not just &#8220;implement auth&#8221; but <em>why</em> and <em>what constraints</em>)</p></li><li><p>Research conducted (links, findings, dead ends identified)</p></li><li><p>Alternatives considered (and why they were rejected)</p></li><li><p>Decision made (with rationale)</p></li><li><p>How thinking evolved (initial approach &#8594; why it didn&#8217;t work &#8594; final approach)</p></li><li><p>Implementation notes (gotchas, edge cases discovered)</p></li><li><p>Links to related work (other tickets, PRs, documentation)</p></li></ul><p>Tickets persist. They&#8217;re searchable. 
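</p><p>One way to make the list above concrete is a small data model. What follows is a minimal TypeScript sketch under stated assumptions: the <code>KnowledgeTicket</code> type and the <code>recordPivot</code> and <code>renderTicket</code> helpers are hypothetical illustrations, not the schema or API of Linear, GitHub Issues, Jira, or mcp-ticketer. It shows two ideas from this post: a changed decision is appended to the ticket&#8217;s history instead of overwriting it, and the whole ticket renders as plain markdown that a person can read as easily as an agent.</p><pre><code>// Hypothetical TkDD ticket shape. Field names are illustrative only and do not
// mirror the schema of Linear, GitHub Issues, Jira, or mcp-ticketer.
interface Alternative {
  option: string;
  rejectedBecause: string;
}

interface Decision {
  choice: string;
  rationale: string;
}

interface KnowledgeTicket {
  title: string;
  problem: string;               // the why and the constraints, not just "implement auth"
  research: string[];            // links, findings, dead ends
  alternatives: Alternative[];   // options considered and why they were rejected
  decision?: Decision;           // current decision with its rationale
  evolution: string[];           // how the thinking changed over time
  implementationNotes: string[]; // gotchas and edge cases found while building
  related: string[];             // other tickets, PRs, documentation
}

// Replace the decision without losing history: the previous choice becomes a
// rejected alternative, and the pivot is appended to the evolution log.
function recordPivot(ticket: KnowledgeTicket, next: Decision, reason: string): KnowledgeTicket {
  const prev = ticket.decision;
  if (prev === undefined) {
    return { ...ticket, decision: next };
  }
  return {
    ...ticket,
    decision: next,
    alternatives: [...ticket.alternatives, { option: prev.choice, rejectedBecause: reason }],
    evolution: [...ticket.evolution, `${prev.choice} -> ${next.choice}: ${reason}`],
  };
}

// Render as plain markdown; empty sections are omitted.
function renderTicket(ticket: KnowledgeTicket): string {
  const section = (heading: string, items: string[]): string[] =>
    items.length === 0 ? [] : ["", `## ${heading}`, ...items.map((item) => `- ${item}`)];
  const decision = ticket.decision
    ? [`${ticket.decision.choice} (${ticket.decision.rationale})`]
    : [];
  return [
    `# ${ticket.title}`,
    "",
    ticket.problem,
    ...section("Research", ticket.research),
    ...section("Alternatives", ticket.alternatives.map((a) => `${a.option}: rejected because ${a.rejectedBecause}`)),
    ...section("Decision", decision),
    ...section("How thinking evolved", ticket.evolution),
    ...section("Implementation notes", ticket.implementationNotes),
    ...section("Related", ticket.related),
  ].join("\n");
}

// Usage with placeholder content (approach X, constraint Y, approach Z).
let ticket: KnowledgeTicket = {
  title: "Example ticket (placeholder)",
  problem: "Why this work matters and which constraints apply.",
  research: [],
  alternatives: [],
  decision: { choice: "approach X", rationale: "simplest option" },
  evolution: [],
  implementationNotes: [],
  related: [],
};
ticket = recordPivot(ticket, { choice: "approach Z", rationale: "avoids constraint Y" }, "discovered constraint Y");
console.log(renderTicket(ticket));</code></pre><p>Because the history is stored in the ticket as data rather than left in a chat log, such tickets keep the path to a decision, not just the decision.</p><p>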
They have natural hierarchy&#8212;epic &#8594; story &#8594; task maps cleanly to context &#8594; decision &#8594; implementation. They survive sessions, agents, team members.</p><p>The tooling is catching up to this idea. <a href="https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent">GitHub Copilot&#8217;s coding agent</a> now accepts GitHub Issues as input&#8212;you assign an issue to <code>@copilot</code> and it works autonomously. <a href="https://docs.devin.ai/integrations/linear">Devin integrates directly with Linear</a>, triggering work when you add a label. <a href="https://docs.port.io/guides/all/automatically-resolve-tickets-with-coding-agents/">Port.io documented an entire workflow</a> for routing Jira tickets through GitHub Issues to Copilot. <a href="https://deepsense.ai/blog/from-jira-to-pr-claude-powered-ai-agents-that-code-test-and-review-for-you/">deepsense.ai built what they call an &#8220;AI Teammate&#8221;</a> that reads Jira tickets and produces PRs.</p><p>The pattern is emerging. But most implementations focus on the <em>task execution</em> side&#8212;ticket goes in, PR comes out. 
They&#8217;re not capturing the knowledge generated along the way.</p><p>That&#8217;s the gap I&#8217;m trying to fill.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Us7P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Us7P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!Us7P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!Us7P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!Us7P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Us7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37946,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/180149290?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Us7P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!Us7P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!Us7P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!Us7P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2450d4c7-9a03-4991-9f8d-b222dc6a24be_1024x768.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>From aitrackdown to mcp-ticketer: The Human-AI Collaboration Insight</h2><p>I built aitrackdown as an AI-first ticketing system. The idea was straightforward: design a ticket structure specifically for AI agents to consume&#8212;structured fields, clear acceptance criteria, machine-readable context. And it worked. To a degree.</p><p>But here&#8217;s what I got wrong: <strong>tickets aren&#8217;t just for AI. They&#8217;re for human-AI interaction.</strong></p><p>The tooling that lets humans read and respond to tickets matters just as much as the tooling that lets agents process them. A ticket perfectly structured for Claude Code but unreadable by your PM is a failure. A ticket that captures agent findings but buries them in JSON blobs nobody will ever review? Also a failure.</p><p>That insight flipped my approach. 
I stopped trying to build <em>for</em> AI and started building <em>for the collaboration</em>. That&#8217;s when mcp-ticketer happened.</p><h2>The mcp-ticketer Approach</h2><p>mcp-ticketer works with multiple ticketing systems. Not because I couldn&#8217;t pick one, but because that&#8217;s where the work actually lives.</p><p>I use GitHub Issues to track reported problems&#8212;that&#8217;s where users file bugs, that&#8217;s where they should stay. Linear handles my personal projects because I love the interface and the keyboard shortcuts don&#8217;t make me want to throw my laptop. Client work? Some clients use Linear, others are Jira shops. You meet people where they are.</p><p>aitrackdown still exists in the stack. I rarely use it these days. The AI-first structure turned out to matter less than the human-AI collaboration layer on top.</p><p>The critical capability I built into mcp-ticketer: <strong>agents can write to tickets, not just read from them.</strong></p><p>This isn&#8217;t standard behavior in most integrations. The typical pattern is ticket-in, PR-out. mcp-ticketer lets an agent update the ticket as it works. When a coding agent hits a decision point, it can record what it learned. When it discovers an edge case, that goes into the ticket. When it rejects an approach, the reasoning gets captured. The ticket becomes a living document of the work&#8212;not just the assignment, but the execution.</p><p>More importantly: when your thinking changes, the ticket captures that evolution. &#8220;Started with approach X, but discovered Y constraint, pivoted to Z.&#8221; That&#8217;s the knowledge that disappears in every other workflow.</p><h2>The Workflow: Thinking and Doing, Separated</h2><p>Here&#8217;s how the pieces fit together in my current setup:</p><p><strong>I start in Claude.AI</strong>&#8212;the web interface, not Claude Code. This is deliberate. Claude.AI is for <em>thinking</em>. 
Researching approaches, discussing tradeoffs, building specifications. The Linear MCP connector lets me create tickets directly from the conversation.</p><p>A session might go like this:</p><ul><li><p>&#8220;Let&#8217;s figure out how to handle SmartThings device state synchronization&#8221;</p></li><li><p><em>[Research, discussion, alternatives considered]</em></p></li><li><p>&#8220;Create a Linear ticket capturing this approach&#8221;</p></li><li><p><em>[Ticket created with full context, not just a one-liner]</em></p></li></ul><p>The specification lives in the ticket. The research lives in the ticket. The decision rationale lives in the ticket.</p><p><strong>Then the coding agent takes over.</strong> Claude Code pulls work from tickets via mcp-ticketer. The ticket provides context&#8212;not just &#8220;implement sync&#8221; but the full specification, the constraints identified, the approach selected.</p><p>The agent works. When it hits decisions, it updates the ticket. When it discovers undocumented API behavior, that goes in the ticket. When the original approach doesn&#8217;t work and thinking evolves&#8212;<em>that gets captured too</em>. When it completes, the implementation notes go in the ticket.</p><p><strong>The result: knowledge that compounds.</strong> Next time I need to work on this codebase&#8212;or a similar one&#8212;the tickets are there. Searchable. Structured. I&#8217;m not starting from zero. I&#8217;m not re-researching things I already figured out.</p><p>The <a href="https://github.com/levnikolaevich/claude-code-skills">claude-code-skills repository</a> shows what this can look like at scale&#8212;29 production skills implementing full Agile automation with Linear, including Epic &#8594; Story &#8594; Task hierarchy management. 
That&#8217;s the direction: tickets as the coordination layer for AI-augmented development.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!puPx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!puPx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!puPx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!puPx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!puPx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!puPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/180149290?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!puPx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 424w, https://substackcdn.com/image/fetch/$s_!puPx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 848w, https://substackcdn.com/image/fetch/$s_!puPx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 1272w, https://substackcdn.com/image/fetch/$s_!puPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F492ad587-9573-471a-9730-38b66fa9b8b9_1024x768.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>mcp-smarterthings: Knowledge Capture in Action</h2><p>The <a href="https://linear.app/1m-hyperdev/project/mcp-smarterthings-89098cb0dd3c/overview">mcp-smarterthings project</a> became my testing ground for TkDD. SmartThings integration has enough complexity&#8212;OAuth, device capabilities, real-time events, state synchronization&#8212;that I knew I&#8217;d lose critical decisions if I didn&#8217;t capture them somewhere.</p><p>Here&#8217;s what ticket-captured knowledge actually looks like. During the implementation, the agent documented complete code samples for the SmartThings API integration patterns:</p><pre><code><code>// Example: Device capability handler pattern
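import { BearerTokenAuthenticator, SmartThingsClient } from '@smartthings/core-sdk';

// Assumes a SmartThings personal access token in SMARTTHINGS_TOKEN
const smartthings = new SmartThingsClient(
  new BearerTokenAuthenticator(process.env.SMARTTHINGS_TOKEN ?? '')
);

const handleCapability = async (deviceId: string, capability: string, componentId = 'main') =&gt; {
  const device = await smartthings.devices.get(deviceId);
  // Capability status is scoped to a device component ("main" on most devices)
  const status = await smartthings.devices.getCapabilityStatus(deviceId, componentId, capability);
  return { device, status };
};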
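
// A companion sketch for the "device commands" pattern the tickets cover.
// This helper is hypothetical (not taken from the project's tickets). It
// assumes the SmartThings REST endpoint POST /devices/{deviceId}/commands,
// whose body looks like { commands: [{ component, capability, command, arguments }] }.
// Keeping the body builder pure makes the pattern easy to unit-test offline.
function buildCommandBody(
  capability: string,
  command: string,
  args: unknown[] = [],
  component = 'main' // assumption: the capability lives on the "main" component
) {
  return { commands: [{ component, capability, command, arguments: args }] };
}

// Usage: turn a switch on, or set a dimmer to 40%
const switchOn = buildCommandBody('switch', 'on');
const dimTo40 = buildCommandBody('switchLevel', 'setLevel', [40]);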
</code></code></pre><p>These code samples were originally designed for the classic PM-to-engineer handoff. &#8220;Here&#8217;s what we need, here&#8217;s roughly how it should work, go build it.&#8221; But in a TkDD workflow, they serve a different purpose: <strong>persistent knowledge available for any future human or agent review.</strong></p><p>Six weeks from now, when I need to add a new capability handler? I don&#8217;t have to re-derive the pattern. The ticket has it. When a different agent picks up related work? Context is already there. When I&#8217;m explaining the architecture to a collaborator? I can point them to the ticket instead of recreating the explanation from memory.</p><p>The tickets in that Linear project contain:</p><ul><li><p>Initial research on SmartThings API versions and deprecation timelines</p></li><li><p>Decision rationale for choosing the new API over legacy endpoints</p></li><li><p>Code samples for common patterns (auth, device commands, event subscriptions)</p></li><li><p>Edge cases discovered during implementation</p></li><li><p>Links between related tickets showing how the architecture evolved</p></li></ul><p>That last point matters. The tickets aren&#8217;t isolated&#8212;they reference each other. You can trace how &#8220;implement basic device control&#8221; led to &#8220;handle rate limiting,&#8221; which led to &#8220;add request queuing,&#8221; which led to &#8220;implement webhook fallback.&#8221; The evolution of understanding is visible.</p><h2>Building Context, Not Burning It</h2><p>LLMs need context to be effective. That&#8217;s not news. But where does context come from?</p><p>Right now, mostly from re-explaining things every session. &#8220;This is a Next.js project, we&#8217;re using TypeScript, here&#8217;s the authentication pattern, here&#8217;s why we chose this approach...&#8221; Over and over.</p><p>TkDD builds a structured context base over time. The tickets contain the decisions. The tickets contain the rationale. 
The tickets contain the <em>evolution of thinking</em>&#8212;how you got from &#8220;I think we should do X&#8221; to &#8220;actually Y works better because...&#8221;</p><p>When you start a new session, you&#8217;re not starting from scratch&#8212;you&#8217;re starting with accumulated knowledge.</p><p>Pull in the relevant tickets. The agent has context. Not just &#8220;what to do&#8221; but &#8220;why we&#8217;re doing it this way&#8221; and &#8220;what we already tried&#8221; and &#8220;what constraints matter&#8221; and &#8220;how our understanding changed.&#8221;</p><p>Cross-project learning becomes possible too. Authentication patterns you figured out on project A? The tickets document the research&#8212;including the dead ends. When project B needs similar auth, you&#8217;re not re-deriving first principles. You&#8217;re not re-exploring the same dead ends.</p><h2>The Paradigm Claim</h2><p><strong>Test-Driven Development</strong>: Tests define expected behavior. Assumes you know the behavior upfront.</p><p><strong>Spec-Driven Development</strong>: Specifications define requirements. Assumes requirements survive contact with reality.</p><p><strong>Ticket-Driven Development</strong>: Tickets define work units AND capture how understanding evolves while doing the work. The ticket is both the input and the output. Built for human-AI collaboration, not just AI consumption.</p><p>TDD asks: &#8220;What should this code do?&#8221; (Assumes you know.) SDD asks: &#8220;What are we trying to build?&#8221; (Assumes the plan survives.) TkDD asks: &#8220;What do we know, what are we learning, and how is our thinking changing?&#8221;</p><p>Vibe coding treats every session as a fresh start. TkDD treats every session as a contribution to an accumulating knowledge base&#8212;one that captures not just conclusions, but the reasoning that got you there.</p><div><hr></div><p>I&#8217;m still working out the edges of this. 
The tooling is imperfect&#8212;mcp-ticketer exists because nothing else handled the multi-system reality of how I actually work. The workflow requires discipline that pure vibe coding doesn&#8217;t demand.</p><p>But the knowledge loss problem is real. I&#8217;ve wasted hours re-researching things I&#8217;d already figured out. I&#8217;ve made decisions twice because I couldn&#8217;t find where I&#8217;d made them the first time. I&#8217;ve watched context evaporate at the end of every session.</p><p>We can do better than that.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. For more on multi-agent workflows, see my analysis of <a href="https://hyperdev.substack.com/">Claude Code&#8217;s orchestration capabilities</a> or my deep dive into <a href="https://hyperdev.substack.com/">the knowledge management problem in AI development</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Every Claude.AI Tab You Open Gets Its Own "Server"]]></title><description><![CDATA[(Yes, really... I think. But I have evidence, so follow along!)]]></description><link>https://hyperdev.matsuoka.com/p/every-claudeai-tab-you-open-gets</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/every-claudeai-tab-you-open-gets</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Tue, 18 Nov 2025 15:02:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-mMl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!-mMl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-mMl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-mMl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-mMl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-mMl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-mMl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2518421,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179162524?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-mMl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-mMl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-mMl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-mMl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb596452d-657c-42be-84ea-017e159a4d2b_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">A Claude.AI webpage behaves like a (server) container</figcaption></figure></div>
<p>I noticed that a Claude.AI webpage behaved like a (server) container. That was surprising. Tested container isolation between Claude.AI tabs. Found different containers. Did not expect that.</p><p>Then checked the hostname in a fresh tab&#8212;before creating any artifacts or running anything. gVisor was already there. Already running.</p><p>That changed what I thought I understood about the architecture. 
Then I realized I was misunderstanding what that architecture actually costs.</p><h2>TL;DR</h2><ul><li><p><strong>Every Claude.AI page load allocates a gVisor sandbox</strong> with an isolated filesystem, its own process namespace, and a memory limit of up to 9GB&#8212;verified through repeated testing</p></li><li><p><strong>gVisor sandboxes &#8800; VMs</strong>: Lightweight userspace isolation running thousands per host, sharing base images, allocating memory on demand&#8212;not a dedicated 9GB per tab</p></li><li><p><strong>My initial cost estimates were probably 5-10x too high</strong>: Naive VM pricing (~$0.06/hour) dramatically overstates actual infrastructure costs for this architecture</p></li><li><p><strong>The UX bet remains real</strong>: Pre-allocation means no provisioning latency when you need a container, but actual unit economics are likely far better than my initial modeling suggested</p></li><li><p><strong>Still explains infrastructure investment</strong>: Even with better-than-VM economics, scale drives the need for owned infrastructure&#8212;just not as dramatically as first calculated</p></li></ul><h2>The Container Allocates on Page Load</h2><p>Two tabs. Different conversations. Tested the isolation:</p><pre><code><code># Tab 1
touch /tmp/session_test_22323.txt

# Tab 2
ls /tmp/session_test_22323.txt
# Doesn't exist
</code></code></pre><p>Different containers. That&#8217;s established.</p><p>Here&#8217;s what I missed: <strong>the container doesn&#8217;t allocate when you create an artifact or run bash commands.</strong></p><p>It allocates the moment you load the page.</p><p>Check the hostname in a fresh Claude.AI tab&#8212;before you&#8217;ve done anything:</p><pre><code><code>echo $HOSTNAME
# runsc
</code></code></pre><p>That&#8217;s gVisor: <code>runsc</code> is the name of gVisor&#8217;s container runtime binary. Already running. Just waiting.</p><p>Across repeated tests in my environment, this behavior is consistent. Every page load. Every time.</p>
<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Xib!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Xib!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!4Xib!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!4Xib!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!4Xib!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Xib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2076717,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179162524?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Xib!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!4Xib!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!4Xib!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!4Xib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc2b243c-4f7e-4c74-892d-58cc43057086_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>What&#8217;s Actually There (And What It Actually Means)</h2><p>Every single Claude.AI tab I&#8217;ve tested loads with:</p><pre><code><code>dpkg -l | wc -l
# 871 packages

du -sh /usr
# 4.8GB

free -h
# ~9GB total memory reported
</code></code></pre><p>Full Ubuntu environment. Complete with git, development tools, the works.</p><p>Not &#8220;if you use computer features.&#8221; Not &#8220;when you create artifacts.&#8221;</p><p>On page load.</p><p><strong>But here&#8217;s where I initially got the implications wrong.</strong></p><h2>What gVisor Actually Is (Not What I First Assumed)</h2><p>When you see <code>HOSTNAME=runsc</code> and <code>free -h</code> showing 9GB, it&#8217;s natural to think: &#8220;Each tab gets a dedicated VM with 9GB RAM and a full Ubuntu install.&#8221;</p><p>That&#8217;s not how gVisor works.</p><p><strong>gVisor is a userspace kernel</strong>, not a hypervisor:</p><ul><li><p>Intercepts syscalls from guest processes</p></li><li><p>Enforces isolation at the syscall boundary</p></li><li><p>Runs thousands of sandboxes per host</p></li><li><p>Shares most resources across sandboxes</p></li></ul><p><strong>The &#8220;9GB RAM&#8221; is a limit, not an allocation:</strong></p><ul><li><p>The sandbox can use <em>up to</em> 9GB if needed</p></li><li><p>Only pages actually touched consume real RAM</p></li><li><p>Most sessions touch a tiny fraction</p></li><li><p>The host only pays for what&#8217;s actually used</p></li></ul><p><strong>The &#8220;4.8GB filesystem&#8221; is shared:</strong></p><ul><li><p>Read-only Ubuntu base image mounted for thousands of sandboxes</p></li><li><p>Only per-sandbox writes go into small overlay layers</p></li><li><p>Shared image cached on each node</p></li><li><p>Most sessions touch only a small subset of it</p></li></ul><p>So when I initially modeled this as &#8220;VM-equivalent costs,&#8221; I was probably off by 5-10x.</p><h2>The Scale Math (Revised Understanding)</h2><p>Let me recalculate with a better understanding of the architecture.</p><p><strong>Conservative user assumptions:</strong></p><ul><li><p>5M monthly active Claude.AI users</p></li><li><p>Average 3 tabs per session</p></li><li><p>10 sessions 
monthly</p></li><li><p>2 hours average per session</p></li></ul><p><strong>What actually happens:</strong> Every tab load = one gVisor sandbox allocation. 5M &#215; 3 tabs &#215; 10 sessions = <strong>150M sandbox allocations monthly</strong></p><p><strong>Where my initial cost model went wrong:</strong></p><p>I used VM pricing (~$0.06/hour) as a proxy. That assumes:</p><ul><li><p>Dedicated compute resources per instance</p></li><li><p>Full memory allocation on provision</p></li><li><p>Traditional VM overhead</p></li></ul><p>gVisor sandboxes are fundamentally different:</p><ul><li><p>High-density multiplexing (thousands per host)</p></li><li><p>Memory allocated on demand, not reserved</p></li><li><p>Shared base images across all instances</p></li><li><p>Aggressive overcommit strategies</p></li></ul><p><strong>Actual infrastructure costs are probably:</strong></p><ul><li><p>5-10x lower than VM-equivalent pricing</p></li><li><p>Still significant at scale</p></li><li><p>Driven by peak concurrent sandboxes, not total hours</p></li><li><p>Dependent on idle timeout policies</p></li></ul><p>Even at 10x better economics than I initially modeled, you&#8217;re still looking at substantial infrastructure investment requirements. 
Just not the &#8220;barely breaking even on Pro users&#8221; story I first calculated.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FI-4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FI-4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!FI-4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!FI-4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!FI-4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FI-4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2364826,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179162524?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FI-4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!FI-4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!FI-4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!FI-4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa38d776-f820-4ee0-918e-dafe3c102201_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The Design Logic (Still Valid)</h2><p>From a technical perspective, the logic holds:</p><p><strong>The problem:</strong> You can&#8217;t predict when someone will create an artifact or run bash commands. Wait to provision? Add latency. Users notice.</p><p><strong>The solution:</strong> Pre-allocate on page load. Sandbox&#8217;s already there. When they create an artifact&#8212;instant.</p><p><strong>Trade-off:</strong> Pay for sandboxes whether users need them or not. Most sessions I&#8217;ve observed never use computer features. You&#8217;re still carrying that cost.</p><p>The difference: That cost is lower than I initially thought, but the architectural choice remains the same.</p><h2>The Warm Pool (Now Makes More Sense)</h2><p>That 1-minute uptime I keep seeing? 
Fresh from the pool.</p><p><strong>How it likely works:</strong></p><ol><li><p>Anthropic maintains a warm pool of pre-initialized sandboxes</p></li><li><p>Base Ubuntu image already mounted</p></li><li><p>Page load grabs one, adds overlay filesystem</p></li><li><p>Associates with your browser session</p></li><li><p>Returns to pool after timeout</p></li></ol><p><strong>Why there&#8217;s no provisioning latency:</strong></p><ul><li><p>Pool already running</p></li><li><p>Just binding + overlay setup</p></li><li><p>Normal page load times</p></li><li><p>Sandbox ready when needed</p></li></ul><p><strong>Why this is still expensive (but sustainable):</strong> With gVisor&#8217;s density, you can run thousands of sandboxes per host. But you still need:</p><ul><li><p>Enough hosts for peak concurrent sessions</p></li><li><p>Warm pool sized for typical tab patterns</p></li><li><p>Infrastructure to handle burst traffic</p></li></ul><p>The absolute numbers are lower than VM math suggests. The architectural complexity and scale requirements remain.</p><h2>The Unit Economics Question (Revised)</h2><p>Look at this from Anthropic&#8217;s perspective with better cost assumptions.</p><p><strong>If actual sandbox costs are 5-10x lower than VM pricing:</strong></p><p><strong>Pro user at $20/month:</strong></p><ul><li><p>Opens 30 tabs across 10 sessions</p></li><li><p>Each tab = 2 hours of sandbox time (60 sandbox-hours total)</p></li><li><p>Container costs: 60 hours &#215; $0.006-0.012/hour = $0.36-0.72 (not the $3.60 that $0.06/hour implies)</p></li><li><p>Plus LLM inference: $8-12</p></li><li><p>Total cost: $8.36-12.72</p></li></ul><p>Margins look healthier. Still tight for power users.</p><p><strong>Power users:</strong></p><ul><li><p>10+ tabs simultaneously</p></li><li><p>All-day sessions</p></li><li><p>Multiple daily sessions</p></li></ul><p>At 5-10x better economics: 10 tabs &#215; 8 hours &#215; $0.006-0.012 = $0.48-0.96 per session</p><p>20 sessions monthly: $9.60-19.20 in container costs (vs. 
$96 in my initial model)</p><p><strong>Free tier users:</strong> Still subsidized. Just not as dramatically.</p><h2>Why This Still Explains August</h2><p>The architecture helps explain behaviors I observed, even with the revised cost estimates:</p><p><strong>Scaling challenges:</strong></p><ul><li><p>Demand surge</p></li><li><p>Each user = multiple concurrent sandboxes</p></li><li><p>Pool capacity constraints</p></li><li><p>Coordination across hosts</p></li></ul><p><strong>Rate limiting context:</strong> Not just LLM tokens. Sandbox capacity matters. High-density multiplexing has limits.</p><p><strong>Infrastructure investment rationale:</strong> Even with better-than-VM economics, owned infrastructure makes sense:</p><ul><li><p>Optimize for this specific workload</p></li><li><p>Custom gVisor configurations</p></li><li><p>Eliminate cloud provider margins</p></li><li><p>Better control over density and overcommit</p></li></ul><p>The $50B investment still makes strategic sense. The immediate economic pressure just isn&#8217;t as severe as I first calculated.</p><h2>What Other Chat Interfaces Do</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YIKY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YIKY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!YIKY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!YIKY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YIKY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YIKY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2294686,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/179162524?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YIKY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!YIKY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!YIKY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!YIKY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e5124dc-3a4c-48aa-b53b-38a3e99c1b6e_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>ChatGPT:</strong> Code Interpreter runs in sandboxed Python (Docker containers). Documentation suggests persistent sessions tied to chat conversations rather than page loads, but OpenAI hasn&#8217;t published specific provisioning details.</p><p><strong>Gemini:</strong> Similar Python sandbox approach. Code execution is an optional feature that can be enabled via API or CLI flags, suggesting on-demand provisioning, though Google hasn&#8217;t detailed the exact architecture.</p><p><strong>Claude.AI:</strong> Full gVisor sandbox with Ubuntu environment. On page load. Whether you use it or not.</p><p>The capability difference remains significant. The cost differential is smaller than I initially thought, but the architectural complexity gap is real.</p><h2>The Architecture Trade-Off (Still Stands)</h2><p>Anthropic made a choice:</p><p><strong>Option A: Provision on demand</strong></p><ul><li><p>Lower cost (only pay when used)</p></li><li><p>Adds latency (users wait for provisioning)</p></li><li><p>Simpler infrastructure</p></li></ul><p><strong>Option B: Pre-allocate on page load</strong></p><ul><li><p>Higher cost (pay whether used or not)</p></li><li><p>No latency (already there)</p></li><li><p>More complex infrastructure</p></li></ul><p>They picked B. The bet on experience over cost efficiency remains.</p><p>The actual cost premium is probably smaller than VM math suggests. 
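</p><p>The premium in Option B is idle capacity: sandboxes that are running before anyone asks for them. The warm pool described earlier can be sketched in a few lines of Python to show that trade-off. This is a minimal illustration under stated assumptions, not Anthropic&#8217;s implementation. WarmPool, Sandbox, acquire and release are hypothetical names, and booting a gVisor sandbox is reduced to creating an object. A pool of size N pays for N idle sandboxes up front. Any page load beyond N falls back to the Option A path, a cold start the user waits for.</p><pre><code># Hypothetical sketch of Option B: a warm pool of pre-started sandboxes.
# WarmPool, Sandbox, acquire and release are illustrative names, not
# Anthropic's code; booting a real gVisor sandbox is reduced to making an object.
from collections import deque


class Sandbox:
    def __init__(self, sandbox_id):
        self.sandbox_id = sandbox_id
        self.session = None  # browser session currently bound to this sandbox


class WarmPool:
    def __init__(self, size):
        self.booted = 0       # sandboxes started so far
        self.cold_starts = 0  # page loads that had to wait for a boot
        self.bound = {}       # maps session id to its Sandbox
        # Pre-allocation: these run (and cost money) before any page load
        self.free = deque(self._boot() for _ in range(size))

    def _boot(self):
        # Stand-in for the slow step: starting a sandbox on the base image
        self.booted += 1
        return Sandbox(self.booted)

    def acquire(self, session):
        # Page load: bind a pre-started sandbox if one is free (no wait);
        # otherwise fall back to an on-demand boot, the Option A latency path
        if self.free:
            sandbox = self.free.popleft()
        else:
            self.cold_starts += 1
            sandbox = self._boot()
        sandbox.session = session
        self.bound[session] = sandbox
        return sandbox

    def release(self, session):
        # Session timed out: unbind the sandbox and return it to the pool
        # (a real system would also discard the per-session overlay filesystem)
        sandbox = self.bound.pop(session)
        sandbox.session = None
        self.free.append(sandbox)


pool = WarmPool(size=2)
for tab in ("tab-1", "tab-2", "tab-3"):
    pool.acquire(tab)
print(pool.cold_starts, len(pool.free))  # 1 0
pool.release("tab-1")
print(len(pool.free))  # 1</code></pre><p>In this sketch, pool size is the cost lever. A larger pool means fewer cold starts and more idle sandboxes to pay for, which is why warm pool sizing and peak concurrency drive the bill. Even this toy version needs session binding, release and sizing logic.</p><p>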
The infrastructure complexity and engineering investment required are just as high.</p><h2>What I Got Wrong (And Right)</h2><p><strong>What the testing showed accurately:</strong></p><ul><li><p>gVisor sandboxes allocated on page load</p></li><li><p>Separate isolation per tab</p></li><li><p>Full Ubuntu environment available</p></li><li><p>Zero-latency artifact/bash execution</p></li></ul><p><strong>What I initially misunderstood:</strong></p><ul><li><p>Sandbox costs &#8800; VM costs</p></li><li><p>&#8220;9GB RAM&#8221; is a limit, not an allocation</p></li><li><p>Filesystem is shared, not per-instance</p></li><li><p>Density changes economics dramatically</p></li></ul><p><strong>What remains true:</strong></p><ul><li><p>This architecture is more complex than competitors&#8217;</p></li><li><p>Pre-allocation strategy requires more infrastructure</p></li><li><p>Most users never touch container features</p></li><li><p>Scale drives the need for vertical integration</p></li></ul><h2>The Revised Bottom Line</h2><p>Based on repeated testing in my environment, every Claude.AI tab you open gets a gVisor sandbox with a full Ubuntu environment.</p><p>Not when you use computer features. On page load.</p><p><strong>The economics (revised understanding):</strong></p><ul><li><p>Still significant infrastructure investment</p></li><li><p>Probably 5-10x better than my initial VM-based modeling</p></li><li><p>Driven by peak concurrency and warm pool sizing</p></li><li><p>Requires sophisticated resource management</p></li></ul><p>This gives you capabilities no other chat interface provides. Complete isolation. Real development environment. Zero latency when you need it.</p><p>My initial cost estimates were probably wrong by an order of magnitude. 
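</p><p>A few lines of Python can recheck the revised container figures from the unit economics section. This minimal sketch uses the same assumptions as that section. The $0.06 per sandbox-hour baseline is inferred from the Pro-user numbers ($3.60 for 30 tabs at 2 hours each) rather than taken from any published price. The 5-10x discount is an assumption, not a measurement. The function names are illustrative.</p><pre><code># Recheck of the container-cost figures under stated assumptions.
# Assumption: baseline "VM" rate inferred from $3.60 for 30 tabs x 2 hours.
VM_RATE_PER_HOUR = 3.60 / (30 * 2)  # about $0.06 per sandbox-hour


def container_cost(tabs, hours_per_tab, sessions, rate):
    # Total sandbox-hours multiplied by an hourly rate
    return tabs * hours_per_tab * sessions * rate


def revised_range(tabs, hours_per_tab, sessions, factors=(10, 5)):
    # Low and high cost if sandboxes are 10x or 5x cheaper than the VM rate
    return tuple(
        round(container_cost(tabs, hours_per_tab, sessions, VM_RATE_PER_HOUR / f), 2)
        for f in factors
    )


# Pro user: 30 tabs in a month, 2 hours each
pro_initial = round(container_cost(30, 2, 1, VM_RATE_PER_HOUR), 2)
pro_revised = revised_range(30, 2, 1)

# Power user: 10 tabs x 8 hours per session, 20 sessions a month
power_initial = round(container_cost(10, 8, 20, VM_RATE_PER_HOUR), 2)
power_revised = revised_range(10, 8, 20)

print(pro_initial, pro_revised)      # 3.6 (0.36, 0.72)
print(power_initial, power_revised)  # 96.0 (9.6, 19.2)</code></pre><p>Dividing the baseline rate by 5 and by 10 reproduces both ranges. The order-of-magnitude correction therefore comes entirely from the assumed sandbox discount, not from any change in usage patterns.</p><p>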
My assessment of the architectural sophistication and strategic investment requirements still holds.</p><p>My working theory: Anthropic prioritized capability and UX, betting on gVisor&#8217;s density to make the economics work while still requiring substantial infrastructure investment for vertical integration benefits.</p><p>The question isn&#8217;t whether the architecture is sustainable&#8212;it probably is. The question is whether the capability advantage justifies the infrastructure complexity.</p><p>Every time you load a Claude.AI page, you&#8217;re triggering a sandbox allocation. The cost is lower than I first calculated. The architectural commitment is just as high.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. For more infrastructure insights, read my analysis of <a href="https://hyperdev.substack.com/">multi-agent orchestration costs</a> or my deep dive into <a href="https://hyperdev.substack.com/">AI development tool economics</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[I Tracked Every Token]]></title><description><![CDATA[What a $1.07 Bug Fix Reveals About AI Coding Economics]]></description><link>https://hyperdev.matsuoka.com/p/i-tracked-every-token</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/i-tracked-every-token</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 05 Nov 2025 15:00:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-M9c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!-M9c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-M9c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-M9c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-M9c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-M9c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-M9c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1270805,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/177651456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-M9c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!-M9c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!-M9c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!-M9c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab1e4677-cee0-427e-9d40-982792e2b889_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>I spent $1.07 and 8 minutes fixing a bug today that would have cost me $376 to outsource. That figure assumes an offshore developer at a typical $40/hour rate working roughly 9.4 hours, an estimate extrapolated from the agent execution time.</p><p>But here&#8217;s the thing about personal projects: my consulting time commands good rates, but for my own website? I&#8217;m not billing anyone. The comparison that matters isn&#8217;t theoretical cost savings&#8212;it&#8217;s $376 in actual engineering costs versus $1.07 in AI costs, plus 8 minutes of my attention versus the coordination overhead of managing another engineer. 
(I know the numbers are small, but bear with me; they&#8217;re meant to be illustrative.)</p><p>More importantly, this bug touched multiple system layers that would traditionally have required coordinating several specialists: RSS feed integration, data transformation, React rendering, and the deployment pipeline. I handled all of it solo using orchestrated AI agents.</p><p>Spending eight minutes to save roughly $375 and tackle work that normally requires team coordination shows why we&#8217;re seeing heightened interest in agentic coding tools&#8212;though my experience represents a best-case scenario worth examining critically.</p><p>I tracked every token. I have the receipts. And the session logs reveal something more interesting than just impressive cost savings: they show exactly which scenarios deliver real value, and which don&#8217;t.</p><h2>The Session: From Bug Report to Production</h2><p>Here&#8217;s what actually happened. I noticed my <a href="https://matsuoka.com/">personal website</a>&#8217;s blog integration was broken. 
Posts weren&#8217;t loading correctly, the feed was throwing errors, and the whole thing needed fixing before it became a bigger problem.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Vvl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Vvl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 424w, https://substackcdn.com/image/fetch/$s_!5Vvl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 848w, https://substackcdn.com/image/fetch/$s_!5Vvl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 1272w, https://substackcdn.com/image/fetch/$s_!5Vvl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Vvl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png" width="1140" height="710" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:710,&quot;width&quot;:1140,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:590117,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/177651456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Vvl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 424w, https://substackcdn.com/image/fetch/$s_!5Vvl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 848w, https://substackcdn.com/image/fetch/$s_!5Vvl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 1272w, https://substackcdn.com/image/fetch/$s_!5Vvl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b3e19fc-c8ee-4d14-8e3d-de2b08641da2_1140x710.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Four Agents</figcaption></figure></div>
<p>Rather than hiring an engineer and managing the work, I used <a href="https://github.com/bobmatnyc/claude-mpm">Claude-MPM</a> to orchestrate four AI agents in a single terminal tab:</p><p><strong>Research Agent</strong> - Investigated the codebase, identified the broken blog feed integration, analyzed the data flow from Substack RSS to the site&#8217;s display logic.</p><p><strong>Engineer Agent</strong> - Implemented the initial fix attempt, modified the feed fetching logic, updated the data transformation pipeline.</p><p><strong>Web QA Agent</strong> - Ran Playwright tests against the staging deployment, verified the fix worked, validated that all blog posts loaded correctly.</p><p><strong>React Engineer</strong> - Handled the root cause analysis when the first fix proved incomplete, restructured the 
entire blog integration architecture, ensured production-ready quality.</p><p>My actual work? Writing the initial prompt, monitoring progress in a single tab while working on four other projects, and occasionally redirecting agents when they went down wrong paths. Eight minutes of attention across an hour of wall-clock time while the agents executed in parallel.</p><p>This is my normal workflow now: Claude-MPM managing agent coordination in one tab while I context-switch between five active projects. The orchestration framework handles the parallel execution and agent handoffs. I provide strategic direction and quality oversight.</p><h2>The Economics: $1.07 Instead of $376 (Plus Coordination Overhead)</h2><p>Let&#8217;s break down what this work actually cost, with transparent assumptions:</p><p><strong>Token Consumption:</strong></p><ul><li><p>Total: 104,641 tokens (52.3% of Claude&#8217;s 200K context window)</p></li><li><p>Input tokens: ~41,856 (about 40% of the total: context, instructions, code)</p></li><li><p>Output tokens: ~62,785 (about 60% of the total: analysis, code generation, fixes)</p></li></ul><p><strong>Cost Breakdown at Claude Sonnet 4.5 rates:</strong></p><ul><li><p>Input cost: $0.13 ($3 per million tokens)</p></li><li><p>Output cost: $0.94 ($15 per million tokens)</p></li><li><p><strong>Total AI cost: $1.07</strong></p></li></ul><p><strong>My Time Investment:</strong></p><ul><li><p>Active oversight: 8 minutes of attention</p></li><li><p>Wall-clock time: ~60 minutes while agents worked in parallel</p></li><li><p><strong>Personal project time: Not billable to anyone</strong></p></li></ul><p><strong>Alternative (Hiring an Engineer):</strong></p><ul><li><p>Offshore/nearshore developer at a typical $40/hour rate (conservative market rate)</p></li><li><p>Estimated work: 9.4 hours (extrapolated from agent execution time plus debugging/testing)</p></li><li><p>Engineering cost: <strong>$376</strong></p></li><li><p>My coordination time: Minimum 1 hour for requirements, review, 
deployment</p></li><li><p><strong>Total traditional cost: $376 in hard costs plus coordination overhead</strong></p></li></ul><p>For personal projects where I can&#8217;t bill anyone, the relevant comparison is $376 in engineering costs versus $1.07 in AI costs. The dollar savings matter. But the time efficiency&#8212;eight engaged minutes for multi-system work&#8212;is what makes personal projects actually viable. Without agents, the site would have stayed broken indefinitely, because neither the cost of hiring nor the coordination overhead justified the fix.</p><h2>Beyond Cost: Capability Expansion</h2><p>The economics matter, but something else is happening: agentic tools expand what I can attempt without team support.</p><p>This blog integration issue touched multiple system layers&#8212;RSS feed parsing, data transformation, React component rendering, deployment pipeline. Five years ago, I would have needed multiple team members to handle this reliably: a backend engineer for the feed integration, a frontend specialist for the React components, a QA engineer to verify everything worked.</p><p>Now I orchestrate AI agents to handle all of it. This isn&#8217;t about replacing coding skill&#8212;I could write this code myself. 
It&#8217;s about managing complexity across multiple subsystems efficiently enough to be worth doing at all.</p><p>In practice, I&#8217;m now regularly tackling work that would have required team coordination:</p><ul><li><p>Building full-stack features solo that would have required 2-3 engineers</p></li><li><p>Maintaining multiple production systems without dedicated devops support</p></li><li><p>Implementing complex integrations that would have needed specialized expertise</p></li><li><p>Deploying changes across distributed systems with confidence</p></li></ul><p>I&#8217;ve seen similar patterns with other technical people, though the effectiveness varies by context:</p><ul><li><p><strong>Senior engineers</strong> shipping entire features without team support (when scope is well-defined)</p></li><li><p><strong>Technical founders</strong> building production systems solo during early stages (though architectural complexity still requires expertise)</p></li><li><p><strong>Staff engineers</strong> prototyping architectural changes across multiple services (within their existing systems)</p></li><li><p><strong>Technical product managers</strong> implementing fixes without engineering allocation (for straightforward issues)</p></li></ul><p>The bar for what technical people can attempt solo has shifted in some contexts&#8212;though complex projects still require specialized expertise and human oversight. Work that required team coordination and specialized roles can become feasible for individuals with good orchestration skills when the problem type fits.</p><h2>What the Token Timeline Reveals</h2><p>The session analytics show something interesting about how agentic work actually unfolds:</p><pre><code><code> 50K &#9508;&#9679; Initial context (project files, instructions)
     &#9474;
 60K &#9508;  &#9679;&#9679; Investigation phase (research agent)
     &#9474;
 70K &#9508;     &#9679;&#9679; First implementation attempts
     &#9474;
 80K &#9508;       &#9679; QA and deployment
     &#9474;
 90K &#9508;         &#9679;&#9679;&#9679; Root cause deep dive (largest spike)
     &#9474;
100K &#9508;              &#9679;&#9679; Cleanup &amp; analytics
     &#9474;
105K &#9508;                  &#9679; (current)
     &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;
     0   10   20   30   40   50   60 (minutes)
</code></code></pre><p>Nearly half the tokens (47.5%) went to initial context loading&#8212;establishing what the project was, how the code worked, and what needed fixing. Another 25% covered investigation and research. Only 15% went to actual implementation.</p><p>This distribution reveals the paradigm shift: I wasn&#8217;t coding. I was orchestrating intelligence to understand the problem, propose solutions, verify quality, and handle edge cases. The traditional developer workflow inverts when AI handles implementation while humans handle strategy.</p><h2>The Orchestration Paradigm: From Hiring to Directing</h2><p>Here&#8217;s what changed about how work gets done. Instead of four browser tabs managing agents manually, I use Claude-MPM to handle agent coordination in a single tab. My attention switches between five active projects while the framework manages parallel execution.</p><p>Here is how the work was laid out (tabs here refer to <a href="https://iterm2.com/">iTerm2</a> tabs; iTerm2 is my go-to terminal app):</p><p><strong>Single Claude-MPM Tab:</strong></p><ul><li><p>Research agent analyzing the codebase</p></li><li><p>Engineer implementing fixes</p></li><li><p>QA agent testing deployments</p></li><li><p>React specialist handling complex restructuring</p></li><li><p>All coordinated through the orchestration framework</p></li></ul><p><strong>Four Other Project Tabs:</strong></p><ul><li><p>Client work continuing in parallel</p></li><li><p>No context switching penalty</p></li><li><p>Framework handles agent handoffs and progress tracking</p></li></ul><p>My job shifted from implementation to strategic direction:</p><ul><li><p>Initial prompt: &#8220;Blog feed is broken, investigate Substack RSS integration&#8221;</p></li><li><p>Occasional redirects: &#8220;The initial fix didn&#8217;t address root cause&#8212;investigate data transformation layer&#8221;</p></li><li><p>Quality checkpoints: &#8220;Verify all historical posts load, not just recent 
ones&#8221;</p></li><li><p>Final verification: &#8220;Document architectural decisions&#8221;</p></li></ul><p>Traditional software development requires your full attention during execution. You write code, debug issues, test thoroughly, deploy carefully. Each step demands focus.</p><p>Orchestrated development enables parallel execution across multiple projects. The framework manages agents while you provide strategic oversight. Quality emerges from good specifications and periodic verification, not constant supervision.</p><p>This approach works particularly well for personal projects where coordination overhead would kill the project entirely. My website bug wasn&#8217;t worth hiring an engineer&#8212;the coordination time would have cost more than the fix was worth. Claude-MPM made it viable by requiring only eight minutes of my attention.</p><p>What made this so efficient: no hiring overhead at all, agents executing in parallel while I context-switched to other projects, and only eight minutes of total attention across an hour of wall-clock time.</p><h2>When the Numbers Don&#8217;t Work: The Long Tail Problem</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FJdM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FJdM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 424w, 
https://substackcdn.com/image/fetch/$s_!FJdM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 848w, https://substackcdn.com/image/fetch/$s_!FJdM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 1272w, https://substackcdn.com/image/fetch/$s_!FJdM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FJdM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png" width="1076" height="281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:1076,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112620,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/177651456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!FJdM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 424w, https://substackcdn.com/image/fetch/$s_!FJdM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 848w, https://substackcdn.com/image/fetch/$s_!FJdM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 1272w, https://substackcdn.com/image/fetch/$s_!FJdM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd25abe0-30e8-49a2-b099-66f2d59d8146_1076x281.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The Long Tail Reality</figcaption></figure></div><p>Here&#8217;s what I need to be honest about: this session represents a best-case scenario across both dimensions&#8212;cost savings and capability expansion. That value doesn&#8217;t materialize consistently.</p><p><strong>The Economics Long Tail:</strong></p><p>Success cases like this bug fix sit at one end of a distribution. At the other end are situations where AI assistance provides marginal value, no value, or negative value:</p><p><strong>Marginal gains (2-5x cost reduction, limited capability expansion):</strong></p><ul><li><p>Work requiring significant oversight (coordination overhead remains)</p></li><li><p>Complex debugging that would need senior engineers anyway</p></li><li><p>Projects where verification time approaches implementation time</p></li><li><p>Multi-system work where AI lacks critical domain context</p></li></ul><p><strong>No gains (1x or worse):</strong></p><ul><li><p>Work requiring deep expertise that AI fundamentally lacks</p></li><li><p>Legacy systems with undocumented architectural decisions</p></li><li><p>Security-critical code where verification exceeds any AI savings</p></li><li><p>Projects where coordination overhead was never the limiting factor</p></li><li><p>Genuinely novel problem spaces with no relevant training data</p></li></ul><p><strong>Negative value:</strong></p><ul><li><p>Fixing AI-generated bugs costs more than hiring the right engineer would have in the first place</p></li><li><p>AI confidently provides wrong solutions requiring extensive debugging</p></li><li><p>Management overhead increases rather than decreases</p></li><li><p>AI suggestions waste more 
engineering time than they save</p></li><li><p>Multi-system changes introduce subtle integration failures</p></li></ul><p>The distribution matters more than the peak. My bug fix achieved major cost savings (an estimated $376 of engineering time for $1.07 in AI costs) and demonstrated capability expansion (solo work spanning multiple systems). But across client work over the past three months, the aggregate value is more modest&#8212;typically 2-3x gains from acceleration rather than transformation.</p><p><strong>What determines where you land on this curve?</strong></p><p><strong>High-value scenarios (cost + capability):</strong></p><ul><li><p>Small projects where coordination overhead dominates</p></li><li><p>Personal work not worth hiring for at any price</p></li><li><p>Multi-system fixes with clear success criteria</p></li><li><p>Modern tech stacks matching AI training data</p></li><li><p>Straightforward verification and testing</p></li></ul><p><strong>Moderate-value scenarios (primarily acceleration):</strong></p><ul><li><p>Work within managed teams requiring oversight anyway</p></li><li><p>Complex problems needing senior judgment but benefiting from AI assistance</p></li><li><p>Projects where you&#8217;re coordinating engineers regardless</p></li><li><p>Incremental features in well-understood systems</p></li></ul><p><strong>Low-value scenarios:</strong></p><ul><li><p>Legacy systems requiring extensive human context</p></li><li><p>Novel problem spaces with no training data</p></li><li><p>Security- or performance-critical code demanding extensive review</p></li><li><p>Work where AI suggestions require more debugging than starting fresh</p></li><li><p>Political or organizational constraints on the implementation approach</p></li></ul><p>The heightened investor interest makes sense for the high-value category&#8212;a massive volume of small projects plus capability expansion for individual contributors. 
Market skepticism makes sense for larger managed work where you&#8217;re hiring teams regardless.</p><h2>The Token Efficiency Story</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o_kg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o_kg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 424w, https://substackcdn.com/image/fetch/$s_!o_kg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 848w, https://substackcdn.com/image/fetch/$s_!o_kg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 1272w, https://substackcdn.com/image/fetch/$s_!o_kg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o_kg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png" width="1136" height="688" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:688,&quot;width&quot;:1136,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:392952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/177651456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o_kg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 424w, https://substackcdn.com/image/fetch/$s_!o_kg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 848w, https://substackcdn.com/image/fetch/$s_!o_kg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 1272w, https://substackcdn.com/image/fetch/$s_!o_kg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea3eee24-dfe7-4993-84b5-1eff050fb879_1136x688.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Something interesting emerged from analyzing the session: token efficiency mattered more than raw capability.</p><p>Initial context loading consumed nearly 50,000 tokens&#8212;establishing project structure, reading configuration files, understanding the codebase. This overhead is largely fixed: it would be about the same whether I were fixing a trivial bug or implementing a complex feature.</p><p>For small tasks, that overhead dominates. A 5-line change doesn&#8217;t justify 50,000 tokens of context establishment. The productivity multiplier approaches 1x or worse.</p><p>For complex tasks spanning multiple files and subsystems, the overhead amortizes. 
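</p><p>The token figures in this section are mutually consistent, and the arithmetic is easy to check. Below is a minimal Python sketch that assumes the 47.5% / 25% / 15% split quoted earlier applies to the tokens the session actually used (52.3% of the 200,000-token window). The <code>overhead_share</code> helper and the 2,000-token &#8220;small task&#8221; are hypothetical, included only to illustrate why a fixed setup cost swamps small jobs.</p><pre><code># Sketch: recompute the session's token budget from the quoted percentages.
# Assumption: the phase shares are fractions of the tokens actually used.
CONTEXT_WINDOW = 200_000   # tokens available in the session
WINDOW_USED = 0.523        # fraction of the window the session consumed

PHASE_SHARES = {
    "initial context loading": 0.475,
    "investigation and research": 0.25,
    "implementation": 0.15,
}


def token_breakdown(window, used_fraction, shares):
    """Turn percentage shares into approximate token counts."""
    total = round(window * used_fraction)
    counts = {phase: round(total * share) for phase, share in shares.items()}
    # Whatever the three named phases don't cover is not itemized in the text.
    counts["remainder (not itemized)"] = total - sum(counts.values())
    return total, counts


def overhead_share(context_tokens, work_tokens):
    """Hypothetical helper: fraction of a session spent on fixed context setup."""
    return context_tokens / (context_tokens + work_tokens)


total, counts = token_breakdown(CONTEXT_WINDOW, WINDOW_USED, PHASE_SHARES)
print(f"tokens used: {total:,}")
for phase, n in counts.items():
    print(f"  {phase}: {n:,}")

# Hypothetical small task: ~50,000 tokens of setup for ~2,000 tokens of real work.
print(f"small-task overhead: {overhead_share(50_000, 2_000):.0%}")
# This session: context loading versus everything else it spent.
ctx = counts["initial context loading"]
print(f"this session's overhead: {overhead_share(ctx, total - ctx):.1%}")
</code></pre><p>With these inputs, context loading comes out at about 49,700 tokens, matching the &#8220;nearly 50,000&#8221; above. The same fixed setup cost would be roughly 96% of the hypothetical small task but under half of this session.</p><p>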
My 8.1x multiplier came from a bug fix that required:</p><ul><li><p>Analyzing 8 different files</p></li><li><p>Understanding data flow across 3 system layers</p></li><li><p>Implementing changes in 4 locations</p></li><li><p>Verifying behavior across multiple post types</p></li><li><p>Documenting architectural decisions</p></li></ul><p>The initial context cost was the same, but the work it enabled was worth far more.</p><p><strong>Token efficiency insights:</strong></p><ul><li><p><strong>Single-session work:</strong> Avoiding context restarts saved ~50,000 tokens</p></li><li><p><strong>Targeted analysis:</strong> Research agent methodology minimized exploratory waste</p></li><li><p><strong>Parallel processing:</strong> Multiple files analyzed simultaneously without token duplication</p></li><li><p><strong>Smart caching:</strong> Repeated file access handled efficiently</p></li></ul><p>The session used 52.3% of the available 200,000-token context window (about 104,600 tokens): enough to complete the work in one session while leaving plenty of headroom. 
A goldilocks utilization rate.</p><h2>Why Investors Are Interested (And Where Skepticism Is Warranted)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FlTc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FlTc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 424w, https://substackcdn.com/image/fetch/$s_!FlTc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 848w, https://substackcdn.com/image/fetch/$s_!FlTc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 1272w, https://substackcdn.com/image/fetch/$s_!FlTc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FlTc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png" width="1140" height="512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:1140,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:276623,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/177651456?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FlTc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 424w, https://substackcdn.com/image/fetch/$s_!FlTc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 848w, https://substackcdn.com/image/fetch/$s_!FlTc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 1272w, https://substackcdn.com/image/fetch/$s_!FlTc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8bbf21-d18f-4a9f-9bb4-0aa13f2db3af_1140x512.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The funding activity around AI coding tools&#8212;<a href="https://techcrunch.com/2024/08/07/augment-raises-227m-series-c/">Augment&#8217;s $227M</a> (enterprise-focused coding assistant), <a href="https://techcrunch.com/2024/02/06/magic-dev-raises-117m/">Magic&#8217;s $320M</a> (autonomous coding agents), <a href="https://techcrunch.com/2024/10/09/codeium-raises-150m-series-c/">Codeium&#8217;s $150M</a> (AI code acceleration platform)&#8212;reflects recognition of two distinct value propositions: cost savings on small projects and capability expansion for individual technical contributors.</p><p><strong>Where the investment thesis makes sense:</strong></p><p>The addressable market appears substantial across two dimensions:</p><ol><li><p><strong>Cost savings market</strong>: Small projects, personal sites, maintenance 
tasks&#8212;all the work below the &#8220;worth hiring for&#8221; threshold&#8212;suddenly become economically viable. Millions of these decisions happen daily.</p></li><li><p><strong>Capability expansion market</strong>: Technical people at various levels handling multi-system work solo that traditionally required team coordination. Senior engineers, staff engineers, technical founders, technical PMs&#8212;all expanding what they can accomplish independently within certain contexts.</p></li></ol><p>Both markets show real, measurable value when conditions align. Token costs decline while capabilities improve. The technology enables both cost arbitrage and skill amplification simultaneously.</p><p><strong>Where caution is warranted:</strong></p><p>Neither value proposition scales uniformly to all software work. Personal project economics don&#8217;t apply when you&#8217;d hire and manage teams regardless. Capability expansion has limits&#8212;complex projects still require specialized expertise and human oversight. Context windows constrain problem scope. Verification overhead grows with project complexity.</p><p>The peak performance cases&#8212;like my $376 saved, 8-minute fix spanning multiple system layers&#8212;aren&#8217;t representative of median experience. Legacy systems, ambiguous requirements, and genuinely novel problems still resist automation.</p><p>My website bug represents the sweet spot: clear problem, modern stack, straightforward solution, multi-layer complexity. It demonstrates both cost savings (work that wouldn&#8217;t happen) and capability expansion (work that would have required team support).</p><p>Companies that can deliver consistent value across both use cases without overpromising tend to build sustainable businesses. 
Companies that only optimize for one scenario or suggest they can replace entire engineering organizations risk underdelivering on expectations.</p><h2>What This Means for Practitioners</h2><p>The session analytics reveal a practical framework for when to employ agentic coding tools across two distinct value dimensions:</p><p><strong>Cost savings scenarios (personal projects, small fixes):</strong></p><ul><li><p>Projects too small to justify hiring (personal sites, side projects, maintenance)</p></li><li><p>Coordination overhead would exceed implementation value</p></li><li><p>Clear specifications and straightforward verification</p></li><li><p>Multiple independent tasks can run in parallel</p></li><li><p>Modern tech stacks with good AI training coverage</p></li></ul><p><strong>Capability expansion scenarios (individual technical work):</strong></p><ul><li><p>Multi-system work traditionally requiring team coordination</p></li><li><p>Full-stack features spanning frontend, backend, and infrastructure</p></li><li><p>Complex integrations across distributed systems</p></li><li><p>Prototyping architectural changes before team involvement</p></li><li><p>Maintenance across multiple production systems</p></li></ul><p><strong>Both scenarios benefit from:</strong></p><ul><li><p>Good orchestration frameworks (like Claude-MPM)</p></li><li><p>Clear problem specifications</p></li><li><p>Ability to verify results independently</p></li><li><p>Iterative refinement workflows</p></li></ul><p><strong>Deploy tactically when:</strong></p><ul><li><p>Accelerating work within managed teams (2-3x gains typical)</p></li><li><p>Augmenting rather than replacing hiring decisions</p></li><li><p>Clear subtasks within larger projects requiring human judgment</p></li><li><p>Prototyping before committing team resources</p></li></ul><p><strong>Avoid or minimize when:</strong></p><ul><li><p>Project complexity demands senior engineering judgment regardless</p></li><li><p>Legacy systems requiring 
extensive human context</p></li><li><p>Security or performance criticality mandates extensive review</p></li><li><p>Verification overhead approaches or exceeds implementation savings</p></li></ul><p><strong>The key questions aren&#8217;t just &#8220;Can AI do this work?&#8221; but rather:</strong></p><ol><li><p><strong>Cost dimension</strong>: Would coordination overhead kill this project if I had to hire someone for it?</p></li><li><p><strong>Capability dimension</strong>: Does this require coordination across specialists I don&#8217;t have access to?</p></li></ol><p>If either answer is yes&#8212;small projects, personal work, or multi-system complexity requiring team coordination&#8212;the value can be compelling. If you&#8217;d hire and manage a focused team anyway, the gains are more modest but still meaningful.</p><h2>The Honest Bottom Line</h2><p>My $1.07 bug fix demonstrates two valuable aspects of agentic coding tools: $376 saved in engineering costs (based on $40/hour offshore rates for 9.4 hours of work) and the expanded capability to handle, solo, multi-layer system work that would otherwise have required team support.</p><p>The first value&#8212;cost savings on small projects&#8212;applies specifically to work that falls below the &#8220;worth hiring for&#8221; threshold. The second value&#8212;capability expansion&#8212;applies, in certain contexts, to technical people in many roles who attempt work that would traditionally have required coordination across specialized team members.</p><p>The recent funding activity makes sense when you consider both markets: the massive volume of small projects that become viable, and the higher bar for what technical people can attempt independently in favorable conditions. 
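</p><p>On the small-project side, the per-task economics are simple arithmetic. The sketch below uses only figures quoted in this article: a $40/hour offshore rate, an estimated 9.4 hours of engineering work, and the $1.07 the session cost. The rate and the hour estimate are assumptions rather than measured costs, so treat the output as illustrative.</p><pre><code># Sketch: the bug-fix cost comparison, using figures quoted in the article.
# The hourly rate and the hour estimate are assumptions, not measured costs.
OFFSHORE_RATE_USD_PER_HOUR = 40.00
ESTIMATED_ENGINEER_HOURS = 9.4
SESSION_COST_USD = 1.07  # what the AI-assisted session cost

engineering_cost = OFFSHORE_RATE_USD_PER_HOUR * ESTIMATED_ENGINEER_HOURS
net_savings = engineering_cost - SESSION_COST_USD
cost_ratio = engineering_cost / SESSION_COST_USD

print(f"estimated engineering cost: ${engineering_cost:,.2f}")
print(f"net savings after session cost: ${net_savings:,.2f}")
print(f"engineering cost vs. session cost: {cost_ratio:,.0f}x")
</code></pre><p>The &#8220;$376 saved&#8221; headline is the gross engineering estimate. Net of the session cost it is about $374.93, and the estimated engineering cost is roughly 350 times what the session cost.</p><p>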
Market skepticism makes sense when you focus on complex work requiring team oversight regardless of tooling.</p><p>Understanding which scenario you&#8217;re in determines what value you&#8217;ll see:</p><p><strong>Personal projects and small fixes</strong>: Cost savings ($376 vs $1.07) and time efficiency (8 minutes vs coordination overhead) make previously unviable work viable.</p><p><strong>Capability expansion</strong>: Technical people at various levels&#8212;senior engineers, staff engineers, technical founders, technical PMs&#8212;attempting full-stack or multi-system work that would have required team coordination, when the scope and context align.</p><p><strong>Larger managed projects</strong>: More modest 2-3x acceleration where you&#8217;re managing teams anyway.</p><p>For my website bug fix? Both values delivered&#8212;work got done that otherwise would have stayed broken, and I handled complexity that would have required multiple specialists. For larger client work? The primary value comes from acceleration rather than enablement.</p><p>Both scenarios represent real value. Both are happening across the industry. The difference is knowing which applies to your situation and what you&#8217;re actually optimizing for.</p><div><hr></div><p><em>I&#8217;m Bob Matsuoka, writing about agentic coding and AI-powered development at <a href="https://hyperdev.substack.com/">HyperDev</a>. The orchestration framework used in this article is <a href="https://github.com/bobmatnyc/claude-mpm">Claude-MPM</a>, my open-source multi-agent project management system. 
For more on orchestration strategies, read my analysis of <a href="https://hyperdev.matsuoka.com/p/multi-agent-ai-orchestration-in">multi-agent coordination patterns</a> or my comparison of <a href="https://hyperdev.matsuoka.com/p/claude-orchestrator-parallel-ai">orchestration frameworks</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Semantic Searching Is Also Good For Visualizations]]></title><description><![CDATA[As it turns out.]]></description><link>https://hyperdev.matsuoka.com/p/semantic-searching-is-also-good-for</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/semantic-searching-is-also-good-for</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Fri, 31 Oct 2025 14:04:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/177628203/7eaae91f8806df1d0566644d6642da2a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>I believe Augment Code has the best context game in town. Much of that comes down to its great semantic code search engine.</p><p>MCP Vector Search helps if you&#8217;re using Claude Code. It builds semantic embeddings of text and AST-based embeddings of many kinds of code.</p><p>As it turns out, that is also an interesting way to build visualizations of your codebase. 
<a href="https://github.com/bobmatnyc/mcp-vector-search">Check it out</a>!</p><pre><code>pipx install mcp-vector-search</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AMEu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AMEu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 424w, https://substackcdn.com/image/fetch/$s_!AMEu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 848w, https://substackcdn.com/image/fetch/$s_!AMEu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!AMEu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AMEu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png" width="1578" height="1056" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1056,&quot;width&quot;:1578,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249640,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/177628203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fad4be-b1d3-4c01-981b-633b86fa0344_1578x1125.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AMEu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 424w, https://substackcdn.com/image/fetch/$s_!AMEu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 848w, https://substackcdn.com/image/fetch/$s_!AMEu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 1272w, https://substackcdn.com/image/fetch/$s_!AMEu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5deb977-5975-4031-a3a1-2a76b7a3f753_1578x1056.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Claude.AI’s quiet revolution in artifact editing]]></title><description><![CDATA[Huge improvements rolled out without fanfare]]></description><link>https://hyperdev.matsuoka.com/p/claudeais-quiet-revolution-in-artifact</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/claudeais-quiet-revolution-in-artifact</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Fri, 24 Oct 2025 14:02:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!O4pD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link 
image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O4pD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O4pD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!O4pD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!O4pD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!O4pD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O4pD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1620481,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/176888960?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O4pD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!O4pD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!O4pD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!O4pD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae1ed64-ed04-430a-85ea-0d8dbe8b21b8_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Claude.AI rolled out a significant but undocumented improvement to its editor system around <strong>October 23-24, 2025</strong>, delivering <strong>3-4x faster artifact updates</strong> through inline text replacement instead of full code regeneration. Multiple users independently discovered the change during the same period, coinciding with the improved Claude 3.5 Sonnet release, though Anthropic has published no official announcement or documentation about the feature.</p><h2>What actually changed and when it happened</h2><p>The most impactful improvement&#8212;discovered by users rather than announced&#8212;transforms how Claude updates artifacts. Previously, any modification required regenerating the entire artifact, even for single-line changes. 
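</p><p>To make the contrast concrete, here is a minimal Python sketch of the three operations users observed (&#8220;create,&#8221; &#8220;update,&#8221; and &#8220;rewrite&#8221;). In this sketch, <code>update</code> enforces the exact-once matching rule users describe. The <code>ArtifactStore</code> class and its methods are hypothetical illustrations, not Anthropic&#8217;s implementation or API.</p><pre><code class="language-python"># Hypothetical sketch of the "create", "update", and "rewrite" operations
# users observed in network traffic; not Anthropic's actual implementation.


class ArtifactStore:
    """In-memory store mapping artifact ids to their current source text."""

    def __init__(self):
        self.artifacts = {}

    def create(self, artifact_id, content):
        # New artifact: store the full text as-is.
        self.artifacts[artifact_id] = content

    def update(self, artifact_id, old_str, new_str):
        # Targeted edit: old_str must occur exactly once, whitespace
        # included, so an ambiguous target is rejected rather than guessed.
        if not old_str:
            raise ValueError("old_str must not be empty")
        source = self.artifacts[artifact_id]
        matches = source.count(old_str)
        if matches != 1:
            raise ValueError(f"expected exactly one match, found {matches}")
        self.artifacts[artifact_id] = source.replace(old_str, new_str, 1)

    def rewrite(self, artifact_id, content):
        # Substantial change: regenerate and replace the whole artifact.
        self.artifacts[artifact_id] = content


store = ArtifactStore()
store.create("demo", "h1 { color: blue; }\np { color: blue; }\n")
try:
    store.update("demo", "color: blue;", "color: red;")  # ambiguous target
except ValueError as err:
    print(err)  # expected exactly one match, found 2
store.update("demo", "p { color: blue; }", "p { color: red; }")
print(store.artifacts["demo"])
</code></pre><p>Keeping <code>update</code> strict costs a little convenience but adds safety. An ambiguous target fails loudly instead of silently changing the wrong line, and the caller can fall back to <code>rewrite</code>.</p><p>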
Now Claude intelligently chooses between <strong>targeted inline edits</strong> for small changes and full rewrites only when necessary. Developer Rui Quintino, who documented his discovery in a detailed Medium article, observed: &#8220;What used to need a complete rebuild now happens almost instantly. In this simple change, a 3-4x reduction in waiting time.&#8221;</p><p>The timing aligns precisely with the <strong>Claude 4.5 Sonnet refresh</strong> released October 23-24, 2025. Multiple Reddit threads appeared simultaneously in r/ClaudeAI with titles like &#8220;Claude Diff Editor! Part of upgrade to Artifacts&#8221; and &#8220;Claude is now editing my artifacts (new feature?).&#8221; Users inspecting the system with browser developer tools found new network patterns showing <strong>distinct commands for &#8220;create,&#8221; &#8220;update,&#8221; and &#8220;rewrite&#8221;</strong> operations.</p><p>The update mechanism relies on exact string matching. The text each update targets must appear exactly once in the artifact, and whitespace and formatting must match character for character. The system works across <strong>all artifact types</strong>: React components, Markdown documents, SVG graphics, Mermaid diagrams, and HTML pages. Users report that the live preview changes as each update applies, which makes the process much easier to follow.</p><h2>Official announcements that did happen</h2><p>While the inline editing improvement went undocumented, Anthropic made several official editor-related announcements between late September and October 2025:</p><p><strong>Claude Code VS Code Extension</strong> (September 29, 2025) introduced native IDE integration with inline diff displays, real-time change visibility through a dedicated sidebar panel, and checkpoint functionality that automatically saves code state before each change. 
The extension provides <strong>automatic context sharing</strong>: it knows which files are open and what code is highlighted, and it sees diagnostics from linters and language servers. Users can rewind to previous versions by tapping Escape twice or using the <code>/rewind</code> command.</p><p><strong>File creation capabilities</strong> (October 21, 2025) expanded Claude&#8217;s reach beyond text-based outputs, enabling creation and editing of Excel spreadsheets, Word documents, PowerPoint presentations, and PDFs directly in Claude.ai. The results are real downloadable files, not just artifacts, and can be saved directly to Google Drive. The feature operates through a private sandboxed environment where Claude writes and executes code, though users receive warnings that internet access &#8220;may put your data at risk.&#8221;</p><p><strong>Agent Skills</strong> (October 16, 2025) introduced lightweight instruction sets for consistent task performance, with pre-built skills for PowerPoint, Excel, Word, and PDF files. The <strong>Analysis tool</strong> received enhancements supporting targeted edits within artifacts, advanced mathematical operations through math.js, and high-precision calculations.</p><h2>How users discovered and tested the improvements</h2><p>The feature surfaced through collective user investigation rather than an announcement. Rui Quintino noticed changes while building a Halloween game: &#8220;Changes were happening much faster than usual. After testing on both mobile and desktop, the improvement was clear: several artifact updates were being edited inline, instead of forcing full regeneration of the full code.&#8221;</p><p>He then inspected the browser&#8217;s network traffic, which showed Claude choosing between different update types. His tests showed <strong>live preview changes visible in real time</strong> as updates applied, along with much lower computational waste and token usage. He wrote: &#8220;Lower computational resources and token usage. 
More artifacts and Claude usage possible within the same message limits.&#8221;</p><p>Multiple power users contributed additional observations. Simon Willison, a longtime developer who writes extensively about LLMs, documented building numerous interactive applications in single sessions&#8212;YAML-to-JSON converters, pricing calculators, recording tools, and text editors&#8212;all benefiting from the faster iteration cycles. Developer teams at companies like Builder.io and Puzzmo reported similar experiences with improved reliability and speed.</p><h2>Problems the updates addressed</h2><p>The improvements directly tackle four major pain points users experienced with the previous artifact system:</p><p><strong>Full code regeneration waste</strong>: Every edit previously required regenerating the entire artifact, consuming excessive tokens and time even for single-character changes. The inline replacement mechanism now handles <strong>targeted updates for small modifications</strong> while reserving full rewrites for substantial restructuring.</p><p><strong>Imprecise editing workflows</strong>: Users previously had to describe changes in chat or copy-paste code sections, an approach prone to misunderstanding. The highlight-and-edit feature introduced earlier (enhanced in October) allows precise selection and modification of specific code sections, with &#8220;Improve&#8221; and &#8220;Explain&#8221; options for targeted changes.</p><p><strong>Slow update cycles</strong>: The 3-4x speed improvement cuts the wait for minor tweaks, enabling &#8220;changes that feel more natural, like working with a code editor&#8221; with &#8220;less context switching between chat and preview.&#8221;</p><p><strong>Poor process visibility</strong>: Users couldn&#8217;t see what Claude was actually changing during updates. 
The new system shows multiple update commands in sequence, with live preview changes visible as they apply, providing transparency previously absent from the editing process.</p><h2>Community response and developer adoption</h2><p>Sentiment across Reddit, Twitter, Medium, and developer forums is <strong>overwhelmingly positive</strong>, tempered by frustration about the lack of official documentation. The r/ClaudeAI community actively discusses the improvements, with users sharing testing results and technical observations. Multiple sources say they learned about the feature from other users rather than through official channels.</p><p>Developer teams report substantial workflow improvements. A Builder.io developer stated: &#8220;I&#8217;ve gone deep down the rabbit hole on every Cursor power feature... And I&#8217;ve abandoned it all for Claude Code.&#8221; The team notes that Claude Code successfully handles <strong>18,000-line React components</strong> where competing tools struggle, and praises its codebase navigation and pattern recognition.</p><p>The Puzzmo engineering team, after six weeks with Claude Code, reported &#8220;constantly trashing usual estimation timings&#8221; and noted the tool freed developers from the &#8220;anxiety of the first step in programming.&#8221; They use Claude Code with monorepos and GitHub Actions, having it respond to PR comments and fix CI errors automatically.</p><p>However, persistent issues remain. Usage limits are still the <strong>#1 complaint, even among paid users</strong>, with Pro subscribers reporting &#8220;Opus blocked&#8221; messages mid-project. 
Platform stability issues appear during peak hours, and feature rollout remains inconsistent&#8212;some older chat sessions lack the fast update mechanism entirely.</p><h2>What users are still requesting</h2><p>The community maintains clear priorities for future improvements:</p><p><strong>Official documentation tops the list</strong>, with repeated requests for Anthropic to explain the inline editing mechanism, provide technical implementation details, and guarantee feature stability. As Rui Quintino wrote: &#8220;Anthropic, can we please know a bit more?&#9786;&#8221;</p><p><strong>Project-based organization</strong> would reduce artifact clutter, with users wanting better version control and cleanup of intermediate artifacts. <strong>Higher usage limits</strong> remain critical: multiple Trustpilot reviews from October 2025 describe current limits as &#8220;set too low even for paid customers.&#8221;</p><p><strong>External integrations</strong> are limited. There is no FTP access, no SSH to servers, and no direct file system browsing. Users request <strong>persistent storage</strong> with database support for artifacts and expanded Model Context Protocol (MCP) integration. <strong>Collaborative features</strong>, including real-time collaboration on artifacts and team-based artifact libraries, remain on wish lists.</p><p>Technical improvements requested include <strong>better edge case handling</strong> for the inline editing system (occasional issues with multiple string matches), <strong>mobile artifact editing</strong> with full capabilities on mobile apps, and <strong>manual edit preservation</strong> in diff viewers&#8212;a long-standing issue where users&#8217; manual modifications are discarded when they accept Claude&#8217;s proposed changes.</p><h2>Technical implementation and remaining gaps</h2><p>Browser network analysis suggests that Claude now chooses between update types based on the scope of each change. 
The system uses <strong>three distinct operation modes</strong>: &#8220;create&#8221; for new artifacts, &#8220;update&#8221; for targeted string replacement with precise matching requirements, and &#8220;rewrite&#8221; for substantial changes requiring full regeneration.</p><p>The mechanism works best with well-structured code, because const declarations and single-source-of-truth patterns give each edit a unique string to target. It handles button color changes, text updates, and single-line modifications well across all artifact types. Complex refactoring still triggers full rewrites, and ambiguous string matches can cause errors.</p><p>Critical bugs remain unresolved. GitHub Issue #1317 (opened May 2025, still open in October) documents how <strong>the model is unaware of manual editing in diff viewers</strong>&#8212;when users manually edit Claude&#8217;s proposed changes before accepting them, Claude doesn&#8217;t recognize the edits and discards them. Developers must instead choose &#8220;No, and tell Claude what to do differently&#8221; and describe their manual changes in the prompt, a significant workflow disruption.</p><p>GitHub Issue #9668 (October 16, 2025) reveals <strong>conversation history loading problems</strong>. Multiple distinct conversations display &#8220;Warmup&#8221; as the title because the UI uses the first user message as the conversation title, and many users type &#8220;Warmup&#8221; to initialize sessions. One user reported three conversations from the same day&#8212;250KB, 128KB, and 295KB in size, with 69, 87, and 210 messages respectively&#8212;all displayed identically as &#8220;Warmup,&#8221; making previous work effectively inaccessible.</p><h2>Comparison to Claude Code and competing tools</h2><p>The improvements narrow the gap between web-based Claude.ai artifacts and the more powerful Claude Code terminal/IDE integration. 
Claude Code offers <strong>native diff panels in VS Code</strong>, file system integration, and version control readiness, making it better suited to complex multi-file projects. The web interface now provides <strong>faster iteration for prototyping</strong> and better visual feedback for non-technical users.</p><p>Developer comparisons credit Claude&#8217;s vertical integration&#8212;Anthropic controls both the model and the tooling, which enables tighter optimization. Builder.io&#8217;s developer noted that Claude Code rarely gets stuck and excels at navigating large codebases, in contrast to Cursor&#8217;s struggles updating extremely large files. Terminal-Bench benchmarks show <strong>60% overall accuracy</strong>, dropping to 16% on complex tasks, but real-world usage reports suggest higher practical success rates.</p><p>OpenAI&#8217;s Canvas provides similar editing tools but lacks Claude&#8217;s emphasis on creating shareable, interactive applications. Canvas allows manual editing of content, while Claude Artifacts remain view-only (only Claude can edit them), though Claude now provides artifact preview for HTML/SVG that Canvas doesn&#8217;t offer. The community remains split on which approach better serves different use cases.</p><h2>Conclusion: A maturation moment with documentation debt</h2><p>The October 2025 updates represent <strong>substantial technical maturation</strong> rather than experimental features. The inline editing improvement alone transforms artifact workflows, enabling rapid iteration that was previously impractical. Combined with file creation capabilities, enhanced analysis tools, and the Agent Skills framework, Claude.ai has evolved from a conversational interface into a collaborative development environment.</p><p>Yet the silence around the most impactful improvement&#8212;inline artifact editing&#8212;creates unnecessary confusion. Users shouldn&#8217;t need to reverse-engineer features through browser developer tools and community investigation. 
As one developer forum noted: &#8220;October&#8217;s releases represent maturation rather than experimentation. These tools are production-ready for specific use cases.&#8221;</p><p>The core question has shifted from &#8220;Can AI help with code?&#8221; to &#8220;Which AI tool solves which specific bottleneck?&#8221; For rapid prototyping, interactive applications, and visual content, Claude&#8217;s artifacts now offer compelling speed and ease of use. For complex multi-file projects and enterprise development, Claude Code&#8217;s IDE integration provides the necessary depth. Both have reached production readiness, but persistent issues with usage limits, conversation history management, and documentation gaps prevent them from reaching their full potential.</p><p>The improvements are real, significant, and genuinely useful. They just happened quietly, leaving users to discover them by accident&#8212;an odd approach for features that meaningfully enhance the product.</p>]]></content:encoded></item><item><title><![CDATA[Why Augment Code’s Integration Strategy Is Smarter Than You Think]]></title><description><![CDATA[A short article - hurrah!]]></description><link>https://hyperdev.matsuoka.com/p/why-augment-codes-integration-strategy</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/why-augment-codes-integration-strategy</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 16 Oct 2025 14:02:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Bm0-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Bm0-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bm0-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Bm0-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Bm0-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Bm0-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bm0-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1677340,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/176198483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bm0-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Bm0-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Bm0-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Bm0-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f986e58-07fb-4631-841e-c5b9f1994bfe_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>I run four AI coding tools simultaneously. It sounds excessive until you see the strategy in action. Augment Code&#8217;s <strong><a href="https://www.augmentcode.com/">Auggie CLI</a></strong> and <strong><a href="https://claude.ai/code">Claude Code</a></strong> aren&#8217;t rivals in my workflow &#8211; they&#8217;re complementary partners. Augment did something subtle but brilliant: they made interoperability a first-class feature.</p><h2>The Integration Play</h2><p>Augment Code adopted two key things that make &#8220;platooning&#8221; multiple AI agents seamless:</p><ul><li><p><strong>Claude Code&#8217;s </strong><code>/commands</code><strong> model.</strong> Augment&#8217;s terminal tool (Auggie) implements the same slash&#8209;command interface as Claude Code. 
In fact, Auggie <a href="https://docs.augmentcode.com/commands">automatically detects and supports commands defined in Claude&#8217;s <code>./.claude/commands/</code> directory</a> for full compatibility. Every workflow I built in Claude &#8211; <code>/mpm</code>, <code>/mpm-agents</code>, <code>/mpm-doctor</code> &#8211; runs identically in Auggie.</p></li><li><p><strong><a href="https://modelcontextprotocol.io/">Model Context Protocol (MCP)</a></strong>. Augment supports MCP natively, meaning external context engines, memory systems, and tools can be shared between Claude and Augment with zero changes. MCP is the open protocol that connects AI assistants to external data and tools.</p></li></ul><p>Because Augment adopted Claude&#8217;s command schema and MCP, switching between the two costs you nothing&#8212;your workflows just carry over.</p><h2>Why This Matters</h2><p>I built months of custom workflows in Claude Code. When I switch to Auggie, those workflows just work. No translation, no re-learning. Augment&#8217;s context engine is particularly strong&#8212;it indexes codebases like a search engine rather than pushing raw tokens into the prompt. Users have observed that Augment can retrieve context from hundreds of thousands of files more efficiently than stuffing those files into Claude&#8217;s prompt window.</p><p>Augment supports the Model Context Protocol (MCP) standard, and third-party developers have created MCP servers like auggie-context-mcp that enable Claude and other tools to access Augment&#8217;s context engine, creating practical interoperability across the ecosystem.</p><h2>When to Use What</h2><table><thead><tr><th>Use Case</th><th>Tool</th><th>Why</th></tr></thead><tbody><tr><td>Complex refactoring, cross-cutting architectural changes</td><td><a href="https://github.com/bobmatnyc/claude-mpm">Claude Code + Claude&#8209;MPM</a></td><td>Multi-agent orchestration, long-horizon reasoning, full codebase coordination</td></tr><tr><td>Quick bug fixes, debugging, documentation, operations</td><td><a href="https://www.augmentcode.com/cli">Augment&#8217;s Auggie CLI</a></td><td>Single-agent focus, near-instant responses, efficient token usage</td></tr></tbody></table>
<p>Power users often <strong>platoon</strong> them&#8212;architectural planning in Claude, rapid iteration and patching in Auggie.</p><h2>The Conservative Token Play</h2><p>Augment&#8217;s design avoids &#8220;context dumping.&#8221; Because it indexes projects and retrieves selectively, its prompts stay lean. Augment&#8217;s pricing page emphasizes efficiency: users pay for meaningful interactions, not wasted tokens.</p><p>In production use, Augment&#8217;s retrieval-first design shows approximately 3&#215; faster response times compared to large-context tools&#8212;averaging 4.1 seconds per response versus 12-15 seconds for competitors using million-token context windows. This architecture allows rapid, iterative work without ballooning API costs.</p><h2>What They&#8217;ll Probably Do Next</h2><p>Augment already offers multi-agent support via Remote Agents in its IDE extensions. Their roadmap appears centered on orchestrating teams of agents through standard protocols rather than proprietary lock-in.</p><p>This aligns with the broader industry movement toward agent interoperability standards, much like the Language Server Protocol did for IDEs. Augment&#8217;s track record with MCP support suggests they&#8217;ll continue embracing open standards.</p><h2>The Broader Implication</h2><p>The AI coding market is consolidating around shared standards:</p><ul><li><p>Slash commands as a universal interface (Claude&#8217;s model, extended by Augment)</p></li><li><p>Model Context Protocol for tool/data interconnectivity</p></li><li><p>Agent orchestration protocols for multi-tool collaboration</p></li></ul><p>Tools embracing these standards gain instant workflow compatibility; those resisting create friction and force users to commit to a single ecosystem. 
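</p><p>As a rough illustration of what that compatibility can look like for slash commands, here is a minimal Python sketch of how a tool might load commands from a project&#8217;s <code>.claude/commands/</code> directory. It assumes Claude Code&#8217;s convention that each Markdown file becomes a command named after the file. It also assumes that <code>$ARGUMENTS</code> in the file body is replaced by whatever the user types after the command. The function names are hypothetical, and the sketch ignores optional frontmatter and subdirectory namespacing. It is not Auggie&#8217;s actual loader.</p><pre><code class="language-python"># Hypothetical sketch of loading Claude Code-style slash commands from a
# project's .claude/commands/ directory; not Auggie's actual loader.
from pathlib import Path


def discover_commands(project_root):
    """Map each .claude/commands/NAME.md file to a "/NAME" command prompt."""
    commands = {}
    command_dir = Path(project_root) / ".claude" / "commands"
    if not command_dir.is_dir():
        return commands  # this project defines no shared commands
    for path in sorted(command_dir.glob("*.md")):
        if path.is_file():
            # The file name without ".md" is the command; the body is the prompt.
            commands["/" + path.stem] = path.read_text(encoding="utf-8")
    return commands


def expand(prompt, arguments):
    # Substitute whatever the user typed after the command for $ARGUMENTS.
    return prompt.replace("$ARGUMENTS", arguments)


if __name__ == "__main__":
    for name, prompt in discover_commands(".").items():
        print(name, "=", expand(prompt, "src/")[:60])
</code></pre><p>The command files are plain Markdown on disk, so any tool that reads the same directory picks up the same workflows without a translation step.</p><p>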
Augment&#8217;s strategy embraces openness, enabling seamless use alongside Claude.</p><h2>Bottom Line</h2><p>Augment Code&#8217;s integration strategy is a model of pragmatic interoperability: acknowledge where the ecosystem is heading, align with the standards, and differentiate through execution. For users like me, that means I can use Claude&#8209;MPM for orchestration and Augment Auggie for focused execution&#8212;no vendor lock&#8209;in, no translation overhead.</p><p>In a market obsessed with proprietary control, Augment&#8217;s commitment to playing well with others is the smartest move of all.</p><div><hr></div><p><em>By <a href="https://hyperdev.substack.com/">Bob Matsuoka</a>, writing about agentic coding and AI&#8209;powered development at <a href="https://hyperdev.matsuoka.com/">HyperDev</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Whither Claude Desktop?]]></title><description><![CDATA[When AI tools can&#8217;t tell themselves apart]]></description><link>https://hyperdev.matsuoka.com/p/whither-claude-desktop</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/whither-claude-desktop</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 29 Sep 2025 14:02:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TFH6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TFH6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!TFH6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!TFH6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!TFH6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!TFH6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TFH6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2507729,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/174479469?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TFH6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!TFH6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!TFH6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!TFH6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2927c810-766e-406e-9ed6-54600acbdeac_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>I spent twenty minutes yesterday watching Claude Code correct itself about what Claude Code actually is. Then correct itself again. Then get it wrong a third time.</p><p>Here&#8217;s what happened: I was setting up MCP browser tools and Claude Code kept insisting it had &#8220;added mcp-browser to Claude Code&#8217;s config at ~/Library/Application Support/Claude/claude_desktop_config.json.&#8221;</p><p>That&#8217;s not Claude Code&#8217;s config. That&#8217;s Claude.ai Desktop&#8217;s config.</p><p>When I pointed this out&#8212;&#8220;that is not the Claude Code config! That is the Claude.AI Desktop config!&#8221;&#8212;it apologized profusely and tried to find the right location. But the mix-up reveals something deeper than a naming error.</p><p>Claude Code doesn&#8217;t understand what Claude Code is.</p><h2>The Identity Crisis</h2><p>This isn&#8217;t user error or a documentation problem. I&#8217;ve got system instructions that explicitly explain the difference between:</p><ul><li><p><strong>Claude Code</strong>: The coding assistant/IDE application</p></li><li><p><strong>Claude.ai Desktop</strong>: The chat application from Anthropic</p></li></ul><p>Despite these instructions, Claude Code routinely confuses itself with other Anthropic products. It&#8217;ll reference the wrong config files, suggest the wrong installation paths, and generally behave like it&#8217;s having an identity crisis.</p><p>The pattern shows up constantly. Claude Code will acknowledge it&#8217;s wrong, correct itself, then make the same mistake three responses later. 
It&#8217;s like watching someone with short-term memory issues try to remember their own name.</p><h2>Why Static Memory Fails Fast-Moving Markets</h2><p>This confusion points to a fundamental limitation in current AI architectures. These systems rely on static training data and baked-in instructions that become obsolete quickly in fast-moving markets.</p><p>Anthropic has multiple products with similar names and overlapping functionality. The training data probably contains references to &#8220;Claude Desktop&#8221; from different time periods when the product lineup looked different. Add in user documentation, forum discussions, and third-party tutorials that use inconsistent naming, and you get...chaos.</p><p>The AI gets conflicting signals about what it actually is.</p><p>Static memory models can&#8217;t adapt when:</p><ul><li><p>Product names change or evolve</p></li><li><p>New versions launch with different capabilities</p></li><li><p>Configuration patterns shift between releases</p></li><li><p>Multiple similar products coexist</p></li></ul><h2>The Broader Problem</h2><p>This isn&#8217;t just annoying&#8212;it&#8217;s symptomatic of how current AI systems handle rapidly changing information. When your coding assistant doesn&#8217;t know its own configuration system, how can users trust it for complex technical decisions?</p><p>The confusion cascades. Wrong configuration advice leads to broken setups. Users waste time following incorrect instructions. Trust erodes because the tool seems unreliable on basic facts about itself.</p><p>I&#8217;ve seen this pattern across multiple AI coding tools. They&#8217;ll confidently provide outdated installation instructions, reference deprecated APIs, or suggest workflows that don&#8217;t match current product capabilities.</p><h2>What This Means for Users</h2><p>Right now, you need to verify everything. 
Even when an AI tool gives you instructions about itself, double-check them against current documentation.</p><p>This is particularly problematic for:</p><ul><li><p><strong>Configuration and setup tasks</strong> where wrong paths break everything</p></li><li><p><strong>Version-specific features</strong> that may not exist in your installation</p></li><li><p><strong>Integration workflows</strong> that depend on accurate product understanding</p></li></ul><p>The workaround? Treat AI coding assistants as unreliable narrators about their own capabilities. They&#8217;re great at generating code, terrible at knowing what they can actually do.</p><h2>Moving Beyond Static Memory</h2><p>The solution isn&#8217;t more detailed instructions&#8212;it&#8217;s fundamentally different memory architectures. Static prompts and training data can&#8217;t keep pace with software development cycles.</p><p>What we need are dynamic memory systems that can:</p><ul><li><p>Update product knowledge in real time</p></li><li><p>Verify information against current documentation</p></li><li><p>Distinguish between different versions and configurations</p></li><li><p>Learn from user corrections without losing context</p></li></ul><p>I&#8217;m working on approaches to this problem, which I&#8217;ll detail in a future post. But the core insight is clear: current memory models are incompatible with fast-moving technical domains.</p><h2>The Meta-Problem</h2><p>Here&#8217;s what really gets me: we&#8217;re using AI tools to build software, and those tools don&#8217;t understand basic facts about themselves. How can they make architectural decisions about other systems when they can&#8217;t even identify themselves correctly?</p><p>This identity confusion reflects deeper issues with how AI systems maintain and update knowledge. If Claude Code can&#8217;t keep track of what Claude Code is, what other fundamental misconceptions are lurking in its responses?</p><p>The irony is thick. 
We&#8217;re debugging AI tools that can&#8217;t debug their own identity.</p><h2>Bottom Line</h2><p>Claude Code remains useful for actual coding tasks. But don&#8217;t trust it for meta-information about Anthropic&#8217;s product lineup, configuration systems, or its own capabilities.</p><p>Verify everything. Check official documentation. And remember that your coding assistant might be as confused about what it is as you are about what it can actually do.</p><p>The tools work best when you stop expecting them to be self-aware.</p><div><hr></div><p><em>Next week: The dynamic memory architectures that could actually solve this problem</em></p>]]></content:encoded></item><item><title><![CDATA[AI vs Human Problem-Solving Shows Why Judgment Still Beats Probability]]></title><description><![CDATA[Or: The Dangers of Stochastic Engineering]]></description><link>https://hyperdev.matsuoka.com/p/ai-vs-human-problem-solving-shows</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/ai-vs-human-problem-solving-shows</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 15 Sep 2025 14:03:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NxIX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NxIX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!NxIX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!NxIX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!NxIX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NxIX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NxIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1874841,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/173479062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NxIX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!NxIX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!NxIX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NxIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac3829c0-c45f-43d6-b18e-c8c5912a37f3_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<h2><strong>TL;DR: Key Highlights</strong></h2><p><em>Thanks to <a href="https://www.linkedin.com/in/alexandermay/">Alex May</a> and <a href="https://www.linkedin.com/in/jbarmash/">Jean Barmash</a> for suggesting we add these quick-hit summaries to help busy developers get the core insights fast.</em></p><p>&#8226; <strong>The judgment gap is real</strong>: Experienced engineers solve problems through targeted analysis; AI uses probabilistic shotgun approaches that often work but miss root causes</p><p>&#8226; <strong>Speed vs effectiveness paradox</strong>: METR study shows experienced developers are 19% slower with AI tools despite feeling 20% faster&#8212;we're deceiving ourselves about current AI effectiveness</p><p>&#8226; <strong>Case study reality check</strong>: When debugging WordPress performance issues, Claude applied 7 optimizations achieving 98% improvement; human engineer Oswaldo made 3 targeted changes that fixed the actual bug</p><p>&#8226; <strong>AI excels in constrained spaces</strong>: Pattern-based tasks like test generation, boilerplate code, and API clients see 70% time savings; complex debugging and architecture decisions still require human judgment</p><p>&#8226; <strong>Economic implications</strong>: Teams need fewer junior developers but more senior architects; AI amplifies junior productivity while sometimes hindering expert-level work</p><p>&#8226; <strong>Practical workflow</strong>: Human defines architecture &#8594; AI implements &#8594; Human reviews &#8594; AI generates tests &#8594; Human validates. 
Checkpoints beat full autonomy</p><p>&#8226; <strong>Bottom line</strong>: AI is powerful pattern matching without judgment. Use it for mechanical coding tasks, reserve human expertise for decisions that actually matter</p><div><hr></div><p><strong>An experienced engineer will almost always write better code than AI in its current state.</strong> This isn't pessimism about AI's future or a rejection of the new tools. It's a practical observation from the trenches, one that becomes crystal clear when you watch how differently humans and AI approach the same problem.</p><p>Let me share a cautionary tale that illustrates where we are with agentic coding in 2025. When code that pulled content from a WordPress blog was suffering performance issues, two problem-solvers took it on: Claude, representing state-of-the-art AI, and Oswaldo, an experienced engineer. The way each tackled the problem reveals fundamental truths about the current state of AI coding that every developer needs to understand.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2kx8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2kx8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!2kx8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!2kx8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2kx8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2kx8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1789554,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/173479062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2kx8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!2kx8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!2kx8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2kx8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61f64b57-7f0d-43a0-b29a-cc183e0edc2b_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>A tale of two debugging approaches</h2><p>The problem seemed straightforward enough: code fetching content from a WordPress site was running slowly. Claude, the AI agent, approached it probabilistically, generating hypothesis after hypothesis based on pattern matching from its training data. It suggested optimizing API calls, implementing aggressive caching, checking for network timeouts, examining data fetching patterns&#8212;essentially throwing a comprehensive troubleshooting checklist at the content fetching problem. Each suggestion was reasonable, drawn from the vast corpus of performance optimization issues it had seen before.</p><h3>Claude's AI approach: Optimize everything</h3><p>Here's how Claude tackled the problem:</p><pre><code><code>// Claude's probabilistic optimization approach
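// (Excerpt: fetcher, categoriesUrl, postsUrl, parseHtml and formatDateString are
// defined elsewhere; title, date and slug come from the post being rendered)
import { useMemo, useState } from "react";
import useSWR from "swr";
import Link from "next/link";

function Posts() {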
  // Hypothesis 1: Parallel fetching will solve waterfall loading
  const [featuredCategoryId, setFeaturedCategoryId] = useState(-1); // Start optimistically
  
  // Hypothesis 2: Aggressive caching prevents redundant requests
  const SWR_CONFIG = {
    revalidateOnFocus: false,
    revalidateOnReconnect: false,
    dedupingInterval: 60000, // 60-second cache
  };
  
  // Hypothesis 3: Fetch everything in parallel
  const { data: categories } = useSWR(categoriesUrl, fetcher, SWR_CONFIG);
  const { data: posts } = useSWR(
    `${postsUrl}?categories_exclude=${featuredCategoryId}`,
    fetcher,
    SWR_CONFIG
  );
  
  // Hypothesis 4: Memoize expensive operations
  const parsedTitle = useMemo(() =&gt; parseHtml(title, {}), [title]);
  const formattedDate = useMemo(() =&gt; formatDateString(date), [date]);
  
  // Hypothesis 5: Prefetch for faster navigation
  return &lt;Link href={`/posts/${slug}`} prefetch={true}&gt;{parsedTitle}&lt;/Link&gt;;
}

// Result: 7.2s &#8594; 133ms (98% improvement)
// But: Didn't fix the root cause
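
// Sketch: why a long dedupingInterval can hide a newly published post.
// This models a time-window request cache under the assumption that it
// behaves roughly like SWR's deduping; it is NOT SWR's implementation.
// createWindowCache, load and now are hypothetical names for illustration.
function createWindowCache(windowMs, now) {
  const entries = new Map(); // key -> { value, time }
  return function get(key, load) {
    const hit = entries.get(key);
    // Serve the cached value while it is younger than the window.
    if (hit && now() - hit.time < windowMs) {
      return hit.value;
    }
    // Otherwise load fresh data and restart the window for this key.
    const value = load(key);
    entries.set(key, { value, time: now() });
    return value;
  };
}

// Usage with a fake clock: a post published 30s after the first fetch
// stays invisible until the 60s window has passed.
let clock = 0;
let serverPosts = ["post-1"];
const getPosts = createWindowCache(60000, () => clock);
const loadPosts = () => serverPosts.slice();
getPosts("/posts", loadPosts); // returns ["post-1"]
serverPosts = ["post-2", "post-1"];
clock = 30000;
getPosts("/posts", loadPosts); // still returns ["post-1"] (served from cache)
clock = 61000;
getPosts("/posts", loadPosts); // returns ["post-2", "post-1"]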
</code></code></pre><p>Claude's solution was comprehensive, addressing multiple potential bottlenecks simultaneously. It applied nearly every optimization pattern it could match to the symptoms, achieving dramatic performance improvements through brute force.</p><h3>Oswaldo's human approach: Find the root cause</h3><p>Oswaldo took a different path entirely. Instead of following a probabilistic decision tree, he used <strong>judgment</strong>. He noticed something subtle in the symptoms that pointed to a specific bottleneck in the data fetching logic. Where Claude saw a universe of equally probable causes, Oswaldo's experience let him recognize a familiar pattern and zero in on the actual issue.</p><pre><code><code>// Oswaldo's targeted fix approach
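// (Excerpt: url and fetcher are defined elsewhere in the module)
import { useEffect, useState } from "react";
import useSWR from "swr";

function Posts() {
  // Root cause analysis: SWR hanging on malformed requests
  
  // Replace problematic SWR with direct control
  const [categories, setCategories] = useState(null);
  const [categoriesIsLoading, setCategoriesIsLoading] = useState(true);
  const [featuredCategoryId, setFeaturedCategoryId] = useState(null);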
  
  useEffect(() =&gt; {
    const fetchCategories = async () =&gt; {
      try {
        setCategoriesIsLoading(true);
        const response = await fetch(
          `${process.env.NEXT_PUBLIC_BLOG_URL}/wp-json/wp/v2/categories`
        );
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }
        const data = await response.json();
        setCategories(data);
        
        // Set featured category ID after successful fetch
        const featuredCategory = data.find(category =&gt; category.slug === "featured");
        if (featuredCategory) {
          setFeaturedCategoryId(featuredCategory.id);
        }
      } catch (error) {
        console.error("Failed to fetch categories:", error);
        setCategories(null);
      } finally {
        setCategoriesIsLoading(false);
      }
    };
    
    fetchCategories();
  }, []); // Fresh data on mount, no caching to hide issues
  
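  // Keep SWR for posts but with correct parameters.
  // Caveat: if the blog has no "featured" category, featuredCategoryId stays
  // null and this request never fires.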
  const { data: posts } = useSWR(
    featuredCategoryId 
      ? `${url}?categories_exclude=${featuredCategoryId}`
      : null, // Only fetch when featuredCategoryId is ready
    fetcher,
    { 
      revalidateOnFocus: false,
      dedupingInterval: 0, // No aggressive caching hiding bugs
    }
  );
}

// Result: Fixed 5-minute hang, immediate new post visibility
// Method: Identified and fixed the actual bug
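
// Sketch: the conditional-key idea above as a plain function, so it can be
// checked outside React. SWR does not fetch when the key is null.
// postsKey is a hypothetical helper. Unlike the component above, it falls
// back to the unfiltered URL when no "featured" category exists, so posts
// still load on blogs without one.
function postsKey(baseUrl, categories) {
  // Categories missing (still loading, or the request failed): don't fetch yet.
  if (categories == null) {
    return null;
  }
  const featured = categories.find((category) => category.slug === "featured");
  return featured
    ? `${baseUrl}?categories_exclude=${featured.id}`
    : baseUrl;
}

// Usage:
// postsKey("/wp-json/wp/v2/posts", null) returns null
// postsKey("/wp-json/wp/v2/posts", [{ id: 7, slug: "featured" }])
//   returns "/wp-json/wp/v2/posts?categories_exclude=7"
// postsKey("/wp-json/wp/v2/posts", [{ id: 3, slug: "news" }])
//   returns "/wp-json/wp/v2/posts"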
</code></code></pre><p>The result? Oswaldo solved the problem with a handful of precise changes. Claude would have eventually gotten there too, but probably after writing twice as much code and exploring multiple dead ends.</p><p>This isn't a story about AI failure&#8212;it's about understanding what judgment means in engineering and why it matters.</p><h2>Judgment operates where probability struggles</h2><p>Research from cognitive science offers a clue about how experts think. <a href="https://www.scientificamerican.com/article/intuition-may-reveal-where-expertise-resides-in-the-brain/">Researchers at the RIKEN Brain Science Institute found that expert shogi players develop specialized circuitry in the caudate nucleus</a>, creating what amounts to "special-purpose hardware" for rapid, unconscious pattern recognition. Programming expertise plausibly works the same way: when Oswaldo looked at that content fetching issue, he wasn't calculating probabilities. He was drawing on intuition shaped by years of similar problems.</p><p>This may also explain why the METR study's findings shouldn't surprise us. <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">When experienced open-source developers used frontier AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) on mature codebases, they were <strong>19% slower</strong> than when working without AI assistance</a>. Even more telling: afterward, developers estimated that AI had made them 20% faster despite the measured slowdown. We're so enamored with the promise of AI that we're deceiving ourselves about its current effectiveness.</p><p>The difference comes down to mental models versus pattern matching. 
<a href="https://www.seangoedecke.com/debugging/">As Sean Goedecke notes, what distinguishes excellent programmers is "the accuracy and sophistication of the programmer's mental model."</a> Humans maintain three distinct types of mental models when coding: framework models (how Rails works), code models (how this specific function executes), and domain models (what the business actually needs). AI has none of these&#8212;just statistical correlations between text patterns.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!92Mt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!92Mt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!92Mt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!92Mt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!92Mt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!92Mt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1828037,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/173479062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!92Mt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!92Mt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!92Mt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!92Mt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F239c5d21-5e98-4ca8-839f-ad9efa21208e_1024x1024.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Debugging approaches compared</h3><pre><code><code>// AI Debugging Pattern: Shotgun approach
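// (implementParallelFetching() and the other helpers are illustrative placeholders)
function debugPerformanceShotgun() {
  const optimizations = [
    () =&gt; implementParallelFetching(),
    () =&gt; addAggressiveCaching(),
    () =&gt; memoizeExpensiveOperations(),
    () =&gt; addPrefetching(),
    () =&gt; optimizeNetworkRequests(),
    () =&gt; implementLoadingStates(),
    () =&gt; addErrorBoundaries(),
  ];

  // Apply all optimizations simultaneously
  optimizations.forEach((opt) =&gt; opt());

  // Result: Better metrics, but root cause still exists
}

// Human Debugging Pattern: Surgical approach
// (isMisconfigured, isMalformed, and fixSWRConfiguration are hypothetical helpers)
function debugPerformanceSurgical({ url, config, params }) {
  // 1. Observe symptoms
  console.log("Network tab shows 5-minute hanging requests");

  // 2. Form a hypothesis based on experience
  // "SWR configuration issues cause this pattern"

  // 3. Test the hypothesis directly: inspect the key and options passed to
  //    the suspicious useSWR(url, fetcher, config) call. (Hooks can only run
  //    inside React components, so the inputs are checked here instead.)
  const configIsSuspect = isMisconfigured(config) || isMalformed(params);

  // 4. Isolate and fix the root cause
  if (configIsSuspect) {
    // Result: Problem solved at the source
    return fixSWRConfiguration(url, config);
  }

  // Hypothesis rejected: form the next one rather than piling on changes
  return null;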
}
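
// Sketch (hypothetical helper, not from the case study): the surgical
// approach as a repeatable procedure. Apply ONE candidate fix at a time
// and re-run the reproduction. The first fix that makes the symptom
// disappear names the root cause. symptomPresent(fix) is an assumed
// callback that reruns the failing scenario with that single change
// applied (or with no change when fix is null) and returns true if the
// problem is still there.
function findRootCause(symptomPresent, hypotheses) {
  // Confirm the bug reproduces before changing anything.
  if (!symptomPresent(null)) {
    return { reproduced: false, cause: null };
  }
  for (const { name, fix } of hypotheses) {
    // Test each hypothesis in isolation instead of stacking changes.
    if (!symptomPresent(fix)) {
      return { reproduced: true, cause: name };
    }
  }
  // Nothing explained the symptom: form a new hypothesis, don't pile on patches.
  return { reproduced: true, cause: null };
}

// Example with a fake reproduction where only the SWR config change helps:
// findRootCause(
//   (fix) => fix !== "fix-swr-config",
//   [
//     { name: "caching", fix: "add-aggressive-caching" },
//     { name: "swr config", fix: "fix-swr-config" },
//   ]
// ); // returns { reproduced: true, cause: "swr config" }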
</code></code></pre><h2>Where AI coding agents actually deliver value</h2><p>Before you think I'm advocating abandoning AI tools, let me be clear: AI coding assistants have transformed specific aspects of development, and denying their value would be equally misguided. The key is understanding where probabilistic approaches excel and where judgment remains irreplaceable.</p><p><strong>AI dominates in well-defined, constrained spaces.</strong> When Diffblue used AI to generate 4,750+ tests, they saved <strong>132 developer days</strong>. Why? Because unit test generation from specifications is largely pattern matching&#8212;exactly what AI does best. Similarly, <a href="https://resources.github.com/learn/pathways/copilot/essentials/measuring-the-impact-of-github-copilot/">GitHub Copilot users report 70% time savings on boilerplate code</a>, API client generation from OpenAPI specs, and documentation. These are tasks where "good enough" based on common patterns is actually good enough.</p><pre><code><code>// Where AI excels: Pattern-based code generation
// AI can reliably generate a Jest suite like this from patterns.
// createUser is a hypothetical helper that POSTs to the user endpoint.
const { createUser } = require('./userApi');

describe('User API', () =&gt; {
  it('should create user with valid data', async () =&gt; {
    const userData = { name: 'John', email: 'john@example.com' };
    const response = await createUser(userData);
    expect(response.status).toBe(201);
    expect(response.body).toMatchObject(userData);
  });

  it('should return 400 for invalid email', async () =&gt; {
    const userData = { name: 'John', email: 'invalid-email' };
    const response = await createUser(userData);
    expect(response.status).toBe(400);
  });

  // ... 47 more similar tests generated in minutes
});

// Where AI struggles: Business logic requiring judgment
function calculatePricing(user, product, context) {
  // Requires understanding of:
  // - Business rules that aren't in training data
  // - Edge cases specific to this domain  
  // - Trade-offs between accuracy and performance
  // - Integration with existing pricing logic
  
  if (context.isBlackFriday &amp;&amp; user.hasLoyaltyCard) {
    // Should this stack with other discounts?
    // How does this affect our margins?
    // What about regional pricing differences?
    // AI can't make these judgment calls
  }
}
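
// Sketch (all numbers are placeholders, not recommendations): once a human
// has answered the questions above, the answers can be encoded as explicit,
// named policy. Code review and tests can then check the decisions, and an
// AI assistant implements against them instead of guessing.
const pricingPolicy = {
  blackFridayRate: 0.2,   // assumed Black Friday discount
  loyaltyRate: 0.1,       // assumed loyalty-card discount
  stackDiscounts: false,  // the judgment call: do discounts combine?
  maxTotalDiscount: 0.25, // assumed margin guardrail
};

function applyDiscountPolicy(basePrice, { isBlackFriday, hasLoyaltyCard }, policy = pricingPolicy) {
  const rates = [];
  if (isBlackFriday) rates.push(policy.blackFridayRate);
  if (hasLoyaltyCard) rates.push(policy.loyaltyRate);
  if (rates.length === 0) return basePrice;

  // Either add the discounts together or take only the best single one.
  const discount = policy.stackDiscounts
    ? rates.reduce((sum, rate) => sum + rate, 0)
    : Math.max(...rates);
  const applied = Math.min(discount, policy.maxTotalDiscount);
  // Round to cents.
  return Math.round(basePrice * (1 - applied) * 100) / 100;
}

// applyDiscountPolicy(100, { isBlackFriday: true, hasLoyaltyCard: true }); // 80
// applyDiscountPolicy(100, { isBlackFriday: true, hasLoyaltyCard: true },
//   { ...pricingPolicy, stackDiscounts: true }); // 75 (0.3 capped at 0.25)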
</code></code></pre><p>The real wins come from what I call the "30-minute rule"&#8212;tasks that would take a human roughly 30 minutes show the highest AI success rates. These are substantial enough to benefit from automation but simple enough that AI's lack of deep understanding doesn't matter. Generate CRUD endpoints? Perfect. Write migration scripts? Excellent. Design a distributed system's failure recovery strategy? Not so much.</p><p>Multi-agent environments show particular promise when properly constrained. <a href="https://www.multimodal.dev/post/useful-ai-agent-case-studies">JM Family's BAQA Genie system cut development cycles from weeks to days</a> by using specialized agents for requirements, coding, documentation, and QA. But notice the pattern: each agent handles a well-defined slice of the problem. When AI agents stick to their lanes, they can be remarkably effective.</p><h2>The architecture of future human-AI collaboration</h2><p>The path forward isn't AI versus human&#8212;it's designing workflows that leverage each side's strengths. <a href="https://www.anthropic.com/engineering/building-effective-agents">Anthropic's guidance on "workflows versus agents"</a> provides a crucial insight: start with the simplest approach that works. Predefined workflows deliver predictability and consistency on well-defined tasks. Autonomous agents trade some of that predictability for flexibility and benefit from checkpoints where a human can step in. 
Their evaluator-optimizer pattern, in which one model call generates a response while another evaluates it and provides feedback in a loop, points toward a sustainable model for AI-assisted development, especially when a human takes the evaluator's seat at the checkpoints that matter.</p><p><strong>Effective Human-AI Collaboration Workflow:</strong></p><ol><li><p><strong>Human defines the architecture</strong> (judgment required) - Set technical direction and constraints</p></li><li><p><strong>AI generates implementation</strong> (pattern matching) - Write code following established patterns</p></li><li><p><strong>Human reviews and validates</strong> (judgment required) - Ensure business logic and edge cases are handled</p></li><li><p><strong>AI generates comprehensive tests</strong> (pattern matching) - Create test coverage based on code structure</p></li><li><p><strong>Human reviews test coverage</strong> (judgment required) - Verify tests actually validate important behaviors</p></li></ol><p><strong>Anti-pattern: Full autonomy without checkpoints.</strong> Asking AI to handle the entire feature development process without human oversight often fails because AI lacks business context, domain knowledge, and judgment about trade-offs.</p><p>Think of it as the difference between hiring a junior developer and installing a power tool. You don't expect your nail gun to design the house, but you'd be foolish to frame it with a hammer. Similarly, AI excels at the mechanical aspects of coding&#8212;syntax, patterns, boilerplate&#8212;while humans handle the judgment calls about architecture, trade-offs, and business logic.</p><p>Test-driven development with AI represents this sweet spot perfectly. Humans define the test cases (judgment about what matters), AI writes the failing tests (pattern matching), humans review and commit (quality control), then AI implements the code (mechanical execution). 
Each party does what they do best.</p><h2>Prompting adjustments that close the gap</h2><p>If we accept that AI will often write more verbose, less elegant code than experienced developers, can we prompt our way to better results? The research suggests yes, but with caveats.</p><p>The most effective prompting strategies share a common thread: they inject human judgment into the AI's process. <a href="https://cookbook.openai.com/examples/gpt4-1_prompting_guide">OpenAI's Q&amp;A strategy works because it forces the AI to surface its assumptions for human validation</a>. Role-based prompting ("Act as a security engineer") provides the context that AI lacks naturally. Incremental refactoring with human checkpoints prevents the cascade of "almost right" solutions that plague autonomous generation.</p><pre><code><code>// Ineffective: Open-ended prompting
"Fix the performance issue in this code"

// Effective: Constrained prompting with context
`You are a senior React developer. This component fetches data from WordPress.

CONTEXT: The component should load in under 500ms but currently takes 7+ seconds.
SYMPTOMS: Network tab shows hanging requests to /wp-json/wp/v2/categories
CONSTRAINT: Keep existing functionality, only fix the performance issue

Review this code and identify the root cause:
[code here]

Before proposing a solution, ask yourself:
1. What specifically is causing the hanging request?
2. Is this a caching issue, API issue, or configuration issue?
3. What's the minimal change that fixes the root cause?`

// AI now has the context and constraints needed for targeted analysis
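
// Sketch (hypothetical helper, not a library API): the same constrained
// prompt assembled from structured fields. Every debugging request then
// carries a role, context, symptoms, a constraint, and review questions
// instead of an open-ended "fix it".
function buildConstrainedPrompt({ role, context, symptoms, constraint, code, questions }) {
  const numberedQuestions = questions
    .map((question, i) => `${i + 1}. ${question}`)
    .join("\n");
  return [
    `You are ${role}.`,
    "",
    `CONTEXT: ${context}`,
    `SYMPTOMS: ${symptoms}`,
    `CONSTRAINT: ${constraint}`,
    "",
    "Review this code and identify the root cause:",
    code,
    "",
    "Before proposing a solution, ask yourself:",
    numberedQuestions,
  ].join("\n");
}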
</code></code></pre><p>Here's what actually works: constrain the solution space, provide rich context, avoid open-ended tasks, and always maintain human review. When developers remember that AI is a powerful pattern matcher, not a thinking engineer, they can craft prompts that play to its strengths. Ask it to implement a specific algorithm, not to decide which algorithm to use. Have it write code that passes existing tests, not design the testing strategy.</p><p>My biggest prompting adjustments these days aim to get AI to write less code and to review its solutions more thoroughly. Until AI gets "smarter" at judgment, constraining the scope and requiring explicit validation steps works better than expecting autonomous brilliance.</p><h2>Economic reality check and industry implications</h2><p>The economic data presents a striking paradox. <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage">McKinsey projects $6.1-7.9 trillion in economic benefits from generative AI</a>, yet our content fetching case study and the METR research show experienced developers getting slower with AI assistance. How do we reconcile this?</p><p>The answer lies in distribution and task selection. AI dramatically accelerates junior developers and helps seniors with routine tasks, but it can actively hinder experts working on complex problems. <a href="https://github.blog/news-insights/research/survey-ai-wave-grows/">GitHub's data shows 41% faster task completion for new developers but much smaller gains for experienced ones</a>. The revolution isn't in replacing senior engineers&#8212;it's in amplifying junior productivity and eliminating routine work.</p><p>This has profound implications for team composition. Companies that treat AI as a replacement for senior engineers will struggle. Those that use it to give junior developers senior-level productivity on appropriate tasks will thrive. 
The optimal team might be fewer, more senior engineers focused on architecture and judgment calls, supported by AI handling implementation details and routine code generation.</p><pre><code><code>// Traditional team structure
const team = {
  seniors: 3,     // Architecture + complex features
  mids: 5,        // Feature implementation
  juniors: 4,     // Bug fixes + simple features
  productivity: 1.0
};

// AI-augmented team structure  
const aiTeam = {
  seniors: 3,     // Architecture + judgment calls (100% human)
  mids: 3,        // Review AI code + complex integration (-40% count, +50% productivity)
  juniors: 2,     // Partner with AI on implementation (-50% count, +200% productivity)
  ai: 'unlimited', // Pattern matching + boilerplate generation
  productivity: 1.8
};
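
// Rough arithmetic check on the structure above. Assumptions (not from
// any measurement): every engineer starts at the same baseline output,
// seniors stay at 1x, and the comments' "+50%" and "+200%" mean mids
// produce 1.5x and juniors 3x.
function teamOutput(counts, multipliers) {
  return Object.entries(counts).reduce(
    (sum, [role, n]) => sum + n * (multipliers[role] ?? 1),
    0
  );
}

function headcount(counts) {
  return Object.values(counts).reduce((sum, n) => sum + n, 0);
}

const baselineCounts = { seniors: 3, mids: 5, juniors: 4 };
const augmentedCounts = { seniors: 3, mids: 3, juniors: 2 };
const aiMultipliers = { seniors: 1, mids: 1.5, juniors: 3 };

const outputBefore = teamOutput(baselineCounts, {});            // 12
const outputAfter = teamOutput(augmentedCounts, aiMultipliers); // 13.5
const totalRatio = outputAfter / outputBefore;                   // 1.125
const perEngineerRatio =
  outputAfter / headcount(augmentedCounts) /
  (outputBefore / headcount(baselineCounts));                    // 1.6875

// Under these assumptions, 8 people deliver about 1.13x the output of 12,
// or roughly 1.69x per engineer. The 1.8 above is therefore closer to a
// per-engineer figure than to total team output.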
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gv9M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gv9M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Gv9M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Gv9M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Gv9M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gv9M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2044412,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/173479062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gv9M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Gv9M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Gv9M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Gv9M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F266c0869-a54a-47bb-ae34-569747b48e43_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>What this means for agentic coding's future</h2><p>Industry predictions suggest we'll achieve "superhuman coder" capabilities by 2027, with AI matching or exceeding human programmers on most tasks. I'm skeptical, not of AI's trajectory but of what "most tasks" means. If we define programming as syntax and pattern implementation, then yes, AI will dominate. If we define it as understanding business needs, making architectural trade-offs, and debugging complex systems, we're much further away.</p><p>Consider the fundamental challenge: every new business requirement, every edge case, every integration with legacy systems requires judgment that can't be pattern-matched from training data. 
The more successful AI becomes at routine coding tasks, the more valuable human judgment becomes for everything else.</p><p><strong>Tasks AI will likely master by 2027:</strong></p><ul><li><p>Generate boilerplate code and scaffolding</p></li><li><p>Write comprehensive unit tests from specifications</p></li><li><p>Implement standard CRUD operations</p></li><li><p>Generate documentation from code and comments</p></li><li><p>Refactor code using established patterns</p></li><li><p>Translate code between programming languages</p></li></ul><p><strong>Tasks requiring human judgment indefinitely:</strong></p><ul><li><p>Define business requirements and user needs</p></li><li><p>Make architectural trade-offs and technology choices</p></li><li><p>Debug novel failure modes and edge cases</p></li><li><p>Optimize for specific performance constraints</p></li><li><p>Integrate with legacy systems and undocumented APIs</p></li><li><p>Handle regulatory compliance and security requirements</p></li></ul><p><a href="https://altagic.com/artificial-intelligence/the-future-of-ai-what-to-expect-from-2025-to-2030/">Neurosymbolic AI and reasoning models like OpenAI's o3 represent genuine advances</a>, achieving Grandmaster-level performance on competitive programming. But these systems require millions of tokens per task, highlighting the massive computational inefficiency compared to human judgment. Evolution spent millions of years optimizing biological neural networks for efficiency. We're trying to replicate that with brute force computation.</p><h2>Practical recommendations for developers and teams</h2><p><strong>For individual developers navigating this landscape:</strong></p><p>Start with AI on low-stakes, well-defined tasks. Use it to learn new frameworks and APIs where its pattern matching helps you recognize common idioms. Leverage it for test generation and documentation where "good enough" truly is good enough. 
But maintain your judgment skills&#8212;they're your competitive advantage. The developers who thrive will be those who can orchestrate AI tools while maintaining the ability to dive deep when judgment is required.</p><p>Practice prompt engineering, but don't expect prompting to solve fundamental limitations. No prompt will give AI the judgment to know when a content fetching issue stems from a specific SWR configuration versus a network timeout. That recognition comes from experience encoded in neural pathways, not statistical patterns in training data.</p><p><strong>For engineering teams implementing AI coding tools:</strong></p><p>Adopt <a href="https://resources.github.com/learn/pathways/copilot/essentials/measuring-the-impact-of-github-copilot/">GitHub's four-stage framework: evaluate with volunteers, scale gradually, optimize workflows, then sustain improvements</a>. But customize it for your context. Mature codebases with implicit requirements need different strategies than greenfield projects. The METR study's 19% slowdown happened on 10+ year old codebases with millions of lines of code&#8212;your mileage will vary.</p><p>Invest heavily in documentation and testing&#8212;these become the interfaces through which AI understands your system. Clear coding standards help AI generate consistent code. Comprehensive tests let you safely experiment with AI-generated changes. Think of it as creating an operating manual for your AI assistant.</p><p>Most critically, measure actual productivity, not perceived benefits. The gap between developer perception and reality in the METR study should alarm every engineering manager. Track not just speed but also bug rates, code review time, and long-term maintenance costs. AI that generates code 50% faster but requires 100% more debugging time can easily be a net negative.</p><pre><code><code>// Effective AI integration metrics
const metrics = {
  // Traditional metrics (insufficient alone)
  codeVelocity: '+40%',
  ticketsCompleted: '+25%',
  
  // Essential quality metrics
  bugEscapeRate: '-15%', // Good: AI catches simple bugs
  codeReviewTime: '+30%', // Concerning: More review needed
  technicalDebt: '+10%', // Warning: Monitor closely
  
  // Business impact metrics
  timeToMarket: '+20%',
  customerSatisfaction: 'stable',
  maintenanceCost: '+5%' // Acceptable trade-off
};
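
// Sketch of the "50% faster coding, 100% more debugging" trade-off from
// the paragraph above. Assumptions (illustrative, not measured): "50%
// faster" means coding time is divided by 1.5, and "100% more debugging"
// means debugging time is multiplied by 2.
function netHoursSaved(codingHours, debuggingHours, codingSpeedup, debuggingMultiplier) {
  const hoursBefore = codingHours + debuggingHours;
  const hoursAfter = codingHours / codingSpeedup + debuggingHours * debuggingMultiplier;
  // Positive means time saved; negative means the AI-assisted flow is slower overall.
  return hoursBefore - hoursAfter;
}

// netHoursSaved(6, 4, 1.5, 2); // 10 - (4 + 8) = -2 hours: a net loss
// netHoursSaved(9, 1, 1.5, 2); // 10 - (6 + 2) =  2 hours: a net gain
// With these multipliers, break-even is when debugging originally took
// exactly one third of the coding time (e.g. 9 hours coding, 3 debugging).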
</code></code></pre><h2>The bottom line on where we are and where we're going</h2><p>We're at an inflection point where AI coding tools are simultaneously overhyped and underutilized. They're overhyped as replacements for engineering judgment and underutilized as amplifiers of human capability. The content fetching case study isn't an argument against AI&#8212;it's an argument for understanding what makes human engineers valuable and using AI to handle everything else.</p><p>The current state of agentic coding is powerful pattern matching without judgment. AI will solve your problems, but often inelegantly and verbosely, and only after exploring paths that didn't need exploring. For tasks where patterns work&#8212;much of programming's mechanical work&#8212;this changes everything. For tasks needing judgment, context, and elegant solutions, experienced engineers aren't going anywhere.</p><p>The engineers who will thrive in this new landscape are those who understand this distinction viscerally. They'll use AI to handle the probabilistic pattern matching while reserving their judgment for the decisions that matter. They'll write less code but make more architectural decisions. They'll spend less time debugging syntax and more time on system design.</p><p>This isn't simply a temporary state awaiting better AI. Even as AI capabilities advance toward the predicted "superhuman coder" of 2027, the gap between pattern matching and judgment will persist. The question isn't whether AI will replace programmers but how the profession will evolve when the mechanical aspects are automated. Based on the content fetching tale and thousands of similar stories playing out across the industry, that evolution will be messier, more gradual, and ultimately more human than the AI evangelists predict.</p><p>The future of software development isn't AI or human&#8212;it's AI and human, each doing what they do best. 
Understanding that distinction, and designing our tools and workflows around it, is the real challenge of agentic coding. Those who get it right will write less code but build better systems. Those who don't will generate twice as much code to solve problems that judgment could have handled in a few precise lines.</p><div><hr></div><h2>Related Reading</h2><p><strong>On AI Tool Evolution &amp; Market Reality:</strong></p><ul><li><p><a href="https://hyperdev.substack.com/p/the-other-shoe-will-drop">The Other Shoe Will Drop</a> - Why current AI pricing economics are unsustainable</p></li><li><p><a href="https://hyperdev.substack.com/p/around-the-horn-ai-coding-tools-reality">Around the Horn: AI Coding Tools Reality Check</a> - Recent market developments and tool performance</p></li></ul><p><strong>On Debugging &amp; Tool Reliability:</strong></p><ul><li><p><a href="https://hyperdev.substack.com/p/the-ghost-in-the-machine-non-deterministic">The Ghost in the Machine: Non-Deterministic Debugging</a> - When AI tools behave unpredictably</p></li><li><p><a href="https://hyperdev.substack.com/p/whats-in-my-toolkit-august-2025">What's In My Toolkit - August 2025</a> - Current tool setup and why specific choices matter</p></li></ul><p><strong>On AI Orchestration vs Direct Usage:</strong></p><ul><li><p><a href="https://hyperdev.substack.com/p/i-hope-never-to-use-claude-code-again">I Hope Never To Use Claude Code Again</a> - Moving from individual AI assistants to orchestrated teams</p></li><li><p><a href="https://hyperdev.substack.com/p/multi-agent-ai-orchestration-in-practice">Multi-agent AI Orchestration in Practice</a> - Real-world experiences with coordinated AI development</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Pregnant Bugs]]></title><description><![CDATA[Why Agentic Development Shines in Re-platforming]]></description><link>https://hyperdev.matsuoka.com/p/pregnant-bugs</link><guid 
isPermaLink="false">https://hyperdev.matsuoka.com/p/pregnant-bugs</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Fri, 12 Sep 2025 14:02:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HWcY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HWcY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HWcY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!HWcY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!HWcY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!HWcY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HWcY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1951797,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/173278994?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HWcY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!HWcY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!HWcY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!HWcY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96589613-4794-4b80-81d5-63d478c52ff7_1024x1024.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I'm taking a break from my usual technical deep-dives to give you some light Friday reading and highlight a gem of a term coined by Josue Gandarilla, a QA developer working on Recess&#8212;the parent activity marketplace I'm leading tech for. According to Josue, a "pregnant bug" is a bug that gives birth to other bugs. 
Picture this: you think you've squashed one issue cleanly, only to watch as baby bugs scuttle out into the darkness, multiplying your problems exponentially.</p><p>While everyone should have a Josue keeping a watchful eye on their features (seriously, find yourself a meticulous QA person), the real point of this post isn't about human bug hunting per se. It's about reinforcing why agentic development absolutely shines when you're re-platforming.</p><h2>The Re-platforming Advantage</h2><p>Unlike new feature work&#8212;where you're often building into ambiguity, iterating on uncertain requirements, and navigating the fog of "what should this actually do?"&#8212;re-platforming gives you something invaluable: a rock-solid specification to work from. You have existing APIs with well-defined contracts, established type schemas, proven frontend components, and battle-tested user flows. These are precisely the elements that AI excels at modeling and implementing.</p><p>Using this natural advantage, we re-platformed <a href="https://hello-recess.com/Austin">Recess</a> in August, on a project that would normally have taken months. The existing system served as our perfect blueprint&#8212;every endpoint documented through usage, every data transformation proven in production, every edge case already discovered and handled.</p><h2>Optimizing for Velocity</h2><p>We're not stopping there. We're currently optimizing our stack by moving to Neon for its superior developer experience and feature-aligned branching capabilities. When this migration is complete, we expect to have a much more agile platform that will significantly increase our development velocity. And watching that acceleration will require more than just Josue keeping guard&#8212;though he'll certainly be busy.</p><h2>The Testing Multiplier Effect</h2><p>Here's where agentic coding becomes truly powerful: testing. 
Unit tests, regression tests, end-to-end tests, smoke tests, and canary deployments are the tools that let you accelerate development safely and avoid those pregnant bugs altogether.</p><p>One of the first things I look for when evaluating codebases for founders is test coverage. Perhaps not surprisingly, it's one of the things development teams NOT led by an internal engineering leader consistently ignore. And it's not just contractors: the original dev team for <a href="https://hello-recess.com/Austin">Recess</a> had zero test coverage when I arrived.</p><p>Let me be clear: I'm not a <a href="https://en.wikipedia.org/wiki/Test-driven_development">TDD</a> zealot. There are good reasons to add test coverage AFTER building, especially when you're developing in an agile fashion and requirements are fluid. That's very different from building to external requirements, where the spec is locked down. But once you've nailed down functionality and data models, even while the project is still in flight, write the tests. It's non-negotiable.</p><p>This becomes particularly crucial with agentic development, where LLMs absolutely love to write code (sometimes too much code). Test coverage isn't just nice to have; it's your safety net when AI is generating implementations at superhuman speed.</p><p>When AI can generate comprehensive test suites from your existing codebase and API contracts, you're not just moving faster; you're moving more safely. We're targeting 90%+ test coverage as we ramp up our velocity, because with great speed comes great responsibility not to break things.</p><p>The beauty is that tests are perfect AI work: they're highly structured, follow predictable patterns, and have clear success criteria. 
Feed an AI your API schema and a few examples, and it can generate thorough test cases covering happy paths, edge cases, and error conditions you might not even have considered.</p><h2>The Call to Action</h2><p>If you're facing any re-platforming work, I encourage you to fire up Claude Code, Augment Code, or Cursor and experience this for yourself. Don't approach it as if you're building something net-new. Instead, treat your existing system as the detailed specification it actually is, and let AI do what it does best: faithfully implementing well-defined requirements at superhuman speed.</p><p>Your future self (and your QA team) will thank you for the robust, well-tested platform you'll end up with. Plus, you'll have far fewer pregnant bugs scuttling around in the shadows.</p><div><hr></div><p><em>Originally published on hyperdev.matsuoka.com</em></p>]]></content:encoded></item></channel></rss>