<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Hyperdev]]></title><description><![CDATA[HyperDev is a technical publication exploring practical agentic AI development and AI-powered coding tools. As a veteran technology executive with 25+ years of experience, I provide honest, hands-on reviews and strategic insights about which AI coding too]]></description><link>https://hyperdev.matsuoka.com</link><image><url>https://substackcdn.com/image/fetch/$s_!j9a7!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab665959-5546-4469-9e93-9e1518976e2b_1024x1024.png</url><title>Hyperdev</title><link>https://hyperdev.matsuoka.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 19 Jun 2026 11:37:39 GMT</lastBuildDate><atom:link href="https://hyperdev.matsuoka.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Robert Matsuoka]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[hyperdev@matsuoka.com]]></webMaster><itunes:owner><itunes:email><![CDATA[hyperdev@matsuoka.com]]></itunes:email><itunes:name><![CDATA[Robert Matsuoka]]></itunes:name></itunes:owner><itunes:author><![CDATA[Robert Matsuoka]]></itunes:author><googleplay:owner><![CDATA[hyperdev@matsuoka.com]]></googleplay:owner><googleplay:email><![CDATA[hyperdev@matsuoka.com]]></googleplay:email><googleplay:author><![CDATA[Robert Matsuoka]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[We’ve Turned A Corner]]></title><description><![CDATA[The anatomy of a 14-hour harness session &#8212; what a fully instrumented Claude Code orchestration run actually did, turn by 
turn, with a receipt for every move]]></description><link>https://hyperdev.matsuoka.com/p/weve-turned-a-corner</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/weve-turned-a-corner</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 17 Jun 2026 12:31:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GlSw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://matsuoka.com/hyperdev/we-turned-a-corner/timeline.html" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GlSw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 424w, https://substackcdn.com/image/fetch/$s_!GlSw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 848w, https://substackcdn.com/image/fetch/$s_!GlSw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!GlSw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GlSw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:502984,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://matsuoka.com/hyperdev/we-turned-a-corner/timeline.html&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/202389566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GlSw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 424w, https://substackcdn.com/image/fetch/$s_!GlSw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 848w, https://substackcdn.com/image/fetch/$s_!GlSw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!GlSw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7631fb7-806b-4cff-8330-0527d14e6d90_3200x2400.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I watched a Claude Code session run a multi-workstream engineering job last week, and I caught myself doing something I do not usually do with these tools: I stopped intervening. 
The orchestrator &#8212; the PM layer, running claude-opus-4-8 &#8212; handled the kind of coordination work I would expect from a team lead who has been on the project a year: decomposing the job, predicting conflicts, routing to the right specialist. Not a faster coder. A technical lead.</p><p>This time the session was recorded turn by turn, with per-token cost telemetry on every move. So these are not impressions. Below are the specific behaviors I saw, described as precisely as the instrumentation allows, each one carrying its timestamp, the orchestrator&#8217;s own words, and what it cost. The conclusion I&#8217;ll leave to you.</p><p>The job ran on <a href="https://github.com/bobmatnyc/trusty-tools">trusty-tools</a>, a Rust workspace, on 2026-06-10. One human (me), one PM coordinator, a fleet of subagents, and a memory layer plus code search and an adversarial PR reviewer wired in over MCP. It ran for 14 hours and 37 minutes. I authored nine content prompts in that window. The longest was 84 characters.</p><h2>TL;DR</h2><ul><li><p>A single Claude Code + claude-mpm session ran 14h 37m on a real Rust codebase and cost <strong>$485.96</strong> at rack rate &#8212; <strong>$264.65</strong> for the PM (claude-opus-4-8, 265 turns) and <strong>$221.30</strong> for the subagents (sonnet at 4,626 turns, haiku at 605). It made <strong>63 delegations</strong>. On a turn basis, <strong>93% of the work was autonomous</strong>; my share was 7%.</p></li><li><p>I&#8217;m on Anthropic&#8217;s Max plan &#8212; $200/month flat. This session ran inside that subscription. The rack-rate figure is the right way to price the economic value of the work; the actual cost to me was zero marginal dollars.</p></li><li><p>That 7% was nine human-authored prompts in 14.5 hours. Several were a word or a phrase &#8212; <code>proceed</code>, <code>let's do the top candidates</code>. The longest ran 84 characters. 
The orchestrator supplied the structure; I supplied the direction changes.</p></li><li><p>The economics work because of cache. The sonnet agents read <strong>3,531 cached tokens for every fresh input token</strong> (482M cache-read against 137K fresh). The PM ran at 587&#215;, haiku at 34,868&#215;. Most of the context window each turn is priced as cache, not fresh input.</p></li><li><p>My idle time shows up on the bill. The most expensive PM turns are cache-cold context rebuilds after I left a long gap: <strong>$7.26</strong> to resume after <code>proceed</code> (1.5 hours of silence), <strong>$5.82</strong> to pick back up after a 4.5-hour wait.</p></li><li><p>The orchestrator ran adversarial reviews on its own engineers&#8217; work <strong>13 times, unprompted</strong> &#8212; &#8220;Per my verification ownership I won&#8217;t take that at face value&#8221; &#8212; and bounced a starred approval back for more work. It absorbed two infrastructure failures without escalating them to me. None of these behaviors is impressive alone. A competent IC does them all before lunch. They appeared together, in one session, with a cost attached to each.</p></li></ul><p>You can <a href="https://matsuoka.com/hyperdev/we-turned-a-corner/timeline.html">explore the full annotated session timeline</a> &#8212; a turn-by-turn companion infographic showing the user-visible output alongside the underlying mechanism for each move.</p><h2>A note on what this is and isn&#8217;t</h2><p>I am not claiming the model understands anything, and I am not grading it on a benchmark. I am describing observed behavior from one session, the way you would describe an animal in the field: here is what it did, here is the order it did it in, here is what it cost, draw your own conclusions. Twelve behaviors stood out. They are below, roughly in the order they appeared.</p><h2>The Stack in November</h2><p>This session would not have run the same way seven months ago, not even close. 
Three layers changed in that window, and they changed together.</p><p>Start with the model. The session above ran on claude-opus-4-8, released May 28, 2026. Seven months earlier the comparable model was Opus 4.5, which shipped November 24, 2025. Better code is the obvious place to look for the shift, but the change that mattered here sits elsewhere: 4.8 is substantially less likely to let a flaw in its own work pass unremarked &#8212; it surfaces its own errors instead of quietly shipping them. That is the difference between a contractor who does their best and one who tells you when they think something is wrong. The 1M context window also moved from beta on 4.5 to generally available on 4.8, which matters across a 14-hour run, and in Claude Code, 4.8 defaults to <code>xhigh</code> effort &#8212; the cost figures above reflect that setting.</p><p>Then the Claude Code harness around it. In November 2025, Claude Code was effectively a single-session tool; background agents existed, but worktree isolation did not. Worktree support is the critical unlock. Each subagent in this session ran in its own isolated git worktree &#8212; its own branch, its own working tree, no chance of stepping on a parallel agent&#8217;s files mid-run. The merge-conflict prediction in behavior #6 is only tractable because the agents writing code cannot collide on disk while they work. Parallel execution without merge confusion needed a harness feature that wasn&#8217;t there last November. Added in the same window: 5-level nested subagent hierarchies, <code>claude agents</code> for session-wide visibility, and Dynamic Workflows for the PM coordination layer.</p><p>The third layer is the orchestration framework. claude-mpm went from v4.26 to v6.5.44 over those seven months &#8212; two major versions, roughly 450 releases. Two changes carry most of the behavioral weight. 
trusty-memory became mandatory session context, so the PM reads project history before it does anything else, which is behavior #1 below. And worktree-first became the framework default, which is what enables behaviors #3, #6, and the parallel fan-outs in #9. The bench is deeper too: 57 specialized agents now versus about 30 in November, so the routing in behavior #9 has more specialists to route to.</p><p>None of these three improved in isolation. The model&#8217;s self-auditing is only useful if the harness can surface those audits inside a multi-agent pipeline. The worktree isolation is only useful if the orchestration framework knows to reach for it by default. The memory priming is only useful if the model is good enough to act on what it finds. When all three shift in the same seven-month window, the effect is not additive.</p><p>Strip away the version numbers and the feature lists, and the functional picture is narrower than it looks. The evolution since November clusters around three areas. Context: the 1M window is generally available now, and trusty-memory loads project history before the PM does anything. Recall: the PM enters each session knowing what happened before, with a deeper bench of specialists to route to. Error checking: the model flags its own flaws, and the orchestration layer runs adversarial review unprompted. None of the twelve behaviors below trace to the model writing better code. They trace to the system getting better at knowing what it knows, remembering what it has done, and catching what it gets wrong.</p><h2>Session Vitals</h2><p>Before the behaviors, the stat block. 
This is the anatomy laid flat.</p><table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Total cost (rack rate)</td><td>$485.96</td></tr><tr><td>PM cost (claude-opus-4-8, 265 turns)</td><td>$264.65</td></tr><tr><td>Subagent cost (sonnet 4,626 turns + haiku 605 turns)</td><td>$221.30</td></tr><tr><td>Delegations (agent calls)</td><td>63</td></tr><tr><td>Skill + MCP tool calls</td><td>18</td></tr><tr><td>Wall-clock duration</td><td>14h 37m</td></tr><tr><td>Human share (turn basis)</td><td>7%</td></tr><tr><td>Autonomous share (turn basis)</td><td>93%</td></tr><tr><td>Human-authored content prompts</td><td>9</td></tr><tr><td>Longest human prompt</td><td>84 characters</td></tr></tbody></table><p>The cache numbers sit underneath all of it. The sonnet agents pulled 482,137,941 cache-read tokens against 136,569 fresh input tokens. The PM pulled 70,287,449 cache-read against 119,632 fresh. Those two figures are the reason a 14-hour run costs what a junior contractor costs for an afternoon. I come back to them below.</p><h2>1. Context First</h2><p>The first action after my prompt was not a plan and not code. It was three memory and context calls in a row. The first one failed: <code>memory_recall: missing 'palace' (no --palace default configured)</code>. The PM did not abort or report the error to me. It adjusted the call &#8212; added the palace parameter &#8212; retried, and got a stored finding from earlier work.</p><p>Then it read <code>Cargo.toml</code> and <code>README.md</code> before planning anything. In its own words at 21:48: <em>&#8220;I&#8217;ll start by orienting myself. Let me query project memory and check what &#8216;core services&#8217; and &#8216;console&#8217; refer to, since neither maps obviously to a crate in this workspace.&#8221;</em> One minute later it had resolved both terms and declared the scope &#8212; the equivalent of reading the ticket history before touching the keyboard.</p><h2>2. Plan Before Acting</h2><p>Rather than starting, it named the parallel streams and stated why they could run independently before dispatching anything: <em>&#8220;These are independent, so I&#8217;ll dispatch them in parallel&#8221;</em> (21:49). Structure stated before action taken.</p><p>The pattern held under pressure. 
When I said <code>let's do the top candidates</code> at 22:03, it refused to treat that as a green light: <em>&#8220;&#8216;The top candidates&#8217; spans everything from a critical bug cluster to a 5,400-line refactor to design/ADR work &#8212; very different sizes and risk. Before I fan out agents (and tokens), let me confirm how aggressively to go.&#8221;</em></p><h2>3. Agents, Narrated</h2><p>Two agents went out in a single PM turn that cost $0.34 &#8212; one to ticketing to pull the open GH issues for the core services, one to local-ops to investigate and deploy the console. Both came back inside two minutes. Each got a defined, bounded job, and the PM said what each was for before sending it. Later waves followed the same shape: at 23:27 three agents launched in parallel for Wave 1, redeploying trusty-search, splitting a 5,421-line file, and drafting an architecture decision record.</p><h2>4. Verify Before Committing</h2><p>When I said <code>let's do the top candidates</code>, the PM stopped and asked rather than fanning out. It called <code>AskUserQuestion</code> and waited 19 minutes for my answer before dispatching a single engineer.</p><p>The same instinct showed up against its own team&#8217;s reports. After the rust-engineer reported PR #1097 done with tests, clippy, and fmt all green, the PM declined to take the report on trust: <em>&#8220;Per my verification ownership I won&#8217;t take that at face value &#8212; especially since #1088 and #1089 were collapsed into one commit, and #1089&#8217;s core complaint ... may be only partially addressed.&#8221;</em> It ran an adversarial PR review and a CI check in parallel before accepting the work. The verification was budgeted into the run.</p><h2>5. 
Name What You Don&#8217;t Know</h2><p>On the architecture decisions baked into ADR-0010, the PM drew the line cleanly: <em>&#8220;it&#8217;s an architecture decision so I&#8217;ll draft it for your sign-off rather than implement blind.&#8221;</em> It surfaced four open questions, each labeled blocking or non-blocking, and answered none of them unilaterally.</p><p>It applied the same judgment to a failure. When a memory write was blocked because the trusty-memory daemon held the palace write lock, the PM diagnosed the cause and made a call: <em>&#8220;Memory write is blocked ... not worth a detour; the work is durably captured in the merged squash commit, closed issues, and commit messages.&#8221;</em> It marked the gap, decided closing it wasn&#8217;t worth the interruption, and said so.</p><h2>6. Conflict Prediction</h2><p>Before any Wave 1 agent was dispatched, the PM named the collision: <em>&#8220;#1096 and #607 both edit </em><code>.line-cap-allowlist.tsv</code><em> (splitting an allowlisted file requires removing/lowering its entry), so running them in parallel guarantees a merge conflict on that file.&#8221;</em> It named the file, the two items that would collide, the mechanism, and the mitigation &#8212; sequential waves &#8212; in the same breath. The same pattern recurred in Wave 2, where it held #607 until #993 landed for the identical reason.</p><h2>7. Unprompted Checkpoint</h2><p>Unprompted, at 02:02, the PM produced a formatted two-section status checkpoint: a &#8220;Shipped to main&#8221; table listing each merged PR with its commit hash and a verification note, and an &#8220;In flight / staged&#8221; list with the state of each item still moving. Self-organized visibility, not a dashboard anyone designed for it.</p><h2>8. Risk-Stratified Options</h2><p>For the ADR-0010 decisions, the PM laid out four questions, each with named options and the cost of each. 
On the unknown-tag handling: Option P (permissive &#8212; risk: typo&#8217;d tags survive silently), Option L (allowlist &#8212; more control, requires per-index config), Option H (hybrid, the one it proposed). It recommended where it had a view and left the choice with me. At 03:14 it paused and asked what to work on next rather than self-selecting &#8212; which is where the 4.5-hour gap in the log comes from. It was waiting on me.</p><h2>9. Specialist Routing</h2><p>Routing stayed consistent by agent type. Issue reads and epic filing went to ticketing, daemon deploys to local-ops, implementation to rust-engineer. CI polling and merges were version-control&#8217;s; runtime QA went to api-qa, architecture feasibility to research. Across all 63 delegations, the PM did not hand an implementation task to ticketing or a CI task to the engineer.</p><h2>10. Plans Sharpen with Information</h2><p>When recon came back, the plan went from vague to specific. After the first two agents returned, the summary carried exact PR numbers, exact crate versions (trusty-search 0.24.4, trusty-memory 0.15.2, trusty-analyze 0.7.0), exact ports, and one specific stray process to clean up &#8212; none of which existed in my prompt.</p><p>After the issue specs arrived, &#8220;fix the top candidates&#8221; became a coordinated analysis: <em>&#8220;these five cluster around one shared concern &#8212; the </em><code>indexes.toml</code><em> persistence / colocated / warm-boot scan paths &#8212; and #1088/#1089/#1090 in particular interlock through the config-write path, so a coordinated fix is safer than five isolated ones.&#8221;</em> The prompt set a direction; the detail came from what the agents found.</p><h2>11. Self-Maintained Cross-References</h2><p>Throughout, the PM tracked issue numbers, file paths, commit hashes, and the relationships between them. 
It noticed that #1088 and #1089 had been collapsed into one commit and independently checked whether that collapse was safe &#8212; a cross-reference that was in no agent&#8217;s report, only in the PM&#8217;s own check against the original spec.</p><p>It also caught something I&#8217;d have missed: an external contributor, <code>maui314159</code>, had a live PR (#1082) covering a dependency that the #819 work needed. The PM surfaced it as a coordination point rather than duplicating or ignoring it. Later it drew the boundary precisely: <em>&#8220;#819 isn&#8217;t blocked by conflict &#8212; it&#8217;s gated on accepting a contributor&#8217;s architecture decision, which is your call, not something I&#8217;ll auto-merge.&#8221;</em> Then it reviewed the external PR in full before going further.</p><h2>12. Clean Handoff</h2><p>The session ended on a handoff, not on more output. After filing the final epic (#1119), the PM declared the scope complete, named the one item still in flight, and gave the resume mechanism: <em>&#8220;Session stays paused (</em><code>session-20260611-161949</code><em>). That clears everything you asked for in this window. The only thing still in flight is #819 (KG ingest endpoint, building in </em><code>kg-ingest</code><em>) &#8212; I&#8217;ll relay its PR when it lands but won&#8217;t start anything new. Resume anytime with </em><code>/mpm-session-resume</code><em>.&#8221;</em> I issued <code>/exit</code> 18 minutes later.</p><p>This is the strong version of stopping. The PM did not keep going to look busy, and it did not stop arbitrarily. It identified the boundary &#8212; what was done, what was still moving, where the human picks back up &#8212; and stopped there.</p><h2>What the Instrumentation Adds</h2><p>The twelve behaviors are what you&#8217;d see watching over the shoulder. 
The telemetry adds three things you can&#8217;t see that way.</p><p><strong>The 7% is even smaller than it sounds.</strong> Nine human-authored prompts in 14.5 hours. Three were a word or a phrase &#8212; <code>proceed</code>, <code>let's do the top candidates</code>, <code>in console running?</code>. The longest content prompt I wrote all session was 84 characters: <code>it doessnt show the individual consoles, theses should be tabs - and it should be am spa</code> &#8212; typos and all. The orchestrator did not need a spec. It needed a direction and the occasional course correction.</p><p><strong>The economics are a cache story.</strong> The sonnet agents read 3,531 cached tokens for every fresh input token. The PM ran at 587&#215;; haiku, doing high-volume narrow tasks, hit 34,868&#215;. Most of each turn&#8217;s context is pulled from cache at cache prices, not re-sent as fresh input. A 14-hour run that re-paid full freight for context on every turn would cost a multiple of $486. The caching is why an orchestration this long is economically viable at all. A note on what I actually paid: I&#8217;m on Anthropic&#8217;s Max plan at $200/month. The $485.96 above is rack rate &#8212; the right figure for understanding the economic value of the output. My marginal cost for this session was zero.</p><p><strong>Waiting costs money, and you can see exactly where.</strong> The most expensive PM turns share one trait: <code>cache_read=0</code> &#8212; the turns where context had to be rebuilt cold after a cache miss, each one corresponding to a long gap I left. The $7.26 turn, the single priciest of the session, came right after I typed <code>proceed</code>, following 1.5 hours of silence; it wrote 364,618 tokens of cold context to restart the pipeline. The $5.82 turn came after my 4.5-hour wait. The orchestrator picked up where it left off both times, with full context. It just had to pay to reconstruct it. 
In a system like this, my latency has a line item.</p><p>And one finding the behaviors undersell: the failures. The first memory call errored and the PM fixed and retried it. The memory write later failed against a held lock, and the PM diagnosed the daemon contention and moved on. A subagent dropped its connection mid-run on an infra hiccup, and the PM recorded the socket error and later resumed the job to finish it. Three failures, three absorptions, zero escalations to me. That is the part that reads most like a year-one IC: not that nothing broke, but that what broke got handled below my line of sight.</p><h2>So What?</h2><p>Four threads run under all of this.</p><p>The first is where the line now sits between supervision and delegation. For most of these tools, the answer has been &#8220;supervise closely&#8221; &#8212; read every diff, catch every drift. This session moved the line. I supervised at the level of direction and architectural calls; I delegated everything from PR review to conflict avoidance to failure recovery. The PM did its own adversarial review 13 times and bounced a starred approval. Verification did not disappear; it moved inside the loop, and it left a receipt.</p><p>The second is what changes when the coordination work &#8212; not the typing &#8212; is the part the tool does well. The code these systems write stopped being the interesting question a while ago. The interesting development is that the orchestration layer now decomposes work, predicts conflicts, routes to specialists, tracks provenance, and knows when to hand back. That is project management. When the typing is cheap and the coordination is the hard part, a tool that coordinates well is worth more than a tool that types fast.</p><p>The third is the one underneath everything else. No single behavior above is impressive on its own. 
A competent IC reads the ticket history, names the parallel streams, predicts a merge conflict, routes to the right person, and stops at the decision boundary &#8212; without being told. What changed is that they now appear together, unprompted, in one session &#8212; and this time there&#8217;s a receipt for every one of them. The dollar figure is not the headline. The instrumentation is. We can finally watch the whole thing work and count what it cost, behavior by behavior.</p><p>The fourth is about where to point the question. For a while the useful question was &#8220;what can the model do.&#8221; After this session I think the better question is &#8220;what can the trifecta do,&#8221; because none of the behaviors above trace cleanly to a single component. They come out of the interaction: a model that audits its own work, a harness that isolates parallel work in separate worktrees, and an orchestration layer that routes to the right specialist with project memory already loaded. Pull any one of the three and the session degrades &#8212; the self-audit goes nowhere without a pipeline to surface it, the parallel waves collide without isolation, the routing misfires without memory. The capability lives in the seams between the parts, not in any one of them.</p><p>One last note on the economics. The Max plan is $200 a month. For that, I ran a session that bills at $486 rack rate, and the output is good enough that going back to a metered model is hard to imagine. Dealers give the first one away for a reason. The Max plan is that first bag: the work is compelling enough to be structurally addictive, and the flat rate strips out the per-token friction that might otherwise make you stop and think. I don&#8217;t mean that as a complaint. 
It&#8217;s a description of how the pricing works on the user.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://matsuoka.com/hyperdev/we-turned-a-corner/timeline.html">Explore the annotated session timeline</a> &#8212; the turn-by-turn companion infographic for this session</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Math Doesn't Work (Yet): Inside the AI Profitability Problem]]></title><description><![CDATA[Why Scaling Doesn't Lead To Profitability]]></description><link>https://hyperdev.matsuoka.com/p/the-math-doesnt-work-yet-inside-the</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-math-doesnt-work-yet-inside-the</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 10 Jun 2026 11:31:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XBpX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XBpX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XBpX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 424w, https://substackcdn.com/image/fetch/$s_!XBpX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 848w, https://substackcdn.com/image/fetch/$s_!XBpX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 1272w, https://substackcdn.com/image/fetch/$s_!XBpX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XBpX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png" width="1195" height="896" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1195,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1847405,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/201357245?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XBpX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 424w, https://substackcdn.com/image/fetch/$s_!XBpX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 848w, https://substackcdn.com/image/fetch/$s_!XBpX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 1272w, https://substackcdn.com/image/fetch/$s_!XBpX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61162dd8-5fc3-4493-856f-338f6a569f95_1195x896.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<h1>The Math Doesn&#8217;t Work (Yet): Inside the AI Profitability Problem</h1><p>OpenAI&#8217;s own projections show losses getting bigger as revenue gets bigger. Leaked investor documents reported across WSJ, Fortune, and The Information put the company at roughly <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">$74 billion in operating losses in 2028</a> &#8212; on roughly $100 billion in projected revenue. That pairing is the headline. Not a smaller loss as scale arrives. A loss that grows faster than the top line.</p><p>That single relationship is the whole story. We tend to model AI companies as software businesses that will eventually grow into their cost structure the way SaaS companies did before them. The numbers say otherwise. These are capital-intensive infrastructure plays wearing software-company clothing, and the unit economics underneath them run in the opposite direction from the SaaS playbook most of us internalized over the last fifteen years.</p><p>A caveat before the numbers, because it matters for what&#8217;s below: neither OpenAI nor Anthropic publishes audited financials. Most figures here come from leaked investor decks, run-rate annualizations the companies announce in funding rounds, or SEC filings made by their cloud partners. The uncertainty is part of the analysis, not a footnote to it. 
When a specific timeline or quarterly figure couldn&#8217;t survive cross-checking against primary sources, I left it out.</p><h2>TL;DR</h2><ul><li><p>OpenAI&#8217;s leaked projections show operating losses <em>widening</em> as revenue grows &#8212; roughly <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">$74B in losses on ~$100B revenue projected for 2028, with cumulative cash burn near $115B through 2029</a>.</p></li><li><p>In 2025 OpenAI spent about <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">$1.69 for every dollar of revenue (~$9B net loss on ~$13B revenue)</a>, per leaked documents reported by multiple outlets.</p></li><li><p>Anthropic&#8217;s revenue trajectory is steep &#8212; roughly $1B annualized at the end of 2024 to a figure announced in the tens of billions by mid-2026 &#8212; with <a href="https://sacra.com/c/anthropic/">Claude Code alone reported at $2.5B annualized by February 2026</a>.</p></li><li><p><a href="https://www.investing.com/analysis/the-ai-token-pricing-crisis-behind-openai-and-anthropics-revenue-race-200680777">Inference token prices fell about 75% in a year</a>. Selling more AI drives the per-unit price down, which makes revenue growth harder, not easier.</p></li><li><p><a href="https://www.techtimes.com/articles/317542/20260601/ai-agent-economics-token-tax-locks-gross-margins-30-points-below-saas-baseline.htm">AI-native gross margins sit near 45% versus 75&#8211;85% for mature SaaS</a> &#8212; a structural gap of 30&#8211;40 points no company has yet closed.</p></li><li><p>No company has a verified path to profitability. 
Every specific breakeven-by-year claim I tried to confirm fell apart under scrutiny.</p></li></ul><h2>Two Companies, Two Shapes</h2><p>OpenAI ended 2025 at <a href="https://futurumgroup.com/insights/ai-capex-2026-the-690b-infrastructure-sprint/">roughly $20 billion in annualized revenue, a figure CFO Sarah Friar has stated directly</a>. That is a large business by any normal measure. It is also a business that, in the same year, spent about <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">$1.69 for every dollar it took in &#8212; somewhere around a $9 billion net loss on roughly $13 billion in recognized revenue</a>, according to leaked documents that WSJ, Fortune, and The Information each reported. The company is majority funded by Microsoft, runs its compute primarily on Azure, and its strategy is scale-first: build the largest models, capture the most usage, and trust that revenue follows the curve.</p><p>Anthropic&#8217;s shape is different. Its revenue trajectory is steeper and more concentrated. The company grew from roughly $1 billion annualized at the end of 2024 to a figure it announced in the tens of billions by mid-2026 &#8212; <a href="https://sacra.com/c/anthropic/">the number it cited in its Series H materials</a>. I&#8217;m deliberately not pinning an exact figure to a month here, because the company has grown several-fold inside a single five-month window and any precise number is stale by the time you read it. What&#8217;s verifiable is the slope, and the slope is steep.</p><p>The more interesting detail is the concentration of value. Claude Code, one product, was reported at <a href="https://sacra.com/c/anthropic/">roughly $2.5 billion annualized by February 2026</a>. A single coding tool driving that much of a company&#8217;s run rate tells you something about where the margin-bearing demand actually lives. 
Anthropic is backed by Amazon (over $8 billion invested) and Google (over $2 billion), and it has <a href="https://techcrunch.com/2026/05/20/anthropic-will-pay-xai-1-25-billion-per-month-for-compute/">committed to spend more than $100 billion on AWS over ten years</a>, with roughly 1 GW of Trainium capacity targeted by the end of 2026. Two large companies funding it; one of them also selling it the silicon it runs on.</p><h2>Why Compute Is the Problem</h2><p>What separates these companies from every SaaS business you&#8217;ve evaluated: they don&#8217;t own their infrastructure. They rent it, at hyperscaler rates, from the same companies that fund them.</p><p>OpenAI&#8217;s Azure spend reportedly ran around <a href="https://www.theregister.com/2025/11/12/openai_spending_report/">$3.7 billion in 2024 and roughly $8.7 billion across the first three quarters of 2025</a>. Treat those numbers as medium-confidence &#8212; they come from leaked documents, and Microsoft pushed back that the figures &#8220;aren&#8217;t quite right.&#8221; But the direction is consistent with everything else: compute cost is the dominant line item, it&#8217;s largely fixed, and it grows with usage.</p><p>Anthropic&#8217;s arrangement produced one of the stranger details I&#8217;ve come across in enterprise finance in some time. The company signed a deal for compute from Colossus 1 &#8212; Elon Musk&#8217;s Memphis data center, operated by xAI &#8212; at <a href="https://techcrunch.com/2026/05/20/anthropic-will-pay-xai-1-25-billion-per-month-for-compute/">roughly $1.25 billion per month for 300 MW of capacity, running through May 2029</a>. That&#8217;s not from a leak. It surfaced in SpaceX&#8217;s S-1 SEC filing and was confirmed by CNBC, Axios, and Data Center Dynamics, with a potential total value above $40 billion. There&#8217;s a 90-day mutual cancellation clause, so the headline total overstates the firm commitment. 
Still: Anthropic &#8212; funded by Google and Amazon &#8212; is paying Elon Musk&#8217;s company more than a billion dollars a month for compute. The AI capital world is stranger from the inside than the press releases suggest.</p><p>Zoom out and the renter problem gets sharper. Hyperscaler capex for 2026 is projected at <a href="https://futurumgroup.com/insights/ai-capex-2026-the-690b-infrastructure-sprint/">$660&#8211;690 billion</a>. Against that, OpenAI&#8217;s $20 billion ARR is roughly 3% of a single year&#8217;s data-center buildout by its suppliers. The companies selling AI applications are small tenants in an infrastructure market they don&#8217;t control and can&#8217;t currently price against.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GBF-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GBF-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!GBF-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!GBF-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 1272w, 
https://substackcdn.com/image/fetch/$s_!GBF-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GBF-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1143520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/201357245?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GBF-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!GBF-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 848w, 
https://substackcdn.com/image/fetch/$s_!GBF-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!GBF-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88636760-5028-4c28-a47a-a9be5e14e2e7_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Unit Economics Trap</h2><p>This story inverts an instinct most of us trust.</p><p>In normal software, even in the Cloud, scale is 
your friend. Marginal cost trends toward zero, gross margin climbs as you grow, and a mature SaaS business lands at <a href="https://www.techtimes.com/articles/317542/20260601/ai-agent-economics-token-tax-locks-gross-margins-30-points-below-saas-baseline.htm">75&#8211;85% gross margin</a> because serving the millionth customer costs almost nothing. Volume is the cure.</p><p>Inference doesn&#8217;t behave that way. Every token generated costs compute &#8212; real, metered, non-zero compute &#8212; so the marginal cost of serving usage stays stubbornly positive. And the price you can charge for that token is collapsing. Enterprise transaction data from Ramp shows <a href="https://www.investing.com/analysis/the-ai-token-pricing-crisis-behind-openai-and-anthropics-revenue-race-200680777">inference prices falling roughly 75% in a single year, from around $10 per million tokens to around $2.50</a>. Capability per dollar is improving fast, which is good for buyers and brutal for sellers, because it means the revenue you booked at last year&#8217;s prices reprices downward while your compute bill does not.</p><p>Put the two forces together and you get a squeeze that worsens with success. The better you are at selling inference, the more usage you drive; the more usage you drive, the more the per-unit price falls; the more it falls, the harder it is to grow revenue against a compute bill that scales with that same usage. Volume isn&#8217;t the cure here. Under these dynamics it&#8217;s part of the disease.</p><p>The survey data puts a number on how far this world sits from SaaS. ICONIQ Capital polled about 300 software executives and <a href="https://www.techtimes.com/articles/317542/20260601/ai-agent-economics-token-tax-locks-gross-margins-30-points-below-saas-baseline.htm">pegged AI-native gross margins at 41% in 2024, 45% in 2025, and a projected 52% in 2026</a>. Improving &#8212; but starting from a base 30-plus points below mature SaaS, and closing the distance slowly. 
A 52% gross margin is a respectable hardware business. It is a structurally difficult software business, especially one still spending heavily to grow.</p><h2>Two Different Bets</h2><p>OpenAI and Anthropic are running different experiments on how you eventually close that gap. Neither has been validated.</p><p>OpenAI&#8217;s bet is scale and breadth. Build the broadest platform, capture consumer and enterprise and API demand simultaneously, and assume that at sufficient scale you gain pricing power over compute, model-efficiency gains compound, and the revenue base grows fast enough to absorb the fixed cost. The leaked projections embody the risk in this bet: they show <a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">losses </a><em><a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/">widening</a></em><a href="https://fortune.com/2025/11/12/openai-cash-burn-rate-annual-losses-2028-profitable-2030-financial-documents/"> through 2028 even as revenue approaches $100 billion, with cumulative cash burn near $115 billion through 2029</a>. The theory requires the curve to bend after the window we can currently see.</p><p>Anthropic&#8217;s bet is narrower and more product-led. Find a wedge where the work is valuable enough that buyers tolerate real prices, prove the margin there, and expand outward. Claude Code is that wedge made concrete &#8212; <a href="https://sacra.com/c/anthropic/">$2.5 billion annualized from developers</a> who pay because the output is worth more than the inference under it. Coding, agents, and enterprise automation are higher-value work than chat, and higher-value work supports prices that don&#8217;t immediately erode under token deflation. 
The risk: revenue concentration in a single product line, and an infrastructure bill &#8212; AWS commitments plus the xAI deal &#8212; that&#8217;s enormous relative to a company still proving the model.</p><p>Two theories of the same problem. Scale your way past the margin gap, or find work valuable enough that the gap doesn&#8217;t bind. We don&#8217;t yet have the data to say either works.</p><h2>What Would It Actually Take</h2><p>I&#8217;ll skip the timeline speculation &#8212; every specific breakeven-by-year claim I tried to verify died on contact with the sources. The structural requirements are clearer than the dates.</p><p>Three things have to move. First, gross margins have to climb from the mid-40s toward something defensible &#8212; call it 60-plus &#8212; and stay there while volume grows. That means model-efficiency gains (cheaper inference per unit of capability) have to outrun price deflation, rather than getting passed straight through to buyers as lower prices.</p><p>Second, these companies need pricing power over compute, which today they don&#8217;t have. At current scale they&#8217;re tenants. The open question is at what ARR a vendor becomes large enough to negotiate compute like a partner instead of a customer &#8212; or to build its own. Anthropic&#8217;s Trainium commitment and OpenAI&#8217;s various infrastructure moves are bets that vertical integration eventually changes the cost equation. That&#8217;s unproven, and it&#8217;s expensive in the interim.</p><p>Third, the product mix has to keep shifting toward work that resists deflation &#8212; enterprise agents, coding tools, automation that&#8217;s measured against labor cost rather than against the falling price of a token. Claude Code is the cleanest evidence that this category exists and that buyers will pay. 
Whether it&#8217;s a large enough share of total volume to lift blended margins across a company is the question that decides the whole thing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I6FB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I6FB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!I6FB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!I6FB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!I6FB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I6FB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1409865,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/201357245?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I6FB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!I6FB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!I6FB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!I6FB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6f837-7c33-4d8b-9fce-0ef475dfb701_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The Open Questions</h2><p>What we don&#8217;t know outweighs what we do.</p><p>We don&#8217;t know the actual, current gross margins at either company &#8212; only the <a href="https://www.techtimes.com/articles/317542/20260601/ai-agent-economics-token-tax-locks-gross-margins-30-points-below-saas-baseline.htm">AI-native sector estimate of roughly 45%</a>. Neither company publishes the number that would settle the argument. We don&#8217;t know whether Anthropic&#8217;s stack of compute commitments, the decade-long AWS deal alongside the month-by-month xAI arrangement, creates structural tension or healthy redundancy. A company hedging across three infrastructure providers is either diversifying supply or revealing that no single supplier can meet its demand. 
We don&#8217;t know the ARR threshold at which compute pricing becomes negotiable, which is the hinge the entire margin story turns on.</p><p>And there&#8217;s the strategic risk that has no clean precedent: your infrastructure supplier is also your competitor. Microsoft ships Copilot. Amazon and Google both build models that compete with Anthropic&#8217;s. xAI builds Grok. Every dollar these companies pay for compute partly funds a rival&#8217;s model program. In normal software you don&#8217;t hand your gross margin to the company trying to beat you. Here it&#8217;s the default arrangement.</p><p>So the real question isn&#8217;t whether the AI labs are growing. They obviously are, faster than almost any companies in history. The question is whether revenue growth and margin improvement are the same trend or opposing ones. The SaaS era trained a generation of operators to believe that scale fixes economics. The leaked numbers describe a business where scale, so far, makes the loss bigger. Until one of these companies publishes a gross margin that shows the curve bending, that&#8217;s the math we have. 
And the math doesn&#8217;t work yet.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at HyperDev.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/p/the-first-70-era">The First 70% Era</a> &#8212; Where agentic AI delivers value and where it stops, and why the higher-value work resists token deflation</p></li><li><p><a href="https://hyperdev.matsuoka.com/p/ai-and-the-rise-of-the-hyperdev">AI and the Rise of the Hyperdev</a> &#8212; Why developers pay real money for AI tooling, the demand side of the margin story</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[What’s Old Is New Again]]></title><description><![CDATA[Nine classic SDLC practices that AI finally makes practical]]></description><link>https://hyperdev.matsuoka.com/p/whats-old-is-new-again</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/whats-old-is-new-again</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 03 Jun 2026 11:31:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FRWK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!FRWK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FRWK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 424w, https://substackcdn.com/image/fetch/$s_!FRWK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 848w, https://substackcdn.com/image/fetch/$s_!FRWK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 1272w, https://substackcdn.com/image/fetch/$s_!FRWK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FRWK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png" width="873" height="576" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:873,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1435244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43b89068-cf8e-4789-95f1-e357c61076b0_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FRWK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 424w, https://substackcdn.com/image/fetch/$s_!FRWK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 848w, https://substackcdn.com/image/fetch/$s_!FRWK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 1272w, https://substackcdn.com/image/fetch/$s_!FRWK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b2c67b8-b815-4f06-a401-6d57b2a9d884_873x576.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Most of the best ideas in software engineering aren&#8217;t new. They&#8217;ve been written up in books, argued over at conferences, taught in every &#8220;best practices&#8221; deck since the late 1990s. And most teams quietly don&#8217;t do them.</p><p>Not because anyone thinks they&#8217;re wrong. Test-driven development, design by contract, architecture decision records, mutation testing &#8212; ask a room of senior engineers whether these are good ideas and you&#8217;ll get nods. Ask the same room who practices them consistently under deadline pressure and the hands stay down. I&#8217;ve been in that room for twenty-five years, on both sides of the question. 
I&#8217;ve also been the engineering leader who let those practices slip because shipping the feature mattered more this quarter.</p><p>There&#8217;s a single economic reason these practices lose. The upfront cost is high, the payoff is real but distant, and human attention is the binding constraint. Write the test before the code, document the decision, specify the invariant &#8212; every one of those is a tax you pay now against a benefit you collect later, maybe, if the project lives long enough. Under deadline pressure, that&#8217;s a losing trade for a human. So we skip it, ship, and pay the interest later in bugs and confusion. Call it the impatience tax.</p><p>Agents don&#8217;t pay that tax. They have infinite patience for upfront rigor and roughly zero marginal cost for the tedious work that rigor demands. Writing a thorough test suite for code that doesn&#8217;t exist yet is psychologically brutal for a person and completely fine for a model. That single shift &#8212; the cost of patience going to zero &#8212; quietly inverts the economics of a whole list of practices we knew were right and gave up on anyway.</p><p>This isn&#8217;t a piece about what AI makes <em>possible</em>. Lots of things are possible. It&#8217;s about a narrower, more useful question: which disciplines did we already agree were correct, fight about for decades, and abandon for reasons that no longer hold?</p><p>Here are nine.</p><h2>TL;DR</h2><ul><li><p>These nine practices share one structure: high upfront cost, distant payoff. Human attention is the constraint that kills them under deadline pressure.</p></li><li><p>AI removes the constraint. A failing test is the clearest prompt you can hand an agent; a spec is its input; an ADR is its context. 
The discipline becomes the interface.</p></li><li><p>TDD, design by contract, and property-based testing turn from &#8220;things we should do&#8221; into the most effective way to <em>constrain</em> agent behavior and prevent hallucinated correctness.</p></li><li><p>Documentation, ADRs, and living docs get a bilateral ROI: agents generate them from code, and they make agents far more effective in your codebase.</p></li><li><p>The catch is real. A 2025 METR randomized trial found experienced developers were about 19% <em>slower</em> with AI assistance. These practices pay off only when AI is used with discipline, not as autocomplete.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hmow!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hmow!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 424w, https://substackcdn.com/image/fetch/$s_!hmow!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 848w, https://substackcdn.com/image/fetch/$s_!hmow!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 1272w, https://substackcdn.com/image/fetch/$s_!hmow!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hmow!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png" width="1024" height="431" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:431,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1083571,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f4ba76c-18f5-413a-bf35-a56dc7861a00_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hmow!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 424w, https://substackcdn.com/image/fetch/$s_!hmow!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 848w, https://substackcdn.com/image/fetch/$s_!hmow!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hmow!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff117aee0-1765-4975-a0c5-dd7bdddbe34d_1024x431.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>1. Test-Driven Development</h2><p>Start with the practice most teams abandoned first.</p><p>Writing tests before code was always the theoretically superior move. 
It forces you to define the interface before you build behind it, catches bugs at the moment of definition instead of during integration, and leaves behind a living specification of what the code is supposed to do. Kent Beck made the case decades ago and the case held up.</p><p>Almost nobody did it consistently. The reason is psychological, not technical. Writing detailed tests for code that doesn&#8217;t exist yet, while a deadline breathes down your neck, feels like building scaffolding for a house you haven&#8217;t designed. Your brain screams at you to just write the function. So you write the function, promise yourself you&#8217;ll add tests after, and &#8212; well. You know how that goes.</p><p>Now flip the perspective. To an agent, a failing test isn&#8217;t scaffolding. It&#8217;s the clearest possible specification of intent you can provide. &#8220;Make this pass, don&#8217;t break anything else&#8221; is an unambiguous, machine-checkable instruction, which is exactly what a probabilistic system needs to stay honest. The test suite becomes a guardrail that prevents the most dangerous failure mode in AI-assisted coding: confident, plausible, wrong. Hallucinated correctness dies against a red bar.</p><p>TDD went from the discipline most teams couldn&#8217;t sustain to one of the best tools we have for bounding what an agent is allowed to claim it did. Same practice. 
Opposite economics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a841!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a841!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 424w, https://substackcdn.com/image/fetch/$s_!a841!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 848w, https://substackcdn.com/image/fetch/$s_!a841!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 1272w, https://substackcdn.com/image/fetch/$s_!a841!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a841!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png" width="1024" height="670" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1640038,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a4cda20-f557-4918-88d1-529fec00bdee_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a841!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 424w, https://substackcdn.com/image/fetch/$s_!a841!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 848w, https://substackcdn.com/image/fetch/$s_!a841!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 1272w, https://substackcdn.com/image/fetch/$s_!a841!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb60534e-05b5-4873-ab47-586bb7ec68bd_1024x670.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>2. Spec-Driven Development and Design by Contract</h2><p>Bertrand Meyer formalized design by contract in the 1980s and built it into the Eiffel language: specify preconditions, postconditions, and invariants, then let the implementation follow from the contract.</p><p>The idea was sound and the adoption was thin, for one stubborn economic reason: the contract only pays off if someone <em>else</em> writes the implementation from it. If you&#8217;re writing both the spec and the code, the spec is overhead &#8212; you already know what you meant. The contract&#8217;s value lives in the handoff, and for most of software history there was no one cheap to hand it to.</p><p>Now there is. You write the contract; the agent writes the implementation from it. 
Spec-driven development stops being a documentation chore and becomes the actual control surface for delegation. The spec is the part requiring human judgment about <em>what</em> the system should do. The implementation &#8212; the part that used to eat the hours &#8212; is the part you delegate. Meyer&#8217;s economics finally close, forty years late, because the missing party in the transaction showed up.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P-wa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P-wa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 424w, https://substackcdn.com/image/fetch/$s_!P-wa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 848w, https://substackcdn.com/image/fetch/$s_!P-wa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 1272w, https://substackcdn.com/image/fetch/$s_!P-wa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P-wa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png" 
width="1024" height="487" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:487,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1370470,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d6c7c5-50c0-4b34-9a46-0700c4bf1e82_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P-wa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 424w, https://substackcdn.com/image/fetch/$s_!P-wa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 848w, https://substackcdn.com/image/fetch/$s_!P-wa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 1272w, https://substackcdn.com/image/fetch/$s_!P-wa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee5605bf-95f5-490e-b318-7334ff70ae87_1024x487.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>3. Architecture Decision Records</h2><p>Why does the codebase look like this? Why Postgres and not Dynamo, why this queue, why the weird module boundary that everyone trips over?</p><p>ADRs were the right answer to that question &#8212; a short dated record of each significant decision, the context, and the alternatives rejected. The discipline almost never held. Same shape as everything else here: the cost is immediate (stop, write the thing) and the value accrues slowly, mostly to some future engineer who isn&#8217;t in the room yet.</p><p>Two things flipped at once, which makes this one more interesting than the rest. First, agents can generate ADRs from an existing codebase &#8212; read the git history, the dependency choices, the structure, and reconstruct the decisions that produced them. 
The retroactive cost of documentation drops toward zero. Second, and this is the part people miss: existing ADRs dramatically improve what an agent can do <em>in</em> your codebase. An agent that can read why you chose eventual consistency won&#8217;t keep proposing changes that assume strong consistency.</p><p>So the ROI went bilateral. Agents help you write ADRs, and ADRs help agents help you. The practice that used to only cost now pays on both ends.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cfey!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cfey!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 424w, https://substackcdn.com/image/fetch/$s_!cfey!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 848w, https://substackcdn.com/image/fetch/$s_!cfey!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 1272w, https://substackcdn.com/image/fetch/$s_!cfey!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!cfey!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png" width="1024" height="445" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:445,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1079171,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eca5b97-c8ef-45f0-97de-eb23cc9969c2_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cfey!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 424w, https://substackcdn.com/image/fetch/$s_!cfey!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 848w, https://substackcdn.com/image/fetch/$s_!cfey!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 1272w, https://substackcdn.com/image/fetch/$s_!cfey!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F256e6ffa-ac04-401b-bd1d-276e0dc51f21_1024x445.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>4. Continuous Code Review</h2><p>&#8220;Catch issues early&#8221; has sat on every best-practices list since Extreme Programming put continuous review on the map. The advice was never controversial. The bottleneck was always the same: human reviewer attention is finite, expensive, and easily exhausted. 
So review collapsed into batch PR review &#8212; a tired engineer reading a 600-line diff on a Friday afternoon, approving most of it on faith.</p><p>AI review on every commit &#8212; not batched at the PR boundary, but running as code lands &#8212; is moving from aspirational toward baseline: increasingly the default on teams that have wired it in, though not yet universal. The marginal cost of a careful read went to nearly nothing, and the read happens while the context is still warm.</p><p>But this one comes with an emergent problem worth naming, because it bites teams that adopt the tooling without rethinking the model. PRs are getting larger and arriving faster under AI-assisted development. An agent can produce a 2,000-line change in an afternoon. If your review model still routes everything through a human approver at the end, that human is now the rate limiter, drowning in volume they didn&#8217;t generate and can&#8217;t realistically read. AI review on every commit is part of the answer. The harder part is restructuring <em>what</em> the human reviews &#8212; architecture, intent, the decisions a model shouldn&#8217;t make alone &#8212; and letting the machine handle line-level correctness continuously. 
Adopt the tool without rethinking the workflow and you&#8217;ve just built a faster way to overwhelm your best reviewer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SXYP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SXYP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 424w, https://substackcdn.com/image/fetch/$s_!SXYP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 848w, https://substackcdn.com/image/fetch/$s_!SXYP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 1272w, https://substackcdn.com/image/fetch/$s_!SXYP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SXYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png" width="1024" height="456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:456,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1098227,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce64d7a-b849-4c7e-b523-2565960ee237_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SXYP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 424w, https://substackcdn.com/image/fetch/$s_!SXYP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 848w, https://substackcdn.com/image/fetch/$s_!SXYP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 1272w, https://substackcdn.com/image/fetch/$s_!SXYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2eb6d6b9-de10-452f-8b64-7bb0bcba1041_1024x456.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>5. Pair Programming</h2><p>Pairing always looked expensive in the most obvious way: two engineers, one task, double the salary against a single unit of output. That intuition was wrong on the numbers &#8212; the overhead measured in pair-programming studies was closer to 15% more developer time, often recovered through fewer defects &#8212; but the 2&#215; gut feeling is what drove the decisions. The benefits were real &#8212; knowledge transfer, real-time review, fewer dumb mistakes &#8212; but the perceived math meant most teams reserved it for critical paths, gnarly bugs, or onboarding a new hire. A luxury, rationed.</p><p>The pair is now a human and an agent, and it&#8217;s available to every engineer continuously, not rationed to the critical path. 
The knowledge-transfer benefit generalizes &#8212; the agent can explain unfamiliar parts of the codebase on demand. The real-time-review benefit generalizes &#8212; a second set of eyes on every line, every time, without scheduling two calendars. The economics that made pairing a rationed luxury simply don&#8217;t apply when one half of the pair has near-zero marginal cost.</p><p>Worth a caveat: agent-as-pair is genuinely good at the review and explanation half of pairing, and weaker at the part where a human partner pushes back on a bad <em>design</em> before you&#8217;ve written a line. You still need humans pairing with humans for that. But the day-to-day, line-by-line version of pairing just became free, and that&#8217;s most of what pairing was for.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DHDP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DHDP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 424w, https://substackcdn.com/image/fetch/$s_!DHDP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 848w, https://substackcdn.com/image/fetch/$s_!DHDP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DHDP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DHDP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png" width="1024" height="592" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1439433,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F611b1613-4024-4d10-8e69-2b2a1936432d_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DHDP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 424w, https://substackcdn.com/image/fetch/$s_!DHDP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 848w, 
https://substackcdn.com/image/fetch/$s_!DHDP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 1272w, https://substackcdn.com/image/fetch/$s_!DHDP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e5b59c-a9a7-44ae-8725-1333f8e08687_1024x592.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>6. Mutation Testing</h2><p>Coverage numbers lie, and most engineers know it. 
Eighty percent line coverage tells you eighty percent of your lines got executed by a test &#8212; not that any of those tests would <em>notice</em> if the behavior broke. Mutation testing is the honest measure: it deliberately introduces bugs (flip a comparison, drop a line, change a constant) and checks whether your test suite catches them. If a mutant survives, you have a test that runs code without actually validating it.</p><p>Mutation testing was always the gold standard and almost never run continuously, for one reason: it&#8217;s computationally expensive. You&#8217;re effectively running your whole suite many times over, once per mutation. On a real codebase that&#8217;s brutal. So it lived in research papers and the occasional heroic CI job that someone eventually disabled for being too slow.</p><p>That constraint is mostly gone &#8212; compute is cheap and parallel, and we got more comfortable spending it. And the practice arrived right when we suddenly need it most. AI-generated tests have a characteristic failure mode: they drift toward coverage metrics without meaningful assertions. The model writes a test that calls the function, exercises the path, and asserts almost nothing of substance &#8212; green checkmark, zero protection. Coverage looks great. Mutation testing is the thing that catches exactly that. 
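</p><p>A minimal sketch of that failure mode in Python, with everything in it invented for illustration: <code>is_adult</code>, its hand-written mutant, and the two tests are hypothetical, and real mutation tools generate and run mutants automatically rather than by hand. It puts a test that executes every line yet lets a flipped comparison through next to one that catches it.</p><pre><code class="language-python">def is_adult(age):
    """Original implementation (hypothetical example function)."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the comparison operator was changed from >= to >."""
    return age > 18


def weak_test(fn):
    # Executes the code, so line coverage is 100%, but asserts nothing.
    fn(18)
    fn(30)


def strong_test(fn):
    # Pins down the boundary, so a changed operator is detected.
    assert fn(18) is True
    assert fn(17) is False


def survives(test, mutant):
    """Return True if the test passes against the mutant (the mutant survived)."""
    try:
        test(mutant)
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    print("weak test lets the mutant survive:", survives(weak_test, is_adult_mutant))
    print("strong test lets the mutant survive:", survives(strong_test, is_adult_mutant))
</code></pre><p>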
It&#8217;s the verification layer for a verification layer, and it matters more now than when it was invented, because now a machine is writing the tests.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EY9I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EY9I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!EY9I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!EY9I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!EY9I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EY9I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1482204,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EY9I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!EY9I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!EY9I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!EY9I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ad3dac-f7cb-48da-9d0a-a6098d28cbe6_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>7. Living Documentation</h2><p>Documentation was always supposed to be a first-class artifact. It almost never was, and the reason is by now familiar: writing docs is tedious, and the penalty for stale docs accrues slowly and lands on someone else. So docs rotted. Every team has a wiki that&#8217;s a graveyard of half-true pages from two reorgs ago.</p><p>AI changes both halves of the equation at once. It generates docs from code, so the writing cost drops. And it <em>consumes</em> docs as context, so the docs earn their keep immediately &#8212; a well-documented codebase is a measurably more useful codebase for an agent working in it. The ROI is immediate and bilateral, same structure as ADRs.</p><p>There&#8217;s a quiet shift hiding in there. Documentation used to be written for humans who&#8217;d mostly never read it. 
Now it&#8217;s also written for the agent that will read it on every task, which means stale docs don&#8217;t just confuse a future engineer &#8212; they actively degrade your tooling today. The feedback loop tightened from months to minutes. That&#8217;s the kind of change that actually moves behavior, because the cost of skipping it shows up now instead of later.</p><h2>8. Runbook Generation from Incidents</h2><p>On-call always leaned too hard on tribal knowledge. The person who knows why the payment service wedges at 3 a.m. is asleep, on vacation, or left the company last spring. Writing a runbook after each incident was obviously the right move and reliably the thing nobody did, because the incident was <em>over</em> and everyone wanted to go back to bed.</p><p>Incidents become runbooks automatically now. The agent has the incident timeline, the chat transcript, the commands that resolved it, the postmortem &#8212; and it can turn that into a structured runbook while the details are fresh, without asking an exhausted engineer to relive the night. The cost that used to fall right when motivation was lowest now falls on a system that doesn&#8217;t get tired or resentful.</p><p>I&#8217;d treat the generated runbook as a draft a human still signs off on, not gospel. 
But &#8220;imperfect draft, reviewed in five minutes&#8221; beats &#8220;blank page nobody ever fills in,&#8221; and that was always the real competition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dM5v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dM5v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 424w, https://substackcdn.com/image/fetch/$s_!dM5v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 848w, https://substackcdn.com/image/fetch/$s_!dM5v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 1272w, https://substackcdn.com/image/fetch/$s_!dM5v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dM5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png" width="1024" height="693" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:693,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1615996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200236698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3f83774-28c6-4b46-addd-83efad33390b_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dM5v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 424w, https://substackcdn.com/image/fetch/$s_!dM5v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 848w, https://substackcdn.com/image/fetch/$s_!dM5v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 1272w, https://substackcdn.com/image/fetch/$s_!dM5v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c754a4-9b5b-424f-b008-30dbcb217887_1024x693.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>9. Property-Based Testing</h2><p>Example-based tests check the cases you thought of. Property-based testing is stronger: you specify the <em>invariants</em> a system must always satisfy &#8212; reversing a list twice returns the original, a serialized-then-deserialized object equals the original, the account balance never goes negative &#8212; and the framework generates hundreds of adversarial inputs trying to break them. QuickCheck pioneered the approach; it finds the edge cases you&#8217;d never have written by hand.</p><p>It never went mainstream outside a few communities, and the bottleneck wasn&#8217;t tooling &#8212; good property-based libraries exist for most languages. The bottleneck was writing good property specifications. 
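</p><p>To make that concrete, here is a hand-rolled sketch in Python using only the standard library. It is not QuickCheck or any particular property-based library: <code>check_property</code>, <code>random_int_list</code>, and the property functions are hypothetical names, and a real framework would also shrink counterexamples to a minimal case. Two of the properties mirror the invariants mentioned above; the third is deliberately false, to show a counterexample being found.</p><pre><code class="language-python">import json
import random


def random_int_list(rng, max_len=20):
    """Generate a random list of integers, possibly empty."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]


def check_property(prop, generator, trials=200, seed=0):
    """Run prop on generated inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generator(rng)
        if not prop(value):
            return value
    return None


def reverse_twice_is_identity(xs):
    # Invariant: reversing a list twice returns the original.
    return list(reversed(list(reversed(xs)))) == xs


def json_round_trip(xs):
    # Invariant: serialize-then-deserialize equals the original.
    return json.loads(json.dumps(xs)) == xs


def already_sorted(xs):
    # Deliberately false "invariant": random lists are not generally sorted.
    return xs == sorted(xs)


if __name__ == "__main__":
    for prop in (reverse_twice_is_identity, json_round_trip, already_sorted):
        print(prop.__name__, "counterexample:", check_property(prop, random_int_list))
</code></pre><p>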
Identifying the right invariants requires deep domain reasoning: you have to understand the system well enough to state what must <em>always</em> be true, which is harder than writing a few example cases. Most engineers, under pressure, defaulted to the easier thing.</p><p>This is where AI helps in a way that&#8217;s less obvious than &#8220;it writes the code.&#8221; A model can generate property suites from a spec, and &#8212; more usefully &#8212; it can reason about <em>what invariants a system should satisfy</em> in the first place, surfacing properties you hadn&#8217;t articulated. That&#8217;s the expensive, judgment-heavy part it actually offloads. Combined with mutation testing to keep the generated properties honest, you get a testing approach that was always more powerful than example-based testing and was always too expensive in human reasoning to adopt widely.</p><h2>The Catch</h2><p>I&#8217;d be selling you something if I stopped there, and the data won&#8217;t let me.</p><p>In July 2025, METR ran a randomized controlled trial with experienced open-source developers working on real tasks in repositories they knew well. The developers expected AI assistance to speed them up. It slowed them down &#8212; by roughly 19%. METR&#8217;s February 2026 follow-up found that gap narrowing, and reversing on some measures, as the same kind of developers gained real experience with the tools. In other words, the 19% was a snapshot of the unfamiliar, undisciplined path, and the gap closes precisely as people pick up the habits this piece is about.</p><p>That finding is real and it isn&#8217;t a contradiction of everything above. It&#8217;s the missing condition. Every practice in this piece works <em>because</em> it imposes structure on the agent &#8212; TDD as a guardrail, the spec as input, the ADR as context, mutation testing as the check on the check. Used that way, with discipline, AI is constrained toward correctness. 
Used the other way &#8212; as autocomplete, as a vibe-coding partner you don&#8217;t supervise &#8212; you get more code, faster, with less correctness and a slower path to done once you account for the cleanup. The METR developers, working in code they already understood deeply, may well have been paying exactly that tax: accepting plausible suggestions that took longer to vet and fix than writing the code themselves would have.</p><p>So the inversion isn&#8217;t automatic. The cost of patience dropped to zero, which makes the rigorous path finally affordable. It does not make the undisciplined path good. If anything it makes discipline more important, because a tool that produces plausible output at high volume is precisely the tool that most needs a guardrail you can&#8217;t talk your way past. A red test bar doesn&#8217;t care how confident the model sounds.</p><h2>What To Do With This</h2><p>The interesting question was never &#8220;what does AI make possible.&#8221; That list is enormous, mostly speculative, and not very actionable. The better question is the one this whole piece is built on: which practices did we already know were right, argue about for decades, and quietly give up on?</p><p>That list is short, specific, and yours to write. Go pull your own team&#8217;s &#8220;we should really do this but we don&#8217;t&#8221; backlog &#8212; the standing items in retros that everyone agrees with and nobody owns. I&#8217;d bet most of them have the same economic shape: high upfront cost, distant payoff, killed by human impatience under deadline. Test coverage on the legacy module. The runbooks. The ADRs for the three decisions everyone keeps re-litigating. The integration tests that would&#8217;ve caught last quarter&#8217;s outage.</p><p>Run each one through a single question: was this abandoned because it was <em>wrong</em>, or because it was <em>expensive in human patience</em>? The wrong ones, leave abandoned. The expensive-in-patience ones just got cheap. 
Those are the ones to pick back up first.</p><p>The impatience tax got repealed. The disciplines it used to make unaffordable are sitting right there, mostly unchanged, waiting for someone to notice the price changed.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p>The Other Shoe Has Dropped &#8212; Why enterprise AI bills don&#8217;t match the per-token price collapse</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Era of the Leader/Practitioner]]></title><description><![CDATA[Putting "Do" back into "Lead"]]></description><link>https://hyperdev.matsuoka.com/p/the-era-of-the-leaderpractitioner</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-era-of-the-leaderpractitioner</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 01 Jun 2026 11:31:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!E_Vr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E_Vr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E_Vr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 424w, https://substackcdn.com/image/fetch/$s_!E_Vr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 848w, https://substackcdn.com/image/fetch/$s_!E_Vr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 1272w, https://substackcdn.com/image/fetch/$s_!E_Vr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E_Vr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png" width="1024" height="700" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:700,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1378805,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200067020?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6656d57-be88-4993-a972-b7c0c5fd743d_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E_Vr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 424w, https://substackcdn.com/image/fetch/$s_!E_Vr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 848w, https://substackcdn.com/image/fetch/$s_!E_Vr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 1272w, https://substackcdn.com/image/fetch/$s_!E_Vr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3731b710-763c-4f36-860f-560f95b11d2b_1024x700.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Something shifted in the last year, and it took me a while to name it.</p><p>A growing number of people are running organizations while still doing real hands-on technical work. Not as a hobby, not on weekends, not as a vanity exercise to keep their commit graph green. They are building things their teams depend on &#8212; and they are doing it as a deliberate part of the job, not in the cracks between meetings. The work has a specific shape. It is rarely a production feature. It is the layer underneath: developer productivity tooling, internal services, agentic harnesses, MCP connectors, the infrastructure that unblocks everyone else.</p><p>For most of my career this combination didn&#8217;t really hold together. You could code or you could lead, and the moment you tried to do both seriously, one of them rotted. 
I&#8217;ve watched plenty of technical executives keep a foot in the codebase and slowly become the bottleneck everyone routed around politely. The pattern was familiar enough to be a warning.</p><p>What changed is not that leaders suddenly got more disciplined. It&#8217;s that the time cost of meaningful technical contribution collapsed. Agentic coding made a hybrid role viable that wasn&#8217;t viable before &#8212; and the more I look at it, the more I think this isn&#8217;t a new invention at all. It&#8217;s the recovery of a very old idea that modern specialization interrupted.</p><p>I&#8217;m writing this as someone living in the middle of it. I spend roughly 30% of my time coding. And by coding I mostly mean directing a team of agents rather than typing code by hand, which is probably the whole point. That&#8217;s not a full-time IC&#8217;s week, and it isn&#8217;t meant to be. It&#8217;s enough to stay close to the work that matters and to build the enabling infrastructure I think is worth my own hands on the keyboard.</p><h2>TL;DR</h2><ul><li><p>A distinct role is emerging: leaders who run organizations and still do deep technical work &#8212; specifically enabling/institutional work (tooling, harnesses, internal services), not critical-path product features.</p></li><li><p>This satisfies Charity Majors&#8217; actual advice. Her line was never &#8220;stop coding.&#8221; It was &#8220;stop writing code in the critical path.&#8221; Enabling work fits that exactly.</p></li><li><p>The integration of strategist and practitioner has deep cross-cultural precedent &#8212; Japan&#8217;s <em>bunbu-ry&#333;d&#333;</em>, Rome&#8217;s Marcus Aurelius, China&#8217;s <em>wen-wu</em>, the Renaissance polymath, Mattis&#8217;s &#8220;warrior monk.&#8221; The modern role is a recovery, not a novelty.</p></li><li><p>Agentic coding is what makes it newly viable: focused sessions now deliver output that once required sustained, uninterrupted immersion. 
The Anthropic 2026 data shows ~27% of AI-assisted work is work that &#8220;wouldn&#8217;t have been done otherwise.&#8221;</p></li><li><p>Directing agents feels like delegation &#8212; the same skill leaders already use with human reports. Which is why experienced leaders adapt to it more naturally than juniors do.</p></li></ul><h2>The pattern, named</h2><p>The difficulty with the coding executive was never philosophical. It was attentional.</p><p>Charity Majors mapped this years ago in <a href="https://charity.wtf/2017/05/11/the-engineer-manager-pendulum/">The Engineer/Manager Pendulum</a>, and the piece holds up because she was precise about the mechanism. Management is interruptive by design &#8212; your job is to be available, to unblock, to absorb the chaos so your team doesn&#8217;t have to. Serious engineering is the opposite. It requires blocking interruptions for long enough to hold a complex system in your head. Two incompatible attention modes. Try to run both at once and you do neither well.</p><p>But here is the part people skip when they quote her. Majors never said managers should stop coding. Her actual advice was sharper: <em>don&#8217;t write code in the critical path.</em> Don&#8217;t be the person others are waiting on. Stay technical, stay sharp, just don&#8217;t make yourself a dependency that blocks shipping. &#8220;The best frontline eng managers in the world,&#8221; she wrote, &#8220;are the ones that are never more than 2-3 years removed from hands-on work.&#8221;</p><p>That distinction carries the whole argument. Because there is a category of technical work that is consequential without being critical-path, and it turns out to be exactly the work senior people are best positioned to do.</p><p>Call it enabling work. Internal tooling. Developer productivity infrastructure. Agentic harnesses. MCP services that other teams plug into. Architectural prototypes that prove a direction before anyone commits to it. 
None of this is what blocks a release on Thursday. All of it multiplies whoever comes after. It tolerates interruption &#8212; you can pick it up Tuesday afternoon and put it down when a real fire starts &#8212; precisely because nobody is standing at your desk waiting for it.</p><p>This is also where the industry is putting its money. Gartner named platform engineering a top strategic trend for two years running and projected that 80% of large engineering organizations would run dedicated platform teams by 2026. The structural reason a leader can work here without becoming the bottleneck is built into the definition of the domain: its output unblocks others rather than blocking them.</p><p>Will Larson&#8217;s <a href="https://lethain.com/staff-engineer-archetypes/">Staff Engineer archetypes</a> circle the same territory without quite landing on it. His &#8220;Architect&#8221; sits in a permanent argument &#8212; some organizations demand the Architect stay deep in the code, others forbid it. The leader/practitioner resolves that argument by relocating it: deep in the enabling and infrastructure work, absent from production product code. Not the pendulum, not the staff IC. 
A real hybrid, and a newly coherent one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jHB6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jHB6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 424w, https://substackcdn.com/image/fetch/$s_!jHB6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 848w, https://substackcdn.com/image/fetch/$s_!jHB6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 1272w, https://substackcdn.com/image/fetch/$s_!jHB6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jHB6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png" width="1024" height="684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1540586,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200067020?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e7bf7a-504a-4f0c-b75f-1832889fc598_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jHB6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 424w, https://substackcdn.com/image/fetch/$s_!jHB6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 848w, https://substackcdn.com/image/fetch/$s_!jHB6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 1272w, https://substackcdn.com/image/fetch/$s_!jHB6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdffd3763-7ad5-4119-96d4-4e0c1d01f0cd_1024x684.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>This is not new</h2><p>Here is where I want to slow down, because the most interesting thing about this role is how old it is.</p><p>The idea that a leader should be both a strategist and a practitioner &#8212; not separate modes to alternate between but a single integrated way of operating &#8212; shows up independently across at least five civilizations. That kind of convergence usually means a culture has found a durable answer to a real problem.</p><p>Japan gave it a name: <em>bunbu-ry&#333;d&#333;</em> (&#25991;&#27494;&#20001;&#36947;), the way of both the literary and the martial. <em>Bun</em> is letters, cultivation, strategy. <em>Bu</em> is the martial, the active, the practitioner&#8217;s hand. <em>Ry&#333;d&#333;</em> means both ways, together &#8212; not balanced, not traded off, but held at once. 
By the mid-fourteenth century the dual-talented warrior was already established as the model leader, and during the Edo period the Tokugawa shogunate made it official policy for the samurai class. The phrase that survives captures the stakes: culture without power is ineffective, and power without culture is barbarous.</p><p>The archetype is Miyamoto Musashi. Undefeated in more than sixty duels, often fighting with a wooden sword against live steel. He founded a two-sword school, and in the last months of his life he wrote <em>The Book of Five Rings</em> in a cave. He was also a recognized master of ink painting and calligraphy &#8212; his <em>Shrike on a Withered Branch</em> survives as a designated Important Cultural Property of Japan. The same hands that won sixty duels produced fine art a nation still protects. He didn&#8217;t oscillate between the sword and the brush. He held both, and each sharpened the other. &#8220;When I apply the principle of strategy to the ways of different arts and crafts,&#8221; he wrote, &#8220;I no longer have need for a teacher in any domain.&#8221; Mastery in one discipline illuminating all the others.</p><p>Rome had Marcus Aurelius, the philosopher-king made historical rather than theoretical. He ran the empire and commanded its armies on the Danube, and he wrote the <em>Meditations</em> in the war camps &#8212; fragmentary notes to himself, composed in the middle of campaigning and administration. That book was never published philosophy. It was a working journal, the most powerful man in the ancient world writing to stay grounded while doing the job.</p><p>China institutionalized the same ideal as <em>wen</em> and <em>wu</em> &#8212; civil cultivation and martial capability &#8212; and ran it for roughly thirteen centuries through the scholar-official class. 
Zeng Guofan is the canonical case: he rose through the imperial examinations to high Confucian office, then built and commanded an army of more than a hundred thousand, reportedly keeping a diary on Neo-Confucian ethics even as he directed the campaigns. The integration wasn&#8217;t left to personal taste. It was built into the examination and career structure.</p><p>And the archetype still lands today. James Mattis earned the nickname &#8220;Warrior Monk&#8221; &#8212; battlefield commander and devoted reader, a 7,000-book library, the <em>Meditations</em> carried into combat. The chain from Aurelius to Mattis is literal: the same book, eighteen centuries apart. That we still reach for &#8220;warrior monk&#8221; as a compliment for a leader tells you the integration never stopped resonating.</p><p>Across all of it, the answer is the same. The contemplative and the active were not specializations to assign to different people. They were a single discipline, each half informing the other. Modernity &#8212; with its org charts, its clean role boundaries, its professional specialization &#8212; interrupted that. The leader/practitioner is not a tech-industry novelty. It&#8217;s an old integration becoming feasible again.</p><h2>Why now</h2><p>So what actually changed? Not the wisdom. The economics.</p><p>The thing that made the coding executive a bad idea was the attention math. Serious technical work demanded long, unbroken stretches of focus &#8212; the exact resource a leadership schedule cannot reliably provide. You cannot design a system in the fifteen minutes between a board prep and a one-on-one. The pendulum was a real constraint, not a failure of will.</p><p>Agentic coding changes that math directly. The unit of work moved up a level. Instead of holding every implementation detail in working memory across a four-hour session, you specify intent, direct an agent, review what comes back, correct course, and direct again. 
A focused 30-minute session now produces what used to require an afternoon of immersion &#8212; not because the thinking got easier, but because the implementation cost collapsed.</p><p>The Anthropic <a href="https://resources.anthropic.com/2026-agentic-coding-trends-report">2026 Agentic Coding Trends Report</a> puts numbers on the shift. Average session length has climbed to 23 minutes in the agentic era, up from about 4 in the autocomplete era &#8212; the work got denser, not just faster. 78% of Claude Code sessions now involve multi-file edits, up from 34% a year earlier. Teams running multi-agent workflows report 2&#8211;4x faster delivery from task creation to deployment. And the figure that matters most for this argument: roughly 27% of AI-assisted work consists of tasks that &#8220;wouldn&#8217;t have been done otherwise&#8221; &#8212; the scaling projects, the nice-to-have tools, the exploratory infrastructure that was never quite worth the manual hours.</p><p>That 27% is the enabling work. It is the category that lives or dies on time cost, and it&#8217;s the category a leader/practitioner is best placed to take on.</p><p>The arithmetic is what makes 30% credible. I documented a 6&#8211;10x multiplier on focused technical sessions in <a href="https://hyperdev.matsuoka.com/the-irreducibles-what-a-pattern-master-does">The Irreducibles</a> earlier this year &#8212; a project I estimated at 150&#8211;200 billable hours compressed into roughly 50&#8211;70 hours of wall-clock time, most of which wasn&#8217;t coding at all. If directed work runs even three to four times faster than hand-coding, then a day and a half a week (30% of a five-day week) can produce what once consumed a full-time engineer&#8217;s week. That&#8217;s not a marginal gain. It&#8217;s a change in what&#8217;s structurally possible.</p><p>There&#8217;s another dimension that doesn&#8217;t show up in the productivity numbers: the work itself is unstable. Agentic coding patterns are still shaking out. 
There aren&#8217;t many experienced practitioners, the field is moving fast, and we don&#8217;t yet have good consensus on which patterns are load-bearing and which are fashion. A manager who&#8217;s only reading about it can&#8217;t make that distinction on behalf of a team. You have to be in it to know.</p><p>There&#8217;s a counterintuitive wrinkle worth naming: the people best positioned to exploit this are the senior ones. A University of Chicago working paper from late 2025 found experienced developers were 5&#8211;6% more likely to succeed with AI agents for every standard deviation of work experience, largely because they worked plan-first &#8212; laying out objectives, alternatives, and steps before invoking the tool. That&#8217;s the opposite of the assumption that AI flattens the seniority curve. Expertise improves your ability to delegate to a model for the same reason it improves your ability to delegate to a person. AI doesn&#8217;t change what senior engineering is. It reveals what it always was.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EYNR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EYNR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 424w, https://substackcdn.com/image/fetch/$s_!EYNR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 848w, 
https://substackcdn.com/image/fetch/$s_!EYNR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 1272w, https://substackcdn.com/image/fetch/$s_!EYNR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EYNR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png" width="1024" height="634" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:634,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1514078,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/200067020?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1745d9f4-d955-45cb-b61a-c12956852f96_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EYNR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 424w, 
https://substackcdn.com/image/fetch/$s_!EYNR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 848w, https://substackcdn.com/image/fetch/$s_!EYNR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 1272w, https://substackcdn.com/image/fetch/$s_!EYNR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F727ed32e-2441-4dc7-bf6b-4ea9a19a2cb7_1024x634.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Directing agents is delegation</h2><p>This is the part that interested me the most, and it&#8217;s the bridge between the leadership job and the technical one.</p><p>A year or so ago, working with Claude Code felt like coding. Now it feels like delegating. I <a href="https://hyperdev.matsuoka.com/coding-to-delegation-shift">wrote about that shift</a> when it first became undeniable &#8212; the move from being a programmer who uses AI to being something closer to a technical project manager who directs it. Anthropic&#8217;s report uses the same vocabulary, describing engineers moving &#8220;from writing code to orchestrating the systems that write it.&#8221;</p><p>What I didn&#8217;t fully appreciate at the time is how directly that maps onto the muscle leaders already have. Delegating to an agent feels, in practice, just like delegating to a human engineer. You frame the problem, set the constraints, hand it off, and come back to assess the result. Give a directive, walk away, return to completed work. That loop is the daily reality of management. Leaders developed it because they had to, and it transfers to agents almost without friction.</p><p>So the leader/practitioner doesn&#8217;t have to become a coder again in the old sense. The skill in demand is judgment plus delegation, and that&#8217;s the skill leadership has been building all along. The hands-on knowledge tells you what to ask for and whether the answer is any good. The delegation instinct does the rest.</p><p>This is why the enabling work and the agentic tools fit together so cleanly. Enabling work tends to be well-defined, non-user-facing, and long-horizon &#8212; exactly the profile agents handle well and exactly the profile that tolerates a leader&#8217;s interrupted schedule. 
The hands-on contribution mostly takes the form of specifying constraints and patterns, which is what I&#8217;ve called <a href="https://hyperdev.matsuoka.com/what-does-a-pattern-master-do">pattern mastery</a>: when you write the pattern down, you&#8217;ve written the spec, and the spec multiplies everyone else&#8217;s output.</p><h2>What it actually looks like</h2><p>Let me ground this without turning it into a war story.</p><p>The concrete examples from my own work are the kind of thing I mean. I built an agentic harness &#8212; the orchestration layer I <a href="https://hyperdev.matsuoka.com/its-the-harness-stupid">argued is the real determinant of AI coding outcomes</a>, where the same model can swing more than a quality point depending on the scaffolding around it. I built MCP services, the <a href="https://hyperdev.matsuoka.com/is-this-the-era-of-the-connector">org-specific connectors</a> that replaced a handful of standalone tools in a few hours of directed work each. None of that was a production feature. All of it was infrastructure other people now depend on.</p><p>Here&#8217;s a detail that may make the point. I now have an &#8220;AI architect&#8221; on my leadership team helping maintain the very infrastructure I originally built &#8212; not just the harness, but our inference relationships, our training program, office hours, the real human work I no longer have the time, or the right, to be doing myself. And I expect to hand off more over time. The enabling work I do today partly becomes the system that does tomorrow&#8217;s enabling work. That handoff is the role in miniature: you build the thing that multiplies the team, then you put someone in place to build the next version.</p><p>The proportion matters. Around 30% hands-on keeps judgment fresh without putting me in the critical path. 
Even full-time senior ICs aren&#8217;t full-time coders &#8212; Bain&#8217;s Jue Wang, quoted in MIT Technology Review last December, put developer coding time at 20&#8211;40%, with the rest going to analysis, strategy, and the surrounding work. A leader at 30% isn&#8217;t doing something exotic. They&#8217;re spending their technical budget on the layer where it compounds.</p><p>The decision is not &#8220;how do I find time to code.&#8221; It&#8217;s &#8220;what enabling work is worth my own hands?&#8221; Those are different questions. The first leads to the bottleneck I watched so many executives become. The second leads somewhere useful.</p><h2>The choice</h2><p>I&#8217;ll resist overselling this, because it isn&#8217;t for everyone and it isn&#8217;t automatic.</p><p>This is a deliberate role, not a default. Staying technically current costs ongoing investment, and the work is often invisible &#8212; enabling infrastructure rarely shows up in a quarterly review the way a shipped feature does. The role is easy to misread, too. From the outside, a CTO who codes can look like a CTO who hasn&#8217;t let go. The defense against that reading is the discipline Majors named: stay out of the critical path. Build the multipliers, not the blockers.</p><p>The returns are real, though. Fresh judgment &#8212; the kind that lets you evaluate not just what and why but how. Trust from engineers who see you in the work rather than above it. And institutional infrastructure that makes the whole team faster, built by the person with both the technical depth and the positional authority to prioritize it.</p><p>There&#8217;s a closing note in the history worth keeping. <em>Bunbu-ry&#333;d&#333;</em> wasn&#8217;t only a personal aspiration. The Tokugawa shogunate institutionalized it &#8212; built career and class structures around the assumption that a leader should be both. China did the same with its examination system. We&#8217;re not there yet. 
For now, the leader/practitioner is an individual choice, made one person at a time, made viable by tools that finally collapsed the cost of staying hands-on.</p><p>But the precedent suggests where this could go. When a way of working proves durable, organizations eventually build structures around it. The era of the leader/practitioner is early. It is also, I&#8217;d argue, a return &#8212; to an integration we knew was valuable long before we had the means to make it practical again.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/coding-to-delegation-shift">From Coding with AI to Managing AI</a> &#8212; When agentic coding starts to feel like delegation</p></li><li><p><a href="https://hyperdev.matsuoka.com/its-the-harness-stupid">It&#8217;s The Harness, Stupid!</a> &#8212; Why orchestration quality dominates AI coding outcomes</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Other Shoe Has Dropped]]></title><description><![CDATA[The Economics of Enterprise Inference Usage]]></description><link>https://hyperdev.matsuoka.com/p/the-other-shoe-has-dropped</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-other-shoe-has-dropped</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Fri, 29 May 2026 11:31:28 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!-Qpp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-Qpp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-Qpp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!-Qpp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!-Qpp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!-Qpp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Qpp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1174362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/199673840?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Qpp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!-Qpp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!-Qpp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!-Qpp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41ae5cbb-29e9-4a44-96a2-fb1724f0bf79_1024x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Two stories from the last two weeks. Uber <a href="https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-claude-code/">burned through its entire 2026 AI budget in four months</a> on Claude Code, with COO Andrew Macdonald telling the <em>Rapid Response</em> podcast that the link between that spend and shipped consumer features &#8220;is not there yet.&#8221; <a href="https://www.theinformation.com/newsletters/applied-ai/uber-cto-shows-claude-code-can-blow-ai-budgets">The Information had the underlying numbers a few weeks earlier</a>: engineer adoption from 32% to 84% between December and March, heavy users running $500&#8211;$2,000/month in tokens, and CTO Praveen Neppalli Naga torching $1,200 in a two-hour demo. 
Same week, Microsoft told thousands of engineers in its Experiences + Devices division that their Claude Code access is going away. <a href="https://www.windowscentral.com/microsoft/microsoft-cancels-claude-code-licenses-shifting-developers-to-github-copilot-cli-a-move-likely-driven-by-financial-motives">Windows Central, summarizing The Verge&#8217;s Notepad scoop</a>, has the cutoff at June 30 &#8212; end of fiscal year &#8212; with cost as the actual driver even though EVP Rajesh Jha framed it publicly as convergence on Copilot CLI.</p><p>Two of the most AI-forward enterprises on the planet, same tool, same week. The &#8220;AI is failing&#8221; takes were live within hours.</p><p>I don&#8217;t buy that framing.</p><p>The headlines are getting it wrong. Uber didn&#8217;t cancel anything &#8212; adoption ran ahead of the budget and the company blew its annual spend keeping up. That&#8217;s a planning failure, not a verdict on the tool. Microsoft didn&#8217;t divorce Anthropic either; they&#8217;re still consuming Claude through Azure Foundry and M365 Copilot. What they cancelled is a specific license &#8212; Claude Code at the engineer-seat level &#8212; because engineers preferred it over GitHub Copilot CLI and the division was paying for that preference.</p><p>What both stories show: AI is a new tool and we haven&#8217;t learned to use it well yet. The teams over budget pointed it at problems it wasn&#8217;t the cheapest way to solve, then let it decide for itself how much work to do per task.</p><p>I&#8217;ve made <a href="https://hyperdev.matsuoka.com/p/what-the-other-shoe-sounds-like-when">the cloud parallel here before</a>. Early cloud was expensive and misused. Lift-and-shift workloads routinely ran two or three times their on-prem cost &#8212; I watched that play out across teams I ran, and it took years to correct through architecture. 
Then the industry learned: right-sizing, reserved instances, autoscaling, serverless where it fit, on-prem where it didn&#8217;t. The bills came down. Not because compute got dramatically cheaper, but because we got more careful about what we asked the cloud to do. AI is in the same phase. Cheap per-token, expensive per-task, and the gap is architectural.</p><p>A few weeks ago I ran controlled head-to-head tests on Opus 4.6 and Opus 4.7 against identical coding tasks. Both models passed every test. Opus 4.7 cost 3.6&#215; more to do it. Same outcomes, same rate card, dramatically more tokens.</p><p>Finout&#8217;s analysis of production deployments <a href="https://www.finout.io/blog/claude-opus-4.7-pricing-the-real-cost-story-behind-the-unchanged-price-tag">tells the same story at scale</a>: up to a 35% cost increase overnight, driven by tokenizer changes that don&#8217;t show up on the per-token rate card. Not one team&#8217;s bad luck &#8212; the shape of the bill across the enterprise AI buyer base right now. The second of two shoes on AI economics.</p><p>I wrote about that <a href="https://hyperdev.matsuoka.com/p/opus-46-vs-47-the-real-cost-of-incremental">version-to-version cost drift in detail</a>. Providers can collapse per-token prices in public while the per-task bill drifts upward in private. The first shoe was the per-token price collapse that made everyone optimistic. The second is the behavioral and architectural cost overhang now landing on quarterly P&amp;Ls.</p><p><strong>TL;DR</strong></p><ul><li><p>Per-token costs at GPT-3.5-equivalent performance are down roughly 280&#215; since late 2022, per <a href="https://aiindex.stanford.edu/report/">Stanford&#8217;s AI Index 2025</a>. 
Vendor revenue tells the opposite story: Anthropic&#8217;s annualized revenue went from <a href="https://www.pymnts.com/artificial-intelligence-2/2026/anthropic-hits-30-billion-run-rate-as-enterprise-demand-accelerates/">$1B in January 2025 to $30B by April 2026</a> &#8212; a 30&#215; move in 15 months, coming from enterprise inference, not consumer subscriptions.</p></li><li><p>Gartner&#8217;s April 2026 survey: just 28% of AI use cases fully meet ROI expectations. Separately, Zylo&#8217;s SaaS Management Index has 78% of IT leaders reporting material AI charges that didn&#8217;t show up in any procurement model.</p></li><li><p>Gartner estimates agentic workflows consume 5&#8211;30&#215; more tokens than equivalent chatbot interactions; Stanford&#8217;s Digital Economy Lab puts the upper bound for coding agents at 1,000&#215;. The cost driver isn&#8217;t the model &#8212; it&#8217;s the workflow architecture wrapped around it.</p></li><li><p>Two patterns hold the line in production. Search-first architectures put inference at the end of a deterministic pipeline. Consolidated single-shot designs replace multi-call chains.</p></li><li><p>Inference is a power tool, not a default. 
Set specific ROI goals per call, use it to <em>code</em> solutions rather than to solve problems directly, and bound it with deterministic structure on both ends.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TPPJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TPPJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!TPPJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!TPPJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!TPPJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TPPJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1163417,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/199673840?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TPPJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!TPPJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!TPPJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!TPPJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F491e72b0-16e1-49fe-a551-391ba05461bc_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Why the Per-Token Savings Didn&#8217;t Reach the Invoice</h2><p>Per-token economics of frontier models have been collapsing for two years. <a href="https://aiindex.stanford.edu/report/">Stanford&#8217;s AI Index 2025</a> puts the decline at roughly 280&#215; from a late-2022 baseline at GPT-3.5-equivalent performance. Most enterprise budget conversations in 2024 started from that headline. The implicit assumption: bills should be going <em>down</em>.</p><p>They&#8217;re not. The clearest read comes from the vendor side. Anthropic&#8217;s annualized revenue <a href="https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation">went from $1B in January 2025 to $30B by April 2026</a>, with roughly 80% from enterprise and API usage rather than consumer subscriptions. 
Anthropic <a href="https://www.saastr.com/anthropic-just-passed-openai-in-revenue-while-spending-4x-less-to-train-their-models/">now discloses 1,000+ customers spending more than $1M per year</a> &#8212; a cohort that doubled in under two months &#8212; alongside roughly 300,000 business customers. The mid-tier ($100K&#8211;$1M/year) grew 7&#215; year over year.</p><p>Menlo Ventures&#8217; <a href="https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/">2025 State of Generative AI report</a> cross-checks at the market level: enterprise GenAI spend tripled to $37B in 2025, with LLM API consumption alone at $8.4B by mid-year. The high tier shows up in named deals: <a href="https://sacra.com/c/anthropic/">Snowflake&#8217;s $200M multi-year partnership</a> implies a $50&#8211;70M annual run rate from one customer, and <a href="https://sacra.com/c/anthropic/">Deloitte is deploying Claude across 470,000 employees</a>.</p><p>For a typical enterprise running multiple production AI workloads, $500K&#8211;$2M per year is now the realistic floor. Fortune 100 companies are running $10M&#8211;$50M+, and the most AI-intensive are past $100M. Survey data points the same direction: Gartner finds that <a href="https://www.gartner.com/en/newsroom/press-releases/2026-04-07-gartner-says-artificial-intelligence-projects-in-infrastructure-and-operations-stall-ahead-of-meaningful-roi-returns">just 28% of AI use cases fully meet ROI expectations and 20% fail outright</a>, while Zylo&#8217;s SaaS Management Index has <a href="https://zylo.com/blog/saas-management-index/">78% of IT leaders reporting material AI charges</a> that didn&#8217;t show up in any procurement model.</p><p>The per-token chart is real. The invoice is also real. What explains the gap is <em>behavior</em>. Three behaviors specifically.</p><p><strong>Models do more work per task.</strong> Reasoning models reason. Agentic loops loop. The prompt that used to consume 4K tokens now consumes 40K because the assistant explores, plans, second-guesses, and verifies. 
Some of that is valuable. Much of it is the model performing thoroughness in a way that costs you money. The Opus 4.6-to-4.7 jump I documented earlier: same task, same outcome, 2.9&#215; more output tokens and 4.8&#215; more cache reads.</p><p><strong>Workflows fan out.</strong> A &#8220;single&#8221; task in a modern agentic system might trigger a planner, researcher, coder, reviewer, and summarizer. Each makes its own LLM calls over overlapping context. Gartner&#8217;s March 2026 analysis puts agentic workflows at 5&#8211;30&#215; the token consumption of an equivalent chatbot ask. Stanford Digital Economy Lab&#8217;s April 2026 arXiv paper goes further: coding agents can consume 1,000&#215; more tokens than equivalent chat completions. The agent isn&#8217;t more expensive because it&#8217;s smarter. It&#8217;s more expensive because it&#8217;s louder.</p><p><strong>Context windows fill themselves.</strong> Long context is a feature in marketing and a bill in practice. In our own enterprise Claude.AI usage &#8212; 82,852 messages from 329 employees over 3.5 months, audited via the Anthropic Compliance API &#8212; the average request carried 366,000 input tokens, mostly from 10-turn conversations dragging accumulated history forward into every new turn. Most production systems I&#8217;ve audited show the same fingerprint: pipelines paying for context they aren&#8217;t actually using.</p><p>None of this is fraud and none of it is mysterious. It&#8217;s the natural consequence of letting probabilistic systems decide how much work to do on every call. The savings from cheaper tokens were real. 
They just got consumed by an order of magnitude more tokens per task.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T_ba!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T_ba!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!T_ba!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!T_ba!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!T_ba!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T_ba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1253809,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/199673840?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T_ba!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!T_ba!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!T_ba!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!T_ba!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7441c01-b144-4b0c-83a2-113c2bb730fd_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>What I&#8217;ve Found Shipping These Systems</h2><p>The teams handling this well aren&#8217;t the ones cutting AI usage. They&#8217;re changing the <em>shape</em> of how they use it.</p><p>The pattern I keep coming back to: treat inference as the expensive step at the end of a mostly deterministic pipeline. Do the cheap, structured work in code. Reserve the model call for the part that actually requires judgment. Then bound the call hard &#8212; context budget, output budget, quality gate on whether the call even runs.</p><p>Two examples from systems I&#8217;ve been building illustrate this from different angles.</p><h3>Example 1: Code-Intelligence &#8212; Search-First Architecture</h3><p>The naive version of a code review tool is obvious: dump the changed files into Claude, ask for a review. That works. 
It also costs roughly $0.03 per operation, scales linearly with repo size, and produces a lot of review output you didn&#8217;t need. Claude.AI offers a code review service &#8212; it ended up costing us thousands a month for just a few repos. Augment Code offers a well-regarded one as a GitHub app, but charges a platform fee (a meaningful fraction of our Anthropic spend) <em>just</em> to connect.</p><p>So we built our own. It leverages a multimodal search/RAG/KG engine I&#8217;d already built, so this wasn&#8217;t from scratch.</p><p>The version I actually ship uses a multi-tier search pipeline with the LLM call at the very end:</p><pre><code><code>Stage 1: Vector Search    (~$0.0002 per query, semantic similarity)
Stage 2: BM25 Reranking   (~$0.0001 per query, lexical relevance)
Stage 3: Static Analysis  (~$0.0001 per query, AST + symbol resolution)
Stage 4: Quality Gate     (free, deterministic threshold check)
Stage 5: Single LLM Call  (~$0.03 per call, only if Stages 1-4 passed)</code></code></pre><p>The first four stages cost about $0.0004 combined. They do the bulk of the work: deciding <em>what code is actually relevant</em>, ranking it, pulling structural relationships, and deciding whether the result is even worth asking an LLM about.</p><p>Hard budget controls run through the whole pipeline:</p><pre><code><code># Budget enforcement, not aspiration
MAX_CONTEXT_FILES = 6          # cap on what we send to the model
MAX_REVIEW_WORDS = 500         # cap on what the model returns
RELEVANCE_FLOOR = 0.005        # quality gate before calling the model

# select_top_n, llm, and SkipReason are defined elsewhere in the tool;
# the wrapper name review_change is illustrative.
def review_change(ranked_results, combined_relevance_score):
    if combined_relevance_score &lt; RELEVANCE_FLOOR:
        # no point spending $0.03 to get a review of weakly-related code
        return SkipReason("below relevance floor")

    context = select_top_n(ranked_results, MAX_CONTEXT_FILES)
    # word cap converted to a token cap at 1.4 tokens per word; must be an int
    return llm.review(context, max_output_tokens=int(MAX_REVIEW_WORDS * 1.4))</code></code></pre><p>The <code>RELEVANCE_FLOOR</code> check is the part to underline. A meaningful percentage of review requests in real codebases don&#8217;t justify an LLM call at all &#8212; the changes are mechanical, the related code trivial, or the search signal weak enough that whatever the model says will be hallucinated context. Refusing to spend $0.03 on those cases is where most of the savings come from.</p><p>Rough economics across a quarter of usage:</p><table><thead><tr><th>Approach</th><th>Cost per operation</th><th>LLM calls per 1K operations</th></tr></thead><tbody><tr><td>Direct &#8220;review the diff&#8221;</td><td>~$0.030</td><td>1,000</td></tr><tr><td>Search-first with gates</td><td>~$0.0034 average</td><td>~430</td></tr></tbody></table><p>About 80% of the workflow logic ends up deterministic: search, ranking, static analysis, gating. The model handles the last 20% &#8212; judgment on curated context. Same outcome from the user&#8217;s perspective, roughly an order of magnitude cheaper, with more predictable failure modes because most of the pipeline is debuggable code rather than prompt behavior.</p><p>The limitation: this is more work than wiring up a single LLM call, and the gates are only as good as your search infrastructure. The payoff is on the cost and determinism side, not on speed of initial implementation.</p><h3>Example 2: duetto-intelligence &#8212; Context Injection Instead of Replacement</h3><p>The second pattern comes from duetto-intelligence, internal tooling I&#8217;ve been building against that same enterprise Claude.AI usage &#8212; 82,852 real employee messages over 3.5 months, not a thought experiment. The problem here isn&#8217;t &#8220;should we call the LLM at all.&#8221; It&#8217;s: given that our people are already routing structured-data questions through a $0.274-per-request multi-turn Sonnet conversation, what&#8217;s a cheaper path that doesn&#8217;t degrade the answer?</p><p>The audit data made the gap concrete. 
Average request: 366K input tokens, ten-turn conversation, $0.274 to Anthropic. The same query answered through a Haiku single-pass against deterministically-retrieved internal data: $0.0009. A 300:1 cost ratio on the slice of traffic about structured product knowledge, CRM/account prep, JIRA, people and org lookups.</p><p>Not all traffic. Somewhere in the 35&#8211;40% range based on classified samples. About half of remaining queries genuinely need full Sonnet or Opus reasoning &#8212; writing, debugging, free-form analysis &#8212; and shouldn&#8217;t be intercepted at all.</p><p>The framing matters, because it&#8217;s easy to mis-read this as &#8220;replace Claude with a smaller model.&#8221; It isn&#8217;t. duetto-intelligence acts as a <strong>context injection layer</strong> in front of the user-facing model. When a query has structured-data intent, we route a sub-query to DI, get back a bounded structured result, and inject that into the prompt the larger model sees. The expensive model still does the reasoning &#8212; it just stops being responsible for the deterministic data retrieval it&#8217;s bad at and expensive for.</p><p>The naive design for the routing layer looks like this:</p><pre><code><code>1. Classify the user's intent             &#8594; LLM call (~150 tokens)
2. Plan which subsystems to query         &#8594; LLM call (~250 tokens)
3. Call subsystem A, summarize response   &#8594; LLM call (~300 tokens)
4. Call subsystem B, summarize response   &#8594; LLM call (~300 tokens)
5. Synthesize a final answer              &#8594; LLM call (~250 tokens)
                                          Total: ~1,250 tokens, 5 calls</code></code></pre><p>Each call is plausible on its own. Together they&#8217;re a tax on every user interaction. Latency stacks linearly with calls, and any one of the five can hallucinate in a way that corrupts the rest of the chain.</p><p>The consolidated design replaces three steps with deterministic code:</p><pre><code><code>1. Classify intent                  &#8594; LLM call  (~80 tokens, tight tagger prompt)
2. Fan-out to subsystems            &#8594; code      (0 tokens, intent &#8594; call map)
3. Consolidated synthesis           &#8594; LLM call  (~200 tokens, structured input)
                                       Total: ~280 tokens, 2 calls</code></code></pre><p>The trick is the intent classifier. It produces a tag from a fixed vocabulary of 87 tags &#8212; <code>revenue.query.ytd</code>, <code>forecast.compare.year_over_year</code>, <code>account.lookup.contact</code>, and so on. Each tag maps deterministically to a set of downstream calls in plain Python. No LLM in the routing step. The model isn&#8217;t asked &#8220;what should we do?&#8221; It&#8217;s asked &#8220;what is the user asking about?&#8221; &#8212; a much smaller, bounded question.</p><p>We validated against a 50-query test corpus drawn directly from the compliance data &#8212; real questions people had asked the model in production. After tuning, 100% of those queries land on the fast path with no LLM call required for routing. That&#8217;s the proof the deterministic-discipline part holds at the boundary; the routing isn&#8217;t quietly falling back to a second model call to bail itself out.</p><p>Budget enforcement is explicit in every prompt template:</p><pre><code><code>SOURCE_CHAR_BUDGET = 600    # per data source pulled into context
OUTPUT_TOKEN_BUDGET = 200   # cap on synthesis response
INTENT_TAG_VOCAB = load_intent_taxonomy()  # 87 tags, versioned

def synthesize(intent_tag: str, sources: list[Source]) -&gt; str:
    trimmed = [s.truncate(SOURCE_CHAR_BUDGET) for s in sources]
    return llm.complete(
        prompt=template(intent_tag, trimmed),
        max_tokens=OUTPUT_TOKEN_BUDGET,
    )</code></code></pre><p>The economics, against measured baselines rather than estimates: current Claude.AI Chat spend across the 329-user population runs $15,264 over 3.5 months. Roughly $4,360/month, driven by that $0.274-per-request multi-turn average. If DI intercepts the 35&#8211;40% of traffic that&#8217;s structured-retrieval underneath, projected savings come in around $1,500&#8211;1,700/month, or $18&#8211;20K/year on this single user population. The leverage isn&#8217;t from picking a cheaper model. It&#8217;s from refusing to pay Sonnet rates to answer questions a deterministic system already has the data for.</p><p>The intent vocabulary is the contract. New capability means a new tag, a new downstream mapping, a new prompt template. The model never has to invent structure on the fly. This is what people mean by &#8220;use the LLM to code solutions, not to solve problems directly&#8221; &#8212; the routing logic lives in code, the tagger is a thin call, the synthesis is bounded.</p><p>The source-character budget matters more than the output budget. The compliance audit confirmed it: production overspend is on the <em>input</em> side. 366K input tokens against a few hundred output. Models will happily consume whatever context you hand them. Trimming at the source &#8212; 600 characters per source, no exceptions &#8212; is how you keep per-call cost from drifting upward as the system gets more capable.</p><p>The limitation: this only works on the routable slice. 
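</p><p>A minimal sketch of that boundary, assuming a hypothetical tag-to-fetcher map. The fetcher functions, the <code>Source</code> fields, and <code>build_context</code> are illustrative names, not duetto-intelligence&#8217;s actual code; only the two tag names and the 600-character cap come from the text above. Tags in the vocabulary get bounded structured context, and anything else returns <code>None</code> so the request passes through to the full model untouched.</p><pre><code><code>from dataclasses import dataclass

SOURCE_CHAR_BUDGET = 600  # the 600-character per-source cap from the article


@dataclass
class Source:
    name: str
    text: str

    def truncate(self, limit):
        # Hard character cap; returns a new Source instead of mutating.
        return Source(self.name, self.text[:limit])


# Hypothetical fetchers standing in for real subsystem calls.
def fetch_revenue_ytd(query):
    return Source("finance", "placeholder YTD revenue rows for: " + query)


def fetch_account_contact(query):
    return Source("crm", "placeholder contact record for: " + query)


# The contract: each known tag maps to a fixed list of fetchers.
INTENT_ROUTES = {
    "revenue.query.ytd": [fetch_revenue_ytd],
    "account.lookup.contact": [fetch_account_contact],
}


def build_context(intent_tag, query):
    """Return trimmed sources for a routable tag, or None to pass through."""
    fetchers = INTENT_ROUTES.get(intent_tag)
    if fetchers is None:
        return None  # outside the routable slice: leave the request alone
    return [fetch(query).truncate(SOURCE_CHAR_BUDGET) for fetch in fetchers]</code></code></pre><p>Calling <code>build_context("revenue.query.ytd", query)</code> yields one trimmed source ready for injection; an unrecognized tag yields <code>None</code>, which is the signal to skip injection and let the larger model handle the request as it does today.</p><p>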
The pattern isn&#8217;t &#8220;eliminate inference.&#8221; It&#8217;s &#8220;stop spending $0.274 to answer questions that have a structured answer at $0.0009.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-uHZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-uHZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!-uHZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!-uHZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!-uHZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-uHZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png" width="1024" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1351360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/199673840?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-uHZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!-uHZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!-uHZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!-uHZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0775c6-8743-46e4-908a-ee4e2a8db5d6_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Why &#8220;Just Use a Cheaper Model&#8221; Doesn&#8217;t Save You</h2><p>A reasonable objection: aren&#8217;t the cheaper, smaller models supposed to handle this? Why not route everything to Haiku or an open model and call it solved?</p><p>The pricing math seems to support it. The behavioral math doesn&#8217;t.</p><p>Two things go wrong when you swap a cheaper model into an unstructured workflow. Cheaper models are usually less efficient <em>per task</em> &#8212; more turns to converge, more exploration, more hallucination, which means more retries and verification calls. A model 5&#215; cheaper per token can run 1.5&#8211;2&#215; more expensive per completed task if your workflow lets it spin.</p><p>And the workflow itself is where most of the cost lives. 
The 5&#8211;30&#215; multiplier is structural, not model-specific &#8212; it exists regardless of which model you point at it. Switching from Sonnet to Haiku inside an unbounded agent loop changes the per-token cost. It doesn&#8217;t change the loop.</p><p>Model choice is a 2&#8211;5&#215; lever. Architecture choice is closer to an order-of-magnitude lever in the systems I&#8217;ve shipped &#8212; consistently larger than what model swaps deliver. Most teams are over-tuning the model selection and under-tuning the structure around it.</p><p>The default assumption &#8212; including from vendors with strong incentives to sell you more tokens &#8212; is that the answer to AI cost is buying inference more cleverly. The actual answer is using inference less, more deliberately, with hard bounds on what each call is allowed to do.</p><h2>How I Think About Inference Now</h2><p><strong>Inference is a power tool.</strong> Not a default. You don&#8217;t reach for it when a search query, a regex, or a <code>switch</code> statement would do. You reach for it when you need probabilistic judgment over unstructured input. Every call you don&#8217;t make is the cheapest call.</p><p><strong>Use it to code solutions, not to solve problems.</strong> The highest-leverage use of LLMs in my workflow is generating the deterministic code that then handles the workflow without further LLM calls. A model that writes you a 50-line classifier is more valuable than a model that <em>acts as</em> the classifier on every request forever. The first costs tokens once. The second costs tokens on every transaction for the life of the system.</p><p><strong>Wrap every call in a budget.</strong> Context budget on the input, token budget on the output, quality gate on whether the call runs at all. Treat the LLM call as you&#8217;d treat a paid API with rate limits and SLA penalties. 
Because it is.</p><p><strong>Set specific ROI targets per call.</strong> &#8220;AI-assisted code review&#8221; is too coarse to optimize. &#8220;Reviewing files with relevance score &gt; 0.005, capped at 6 files, returning 500 words&#8221; is something you can measure cost-per-outcome on. Even loose ROI math at the call level surfaces where you&#8217;re paying for theater.</p><p><strong>Treat behavioral cost as the primary risk.</strong> Model rate cards will keep coming down. They are not your problem. Your problem is what your pipeline asks of the model and what the model decides to do once asked. That&#8217;s the line item that grew while the unit cost dropped 280&#215;. That&#8217;s the shoe that just dropped.</p><h2>What This Means If You&#8217;re Running an AI Budget</h2><p>Three things to look at, in order of how much they&#8217;ll move the line item.</p><p><strong>Audit the call graph, not the rate card.</strong> Pull a representative day of production traffic and trace the actual LLM calls per user task. Count them. Most teams find a handful of workflows producing the majority of cost, and most of those have 2&#8211;4 LLM calls that could be replaced by deterministic code. That&#8217;s the consolidated-design pattern from the duetto-intelligence example. 50&#8211;80% reductions are common when you actually look.</p><p><strong>Put quality gates in front of inference.</strong> For any workflow where the LLM call is expensive and the input quality is variable, add a deterministic check that decides whether the call is worth making. That&#8217;s the search-first pattern from the code-intelligence example. The savings come from the calls you <em>don&#8217;t</em> make, which never show up on the invoice.</p><p><strong>Set hard context budgets and enforce them in code.</strong> Per-source character limits, per-call token caps, no &#8220;just in case&#8221; context stuffing. The output budget gets attention because it&#8217;s visible. 
The input budget is usually where the actual money goes.</p><p>None of this requires changing models, switching providers, or making bets on the next frontier release. It&#8217;s architectural work inside the pipeline you already have &#8212; work the per-token price chart has been letting people defer.</p><p>The teams that do it over the next two quarters will look like they got a 5&#8211;10&#215; cost improvement from &#8220;AI getting cheaper.&#8221; The teams that don&#8217;t will look like AI got 3&#8211;4&#215; more expensive while everyone else&#8217;s costs fell. Same providers, same models, same rate cards. Different shoe.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/p/opus-46-vs-47-the-real-cost-of-incremental">Opus 4.6 vs 4.7: The Real Cost of Incremental AI Improvements</a> &#8212; The first shoe, on per-task cost drift between model versions</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul><p></p>]]></content:encoded></item><item><title><![CDATA[Is Your Digital Brain the Light Saber of the AI Era?]]></title><description><![CDATA[Jedi Knight Tools for the Knowledge Worker]]></description><link>https://hyperdev.matsuoka.com/p/is-your-digital-brain-the-light-saber</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/is-your-digital-brain-the-light-saber</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 20 May 2026 12:02:03 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!90qs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!90qs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!90qs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 424w, https://substackcdn.com/image/fetch/$s_!90qs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 848w, https://substackcdn.com/image/fetch/$s_!90qs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 1272w, https://substackcdn.com/image/fetch/$s_!90qs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!90qs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png" width="1456" height="1007" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1007,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5326200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/198515384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!90qs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 424w, https://substackcdn.com/image/fetch/$s_!90qs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 848w, https://substackcdn.com/image/fetch/$s_!90qs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 1272w, https://substackcdn.com/image/fetch/$s_!90qs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6533e48e-54c3-4232-aa80-b2fe447f0136_1993x1378.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Your Digital Lightsaber is your AI Memory</figcaption></figure></div><p>My Duetto colleague Jake Becker is sharp. He&#8217;s been ahead of AI adoption on our team &#8212; experimenting early, pushing for new tools, staying current. Last week he messaged me on Slack: &#8220;I wish I had my own CTO Assistant. Like what you have.&#8221;</p><p>I paused.</p><p>I have one. I&#8217;ve had several, in different forms, going back months. But I hadn&#8217;t said anything about it publicly.</p><p>That gap &#8212; between an AI-forward person who knows the tools and a practitioner who actually has the thing &#8212; is what this piece is about.</p><p><strong>TL;DR</strong></p><ul><li><p>Off-the-shelf AI tools are capable but contextually blind. 
You still have to leave your domain to get help.</p></li><li><p>Serious knowledge workers &#8212; writers, researchers, historians &#8212; have always built their own knowledge systems. Never trusted a vendor to hold their material.</p></li><li><p>Coding is shifting toward what writing has always been: directing, curating, maintaining a living body of knowledge.</p></li><li><p>Building your own AI memory and search layer is the new rite of passage. The tool you build IS the skill being developed.</p></li><li><p>The latest versions of my own stack: trusty-memory and trusty-search &#8212; both open source, both installable today.</p></li></ul><h2>Everyone Wants One. Few Have One.</h2><p>Jake&#8217;s comment revealed something I&#8217;d been taking for granted.</p><p>The major vendors are actively working on this &#8212; memory features, RAG pipelines, personalization layers. But those solutions are optimized for breadth, not for one person&#8217;s actual work across functions.</p><p>ChatGPT, Claude, Copilot &#8212; these are all capable. They&#8217;re also still contextually blind to <em>you</em>. You have to leave your environment, paste in context, explain your situation from scratch, then interpret the output back into your actual work. Vendors are working on it. But the solutions remain generic, and every session still starts from zero.</p><p>I wrote about this in March &#8212; <a href="https://hyperdev.matsuoka.com/personal-bots-abomination">Everyone Blamed Clawd Bot&#8217;s Execution. The Concept Was the Problem.</a> The structural flaw of universal assistants isn&#8217;t fixable. They require you to leave your context to get help. What actually works is the opposite: your tools get assistant capabilities, and assistance comes to where your context lives.</p><p>Off-the-shelf tools haven&#8217;t solved this. 
They&#8217;ve gotten more powerful &#8212; better reasoning, longer context, faster inference &#8212; but they still don&#8217;t know your codebase, your decisions, your institutional history, your current sprint. They know a lot about the world, and very little about you.</p><p>Jake wanted <em>my</em> assistant. But what he actually wants is <em>his</em> assistant. The one that knows what he knows.</p><p>That&#8217;s a different problem entirely.</p><p>The job changed first.</p><h2>Coding Is Becoming Writing</h2><p>Practitioners feel it before analysts name it.</p><p>A few years ago, being a strong engineer meant writing a lot of code quickly and correctly. Today, with agentic AI coders at their disposal, the best engineers I watch spend their time directing, reviewing, specifying, and curating. The unit of work has moved up a level. Implementation is increasingly delegated. Judgment &#8212; about architecture, trade-offs, what to build and why &#8212; is the differentiator.</p><p>This is not what happens when automation replaces a skill. It&#8217;s what happens when a new discipline appears.</p><p>Writers have always worked this way. A novelist doesn&#8217;t produce words per minute as a primary metric. They produce decisions &#8212; what to say, in what order, with what emphasis. The words are the output of the decisions, not the work itself. What makes a writer productive over a career isn&#8217;t typing speed. It&#8217;s having a system: notes, research, accumulated material, patterns of thought that compound over years.</p><p>Some (typically senior) engineers are arriving at the same realization. Your value isn&#8217;t the code. It&#8217;s the judgment, the accumulated context, the knowledge of what was tried and why it failed. 
The question is whether that accumulates in your head alone &#8212; which doesn&#8217;t scale, and doesn&#8217;t survive a context switch &#8212; or whether it lives in a system.</p><p>Good engineers who learn to use their AI tools effectively generate better code &#8212; the stack amplifies judgment and accumulated knowledge. The gap isn&#8217;t between skilled and unskilled engineers in isolation; it&#8217;s between engineers who&#8217;ve wired their knowledge into their tools and those who haven&#8217;t. The knowledge is real in both cases. Only in one case does it compound.</p><p>The writers I&#8217;ve observed who sustain serious output over decades all have the same property: they know where things are. Their research is retrievable. Their earlier thinking is available to their current thinking. The system makes the person bigger than their working memory.</p><p>That&#8217;s the gap.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ynth!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ynth!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 424w, https://substackcdn.com/image/fetch/$s_!Ynth!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 848w, 
https://substackcdn.com/image/fetch/$s_!Ynth!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 1272w, https://substackcdn.com/image/fetch/$s_!Ynth!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ynth!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png" width="1456" height="894" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:894,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5603002,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/198515384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ynth!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 424w, 
https://substackcdn.com/image/fetch/$s_!Ynth!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 848w, https://substackcdn.com/image/fetch/$s_!Ynth!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 1272w, https://substackcdn.com/image/fetch/$s_!Ynth!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f91e4-5bab-49cc-8f75-bb43db12e9e7_1999x1227.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The AI Zettelkasten</figcaption></figure></div><h2>Writers Don&#8217;t Trust Vendors With Their Material</h2><p>Niklas Luhmann, a German sociologist working in the 1950s, produced 70 books and nearly 400 articles over his career. I haven&#8217;t written a book yet &#8212; but I have published over 200 articles at HyperDev. Worth naming the parallel. Luhmann credited his output to his Zettelkasten &#8212; a slip-box system of 90,000 interconnected index cards, each with a unique identifier linking it to related thoughts. Not a filing cabinet. A network of ideas that got richer with every addition.</p><p>And it&#8217;s worth asking: is that so different from what an AI knowledge system does today? The Zettelkasten was an analog precursor to what trusty-memory and trusty-search do programmatically &#8212; indexing ideas, linking related thoughts, surfacing connections that wouldn&#8217;t otherwise be visible. Luhmann was doing manually what these tools do automatically. Same architecture. New substrate.</p><p>He didn&#8217;t use a vendor product. He built a system that reflected how he thought. The architecture of his Zettelkasten was itself an expression of his intellectual method.</p><p>This isn&#8217;t a historical quirk. It&#8217;s a pattern. Serious knowledge workers have always built their own systems &#8212; commonplace books, research archives, private wikis. The reason is structural: the schema you design reflects how you think. That&#8217;s not something a product gives you. A product gives everyone the same schema.</p><p>Andrej Karpathy pointed at something similar last month with his LLM Wiki gist. His framing: use LLMs not just to write code, but to build and maintain a personal knowledge base. 
&#8220;Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase.&#8221; Three folders, structured Markdown, a large context window. He concluded: &#8220;I think there is room here for an incredible new product.&#8221;</p><p>He&#8217;s right there&#8217;s room. I wrote about his framing in <a href="https://hyperdev.matsuoka.com/whats-in-your-second-brain">What&#8217;s In Your Second Brain?</a> The product comment is where I&#8217;d push back. You can build tooling around the pattern. You can&#8217;t productize the schema. The schema is the moat &#8212; because it reflects how <em>you</em> think, not how a product manager thinks you think. The ones who get it aren&#8217;t waiting for a product.</p><h2>The Lightsaber Rite of Passage</h2><p>In Star Wars canon, a Padawan doesn&#8217;t receive a lightsaber. They build one.</p><p>The ritual is called the Gathering. Initiates travel alone to the Crystal Caves of Ilum. They have to find their kyber crystal &#8212; the crystal that&#8217;s attuned to them through the Force. The caves are shaped by the initiate&#8217;s own fears and insecurities. The crystal doesn&#8217;t go to the strongest or the fastest. It bonds with the person who confronts what&#8217;s in the way.</p><p>Then they build it themselves, guided by Professor Huyang.</p><p>You can&#8217;t buy this. You can&#8217;t inherit it. The construction is the training. The tool reflects the builder.</p><p>I&#8217;m not the first to reach for this metaphor in tech. But I think it lands differently now. Building your own AI memory and search layer isn&#8217;t just useful. It&#8217;s diagnostic. You can&#8217;t do it without confronting what you actually know, how you actually think, what deserves to persist and what doesn&#8217;t. 
The schema you design for your knowledge base is a statement about your mind.</p><p>The engineers I know who are operating at the highest level right now &#8212; CTOs, senior architects, tech leads at places moving fast &#8212; all quietly roll their own. They don&#8217;t announce it. They just have it.</p><h2>My Own Lineage</h2><p>I&#8217;ve been building versions of this for months.</p><p>trusty-izzie was the first &#8212; a simple wrapper. Then ai-commander, a more structured approach to context management. Then open-mpm and claude-mpm, where I started thinking seriously about multi-agent orchestration. Then kuzu-memory, a graph-backed memory layer. Then mcp-vector-search, semantic search over my entire codebase.</p><p>Each iteration taught me something about what I actually needed. Not what I thought I needed. What the practice revealed.</p><p>This piece was drafted with a configured writing assistant &#8212; <a href="https://github.com/bobmatnyc/claude-mpm">claude-mpm</a> loaded with my publication style guide, my voice patterns, my article archive. That&#8217;s a saber too. Not a generic chat interface. A tool shaped around how I think and write, producing work I can actually publish rather than work I have to fix. 
The saber list keeps growing.</p><p>The latest two are the most capable tools I&#8217;ve built.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cyse!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cyse!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 424w, https://substackcdn.com/image/fetch/$s_!cyse!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 848w, https://substackcdn.com/image/fetch/$s_!cyse!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 1272w, https://substackcdn.com/image/fetch/$s_!cyse!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cyse!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png" width="1456" height="1030" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1030,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6229361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/198515384?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cyse!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 424w, https://substackcdn.com/image/fetch/$s_!cyse!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 848w, https://substackcdn.com/image/fetch/$s_!cyse!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 1272w, https://substackcdn.com/image/fetch/$s_!cyse!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fc209a6-05f2-4b6d-8950-09c7562bd1ac_2005x1419.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Latest Sabers</h2><p><strong>trusty-memory</strong> is a machine-wide AI memory daemon written in Rust. It uses what I call the Memory Palace architecture &#8212; multiple named palaces, each for a different domain. Sub-5ms baseline retrieval on Apple Silicon. It runs as an MCP server for Claude Code, which means my assistant stores and retrieves memories automatically, across sessions, without me managing any of it explicitly.</p><pre><code><code>cargo install trusty-memory</code></code></pre><p>Available at <a href="https://crates.io/crates/trusty-memory">crates.io/crates/trusty-memory</a>.</p><p><strong>trusty-search</strong> is a machine-wide hybrid code search daemon, also in Rust. Always-on, one install per machine. 
It combines BM25 lexical search with HNSW vector search (all-MiniLM-L6-v2 INT8) and a Knowledge Graph with 1-2 hop expansion, fused via Reciprocal Rank Fusion. It exposes an MCP server with 11 tools. Stdio and HTTP/SSE transports drop straight into Claude Code.</p><pre><code><code>cargo install trusty-search</code></code></pre><p>Available at <a href="https://crates.io/crates/trusty-search">crates.io/crates/trusty-search</a>.</p><p>These tools, along with a set of custom Python reporting apps and a custom Slack bot I use to access the data remotely, make up my digital brain.</p><p>These aren&#8217;t products I bought. These are tools I built, iterated on, and use daily. They know my codebase the way a Zettelkasten knows a scholar&#8217;s intellectual territory &#8212; not because a vendor configured them, but because I did.</p><p>To be precise: trusty-memory and trusty-search are infrastructure utilities &#8212; the memory layer and the search layer. Building the actual assistant that uses them is a separate act of customization. That&#8217;s where the lightsaber metaphor completes: the kyber crystal is only part of it. The construction &#8212; what you build with the crystal &#8212; is the saber.</p><p>When Jake said he wished he had a CTO Assistant, this is what he was gesturing at. Not a prompt template. Not a workflow. A living knowledge layer that compounds.</p><h2>The Right Question</h2><p>Jake asked: &#8220;Can I get a CTO Assistant?&#8221;</p><p>That&#8217;s the wrong question. It assumes the thing is available off the shelf, and the task is finding and configuring it.</p><p>The right question is: &#8220;What would it take to build one that knows what I know?&#8221;</p><p>That question is harder. It requires confronting the shape of your knowledge, what&#8217;s worth persisting, how to structure retrieval. 
It&#8217;s uncomfortable in the same way the Crystal Caves are uncomfortable &#8212; not because the work is technically difficult, but because you have to be honest about what you actually have.</p><p>Not everyone needs to write Rust. The specific technology isn&#8217;t the point. The point is that the engineers asking the right question are already operating differently. They&#8217;re working like writers &#8212; maintaining a living body of knowledge, building systems that compound, treating their accumulated context as an asset rather than a liability.</p><p>Writers&#8217; discipline has been creeping into engineering for a while. AI made it urgent.</p><p>If you&#8217;re waiting for a product to hand you the thing, you&#8217;re waiting for someone to build your lightsaber. It won&#8217;t be yours.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/whats-in-your-second-brain">What&#8217;s In Your Second Brain?</a> &#8212; Karpathy&#8217;s LLM Wiki and the case for a compounding knowledge layer</p></li><li><p><a href="https://hyperdev.matsuoka.com/personal-bots-abomination">Everyone Blamed Clawd Bot&#8217;s Execution. 
The Concept Was the Problem.</a> &#8212; Why universal assistants are architecturally broken</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Why Rust is the AI Language of the Future]]></title><description><![CDATA[And It&#8217;s All About the Compiler]]></description><link>https://hyperdev.matsuoka.com/p/why-rust-is-the-ai-language-of-the</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/why-rust-is-the-ai-language-of-the</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 13 May 2026 12:03:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pFtT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>A developer&#8217;s journey from Python to Rust reveals why the compiler, not the runtime, will define the next generation of AI systems.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pFtT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pFtT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!pFtT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!pFtT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!pFtT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pFtT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1563722,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/197158389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!pFtT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!pFtT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!pFtT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!pFtT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99aa09ae-80b1-4080-b709-45e9f1016d32_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Six months ago, I abandoned my Rust/Tauri-based writing app. Not because Tauri was bad&#8212;it wasn&#8217;t. Obsidian simply worked better for my needs. But that experience with Rust left an impression: the compiler was unlike anything I&#8217;d encountered. It didn&#8217;t just catch errors; it made entire classes of bugs impossible.</p><p>Then I built AI Commander for managing tmux sessions. Python would have been the obvious choice&#8212;async programming, process management, plenty of libraries. But I chose Rust, partly out of curiosity, partly because I needed rock-solid thread control. The result was a revelation: code that compiled cleanly worked correctly, and performance was extraordinary.</p><p>Now I&#8217;m deep into replacing both mcp-vector-search and kuzu-memory with pure Rust implementations. My new trusty-search outperforms the Python predecessor by an order of magnitude, even though that version used compiled Python extensions. This isn&#8217;t an isolated case&#8212;it&#8217;s part of a fundamental shift happening in AI development.</p><p><strong>Rust isn&#8217;t just another systems language. It&#8217;s becoming the infrastructure language of AI, and the reason has everything to do with letting the compiler handle the heavy lifting&#8212;especially as AI systems increasingly write their own code.</strong></p><h2>The Performance Reality Check</h2><p>The performance gains are measurable and significant. <a href="https://github.com/huggingface/tokenizers">Hugging Face&#8217;s tokenizers library</a>, rewritten in Rust, delivers 10-100x speedups over pure Python implementations. 
<a href="https://www.pola.rs/">Polars, the Rust-based DataFrame library</a>, consistently outperforms pandas in data processing benchmarks. In embedded systems, <a href="https://blog.rust-embedded.org/performance-engineering/">Rust achieves 98% of C performance</a> while eliminating memory safety issues entirely.</p><p>This isn&#8217;t just raw speed. Memory efficiency matters significantly in production AI systems. Where Python applications might consume gigabytes for large-scale data processing, equivalent Rust implementations often require 3-5x less memory. The difference compounds when deploying AI systems at scale.</p><p>This performance advantage stems from a fundamental philosophical difference in how Rust approaches the safety-speed trade-off.</p><p>When your AI system processes millions of sensor events per second in an autonomous vehicle, or makes split-second trading decisions with millions at stake, runtime failures aren&#8217;t just inconvenient&#8212;they&#8217;re catastrophic. Python&#8217;s flexibility, its greatest strength for research and experimentation, becomes a liability in these production environments.</p><h2>The Compiler Does the Heavy Lifting</h2><p>Here&#8217;s where Rust&#8217;s philosophy is revolutionary. Most languages force you to choose between safety and performance, between readable code and efficient code. Rust rejects this trade-off entirely through what it calls &#8220;zero-cost abstractions.&#8221;</p><p>The principle, borrowed from C++ but perfected in Rust, is elegantly simple: <a href="https://without.boats/blog/zero-cost-abstractions/">&#8220;What you don&#8217;t use, you don&#8217;t pay for. What you do use, you couldn&#8217;t hand code any better.&#8221;</a></p><p>Consider async programming, critical for AI systems handling multiple data streams. In Python, async operations carry runtime overhead&#8212;coroutine scheduling, context switching, memory allocation for tasks. 
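In Rust, the compiler transforms your high-level async/await code into efficient state machines. An executor such as Tokio still polls those state machines, but the language adds no scheduler of its own and no hidden per-await allocations, just near bare-metal performance wrapped in readable syntax.</p><p><strong>The genius is that the compiler absorbs the complexity.</strong> You write code that looks high-level and readable:</p><pre><code><code>// Illustrative sketch: DataStream, AiModel, Response and Error stand in for your own types.
// `next()` is assumed to come from a stream extension trait such as futures::StreamExt.
async fn process_ai_requests(stream: &amp;mut DataStream, ai_model: &amp;AiModel) -&gt; Result&lt;Vec&lt;Response&gt;, Error&gt; {
    let mut responses = Vec::new();

    // Handle requests one at a time until the stream is exhausted.
    while let Some(request) = stream.next().await {
        // `?` returns early with the first inference error.
        let response = ai_model.infer(request).await?;
        responses.push(response);
    }

    Ok(responses)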
}</code></code></pre><p>But the compiler generates machine code that can rival hand-optimized C. The ownership system rules out data races at compile time: shared mutable state has to go through a lock, an atomic, or a channel, and the compiler checks that it does. The borrow checker prevents use-after-free and double-free bugs without garbage collection (leaks are still possible, but ownership makes them rare). The type system catches many errors that would be runtime failures in dynamic languages.</p><p><strong>This is the heavy lifting</strong>: converting human-friendly abstractions into machine-efficient reality, all at compile time, with little or no runtime cost.</p><p>This compiler-centric philosophy becomes even more critical as we enter an era where AI systems increasingly write their own code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zxzP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zxzP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!zxzP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!zxzP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!zxzP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zxzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1581968,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/197158389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zxzP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!zxzP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!zxzP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!zxzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0807e8e4-b156-4e49-9563-5f16505ac2db_1024x1024.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The AI Code Generation Protection Paradox</h2><p>As AI-powered coding tools like Copilot, Claude Code, and Cursor become mainstream, a new challenge emerges: <strong>how do you ensure correctness when the code author might not fully understand what they&#8217;ve written?</strong></p><p>When Claude or Copilot generates Python, Java, TypeScript, or C code, human oversight becomes the primary error-checking mechanism. Code reviewers need to catch type mismatches, memory leaks, race conditions, and logic errors that might not surface until production. 
As AI-generated code becomes more prevalent, <a href="https://thenewstack.io/rust-creator-graydon-hoare-talks-about-security-history-and-rust/">human review processes struggle to keep pace</a> with its volume and complexity.</p><p><strong>Think about it</strong>: an AI might generate a perfectly logical-looking concurrent data structure in Python that works flawlessly in single-threaded tests but creates subtle race conditions under load. A human reviewer might miss the threading implications entirely. The same AI generating Rust code? The compiler would reject it outright if the data sharing wasn&#8217;t provably safe.</p><p><strong>Rust&#8217;s compiler fundamentally changes this dynamic.</strong> It doesn&#8217;t matter if the code was written by a human, an AI, or a collaboration between both. If it compiles, and stays out of <code>unsafe</code> blocks, entire categories of dangerous bugs simply cannot exist. Memory safety violations? Caught at compile time. Data races in concurrent code? Impossible. Use-after-free errors? Eliminated by the ownership system.</p><p>This creates a fascinating inversion: <strong>Rust may be the best language for AI-generated code precisely because it doesn&#8217;t trust the programmer.</strong> While other languages rely on developer discipline and code review to catch errors, Rust enforces these guarantees through the type system and ownership model. The compiler becomes an automated code reviewer that never gets tired, never misses the classes of bugs it checks for, and never waves through &#8220;probably fine&#8221; code. Logic errors still need tests and human review.</p><p>As Microsoft&#8217;s security team reported, <a href="https://msrc.microsoft.com/blog/2019/07/a-proactive-approach-to-more-secure-code/">roughly 70% of the vulnerabilities it assigns a CVE each year are memory safety issues</a>, the class of bug that safe Rust rules out at compile time. 
When AI systems generate infrastructure code, this shifts from a productivity concern to an existential safety issue.</p><p><strong>The compiler isn&#8217;t just doing heavy lifting for performance; it&#8217;s providing safety guarantees that scale beyond human oversight.</strong></p><h2>Memory Safety in the Age of AI</h2><p><a href="https://thenewstack.io/rust-creator-graydon-hoare-talks-about-security-history-and-rust/">Rust&#8217;s creator, Graydon Hoare</a>, started the language after a software crash knocked out the elevator in his apartment building. His goal was simple: write fast, small code without memory bugs. That mission has become critical as AI systems move from labs to production infrastructure.</p><p>In traditional software, memory safety issues mean patches and updates. In AI systems controlling physical infrastructure&#8212;autonomous vehicles, medical devices, financial trading systems&#8212;they can mean catastrophic failure.</p><p>Rust&#8217;s ownership model doesn&#8217;t just prevent crashes; it eliminates entire categories of bugs that plague concurrent AI workloads. While C++ applications in multi-threaded environments regularly exhibit data races during testing, Rust&#8217;s compile-time guarantees make data races structurally impossible in safe code.</p><p><strong>But here&#8217;s the crucial part: most of this safety comes at zero runtime cost.</strong> Unlike garbage-collected languages that trade performance for memory safety, <a href="https://reintech.io/blog/understanding-rust-zero-cost-abstractions/">Rust enforces safety through compile-time checks</a>, with only a few cheap runtime checks, such as array bounds checks, left over. Your production AI system runs with performance close to C but with safety guarantees that traditional systems languages can&#8217;t provide.</p><h2>The Python-Rust Symbiosis</h2><p>Here&#8217;s where the narrative gets interesting. 
<a href="https://blog.jetbrains.com/rust/2025/11/10/rust-vs-python-finding-the-right-balance-between-speed-and-simplicity/">The emerging trend isn&#8217;t Rust replacing Python&#8212;it&#8217;s Rust and Python working together</a>. The <a href="https://pyo3.rs/">PyO3 bridge</a>, Rust-powered Python tools like <a href="https://github.com/astral-sh/ruff">Ruff</a> and <a href="https://github.com/astral-sh/uv">uv</a>, and the growing number of &#8220;Python API, Rust engine&#8221; libraries show that the two languages are becoming symbiotic.</p><p><strong>Python remains dominant for AI research and experimentation.</strong> The ecosystem is unmatched: PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers. PyTorch&#8217;s <a href="https://github.com/pytorch/pytorch">100,000+ GitHub stars</a> reflect an ecosystem that won&#8217;t disappear overnight. When you&#8217;re prototyping models, analyzing data, or building internal tools, Python&#8217;s flexibility and ecosystem make it irreplaceable.</p><p><strong>But production tells a different story.</strong> When performance, memory usage, and reliability matter&#8212;when you&#8217;re building the infrastructure that powers AI systems rather than the models themselves&#8212;Rust increasingly dominates.</p><p>The pattern emerging across tech giants is clear: Python for the interface layer, Rust for the infrastructure layer. Experimentation happens in Jupyter notebooks; production happens in compiled Rust binaries.</p><h2>Looking Forward: Infrastructure vs. Experimentation</h2><p>This division isn&#8217;t arbitrary&#8212;it reflects the maturation of AI from research curiosity to critical infrastructure. Research requires flexibility, rapid iteration, and access to cutting-edge libraries. 
Production requires reliability, performance, and safety guarantees.</p><p><strong>Rust excels at infrastructure because the compiler handles the complexity of building robust systems.</strong> Memory safety, concurrency, error handling&#8212;all the concerns that make production systems hard to build correctly&#8212;are enforced by the type system rather than relying on developer discipline.</p><p>Consider the trajectory: <a href="https://thenewstack.io/microsofts-bold-goal-replace-1b-lines-of-c-c-with-rust/">Microsoft aims to eliminate C and C++ from its codebase by 2030</a>, replacing them with Rust. <a href="https://rustfoundation.org/resource/rust-and-ai-position-statement/">The Rust Foundation officially positions</a> the language as ideal for &#8220;ultra-reliable AI systems.&#8221; Major tech companies are increasingly choosing Rust for performance-critical AI infrastructure. These aren&#8217;t experimental projects&#8212;they&#8217;re billion-dollar bets on where AI infrastructure is heading.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OGhR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OGhR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!OGhR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!OGhR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!OGhR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OGhR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1509742,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/197158389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OGhR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!OGhR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!OGhR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!OGhR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4e07498-b662-48d5-9fad-b3061ac8ecfe_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Compiler Advantage</h2><p>What makes Rust uniquely suited for AI isn&#8217;t just performance or safety&#8212;it&#8217;s the fundamental approach of front-loading complexity into the compiler rather than dealing with it at runtime.</p><p><strong>In AI systems, runtime failures are far more expensive than compile-time failures.</strong> A training job that crashes after days of computation. An inference system that memory-leaks during peak traffic. An edge AI device that locks up in a safety-critical situation. These failures cascade through systems in ways that research environments rarely encounter.</p><p>Rust&#8217;s compiler is exhaustive in a way that benefits AI development specifically. It catches data races in concurrent model training. It prevents buffer overflows in tensor operations. It enforces proper error handling in distributed inference systems. It does this with negligible performance overhead, without runtime monitoring, and without hoping that comprehensive testing caught every edge case.</p><p><strong>The compiler becomes your co-pilot in building reliable AI infrastructure.</strong></p><h2>The Personal Proof</h2><p>My own experience validates this thesis. AI Commander needed complex thread management for tmux session control&#8212;exactly the kind of concurrency that&#8217;s error-prone in traditional languages. Rust&#8217;s ownership system made data sharing across threads not just safe, but intuitive. 
The code that compiled was correct.</p><p>With trusty-search, the performance gains over mcp-vector-search weren&#8217;t just about Rust being &#8220;faster than Python.&#8221; They were about Rust enabling architectural choices&#8212;zero-copy string processing, efficient memory layouts, fine-grained concurrency control&#8212;that would be risky or impossible in garbage-collected languages.</p><p><strong>These aren&#8217;t micro-optimizations. They&#8217;re fundamental design differences that become critical at scale.</strong></p><h2>The Real Limitations of Rust for AI</h2><p>Before declaring Rust the future, it&#8217;s essential to acknowledge where it falls short compared to Python in AI contexts.</p><p><strong>The learning curve is steep.</strong> Rust&#8217;s ownership model requires a fundamental mental shift that can slow initial development. Developers comfortable with garbage-collected languages often struggle with borrowing rules, especially when building complex data structures. The compiler&#8217;s strict requirements, while beneficial long-term, can feel obstructionist when prototyping.</p><p><strong>The ecosystem gap remains significant.</strong> While Rust has impressive performance libraries, Python&#8217;s AI ecosystem is vast and mature. TensorFlow, PyTorch, scikit-learn, and thousands of specialized ML libraries have no Rust equivalents of comparable maturity. Building production ML pipelines often requires library combinations that simply don&#8217;t exist in Rust.</p><p><strong>Compilation time can kill iteration speed.</strong> Where Python allows instant feedback during development, Rust&#8217;s compilation process&#8212;especially for complex projects&#8212;can introduce friction that slows experimentation. This matters significantly in AI research, where rapid iteration drives discovery.</p><p><strong>Talent pool challenges are real.</strong> Finding experienced Rust developers is substantially harder than finding Python developers. 
The language&#8217;s complexity means onboarding takes longer, potentially impacting team velocity and project timelines.</p><p><strong>Not every AI workload benefits from Rust&#8217;s strengths.</strong> Data analysis, model training with existing frameworks, and one-off scripts often don&#8217;t require the safety and performance guarantees Rust provides. Using Rust for these tasks can be overkill that reduces productivity without meaningful benefits.</p><p>These limitations explain why the Python-Rust symbiosis model makes sense: leverage each language where it excels rather than forcing one-size-fits-all solutions.</p><h2>The Future is Compiled</h2><p>As AI systems become infrastructure&#8212;as they move from experimental notebooks to production systems handling millions of users&#8212;the languages that power them will need to provide the guarantees that infrastructure requires.</p><p>Python will continue to dominate the experimental and interface layers. But the backbone, the high-performance inference systems, the real-time edge deployments, the safety-critical applications&#8212;these will increasingly be built in languages where the compiler, not the runtime, ensures correctness.</p><p><strong>Rust represents a fundamental shift in how we approach building reliable systems: moving complexity from runtime to compile time, from testing to proving, from hoping to knowing.</strong></p><p>This shift becomes critical as we enter the era of AI-generated code. When AI systems are writing the infrastructure that runs other AI systems, traditional human oversight breaks down. We need tools that can verify correctness automatically, at compile time, without relying on human reviewers to catch subtle but catastrophic errors.</p><p>The compiler does the heavy lifting so production systems can focus on their actual job: running AI workloads reliably, safely, and at scale. 
In a world where AI infrastructure is becoming as critical as power grids and financial networks&#8212;and where that infrastructure is increasingly written by AI itself&#8212;that guarantee isn&#8217;t just nice to have, it&#8217;s existential.</p><p>The age of agentic AI isn&#8217;t just about better models. It&#8217;s about building the reliable, high-performance infrastructure those models need to operate in the real world, written by AI systems that can&#8217;t be trusted to write safe code in traditional languages. And increasingly, that infrastructure is being built in Rust, one compile-time guarantee at a time.</p><div><hr></div><p><em>Bob Matsuoka writes about the intersection of software engineering and AI at <a href="https://hyperdev.matsuoka.com/">HyperDev</a>. His latest Rust projects replace Python infrastructure with performance gains that would make even a compiler blush.</em></p>]]></content:encoded></item><item><title><![CDATA[AI Memento]]></title><description><![CDATA[Documents aren't the answer]]></description><link>https://hyperdev.matsuoka.com/p/ai-memento</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/ai-memento</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 11 May 2026 12:30:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mhT3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mhT3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!mhT3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!mhT3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!mhT3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!mhT3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mhT3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1451351,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/197050605?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mhT3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!mhT3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!mhT3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!mhT3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6814f0f-dc1c-4093-8510-6b952bcace96_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Leonard Shelby can&#8217;t form new memories. In <em>Memento</em>, after the attack, every fifteen minutes or so his short-term recall resets. He has a problem most engineers would recognize: a fast working scratch space and no durable storage. So he builds a system. Polaroids with handwritten captions on the back. A wall of pinned notes. Tattoos for the load-bearing facts &#8212; the ones he can&#8217;t afford to lose, can&#8217;t afford to misfile, can&#8217;t afford to look up wrong.</p><p>What he doesn&#8217;t do is try to remember everything. He doesn&#8217;t carry a thicker notebook. He builds infrastructure to find what matters, fast, when he needs it. The pocket holds what&#8217;s relevant right now. The system decides what makes it into the pocket.</p><p>I&#8217;ve been thinking about Leonard a lot this year, because the prevailing argument in my corner of the internet is that LLMs no longer need him. The pocket got bigger. Frontier models can hold a million tokens of context. Token prices fell. So just stuff the pocket. Skip the indexing. Skip the tattoos. Read everything.</p><p>I think that&#8217;s the wrong read of where we&#8217;re going.</p><h2>The real argument, fairly stated</h2><p>The <a href="https://lighton.ai/lighton-blogs/rag-is-dead-long-live-rag-retrieval-in-the-age-of-agents">&#8220;RAG is dead&#8221;</a> <a href="https://akitaonrails.com/en/2026/04/06/rag-is-dead-long-context/">position</a> isn&#8217;t crazy. 
It goes roughly like this: a handful of frontier models &#8212; Gemini 1.5, Claude&#8217;s extended context tiers &#8212; can now handle contexts approaching 1M tokens without choking. Token prices have fallen to around $0.60 per million for several frontier models. Standing up a real retrieval pipeline &#8212; chunking strategy, embedding model, vector store, hybrid scoring, reranker, eval harness &#8212; is 40 to 80 engineering hours, easily. For a small, static corpus you query a few thousand times a month, just dumping the whole thing into context with prompt caching is simpler and arguably cheaper.</p><p>Claude Code is the existence proof people point at. It doesn&#8217;t ship a vector database. It uses ripgrep and file globs, and reads files into context. And it works pretty well. So why bother with the rest?</p><p>I&#8217;ll concede the envelope. For a 300K-token corpus, single tenant, low query volume, mostly static &#8212; yeah, stuff it. Cache it. Read it. Don&#8217;t build a search stack to feel sophisticated. That&#8217;s a real position and I don&#8217;t want to strawman it.</p><p>But that envelope isn&#8217;t where most of the work lives. And the argument the loud version makes &#8212; that grep replaces search, that long context replaces retrieval &#8212; confuses how much a model can read with deciding what it should read. That argument has three cracks.</p><h2>The pocket problem</h2><p>Here&#8217;s the first crack. <a href="https://arxiv.org/abs/2307.03172">&#8220;Lost in the middle.&#8221;</a> Liu et al., 2023, TACL: when you stuff long contexts, model performance forms a U-shape. Strong recall at the beginning, strong recall at the end, degraded recall in the middle. The 2026-gen frontier models have improved on this &#8212; early evals suggest the curve is flatter &#8212; but it&#8217;s not gone.</p><p>So the first thing the bigger pocket buys you is the ability to put a lot of important stuff exactly where the model is most likely to underweight it. 
You can&#8217;t reorder a 700K-token dump by relevance unless you&#8217;ve already done retrieval. Which means the question doesn&#8217;t go away &#8212; it just hides.</p><p>Second crack: cost. RAG-style queries &#8212; retrieve relevant chunks, feed 8&#8211;16K of context &#8212; run around $0.005&#8211;$0.010 per request at $0.60/million input tokens, when you count the full prompt. A naive full-context pass against a 500K-token corpus runs $0.30 in input tokens alone, before output. That&#8217;s roughly a 30&#8211;60x ratio at list price, and it widens if you have high cache-miss rates or large output windows. At 50K daily queries &#8212; not enormous, this is a mid-sized internal tool &#8212; you&#8217;re choosing between a few hundred dollars a day and about $15,000. That&#8217;s not a rounding error you absorb. That&#8217;s a re-platforming decision that wakes someone up.</p><p>Third crack, and this is the one that actually makes me grumpy. Grep is not search. Grep is lexical pattern matching. Searching for <code>auth_token</code> finds the string <code>auth_token</code>. It finds it in commented-out dead code from 2023. It finds it in test fixtures with hardcoded sample values. It finds it in a vendor library you don&#8217;t even own. It misses the function called <code>refresh_session_credential</code> that does what you&#8217;re actually looking for, because the words are different. Vector search can find that function. Grep can&#8217;t. They&#8217;re not the same tool. Pretending they are is how you ship the wrong fix at 2 AM.</p><p>When people say &#8220;Claude Code uses grep, so search is over,&#8221; they&#8217;re describing a tool optimized for a bounded, file-system-structured corpus where the author controls the index shape. That&#8217;s a legitimate architecture for that problem. 
It does not generalize to &#8220;ten million product images&#8221; or &#8220;a 30-year codebase across 800 repos&#8221; or &#8220;tenant-scoped documents under per-row access control.&#8221; Long context can&#8217;t replace a knowledge graph, either. You can serialize a graph into text &#8212; flatten the edges, encode the relationships &#8212; but then you&#8217;ve lost typed edges and efficient query-time traversal. You can&#8217;t join across it. You can&#8217;t reason about edge types. You&#8217;d need to flatten the graph to feed it in, at which point you&#8217;ve discarded the structure that made it useful.</p><p>With me so far? The argument isn&#8217;t &#8220;long context is bad.&#8221; Long context is great. The argument is that long context is a <em>consumer</em> of retrieval, not a replacement for it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7U4n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7U4n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!7U4n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!7U4n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7U4n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7U4n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1245807,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/197050605?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7U4n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!7U4n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!7U4n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!7U4n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ac32bd-7a2e-4149-82fc-ddfbe69a2873_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>What good retrieval actually looks like</h2><p>A modern retrieval stack, the kind that holds up under real query volume, 
isn&#8217;t one thing. It&#8217;s a layered pipeline.</p><p>Lexical first &#8212; BM25, TF-IDF, the unfashionable stuff. It&#8217;s still excellent at exact tokens, identifiers, error messages, version numbers. The things vector embeddings smudge.</p><p>Dense second &#8212; vector search over embeddings. all-MiniLM-L6-v2 is the workhorse at 384 dimensions. Catches semantic neighbors, paraphrases, the function-named-different-but-doing-the-same-thing case.</p><p>Fuse them &#8212; Reciprocal Rank Fusion at k=60 is the standard, and it&#8217;s standard because it works. You don&#8217;t pick BM25 <em>or</em> vector. You merge their rank lists.</p><p>Rerank &#8212; a cross-encoder over the top-k from RRF. Slower per-pair, but you only run it on the candidates that made it through the cheap layers.</p><p>The numbers from a 2026 benchmark pass I was reading &#8212; <a href="https://arxiv.org/abs/2604.01733">&#8220;From BM25 to Corrective RAG,&#8221;</a> April 2026 &#8212; make the layered case cleanly:</p><ul><li><p>Dense vectors alone: Recall@5 of 0.587</p></li><li><p>BM25 alone: 0.644</p></li><li><p>Hybrid via RRF: 0.695</p></li><li><p>Hybrid + neural reranker: 0.816</p></li></ul><p>That last number is 39% better than dense-only. It&#8217;s also 17% better than RRF without the reranker. The layers compound. In systems that need both recall and precision at scale, you usually want all four layers. And &#8212; this part matters &#8212; none of them are replaced by a bigger context window. You still have to pick what goes into the pocket.</p><p>There&#8217;s a piece on top of this, too: intent. A query like &#8220;where is <code>process_payment</code> defined&#8221; is not the same shape as &#8220;how does the refund flow handle partial captures.&#8221; The first is a definition lookup; BM25 should dominate, you want exact identifier matching, you don&#8217;t want semantic neighbors fuzzing the result. 
The second is conceptual; dense embeddings should dominate, you want the chunk that explains the flow even if the words don&#8217;t match. Treating those queries identically is how you get plausible-but-wrong answers.</p><p>The EMNLP 2024 <a href="https://arxiv.org/search/?searchtype=all&amp;query=self-route+long+context+retrieval">&#8220;Self-Route&#8221; paper</a> made a related point I keep coming back to: let the model itself decide whether a query needs retrieval or full-context, based on complexity. They reported accuracy comparable to full-context at substantially lower compute cost. The two approaches aren&#8217;t enemies. They&#8217;re complements with different cost/quality envelopes, and a working system uses both.</p><p><em>(A note on scope: I&#8217;ve been using &#8220;search&#8221; and &#8220;memory retrieval&#8221; somewhat interchangeably here, which isn&#8217;t quite right. They&#8217;re the same pipeline mechanics &#8212; retrieve, rank, inject &#8212; but different problems. Memory retrieval has a temporal model search doesn&#8217;t: recency matters, decay matters, and the source is agent history rather than a document corpus. Memory is also primarily graph-based, because what you&#8217;re trying to recover is relationships and context across time, not just chunks of text. For the purposes of this argument &#8212; context window vs. retrieval discipline &#8212; the distinction doesn&#8217;t change anything. But it&#8217;s worth naming.)</em></p><h2>What I&#8217;m building</h2><p>I have two projects in this space. I&#8217;ll be specific about what each one is and isn&#8217;t, because the field is full of demos that don&#8217;t survive contact with real corpora.</p><p><strong>mcp-vector-search</strong> is the older one. Python, per-project, designed to live next to a codebase as an MCP server. 
The retrieval core is hybrid: BM25 plus HNSW over MiniLM embeddings, with knowledge-graph expansion built from tree-sitter AST parsing &#8212; function definitions, call edges, import edges, type relationships. Fifteen-plus edge types in the graph. Cross-encoder reranking on top of RRF. MMR for diversity so you don&#8217;t get five near-duplicates in the top results. Temporal decay weighted by git blame age. Cyclomatic complexity factored into ranking, because dense, gnarly code is more often what you&#8217;re looking for than the trivial wrappers around it.</p><p><strong>trusty-search</strong> is what I&#8217;m building toward. Rust daemon, machine-wide rather than per-project, multi-tenant across all my work. Same retrieval core &#8212; RRF at k=60, MiniLM embeddings, BM25 plus HNSW &#8212; but with intent classification on the front. Queries get tagged Definition, Usage, Conceptual, or BugDebt, and each intent has pre-tuned alpha/beta weights between lexical and dense. Sub-10ms p50 on warm queries.</p><p>I&#8217;ll be honest: trusty-search is early-stage. Some of the features I just listed are scaffolding with stubs underneath. The intent classifier is rule-based at the moment and needs to become a small model. The two tools are complementary rather than competing for now &#8212; mcp-vector-search is the deeper analysis layer, trusty-search is the fast facts store you hit constantly during a session &#8212; though the long-term plan is for trusty-search to replace mcp-vector-search.</p><p>The reason I&#8217;m building both is that I keep running into the same wall: I can give an agent a million-token window and it will still ask the wrong question of the wrong file. The bottleneck is not how much I can stuff in. The bottleneck is which 8K of the 800K matters for <em>this</em> query. That&#8217;s a search problem. It&#8217;s been a search problem for sixty years. 
It&#8217;s still a search problem.</p><h2>Back to Leonard</h2><p>The reason Leonard works as an analogy isn&#8217;t the amnesia. It&#8217;s the discipline. He decides what makes it onto a Polaroid. He decides what gets tattooed and what stays loose. He writes &#8220;DON&#8217;T BELIEVE HIS LIES&#8221; on the back of a photo because he won&#8217;t have the context to evaluate trustworthiness later &#8212; so he encodes the conclusion now, into a system he&#8217;ll find when he needs it.</p><p>The context window is the pocket. Whatever Leonard pulls out and looks at right now. It can be enormous. It can be cached. It can be cheap. None of that decides what goes in.</p><p>Retrieval is the tattoo on his wrist. The Polaroid pinned to the wall. The system that decides which fact survives, which one is one query away, which one is buried. The job of that system is exactly the job that doesn&#8217;t go away when the pocket gets bigger.</p><p>The question was never &#8220;do I need RAG?&#8221; That framing turned a design discipline into a vendor category and made it easy to dismiss. The question is the one Leonard asks every morning: <em>what do I actually need to find, and have I built the thing that finds it?</em> At a million tokens, you&#8217;re not freed from that question. 
You&#8217;ve just made it more expensive to answer incorrectly.</p><p>It feels crazy that I wrote <a href="https://open.substack.com/pub/hyperdev/p/50-first-dates-with-claude-code?r=nff5&amp;utm_campaign=post-expanded-share&amp;utm_medium=web">this</a> only a year ago&#8230;</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/context-memory-search-agentic-work">Context Memory and Search: The Secrets to Effective Agentic Work</a> &#8212; Why context management, memory, and retrieval are the three pillars of effective agentic systems</p></li><li><p><a href="https://hyperdev.matsuoka.com/whats-in-your-second-brain">What&#8217;s In Your Second Brain?</a> &#8212; Tooling for the modern CTO: how to build external memory that actually works</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Fear and Loathing in AWS]]></title><description><![CDATA[How Claude Helped Me Discover The Joys of Complex Infrastructure]]></description><link>https://hyperdev.matsuoka.com/p/fear-and-loathing-in-aws</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/fear-and-loathing-in-aws</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 07 May 2026 11:31:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!B4Cb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B4Cb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B4Cb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!B4Cb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!B4Cb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!B4Cb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B4Cb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2235918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/196537737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B4Cb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!B4Cb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!B4Cb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!B4Cb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff875b25e-9f80-4411-8fbf-cd4a24bef764_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p><em><strong>TL;DR:</strong> I spent years avoiding AWS due to its overwhelming complexity, preferring the simplicity of Vercel and composable stacks. Then Claude MPM changed how I work with AWS. Now I&#8217;m running sophisticated multi-service AWS deployments with GPU instances, comprehensive monitoring, and infrastructure-as-code&#8212;all managed through AI-assisted tooling. The lesson: agentic approaches transform ops workflows just as profoundly as they do coding.</em></p></blockquote><p>Remember when AWS felt like digital quicksand? Every innocent &#8220;let me just deploy this simple app&#8221; spiraled into an afternoon lost in IAM policies, VPC configurations, and security group rules that made no sense. 
I&#8217;d start with what should be a five-minute deployment and emerge three hours later, bleary-eyed, with a working application and absolutely no confidence I could recreate the process.</p><p>That&#8217;s why I became a Vercel evangelist. One <code>git push</code>, automatic deployments, zero configuration. The developer experience was everything AWS wasn&#8217;t: predictable, fast, and actually enjoyable. For most projects, this composable stack approach&#8212;Vercel for frontend, managed databases, serverless functions where needed&#8212;delivered exactly the right balance of power and simplicity.</p><p>But somewhere along the way, that changed.</p><h2>The Claude Code/<a href="https://github.com/bobmatnyc/claude-mpm">MPM</a> Turning Point</h2><p>Today I&#8217;m running the kind of infrastructure I would have delegated to an ops team a year ago. GPU instances for ML workloads. Multi-AZ deployments across six subnets. Sophisticated monitoring pipelines that integrate CloudWatch metrics, Cost Explorer analysis, and automated GitHub issue creation. Terraform managing multi-account infrastructure with cross-service dependencies.</p><p>The difference? I don&#8217;t even need Amazon Q, AWS&#8217;s own assistant. I have something better: purpose-built AWS skills in Claude MPM that handle service deployment, infrastructure analysis, and operational workflows.</p><p>This transformation illustrates something crucial about the agentic revolution: it&#8217;s not just changing how we write code. It&#8217;s fundamentally altering how we approach operational complexity.</p><h2>The Infrastructure Reality Check</h2><p>Let me show you what I mean with specifics. 
Here&#8217;s what I&#8217;m actually running across three active projects to support internal tools:</p><p><strong>CloudWatch Reporting Service</strong> (Serverless):</p><ul><li><p>12 Lambda functions handling health checks, metrics aggregation, and MCP server functionality</p></li><li><p>API Gateway HTTP API with sophisticated CORS and authentication</p></li><li><p>DynamoDB tables for state management and external directory lookups</p></li><li><p>SNS/SQS for alerting and dead-letter queue handling</p></li><li><p>Direct Bedrock integration with Claude Haiku 4.5 for automated analysis</p></li><li><p>CloudWatch Events scheduling 5-minute monitoring cycles</p></li><li><p>Secrets Manager for GitHub App credentials and API keys</p></li></ul><p><strong>Code Intelligence Platform</strong> (Compute):</p><ul><li><p>Two EC2 instances: t3.xlarge for web serving, g4dn.xlarge for GPU-accelerated indexing</p></li><li><p>EBS volumes with gp3 storage and custom IOPS configuration</p></li><li><p>VPC with six subnets across availability zones</p></li><li><p>Application Load Balancer with Route 53 DNS and ACM certificates</p></li><li><p>EFS for shared file storage across instances</p></li><li><p>CloudWatch Synthetics for endpoint monitoring</p></li><li><p>Lambda-based Slack notifications triggered by SNS topics</p></li></ul><p><strong>Enterprise Infrastructure</strong> (Multi-Account):</p><ul><li><p>Terragrunt-managed infrastructure across production and staging accounts</p></li><li><p>S3 backend for Terraform state with DynamoDB locking</p></li><li><p>Cross-account IAM policies and service integration</p></li><li><p>Integration with external providers (Sentry, GitHub, Kubernetes clusters)</p></li></ul><p>A year ago, this list would have been my personal infrastructure horror story. 
Today, it&#8217;s Tuesday.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oE0u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oE0u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!oE0u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!oE0u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!oE0u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oE0u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2564092,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/196537737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oE0u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!oE0u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!oE0u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!oE0u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde31d590-502b-4d0a-808f-44cb8153123e_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>What Changed</h2><p>The transformation wasn&#8217;t gradual. It was a step function that happened when I realized AI assistants could handle the cognitive overhead that makes AWS painful.</p><p><strong>Before</strong>: AWS documentation as archaeological expedition. Digging through service guides, trying to understand the relationship between VPC route tables and security groups, wondering if I need an Internet Gateway or a NAT Gateway or both. Every deployment felt like solving a puzzle where half the pieces were hidden.</p><p><strong>After</strong>: Natural language infrastructure requests. 
&#8220;Set up monitoring for this Lambda function with alerting to Slack&#8221; becomes a series of guided steps where the AI handles the AWS-specific implementation details while I focus on the business requirements.</p><p>The key insight is that <a href="https://www.digitalocean.com/resources/articles/aws-complicated">AWS&#8217;s complexity isn&#8217;t inherently bad</a>&#8212;it&#8217;s just <strong>cognitively expensive</strong>. When you remove that cognitive load through AI assistance, you can appreciate what all those services actually enable.</p><p>Take my monitoring setup. Previously, I would have settled for basic uptime checks because configuring comprehensive CloudWatch metrics, Cost Explorer integration, and automated issue creation felt like a weekend project. With Claude MPM&#8217;s AWS skills, it became an hour of guided configuration that resulted in production-grade observability.</p><p>Or consider GPU instance management. The g4dn.xlarge used for ML indexing has start/stop automation, monitoring for runaway processes, and automatic EBS volume scaling based on data requirements. Setting this up manually would have required deep expertise in EC2 lifecycle management, CloudWatch alarms, and Lambda automation. With AI assistance, I focused on defining the business logic while the tooling handled the AWS implementation.</p><h2>The DX Philosophy Still Matters</h2><p>None of this means AWS wins every comparison. Vercel&#8217;s developer experience remains superior for the use cases it targets. When I need to ship a marketing site or a straightforward web application, <code>git push</code> deployment still beats any infrastructure-as-code workflow.</p><p>The difference is recognizing when complexity serves a purpose versus when it&#8217;s just complexity. Vercel abstracts away infrastructure concerns because most web applications don&#8217;t need granular control over compute, storage, and networking. 
But when you&#8217;re building systems that do need that control&#8212;ML pipelines, high-throughput data processing, complex service topologies&#8212;AWS&#8217;s granularity becomes valuable rather than burdensome.</p><p>AI assistance changes the cost-benefit calculation. When configuring VPC networking takes 20 minutes of guided conversation instead of three hours of documentation archaeology, you can choose AWS for projects where you previously would have compromised on requirements to avoid operational overhead.</p><p>But there&#8217;s an honest accounting problem buried in that logic. Claude Code isn&#8217;t free. <a href="https://aws.amazon.com/blogs/machine-learning/claude-code-deployment-patterns-and-best-practices-with-amazon-bedrock/">API costs, subscription fees</a>&#8212;if you&#8217;re running significant conversation volume to figure out your infrastructure, you&#8217;re spending real money. At some point, you&#8217;re spending more on AI assistance than a Vercel seat would cost. The &#8220;AWS saves money at scale&#8221; argument gets complicated fast when you factor in the cognitive tooling required to get there.</p><p>So let me be direct about where each wins. Pure self-service developer experience&#8212;one engineer, a web app, ship it fast? <a href="https://dev.to/code42cate/stop-using-aws-4eg">Vercel, and it&#8217;s not particularly close</a>. The moment you need an AI co-pilot to configure your infrastructure, you&#8217;ve added a cost layer that Vercel eliminates by design. But complex multi-service deployments&#8212;ML pipelines, GPU compute alongside serverless, multi-account Terraform, monitoring infrastructure that spans six services&#8212;those don&#8217;t live in Vercel&#8217;s world. That&#8217;s where the math inverts and AWS earns its complexity premium.</p><h2>The Broader Implications</h2><p>This transformation reveals something important about how agentic approaches will reshape technology adoption. 
We&#8217;re not just making individual tasks more efficient&#8212;we&#8217;re changing which categories of tools become accessible to developers.</p><p>I see this pattern across the infrastructure stack. Database migrations and performance tuning become approachable when AI translates business requirements into specific configuration changes. Kubernetes stops being &#8220;too complex for small teams&#8221; when you can describe desired behavior in natural language and get Helm charts and operators generated automatically. IAM policies, security groups, and compliance frameworks become manageable when AI can analyze your application requirements and generate least-privilege configurations.</p><p>These tools were always powerful. They were just too expensive to learn and maintain for many use cases. AI assistance changes those economics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cV_v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cV_v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!cV_v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!cV_v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cV_v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cV_v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b243716e-dc40-4005-9910-9a34f522876b_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1968900,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/196537737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cV_v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!cV_v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!cV_v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!cV_v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb243716e-dc40-4005-9910-9a34f522876b_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Where This Goes Next</h2><p>We&#8217;re still early in AI-assisted infrastructure management. 
Today&#8217;s tooling handles deployment and configuration. Cost optimization, security posture, performance tuning&#8212;those are coming. Full system architecture from high-level requirements is probably further out than the hype suggests, but it&#8217;s not science fiction.</p><p>But the fundamental lesson remains: complexity isn&#8217;t always the enemy. Sometimes it&#8217;s just temporarily inaccessible. When AI removes the accessibility barriers, you can choose tools based on their actual capabilities rather than their learning curves.</p><p>For now, I&#8217;m running infrastructure that would have seemed impossible to manage solo a year ago. And it&#8217;s kind of fun.</p><p>AWS might still feel like quicksand sometimes. But now I have a helicopter.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[What’s In Your Second Brain?]]></title><description><![CDATA[Tooling for the modern CTO]]></description><link>https://hyperdev.matsuoka.com/p/whats-in-your-second-brain</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/whats-in-your-second-brain</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 04 May 2026 13:26:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oH6q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oH6q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oH6q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 424w, https://substackcdn.com/image/fetch/$s_!oH6q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 848w, https://substackcdn.com/image/fetch/$s_!oH6q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 1272w, https://substackcdn.com/image/fetch/$s_!oH6q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oH6q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png" width="1024" height="649" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1458693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/196419582?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa25c8af5-4b58-4f5d-b775-532bd485770e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oH6q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 424w, https://substackcdn.com/image/fetch/$s_!oH6q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 848w, https://substackcdn.com/image/fetch/$s_!oH6q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 1272w, https://substackcdn.com/image/fetch/$s_!oH6q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30309841-bfd3-4d55-a5d2-03d8d872cc9b_1024x649.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The modern CTO toolkit isn&#8217;t just apps and coding tools. The real differentiator is a custom knowledge layer &#8212; databases, search indices, memory graphs, behavioral instructions that compound over time. No product gives you this. You build it.</p><p><a href="https://karpathy.ai/">Andrej Karpathy</a> gestured at something similar last month when he posted a <a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f">GitHub Gist</a> he called &#8220;LLM Wiki.&#8221; His framing: stop using LLMs <em>just</em> to write code, use them to build and maintain a personal knowledge base instead. <em>&#8220;Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase.&#8221;</em> Three folders, structured Markdown, a large context window, a few Python scripts. No RAG, no vector database. 
He concluded with: <em>&#8220;I think there is room here for an incredible new product.&#8221;</em></p><p>He&#8217;s right that there&#8217;s room. But the product comment is where I&#8217;d push back, and I&#8217;ll get to that. What Karpathy is describing isn&#8217;t a note-taking system. It&#8217;s a personal operational knowledge layer. For CTOs specifically, that layer needs to be broader than a personal wiki &#8212; it needs live organizational data, agent-connected search, and context that persists across months of decisions. No app hands you that.</p><h2>TL;DR</h2><ul><li><p>Karpathy&#8217;s LLM Wiki shows the direction: LLMs as knowledge compilers, not just code generators</p></li><li><p>A modern CTO&#8217;s &#8220;second brain&#8221; is more than PKM &#8212; it&#8217;s live databases, custom agents, and contextual search across organizational data</p></li><li><p>When I joined Duetto as CTO, my custom toolkit let me synthesize a 150-person R&amp;D org in weeks instead of months</p></li><li><p>The power isn&#8217;t Obsidian. It&#8217;s what you connect to it &#8212; MCP servers, search indices, knowledge graphs</p></li><li><p>Productizing this is theoretically possible and practically very hard, because the schema is the moat</p></li></ul><h2>The toolkit article got it half right</h2><p>In <a href="https://hyperdev.matsuoka.com/p/whats-in-my-claude-code-toolkit">What&#8217;s In My Toolkit: Claude Code and Family</a>, I wrote about vanilla Claude Code&#8217;s core limitations: context evaporates, code search is keyword-based, memory doesn&#8217;t persist, execution is single-threaded. The tools I built &#8212; <a href="https://github.com/bobmatnyc/claude-mpm">Claude MPM</a>, <a href="https://github.com/bobmatnyc/mcp-vector-search">mcp-vector-search</a>, <a href="https://github.com/bobmatnyc/kuzu-memory">kuzu-memory</a> &#8212; address each of those gaps.</p><p>But that article was about coding workflows. 
The real story is broader.</p><p>The same architecture that makes a coding session more effective &#8212; persistent memory, semantic search, specialized agents pulling from structured data &#8212; turns out to be extraordinarily useful for executive work. Understanding an organization, tracking decisions over time, querying data across systems, maintaining context across months of meetings and analysis. The toolkit I built for software development became the toolkit I used to onboard as a CTO.</p><p><a href="https://hyperdev.matsuoka.com/p/i-built-a-coding-tool-then-i-used">That onboarding story</a> is documented in detail elsewhere. Short version: I pointed a multi-agent framework at GitHub, JIRA, Slack, Confluence, and budget spreadsheets, and synthesized a 150-person R&amp;D organization in the weeks before my start date. The difference between doing that with a chat interface versus a CLI-based orchestration layer with parallel agents and persistent memory wasn&#8217;t 2x or 5x. It was closer to 10x.</p><p>But the onboarding was just the starting gun. 
The second brain I assembled keeps compounding.</p><h2>What&#8217;s actually in my second brain</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ICJf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ICJf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!ICJf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!ICJf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!ICJf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ICJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1082407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/196419582?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ICJf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!ICJf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!ICJf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!ICJf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86cffcd-de7a-4db2-8ab3-ec6f86df65db_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let me be specific. Because when people (now) hear &#8220;second brain&#8221; they usually think Obsidian vaults with color-coded tags and pretty Markdown files. That&#8217;s part of it. It&#8217;s the surface layer.</p><p>The actual power comes from what&#8217;s underneath.</p><h3>The memory layer</h3><p><a href="https://github.com/bobmatnyc/kuzu-memory">kuzu-memory</a> is a KuzuDB-backed knowledge graph that persists across every AI session. It stores learnings from conversations, code commits, decisions, patterns. When I start a new Claude Code session on a problem I&#8217;ve touched before, the context isn&#8217;t blank &#8212; it&#8217;s enriched with what was learned the last time.</p><p>This is the thing people underestimate. 
A project-specific memory that accumulates over months of work develops a kind of organizational intelligence you can&#8217;t replicate in a single conversation. It knows why a particular architectural decision was made. It knows that a vendor was evaluated and found lacking. It knows the terminology your team uses internally that differs from industry standard.</p><p>KuzuDB isn&#8217;t a product choice for its own sake &#8212; it&#8217;s graph-native, which means it handles relationships well. The connections between people, systems, decisions, and code are as important as the facts themselves.</p><h3>The search layer</h3><p><a href="https://github.com/bobmatnyc/mcp-vector-search">mcp-vector-search</a> provides semantic search across all project files. Not keyword search &#8212; semantic search with AST parsing. When I ask &#8220;where is the analysis I did on contractor productivity last quarter,&#8221; it finds it even if the document never uses those exact words.</p><p>At Duetto, this covers everything in my CTO project: architecture records, meeting notes pulled from Granola, emails I&#8217;ve synthesized, analysis documents, planning artifacts. Months of accumulated context, all searchable in seconds. The underlying code intelligence for the engineering organization runs as a separate service &#8212; mcp-vector-search is for my working knowledge, not the codebase itself.</p><h3>The databases</h3><p>My CTO project has three:</p><ul><li><p><strong>cto.db</strong> &#8212; SQLite. Work classification, people analysis, contributor data, commit history. The operational database for running analyses and reports.</p></li><li><p><strong>analytics.duckdb</strong> &#8212; DuckDB. OLAP queries and analytics. 
When I need to slice engineering output data in different ways or run something that would be painful in SQLite, it goes here.</p></li><li><p><strong>duetto_knowledge.db</strong> &#8212; The RAG-queryable knowledge base backing a Flask web app for interactive exploration.</p></li></ul><p>These aren&#8217;t a product I bought. They&#8217;re a schema I designed, built incrementally, and own completely. The schema reflects how I think about the organization, which is precisely why it&#8217;s useful.</p><h3>The connectors</h3><p><a href="https://github.com/bobmatnyc/gworkspace-mcp">gworkspace-mcp</a> handles Drive, Docs, Sheets, Gmail, Calendar, and more. I wrote my own rather than using the off-the-shelf options &#8212; Google&#8217;s first-party integration and Anthropic&#8217;s default both have significant tool coverage gaps. Mine exposes substantially more of the Workspace API surface and integrates transparently with Claude MPM, so agents can use Google Workspace tools without any special configuration at the call site.</p><p>Beyond Workspace: Notion API for product specs and planning documents. Extraction scripts for JIRA, Confluence, Slack, Datadog, and AWS. Each system&#8217;s output is captured as raw data, which feeds analysis pipelines that generate reports stored in a project directory.</p><p>For company-wide memory, two more tools: duetto-memory and duetto-directory. These handle shared organizational context &#8212; information that needs to flow between tools and across team members rather than staying in a single session. Memory persists within our VPC, encrypted to individual users&#8217; OAuth keys. Not even our own IT has access to it. Context shared from Claude Code shows up in Claude.ai, and vice versa, without any manual sync.</p><p>The entire flow is queryable. From a single Claude session, I can ask about budget trends, team velocity, specific architectural decisions, or what a particular engineer has been working on for the last three months. 
Because it&#8217;s all in the same context-addressable system.</p><h3>Obsidian as the front door</h3><p>Yes, I use Obsidian. But it&#8217;s a front door, not the building. The vault holds my personal notes, research captures, and synthesized analysis. The Obsidian Web Clipper feeds raw material into the knowledge pipeline. Templates enforce consistent structure.</p><p>Karpathy&#8217;s insight about Obsidian as IDE is right in the narrow sense: it&#8217;s the interface you use to read and organize. But the interesting work happens outside it &#8212; in the databases, the agents, the search indices, the custom scripts.</p><h2>CLAUDE.md files everywhere</h2><p>The context layer isn&#8217;t just data. It&#8217;s also behavioral instructions.</p><p>Every major directory in my project has a CLAUDE.md. The one at the root of the CTO project is 400 lines of conventions, routing logic, document lifecycle rules, and architectural decisions. Every subdirectory has a more focused version. Every specialized agent has its own constraints.</p><p>These files are my second brain&#8217;s schema, expressed as instructions rather than data. A single routing rule &#8212; &#8220;if the prompt mentions meeting notes, save to <code>projects/meetings/2026-W##/</code>&#8221; &#8212; sounds trivial. But it means twelve months of meeting notes accumulate in consistent, queryable locations rather than wherever an agent happened to save them. Multiply that by forty routing rules across fifteen subdirectories, and the entire corpus becomes navigable. The CLAUDE.md files are what make the databases useful. Without them, the data is just data.</p><p>Karpathy put it well: &#8220;You share the schema, not the code.&#8221; The schema is the valuable part. The schema is what compounds.</p><p>My schema took months to build. It will keep getting better. 
No product ships with the right schema for my organization, because no product knows what I know about how Duetto&#8217;s R&amp;D works.</p><h2>The productization question</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-gqt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-gqt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 424w, https://substackcdn.com/image/fetch/$s_!-gqt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 848w, https://substackcdn.com/image/fetch/$s_!-gqt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 1272w, https://substackcdn.com/image/fetch/$s_!-gqt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-gqt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png" width="1024" height="708" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87551032-97c3-4a89-8697-729269302cfa_1024x708.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:708,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1454118,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/196419582?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bbeb78-8523-459e-9cb0-e65b1d5bce18_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-gqt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 424w, https://substackcdn.com/image/fetch/$s_!-gqt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 848w, https://substackcdn.com/image/fetch/$s_!-gqt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 1272w, https://substackcdn.com/image/fetch/$s_!-gqt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87551032-97c3-4a89-8697-729269302cfa_1024x708.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Karpathy said there&#8217;s room for an incredible product. He&#8217;s not wrong about the gap. He might be wrong about the solution.</p><p>The structural problems with productizing a second brain:</p><p><strong>Context compounds, products don&#8217;t.</strong> My system gets smarter with every commit, meeting, and conversation. A SaaS product serves thousands of customers and maintains no one&#8217;s specific context. The more I use my system, the wider the gap between it and any off-the-shelf alternative.</p><p><strong>The schema is the moat.</strong> My knowledge architecture reflects how I think about engineering organizations. Someone else&#8217;s knowledge architecture would be different. 
Products that force their schema on you &#8212; and every product does &#8212; are imposing someone else&#8217;s way of thinking on your problem. That friction is small at first and grows over time.</p><p><strong>Privacy is structural, not incidental.</strong> My databases contain org structures, salary data, performance patterns, and vendor negotiations. Routing that through third-party infrastructure creates risk that&#8217;s practically impossible to contain. I built duetto-memory for enterprise use so that the entire stack stays within our VPC, with memories encrypted to individual users&#8217; OAuth keys. Not even IT can read them. That level of isolation is nearly impossible to provide as a multi-tenant SaaS.</p><p>Some layers could be productized &#8212; the infrastructure, not the intelligence. A well-designed memory MCP with sensible defaults. Semantic search that works without configuration. Privacy-preserving graph storage you don&#8217;t have to host yourself. The plumbing.</p><p>The schema, the decisions, and the accumulated context can&#8217;t be productized. Those are yours. That&#8217;s the point &#8212; and it&#8217;s also why the product gap Karpathy sees will remain open even after someone tries to fill it.</p><h2>Can anyone do this?</h2><p>There&#8217;s an access problem here, and I&#8217;d be dishonest not to acknowledge it.</p><p>Building what I&#8217;ve described requires knowing Python well enough to write extraction scripts, understanding enough about graph databases to design a schema, and being comfortable with CLI-based tooling and MCP server configuration. Not every CTO has that background. Not every technical leader wants to spend weekends building personal infrastructure.</p><p>The irony is that the people who most need better organizational intelligence &#8212; executives without deep engineering backgrounds &#8212; are least equipped to build these systems. 
And the people who are most capable of building them are often less interested in the executive problems the systems could solve.</p><p>Tiago Forte, who wrote <em><a href="https://www.buildingasecondbrain.com/">Building a Second Brain</a></em>, has been making this point for years. His PARA method and CODE framework are accessibility layers &#8212; ways to make the underlying ideas approachable without requiring you to build a graph database. The methodology is sound. But it was designed for knowledge workers, not for CTOs running engineering organizations, who need live data pipelines rather than filing systems. The question is always: well designed for whom?</p><p>Karpathy&#8217;s LLM Wiki is explicitly a system for someone comfortable writing Python and working with file systems. His Gist has code in it. That&#8217;s a feature for his audience and a barrier for everyone else.</p><h2>What I&#8217;d watch for</h2><p>A few trends will determine whether this remains a DIY space or gets productized:</p><p><strong>MCP as infrastructure.</strong> The <a href="https://hyperdev.matsuoka.com/p/the-mcp-cat-is-out-of-the-bag">Model Context Protocol</a> creates a standard interface for exactly this kind of knowledge infrastructure. Memory servers, search servers, database connectors &#8212; they all expose the same interface to any compatible AI client. The ecosystem is growing fast. As more MCP servers mature, the configuration burden drops.</p><p><strong>Searchable context beats raw window size.</strong> Karpathy argues for plain Markdown because ~400K words fit in a modern context window. That&#8217;s true, and the window is getting larger. But the more important shift is that structured, searchable context doesn&#8217;t have a ceiling. 
A well-organized knowledge base that spans years of meetings, decisions, and analysis delivers more than any single context window can hold &#8212; and the value scales with the quality of its organization, not the size of the model.</p><p><strong>Local model quality.</strong> Karpathy runs Anthropic agents via Claude Code. But local model quality is improving fast. A system that uses the cloud API for synthesis and queries but runs a local model for routine indexing tasks would be significantly cheaper and more private. Not ready yet. Getting closer.</p><p>The product Karpathy sees room for &#8212; if it gets built &#8212; probably looks like a well-designed local MCP server with clean configuration, sensible defaults, and a plugin ecosystem for connectors. Not a SaaS. Not a cloud database. Something you install and own.</p><p>The people who need it most will have already built their own before any product ships. And in the process of building it, they&#8217;ll have accumulated the one thing no product can give them: months of their own operational context, organized the way their own mind works.</p><p>That&#8217;s not a consolation prize. That&#8217;s the whole point.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/p/whats-in-my-claude-code-toolkit">What&#8217;s In My Toolkit: Claude Code and Family</a> &#8212; The coding layer of the stack</p></li><li><p><a href="https://hyperdev.matsuoka.com/p/i-built-a-coding-tool-then-i-used">I Built a Coding Tool. 
Then I Used It to Onboard as CTO</a> &#8212; Applying agent orchestration to organizational analysis</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[You weren't imagining things...Claude Code was dumber this month]]></title><description><![CDATA[Unintended consequences of optimization]]></description><link>https://hyperdev.matsuoka.com/p/you-werent-imagining-thingsclaude</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/you-werent-imagining-thingsclaude</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Fri, 24 Apr 2026 13:53:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JH1l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JH1l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JH1l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 424w, 
https://substackcdn.com/image/fetch/$s_!JH1l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 848w, https://substackcdn.com/image/fetch/$s_!JH1l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 1272w, https://substackcdn.com/image/fetch/$s_!JH1l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JH1l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png" width="1024" height="779" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:779,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1738565,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/195350759?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bc7c126-0147-4eb2-8a84-d2fa3808f4b4_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!JH1l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 424w, https://substackcdn.com/image/fetch/$s_!JH1l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 848w, https://substackcdn.com/image/fetch/$s_!JH1l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 1272w, https://substackcdn.com/image/fetch/$s_!JH1l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F960004b8-63a5-4f9b-98b1-ee35085cd059_1024x779.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So if you&#8217;ve been using Claude Code and noticed it felt... off... you weren&#8217;t imagining it.</p><p><a href="https://www.anthropic.com/engineering/april-23-postmortem">Anthropic published a full breakdown</a> yesterday and it&#8217;s actually three separate bugs that compounded into what looked like one big degradation. The developer community was right to be concerned, and the evidence they collected was instrumental in getting this fixed.</p><p>Here&#8217;s what actually happened:</p><h2>1. They silently downgraded reasoning effort (March 4)</h2><p>They switched Claude Code&#8217;s default from high to medium reasoning to reduce latency. Users noticed immediately. They reverted it on April 7.</p><p>Classic &#8220;we know better than users&#8221; move that backfired. From their postmortem:</p><blockquote><p>&#8220;This was the wrong tradeoff. We reverted this change on April 7 after users told us they&#8217;d prefer to default to higher intelligence and opt into lower effort for simple tasks.&#8221;</p></blockquote><p>The UI was appearing frozen in high reasoning mode, so they made an executive decision to sacrifice quality for speed. Developers immediately felt the difference and pushed back hard.</p><h2>2. A caching bug made Claude forget its own reasoning (March 26)</h2><p>This one was particularly insidious. They tried to optimize memory for idle sessions&#8212;clear old thinking after an hour of inactivity to speed up resumption. Sounds reasonable, right?</p><p>A bug caused it to wipe Claude&#8217;s reasoning history on EVERY turn for the rest of a session, not just once. 
So Claude kept executing tasks while literally forgetting why it made the decisions it did.</p><p>The cascading effects were brutal:</p><ul><li><p>Every request became a cache miss</p></li><li><p>Usage limits drained faster than expected</p></li><li><p>Claude appeared &#8220;forgetful and repetitive&#8221;</p></li><li><p>Sessions felt like they were constantly resetting</p></li></ul><h2>3. A system prompt change capped responses at 25 words between tool calls (April 16)</h2><p>They added this seemingly innocent instruction: &#8220;keep text between tool calls to 25 words. Keep final responses to 100 words.&#8221;</p><p>It caused a measurable 3% drop in coding quality across both Opus 4.6 and 4.7. They caught this through ablation testing&#8212;removing the instruction and measuring the performance difference.</p><p>Reverted April 20.</p><h2>The community evidence was damning</h2><p>While Anthropic was investigating internally, the developer community was building their own case. <a href="https://github.com/anthropics/claude-code/issues/42796">Stella Laurenzo from AMD&#8217;s AI group</a> published the most comprehensive analysis&#8212;6,852 Claude Code sessions and over 234,000 tool calls.</p><p>Her findings:</p><ul><li><p>Median visible thinking length collapsed 73% (2,200 &#8594; 600 characters)</p></li><li><p>API calls per task spiked up to 80x from February to March</p></li><li><p>Claude was choosing &#8220;simplest fix&#8221; over correct solutions</p></li></ul><p><a href="https://venturebeat.com/technology/mystery-solved-anthropic-reveals-changes-to-claudes-harnesses-and-operating-instructions-likely-caused-degradation">BridgeMind&#8217;s testing</a> showed Opus 4.6 accuracy dropping from 83.3% to 68.3%.</p><p>The data was undeniable.</p><h2>The perfect storm effect</h2><p>Here&#8217;s what made this particularly hard to pin down: all three bugs affected different traffic slices on different schedules. 
The combined effect looked like random, inconsistent degradation.</p><p>Hard to reproduce internally. Hard for users to isolate the exact cause. It just felt... wrong.</p><p>Some sessions hit the reasoning downgrade. Others hit the caching bug. The unlucky ones hit multiple issues simultaneously. No wonder it seemed like Claude was having random bad days.</p><h2>What this reveals about AI product development</h2><p>This postmortem is actually refreshing in its transparency. Most AI companies would have quietly fixed the issues and moved on. Anthropic owned the mistakes publicly.</p><p>But it also highlights a fundamental tension in AI product development: users often prefer maximum capability over convenience optimizations. The reasoning effort downgrade was done for user experience (reduce perceived latency), but developers would rather wait for better output.</p><p>The lesson: don&#8217;t optimize away what users value most without asking them first.</p><h2>All fixed now (v2.1.116)</h2><p>As of April 20, all three issues are resolved:</p><ul><li><p>Default reasoning is now &#8220;xhigh&#8221; for Opus 4.7, &#8220;high&#8221; for others</p></li><li><p>Caching bug squashed</p></li><li><p>Verbosity limits removed</p></li><li><p>Usage limits reset for all subscribers</p></li></ul><p>Anthropic is also committing to more transparency going forward with a <a href="https://x.com/claudedevs">dedicated </a><a href="https://github.com/ClaudeDevs">@ClaudeDevs</a> account for deeper technical communication with developers.</p><p>The community was right to raise hell about this. And Anthropic&#8217;s response&#8212;full transparency with concrete fixes&#8212;sets a good precedent for how AI companies should handle quality regressions.</p><p>Your coding assistant is back to full strength.</p><h2>Independent Validation</h2><p>The technical analysis backing this story comes from multiple independent sources. 
<a href="https://github.com/anthropics/claude-code/issues/42796">Stella Laurenzo&#8217;s comprehensive audit</a> of 6,852 sessions provided the quantitative foundation. <a href="https://venturebeat.com/technology/mystery-solved-anthropic-reveals-changes-to-claudes-harnesses-and-operating-instructions-likely-caused-degradation">BridgeMind&#8217;s testing</a> offered controlled benchmark data. These weren&#8217;t isolated complaints&#8212;they were systematic investigations with reproducible findings.</p><p>When a company publishes a detailed postmortem acknowledging specific engineering decisions that degraded their product, and that postmortem aligns with community-gathered evidence, we&#8217;re seeing transparency in action. The developer community did the work to document the problems. Anthropic owned the solutions.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Opus 4.6 vs 4.7: The Real Cost of Incremental AI Improvements]]></title><description><![CDATA[Opus 4.7 dropped last week.]]></description><link>https://hyperdev.matsuoka.com/p/opus-46-vs-47-the-real-cost-of-incremental</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/opus-46-vs-47-the-real-cost-of-incremental</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 22 Apr 2026 11:31:49 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!RRSE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RRSE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RRSE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 424w, https://substackcdn.com/image/fetch/$s_!RRSE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 848w, https://substackcdn.com/image/fetch/$s_!RRSE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 1272w, https://substackcdn.com/image/fetch/$s_!RRSE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RRSE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png" width="1024" height="631" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:631,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1088319,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/194980096?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed99c244-332a-4c81-baa4-bc05b22811bb_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RRSE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 424w, https://substackcdn.com/image/fetch/$s_!RRSE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 848w, https://substackcdn.com/image/fetch/$s_!RRSE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 1272w, https://substackcdn.com/image/fetch/$s_!RRSE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9ba8e-5560-47ed-89a3-105c294e2cba_1024x631.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Opus 4.7 dropped last week. Lots of excitement. Then the other shoe dropped. I ran identical coding tasks against both Opus 4.6 and 4.7 to see if the capability improvements justify the cost increase. Both models passed all 10 tests. The quality difference is real &#8212; but Opus 4.7 consumed 3.6&#215; more tokens and cost 3.6&#215; more for the same outcome.</p><p>That&#8217;s not a typo. Same task, same success rate, nearly 4&#215; the cost.  I have the receipts.</p><p>Anthropic says &#8220;pricing unchanged&#8221; because the per-token rates stayed the same. What they don&#8217;t mention is that Opus 4.7 systematically burns through more tokens to complete identical work. The model writes, then revises. Opus 4.6 writes correctly the first time. Both approaches work. 
Only one bills you for the revision process.</p><h2>TL;DR</h2><ul><li><p><strong>Both models passed 10/10 tests</strong>: Quality improvements are measurable but incremental (better typing, more thorough code)</p></li><li><p><strong>3.6&#215; cost increase for identical outcomes</strong>: $0.38 vs $1.38 for a 30-minute coding task in controlled testing</p></li><li><p><strong>Token consumption drives cost, not capability</strong>: Opus 4.7&#8217;s iterative working style consumes 2.9&#215; more output tokens per task</p></li><li><p><strong>Agentic mode required</strong>: One-shot testing shows 4.7 fails 9/10 tests without tool access, while 4.6 passes perfectly</p></li><li><p><strong>Per-token rates unchanged, real bill moved anyway</strong>: 4.7 burns 4.8&#215; more cache tokens per task &#8212; the rate card stays flat, your invoice doesn&#8217;t</p></li></ul><h2>The Controlled Test</h2><p>Last week I ran a head-to-head benchmark using the same Level 1 coding task for both models: build a complete Python CLI tool (Markdown Table Formatter) from scratch with full test coverage.</p><p><strong>Test setup:</strong></p><ul><li><p>Framework: <code>claude_agent_sdk</code> v0.1.64, full agentic mode</p></li><li><p>Models: <code>claude-opus-4-6</code> vs <code>claude-opus-4-7</code></p></li><li><p>Success criteria: Pass all 10 provided pytest tests</p></li><li><p>Execution: Concurrent runs with identical prompts</p></li></ul><p>Both models succeeded. 
The difference was entirely in how they got there.</p><h3>The Numbers</h3><table><thead><tr><th>Metric</th><th>Opus 4.6</th><th>Opus 4.7</th><th>Ratio</th></tr></thead><tbody><tr><td>Wall clock time</td><td>114.8s</td><td>259.1s</td><td>2.3&#215; slower</td></tr><tr><td>Agent turns</td><td>17</td><td>23</td><td>35% more</td></tr><tr><td>Output tokens</td><td>6,384</td><td>18,289</td><td><strong>2.9&#215; more</strong></td></tr><tr><td>Cache read tokens</td><td>215,853</td><td>1,034,165</td><td>4.8&#215; more</td></tr><tr><td><strong>Total cost</strong></td><td><strong>$0.38</strong></td><td><strong>$1.38</strong></td><td><strong>3.6&#215; more expensive</strong></td></tr></tbody></table><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t0xC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t0xC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 424w, https://substackcdn.com/image/fetch/$s_!t0xC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 848w, https://substackcdn.com/image/fetch/$s_!t0xC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 1272w, https://substackcdn.com/image/fetch/$s_!t0xC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t0xC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png" 
width="1024" height="461" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:461,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:408227,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/194980096?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8c51309-ac37-4b87-99c4-8b119ee71742_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t0xC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 424w, https://substackcdn.com/image/fetch/$s_!t0xC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 848w, https://substackcdn.com/image/fetch/$s_!t0xC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 1272w, https://substackcdn.com/image/fetch/$s_!t0xC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94f2c7bf-9b23-4a7a-bf78-c32f92ffb159_1024x461.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Behavioral Fingerprint</h3><p>The tool usage patterns reveal why 4.7 costs more:</p><table><thead><tr><th>Tool</th><th>Opus 4.6</th><th>Opus 4.7</th></tr></thead><tbody><tr><td>Write</td><td>6</td><td>7</td></tr><tr><td>Bash</td><td>6</td><td>9</td></tr><tr><td>Read</td><td>4</td><td>1</td></tr><tr><td><strong>Edit</strong></td><td><strong>0</strong></td><td><strong>5</strong></td></tr></tbody></table><p>Opus 4.7 made 5 Edit calls to revise files after writing them. Opus 4.6 made zero &#8212; it wrote all 6 source files correctly in a single pass, ran pytest once, passed 10/10 tests.</p><p>The cache token burn (4.8&#215; more) suggests 4.7 does extended internal reasoning between each tool call. It&#8217;s thinking harder, which shows up in better code quality &#8212; more type hints (35 vs 25 function definitions), more thorough coverage (820 vs 471 lines of code). But you pay for that thinking process.</p><h3>Quality vs Cost Trade-off</h3><p>The output quality difference is genuine. 
Opus 4.7&#8217;s code was more defensively written &#8212; better typed, more thorough on edge cases. When you&#8217;re in the middle of debugging a genuinely hairy distributed system problem or making an architectural call with real downstream implications, that extra care is worth something.</p><p>But for this Level 1 coding challenge, both approaches delivered identical functionality. The question becomes: is 40% better typing and 74% more comprehensive coverage worth 260% higher costs?</p><h2>When the Premium Isn&#8217;t Optional</h2><p>I tested both models in one-shot mode (no tools, single response) to see if you could avoid the iterative cost overhead.</p><table><thead><tr><th>Metric</th><th>Opus 4.6</th><th>Opus 4.7</th></tr></thead><tbody><tr><td>Output tokens</td><td>4,725</td><td>23,907</td></tr><tr><td>Cost</td><td>$0.27</td><td>$0.94</td></tr><tr><td>Tests passed</td><td><strong>10/10</strong></td><td><strong>1/10</strong></td></tr></tbody></table><p>Opus 4.7 failed catastrophically without tool access. It generated 5&#215; more tokens but couldn&#8217;t follow output format instructions &#8212; most files were unparseable and 9 of 10 tests failed. Opus 4.6 passed perfectly on the first attempt.</p><p>This reveals a structural dependency: Opus 4.7&#8217;s quality advantages require the full agentic feedback loop. You can&#8217;t switch to a cheaper execution mode to control costs. The iterative self-correction that makes it better is also what makes it expensive &#8212; there&#8217;s no cheaper version of how this model works.</p><h2>Scale It Out</h2><p>Scale these numbers to something realistic:</p><p><strong>100 equivalent coding tasks per day:</strong></p><ul><li><p>Opus 4.6: ~$38/day &#8594; ~$13,870/year</p></li><li><p>Opus 4.7: ~$138/day &#8594; ~$50,370/year</p></li><li><p><strong>Annual cost increase: +$36,500</strong></p></li></ul><p>This matches what production teams are reporting. The <a href="https://www.finout.io/blog/claude-opus-4.7-pricing-the-real-cost-story-behind-the-unchanged-price-tag">Finout analysis</a> documented overnight cost jumps from $500 to $675/day after deploying 4.7. 
My testing provides a mechanistic explanation: the model&#8217;s working style is token-intensive by design.</p><p>The cost increase compounds with Anthropic&#8217;s separate tokenizer changes that <a href="https://byteiota.com/claude-opus-4-7-tokenizer-35-cost-inflation-hits-api-users/">increase consumption up to 35%</a> for identical prompts. You get hit twice: the model does more work per task, and the same text now counts as more tokens at the unchanged per-token rate.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fa_P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fa_P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 424w, https://substackcdn.com/image/fetch/$s_!Fa_P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 848w, https://substackcdn.com/image/fetch/$s_!Fa_P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 1272w, https://substackcdn.com/image/fetch/$s_!Fa_P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Fa_P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png" width="900" height="664" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1101576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/194980096?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff56b5641-d94f-4ea0-bbd8-d6c9a8f2643a_900x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fa_P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 424w, https://substackcdn.com/image/fetch/$s_!Fa_P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 848w, https://substackcdn.com/image/fetch/$s_!Fa_P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 1272w, https://substackcdn.com/image/fetch/$s_!Fa_P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4b637c7-4d0e-444b-87c0-ec3089be81af_900x664.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><h2>This Is Anthropic&#8217;s Move, Not the Industry&#8217;s</h2><p>Other providers aren&#8217;t doing this. <a href="https://www.swebench.com/">GPT-5.4 achieves comparable benchmark performance</a> without the tokenizer change. Anthropic can pull this off because they&#8217;re ahead on benchmarks right now &#8212; that&#8217;s the advantage, and they&#8217;re using it.</p><p>Which means this is actually a model selection problem, not a budget problem.</p><p>I&#8217;m not upgrading my agentic workflows to 4.7 by default. 
For complex architectural work where the reasoning depth matters &#8212; distributed systems debugging, refactoring decisions with downstream implications &#8212; yes, 4.7 earns the premium. For routine code generation, test writing, documentation? 4.6 passes the same tests at a quarter of the cost, as I just demonstrated.</p><p>Sonnet is even more aggressive on cost for work that doesn&#8217;t need Opus-level reasoning at all. I&#8217;ve been pushing more of my day-to-day agentic tasks there.</p><p>GPT-5.4 is worth keeping in rotation too. Comparable coding benchmark performance, no tokenizer games, and the competitive pressure helps if you ever need to push back on Anthropic pricing.</p><p>The <a href="https://www.reddit.com/r/MachineLearning/">Reddit community</a> caught the tokenizer changes within hours of release while Anthropic&#8217;s communications stayed focused on &#8220;unchanged pricing.&#8221; That&#8217;s the early warning system. Watch community cost reports when a new model drops, not the vendor announcement.</p><p>Anthropic will keep doing this as long as they&#8217;re leading. 
The way you stay ahead of it is knowing your actual token consumption per task &#8212; not the rate card, the real burn &#8212; and routing work to the cheapest model that gets the job done.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[It’s The Harness, Stupid!]]></title><description><![CDATA[Why AI tool orchestration now matters more than foundation model quality]]></description><link>https://hyperdev.matsuoka.com/p/its-the-harness-stupid</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/its-the-harness-stupid</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 13 Apr 2026 17:17:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!376u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!376u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!376u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 424w, https://substackcdn.com/image/fetch/$s_!376u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 848w, https://substackcdn.com/image/fetch/$s_!376u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 1272w, https://substackcdn.com/image/fetch/$s_!376u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!376u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png" width="1024" height="523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:523,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1201765,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193459844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af78c41-cc16-45d4-abb7-f0e0ee92722b_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!376u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 424w, https://substackcdn.com/image/fetch/$s_!376u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 848w, https://substackcdn.com/image/fetch/$s_!376u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 1272w, https://substackcdn.com/image/fetch/$s_!376u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c1f4dab-a192-4b50-bbbb-5c889e29af13_1024x523.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>It&#8217;s The Harness, Stupid!</h2><p><strong>Why AI tool orchestration now matters more than foundation model quality</strong></p><p><em>Author: Bob Matsuoka, CTO @ Duetto Research</em> <br><em>April 6, 2026</em></p><h2>TL;DR</h2><ul><li><p>Same-model testing reveals a 0.82-point quality spread (3.93 to 4.75) and 7x efficiency differences: orchestration dominates outcomes</p></li><li><p>Market validation: Claude maintains 70% developer preference through superior harness quality, even though GPT-5.4 has reached model parity</p></li><li><p>Reddit analysis confirms Codex efficiency gains come from orchestration improvements, not just model upgrades</p></li><li><p>Competitive advantage has shifted permanently from model superiority to ecosystem superiority</p></li></ul><p><strong>Bottom line: The harness era has begun. Choose tools based on workflow fit, not benchmark claims.</strong></p><h2>The $50B Model Myth</h2><p>The AI industry has a fixation problem. Every week brings breathless announcements about parameter counts, training costs, and benchmark scores. &#8220;GPT-6 has 50 trillion parameters!&#8221; &#8220;Our model scored 94.7% on SWE-bench!&#8221; &#8220;We spent $2 billion on compute!&#8221;</p><p>Three converging pieces of evidence prove this approach is fundamentally wrong.</p><div class="callout-block" data-callout="true"><p><strong>Evidence #1:</strong> I tested eight AI coding agents across five programming challenges. Four agents used identical Claude Sonnet 4.6 models. 
Quality scores ranged from 3.93 to 4.75&#8212;a 0.82-point spread on the same foundation model.</p></div><div class="callout-block" data-callout="true"><p><strong>Evidence #2:</strong> GPT-5.4 achieved parity with Claude Sonnet 4.6 on coding benchmarks. Yet Claude maintains 70% developer preference through superior ecosystem quality.</p></div><div class="callout-block" data-callout="true"><p><strong>Evidence #3:</strong> Reddit developer communities confirm Codex&#8217;s efficiency improvements come from orchestration architecture changes, not just model upgrades.</p></div><p><strong>The harness matters more than the model.</strong> Choosing an AI coding tool is now primarily an engineering decision, not a model selection decision. The next competitive advantage isn&#8217;t bigger models&#8212;it&#8217;s better orchestration.</p><h2>Evidence Pillar #1: The Smoking Gun Laboratory Data</h2><h3>The Bake-Off Setup</h3><p>I designed five programming challenges ranging from 30-minute tasks to 8-hour full-stack builds:</p><ul><li><p><strong>Level 1-2:</strong> Simple scripts and basic applications</p></li><li><p><strong>Level 3:</strong> API integration with Docker containerization</p></li><li><p><strong>Level 4:</strong> Extensible data processing pipeline (architecture test)</p></li><li><p><strong>Level 5:</strong> Full-stack web application with authentication</p></li></ul><p>Eight agents competed: Claude Code, Claude MPM, Codex, Gemini CLI, Auggie, Qwen+Aider, DeepSeek+Aider, and Warp AI. Each received identical prompts. 
A panel of expert developers blind-reviewed all submissions across eight criteria: functionality, correctness, best practices, architecture, code reuse, testing, error handling, and documentation.</p><h3>The Harness Advantage Data</h3><p><strong>Table 1: Same Model, Different Worlds</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TErH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TErH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 424w, https://substackcdn.com/image/fetch/$s_!TErH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 848w, https://substackcdn.com/image/fetch/$s_!TErH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 1272w, https://substackcdn.com/image/fetch/$s_!TErH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TErH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png" width="867" height="409" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47d85d71-6503-4452-ad8c-957585385133_867x409.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:409,&quot;width&quot;:867,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74882,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193459844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TErH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 424w, https://substackcdn.com/image/fetch/$s_!TErH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 848w, https://substackcdn.com/image/fetch/$s_!TErH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 1272w, https://substackcdn.com/image/fetch/$s_!TErH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d85d71-6503-4452-ad8c-957585385133_867x409.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Four agents ran on the same Claude Sonnet 4.6 model, yet their quality scores ranged from 3.93 to 4.75, a 0.82-point spread. <a href="https://github.com/bobmatnyc/claude-mpm">claude-mpm</a> finished in 45 minutes while warp took 313 minutes: almost <strong>7x longer for lower-quality results</strong>.</p><h3>The Scaling Pattern</h3><p>The harness advantage compounds with complexity:</p><ul><li><p><strong>Levels 1-2:</strong> All agents performed similarly. Simple tasks don&#8217;t reveal orchestration differences.</p></li><li><p><strong>Level 3:</strong> API integration and Docker setup separated agents that plan from those that code-and-fix. Clear gaps emerged.</p></li><li><p><strong>Levels 4-5:</strong> Architecture and full-stack challenges broke most agents. 
Only well-orchestrated systems completed the complex workflows.</p></li></ul><p>The pattern is clear: as complexity increases, harness quality becomes the primary determinant of success.</p><h2>Evidence Pillar #2: Market Validation &#8212; GPT-5.4 Caught Up</h2><h3>Model Parity Achievement</h3><p>February-April 2026 benchmarks confirm <strong>GPT-5.4 has achieved parity with Claude Sonnet 4.6</strong>:</p><p><strong>Core Benchmarks:</strong></p><ul><li><p><strong>SWE-bench Verified</strong>: GPT-5.4 ~80% vs Claude 79.6% (statistical tie)</p></li><li><p><strong>SWE-bench Pro</strong>: GPT-5.4 57.7% vs Claude 43.6% (GPT leads on complex problems)</p></li><li><p><strong>Terminal-Bench</strong>: GPT-5.4 75.1% vs Claude ~65% (DevOps advantage)</p></li><li><p><strong>Context handling</strong>: Both models feature 1M-token context windows</p></li></ul><h3>Yet Claude Still Dominates Through Harness Advantages</h3><p>Although GPT-5.4 has reached model parity, the competitive landscape tells the harness story:</p><p><strong>Market Reality:</strong></p><ul><li><p><strong>Developer preference</strong>: Claude 70% (superior workflow integration)</p></li><li><p><strong>Enterprise share</strong>: Anthropic +4.9% MoM, OpenAI -1.5% MoM</p></li><li><p><strong>Revenue</strong>: Claude Code $2B ARR in 6 months</p></li></ul><p><strong>Even when models reach parity, harness quality determines adoption.</strong></p><h3>The Multi-Model Strategic Reality</h3><p>Leading organizations aren&#8217;t choosing between models anymore&#8212;they&#8217;re deploying <strong>three-tier strategic architectures</strong> based on cost-performance optimization:</p><p><strong>Tier 1: Daily Workhorse (60-70% of requests)</strong></p><ul><li><p><strong>Claude Sonnet 4.6</strong>: <a href="https://medium.com/@mkteam/gpt-5-4-vs-claude-sonnet-4-6-2026-the-ultimate-ai-model-comparison-49526cac8b14">$3/$15 per million tokens</a></p></li><li><p>High-volume development, routine coding tasks</p></li><li><p><a 
href="https://www.nxcode.io/resources/news/claude-sonnet-4-6-vs-gpt-5-4-coding-comparison-2026">95%+ of premium model quality at half the cost</a></p></li><li><p>Default choice for most enterprise development work</p></li></ul><p><strong>Tier 2: Specialized Operations (20-30% of requests)</strong></p><ul><li><p><strong>GPT-5.4</strong>: <a href="https://medium.com/@mkteam/gpt-5-4-vs-claude-sonnet-4-6-2026-the-ultimate-ai-model-comparison-49526cac8b14">$2.50/$15 per million tokens</a></p></li><li><p>Terminal operations, DevOps workflows, CI/CD debugging</p></li><li><p><a href="https://www.morphllm.com/best-ai-model-for-coding">75.1% Terminal-Bench score (10-point lead over competitors)</a></p></li><li><p><a href="https://medium.com/@ricardomsgarces/openai-codex-vs-github-copilot-why-codex-is-winning-the-future-of-coding-f9a2767695b0">Inherited Codex&#8217;s terminal operation dominance</a></p></li></ul><p><strong>Tier 3: Premium Analysis (10-20% of requests)</strong></p><ul><li><p><strong>Claude Opus 4.6</strong>: <a href="https://medium.com/@mkteam/gpt-5-4-vs-claude-sonnet-4-6-2026-the-ultimate-ai-model-comparison-49526cac8b14">$5/$25 per million tokens</a></p></li><li><p>Complex reasoning, architectural decisions, high-stakes analysis</p></li><li><p><a href="https://help.apiyi.com/en/gpt-5-4-vs-claude-opus-4-6-comparison-2026-en.html">World leader in abstract reasoning (87.4% vs GPT-5.4&#8217;s 83.9%)</a></p></li><li><p>When cost justifies maximum capability</p></li></ul><p>This confirms the core thesis: when models are &#8220;good enough,&#8221; teams optimize for <strong>strategic cost-performance fit</strong>, not raw capability or marketing claims.</p><h2>Evidence Pillar #3: Community Validation &#8212; The Codex Orchestration Story</h2><h3>Reddit Confirms Orchestration Improvements</h3><p>Reddit research explains Codex&#8217;s impressive efficiency results (42 minutes, 4.49 quality score). 
The evidence confirms improvements come from orchestration, not just model upgrades.</p><p><strong>Architectural Evolution Evidence:</strong></p><ul><li><p><a href="https://medium.com/@aliazimidarmian/openai-codex-from-2021-code-model-to-a-2025-autonomous-coding-agent-85ef0c48730a">Codex evolved from &#8220;embedded assistant&#8221; &#8594; &#8220;independent agent with multi-agent orchestration&#8221;</a></p></li><li><p><a href="https://www.digitalapplied.com/blog/gpt-5-2-codex-openai-model-guide-2026">GPT-5.2-Codex (Jan 2026) with 192K context + MCP tool orchestration</a></p></li><li><p><a href="https://developers.openai.com/blog/openai-for-developers-2025">&#8220;Command center for agents&#8221; interface launched Feb 2026</a></p></li></ul><p><strong>Workflow Efficiency Improvements:</strong></p><ul><li><p><a href="https://reelmind.ai/blog/openai-codex-code-generation-features-reddit-developer-insights">Developers report queuing &#8220;4-5 Codex tasks before diving into manual work&#8221;</a></p></li><li><p><a href="https://reelmind.ai/blog/openai-codex-code-generation-features-reddit-developer-insights">&#8220;2-3 completed PRs waiting for review&#8221; after a coffee break</a></p></li><li><p><a href="https://www.nxcode.io/resources/news/openai-codex-app-review-2026">P99 response time 45ms vs Copilot&#8217;s 55ms through better context management</a></p></li><li><p><strong>Parallel processing capabilities</strong> that enable true background orchestration</p></li></ul><p><strong>Enterprise Orchestration Benefits:</strong></p><ul><li><p><strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/">70% more pull requests</a></strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/"> merged weekly at OpenAI</a></p></li><li><p><strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/">50% reduction</a></strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/"> in code review 
times at Cisco</a></p></li><li><p><strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/">67% reduction</a></strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/"> in median turnaround time at Duolingo</a></p></li><li><p><strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/">90% Fortune 100 adoption</a></strong><a href="https://www.quantumrun.com/consulting/openai-codex-statistics/"> validates orchestration value at scale</a></p></li></ul><h3>The Community Strategic Deployment Pattern</h3><p>Reddit developers now recommend <strong>different tools for different purposes</strong>:</p><ul><li><p><strong>Claude Code</strong>: Code quality and reasoning</p></li><li><p><strong>Cursor</strong>: Daily coding integration</p></li><li><p><strong>OpenAI Codex</strong>: Complex multi-agent workflows and long-horizon autonomy</p></li></ul><p>This matches exactly what the market data predicted: teams use orchestrated tools strategically rather than seeking one universal solution.</p><h2>The Harness Quality Ladder</h2><p>Based on all three evidence pillars, I see four tiers of orchestration quality emerging:</p><p><strong>Tier 1: Basic Wrappers</strong></p><ul><li><p>Simple API access, minimal context management</p></li><li><p>Examples: Raw ChatGPT interface, basic API wrappers</p></li><li><p>Limitation: No file coordination, poor context retention</p></li></ul><p><strong>Tier 2: Workflow Tools</strong></p><ul><li><p>File awareness, some context management</p></li><li><p>Examples: GitHub Copilot, basic IDE extensions</p></li><li><p>Capability: Single-file optimization, limited cross-file understanding</p></li></ul><p><strong>Tier 3: Orchestrated Systems</strong></p><ul><li><p>Multi-file coordination, workflow integration</p></li><li><p>Examples: Cursor, Claude Code, well-configured aider</p></li><li><p>Advantage: Understands project structure, handles complex tasks</p></li></ul><p><strong>Tier 4: Agentic 
Frameworks</strong></p><ul><li><p>Multi-agent coordination, planning, verification</p></li><li><p>Examples: claude-mpm, advanced orchestration systems</p></li><li><p>Power: Full project lifecycle, quality assurance, architectural thinking</p></li></ul><p>The performance cliff between tiers is exponential, not linear. Bad orchestration can make great models perform poorly; great orchestration can make good models perform excellently.</p><h2>Academic and Industry Validation</h2><p>This isn&#8217;t just empirical observation. Multiple recent research papers and industry studies support the harness thesis:</p><p><strong>Academic Consensus:</strong><br>The arXiv paper <a href="https://arxiv.org/html/2511.14136v1">&#8220;Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems&#8221;</a> shows that domain-tuned models with better orchestration achieve superior cost-normalized accuracy despite using smaller base models.</p><p><a href="https://pricepertoken.com/leaderboards/benchmark/humaneval">SWE-bench data</a> reveals the same pattern. Cursor, Claude Code, and Auggie all use similar base models yet score between 50.2% and 55.4%, while the raw model score is only 45.9%. That improvement of 4.3 to 9.5 points over the raw model comes entirely from better context retrieval and agent design.</p><p><strong>Business Reality Check:</strong><br><a href="https://claude5.com/news/enterprise-ai-adoption-2026-how-businesses-deploy-claude-gpt">Enterprise adoption surveys</a> show a clear shift in CTO priorities. &#8220;Model performance&#8221; is dropping down the list of tool evaluation criteria, replaced by governance, integration quality, and workflow fit. 
As one 2026 McKinsey report put it: &#8220;CTOs are realizing their biggest bottleneck isn&#8217;t model performance&#8212;it&#8217;s governance.&#8221;</p><h2>What This Means for Engineering Leaders</h2><h3>Stop Optimizing for Benchmarks</h3><p>The old procurement mindset was model-first: &#8220;We need access to GPT-6 for competitive advantage.&#8221; The new reality is that benchmark performance doesn&#8217;t predict practical utility. SWE-bench scores don&#8217;t tell you whether a tool will integrate with your existing workflow, handle your codebase size, or recover gracefully from errors.</p><p>Start evaluating harness quality:</p><ul><li><p><strong>Context management:</strong> How well does it understand your project structure?</p></li><li><p><strong>File coordination:</strong> Can it work intelligently across multiple files?</p></li><li><p><strong>Error recovery:</strong> Does it handle failures gracefully or require constant babysitting?</p></li><li><p><strong>Workflow integration:</strong> How does it fit with your team&#8217;s existing development process?</p></li></ul><h3>Budget for Orchestration Quality</h3><p>The three evidence pillars show that investing in better orchestration yields measurable returns:</p><ul><li><p><strong>Quality per minute:</strong> claude-mpm achieved 4.75 quality in 45 minutes; warp achieved 3.93 in 313 minutes</p></li><li><p><strong>Market validation:</strong> Claude maintains dominance through superior developer experience, despite model parity</p></li><li><p><strong>Enterprise results:</strong> 70% more PRs, 50% faster code review, 67% faster turnaround</p></li></ul><p>The ROI case for harness investment is clear and quantifiable.</p><h3>Team Productivity Focus</h3><p>Tool choice impacts your entire development pipeline. 
The nearly 7x speed difference between well-orchestrated and poorly orchestrated tools using the same model means tool selection is a productivity multiplier, not just a capability decision.</p><p>Better tools also reduce onboarding time and increase adoption rates. A tool that works reliably gets used; one that requires constant troubleshooting gets abandoned.</p><h2>The Competitive Landscape Evolution</h2><h3>Codex Deserves Recognition</h3><p>Codex&#8217;s performance has improved significantly. At 42 minutes for all five levels with a 4.49 quality score, it achieved by far the best efficiency in my study. GPT-5.4+ combined with the orchestration improvements OpenAI made represents a compelling package. The Reddit research confirms this wasn&#8217;t just a model upgrade&#8212;it was an architectural evolution toward multi-agent orchestration.</p><h3>Claude Code&#8217;s Harness Moat</h3><p>While Claude Code performed well (4.53 quality score), the market validation shows its true strength: <strong>ecosystem superiority</strong>. Despite GPT-5.4 achieving model parity, Claude maintains 70% developer preference through superior harness quality. This is exactly what sustainable competitive advantage looks like in the post-parity era.</p><h3>The Multi-Model Future</h3><p>All evidence points to the same conclusion: the era of picking one model is over. 
Leading organizations deploy <strong>three-tier cost-performance architectures</strong>, optimizing for specific strengths rather than seeking universal solutions.</p><p>Real enterprise case studies validate this pattern:</p><ul><li><p><strong><a href="https://www.datastudios.org/post/claude-in-the-enterprise-case-studies-of-ai-deployments-and-real-world-results">TELUS (57,000 employees)</a></strong><a href="https://www.datastudios.org/post/claude-in-the-enterprise-case-studies-of-ai-deployments-and-real-world-results">: Uses Sonnet as core engine across developer teams</a></p></li><li><p><strong><a href="https://www.datastudios.org/post/claude-in-the-enterprise-case-studies-of-ai-deployments-and-real-world-results">Zapier</a></strong><a href="https://www.datastudios.org/post/claude-in-the-enterprise-case-studies-of-ai-deployments-and-real-world-results">: 800+ internal agents using strategic model selection</a></p></li><li><p><strong><a href="https://devtk.ai/en/blog/claude-api-pricing-guide-2026/">Financial Services</a></strong><a href="https://devtk.ai/en/blog/claude-api-pricing-guide-2026/">: Monthly costs ~$80 at massive scale through optimized routing</a></p></li></ul><p>The successful pattern: <strong>Sonnet for volume, GPT-5.4 for DevOps, Opus for complexity</strong>.</p><h2>The Token Economics Reality</h2><p>claude-mpm achieved the highest quality score (4.75) but used 87 million tokens versus Codex&#8217;s 120K. This looks expensive until you consider the output: 262 comprehensive tests (vs Codex&#8217;s 32), complete documentation, 100% verification rates, and multi-file coordination. (Note: this was also a wake-up call for me to focus on token optimization; the current version is much stingier.)</p><p>The roughly 700x token multiplier (87 million versus 120K) isn&#8217;t overhead; it&#8217;s the cost of work a solo agent skips. 
<strong>Orchestration doesn&#8217;t waste tokens&#8212;it spends them on comprehensive deliverables.</strong></p><p>The optimization question: Could you achieve 80% of the quality benefits at 30% of the token cost? The opportunity isn&#8217;t eliminating orchestration&#8212;it&#8217;s finding the minimal viable team size for maximum impact.</p><h3>The Vendor Bias Problem: &#8220;Opus for Everything&#8221;</h3><p>Boris Cherny, the Claude Code lead, recently advocated for using &#8220;Opus for everything.&#8221; This perfectly illustrates the disconnect between vendor recommendations and practical deployment reality.</p><p><strong>Only someone working for Anthropic can say that.</strong></p><p>When your employer provides unlimited access to premium models, of course you&#8217;d recommend the most expensive option for every task. But real organizations operating with P&amp;L responsibility make strategic decisions about when premium capability justifies premium cost.</p><p>This vendor bias actually <strong>validates the multi-model thesis</strong>:</p><ul><li><p><strong>Vendors say:</strong> &#8220;Use our premium model for everything&#8221;</p></li><li><p><strong>Users do:</strong> Strategic model selection based on task complexity and budget constraints</p></li><li><p><strong>Market reality:</strong> 70% prefer Claude for daily coding (cost/speed), GPT-5.4 for complex reasoning (quality ceiling)</p></li></ul><p>Cherny&#8217;s comment inadvertently proves that <strong>cost-conscious orchestration</strong> is the real competitive battleground. Companies that figure out optimal model routing&#8212;not maximal model usage&#8212;will have sustainable advantages.</p><p>The vendors push premium. The market chooses strategically. 
<strong>The harness makes both possible.</strong></p><h2>The Future: Welcome to the Harness Era</h2><h3>What Changes for Developers</h3><p>Tool selection framework:</p><ol><li><p><strong>Workflow fit:</strong> Does it match how your team works?</p></li><li><p><strong>Integration quality:</strong> Plays well with existing tools?</p></li><li><p><strong>Reliability:</strong> Can you trust it with production code?</p></li><li><p><strong>Model quality:</strong> Fourth priority</p></li></ol><h3>What Changes for the Industry</h3><p>Foundation models are becoming commodities. Differentiation shifts to integration, context management, and user experience. The next unicorns will be harness companies, not model companies.</p><p>Major funding flows to orchestration companies. Enterprise procurement evaluates integration first, model second.</p><h3>The Competitive Moat Shift</h3><p>The old game was: train bigger models, claim benchmark superiority. The new game is: build better orchestration, solve real workflow problems. 
Model access becomes a utility; workflow mastery becomes the moat.</p><h2>Practical Recommendations</h2><h3>For CTOs and Engineering Leaders</h3><ul><li><p><strong>Audit orchestration quality</strong>: Test tools with your actual codebase for 2-week trials</p></li><li><p><strong>Budget 60/40</strong>: Spend more on harness development than model subscription fees</p></li><li><p><strong>Measure real metrics</strong>: Track pull request velocity and code review time, not benchmark scores</p></li><li><p><strong>Evaluate integration first</strong>: How well does it fit your existing CI/CD pipeline?</p></li></ul><h3>For Developers</h3><ul><li><p><strong>Test with real projects</strong>: Spend 2 days with each tool on actual work before deciding</p></li><li><p><strong>Learn orchestration patterns</strong>: Context management and file coordination matter more than prompts</p></li><li><p><strong>Invest in mastery</strong>: The 7x efficiency difference justifies significant learning time</p></li><li><p><strong>Ignore marketing claims</strong>: Model access means nothing without good orchestration</p></li></ul><h3>For the AI Industry</h3><ul><li><p><strong>Build for workflow integration</strong>: Solve real development pipeline problems</p></li><li><p><strong>Measure practical utility</strong>: Developer retention and task completion rates beat benchmarks</p></li><li><p><strong>Focus on context management</strong>: Multi-file coordination is the real competitive moat</p></li></ul><h2>Conclusion: The Questions That Matter Now</h2><p>The old question was: &#8220;What&#8217;s the best model?&#8221;</p><p>The new question is: &#8220;What&#8217;s the best harness for my team&#8217;s workflow?&#8221;</p><p>Three evidence sources prove we&#8217;ve crossed a threshold: foundation models are &#8220;good enough,&#8221; and orchestration quality now dominates outcomes. 
Laboratory testing, market validation, and community confirmation point to the same reality.</p><p>The foundation model is the engine. The harness is the car. The best engine in the world won&#8217;t get you anywhere without wheels.</p><p><strong>The harness era has begun. Drive accordingly.</strong></p><div><hr></div><p><em>Bob Matsuoka is CTO at <a href="https://www.duettocloud.com/">Duetto Research</a> and creator of <a href="https://github.com/bobmatnyc/claude-mpm">Claude MPM</a>, one of the agents evaluated in this study. All evaluation data and methodology are available at <a href="https://github.com/bobmatnyc/ai-coding-bake-off">github.com/bobmatnyc/ai-coding-bake-off</a> for reproducibility.</em></p><div><hr></div><h2>Appendix: Complete Results Data</h2><h3>Quality Scores by Criterion</h3><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dSX7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dSX7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 424w, https://substackcdn.com/image/fetch/$s_!dSX7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 848w, https://substackcdn.com/image/fetch/$s_!dSX7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dSX7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dSX7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png" width="981" height="368" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:368,&quot;width&quot;:981,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193459844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dSX7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 424w, https://substackcdn.com/image/fetch/$s_!dSX7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 848w, https://substackcdn.com/image/fetch/$s_!dSX7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dSX7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96cbd80c-434e-456b-ae62-1fb565e1ec0d_981x368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>GPT-5.4 vs Claude Sonnet 4.6 Market Data</h3><p><strong>SWE-bench Performance:</strong></p><ul><li><p>SWE-bench Verified: GPT-5.4 ~80% vs Claude 79.6% (statistical tie)</p></li><li><p>SWE-bench Pro: GPT-5.4 57.7% vs Claude 43.6% (GPT advantage on complex problems)</p></li><li><p>Terminal-Bench: GPT-5.4 75.1% vs Claude ~65% (GPT DevOps 
advantage)</p></li></ul><p><strong>Market Metrics:</strong></p><ul><li><p>Developer preference (daily coding): Claude 70%</p></li><li><p>Enterprise market share: Anthropic +4.9% MoM, OpenAI -1.5% MoM</p></li><li><p>Claude Code revenue: $2B ARR in 6 months</p></li></ul><h3>Methodology Notes</h3><ul><li><p><strong>Laboratory data:</strong> Single run evaluation with disclosed author bias</p></li><li><p><strong>Market data:</strong> Cross-validated across 15+ authoritative sources</p></li><li><p><strong>Community research:</strong> Reddit analysis across 8+ developer subreddits</p></li><li><p><strong>Statistical confidence:</strong> Mean inter-reviewer deviation of 0.216 points</p></li><li><p><strong>Reproducible:</strong> All data and prompts available in public repository</p></li></ul>]]></content:encoded></item><item><title><![CDATA[I Met a Movie Star Mila Jovovich — As a Coder]]></title><description><![CDATA[More evidence of the democratization of software]]></description><link>https://hyperdev.matsuoka.com/p/i-met-a-movie-star-mila-jovovich</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/i-met-a-movie-star-mila-jovovich</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Sat, 11 Apr 2026 12:31:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_YC7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_YC7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!_YC7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 424w, https://substackcdn.com/image/fetch/$s_!_YC7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 848w, https://substackcdn.com/image/fetch/$s_!_YC7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 1272w, https://substackcdn.com/image/fetch/$s_!_YC7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_YC7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png" width="1024" height="659" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:659,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1463075,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193848267?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F188090d8-280b-48b0-81af-3a11dec4dac3_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_YC7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 424w, https://substackcdn.com/image/fetch/$s_!_YC7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 848w, https://substackcdn.com/image/fetch/$s_!_YC7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 1272w, https://substackcdn.com/image/fetch/$s_!_YC7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d0e03e6-ff0f-4040-ab56-8239bf91a20d_1024x659.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I didn&#8217;t expect to meet Milla Jovovich through a GitHub issue.</p><p>But there I was last week, deep-diving into her AI memory framework called <a href="https://github.com/milla-jovovich/mempalace">MemPalace</a>, when I discovered something remarkable: the &#8220;Resident Evil&#8221; and &#8220;Fifth Element&#8221; star had created one of the most talked-about AI memory systems of 2026. And she&#8217;d done it using Claude Code, the same AI-assisted development environment I use daily.</p><p>More remarkably, when I found critical bugs in her benchmark methodology, she responded directly through her Claude Code workflow, acknowledging the issues and implementing fixes. Not through a PR team or engineering intermediaries &#8212; Milla herself, using AI-assisted development to debug complex memory retrieval algorithms at 9 AM on a Thursday.</p><p>This isn&#8217;t a story about a celebrity coding stunt. It&#8217;s about something much more profound: we&#8217;ve entered an era where outcomes and features drive development, not the technical limitations of writing code.</p><h2>The MemPalace Phenomenon</h2><p>In April 2026, Milla Jovovich and developer Ben Sigman released MemPalace, an open-source AI memory system that immediately went viral. Within 48 hours, it had <a href="https://github.com/milla-jovovich/mempalace">over 23,000 GitHub stars</a>. 
The project claimed the first perfect score on the LongMemEval benchmark and reported 96.6% raw recall.</p><p>The project represents something unprecedented: a free, locally running memory system that rivals expensive cloud alternatives like Mem0 ($19-249/month) and Zep ($25+/month). It uses the &#8220;memory palace&#8221; technique &#8212; a classical memory method dating back to ancient Greece &#8212; implemented through ChromaDB and SQLite, with zero ongoing API costs.</p><p>The technical architecture includes basic Claude Code integration (save hooks every 15 messages and before context compression) and 24 tools via the Model Context Protocol (MCP), making it compatible with multiple AI platforms.</p><p>The duo had spent months building it using Claude Code&#8217;s AI-assisted development environment. As Sigman noted, he provided &#8220;the engineering chops&#8221; while Jovovich drove the architectural vision.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://github.com/milla-jovovich" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!93ZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!93ZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 848w, https://substackcdn.com/image/fetch/$s_!93ZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!93ZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!93ZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg" width="459" height="460" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:459,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52413,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://github.com/milla-jovovich&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193848267?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!93ZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!93ZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!93ZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!93ZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d7e3197-469d-4e18-86f1-3033d8bd4a27_459x460.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>When Audits Meet AI-Generated Code</h2><p>That&#8217;s when things got interesting.</p><p>As someone who works extensively with AI 
memory systems &#8212; I maintain <a href="https://github.com/bobmatnyc/kuzu-memory">KuzuMemory</a>, a graph-based memory framework &#8212; I was naturally curious about MemPalace&#8217;s benchmark methodology. The claimed 96.6% recall rate was extraordinary, especially for a system running entirely locally.</p><p>So I dove in.</p><p>I found several methodological issues that fundamentally undermined the headline numbers. The benchmark adapter was discarding assistant turns in conversation history, causing systematic under-recall on certain question types. More critically, the benchmark wasn&#8217;t actually testing MemPalace&#8217;s core functionality &#8212; it was primarily testing ChromaDB&#8217;s raw vector search capabilities.</p><p>I filed <a href="https://github.com/milla-jovovich/mempalace/issues/242">Issue #242</a> documenting the assistant-turn bug, and <a href="https://github.com/milla-jovovich/mempalace/issues/214">Issue #214</a> showing that the 96.6% score was essentially a ChromaDB score, not a MemPalace score.</p><p>Milla&#8217;s response was immediate and technically sophisticated:</p><blockquote><p>&#8220;Hey <a href="https://github.com/bobmatnyc">@bobmatnyc</a> &#8212; I&#8217;ve taken a look and ran it through CLI. This is a real bug and it&#8217;s urgent. You caught that <code>benchmarks/longmemeval_bench.py</code> at lines 189-190 builds each session&#8217;s indexed document by concatenating <em>only</em> <code>user</code> role turns... <strong>Fix priority: this must land before any public benchmark re-run.</strong>&#8221;</p></blockquote><p>She didn&#8217;t deflect or dismiss. She debugged the issue herself, identified the exact lines of code causing the problem, explained the downstream impact on other benchmarks, and outlined a detailed fix plan including regression tests.</p><p>This wasn&#8217;t PR speak. 
This was an AI-assisted developer engaging seriously with technical criticism.</p><h2>The Democratization Shift</h2><p>This interaction crystallized something profound about our current moment in software development.</p><p>We&#8217;re witnessing the emergence of a new class of builders: technically minded individuals who understand software conceptually but may not have traditional coding backgrounds. AI-assisted development tools like Claude Code, GitHub Copilot, and Cursor have lowered the implementation barrier to the point where vision and domain expertise matter more than syntax mastery.</p><p>Milla Jovovich exemplifies this shift perfectly. Without formal technical education (she left school in seventh grade for modeling), she spent months intensively learning AI-assisted development through Claude Code starting in late 2025. She understood the conceptual framework of memory palaces deeply enough to architect a sophisticated system. Her collaboration with Ben Sigman &#8212; CEO of Bitcoin lending platform Libre Labs, who provided the engineering expertise while she drove the architectural vision &#8212; represents a new model of software development where domain knowledge and AI tool fluency can substitute for traditional programming backgrounds.</p><p>The fact that a movie star can release a technically competent, widely adopted memory framework isn&#8217;t a commentary on coding getting easier (though it has). 
It&#8217;s about software development becoming more accessible to domain experts and visionaries who previously couldn&#8217;t bridge the implementation gap.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f3FK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f3FK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!f3FK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!f3FK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!f3FK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f3FK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1971642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193848267?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f3FK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!f3FK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!f3FK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!f3FK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b861799-f605-4e03-bec5-f88ec1387a42_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>What MemPalace Gets Right</h2><p>Despite the benchmark issues I uncovered, MemPalace demonstrates genuine technical sophistication. The memory palace metaphor isn&#8217;t just marketing &#8212; it&#8217;s a thoughtful architectural choice that makes AI memory systems more intuitive and debuggable.</p><p>The system includes elegant features like per-agent memory &#8220;wings&#8221; that prevent cross-contamination between different AI assistants. The Claude Code integration hooks are well-designed, automatically triggering memory saves at logical conversation boundaries. The MCP implementation is clean and follows established patterns.</p><p>Most importantly, the project tackles a real problem: most AI memory systems are either expensive cloud services or complex local installations. 
MemPalace provides a middle path that&#8217;s both free and relatively easy to deploy.</p><p>Through my testing and integration experiments, I learned techniques that improved my own KuzuMemory system. The competitive analysis forced me to think more carefully about memory organization patterns and retrieval strategies. This kind of cross-pollination benefits the entire ecosystem.</p><h2>The Validation Requirement</h2><p>But the benchmark controversy highlights a crucial point: democratized software development still requires traditional validation methods.</p><p>AI-assisted coding tools excel at implementation but can perpetuate subtle conceptual errors throughout a codebase. The MemPalace benchmark issues weren&#8217;t obvious bugs &#8212; they were methodological problems that required domain expertise to identify.</p><p>This creates an interesting dynamic: AI tools enable rapid development by non-traditional developers, but peer review by experienced practitioners becomes even more critical. The community response to MemPalace&#8217;s inflated benchmarks wasn&#8217;t hostile &#8212; it was collaborative debugging at scale.</p><p>Milla&#8217;s willingness to engage directly with technical criticism and implement fixes demonstrates the right approach. The democratization of software development doesn&#8217;t eliminate the need for technical rigor; it distributes that rigor across a broader community.</p><h2>The Harness Thesis Validated</h2><p>This story perfectly validates what I call the &#8220;harness thesis&#8221; &#8212; that we&#8217;ve entered an era where AI tool ecosystems matter more than underlying model capabilities.</p><p>MemPalace succeeded not because Milla wrote perfect code from scratch, but because she effectively orchestrated Claude Code to implement her vision. 
The system&#8217;s value comes from its architectural choices, integration quality, and user experience &#8212; not from novel algorithmic breakthroughs.</p><p>Similarly, my ability to audit and improve the system came not from superior coding skills, but from having developed complementary expertise with memory systems and benchmark methodology. The collaboration that emerged &#8212; distributed across GitHub issues, with contributors from multiple backgrounds &#8212; represents the new model of software development.</p><p>We&#8217;re not just building different software; we&#8217;re building software differently.</p><h2>Meeting Milla Through Code</h2><p>In the end, I did meet Milla Jovovich &#8212; through our AI agents, lines of Python code, GitHub issues, and technical discussions about memory retrieval algorithms, mediated by our respective Claude Code workflows. Not the meeting I would have predicted, but somehow more meaningful than a typical celebrity encounter.</p><p>She embodies a new archetype: the technical visionary who uses AI tools to implement sophisticated ideas without a traditional programming background. Her willingness to engage with criticism and continuously improve the system demonstrates the collaborative spirit that makes this new era of development possible.</p><p>The future of software isn&#8217;t just about better AI models or more powerful tools. 
It&#8217;s about enabling more people with domain expertise and creative vision to participate in building the systems that shape our digital world.</p><p>And sometimes, that means meeting your childhood movie star idol in a GitHub issue thread, debugging memory palace algorithms together.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/its-the-harness-stupid">It&#8217;s The Harness Stupid</a> &#8212; Why AI tool ecosystems matter more than model capabilities</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul><p><strong>Referenced Links:</strong></p><ul><li><p><a href="https://github.com/milla-jovovich/mempalace">MemPalace GitHub Repository</a></p></li><li><p><a href="https://github.com/bobmatnyc/kuzu-memory">KuzuMemory GitHub Repository</a></p></li><li><p><a href="https://github.com/milla-jovovich/mempalace/issues/242">Issue #242: Benchmark adapter bug</a></p></li><li><p><a href="https://github.com/milla-jovovich/mempalace/issues/214">Issue #214: ChromaDB vs MemPalace scoring</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Software Factory is the Next Big Challenge]]></title><description><![CDATA[Many enterprises are rolling their own]]></description><link>https://hyperdev.matsuoka.com/p/the-software-factory-is-the-next</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/the-software-factory-is-the-next</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 08 Apr 2026 12:30:16 
GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tsf8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tsf8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tsf8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 424w, https://substackcdn.com/image/fetch/$s_!tsf8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 848w, https://substackcdn.com/image/fetch/$s_!tsf8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 1272w, https://substackcdn.com/image/fetch/$s_!tsf8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tsf8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png" width="1024" height="825" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1823642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193118243?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90d9b63c-9a04-4eb4-895e-c659c19b1b3b_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tsf8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 424w, https://substackcdn.com/image/fetch/$s_!tsf8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 848w, https://substackcdn.com/image/fetch/$s_!tsf8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 1272w, https://substackcdn.com/image/fetch/$s_!tsf8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58714619-d828-47fe-aa27-7777b26b3b11_1024x825.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">The Software Factory is the future of software development</figcaption></figure></div><p>Stripe engineers send Slack messages that automatically become production code. Not suggestions. Not drafts. Production code merged into their main branch, supporting over a trillion dollars in annual payment processing.</p><p>Their <a href="https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents">&#8220;Minions&#8221; system</a> generates <a href="https://www.infoq.com/news/2026/03/stripe-autonomous-coding-agents/">1,300 pull requests per week</a> with zero human-written code. Fire-and-forget automation from conversation to deployment. 
While the rest of us debate whether AI can write good code, Stripe has built a software factory that produces enterprise-grade applications at scale.</p><p>The software factory isn&#8217;t a future concept. It&#8217;s a present reality, and it represents the next fundamental challenge for engineering organizations.</p><h2>What We&#8217;re Building at Duetto</h2><p>I&#8217;ve been thinking about this a lot lately. At Duetto, we&#8217;re exploring what a software factory could look like for hospitality technology. Not because we want to eliminate developers, but because we&#8217;re hitting the limits of traditional development approaches for domain-specific applications.</p><p>Our challenge isn&#8217;t just writing code&#8212;it&#8217;s translating complex hotel revenue management requirements into software that works reliably across thousands of properties with different systems, data formats, and business rules. The cognitive load of keeping all these variations in mind while building features is becoming unsustainable.</p><p>What if we could describe what we need in something like our APEX specifications, and have the system generate not just code, but complete deployments? Kubernetes instances running Claude Code agents, database migrations, monitoring setup, the whole stack configured for that specific use case.</p><p>The goal isn&#8217;t replacing our engineering team. Our developers should be solving revenue optimization algorithms and building domain-specific integrations, not configuring YAML files for the hundredth deployment variation.</p><h2>The Stripe Blueprint</h2><p>Stripe&#8217;s Minions reveal what a production software factory actually looks like when you strip away the hype and focus on what works.</p><p><strong>Five-Layer Pipeline</strong>: Their system transforms Slack messages into production-ready pull requests through a structured pipeline. 
Not magic, but engineering discipline applied to automation.</p><p><strong>Sandboxed Execution</strong>: Each agent runs in an isolated container with its own checkout of the codebase. Agents can&#8217;t access production systems, can&#8217;t cause cascading failures, and can&#8217;t break things outside their designated scope. <a href="https://www.anup.io/stripes-coding-agents-the-walls-matter-more-than-the-model/">The walls matter more than the model</a>.</p><p><strong>Surgical Tool Selection</strong>: Their internal <a href="https://www.mindstudio.ai/blog/what-is-ai-agent-harness-stripe-minions">Model Context Protocol</a> (MCP) server exposes hundreds of internal tools, but each agent starts with a prefetched subset of only the ~15 tools relevant to its specific task. Not everything available; the right things available.</p><p><strong>One-Shot Optimization</strong>: Instead of conversational back-and-forth, their agents are <a href="https://www.sitepoint.com/stripe-minions-architecture-explained/">optimized for well-defined work</a> that completes in a single execution. Better latency, lower costs, more predictable outcomes.</p><p>The results speak for themselves: 1,300 PRs weekly, zero human-written code in merged changes, supporting their entire payment infrastructure. This isn&#8217;t a pilot program. 
This is their production development workflow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!18qG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!18qG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 424w, https://substackcdn.com/image/fetch/$s_!18qG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 848w, https://substackcdn.com/image/fetch/$s_!18qG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 1272w, https://substackcdn.com/image/fetch/$s_!18qG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!18qG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png" width="1024" height="674" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1235022,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193118243?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1885512e-da12-485c-9bc1-eaa9bd2512e6_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!18qG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 424w, https://substackcdn.com/image/fetch/$s_!18qG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 848w, https://substackcdn.com/image/fetch/$s_!18qG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 1272w, https://substackcdn.com/image/fetch/$s_!18qG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405e3b24-3997-4ecc-89e7-bcd78eb0c218_1024x674.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>The Broader Software Factory Landscape</h2><p>Stripe isn&#8217;t alone in building these systems; it is just the most public about its approach.</p><p>Netflix has a federated developer console that integrates dozens of tools into a single unified experience. <a href="https://engineering.atspotify.com/2024/4/supercharged-developer-portals">Spotify&#8217;s Backstage</a> holds 89% market share among internal developer platforms and is credited with reducing time-to-tenth-pull-request by 55% for new developers.</p><p>The open source ecosystem is catching up quickly. <a href="https://openhands.dev/">OpenHands</a>, backed by $18.8M in Series A funding, provides a model-agnostic platform for cloud coding agents. <a href="https://www.turing.com/blog/top-5-ai-code-generation-tools-in-2024">CodeT5</a> handles multi-language code generation. 
<a href="https://github.com/features/copilot/enterprise">GitHub Copilot Enterprise</a> is expanding beyond code completion into full workflow automation.</p><p>Major cloud providers are also building comprehensive platforms. Microsoft&#8217;s GitHub Copilot Workspace, Google&#8217;s Gemini Code Assist (formerly Duet AI for Developers), and Amazon&#8217;s Q Developer all represent enterprise-grade attempts at software factory capabilities.</p><p>Gartner projected that <a href="https://calmops.com/devops/internal-developer-platform-idp-2026-complete-guide/">80% of large software engineering organizations</a> would have dedicated platform teams by 2026. The question isn&#8217;t whether software factories are coming. It&#8217;s whether your organization will build one or buy one.</p><h2>What a Proper Software Factory Requires</h2><p>Building a software factory isn&#8217;t just about connecting AI tools to deployment pipelines. Based on what&#8217;s working at Stripe and emerging patterns across the industry, here are the essential components:</p><h3>Artifact Response Systems</h3><p>Your factory needs to respond to structured specifications and generate complete deployments. At Duetto, this might mean taking an APEX specification for a new revenue optimization feature and producing:</p><ul><li><p>Kubernetes deployment configurations</p></li><li><p>Database migration scripts</p></li><li><p>Monitoring and alerting setup</p></li><li><p>Load testing scenarios</p></li><li><p>Documentation</p></li></ul><p>The system should handle the entire deployment lifecycle from specification to running production service, not just generate code that someone has to manually deploy.</p><h3>Strategic Human Review Checkpoints</h3><p>Notice I said strategic, not comprehensive. 
Stripe&#8217;s fire-and-forget model works because they&#8217;ve identified the specific points where human judgment adds value without blocking automation.</p><p>For enterprise applications, you need checkpoints at:</p><ul><li><p><strong>Specification validation</strong>: Do the requirements make business sense?</p></li><li><p><strong>Security review</strong>: Are access patterns and data handling appropriate?</p></li><li><p><strong>Integration testing</strong>: Does this work with existing systems?</p></li><li><p><strong>Production readiness</strong>: Are monitoring and rollback capabilities sufficient?</p></li></ul><p>The key is making these gates fast and decisive, not bureaucratic approval processes that defeat the purpose of automation.</p><h3>Scaffolding for Error Detection</h3><p>Your factory will produce broken code. That&#8217;s not a bug&#8212;that&#8217;s reality. The difference between a prototype and a production system is sophisticated error detection and recovery.</p><p>This means:</p><ul><li><p><strong>Isolated execution environments</strong> where failures can&#8217;t cause broader damage</p></li><li><p><strong>Automated testing and iteration</strong> when initial attempts fail</p></li><li><p><strong>Multi-layer validation</strong> before anything reaches production</p></li><li><p><strong>Comprehensive rollback capabilities</strong> for when something gets through anyway</p></li></ul><p>Stripe&#8217;s sandbox architecture is brilliant because it lets agents fail safely while learning from those failures to improve future attempts.</p><h3>Success Criteria Parameters</h3><p>Your factory needs to know what success looks like for each type of work. 
Not just &#8220;the code compiles,&#8221; but measurable business outcomes.</p><p>For a hospitality feature, success might mean:</p><ul><li><p>Performance benchmarks met under load</p></li><li><p>Integration tests passing against five different property management systems (PMS)</p></li><li><p>Revenue impact measurable within 30 days</p></li><li><p>Zero customer-facing errors in the first week</p></li></ul><p>Define these criteria upfront, build them into your validation pipeline, and let the factory optimize for actual business value rather than technical metrics alone.</p><h3>Cost Tracking and Optimization</h3><p>AI-powered development isn&#8217;t free. You need visibility into the computational costs, tool usage, and human review time for each generated system.</p><p>Stripe optimizes for this explicitly: their one-shot agents cost less than conversational approaches, their surgical tool selection reduces context costs, and their automated testing prevents expensive human debugging cycles.</p><p>Track these metrics from day one. The difference between a cost-effective software factory and an expensive experiment is usually found in the operational details.</p><h3>Deployment Models</h3><p>Your factory needs a sophisticated understanding of how to deploy different types of applications. That means Golden Path workflows that codify best practices, environment promotion strategies that reduce risk, and rollback procedures that restore service quickly when things go wrong.</p><p>This is where domain expertise becomes critical. A generic software factory might know how to deploy a web service, but does it understand the specific requirements for hospitality payment processing, guest data privacy, and integration with property management systems?</p><h2>The Duetto Context</h2><p>At Duetto, we&#8217;re thinking about how a software factory could handle the complexity of hospitality technology. 
Our domain has unique challenges:</p><p><strong>Data Integration Complexity</strong>: Every hotel uses different systems with different data formats. A software factory needs to understand these variations and generate appropriate integration code.</p><p><strong>Regulatory Requirements</strong>: Guest privacy, payment processing, accessibility compliance. The factory needs to embed these requirements into everything it produces.</p><p><strong>Performance Characteristics</strong>: Revenue management systems need to process pricing updates in near real-time across thousands of rooms and rate plans. The factory needs to optimize for these specific performance patterns.</p><p><strong>Operational Constraints</strong>: Hotels can&#8217;t afford downtime during peak booking periods. Deployment strategies need to account for hospitality business cycles.</p><p>We&#8217;re not trying to build a general-purpose software factory. We&#8217;re exploring how to build one that deeply understands our domain and can produce applications that work reliably in hospitality environments.</p><h2>The Reality Check</h2><p>Building a software factory is hard. Not because the technology doesn&#8217;t exist&#8212;Stripe proves it does&#8212;but because the organizational challenges are substantial.</p><p><strong>ROI Demonstration</strong>: You need to show measurable productivity improvements and cost savings. &#8220;The AI is impressive&#8221; isn&#8217;t sufficient justification for the investment required.</p><p><strong>Security and Compliance</strong>: Automated code generation that touches customer data or payment systems requires additional security layers and audit capabilities.</p><p><strong>Developer Workflow Changes</strong>: Your engineering team needs to learn new ways of working. Some will embrace it, others will resist. 
Change management is as important as the technical implementation.</p><p><strong>Quality Assurance Evolution</strong>: Your QA processes need to evolve from testing human-written code to validating AI-generated systems. Different failure modes, different testing strategies.</p><p><strong>Integration Complexity</strong>: Your factory needs to work with existing systems, databases, APIs, and workflows. The harder the integration challenge, the longer the implementation timeline.</p><p>These aren&#8217;t reasons to avoid building a software factory. They&#8217;re reasons to approach the project with realistic expectations and proper preparation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bVFQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bVFQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 424w, https://substackcdn.com/image/fetch/$s_!bVFQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 848w, https://substackcdn.com/image/fetch/$s_!bVFQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 1272w, https://substackcdn.com/image/fetch/$s_!bVFQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bVFQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png" width="1024" height="678" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:678,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1446253,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193118243?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71a61e37-d209-4927-8ab4-44a35ed22bb9_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bVFQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 424w, https://substackcdn.com/image/fetch/$s_!bVFQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 848w, https://substackcdn.com/image/fetch/$s_!bVFQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 1272w, 
https://substackcdn.com/image/fetch/$s_!bVFQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5931edca-6b79-4588-bae5-ab6a883a7b66_1024x678.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Looking Forward</h2><p>The trajectory is clear. 
<a href="https://leanopstech.com/blog/platform-engineering-in-2025-the-future-of-developer-productivity/">Software factories are moving from experimental to mainstream</a>, with proven systems operating at enterprise scale and standardized architecture patterns emerging across the industry.</p><p>The question for engineering leaders isn&#8217;t whether this transformation will happen. It&#8217;s whether your organization will be an early adopter that shapes how software factories work in your domain, or a later adopter that implements patterns developed by others.</p><p>At Duetto, we&#8217;re betting on being early. Not because we want to be on the cutting edge for its own sake, but because the companies that figure out domain-specific software factories first will have a significant competitive advantage in application development speed and quality.</p><p>The software factory represents the next evolution of platform engineering. The organizations that master it will build better software faster than those that don&#8217;t.</p><p>The challenge isn&#8217;t technical anymore. It&#8217;s organizational, strategic, and operational.</p><p>The question is: Are you ready to build one?</p><div><hr></div><p><em>About this analysis: This piece draws from comprehensive research on production software factory implementations, including detailed analysis of Stripe&#8217;s Minions architecture, enterprise platform engineering initiatives, and emerging open source solutions. The author is exploring software factory applications for hospitality technology at Duetto.</em></p><p><em>About the author: Bob Matsuoka is Chief Technology Officer at Duetto and creator of Claude MPM (Multi-agent Project Manager). 
He has implemented AI-assisted development workflows across enterprise engineering teams and writes about the practical realities of AI integration in software development at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Is The Claude Code Team Moving Too Quickly?]]></title><description><![CDATA[What To Think of the Source Leak]]></description><link>https://hyperdev.matsuoka.com/p/is-the-claude-code-team-moving-too</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/is-the-claude-code-team-moving-too</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Mon, 06 Apr 2026 12:30:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xLBj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xLBj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xLBj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 424w, https://substackcdn.com/image/fetch/$s_!xLBj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 848w, 
https://substackcdn.com/image/fetch/$s_!xLBj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 1272w, https://substackcdn.com/image/fetch/$s_!xLBj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xLBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png" width="1024" height="595" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:595,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1151803,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193101232?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9c1f20-243f-4cf5-b1c1-b708e63ffae8_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xLBj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 424w, 
https://substackcdn.com/image/fetch/$s_!xLBj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 848w, https://substackcdn.com/image/fetch/$s_!xLBj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 1272w, https://substackcdn.com/image/fetch/$s_!xLBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd07636e2-b451-4839-853e-91fc8ca0b4b3_1024x595.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>On March 31, 2026, Anthropic accidentally shipped their entire Claude Code source, 512,000 lines of TypeScript, in an npm package. What followed was perhaps the most intense technical autopsy in AI history. The verdict? Mixed, and revealing.</p><p>The criticism has been swift and pointed. A 5,594-line file with a single 3,167-line function sporting 12 levels of nesting. Regex-based frustration detection looking for &#8220;wtf&#8221; and &#8220;shit&#8221;. A quarter million wasted API calls per day from a three-line bug. As one critic put it: &#8220;A multi-billion-dollar AI company is detecting user frustration with a regex.&#8221;</p><p>But before we pile on, we need to ask: <strong>What does &#8220;good code&#8221; even mean when you&#8217;re building client-side LLM applications?</strong></p><h2>The Unprecedented Challenge</h2><p>Claude Code isn&#8217;t your typical software. It&#8217;s a client-side application that orchestrates conversations with large language models, manages context across sessions, and attempts to maintain coherent state while working with fundamentally non-deterministic systems.</p><p>This creates problems that traditional software engineering practices weren&#8217;t designed for:</p><ul><li><p><strong>Context management</strong>: Handling arbitrarily long conversations that exceed model limits</p></li><li><p><strong>Failure recovery</strong>: Recovering when your core computation is an API call that fails 20% of the time</p></li><li><p><strong>State synchronization</strong>: Keeping UI, conversation history, and model context aligned</p></li><li><p><strong>Dynamic adaptation</strong>: Adapting code to changing model capabilities</p></li></ul><p>The leaked source reveals sophisticated solutions to these problems: a three-layer memory architecture, anti-distillation mechanisms, dual parser systems for safety. 
The engineering is <s>genuinely</s> impressive, even if the implementation is sometimes ugly.</p><h2>The Meta-Problem: AI Writing AI</h2><p>Claude Code was partially written by Claude Code. This appears to be the first widely documented case of a large-scale AI tool generating significant portions of its own source code. That is not just an incremental improvement but a categorical change in development methodology.</p><p>When AI generates code at scales that exceed human review capacity, traditional quality control breaks down. That 3,167-line function? Probably not written by a human. The 12 levels of nesting? Algorithmic patterns, not human design choices.</p><p><strong>This is the real story</strong>: We&#8217;re witnessing the first major autopsy of self-bootstrapping AI tooling.</p><h2>Deterministic vs. LLM Code: Different Standards Apply</h2><p>I&#8217;ve been thinking about this distinction a lot lately in my work with <a href="https://github.com/bobmatnyc/claude-mpm">Claude MPM</a>, an open-source multi-agent code generation framework built on Claude Code that coordinates specialized AI agents for software development workflows. When you&#8217;re building traditional, deterministic software, all the usual rules apply. Clean functions, clear abstractions, maintainable architecture. 
Use your normal code analysis tools.</p><p>But when you&#8217;re building LLM-integrated systems, the rules change:</p><ol><li><p><strong>Failure is the default</strong>: Your core operations fail 20% of the time</p></li><li><p><strong>Context is expensive</strong>: Every token counts toward limits</p></li><li><p><strong>Behavior is emergent</strong>: The system does things you didn&#8217;t explicitly program</p></li><li><p><strong>Adaptation is constant</strong>: Model capabilities change monthly</p></li></ol><p>In this world, a 5,594-line file might be ugly, but if it successfully manages complex failure recovery across multiple conversation threads, it might also be <em>correct</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aIuf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aIuf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 424w, https://substackcdn.com/image/fetch/$s_!aIuf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 848w, https://substackcdn.com/image/fetch/$s_!aIuf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 1272w, 
https://substackcdn.com/image/fetch/$s_!aIuf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aIuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png" width="1024" height="790" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d26aa417-4783-42d6-8105-488131dfe518_1024x790.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1466836,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193101232?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd33fb826-655a-4d8b-87cc-bfade8984326_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aIuf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 424w, https://substackcdn.com/image/fetch/$s_!aIuf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 848w, 
https://substackcdn.com/image/fetch/$s_!aIuf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 1272w, https://substackcdn.com/image/fetch/$s_!aIuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd26aa417-4783-42d6-8105-488131dfe518_1024x790.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Code Analysis Checkpoint Strategy</h2><p>This is where I&#8217;ve found success with my recent updates to the code analyzer in 
Claude MPM. The analyzer uses <a href="https://github.com/modelcontextprotocol/servers/tree/main/src/mcp-vector-search">mcp-vector-search</a> for comprehensive codebase analysis, providing AST-based semantic search, full-text search, and knowledge graph construction for architectural pattern detection. Instead of trying to prevent AI from generating messy code (impossible), I focus on <strong>regular refactoring and analysis checkpoints</strong>.</p><p>The analyzer has gotten very good at catching two specific issues:</p><ol><li><p><strong>Drift</strong>: When AI-generated code slowly diverges from intended architecture</p></li><li><p><strong>Bloat</strong>: When generated solutions become unnecessarily complex over time</p></li></ol><p>I make a point of running these checkpoints regularly, treating them as essential maintenance rather than optional cleanup. It&#8217;s like running <code>cargo clippy</code> or <code>eslint</code>, but for AI-generated architectural decisions.</p><p>The key insight: <strong>AI code needs different kinds of maintenance than human code</strong>.</p><h2>Outcome-Based Generation: Does It Work?</h2><p>Here&#8217;s my perhaps controversial take: If Claude Code successfully helps developers ship better software faster, then the messy internals might not matter as much as we think.</p><p>The leaked code reveals a system that:</p><ul><li><p>Handles millions of conversations per day</p></li><li><p>Maintains context across arbitrarily long sessions</p></li><li><p>Provides sophisticated memory management</p></li><li><p>Implements multiple safety layers</p></li><li><p>Delivers a $2.5 billion ARR product experience</p></li></ul><p>Is the implementation elegant? No. Does it work? Apparently, yes, and we know because we can observe and measure what it builds independently of what built it.</p><p>This doesn&#8217;t excuse basic engineering failures (that <code>.npmignore</code> mistake was embarrassing). 
But it does suggest we need new frameworks for evaluating AI-generated systems.</p><h2>The Scaffolding Solution</h2><p>Rather than trying to make AI generate perfect code, we can scaffold around the inevitable messiness:</p><p><strong>Automated refactoring checkpoints</strong>: Regular cleanup of AI-generated bloat<br><strong>Architectural constraints</strong>: Guard rails that prevent the worst patterns<br><strong>Outcome validation</strong>: Testing that focuses on behavior over implementation<br><strong>Human oversight</strong>: Strategic points where humans validate AI decisions</p><p>This is the approach I&#8217;ve been taking with Claude MPM, and it&#8217;s proven remarkably effective. Let the AI generate messy-but-functional code, then use tooling to clean it up systematically.</p><h2>What This Means for the Industry</h2><p>The Claude Code leak represents a watershed moment. It&#8217;s our first real look at what happens when AI tools build themselves at scale.</p><p>The criticism is valid&#8212;basic engineering discipline matters, even in AI systems. A missing <code>.npmignore</code> file is inexcusable for a billion-dollar product.</p><p>But the deeper question is whether we&#8217;re applying the right standards. 
Traditional code quality metrics may not capture what actually matters for AI-integrated systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NrDl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NrDl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 424w, https://substackcdn.com/image/fetch/$s_!NrDl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 848w, https://substackcdn.com/image/fetch/$s_!NrDl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 1272w, https://substackcdn.com/image/fetch/$s_!NrDl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NrDl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png" width="1024" height="825" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1937908,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/193101232?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1273dbb0-a196-49bc-b0ca-17f437545fd8_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NrDl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 424w, https://substackcdn.com/image/fetch/$s_!NrDl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 848w, https://substackcdn.com/image/fetch/$s_!NrDl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 1272w, https://substackcdn.com/image/fetch/$s_!NrDl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87df3326-02c3-4838-b302-ab23ab5d5e19_1024x825.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Moving Forward</h2><p>Anthropic probably <em>is</em> moving too quickly in some ways. The leak revealed security vulnerabilities, competitive intelligence losses, and quality control failures that suggest inadequate human oversight.</p><p>But they&#8217;re also pioneering entirely new categories of software. The problems they&#8217;re solving&#8212;context management, failure recovery, human-AI collaboration&#8212;don&#8217;t have established best practices yet.</p><p>The real lesson isn&#8217;t that AI-generated code is inherently bad. It&#8217;s that we need new practices for building, reviewing, and maintaining systems too large for any human to fully comprehend.</p><p><strong>The question isn&#8217;t whether Claude Code&#8217;s internals are messy. 
It&#8217;s whether we can build better scaffolding around AI-generated systems to catch the problems that matter while accepting the messiness we can&#8217;t avoid.</strong></p><p>The Claude Code team probably needs to slow down and get the basics right: security, testing, deployment hygiene. But they&#8217;re moving fast on problems they genuinely need to solve before competitors do.</p><p>That&#8217;s a nuanced position in an industry that loves simple takes. But nuance is what the moment requires.</p><p><em>What do you think? Are we being too hard on AI-generated code, or not hard enough? Share your thoughts in the comments.</em></p><div><hr></div><p><em>About this analysis: This piece draws on extensive technical analysis of the March 31, 2026, Claude Code source leak, including community responses, security assessments, and business impact analysis. The author maintains active development projects using AI-assisted coding tools and has direct experience with the challenges discussed.</em></p><p><em>About the author: Bob Matsuoka is Chief Technology Officer at <a href="https://www.duettocloud.com/">Duetto</a> and creator of Claude MPM (Multi-agent Project Manager). 
He has implemented AI-assisted development workflows across enterprise engineering teams and writes about the practical realities of AI integration in software development at <a href="https://hyperdev.substack.com">HyperDev</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Moving Past the 10-Tab Workflow]]></title><description><![CDATA[Autonomous Orchestration Management Is Next]]></description><link>https://hyperdev.matsuoka.com/p/moving-past-the-10-tab-workflow</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/moving-past-the-10-tab-workflow</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Wed, 01 Apr 2026 12:32:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wz2X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wz2X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wz2X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 424w, https://substackcdn.com/image/fetch/$s_!wz2X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 848w, 
https://substackcdn.com/image/fetch/$s_!wz2X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 1272w, https://substackcdn.com/image/fetch/$s_!wz2X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wz2X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png" width="1024" height="690" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1272299,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192741144?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87ec05a-31c6-4037-a9e0-db7c8c8fcba4_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wz2X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 424w, 
https://substackcdn.com/image/fetch/$s_!wz2X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 848w, https://substackcdn.com/image/fetch/$s_!wz2X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 1272w, https://substackcdn.com/image/fetch/$s_!wz2X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c54d57-2f21-4893-a87d-88523deb8ae7_1024x690.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">From Tab Chaos to Autonomic Orchestration</figcaption></figure></div><p>I&#8217;m looking at my iTerm terminal right now. Ten tmux sessions. Each session holds a different project context&#8212;one monitoring CI failures, another handling a code review, a third debugging a production issue.</p><p>This is the reality of modern agent development work.</p><h2>TL;DR</h2><ul><li><p><strong>Multi-session reality</strong>: Power users average 8-12 terminal sessions; most work involves modification, bug response, and PR handling&#8212;not new code generation</p></li><li><p><strong>Natural workflow origination</strong>: Future systems trigger from product team actions, CI failures, and automated events rather than human prompts</p></li><li><p><strong>Orchestration evolution</strong>: From human-orchestrated agents to orchestration-of-orchestrators where prime coordinators are non-human</p></li><li><p><strong>Production examples</strong>: Stripe&#8217;s Minions (1,300 PRs/week), GitLab&#8217;s Duo Agent Platform, Meta&#8217;s REA demonstrate hierarchical agent orchestration</p></li><li><p><strong>Architecture shift</strong>: Claude Code&#8217;s SDK model enables workflow-driven development through persistent, context-aware agent orchestration</p></li></ul><h2>The 10-Tab Reality</h2><p>According to recent developer workflow guides, <a href="https://www.heyuan110.com/posts/ai/2026-03-03-tmux-guide-ai-development/">tmux has become the standard for AI-assisted development</a>, with <a href="https://dev.to/_d7eb1c1703182e3ce1782/tmux-tutorial-the-complete-developer-workflow-guide-2026-33b3">persistent sessions solving the context-switching tax</a>. The productivity advantage isn&#8217;t the multiplexing&#8212;it&#8217;s the persistence. 
Projects become environments you step in and out of rather than things you open and close.</p><p>But here&#8217;s what the productivity tutorials miss: most of those tabs aren&#8217;t generating new software.</p><p><strong>My current session breakdown:</strong></p><ul><li><p>3 sessions, non-coding: my CTO knowledge base (currently analyzing our Sumo use), a writing assistant, and our Duetto product management framework</p></li><li><p>4 sessions, coding: various internal tools and MCP connectors</p></li><li><p>2 sessions, coding: new projects</p></li><li><p>1 session: code review</p></li></ul><p>That 8:2 split, eight sessions on everything else to two on new projects, holds across most senior developers I&#8217;ve observed. Most development work involves responding to existing systems, not creating new ones.</p><p>This distribution points toward something significant: <strong>the future of development orchestration isn&#8217;t human-initiated.</strong></p><h2>Beyond Prompt-Driven Development</h2><p>Claude Code&#8217;s new SDK architecture reflects this reality. Instead of starting with human prompts, work originates from natural workflow events:</p><ul><li><p>Product team creates ticket &#8594; Implementation specification generated</p></li><li><p>CI pipeline fails &#8594; Diagnostic agent analyzes failure, proposes fix</p></li><li><p>PR submitted &#8594; Review agent examines code, suggests improvements</p></li><li><p>Production alert triggered &#8594; Incident response agent investigates, documents findings</p></li><li><p>Security scan detects vulnerability &#8594; Remediation agent generates patch</p></li></ul><p>The pattern: <strong>Event &#8594; Agent Response &#8594; Human Review &#8594; Autonomous Resolution</strong>.</p><p>Humans remain in the loop, but as orchestrators and validators rather than initiators. 
The shift is from &#8220;What should I build?&#8221; to &#8220;How should this system respond?&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G7Uu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G7Uu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!G7Uu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!G7Uu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!G7Uu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G7Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png" width="1024" height="1024" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1418038,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192741144?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G7Uu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!G7Uu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!G7Uu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!G7Uu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20f98126-397c-40f4-bf7d-6fc025ab018d_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Yes, DALL-E is Still Atrocious With Spelling. Don&#8217;t @ me!</figcaption></figure></div><h2>Orchestration of Orchestrators: Production Examples</h2><h3>Stripe&#8217;s Blueprint Architecture</h3><p><a href="https://medium.com/@oracle_43885/how-stripe-built-secure-unattended-ai-agents-merging-1-000-pull-requests-weekly-1ff42f3fe550">Stripe&#8217;s Minions system demonstrates mature orchestration-of-orchestrators</a>. 
Their &#8220;blueprint&#8221; pattern alternates between deterministic code nodes and agentic reasoning loops, generating <a href="https://blog.bytebytego.com/p/how-stripes-minions-ship-1300-prs">1,300+ pull requests weekly</a>.</p><p><strong>Architecture insight</strong>: <a href="https://www.mindstudio.ai/blog/stripe-minions-blueprint-architecture-deterministic-agentic-nodes">Each blueprint functions as a strict contract between orchestration and execution</a>. Task definitions specify input requirements, output formats, constraints, and success criteria. The orchestrator manages workflow; agents handle implementation.</p><p><strong>Security model</strong>: <a href="https://www.sitepoint.com/stripe-minions-architecture-explained/">Every Minion execution runs in an isolated VM</a> with no internet or production access. The system has submission authority but not merge authority: all changes require human review.</p><h3>GitLab&#8217;s Intelligent Orchestration</h3><p><a href="https://about.gitlab.com/blog/agentic-sdlc-gitlab-and-tcs-deliver-intelligent-orchestration-across-the-enterprise/">GitLab&#8217;s Duo Agent Platform treats agents as durable actors</a> that plan, modify code, fix pipelines, and enforce security with traceability. Multiple AI agents handle parallel tasks&#8212;code generation, testing, CI/CD fixes&#8212;while developers maintain oversight through defined rules.</p><p><strong>Orchestration insight</strong>: <a href="https://docs.gitlab.com/user/duo_agent_platform/">GitLab positions itself as an AI orchestration plane</a> where humans and agents share delivery responsibility. 
The platform coordinates multi-agent workflows across the entire software lifecycle rather than providing isolated AI tools.</p><h3>Meta&#8217;s Hierarchical Agent Systems</h3><p><a href="https://engineering.fb.com/2026/03/17/developer-tools/ranking-engineer-agent-rea-autonomous-ai-system-accelerating-meta-ads-ranking-innovation/">Meta&#8217;s Ranking Engineer Agent (REA) demonstrates autonomous ML lifecycle management</a>. REA Planner and REA Executor components, supported by shared skill and knowledge systems, autonomously evolve ads ranking models at scale.</p><p><strong>Acquisition significance</strong>: <a href="https://venturebeat.com/orchestration/why-meta-bought-manus-and-what-it-means-for-your-enterprise-ai-agent">Meta&#8217;s $2B Manus acquisition</a> focused on orchestration infrastructure rather than foundation models. Manus&#8217;s achievement was engineering an execution layer enabling models to browse, code, manipulate files, and complete multi-step workflows autonomously.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!21xU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!21xU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 424w, https://substackcdn.com/image/fetch/$s_!21xU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 848w, 
https://substackcdn.com/image/fetch/$s_!21xU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 1272w, https://substackcdn.com/image/fetch/$s_!21xU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!21xU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png" width="1024" height="663" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1147097,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192741144?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef64881a-c7f7-4bab-8d8d-2393ee5bf8b0_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!21xU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 424w, 
https://substackcdn.com/image/fetch/$s_!21xU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 848w, https://substackcdn.com/image/fetch/$s_!21xU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 1272w, https://substackcdn.com/image/fetch/$s_!21xU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0980b901-da6a-408f-a4c7-2dd2188be40c_1024x663.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Architecture Implications</h2><h3>Beyond the Single-Agent Model</h3><p>The production examples reveal a consistent pattern: successful autonomous development requires <strong>hierarchical orchestration</strong> rather than monolithic AI assistants.</p><p><strong>Traditional approach</strong>: Human &#8594; Single Agent &#8594; Code<br><strong>Emerging pattern</strong>: Event &#8594; Orchestrator &#8594; Specialized Agents &#8594; Validation &#8594; Resolution</p><h3>Context Preservation at Scale</h3><p><a href="https://medium.com/@gveloper/using-iterm2s-built-in-integration-with-tmux-d5d0ef55ec30">The tmux paradigm</a> of persistent sessions maps directly to agent orchestration. Instead of recreating context for each interaction, systems maintain ongoing project understanding across multiple concurrent workflows.</p><p><strong>Implementation insight</strong>: <a href="https://iterm2.com/documentation-tmux-integration.html">iTerm2&#8217;s tmux integration (-CC mode)</a> provides the UI pattern for agent orchestration: persistent remote workspaces with a native interface feel. The same architectural principles apply to agent coordination.</p><h2>Where This Leads</h2><h3>Non-Human Prime Orchestrators</h3><p>The logical endpoint isn&#8217;t humans managing multiple agents; it&#8217;s orchestration systems that manage agent ecosystems. <a href="https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/ai-agent-orchestration.html">According to Gartner&#8217;s 2025 Agentic AI research</a>, nearly 50% of surveyed vendors identified AI orchestration as their primary differentiator.</p><p><strong>Pattern emergence</strong>: Meta-agents or orchestrator-generalists will control specialized agents, assign tasks, interpret results, and revise goals in real time.
<a href="https://arxiv.org/pdf/2601.13671">Hierarchical orchestration becomes essential for enterprise-scale implementations</a>.</p><h3>The Developer Role Evolution</h3><p>Instead of a developer managing 10 terminal sessions, a framework orchestrates autonomous workflows. Each workflow maintains its own context, responds to its own triggers, and escalates for human attention when required. Some of those workflows will be driven by humans, experimentation, or new development; the majority will respond to the automated lifecycle.</p><p><strong>Skills that matter</strong>:</p><ul><li><p><strong>Workflow boundary definition</strong>: Which autonomous streams can operate independently?</p></li><li><p><strong>Escalation criteria design</strong>: When do workflows require human intervention?</p></li><li><p><strong>Cross-workflow dependency management</strong>: How do autonomous streams coordinate?</p></li><li><p><strong>Quality gate enforcement</strong>: What validation must occur before autonomous resolution?</p></li></ul><h2>Implementation Considerations</h2><p>Teams experimenting with orchestrated autonomous development should consider:</p><ol><li><p><strong>Event-driven architecture</strong>: Which existing workflows could trigger autonomous responses?</p></li><li><p><strong>Context preservation systems</strong>: How will agent workflows maintain project understanding?</p></li><li><p><strong>Isolation and security</strong>: What boundaries prevent autonomous agents from causing damage?</p></li><li><p><strong>Human oversight integration</strong>: Where do human validation points occur in autonomous workflows?</p></li><li><p><strong>Cross-workflow coordination</strong>: How do parallel autonomous streams avoid conflicts?</p></li></ol><p>The transition from 10-tab manual orchestration to autonomous lifecycle orchestration isn&#8217;t theoretical. Stripe, GitLab, and Meta demonstrate production implementations.
The question becomes implementation timeline and organizational readiness.</p><p>Early adopters are discovering that the competitive advantage comes not from having the smartest individual AI agents, but from orchestrating networks of specialized agents that collaborate effectively at scale.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://www.coddykit.com/pages/blog-detail?id=512757">Stripe&#8217;s Minions: Inside Their Enterprise AI Coding Agent Strategy</a> &#8212; Blueprint orchestration architecture and production metrics</p></li><li><p><a href="https://docs.gitlab.com/user/duo_agent_platform/">GitLab Duo Agent Platform</a> &#8212; Intelligent orchestration across software lifecycle</p></li><li><p><a href="https://www.heyuan110.com/posts/ai/2026-03-03-tmux-guide-ai-development/">Tmux Complete Guide: AI-Powered Multi-Agent Workflows</a> &#8212; Terminal multiplexing for autonomous development</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Is This The Era of the Connector?]]></title><description><![CDATA[Go To Where The People Are]]></description><link>https://hyperdev.matsuoka.com/p/is-this-the-era-of-the-connector</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/is-this-the-era-of-the-connector</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 26 Mar 2026 12:32:38 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!p4t8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p4t8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4t8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 424w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 848w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1272w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p4t8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png" width="1024" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1414425,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192036152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d13406-f339-4790-b5f2-7117dd24336e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p4t8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 424w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 848w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1272w, https://substackcdn.com/image/fetch/$s_!p4t8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F908c25b0-f6df-4b53-bc8c-0984c5c409df_1024x819.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>TL;DR</h2><ul><li><p>Users consolidate around 4 core platforms (Slack, Notion, Email+Office, AI Tool) while rejecting standalone/SaaS tools</p></li><li><p>Connectors that bring data to users beat new tools that require visits</p></li><li><p>Infrastructure breakthrough: Slack manifests + MCP protocol + LLM services make org-specific connectors trivial</p></li><li><p>Democratization effect: Bootcamp engineers can now build sophisticated integrations that once required senior developers</p></li><li><p>Production evidence: 3 connectors (4-6 hours each, $1-2.5K in AI tokens) replaced 5-6 standalone tools that would cost $150-300K+ traditionally</p></li><li><p>Tolerance for &#8220;broad-based general tools&#8221; declining &#8212; UX mindshare captures traffic even when APIs do the work</p></li></ul><div><hr></div><p>In the past two
weeks, I&#8217;ve built three connectors that collectively replaced what would have been five or six standalone tools.</p><p><strong>Engineering Search Connector</strong>: Hosted semantic and knowledge graph search service built by repurposing mcp-vector-search. Unified search across 150+ GitHub repos, 1,700+ wiki pages, and ticket systems. Accessible through a Slack bot, a web interface, a CLI, and an MCP connector for Claude.AI, so engineering knowledge reaches people where they already work.</p><p><strong>CRM Data Connector</strong>: Live customer data piped directly into Claude.AI sessions via MCP. No dashboard to check, no reports to generate. Ask &#8220;What&#8217;s our pipeline this quarter?&#8221; and get live data in 1-3 seconds.</p><p><strong>Document Workflow Connector</strong>: Artifact browser and guided PR workflow for non-technical contributors. Product managers can explore and propose changes to structured docs without touching git or learning new interfaces.</p><p>None of these required users to adopt a new primary tool. Each brings specialized functionality to platforms they already inhabit daily. And each took roughly 4-6 hours to build (plus agent time).</p><p>This isn&#8217;t a productivity humble-brag. It&#8217;s evidence of a fundamental shift in how organizations interact with their data. We&#8217;re entering the connector era &#8212; building bridges between specialized intelligence and the handful of platforms where users actually live, rather than standalone applications they have to visit.</p><p>The numbers support this pattern. Users toggle between apps 1,200 times daily, losing 40% of their productivity to context switching. Connector ecosystems are exploding: Slack&#8217;s marketplace hosts 2,600+ apps with 550K+ daily custom integrations. The MCP protocol went from 100K to 8M downloads in six months &#8212; unprecedented adoption for plumbing infrastructure.</p><p>The question isn&#8217;t whether Slack, Notion, and Claude.AI will survive the AI wave.
It&#8217;s whether the hundreds of specialized tools competing for attention understand that the game has changed. Users have less tolerance for broad-based general tools than they once did. The platforms that capture UX mindshare will get most of the traffic, even if APIs and agents do the actual work behind the scenes.</p><p>The evidence is clear from user behavior: they don&#8217;t want to learn a new search interface, remember another login, or context-switch to yet another tab. They want the intelligence layer to meet them where they already are.</p><h2>The Source of Truth Problem</h2><p>Most organizations have a source-of-truth problem they haven&#8217;t fully articulated. They have Slack for real-time communication. They have Notion or Confluence for documentation. They have Google Docs for drafts that become documents that become outdated that stay around anyway. They have JIRA for tickets that may or may not reflect what was actually decided. They call this a &#8220;knowledge management system.&#8221; It&#8217;s more accurately a distributed archive of partially-intentional artifacts with no clear authority hierarchy.</p><p>The question &#8220;who owns this decision?&#8221; leads to a Slack thread from eight months ago, a Notion page that three people edited and nobody is certain is current, and a Google Doc someone linked in a comment that requires permission to access. This is the status quo. It functions, after a fashion, because humans are good at triangulating across ambiguous sources and asking colleagues to fill gaps.</p><p>AI agents are not good at this. They will confidently synthesize the eight-month-old Slack thread with the outdated Notion page and present the result as a coherent answer. The errors won&#8217;t be obvious. They&#8217;ll be subtly wrong in ways that require domain expertise to catch.</p><p>The source of truth problem was always real. It was manageable when every query ran through a human brain. 
It becomes actively dangerous when queries run through an inference layer first.</p><p>What you actually need &#8212; what organizations are starting to build &#8212; is a repository where the data structure enforces truth. Not a place where the right answer might be findable if you look hard enough. A place where the structure of the data makes the wrong answer harder to produce.</p><p>But here&#8217;s the connector insight: that structured repository doesn&#8217;t need to be where users spend their time. It can be the authoritative backend that feeds connectors in the platforms users already inhabit.</p><h2>Where Users Actually Live</h2><p>User attention has consolidated around four core platforms:</p><p><strong>Slack</strong>: Real-time coordination, team presence, ephemeral decisions. 32.3 million daily active users, with 550K+ custom integrations in daily use.</p><p><strong>Email + Office Suite</strong>: Formal communication, document collaboration, external stakeholder interface. Microsoft reports 400M+ Office 365 commercial users.</p><p><strong>Notion</strong>: Knowledge management, project tracking, collaborative documentation. 100M+ users consolidating entire productivity stacks.</p><p><strong>Claude.AI</strong>: AI assistance, analysis, content generation. Rapidly becoming the default interface for LLM interactions across knowledge work.</p><p>Each platform serves a legitimate core function. Tool builders make the mistake of assuming they can compete for primary platform status by building something better. Users are done adopting new primary platforms. They&#8217;re consolidating around tools that already have their attention.</p><p>The pattern reveals a deeper truth: people live in transactional systems, not knowledge systems. Slack is where decisions happen. Email is where approvals flow. Claude.AI is where analysis gets done. These are transactional: work happens there daily.</p><p>Confluence is a perfectly good wiki tool.
But it&#8217;s knowledge-at-rest, not transactional. People don&#8217;t live there. They visit when forced to document something, then return to their transactional workflows. The knowledge gets stale because maintenance happens in a different system than usage. (Notion manages to straddle the line between knowledge-at-rest and transactional.)</p><p>Integration platforms like Zapier understand this: they connect 8,000+ apps for 3.4M+ business users by bringing specialized functionality to existing workflows rather than creating new destinations.</p><p>Users just want the data, dammit. They don&#8217;t want to learn your interface.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!prOp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!prOp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 424w, https://substackcdn.com/image/fetch/$s_!prOp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 848w, https://substackcdn.com/image/fetch/$s_!prOp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1272w, https://substackcdn.com/image/fetch/$s_!prOp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!prOp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png" width="1024" height="806" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1186021,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192036152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3416bd4-5246-41d4-9f66-0351982baeb1_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!prOp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 424w, https://substackcdn.com/image/fetch/$s_!prOp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 848w, https://substackcdn.com/image/fetch/$s_!prOp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1272w, 
https://substackcdn.com/image/fetch/$s_!prOp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba561016-ed0c-4147-9623-63f9996b9b13_1024x806.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Connector Infrastructure Moment</h2><p>What changed? Four pieces of infrastructure matured simultaneously:</p><p><strong>Slack Manifest Tool</strong> makes organization-specific bots trivial to build. The manifest.yaml format standardizes permissions, scopes, and deployment.
Weeks of OAuth wrestling became hours of configuration.</p><p><strong>MCP Protocol</strong> has become a &#8220;USB-C for AI&#8221; standard for universal connectivity. Claude.AI, ChatGPT, and dozens of other platforms support the same connector format. Build once, deploy everywhere. The growth from 100K to 8M downloads in six months reflects pent-up demand.</p><p><strong>LLM Services</strong> like Bedrock and OpenRouter provide natural language interfaces that make connectors intelligent rather than just data pipes. Ask questions in plain English, get structured responses, maintain conversation context.</p><p><strong>Semantic Search Infrastructure</strong> like mcp-vector-search can be repurposed as hosted services, adding intelligence layers that understand meaning rather than just matching keywords. This transforms basic data access into contextual knowledge retrieval &#8212; a crucial enabler for connectors that need to surface relevant information rather than exact matches.</p><p>Combined, these pieces let you build a production connector in a single afternoon. The Slack manifest defines the bot interface. The MCP schema defines the data sources. Semantic search handles intelligent retrieval. Bedrock provides the language understanding. Deploy to AWS Lambda and you&#8217;re live.</p><p>My three connectors follow this exact pattern. The engineering search connector repurposes mcp-vector-search as a hosted service with all-MiniLM-L6-v2 embeddings for semantic and knowledge graph search, but the user interface is just Slack commands and Claude.AI MCP tools. The CRM data connector is a headless AWS service that makes customer data available through natural language queries in Claude.AI. The document workflow connector provides git workflows through a web UI that non-technical users can navigate.</p><p>Each connector took 4-6 hours to build.
Each would have taken 4-6 months to build as a standalone application with user management, authentication, interface design, mobile responsiveness, and all the infrastructure a &#8220;real app&#8221; requires.</p><p><strong>The Democratization Effect</strong>: The infrastructure shift goes beyond development speed &#8212; it&#8217;s democratizing who can build sophisticated integrations. What once required senior engineers with deep API knowledge can now be handled by bootcamp graduates following established patterns. I built these first three connectors to validate the approach, but similar projects will go to junior engineers going forward.</p><p>This changes resource allocation fundamentally. Organizations can solve integration problems without burning senior engineering cycles on &#8220;plumbing&#8221; work. Information that was once very hard to obtain is now trivial to access.</p><p>The economics are compelling. Building three production connectors cost roughly $1,000-2,500 in AI tokens over 44 days. Traditional contractor development for equivalent functionality would have run $150-300K+. The connector approach isn&#8217;t just faster; it&#8217;s on the order of 100x more cost-effective.</p><p>The adoption metrics back this up. The CRM connector launched March 18th with 23 invocations on day one. No formal rollout, no training sessions, no onboarding docs. Just organic discovery across a 300+ person company. By week two, usage had tripled to 95 invocations per day. Tuesday hit 152 invocations &#8212; including a 40-query analysis session in a single hour. That&#8217;s 299 queries in 7 days with zero errors, from a connector that took 4-6 hours to build.</p><p>The connector era isn&#8217;t about choosing between platforms.
It&#8217;s about connecting specialized intelligence to the platforms users have already chosen.</p><h2>Why Wikis Can&#8217;t Compete in the Connector World</h2><p>Traditional knowledge management tools face a structural mismatch with connector architecture. Wikis assume users will &#8220;go to the tool&#8221; for information. Connectors flip that assumption: the tool comes to the user.</p><p>This creates specific problems:</p><p><strong>The Authoring/Retrieval Tension</strong>: Wikis optimize for collaborative authoring: anybody can edit, structure is flexible, everything links to everything, and content evolves over time. This is the opposite of what retrieval needs: consistent schema, clear ownership, explicit governance. When you pipe wiki content through a connector, you inherit all the inconsistencies that collaborative authoring creates.</p><p><strong>Search Architecture Limitations</strong>: Confluence&#8217;s search is notoriously bad because it relies on keyword matching over unstructured text. This was problematic before LLMs. With LLM-powered connectors, it becomes worse because the AI layer adds confidence to bad retrieval results. Users get wrong answers delivered with conviction.</p><p><strong>Static Data Problem</strong>: Notion&#8217;s AI operates on static content snapshots, disconnected from real-time operational state. When someone asks &#8220;What&#8217;s our pipeline this quarter?&#8221; through a Notion connector, the answer reflects what someone wrote about the pipeline, not live customer data. The connector amplifies the staleness problem.</p><p><strong>Governance at Scale</strong>: Wiki governance defaults to &#8220;community-maintained,&#8221; which means in practice nobody is responsible for accuracy. As organizations scale, wikis accumulate pages nobody knows are outdated.
Connectors don&#8217;t solve this &#8212; they accelerate the distribution of stale information.</p><p>Our structured document framework represents the alternative: git-backed Markdown with schema-validated YAML frontmatter. Every document has explicit metadata: owner, status, domain, confidence, time_box. The structure is the feature. When document workflow connectors expose this, the schema ensures consistent data quality regardless of interface.</p><p>Structured document repositories outperform wikis for AI query by 35-60% in controlled tests. Clean Markdown with explicit metadata reduces token usage by 20-30% and improves retrieval accuracy significantly. This isn&#8217;t philosophical &#8212; it&#8217;s measurable.</p><p>Wikis remain useful for collaborative drafting and evolving reference material. But they&#8217;re not the right backend for connector architecture. The connector era requires structured data sources that can maintain quality across multiple interface layers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NjzV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NjzV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!NjzV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NjzV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1015968,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/192036152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!NjzV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!NjzV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!NjzV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F879cc371-1f68-486d-8602-c68b47411a94_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>Building Connectors vs. Standalone Tools</h2><p>The strategic choice organizations face isn&#8217;t &#8220;which tool should we build?&#8221; but &#8220;should we build a tool or a connector?&#8221; My experience with three connectors illuminates the trade-offs:</p><p><strong>Engineering search</strong> could have been a standalone search platform. Instead, it&#8217;s accessible through Slack commands, CLI tools, web interface for visualizations, and MCP tools for Claude.AI sessions. Same search capability, four different interaction models depending on user context.</p><p><strong>CRM integration</strong> could have been a dashboard with charts and filters. Instead, it&#8217;s a headless MCP service that makes customer data available through natural language in Claude.AI. Ask &#8220;Show me deals over $100K in our target vertical&#8221; and get live results in 1-3 seconds. No dashboard to learn, no visual interface to maintain.</p><p><strong>Document workflows</strong> could have been a product management SaaS platform. Instead, it&#8217;s a guided workflow that helps non-technical contributors interact with existing git-backed document frameworks. Browse artifacts, generate AI summaries, submit PRs &#8212; all through interfaces that match users&#8217; technical comfort levels.</p><p>Specialized intelligence delivered through platforms users already inhabit. The connector approach wins on several fronts.</p><p>Development time: 4-6 hours vs. months. No user management, authentication, responsive design, or mobile apps to build.</p><p>Adoption friction: Zero onboarding. No new logins, training sessions, or change management overhead.</p><p>Maintenance burden: Focus on data logic and intelligence, not interface maintenance across device types and browser versions.</p><p>Integration: Connectors compose naturally with existing workflows.
Slack discussions can include live Salesforce data. Claude.AI analysis can pull from engineering knowledge graphs. Standalone tools require export/import workflows.</p><p>The business case is compelling: connector development costs 10-20% of standalone application development while achieving 3-4x higher user engagement.</p><p>The implications go beyond development efficiency. Users have less tolerance for &#8220;broad-based general tools&#8221; than they once did. Managing dozens of application contexts creates unsustainable cognitive load. Platforms that capture daily attention get most of the traffic, even when APIs and agents do the computational work behind the scenes.</p><p>This creates different winner-take-all dynamics. The winners aren&#8217;t necessarily the best tools. They&#8217;re the platforms users choose to inhabit, plus the connectors that bring specialized capability to those platforms.</p><h2>What This Means for Your Stack</h2><p>The connector era doesn&#8217;t eliminate existing tools &#8212; it clarifies their appropriate roles and challenges their assumptions about user attention.</p><p><strong>Slack keeps its coordination function</strong>: Real-time presence, threading, ephemeral decisions. But it becomes a command interface for structured data sources rather than a knowledge repository itself.</p><p><strong>Notion retains collaborative authoring value</strong>: Drafting, evolving documentation, reference material. But it stops being the &#8220;source of truth&#8221; for operational decisions. That role shifts to structured backends accessible through Notion connectors.</p><p><strong>Specialized tools survive by becoming intelligent backends</strong>: Your CRM, your monitoring system, your code repositories &#8212; these maintain their core data authority. 
But user interaction shifts to connector layers in platforms where users already work.</p><p>The question to ask about any tool: Is this where I want an AI agent pointing when it needs authoritative information? If the answer is no, it&#8217;s not your source of truth. It might still be valuable &#8212; as a backend, as a collaborative space, as a specialized interface for expert users. But it doesn&#8217;t earn the designation of &#8220;primary platform.&#8221;</p><p><strong>The organizational challenge</strong>: Getting non-technical teams comfortable with structured data workflows is real change management. Document workflow connectors address this by providing guided interfaces for git-backed workflows. But someone still needs to own schema design and governance processes.</p><p><strong>Who should build connectors first</strong>: Engineering-adjacent teams with strong PM-engineering collaboration. Organizations where AI hallucination on operational decisions creates measurable cost. Companies that have already felt the pain of distributed knowledge management.</p><p><strong>Timing matters</strong>: Most organizations haven&#8217;t built connector strategies yet. Companies that establish structured knowledge backends with connector frontends in 2026 will have 12-18 months of advantage when AI-mediated query becomes standard practice.</p><p>The connector era isn&#8217;t about choosing between platforms. It&#8217;s about connecting intelligent backends to platforms users have already chosen. Organizations that get this right will operate with less context switching and faster access to operational data.</p><p>Users just want the data, dammit. 
The question is: will you bring it to them, or keep expecting them to come to you?</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item><item><title><![CDATA[One Year of HyperDev]]></title><description><![CDATA[From Skeptic (back) to CTO]]></description><link>https://hyperdev.matsuoka.com/p/one-year-of-hyperdev</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/one-year-of-hyperdev</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Fri, 20 Mar 2026 12:31:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nSwX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nSwX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nSwX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 424w, 
https://substackcdn.com/image/fetch/$s_!nSwX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 848w, https://substackcdn.com/image/fetch/$s_!nSwX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 1272w, https://substackcdn.com/image/fetch/$s_!nSwX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nSwX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png" width="1024" height="581" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:581,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1344053,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/191547847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F936fa04c-e4d5-42fd-8433-eedd0d54fa22_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!nSwX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 424w, https://substackcdn.com/image/fetch/$s_!nSwX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 848w, https://substackcdn.com/image/fetch/$s_!nSwX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 1272w, https://substackcdn.com/image/fetch/$s_!nSwX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2767f452-9347-4e6e-9bb5-34a5d4417dca_1024x581.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a>
<figcaption class="image-caption"><em>A retrospective on building in public during the great AI coding transformation</em></figcaption></figure></div><p>One year ago, &#8220;<a href="https://hyperdev.matsuoka.com/p/50-first-dates-with-claude-code">50 First Dates with Claude Code</a>&#8221; took me several hours to write with Joanie&#8217;s help in Google Docs. This morning, I drafted two comprehensive Anthropic articles using <a href="https://github.com/bobmatnyc/claude-mpm">claude-mpm</a> agents in 45 minutes - research, generation, editing, the works.</p><p>That evolution mirrors exactly what happened across the entire AI coding industry this year (and the ascension of Claude Code, which was released shortly before I began my journey). I wasn&#8217;t just writing about the transformation. I was living it.</p><h2>The Journey: From Accident to Infrastructure</h2><p>HyperDev started by accident. In March 2025, I posted a <a href="https://www.linkedin.com/pulse/fear-loathing-hyperdev-land-ai-coding-experiment-robert-matsuoka-xis4c/">LinkedIn experiment</a> about spending 12 hours building a travel planning app with AI tools. I was a technology executive who &#8220;hadn&#8217;t coded seriously in 20 years&#8221; testing whether the productivity claims were real or hype.</p><p>The response was immediate and intense. Richard Wang left a prescient comment: &#8220;&#8216;AI allows a non-engineer to build a product without coding&#8217; is hype...
&#8216;AI can improve a developer&#8217;s productivity by 10x&#8217; is true.&#8221; That experiment generated 4,000 lines of AI code in a single session and launched what became 168 articles over nine months.</p><p>April through June was chaos. I built claude-multiagent-pm, a prototype that worked well enough to be exciting and poorly enough to be frustrating. Token costs were obscene - every subprocess inherited the entire conversation context. I shipped 44 repositories. Probably a third represent false starts or abandoned approaches.</p><p>But that&#8217;s what I was learning: what not to build. Which constraints matter. Where the sharp edges live. The breakthrough insight from this period: <strong>Infrastructure beats features</strong>. The tools that demo well (flashy autocomplete, pretty interfaces) weren&#8217;t the ones that sustained daily use. The protocols, memory systems, and context management layers were what made sustained multi-agent work possible.</p><h2>The Breakthrough: When Everything Changed</h2><p>Mid-July 2025, Claude Code shipped context filtering. Sounds like a minor technical detail. It changed everything.</p><p>Before: my prototype burned tokens like a furnace and required constant babysitting.<br>After: I rebuilt everything. claude-mpm emerged with 1,545 commits over the rest of the year.</p><p>I remember the specific moment it hit me: I&#8217;d just pushed a feature that I would never have taken on without a team. Four hours of engaged time, a few days of agentic time. Twenty years away from serious coding. Four months back. Contributing production code for paying clients.</p><p>The tools weren&#8217;t just making me more productive. They were making me more <em>ambitious</em>. 
I was taking on problems that required sustained, complex thinking because I had AI teammates that could handle the execution details while I focused on architecture and strategy.</p><p>This is when my perspective shifted from &#8220;AI coding tools are interesting&#8221; to &#8220;AI coding tools are transformative.&#8221; Not because they eliminated the need for programming knowledge, but because they amplified existing knowledge into production-quality artifacts.</p><p>The writing and building formed a virtuous cycle. I documented what I learned building tools. The documentation attracted practitioners who used the tools. Their feedback improved the tools. claude-mpm gained 30+ stars and daily use across six months of client work. These weren&#8217;t GitHub tourism projects - they were tools that other practitioners adopted because they solved real problems I&#8217;d discovered through real use.</p><h2>The Evidence: Predictions and Numbers</h2><p>Looking back through a year of articles, my prediction accuracy was surprisingly good.</p><p><strong>What I got right:</strong></p><ul><li><p><strong>Multi-agent orchestration</strong> would prove superior to monolithic assistants (claude-mpm&#8217;s adoption validated this)</p></li><li><p><strong>Infrastructure over features</strong> as the determining factor for tool longevity (memory systems outlasted flashy demos)</p></li><li><p><strong>CLI-agentic coding going mainstream</strong> (Claude Code&#8217;s 46% &#8220;most loved&#8221; rating proved the thesis)</p></li><li><p><strong>Pricing correction timing</strong> (18-24 months from October 2025 - signals are clear at six months)</p></li></ul><p><strong>What surprised even me:</strong></p><ul><li><p><strong>Speed of Claude Code&#8217;s dominance</strong> (faster than even advocates expected)</p></li><li><p><strong>Writing-building credibility loop</strong> (thought leadership through shipping, not just analysis)</p></li></ul><p>The quantified impact tells its own 
story:</p><ul><li><p><strong>4,919 commits</strong> in nine months of sustained development</p></li><li><p><strong>69.7 billion tokens</strong> processed across all tools and projects</p></li><li><p><strong>198 articles</strong> published (3.2/week sustained)</p></li><li><p><strong>547 production deployments</strong> across client and personal projects</p></li><li><p><strong>$45,000</strong> in AI compute at rack rates (subsidized to $8,000)</p></li></ul><p>But the qualitative transformation matters more. I evolved from asking &#8220;Is this real?&#8221; to asking &#8220;How do we scale this organizationally?&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!12oI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!12oI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 424w, https://substackcdn.com/image/fetch/$s_!12oI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 848w, https://substackcdn.com/image/fetch/$s_!12oI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 1272w, https://substackcdn.com/image/fetch/$s_!12oI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!12oI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png" width="1024" height="688" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:688,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1328689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/191547847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00be159e-43fc-4e28-b2cc-cd184f828341_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!12oI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 424w, https://substackcdn.com/image/fetch/$s_!12oI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 848w, https://substackcdn.com/image/fetch/$s_!12oI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 1272w, https://substackcdn.com/image/fetch/$s_!12oI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9826b0d-2c86-424a-93e7-4fdcace62d9e_1024x688.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Transformation: From Observer to Practitioner to Leader</h2><p>The most significant development was personal: in January 2026, I joined <a href="https://www.duettocloud.com">Duetto</a> as CTO.</p><p>This wasn&#8217;t a career change - it was an expansion. Twenty-five years of technology leadership combined with eight months of hands-on AI development created a unique perspective. I wasn&#8217;t returning to technology leadership despite the AI work.
I was taking the role <em>because</em> of it.</p><p>What changed wasn&#8217;t my capabilities - I&#8217;ve been leading technology teams for decades. What changed was having AI as a force multiplier that converted that knowledge into actual artifacts. I could prototype entire systems, validate approaches, and demonstrate concepts that previously would have required dedicated engineering resources.</p><p>The leadership experience informed architectural decisions. The practitioner activity created credibility. The writing documented both. Now I get to test everything I&#8217;ve been writing about at enterprise scale.</p><h2>The Reality Check: Both/And, Not Either/Or</h2><p>In October 2025, I published &#8220;<a href="https://hyperdev.substack.com/p/is-ai-a-bubble-i-didnt-think-so">Is AI A Bubble? I Didn&#8217;t Think So Until I Heard Of SDD</a>.&#8221; The piece synthesized something I&#8217;d been wrestling with: how can genuine transformation and bubble dynamics exist simultaneously?</p><p>The answer: they can. And do.</p><p>The AI coding revolution is real. The bubble dynamics are also real. Codeium was valued at 70x ARR (vs. a dot-com peak of 18x) while providing genuine value to practitioners. My $45,000 in AI compute costs exemplified both the unsustainable economics and the genuine value creation. The 82% subsidy rate can&#8217;t last, but the ROI still works at full rates for sustained professional use.</p><p>Companies with product-market fit and operational discipline will survive the correction. Those burning capital on &#8220;technical potential&#8221; without user adoption won&#8217;t. The technology remains transformative even if the valuations prove unsustainable.</p><h2>What&#8217;s Next: From Individual Productivity to Organizational Transformation</h2><p>The industry is transitioning from individual productivity tools to enterprise transformation frameworks.
The early adopters who mastered AI-assisted development workflows now face a different challenge: scaling those practices across entire engineering organizations.</p><p>The questions have evolved:</p><ul><li><p><strong>Year 1</strong>: &#8220;Can AI tools make me more productive?&#8221;</p></li><li><p><strong>Year 2</strong>: &#8220;How do we maintain code quality and security with AI-generated code?&#8221;</p></li><li><p><strong>Year 3</strong>: &#8220;How do we transform hiring, onboarding, and career development when AI changes what programming means?&#8221;</p></li></ul><p>At Duetto, I&#8217;m working on these questions at scale. How do you migrate an enterprise engineering team from traditional development practices to AI-assisted workflows while maintaining operational excellence? How do you balance productivity gains with governance requirements? How do you rethink technical leadership when junior developers can ship senior-quality code with AI assistance?</p><p>These are the infrastructure problems that matter now. 
Not the tools themselves, but the organizational systems that make the tools effective at scale.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rmcT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rmcT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 424w, https://substackcdn.com/image/fetch/$s_!rmcT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 848w, https://substackcdn.com/image/fetch/$s_!rmcT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 1272w, https://substackcdn.com/image/fetch/$s_!rmcT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rmcT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png" width="763" height="208" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:208,&quot;width&quot;:763,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26260,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/191547847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rmcT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 424w, https://substackcdn.com/image/fetch/$s_!rmcT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 848w, https://substackcdn.com/image/fetch/$s_!rmcT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 1272w, https://substackcdn.com/image/fetch/$s_!rmcT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed789576-b3b5-4bb1-9911-2ca41a638f82_763x208.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">my &#8220;Green Wall&#8221;</figcaption></figure></div><h2>The Retrospective: What One Year Taught Me</h2><p>HyperDev became valuable because it documented the transformation 
in real time, from the perspective of someone living it. Not retrospective analysis of what happened, but contemporaneous documentation of what was happening.</p><p>The key insights:</p><p><strong>Timing matters.</strong> Starting documentation right at the inflection point captured both the chaos and the consolidation. Personal journey paralleled industry maturation.</p><p><strong>Practitioner perspective beats observer perspective.</strong> Direct experience with the tools, including their limitations and sharp edges, generated insights that pure analysis couldn&#8217;t match.</p><p><strong>Building creates credibility.</strong> Shipping tools that other practitioners adopt generates more authority than analytical commentary alone.</p><p><strong>Writing and building amplify each other.</strong> Documentation of practice creates thought leadership. Thought leadership creates opportunities. Opportunities create more practice to document.</p><p>One year ago, I was asking whether AI coding tools were genuinely transformative or just sophisticated autocomplete. The answer: they&#8217;re genuinely transformative, but in ways that none of us fully anticipated. The transformation wasn&#8217;t about eliminating the need for programming knowledge. It was about amplifying existing knowledge into production-quality artifacts faster than previously possible.</p><p>Most importantly, the transformation was about enabling individual practitioners to think and build at organizational scale. That&#8217;s what I experienced personally. That&#8217;s what I documented in nearly 200 articles. And that&#8217;s what I&#8217;m now implementing at enterprise scale.</p><p>HyperDev year one was about learning what was possible. Year two is about making it practical. Year three might be about making it inevitable.</p><p>The tools keep getting better. The workflows keep evolving. 
The organizational challenges keep getting more complex.</p><p>And I&#8217;m still here, still building, still documenting, now leading, still learning.</p><p>What a year it&#8217;s been. What a year it&#8217;s going to be.</p><div><hr></div><p><em><a href="https://hyperdev.matsuoka.com">HyperDev</a> documents the real-world application of AI development tools by practitioners building production systems. For technical deep dives and business analysis of the tools behind this transformation, look for upcoming coverage of Claude Code&#8217;s competitive dominance.</em></p>]]></content:encoded></item><item><title><![CDATA[Everyone Blamed Clawd Bot’s Execution. The Concept Was the Problem.]]></title><description><![CDATA[Is A Personal Assistant Bot Really Helpful?]]></description><link>https://hyperdev.matsuoka.com/p/everyone-blamed-clawd-bots-execution</link><guid isPermaLink="false">https://hyperdev.matsuoka.com/p/everyone-blamed-clawd-bots-execution</guid><dc:creator><![CDATA[Robert Matsuoka]]></dc:creator><pubDate>Thu, 12 Mar 2026 11:32:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!51wj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!51wj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!51wj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 
424w, https://substackcdn.com/image/fetch/$s_!51wj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 848w, https://substackcdn.com/image/fetch/$s_!51wj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 1272w, https://substackcdn.com/image/fetch/$s_!51wj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!51wj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png" width="1024" height="774" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:774,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1722868,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/190672571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb05bbe76-836e-475e-b360-793755bf1927_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!51wj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 424w, https://substackcdn.com/image/fetch/$s_!51wj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 848w, https://substackcdn.com/image/fetch/$s_!51wj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 1272w, https://substackcdn.com/image/fetch/$s_!51wj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F878e8350-5206-465d-9bf0-b600661c22ed_1024x774.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The story everyone told about Clawd Bot missed the point entirely. Austrian developer Peter Steinberger built an open-source AI assistant that went viral &#8212; 145,000 GitHub stars, 2 million visitors in a week. Then Anthropic forced a trademark-based name change because &#8220;Clawd&#8221; was too similar to &#8220;Claude.&#8221; The community called it petty. DHH called Anthropic &#8220;customer hostile.&#8221; The irony: Clawd Bot users were actually buying more Claude subscriptions, providing free marketing to Anthropic, yet the company still demanded the shutdown.</p><p>But everyone focused on the wrong drama. The trademark dispute was noise. The real problem was deeper: Clawd Bot was built because someone could, not because anyone needed it.</p><p>I tested Clawd Bot for about a week. The interface was clean, the onboarding smooth, the responses capable. But it required permissions I wouldn&#8217;t give to any tool &#8212; access to email, calendars, messaging, sensitive services. The execution had real problems. But even if those were fixed, it would still be solving the wrong problem.</p><p>Here&#8217;s where I should admit: I tried building a digital assistant, izzie, when I started experimenting with AI agents. I never got it to a point I found useful. 
Not because of technical limitations &#8212; because the entire concept of a universal assistant doesn&#8217;t match how work actually happens.</p><h2>TL;DR</h2><ul><li><p>Clawd Bot was a successful open-source project by Peter Steinberger that Anthropic forced to rename; execution wasn&#8217;t the real problem</p></li><li><p>The real question: when do you need an &#8220;assistant&#8221;? Most execs won&#8217;t trust AI scheduling; the value is intelligent data movement between services</p></li><li><p>Context switching is a symptom, not the root issue &#8212; the issue is what assistants should be doing at all</p></li><li><p>The product management session: Granola meeting notes, calendar checks, Slack updates, Notion sync &#8212; all from within one tool, data flowing intelligently between services</p></li><li><p>The commercial evidence: Cursor, Notion AI, Linear&#8217;s AI triage &#8212; the winners embedded AI in tools as infrastructure, not interface</p></li><li><p>trusty-izzie&#8217;s highest value isn&#8217;t the chat interface &#8212; it&#8217;s as a local MCP service exporting personal context to every other tool</p></li><li><p>The universal assistant category isn&#8217;t going to produce a winner. It&#8217;s going to dissolve.</p></li></ul><h2>What the Universal Assistant Model Gets Wrong (And It&#8217;s Not Just Execution)</h2><p>Clawd Bot had serious execution problems &#8212; it&#8217;s a security nightmare requiring broad permissions across email, calendars, messaging platforms, and sensitive services. You can&#8217;t ignore that. But even if the security issues were solved, universal assistants face a deeper structural problem: they assume people need an assistant in the traditional sense.</p><p>Walk through what even a well-executed version of the same product model looks like.</p><p>Smooth onboarding. Crystal-clear use cases. High-quality AI responses. Clean interface design. 
Users know exactly what to ask and how to ask it.</p><p>You still have to leave whatever you&#8217;re working on to use it. And when you do, the context you were carrying &#8212; the code you were reviewing, the initiative you were drafting, the design decision you were working through &#8212; is no longer present. You&#8217;ve moved somewhere that knows nothing about any of that.</p><p>So you explain. &#8220;I&#8217;m working on the &#8216;YYY&#8217; data ingestion initiative, and I need to check whether the points Mark raised in Tuesday&#8217;s meeting are addressed in the current design.&#8221; The assistant doesn&#8217;t know what &#8216;YYY&#8217; is. Doesn&#8217;t have Tuesday&#8217;s meeting. Doesn&#8217;t know Mark, the current design, or the organizational context that makes &#8220;addressed&#8221; mean something specific. You load all of it by hand.</p><p>In demos, this overhead is invisible. Demo tasks are self-contained by design &#8212; the context fits in a sentence or two. In practice, your working context isn&#8217;t self-contained. It&#8217;s weeks of accumulated decisions, relationships, dependencies, and constraints that live distributed across your tools. You can&#8217;t paste it into a chat window. You can&#8217;t even fully articulate it. 
It&#8217;s partially tacit, partially in documents, partially in the history of the tool you&#8217;re using.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!skTt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!skTt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 424w, https://substackcdn.com/image/fetch/$s_!skTt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 848w, https://substackcdn.com/image/fetch/$s_!skTt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 1272w, https://substackcdn.com/image/fetch/$s_!skTt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!skTt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png" width="1024" height="641" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1522333,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/190672571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff92dde62-56c6-4fa2-ab9a-1147e8c99362_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!skTt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 424w, https://substackcdn.com/image/fetch/$s_!skTt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 848w, https://substackcdn.com/image/fetch/$s_!skTt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 1272w, https://substackcdn.com/image/fetch/$s_!skTt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a50911a-4ef2-491e-8d2e-01234b0fec77_1024x641.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Session That Clarified It</h2><p>A few weeks into the new role at Duetto, I was doing product management work in a <a href="https://github.com/bobmatnyc/claude-mpm">claude-mpm</a> session &#8212; reviewing open initiatives, managing the PR queue, creating proposals. Standard operational work for a new CTO getting oriented.</p><p>I wanted to add an infrastructure initiative. Cloud Dev Sleds &#8212; dedicated cloud development machines for the engineering team. The context was in a meeting I&#8217;d had the day before. 
In the old workflow, this would mean: switch to Granola, find the right meeting, read the transcript, extract the relevant points, switch back, and then write the initiative with that context now loaded in my head rather than in the tool.</p><p>Instead I just asked: &#8220;Review my meeting with Mark yesterday in Granola to get context. I want to create the initiative as a feasibility, cost, and LOE assessment.&#8221;</p><p>The tool pulled the notes. I created the initiative. The product context &#8212; what other infrastructure work was in flight, what the team structure looked like, what the related architectural decisions were &#8212; never left. The Granola content landed inside that context rather than requiring me to carry it manually between tools.</p><p>Same session: needed to check whether I had a conflict for an upcoming demo. Calendar check, without opening Google Calendar.</p><p>Same session: the team needed a status update. Posted directly to the engineering Slack channel, with proper <code>&lt;@USERID&gt;</code> mentions so people actually got notified. The message reflected the same initiatives I&#8217;d been working on all session &#8212; not because I copy-pasted anything, but because the tool already knew what was in flight.</p><p>Later: set up a Notion sync &#8212; initiative statuses with links to the docs, updated automatically.</p><p>The efficiency argument is real but secondary. The more important thing is that the product context never left. The tool knew what initiatives existed, who owned what, what the architectural decisions were, which PRs were waiting on which engineers. When I pulled Granola notes, they arrived inside that context. When I posted to Slack, the message was informed by that context. A universal assistant would have required me to reconstruct and transport that context manually every time I needed to cross a tool boundary.</p><p>No universal assistant is going to have that work knowledge. 
Not because the AI isn&#8217;t capable. Because the knowledge lives in the tool, accumulated over months &#8212; PRDs, design decisions, initiative history, team assignments, the proposals that got approved and the ones that didn&#8217;t. You don&#8217;t recreate that in a chat window.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AP4m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AP4m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 424w, https://substackcdn.com/image/fetch/$s_!AP4m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 848w, https://substackcdn.com/image/fetch/$s_!AP4m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 1272w, https://substackcdn.com/image/fetch/$s_!AP4m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AP4m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png" width="1024" height="647" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:647,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1372751,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/190672571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257a2fb4-ff26-4bb3-8f3e-82d9973a7a60_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AP4m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 424w, https://substackcdn.com/image/fetch/$s_!AP4m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 848w, https://substackcdn.com/image/fetch/$s_!AP4m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 1272w, https://substackcdn.com/image/fetch/$s_!AP4m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94610dbb-00aa-45e1-93a6-e96a14419f5c_1024x647.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Deep Context Problem</h2><p>The thing that makes domain tools irreplaceable isn&#8217;t AI capability. It&#8217;s accumulated context.</p><p>A product management tool carries months of initiative history. The CTO knowledge base carries organizational decisions, vendor relationships, strategic context that builds over time. These aren&#8217;t things you can summarize in a system prompt. They&#8217;re queryable, interconnected, grounded in real artifacts. The tool has developed something like institutional memory &#8212; and that memory is what makes AI assistance inside the tool qualitatively different from AI assistance outside it.</p><p>Universal assistants are built for breadth. Any question, any domain, any task. That breadth is the pitch and also the structural weakness. 
The model that&#8217;s ready for anything is primed for nothing specifically. It has no idea that &#8220;the YYY initiative&#8221; refers to a specific ingestion redesign with a particular set of constraints, a particular set of people involved, and three months of design decisions behind it.</p><p>The inversion worth stating plainly: the tools you work in every day already have more relevant context than any assistant will. The right move is surfacing AI capabilities inside those tools, not pulling people out of those tools into a separate assistant layer.</p><p>But here&#8217;s what&#8217;s happening at the executive level. I&#8217;m finding more and more technical executives using Claude Code as a knowledge assistant &#8212; not because it&#8217;s a universal assistant, but because the amount of data and complexity it can manage far exceeds what standard off-the-shelf tools provide. The deep context problem can&#8217;t be solved with generic solutions.</p><p>For MPM, I built specific connectors: gworkspace-mcp, slack-mpm, notion-mpm, granola-mcp (the last from Granola, the others built myself because <a href="https://hyperdev.substack.com/p/mcp-was-a-brilliant-idea-but-it-needs">MCP has limitations</a>). That became as much of an &#8220;assistant&#8221; as I needed, besides izzie. No universal chat interface. Just targeted data bridges that let Claude access specific services when I&#8217;m working on something that needs their context.</p><p>The commercial evidence points in the same direction. The AI tooling products with real adoption aren&#8217;t universal assistants. Cursor put AI in the editor. Notion AI put AI in the documents. Linear&#8217;s triage put AI in the issue tracker. Each works because the AI operates inside existing context. 
The pattern is consistent enough that it&#8217;s probably not coincidence.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tqnj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tqnj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 424w, https://substackcdn.com/image/fetch/$s_!tqnj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 848w, https://substackcdn.com/image/fetch/$s_!tqnj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 1272w, https://substackcdn.com/image/fetch/$s_!tqnj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tqnj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png" width="1024" height="751" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc968550-faa1-4415-a59d-39051323dc48_1024x751.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:751,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1457550,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://hyperdev.matsuoka.com/i/190672571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26977664-13e1-43a1-b5a5-d1bf0f4eb443_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tqnj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 424w, https://substackcdn.com/image/fetch/$s_!tqnj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 848w, https://substackcdn.com/image/fetch/$s_!tqnj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 1272w, https://substackcdn.com/image/fetch/$s_!tqnj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc968550-faa1-4415-a59d-39051323dc48_1024x751.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Interface vs Infrastructure</figcaption></figure></div><h2>What I Got Wrong About My Own Bot</h2><p>I (re)built trusty-izzie as a personal assistant &#8212; natural language queries over my email and calendar history, local graph database, vector embeddings, stays on my machine. It works. But &#8220;personal assistant&#8221; was the wrong frame for where the value lies.</p><p>The thing izzie has is a grounded, real-time, locally stored representation of my professional life &#8212; people, relationships, projects, scheduling, communications history. That&#8217;s a context store. Every tool I use should have access to it without me switching to izzie to ask.</p><p>The right version of izzie isn&#8217;t the one you talk to. 
It&#8217;s the one that runs as a local MCP service &#8212; always on, queryable by anything that needs personal context. The product management tool asks it about scheduling. The writing environment surfaces relevant prior conversations. The coding environment knows who owns what system before I have to explain it. None of that requires me to open izzie. It requires izzie to be infrastructure rather than interface.</p><p>If you want to try izzie yourself: <a href="https://izzie.bot/">izzie.bot</a> has the details, and the full source is at <a href="https://github.com/bobmatnyc/trusty-izzie">github.com/bobmatnyc/trusty-izzie</a>. I strongly recommend building from source using an agentic coder to verify the code is safe &#8212; never trust AI tooling with your personal data without auditing it first.</p><p>Not there yet. But the frame shift changes what to build next.</p><h2>What the Architecture Looks Like</h2><p>If you&#8217;re building a personal AI tool, the question isn&#8217;t &#8220;what will users ask the assistant?&#8221; It&#8217;s &#8220;where do users have context, and how do you bring assistance there without making them leave?&#8221;</p><p>The test is simple. Does using your tool require leaving the context where the relevant information lives? If yes, you&#8217;re fighting the architecture. Users will use it occasionally, for low-friction tasks. They won&#8217;t build their workflow around it.</p><p>The tools that pass the test: Claude Code (your codebase is the context), Cursor (you stay in the editor), Notion AI (you stay in the document), Linear AI triage (you stay in the issue tracker). The tools that fail it: every standalone AI assistant that requires opening a new interface and re-explaining what you&#8217;re working on.</p><p>For domain tools with real depth &#8212; months of accumulated decisions, relationships, history &#8212; the connectors are the product. The LLM orchestration is the interface layer. 
The accumulated context is what no competitor can replicate by building a better general assistant. The moat isn&#8217;t the AI. It&#8217;s what the AI is operating inside.</p><p>For personal infrastructure like izzie: build the MCP service before the chat UI. The chat UI is useful and I use it. The MCP service is what makes the tool true infrastructure rather than one more thing to switch to.</p><p>The universal assistant category isn&#8217;t going to produce a winner because the category is structured wrong. The capabilities will get absorbed by the tools where the relevant context lives &#8212; because that&#8217;s where the value is, and users will figure that out even if product teams don&#8217;t. The infrastructure driving this &#8212; entity and relationship detection, email, calendar, and task management (all built for izzie) &#8212; will likely be delivered by the personal productivity tool providers (hello Google).</p><p>Clawd Bot wasn&#8217;t a failed product. It was wildly popular, but I suspect it will prove a flash in the pan once the shininess wears off and the liabilities outweigh the usefulness. That distinction matters, because if you think it&#8217;s an execution problem, you go looking for a better universal assistant. If you understand it&#8217;s a conceptual problem &#8212; that most &#8220;assistant&#8221; work is intelligent data movement &#8212; you build infrastructure instead of interfaces.</p><div><hr></div><p><em>Bob Matsuoka is CTO of <a href="https://www.duettocloud.com/">Duetto</a> and writes about AI-powered engineering at <a href="https://hyperdev.substack.com/">HyperDev</a>.</em></p><p><strong>Related reading:</strong></p><ul><li><p><a href="https://hyperdev.matsuoka.com/p/what-does-a-pattern-master-actually">What Does A Pattern Master Do</a>? 
&#8212; The role of expertise in AI development</p></li><li><p><a href="https://aipowerranking.com/">AI Power Ranking</a> &#8212; Tool comparisons and benchmarks for AI practitioners</p></li><li><p><a href="https://www.linkedin.com/newsletters/ai-power-ranking-7345782916301418496/">LinkedIn Newsletter</a> &#8212; Strategic AI insights for CTOs and engineering leaders</p></li></ul>]]></content:encoded></item></channel></rss>