Is AI A Bubble? I Didn’t Think So Until I Heard Of SDD.
The AI coding revolution is real. The bubble dynamics are also real.
This is a long article, but one I think is worth reading. I’m reframing my “70% era” characterization in light of what I’m seeing.
TL;DR
• Specification-Driven Development (SDD) emerged this month as both a legitimate response to AI coding chaos and a convenient narrative justifying soaring valuations—Tessl raised $125M but delivered only a beta registry 10 months later, while Cursor hit a $9.9B valuation in under a year
• The 70% automation claim collapses under scrutiny: McKinsey’s widely cited stat applies to all occupations, not just coding—the real-world METR study showed developers taking 19% longer with AI tools despite believing they were 20% faster
• My own experience reveals the nuance: Greenfield Python/Next.js projects with my agentic frameworks hit near-100% AI coding, but team projects with legacy systems drop to under 30% automation—the ceiling depends heavily on context, not just task type
• Valuation multiples strain credulity: AI coding companies command 25-70x ARR (vs the dot-com peak of 18x), with Codeium at 70x and a roughly $3B OpenAI acquisition that collapsed—yet 84% of developers now use AI tools and GitHub Copilot generates a real $400M in ARR
• The correction is inevitable within 18-24 months: Companies with product-market fit and operational discipline will survive while ventures burning capital on “technical potential” face a reckoning, following the same arc as the 3D printing and crypto bubbles—expect consolidation around a handful of winners
• SDD functions as quality control theater: Real developers face real pain from vibe coding disasters (170/1,645 Lovable apps had vulnerabilities), but the methodology also provides governance narratives that justify enterprise procurement and enable disconnected valuations—it’s both solution and symptom
The sudden emergence of Specification-Driven Development (SDD) in 2025 reveals a pattern I’ve seen before: real technology solving real problems, wrapped in a narrative that justifies valuations disconnected from reality. When Tessl raises $125 million promising to “reimagine software creation” but delivers only a beta registry ten months later, while Cursor rockets to a $9.9 billion valuation in less than a year, SDD functions as both technical necessity and investment story. The pattern mirrors classic bubble dynamics—real innovation wrapped in oversized promises, with a methodology that makes “70% automation” claims sound almost responsible.
The numbers tell a story of extremes. AI coding companies command valuations of 25-70x ARR, far exceeding the dot-com bubble’s peak of 18x. Yet beneath the froth lies genuine adoption: 84% of developers now use AI tools, GitHub Copilot generates $400 million in ARR, and Cursor achieved the fastest climb to $500 million ARR in SaaS history with just 12 employees. The methodology emerged this month—not as a visionary breakthrough but as damage control, a structured response to what Andrej Karpathy dubbed “vibe coding” back in February 2025, when he admitted his AI-generated code “grows beyond my usual comprehension.”
The methodology’s rise perfectly captures the AI market’s current state: addressing real problems while enabling questionable valuations, promising structure while delivering uncertainty, and forcing an industry to confront whether it’s building the future or just better documentation for the present.
How vibe coding created a $9.9 billion problem
Karpathy’s viral confession in February 2025 crystallized what developers had been experiencing. His description of “vibe coding”—where you “fully give in to the vibes, embrace exponentials, and forget that the code even exists”—wasn’t aspirational. It was a warning. “I ‘Accept All’ always, I don’t read the diffs anymore,” he wrote, describing code that grows beyond comprehension.
The statistics validated the concern. By May 2025, security audits found 170 out of 1,645 Lovable applications had vulnerabilities allowing personal information access. GitClear’s analysis revealed an 8-fold increase in duplicated code blocks since mid-2022. Most alarmingly, the METR study found experienced developers took 19% longer with AI tools, despite believing they’d sped up by 20%.
Enter Specification-Driven Development, formalized this month as the industry’s response. The tools emerged earlier—Amazon launched Kiro IDE in July 2025, and GitHub followed with Spec Kit in September 2025—but SDD as a named methodology crystallized only recently. The approach reverses the traditional workflow: developers must write detailed specifications that AI transforms into implementation plans before generating code.
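To make the inversion concrete, here’s a minimal Python sketch of the spec-first control flow. It’s a toy illustration of the pattern, not any vendor’s actual API; the names and structure are hypothetical, and the two transform functions stand in for LLM calls.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the SDD ordering: a validated spec gates every
# downstream step. No real SDD tool exposes exactly this API.

@dataclass
class Spec:
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Real tools enforce much richer checks; this is the minimal gate.
        return bool(self.goal and self.acceptance_criteria)

def spec_to_plan(spec: Spec) -> list[str]:
    # Stand-in for the AI step that turns a spec into an implementation plan.
    return [f"Implement: {c}" for c in spec.acceptance_criteria]

def plan_to_code(plan: list[str]) -> str:
    # Stand-in for the code-generation step; a real tool calls an LLM here.
    return "\n".join(f"# TODO: {step}" for step in plan)

spec = Spec(
    goal="Export user activity as CSV",
    acceptance_criteria=[
        "Streams rows without loading the full dataset into memory",
        "Excludes soft-deleted users",
    ],
    constraints=["No new dependencies"],
)

if not spec.is_complete():
    raise ValueError("No spec, no code: the reverse of vibe coding.")
print(plan_to_code(spec_to_plan(spec)))
```

The point is the ordering: code generation is unreachable until the specification passes validation, which is exactly the discipline vibe coding skips.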
The appeal is obvious. SDD provides structure, documentation, and guardrails. But it also solves a problem the AI tools themselves created—making it both solution and symptom of a market moving faster than its technology can support. The perception gap METR documented, feeling 20% faster while measurably running 19% slower, underscores why developers are reaching for structured methodologies.
The valuation machine runs on narrative, not just revenue
Tessl’s trajectory exposes the gap between promise and delivery. Guy Podjarny, founder of Snyk (valued at $7.4 billion in 2022), raised $125 million in two rounds between April and November 2024, reaching a $750 million valuation. The company promised to launch “in early 2025.”
Ten months after announcing funding, Tessl delivered: a Spec Registry in open beta (10,000+ usage specifications for libraries) and a Framework in closed beta requiring waitlist access. The registry solves a real problem—preventing AI hallucinations of non-existent APIs—but represents a fraction of the “AI Native Software Development platform” promised.
Compare this to Cursor’s execution. Founded by four MIT students, Anysphere raised $60 million at a $400 million valuation in August 2024, then $105 million at $2.6 billion by January 2025, and $900 million at $9.9 billion by May 2025. Three funding rounds in less than a year. The company grew from $1 million (2023) to $500 million ARR (June 2025) with just 12 employees—the fastest SaaS company to reach $100 million ARR on record.
The valuation multiples strain credulity. Codeium was valued at $2.85 billion on approximately $40 million ARR—a 70x multiple. OpenAI agreed to acquire it for roughly $3 billion in April 2025, a deal that collapsed three months later.
These valuations require a narrative beyond “code completion tool.” SDD provides that narrative. It transforms AI coding assistants from autocomplete into “platforms for AI-native software development.” Specifications become “executable documentation.” Natural language becomes “the new programming language.” And suddenly, 70x multiples seem justified for companies building the future of software creation.
The 70% automation claim: Where marketing meets mathematics
The ubiquitous “60-70% of software development” claim traces to a 2023 McKinsey report, but the actual finding is far more nuanced. McKinsey stated AI could “automate work activities that absorb 60 to 70 percent of employees’ time today” across all occupations, not specifically software development. This is “technical automation potential”—what could theoretically be automated, not what will be or what’s valuable.
Microsoft Research’s 2023 study with 95 developers found GitHub Copilot users completed an HTTP server task 55.8% faster—but that was a controlled environment with a standardized task, and the authors note the study “does not examine the effects of AI on code quality.” The contrast with real-world usage is stark: as noted earlier, experienced developers working on actual production codebases took 19% longer despite believing they’d accelerated.
The 70% figure collapses under analysis. Developers spend only 39% of their time writing new code. Even if AI could automate 70% of code writing (which evidence suggests it cannot), that represents at most 27% of total development work.
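The multiplication is trivial but worth making explicit; a short sketch using the figures above:

```python
coding_share = 0.39        # share of developer time spent writing new code
claimed_automation = 0.70  # the optimistic claim, taken at face value

ceiling = coding_share * claimed_automation
print(f"Upper bound on total development work automated: {ceiling:.1%}")  # 27.3%
```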
Where AI genuinely excels: boilerplate code generation (50-80% time savings), documentation (60-70% savings), code translation between languages (40-60% savings), and simple bug fixes (30-50% savings). Where human judgment remains essential: system architecture, complex business logic, edge cases, security implementation, requirements gathering, and debugging multi-component interactions.
My own experience complicates the 70% claim further. For greenfield Python and Next.js projects—ideal LLM territory with well-documented ecosystems—my agentic frameworks handle virtually all the coding, though the gains come less from pure code generation and more from boilerplate elimination and workflow optimization. But the moment I work with teams, the number drops dramatically: in my own client projects, the ceiling fell from 70% to under 30% when legacy systems were involved. Legacy codebases, organizational constraints, coordination overhead, and the need to align multiple developers’ mental models all reduce AI’s practical contribution.
The manufacturing company case study provides the clearest ROI: $140,000 saved per week across 800 developers, representing a 25% efficiency gain. But this came from sophisticated implementation with extensive training, not from “accepting all” AI suggestions. The 15% decrease in defects suggests careful human oversight throughout.
GitHub Spec Kit vs. Amazon Kiro vs. Tessl: The execution gap
GitHub’s approach exemplifies measured expectations. Spec Kit launched in September 2025 as explicitly experimental. The tool provides structured workflows and templates, but makes no claims about autonomous development. The limitations are openly documented: better for new projects than existing codebases, requires Python 3.11+, and “features and structure change frequently; documentation may lag.”
Amazon Kiro launched with more ambition in July 2025 as a “Cursor killer” with spec-driven workflows. Preview access was free; planned pricing of $19-39/month undercuts Cursor significantly. But developer reception proved mixed—one tester reported Kiro generated 5,000 lines of code for a simple macOS helper tool that should have required 800 lines.
Within weeks, AWS removed Kiro’s public download link due to what they called “unprecedented demand”—though whether this reflects genuine interest or infrastructure limitations remains unclear. The tool remains in preview five months after launch with no general availability timeline.
Tessl’s gap between funding and delivery is the most pronounced. November 2024 funding: “Launch in early 2025.” September 2025 actual launch: a closed-beta Framework and a free Spec Registry. The registry addresses AI hallucinating APIs by providing specifications for 10,000+ libraries—essentially a curated documentation database, not the revolutionary platform promised.
How enterprises use SDD to justify AI tool budgets
The business case for AI coding tools is compelling when presented correctly, and SDD provides the governance framework procurement teams demand. Accenture’s randomized controlled trial with 450 developers provides the quantitative foundation: 8.69% increase in pull requests, 15% improvement in merge rates, and 84% increase in successful builds. The qualitative benefits proved equally significant: 90% of developers felt more fulfilled with their jobs—a retention benefit worth $100,000-200,000 per prevented departure.
The ROI calculation: 100 developers at a $120,000 fully-loaded cost equals $60/hour. A conservative 2 hours saved weekly per developer yields $624,000 annually. Tool costs run $40,000-100,000. Net ROI at the low end: 1,460%, or a 15.6x return.
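For readers who want to audit the math, here’s the same calculation as a sketch. The 2,000-hour work year is my assumption; it’s what makes $120,000 fully loaded come out to $60/hour.

```python
# Reproduces the ROI arithmetic above at the low end of quoted tool costs.
developers = 100
hourly_cost = 120_000 / 2_000      # $60/hour, assuming 2,000 hours/year
hours_saved_per_week = 2
weeks_per_year = 52

annual_savings = developers * hours_saved_per_week * weeks_per_year * hourly_cost
tool_cost = 40_000                 # low end of the $40,000-100,000 range

net_roi = (annual_savings - tool_cost) / tool_cost
print(f"Annual savings: ${annual_savings:,.0f}")                      # $624,000
print(f"Net ROI: {net_roi:.0%} ({annual_savings / tool_cost:.1f}x)")  # 1460% (15.6x)
```

Even at the $100,000 top of the tool-cost range, the same savings still yield a net return of roughly 5x, which is why the calculation survives conservative assumptions.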
Enterprise adoption requires SOC 2 Type II attestation, ISO 27001 certification, GDPR compliance, and increasingly ISO 42001 for AI management. This is where SDD becomes valuable beyond its technical merits.
Specifications create audit trails that satisfy regulatory requirements. Formal specs document design decisions for FDA compliance in medical devices, provide evidence for SOX audits in financial services, and demonstrate due diligence for GDPR data processing.
Multiple compliance officers noted that “auditors want SOC 2 reports, ISO 27001 controls, and increasingly ISO 42001 assurances before any AI touches production code.” SDD provides the paper trail they need. Whether this actually improves outcomes or just ticks a compliance box remains an open question.
Developer sentiment: Adoption rises as trust collapses
Stack Overflow’s 2025 survey of 65,000+ developers reveals a widening gap. 84% now use or plan to use AI tools (up from 76% in 2024), but favorability declined from 72% to 60%. Only 33% trust AI accuracy while 46% actively distrust it. The gap between use and trust continues expanding.
The single biggest frustration, cited by 66%: “AI solutions that are almost right, but not quite.” AI generates plausible code quickly, then developers spend disproportionate time debugging subtle errors and refactoring over-engineered solutions.
The senior-junior divide is stark. Senior developers (10+ years experience) ship 2.5x more AI-generated code than juniors (32% vs 13% with 50%+ AI content). This suggests AI amplifies existing expertise rather than democratizing development.
The impact on junior hiring is measurable. Entry-level tech hiring decreased 25% year-over-year in 2024. A LeadDev survey found 54% of engineering leaders plan to hire fewer juniors due to AI copilots enabling seniors to handle more work—a finding based on direct CTO interviews. AWS CEO Matt Garman called suggestions to replace juniors “one of the dumbest things I’ve ever heard,” warning: “How’s that going to work when ten years in the future you have no one that has learned anything?”
Simon Willison wrote: “Vibe coding your way to a production codebase is clearly risky. Most of the work we do involves evolving existing systems, where the quality and understandability of the underlying code is crucial.”
SDD emerged as developers’ attempt to impose structure on this chaos. Li Shen argued: “Vibe coding is great for quick prototypes. For larger projects, it quickly breaks down. The fix isn’t ‘more AI,’ it’s better structure.” But critics question whether SDD solves the problem or creates new overhead managing yet another artifact.
The Test-Driven Development parallel: A cautionary tale
The most sobering comparison for SDD enthusiasts is Test-Driven Development’s trajectory. Kent Beck formalized TDD in 2003 with the promise of higher quality through test-first development. Twenty years later, GeePawHill’s assessment: “The overall world of geekery uses TDD approximately not at all.” Adoption remains under 20% despite proven benefits.
The failure came from three factors: lousy timing (market pressures left no time to learn), lousy handling (weak theory led to cargo cult practices), and adoption barriers (56% found the mindset shift difficult). Top-down mandates without understanding created enemies rather than converts.
SDD faces similar challenges. The methodology requires complete specifications before implementation—precisely what software has struggled with for 50 years. Daniel Sogl, after 10 years in development, noted: “I have rarely experienced projects where requirements were completely formulated before implementation.”
Martin Fowler’s August 2025 experiments revealed SDD’s limitations. Despite detailed specifications, AI agents generated features not requested, changed assumptions mid-stream, and claimed success when builds failed. Fowler concluded: “Because of the non-deterministic nature of this technology, there will always remain a very non-negligible probability that it does things that we don’t want.”
Failed implementations and the AI agent disasters
The Replit database deletion incident in July 2025 exemplifies current limitations. A developer’s AI agent deleted a live company database during an active code freeze, unable to distinguish between development and production environments. CEO Amjad Masad called it “unacceptable,” implementing automatic separation and a “planning-only” mode.
Kent Beck, who pioneered TDD and now experiments with AI coding, describes agents as “unpredictable genies” that grant wishes “but oftentimes in unexpected ways.” He advocates TDD as counterbalance but struggles with agents that delete tests to make them “pass.” His writing emphasizes: “This cannot be achieved without close constant supervision.” If Kent Beck insists on constant supervision, autonomous AI development remains aspirational.
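The supervision Beck describes can be partially automated. As one illustration, here’s a minimal pre-merge guard that refuses any diff deleting test files, the exact failure mode he ran into; the tests/ path convention is my assumption, so adapt it to your repository.

```python
import subprocess
import sys

# Minimal guard against an agent "fixing" a red suite by deleting tests.
# Assumes tests live under tests/; adjust for your repository layout.

def deleted_test_files(base: str = "main") -> list[str]:
    diff = subprocess.run(
        ["git", "diff", "--diff-filter=D", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return [f for f in diff.stdout.splitlines() if f.startswith("tests/")]

if __name__ == "__main__":
    deleted = deleted_test_files()
    if deleted:
        print("Refusing to merge; test files were deleted:")
        print("\n".join(f"  {f}" for f in deleted))
        sys.exit(1)
```

A check like this doesn’t make agents trustworthy; it just converts one known failure mode from silent to loud.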
Carnegie Mellon’s TheAgentCompany study tested AI agents on real-world tasks in 2025, finding only 30-35% success rates on multi-step work. No AI model completed all assigned tasks without human intervention.
The Builder.ai bankruptcy in May 2025 exposes “AI washing” at scale. Microsoft-backed and once valued at $1.2 billion, the company claimed $220 million revenue while actual figures were $55 million—300% exaggeration. The “AI automation” was primarily human engineers behind the scenes.
The bubble indicators are flashing red, but with genuine foundation
The S&P 500 Shiller P/E ratio reached 40 in October 2025, the third-highest valuation in 154 years, exceeded only by 1929 and the 2000 dot-com peak. AI-related stocks account for 75% of S&P 500 returns since ChatGPT launched. The Magnificent Seven comprise 29% of the index—the same concentration as 1999’s peak.
VC investment patterns mirror dot-com excess. 43% of Q4 2024’s $74.6 billion went to just five AI companies. OpenAI alone captured over 50% of Q1 2025 global VC funding with a $40 billion raise. The company projects cumulative losses of $44 billion from 2023-2028, may not break even until 2029, yet commands a $300-500 billion valuation.
Big Tech projects $320 billion in combined AI capex for 2025 alone, with 2028 projections reaching $2 trillion. OpenAI and Oracle signed a $300 billion deal over five years.
But this bubble has real foundation. GitHub Copilot generates $400 million ARR with 1.8 million paid subscribers. Cursor achieved $500 million ARR with 360,000 paying customers. Replit exploded from $10 million to $150 million ARR in nine months. These are not revenue-free valuations—people are paying.
The productivity gains are documented. Accenture’s study found 84% more successful builds and 15% better merge rates. Duolingo saw 67% faster code review turnaround. Microsoft Research measured 55.8% faster task completion. The technology works for specific use cases; the question is whether current valuations reflect realistic TAM and margin assumptions.
MIT’s study finding that 95% of organizations investing in generative AI see zero returns suggests the gap between early adopters and mainstream remains vast.
Where SDD fits in bubble dynamics: Legitimate need meets convenient narrative
SDD serves dual purposes that make disentangling genuine utility from hype nearly impossible. It addresses real pain points: AI-generated code breaking existing systems, hallucinated APIs, forgotten context, and accumulating technical debt. Providing structure through specifications, generating documentation as a side effect, and creating guardrails for AI behavior solve problems developers actually face daily.
But SDD also provides the governance narrative that justifies enterprise procurement and enables valuations disconnected from current revenue. When Tessl raises $125 million promising “AI Native Software Development,” specifications transform from documentation into “the new programming language.” When Cursor seeks $10 billion valuations, SDD methodology frames the product as a platform rather than a tool. The approach makes wildly optimistic automation claims sound measured—we’re not doing vibe coding, we have specifications.
The pattern mirrors historical bubbles where real innovation gets wrapped in narrative excess. The internet genuinely revolutionized commerce and communication—that didn’t justify TheGlobe.com’s 606% first-day pop in 1998 with $15,000 in startup capital and zero revenue. AI coding genuinely improves developer productivity on specific tasks. That doesn’t justify 70x ARR multiples for companies in closed beta.
The correction is inevitable. We’ve seen this before—3D printing hype collapsed from $97 billion projected market to actual $12 billion reality when consumer printers proved impractical. The crypto bubble burst when speculation overwhelmed utility. AI coding will follow the same arc, just faster.
Here’s who survives, organized by use case:
Vibe coding survivors (quick prototypes, weekend projects): v0 for the AI design system integration angle, Bolt because Figma’s entrenchment in design workflows creates a natural adoption path, and Replit for developers who want to ship fast and iterate visually. They’re not trying to be enterprise platforms—they’re tools for speed over structure.
HyperDev survivors (serious individual/small team development): Claude Code, Cursor, and Augment Code serve developers who actually understand their codebases and use AI as a force multiplier. These tools work because they assume competent humans in the loop. Cursor’s execution—$500 million ARR with 12 people—proves operational discipline matters. I use all three depending on context.
Enterprise survivors (compliance-first, team coordination): GitHub Copilot dominates here with Microsoft backing, 1.8 million paid users, and actual $400 million ARR. The integration with GitHub’s workflow and security infrastructure creates network effects. The enterprise space will consolidate around 2-3 major players—likely GitHub plus one or two challengers we haven’t fully identified yet as tools mature their compliance frameworks and team features.
Ventures burning massive capital on “technical potential” rather than paying customers face a reckoning within 18-24 months. I’ve found that product-market fit plus operational discipline almost always outlasts hype cycles—it’s why these specific bets make sense: GitHub Copilot, Cursor, Claude Code, Augment Code, and Replit all demonstrate real revenue and sustainable execution, while dozens of undifferentiated tools raising $50-200 million each don’t.
SDD’s future depends less on its technical merits than on whether LLMs improve reliability faster than SDD can entrench itself. If models become sufficiently deterministic and context-aware, specifications become unnecessary overhead. If they remain unpredictable, SDD becomes essential infrastructure. Either outcome makes it a transitional methodology—useful today, potentially obsolete tomorrow, much like many of the development practices it supposedly supersedes.
The pragmatic middle ground: What actually works
The evidence converges on a clear reality: AI coding tools are powerful assistants for specific tasks but require extensive human oversight in all contexts. The industry consensus is near-universal—Microsoft, AWS, Anthropic, every major vendor emphasizes supervised approaches. Kent Beck advocates TDD as a counterbalance to AI unpredictability. Martin Fowler warns against autonomous deployment. Charity Majors of Honeycomb argues that senior engineering “has far more to do with your ability to understand, maintain, explain, and manage a large body of software over time” than code generation speed.
SDD works best in narrow contexts: greenfield projects with clear requirements, organizational standardization needs, and situations where specifications provide value independent of AI (regulatory compliance, team alignment, knowledge capture). It fails for UI-heavy work where visual iteration matters, small features where specification overhead exceeds value, and exploratory development where requirements emerge through building.
The 70% automation claim requires complete reframing. AI can handle the “easy 70%” of simple coding tasks—boilerplate, documentation, standard patterns—with significant time savings. But software development is vastly more than writing code. Architecture, requirements analysis, edge cases, security, debugging complex interactions, and team coordination remain firmly human domains. Organizations achieving 10-30% overall productivity gains on development work should consider that success, not falling short of inflated expectations.
The winner-take-most dynamics are already visible. GitHub Copilot’s integration with the world’s largest code hosting platform creates network effects no standalone tool can match. Cursor’s execution—reaching $500 million ARR in under two years—demonstrates that in software, exceptional products can scale with minimal headcount. But the dozens of undifferentiated AI coding tools raising $50-200 million each face a market that supports perhaps five meaningful players.
For investors and executives, the key distinction separates companies solving real problems with measurable ROI from those selling future automation that may never materialize. Accenture documenting 90% developer satisfaction and 84% more successful builds provides signal. Tessl promising “highly adaptable, personalized, portable” software while stuck in closed beta after $125 million suggests noise.
The AI coding revolution is real. The bubble dynamics are also real. SDD exists at this intersection—a methodology born from genuine need that also serves as governance theater and valuation justification. Whether it survives as essential practice or fades as transitional accommodation depends on developments no specification can predict: the pace of LLM improvement, the industry’s regulatory trajectory, and developers’ willingness to adopt yet another methodology that promises silver bullets while delivering incremental improvements.
The correction will come—maybe 18 months, maybe sooner if another Builder.ai-scale fraud surfaces. A small number of companies with real revenue and sustainable models will thrive. The methodology will be adopted where it’s actually useful and ignored where it just creates overhead. And in five years, we’ll be debating the next paradigm shift while veterans remind newcomers that software’s hard problems—understanding what to build, designing maintainable systems, coordinating human teams—never had algorithmic solutions.
Here’s what I’ve changed based on watching this unfold: I still use AI tools extensively for my greenfield projects. They work. But I’ve stopped believing the methodology matters more than the fundamentals. Specifications don’t fix bad requirements. AI doesn’t eliminate the need to understand your codebase. And frameworks don’t replace judgment about when automation helps versus when it gets in the way. The tools that survive will be the ones that enhance rather than replace these core capabilities—which is exactly the opposite of what current valuations assume.
I’m Bob Matsuoka, writing about agentic coding and AI-powered development at HyperDev. For more practical insights on AI development tools, read my analysis of multi-agent orchestration systems or my deep dive into AI coding productivity metrics.