TL;DR
• Traditional IDE features (autocomplete, debugging, refactoring) actually help developers learn, according to a controlled UC San Diego study. The IDE itself isn’t the villain.
• AI code generation layered on top of IDEs is producing the first measurable evidence of cognitive atrophy: developers 19% slower with AI tools while believing they were 20% faster (METR study), 41% more bugs (Uplevel), 30% more static analysis warnings (Carnegie Mellon).
• The perception-reality gap spans roughly 40 percentage points. Developers, external ML experts, and managers all predicted AI would speed things up. Everyone was wrong in the same direction.
• The deeper problem isn’t deskilling experienced developers. It’s “never-skilling” an entire generation whose learning period coincides with ubiquitous AI assistance. HackerRank reports lead developer hiring grew 22% YoY while entry-level grew only 7%.
• CLI-based agentic tools and Specification-Driven Development (SDD) / Ticket-Driven Development (TkDD) workflows force a different cognitive mode: thinking before coding, specifying intent, supervising execution. The IDE’s tight feedback loop encourages the opposite: accept, move on, don’t think too hard.
I’ve been thinking about something that keeps showing up in my conversations with CTOs and engineering leads. It usually sounds like this: “My senior engineers love Claude Code. My mid-level engineers refuse to leave Cursor. And my juniors can’t function without inline AI suggestions.”
That pattern bothered me enough to spend time researching the issue. What I found changes how I think about the CLI-vs-IDE debate. Turns out it isn’t about tool preferences or terminal elitism. The question is whether the comfortable, suggestion-rich environment of modern IDEs is actively undermining the cognitive skills that matter most in an era of agentic AI.
Here’s what the evidence says: it depends on which layer you’re talking about. And the distinction matters more than most developers realize.
The IDE didn’t make you dumb (but it set the stage)
Let’s start with what might be a surprising finding. A 2024 UC San Diego study ran a between-subjects experiment with 32 programmers using an unfamiliar Gmail Java API. Participants with traditional autocomplete enabled scored significantly higher on post-study knowledge tests (mean ~38 vs ~32 points, p ≈ 0.0079) and completed tasks 8.2% faster. The learning benefit was equivalent to roughly 7.2 years of programming experience.
That’s a big deal. Traditional autocomplete works like a searchable index. It presents options. You still choose, you still think. The study found autocomplete didn’t even reduce keystrokes significantly. Its value came from serving as an efficient information-delivery mechanism, cutting documentation reading time by 16 minutes.
So, to the surprise of this crotchety “learn-coding-before-the-IDE” developer, the IDE itself isn’t the problem. IntelliSense, syntax highlighting, integrated debugging, refactoring tools: these function as cognitive augmentation. They help you find information faster while you’re building understanding. The ACM published this research, and the authors were careful to distinguish between these features and what came next.
The warning they included reads like prophecy now: “As AI-based autocomplete tools, such as Copilot, become more popular, it will be important to re-evaluate the learning implications, since these tools may reduce the cognitive involvement of programmers.”
That re-evaluation has arrived. And the results are ugly.
AI code generation crosses the line
James Prather’s research group ran 21 eye-tracking laboratory sessions for a study presented at ICER 2024, examining how novice programmers interact with generative AI tools. They found something concerning: AI tools compound existing metacognitive difficulties and introduce entirely new failure modes they labeled Interruption, Mislead, and Progression.
Students with strong metacognitive skills benefited from AI assistance. Students who already struggled were pushed further behind, finishing with what the researchers called an “illusion of competence.” They believed they understood code they couldn’t reproduce independently. The gap between strong and weak learners widened. That’s the opposite of what educational tools are supposed to do.
A Stanford security study found developers using AI assistants wrote significantly less secure code and were simultaneously more confident it was secure. Let that sink in. Worse outcomes. Higher confidence.
Research on Copilot-generated code in GitHub projects found that 48%+ of AI-generated code contains security vulnerabilities. Actual CWEs in production repositories.
The 40-point perception gap
The single most striking finding comes from METR’s 2025 randomized controlled trial. Sixteen experienced open-source developers tackled 246 real-world tasks on mature repositories (averaging 22,000+ stars and a million lines of code) using Cursor Pro with Claude 3.5/3.7 Sonnet.
The result: developers were 19% slower with AI tools. Before starting, they predicted a 24% speedup. Afterward, they still believed they’d been 20% faster. External ML experts had predicted a 38% speedup. Everyone was wrong in the same direction, and the gap between perception and reality spanned roughly 40 percentage points.
As Sean Goedecke noted in his analysis of the METR study, this wasn’t about bad tools or bad developers. It was about the cognitive overhead of evaluating, correcting, and integrating AI-generated code eating the time savings from not typing it yourself.
This pattern shows up everywhere you look. Google’s DORA 2024 report, surveying 39,000 professionals, found every 25% increase in AI adoption correlated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. Seventy-five percent of developers reported feeling more productive. The measurements said otherwise.
A Carnegie Mellon difference-in-differences study of 807 GitHub repositories adopting Cursor found a transient velocity spike (3-4x more lines added in month one) followed by persistent quality degradation: static analysis warnings up 30%, code complexity up 41%. GitClear’s analysis of 211 million changed lines found code duplication blocks increased eightfold during 2024 and refactoring declined from 25% to under 10% of changed lines.
More code. Worse code. And developers who thought they were crushing it.
Your brain on autocomplete
The cognitive science explains the mechanism clearly enough. Betsy Sparrow’s landmark 2011 Science paper on “Google Effects on Memory” demonstrated that when people expect future access to information, they have lower recall of the information itself but enhanced recall of where to find it. The internet becomes a transactive memory partner. You remember the path to knowledge, not the knowledge.
Applied to programming: developers who rely on autocomplete may remember that a method exists in the dropdown rather than what it does. For traditional autocomplete, the UC San Diego study suggests this tradeoff is acceptable or even beneficial. But AI code generation goes much further. It doesn’t just tell you what methods exist. It writes entire implementations you may never fully comprehend.
A 2025 study of 666 participants, published in an MDPI journal, found a significant negative correlation (r = −0.75) between frequent AI tool usage and critical thinking abilities, mediated by cognitive offloading. Younger participants (17-25) showed higher AI dependence and lower critical thinking scores. Higher education served as a protective buffer, but the feedback loop was clear: AI usage increases cognitive offloading, which reduces critical thinking, which increases AI dependency.
Robert Bjork’s concept of “desirable difficulties” provides the theoretical basis for why removing struggle from programming does harm. When you type code from memory instead of accepting a suggestion, you engage deeper encoding through the generation effect. When you debug manually instead of clicking a fix-it button, you build stronger mental schemas. Research on productive failure shows students who struggle before receiving instruction outperform those who receive direct instruction first.
A 2026 Frontiers in Medicine paper distinguishes between “deskilling” (losing existing abilities) and what they call “never-skilling.” That second concept is the one that keeps me up at night. An entire generation of developers whose foundational learning period coincides with ubiquitous AI assistance may never develop the mental schemas that experienced developers take for granted. The FAA now recommends more manual flying to counter autopilot-induced skill decay. Endoscopists using AI for polyp detection saw detection rates drop from 28% to 22% when AI was turned off. This pattern shows up in every profession where automation handles the thinking.
DHH could feel it in his fingers
The industry voices on this are converging, even from people you wouldn’t expect to agree.
David Heinemeier Hansson, the creator of Ruby on Rails, put it viscerally in his 2025 Lex Fridman interview: “I don’t let AI drive my code. I’ve tried that, I’ve tried the Cursors and the Windsurfs, and I don’t enjoy that way of writing. I can literally feel competence draining out of my fingers.” He keeps AI output in a separate window to prevent passive consumption and insists on doing the typing himself because “you learn with your fingers.”
His specific example stuck with me. He discovered he was repeatedly asking AI for the same Bash conditional syntax. By not typing it, he wasn’t learning it. His analogy: “You’re not going to get fit by watching fitness videos. You have to do the sit-ups.”
Casey Muratori, whose “Clean Code, Horrible Performance” video demonstrated up to 15x performance penalties from IDE-friendly abstraction patterns, argues that modern development practices actively pessimize software by hiding how CPUs actually work. His Performance-Aware Programming course exists explicitly to teach what IDE abstractions conceal. Jonathan Blow’s 2019 talk “Preventing the Collapse of Civilization” frames the issue in starker terms: each generation of developers inherits diluted knowledge, and the accumulated abstraction layers represent civilizational risk.
The pragmatic middle ground comes from developers like ThePrimeagen, who built “99,” a Neovim AI plugin explicitly designed for “people without skill issues,” deliberately restricting AI to specific, developer-controlled areas rather than giving it full autonomy. His philosophy: AI assists, it doesn’t replace. Zed Shaw, author of the “Learn Code the Hard Way” series, advises beginners to avoid IDEs entirely during initial learning: “If you take the easy tool-based route, then you’re dependent on the tool you use.”
And HN being HN, one commenter nailed the practical consequence: “If I had a Bitcoin for every IDE superstar programmer who couldn’t navigate his way around the build system, I wouldn’t have to write software for a living.”
The survey data tells the generational story
The Stack Overflow 2025 Developer Survey (49,000+ respondents) quantifies the split. Early-career developers use AI tools daily at a 55.5% rate, compared with 47.3% for developers with 10+ years of experience. More significantly, 46% of developers now actively distrust AI output, exceeding the 33% who trust it. Only 3% report high trust. Positive sentiment toward AI tools has declined from over 70% in 2023 to 60% in 2025. Experienced developers are the most skeptical: lowest “highly trust” rate (2.6%), highest “highly distrust” rate (20%).
HackerRank’s 2025 report, drawing on 26 million developers and 3 million assessments, reveals the hiring consequence: lead developer hiring grew 22% year-over-year while entry-level hiring grew only 7%. The report explicitly cites employer concerns about whether early-career developers can code without heavy AI assistance. Stanford data shows employment among software developers aged 22-25 fell nearly 20% between 2022 and 2025.
Meanwhile, JetBrains’ 2025 State of Developer Ecosystem found 68% of developers expect AI proficiency to become a job requirement, and GitHub’s Octoverse 2025 reports that nearly 80% of new developers use Copilot within their first week. AI-assisted coding is the default learning environment for an entire generation, and we have precisely zero longitudinal studies on what that does to skill development.
Remember “vibe coding”? Andrej Karpathy coined it in February 2025: “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” The Stack Overflow survey found 72% of developers say vibe coding plays no role in their professional work. Karpathy himself retreated, admitting his “Nanochat” project was “basically entirely hand-written” because AI agents “just didn’t work well enough.”
The abstraction argument is older than you think (and this time it’s different)
Joel Spolsky’s 2002 “Law of Leaky Abstractions” remains the foundational text: “All non-trivial abstractions, to some degree, are leaky.” His most underappreciated line: “Abstractions save us time working, but they don’t save us time learning.”
Every abstraction boundary in computing history has produced this same anxiety. When FORTRAN arrived in 1957, assembly programmers viewed it as a crutch. Ed Post’s satirical 1983 essay “Real Programmers Don’t Use Pascal” captured the gatekeeping pattern so precisely it reads as prophecy of today’s Vim-vs-IDE debates. Each transition involved genuine loss of low-level understanding, genuine productivity gains, gatekeeping rhetoric from incumbents, and eventual normalization.
But the calculator analogy that defenders of AI coding tools love to invoke breaks down on closer inspection. A 2003 meta-analysis of 54 studies found calculator use did not hinder mathematical skill development. However, as computing education researcher Amy J. Ko argues, LLMs differ because they replace entire cognitive processes, not just computation. Calculators don’t hallucinate. They don’t generate plausible-but-wrong solutions. The better analogy would be handheld calculators that routinely display 2+2=5 with complete confidence.
The strongest counterargument is that every prior generation of abstraction-skeptics was ultimately wrong. The FORTRAN skeptics lost. The structured programming skeptics lost. IDE skeptics, broadly, lost. JetBrains’ learning curve survey found learners who use IDEs encounter fewer obstacles, get stuck less often, and handle version control more easily. But the question isn’t whether IDEs helped. It’s whether AI code generation is another step on the same escalator or a qualitatively different kind of abstraction that crosses from augmenting cognition to replacing it.
Based on what I’ve seen in the research, I think it’s the latter. And I think the IDE is where developers are most likely to experience this crossing without noticing it.
Why this matters for CLI and specification-driven workflows
Here’s where this connects to the broader argument I’ve been building about CLI-based agentic tools and the shift toward Specification-Driven Development (SDD).
IDE-integrated AI tools optimize for the wrong cognitive mode. They sit inside your editor, constantly suggesting, constantly completing, making it effortless to accept code you haven’t thought through. The tight feedback loop that makes IDEs feel productive is the same loop that enables the accept-and-move-on behavior the research keeps flagging. The entire UX is designed to keep you writing code faster, not thinking about code more carefully.
CLI-based agentic tools and SDD/TkDD workflows force a different cognitive mode entirely. When you’re working with Claude Code from the terminal, you can’t just tab-accept a suggestion mid-line. You have to think about what you want before you ask for it. You write specifications. You decompose tasks into tickets. You define acceptance criteria. Then you supervise execution and review results.
This is where I see the distinction between SDD and what I call Ticket-Driven Development (TkDD). SDD gives you the strategic framework: specs are the primary artifact, not code. TkDD gives you the workflow mechanics: every unit of work lives in a ticket that captures not just the requirement but the evolution of thinking during human-AI collaboration. The ticket becomes the forcing function that makes you articulate intent before the agent writes a single line.
I use mcp-ticketer for this daily. Before an agent touches code, I’ve written a ticket that specifies what I want, why I want it, what the constraints are, and how I’ll know it’s done. That process of articulation is exactly the “desirable difficulty” that Bjork’s research says builds deeper understanding. It’s the sit-ups DHH was talking about. The IDE workflow lets you skip the sit-ups. TkDD makes you do them.
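To make that concrete, here’s roughly the shape of what one of my tickets captures, sketched as a plain Python data structure. The field names and the sample task are mine for illustration, not mcp-ticketer’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    """One unit of agent work, articulated before any code gets written."""
    intent: str                  # what I want, in a sentence or two
    rationale: str               # why it matters: the context the agent doesn't have
    constraints: list[str] = field(default_factory=list)           # what must not change
    acceptance_criteria: list[str] = field(default_factory=list)   # how I'll know it's done

# A hypothetical ticket, written before the agent touches any code.
ticket = Ticket(
    intent="Add retry with exponential backoff to the webhook dispatcher",
    rationale="Transient 5xx responses from receivers currently drop events silently",
    constraints=[
        "No new dependencies",
        "Keep the public dispatch() signature unchanged",
    ],
    acceptance_criteria=[
        "Failed deliveries retry up to 5 times with jittered backoff",
        "Existing dispatcher tests pass; new tests cover the retry path",
    ],
)
```

The format isn’t the point. The point is that every field forces an articulation step before any code exists.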
Think about Boris Cherny’s workflow. The Anthropic staff engineer who created Claude Code uses Plan mode to iterate on architecture until satisfied, then switches to auto-accept mode where Claude “can usually 1-shot it.” He runs 10-15 concurrent sessions. That’s not editing code in an IDE. That’s specifying, supervising, and reviewing. The cognitive work happens before the code exists, not while it’s being suggested to you inline.
Multi-agent orchestration amplifies this effect. When you’re running five Claude Code instances across git worktrees through tmux, your job is architectural thinking and quality review. There’s no autocomplete to accept. There’s no inline suggestion to wave through. You’re operating at the specification and supervision layer, which is precisely the cognitive level the research says matters most for building and maintaining deep understanding.
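Mechanically, that setup is simple enough to script. Here’s a minimal sketch, assuming git, tmux, and the Claude Code CLI (installed as `claude`) are available; the task names and branch scheme are illustrative, not a prescribed layout:

```python
# Spin up one isolated git worktree and one detached tmux session per task,
# then attach later to supervise and review.
import subprocess

TASKS = ["auth-refresh", "webhook-retry", "docs-cleanup"]  # hypothetical ticket slugs

for task in TASKS:
    worktree = f"../wt-{task}"
    # Each agent gets its own checkout, so concurrent edits never collide.
    subprocess.run(["git", "worktree", "add", "-b", f"agent/{task}", worktree], check=True)
    # Run the agent in a detached tmux session rooted in that worktree.
    subprocess.run(
        ["tmux", "new-session", "-d", "-s", f"agent-{task}", "-c", worktree, "claude"],
        check=True,
    )
# Later: `tmux attach -t agent-<task>` to review, then merge or discard the worktree.
```

The supervision work (attaching to each session, reviewing diffs, merging or discarding worktrees) stays with me. That’s the layer where the thinking happens.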
The line between tool and crutch
The evidence doesn’t support the claim that IDEs themselves have made software engineers less capable. Traditional IDE features function as cognitive augmentation that helps developers access information faster while building understanding.
What the evidence does support, with increasing conviction, is that AI code generation represents a qualitative break from prior tooling. The convergence of findings (19% slowdown masked by perceived speedup, 41% increase in bugs, 30% rise in static analysis warnings, a −0.75 correlation between AI usage and critical thinking, and the widening gap between strong and weak learners) points to a tool category that degrades understanding while creating an illusion of competence.
The danger isn’t that experienced developers will forget how to code. It’s that the next generation won’t learn how to think about code at a level deeper than “accept suggestion.” IDEs don’t cause this problem, but they’re the delivery mechanism. The inline, always-on, friction-free suggestion environment of modern IDE-based AI is precisely optimized to bypass the cognitive processes that build expertise.
CLI-based agentic workflows and the SDD/TkDD paradigm aren’t just different tools. They’re different cognitive modes. They require you to think before you prompt, specify before you execute, and review with genuine comprehension rather than a quick scan of inline diffs.
That’s not terminal elitism. That’s responding to what the research actually shows: the developers who’ll thrive in the next phase are the ones who can think at the level of specifications, architecture, and agent supervision. Not the ones who got really fast at pressing Tab.
I’m Bob Matsuoka, writing about agentic coding and AI-powered development at HyperDev.
Related reading:
The Other Shoe Will Drop — The economics of AI-assisted development and specification-driven workflows
What’s In My Toolkit: August 2025 — My daily CLI-based agentic workflow
TkDD: Ticket-Driven Development — Why tickets are the forcing function for AI collaboration
The Age of the CLI, Part 2 — From nanny coding to fire-and-check-in