
Cognitive Atrophy: The Crisis of AI-Generated Code


The Productivity Paradox and the Illusion of Speed

The metric that matters is no longer “lines produced per hour”; it is the time a competent human needs to trust what was produced. This displacement may seem subtle, but it changes the entire economics of engineering. When an assistant generates in minutes what used to take an afternoon, the local gain is real; the problem is that the factory doesn’t end at the cutting machine. It ends at quality inspection, integration, and operations. The DORA report shows this exact mismatch: adopting AI capabilities increased individual effectiveness by 0.17x, but it also increased application delivery instability by 0.1x (Google DORA AI Capabilities Model Report, 2025). In executive terms, it’s a trade of a visible bottleneck for an invisible liability. Developers feel speed at their fingertips; organizations inherit uncertainty in the pipeline.

This paradox appears when writing becomes cheap, but understanding remains expensive. A microservice can be assembled in five minutes with automated assistance, but the full cycle often still takes five days because someone must verify business invariants, transactional side effects, test coverage, and architectural adherence. In this context, code review stops being an administrative step and starts requiring forensic-level expertise. Adam Tornhill argues in Your Code as a Crime Scene that defects and bottlenecks rarely originate only in static code; they emerge in patterns of change, churn hotspots, and areas where nobody fully understands what was altered. Mass-generated code amplifies exactly these gray zones: more apparent volume, less human context embedded. The practical result is easy to observe in mature teams: larger PRs, weaker justifications, and senior reviewers acting as technical insurers for decisions that were not made with full understanding.

There is also a structural side effect: when production accelerates without equivalent discipline of refactoring, the system begins to resemble a logistics network full of locally effective patches that are globally dysfunctional. GitClear’s longitudinal analysis of 211 million lines between 2020 and 2025 found an 8x increase in code duplication, while “moved” code (an important signal of refactoring and conscious reuse) fell from 25% in 2021 to less than 10% in 2025 (GitClear Research, 2025). This collides with a principle laid out by David Thomas and Andrew Hunt in The Pragmatic Programmer: duplication isn’t just aesthetic waste; it multiplies future cost, inconsistency, and operational risk. The tool shortens the path to the first working version, but often lengthens the entire road to maintenance.

Nicholas Carr describes in The Glass Cage a recurring mechanization pattern: the more we delegate central cognitive tasks to the system, the less we train the mental circuits needed to supervise it with rigor. In engineering this shows up when the author can trigger the generator but cannot defend the produced choices. The strategic mistake is confusing syntactic throughput with an organization’s capacity to deliver reliably. Companies that measure only time spent opening PRs are optimizing corporate behavior equivalent to stacking goods without expanding dock space, check-in capacity, or transportation. Sooner or later, inventory piles up; here that stuck inventory is technical debt disguised as seemingly productive output.

For technical leadership, the implication is straightforward: KPIs must move from enthusiasm about generation toward metrics tied to comprehension and stability. If individual effectiveness rises by 0.17x while instability grows by 0.1x (Google DORA AI Capabilities Model Report, 2025), any narrative based only on raw speed is incomplete by definition. The correct benchmark should include the average time-to-review for AI-assisted PRs, the post-merge reversal rate (rollbacks and reverts), the density of introduced duplication, and the senior effort consumed validating contributions. This repositioning also changes which profile is valued inside teams: fewer fast prompt operators; more engineers able to explain system causality under pressure.
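To make these KPIs concrete, here is a minimal sketch of how they might be computed from PR records. The record schema (the fields 'ai_assisted', 'opened', 'merged', 'reverted', 'senior_review_hours') is an assumption for illustration, not the API of any real tracker:

```python
def review_kpis(prs):
    """Compute review/stability KPIs for AI-assisted PRs.

    Each PR record is assumed (hypothetical schema) to be a dict with:
      'ai_assisted' (bool), 'opened' and 'merged' (datetime),
      'reverted' (bool), 'senior_review_hours' (float).
    """
    assisted = [p for p in prs if p["ai_assisted"]]
    if not assisted:
        return {}
    hours = [(p["merged"] - p["opened"]).total_seconds() / 3600 for p in assisted]
    return {
        # average time-to-review for AI-assisted PRs, in hours
        "avg_time_to_review_h": sum(hours) / len(assisted),
        # post-merge reversal (rollback/revert) rate
        "revert_rate": sum(p["reverted"] for p in assisted) / len(assisted),
        # senior effort consumed validating contributions
        "senior_review_hours": sum(p["senior_review_hours"] for p in assisted),
    }
```

The point of aggregating only the AI-assisted subset is to make the trade-off visible: generation speed on one side, review and stability cost on the other.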

Cognitive Atrophy and the “Junior-Year Wall” Phenomenon

If the first risk of automation is producing more than the organization can review, the second is quieter: training professionals who execute without consolidating a mental model. This is where artificial intelligence deskilling comes in: it’s not just tool dependence; it’s losing cognitive muscle in tasks that differentiate an executor from an engineer (decomposing problems, forming hypotheses, tracing causality, and debugging systems under ambiguity). Nicholas Carr describes in The Glass Cage a mechanism known in aviation and manufacturing: when machines take over difficult work for long enough, operators preserve a superficial sense of control—but lose the expertise needed to intervene when the flow leaves the script.

In software this shows up most clearly in debugging. While everything seems coherent, assistants accelerate; when a rare condition appears (a race condition, an intermittent regression, a failure between layers), many practitioners discover they outsourced not only syntax but the capacity to reason about the system itself. The study Human–AI Collaboration in Programming Education documents this pattern in academic settings with direct implications for professional training: it reports a measurable reduction in debugging proficiency among students who leaned on LLM-based assistants beyond what was necessary, along with recurrent lock-in behavior in advanced courses (MDPI, 2026). This picture helps explain the “junior-year wall”: the first two years may look productive because tools mask gaps; the third year then brings more demanding compilers, distributed systems, and concurrent architectures, or integration projects without continuous support.

The critical difference lies between using automation as a calculator and using it as a permanent prosthesis. A calculator saves time after arithmetic has been mastered; a prosthesis replaces effort before competence exists. In programming, debugging functions as structural training: it’s uncomfortable and unglamorous, and even so it builds perception of which invariants broke, where state leaks between functions, and why elegant abstractions collapse under real load. Those who debug manually learn where internal mechanisms fail; those who merely iterate prompts tend to develop a superficial relationship with technical causality.

This cognitive impoverishment also helps explain voluminous contributions whose practical authorship nobody can technically sustain over time. If junior devs haven’t trained logical decomposition or disciplined debugging early on, they arrive at corporate environments capable of producing extensive artifacts yet unable to defend them under adversarial review. The predictable outcome is shifting senior reviewers into constant forensic auditing: there’s less room for active mentorship and more time spent trying to understand missing intent.

For technical leaders and educators alike, prohibiting assistants rarely solves the problem; redesigning the criteria by which competence is recognized does. Producing functional output is no longer sufficient evidence of technical mastery. The real test is different: can the person start a solution without a crutch? Debug without immediately resorting to automatic generation? Justify trade-offs between local simplicity and systemic complexity? The “junior-year wall” matters because it anticipates a larger organizational difficulty: companies may be hiring apparent speed while losing future architectural capacity.

The Explosion of “AI Slop” and New Technical Debt

“AI slop” isn’t simply bad code; bad code has always existed. What’s new is the industrial scale: software that is plausible on the surface and executable, yet architecturally hollow. It passes superficial tests like “it works on my machine,” but fails the decisive criterion for living systems: being understandable over time, evolvable, and economically sustainable.

The most compelling data comes from GitClear’s longitudinal analysis of 211 million lines between 2020 and 2025: an 8x increase in duplication via copy/paste alongside a sharp drop in “moved” code associated with conscious consolidation and refactoring, from 25% in 2021 to less than 10% in 2025 (GitClear Research, 2025). This combination deserves careful reading because it signals a progressive loss of organizational capacity to consolidate abstractions.

Adam Tornhill offers a useful lens in Your Code as a Crime Scene: persistent defects proliferate where there is high churn, low shared understanding, and knowledge fragmentation across repositories. As the marginal cost of generating files drops close to zero for many teams, the temptation grows to replace refactoring with proliferation: accepting repeated local implementations that are “good enough” instead of extracting common components, and letting slightly divergent versions coexist until critical fixes require manual replication.
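Tornhill-style hotspot analysis can be sketched in a few lines. This is a deliberately simplified heuristic (churn count multiplied by file size), not the book’s full method, and the input format is an assumption; real pipelines would parse `git log --name-only` output:

```python
from collections import Counter

def hotspot_scores(changes, file_sizes, top=3):
    """Rank files by churn x size, a rough hotspot heuristic.

    `changes` holds one file path per file-touch (e.g. parsed from
    `git log --name-only --pretty=format:`); `file_sizes` maps each
    path to its current line count. Frequently changed, large files
    are where defects tend to concentrate.
    """
    churn = Counter(changes)
    scores = {f: n * file_sizes.get(f, 0) for f, n in churn.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Ranking by the product rather than by churn alone keeps trivially small but frequently touched files (changelogs, version bumps) from dominating the list.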

Nicholas Carr helps explain why this pattern sets in quickly: automation reduces friction precisely at the stages where human judgment used to be exercised around abstraction boundaries, so structured reuse becomes harder to internalize as part of everyday practice. The result is a new kind of technical debt: code that works but was never logically internalized by anyone who could simplify it later. DRY stops being a living practice and becomes a decorative slogan when knowledge no longer lives in a few canonical points inside the system.

Open source reactions already show concrete consequences outside abstract debate: in May 2026 the RPCS3 project tightened its guidelines after rejecting waves of AI-generated PRs with severe regressions, began requiring mandatory disclosure of the use of these tools and explicit evidence of human testing, and banned contributions classified as undeclared “AI slop” (RPCS3 on X; project GitHub Readme, 2026). In complex projects, review rarely compensates indefinitely for volume without consistent author-level understanding.

Real-World Challenges and Limitations

The central limitation isn’t the ability to generate syntax; it’s the recurring difficulty of sustaining deep architectural context over time. Legacy systems don’t behave like isolated exercises. They resemble ancient cities whose plumbing has been modified for decades under real constraints (operational incidents, historical compromises). A model may suggest an elegant renovation for a specific room while ignoring that the wall there supports invisible decisions above it.

That’s how locally correct changes produce systemic regressions in mature codebases: models “see” textual patterns and statistical relationships, while teams have to deal with accumulated causality: implicit contracts, invisible dependencies, and zones where a redundant line survives because some prior disaster taught the team how expensive removing it can be.

This fragility becomes even more evident when generated volume grows alongside erosion of conscious refactoring. GitClear’s analysis of 211 million lines from 2020–2025 recorded an 8x increase in duplication via copy/paste while “moved” code dropped drastically (from 25% in 2021 to less than 10% in 2025) (GitClear Research, 2025). This directly connects to discussions about churn: the more changes turn into superficial additions without deliberate reorganization, the higher the likelihood that defects concentrate in the same turbulent regions.

The RPCS3 case made this operational limit unavoidable: in May 2026 the project publicly rejected multiple AI-generated PRs after they introduced severe regressions. It then began requiring mandatory disclosure of tool usage and explicit proof of human testing, with summary bans for undeclared submissions classified as “AI slop” (RPCS3 on X; project GitHub Readme, 2026). The reason was that the changes seemed acceptable on a quick read: they removed apparent redundancies and simplified internal flows without understanding the delicate mechanisms involved in temporal synchronization, game-specific compatibility, and invariants accumulated over the years.

There’s an uncomfortable strategic implication here, especially relevant for legacy estates: the more critical the legacy system, the lower the marginal value of unsupervised generation and the higher the premium paid for engineering capable of explaining intent under pressure. The DORA report captured part of this shift by showing individual effectiveness gains of 0.17x accompanied by a 0.1x increase in delivery instability (Google DORA AI Capabilities Model Report, 2025). In legacy codebases this asymmetry tends to worsen, because each local acceleration amplifies the systemic cost of validation.

Nicholas Carr frames this point by suggesting that automation fails less by executing routine tasks poorly than by weakening the human vigilance required when something deviates from the script (The Glass Cage). Translated into engineering: the risk isn’t only receiving wrong suggestions; it’s losing professionals who can quickly recognize why a seemingly correct suggestion collides with invisible constraints in the real system.

The Collapse of Mentorship and the Senior-Training Paradox

A frequently underestimated aspect of this transition is the role of review in building technical maturity. Traditionally, that maturity is constructed through a slow cycle of trial, critical feedback, and correction: small PRs, specific input aimed at the junior’s reasoning, trade-offs and architectural criteria named out loud. Over time, the junior internalizes them and becomes able to explain causality rather than just deliver functional diffs.

When automatic generation dumps extensive blocks whose author barely understands what senior reviewers are looking at, those reviewers gradually stop playing an active mentoring role and instead become constant forensic auditors trying to answer: “Will this break production?”

The DORA report captured this mechanism as well: individual effectiveness gains of 0.17x, paired with a 0.1x increase in delivery instability (Google DORA AI Capabilities Model Report, 2025). In practice, the pipeline gets faster at the entrance but congested exactly where real technical knowledge transfer happens.

This collapse has a cumulative effect because it destroys the training ground for future seniors. Architectural judgment doesn’t come ready-made; it’s built through hundreds of short cycles. David Thomas and Andrew Hunt defend this explicitly in The Pragmatic Programmer, highlighting that the evolution from journeyman to master depends on direct contact with technical consequences, not just visible outcomes. If junior work becomes commoditized and tools deliver plausible solutions on demand, organizations save headcount this quarter but erode their internal school over the next five years.

Nicholas Carr describes a similar mechanism in The Glass Cage: efficient automation can preserve superficial performance while weakening the craftsmanship needed to supervise exceptions and anomalies outside the script.

A concrete example makes this paradox hard to ignore: at RSA Conference 2026, Anthropic reported through Project Glasswing that an agent found an OpenBSD flaw that had remained invisible for 27 years, the rare kind of discovery typically reserved for specialists in low-level systems security (Forbes, 2026). Technically impressive, and strategically ambiguous: if machines can perform investigations that used to require years of accumulated experience, companies will have an economic incentive to reduce full-time junior work at scale. Industry reports discussed alongside the finding suggest commoditization could drive down operational costs by up to 100x for repetitive analytical tasks (Forbes, 2026).

There is a decisive difference between hiring instant expertise that finds a historical vulnerability once and continuously manufacturing experts: forming specialists, like forming clinical residents, requires continuous residency-style learning guided by real friction.

Organizationally, this creates future scarcity precisely in the roles that are hard to improvise: architecture, offensive and defensive security, performance tuning, incident response in hard systems. Adam Tornhill points out that software sickness concentrates in zones where frequent change meets low shared understanding (Your Code as a Crime Scene). If juniors stop learning through real friction and seniors spend their days validating opaque artifacts, companies end up with a deformed age structure: tool-assisted operators at the bottom, a few trusted specialists at the top, and almost no intermediate layer able to mature enough to take responsibility for critical changes later.

That’s why the correct debate stops being “use or don’t use.” The executive question becomes whether institutional mechanisms preserve speed while still accumulating competence: smaller PRs explainable by their authors, review oriented toward causal questions rather than cosmetic prompt policies, deliberate rotation through formal debugging even when it’s harder, and metrics tied to the senior effort consumed by assisted contributions rather than to prompt aesthetics.

Cultural and Social Impacts

Decisive cultural change usually starts as identity before it becomes technical. For decades, developer prestige was tied to manual fluency: knowing APIs, writing syntax quickly, navigating frameworks without looking things up. The shift called Vibe Coding 2.0 moves the center of gravity away from syntax typists toward orchestrators: people who define boundaries, impose constraints, validate causality, and coordinate multiple automated executions while keeping an intact mental map of how the system works.

A useful analogy comes from the transition from analog aviation to highly automated cockpits: pilots stopped being judged on moving levers all day and started being judged on knowing when to trust instruments, when not to, and how to take control during anomalous conditions. In engineering, this increases the importance of architecture, decomposition, and logical auditing. It also makes clear who outsourced reasoning. Nicholas Carr had already warned that poorly absorbed automation doesn’t eliminate human work; it degrades practice until only a feeling of command remains, without true understanding of mechanisms (The Glass Cage).

At this point the academic debate between outskilling and newskilling, discussed in Communications of the ACM, gains strategic weight. Outskilling replaces fundamental human competencies before they consolidate, corroding foundations; newskilling assumes systems expand human capacity, discovering patterns and operating at levels previously inaccessible. The difference separates opposite professional cultures. In the first, engineers become managers of plausible responses generated by algorithmic third parties. In the second, they use agents as cognitive microscopes to test hypotheses rapidly and explore larger solution spaces.

The current problem is an industry rewarding volume before verifying maturity. When GitClear identified an eightfold increase in duplication between 2020 and 2025 alongside a drop in moved code, it signaled a technical and cultural shift at once: teams practicing fewer of the behaviors associated with deliberate engineering, such as consolidating abstractions, refactoring with intention, and reducing redundancy. David Thomas and Andrew Hunt defended exactly this: mature competence manifests as less unnecessary written output and a better ability to eliminate needless repetition (The Pragmatic Programmer).

In open source, the social impact appears without corporate shock absorbers. Maintainers have always acted as technical caretakers and community curators, but now they are pushed into a degrading role as auditors of probabilistic refuse. In May 2026 RPCS3 tightened its rules after rejecting waves of AI-generated PRs that caused severe regressions; it required mandatory disclosure of tool usage, explicit proof of human testing, and summary bans for undeclared submissions classified as “AI slop” (RPCS3 on X; GitHub Readme, 2026). This changes open collaboration’s social contract: voluntary or underfunded maintainers now filter material produced at almost zero marginal cost, often by users who cannot understand their own diffs well enough to explain them.

In business terms, it would be like turning senior technical councils into sophisticated spam triage: defensive, repetitive, and psychologically corrosive work.

This erosion threatens the space for intergenerational learning. Open projects used to function like public workshops: newcomers observed real patterns, received hard but instructive corrections, and gradually absorbed the implicit codes of good engineering practice while contributing over time. When the stream gets flooded with massive tool-generated submissions, communities lose pedagogical density; exhausted maintainers tend to close doors early, tightening language requirements, demanding extra proof, and reducing tolerance for error, reactions that are understandable under continuous overload.

But each added defense layer increases entry costs for legitimate contributors. There is a relevant social irony here: the promise of democratized programming could end up concentrating power among the few individuals able to audit hard code under pressure, while driving away honest but confused participants who relied on low-quality automatic submissions.

There’s also a symbolic status inversion among developers. Previously there was pride in craftsmanship and in mastering implementation details; now the culture bifurcates: a minority operates as technical conductors, specifying invariants, coordinating specialized agents, and validating integration, while the majority risks becoming nominal supervisors of outputs they do not master end-to-end.

The boundary will likely be defined less by access to tools and more by the cognitive discipline around them. Adam Tornhill showed that healthy software depends on a collective capacity to interpret patterns of change in context; that historical memory requires institutional judgment and accumulated human experience, not fast generation (Your Code as a Crime Scene). So the decisive cultural impact of Vibe Coding 2.0 will be separating, very clearly, those who govern complexity from those who merely dispatch executable text. This distinction will quickly reorganize reputation, employability, and influence within teams and technical communities.

Vibe Coding 2.0 and the Future of Architecture

The practical consequence of this transition becomes clear: the most valuable developer in the next decade tends to be the one who reduces architectural uncertainty. Writing syntax has become a low marginal-cost activity; validating invariants, drawing boundaries, controlling coupling, and explaining causality remain expensive. In functional terms, this redefines the engineer’s role as both auditor and architect.

Auditor, because they must inspect solutions to preserve business rules, non-functional properties, and implicit contracts that statistical models treat as noise. Architect, because they decide where automation accelerates discovery and where it multiplies entropy. Nicholas Carr offers the right conceptual frame: when automation takes on central tasks without preserving the exercise of human judgment, expertise deteriorates silently until it’s missing precisely when it matters. In distributed software, that degradation propagates across layers; seemingly innocent local decisions compromise future evolution (The Glass Cage).

Mature teams need to abandon naive metrics like generated volume, number of suggestions accepted, and lines per sprint. These measures are the equivalent of evaluating a CFO by how many spreadsheets are open. What matters is the structural quality of the delivered asset.

A serious set of OKRs includes: maintainability index; critical domain coverage; net duplication introduced per release; percentage of consolidated versus cloned code; average time to resolve complex multi-causal bugs; senior effort spent on corrective review. There is an empirical basis for this shift: GitClear identified an eightfold increase in duplication between 2020 and 2025, with moved code falling from 25% in 2021 to below 10% in 2025, and DORA showed an individual effectiveness gain of 0.17x accompanied by an instability increase of 0.1x (GitClear Research, 2025; Google DORA AI Capabilities Model Report, 2025).
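As a rough illustration of how a “net duplication” style metric could be approximated, here is a sliding-window sketch. It is far cruder than what GitClear actually measures; the 4-line window and whitespace normalization are illustrative assumptions:

```python
import hashlib

def duplication_density(source, window=4):
    """Fraction of sliding `window`-line blocks seen more than once.

    A crude proxy for copy/paste duplication: strip blank lines and
    surrounding whitespace, hash every consecutive window of lines,
    and count how many windows repeat an earlier one.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if len(lines) < window:
        return 0.0
    seen, dupes, total = set(), 0, 0
    for i in range(len(lines) - window + 1):
        digest = hashlib.sha1(
            "\n".join(lines[i:i + window]).encode()).hexdigest()
        total += 1
        if digest in seen:
            dupes += 1
        seen.add(digest)
    return dupes / total
```

Tracking this number per release (rather than as an absolute) is what turns it into an OKR-style signal: a rising trend means the team is cloning faster than it consolidates.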

This repositioning also changes how leading tools should be used. Claude Code and Vertex make sense when they operate as instruments of assisted discovery: comparative pattern analysis and controlled exploration of the architectural space, not cognitive crutches that outsource decisions without governance.

With governance, senior teams use Claude Code to generate alternative hypotheses, propose modular decompositions, map probable regression surfaces, and synthesize adversarial scenarios before refactoring, and they orchestrate experiments with Vertex for observability, incident classification, and semantic analysis over large internal datasets. The decisive epistemological point is that a tool should expand human investigative capacity, not replace the construction of a mental model. It’s the difference between a microscope that lets you see better and a microscope you ask to conclude on its own: careful diagnosis versus premature conclusions.

When that boundary gets lost, the pattern described by Tornhill reappears: much apparent churn with little shared understanding, and hotspots that become increasingly expensive to maintain (Your Code as a Crime Scene).

There are clear signs of what happens when this discipline is missing. The RPCS3 hardening in May 2026 required explicit disclosure of tool usage and proof of human verification; the public reversal after severe regressions showed that human review alone cannot compensate for absent authorship or for the minimal understanding it requires.

Any serious organization will need to require minimum authorial explainability for AI-assisted code: PRs small enough to be defensible by their author; design reviews focused on invariants before implementation; post-incident rituals that evaluate not only what broke but also who truly understood the altered logic. David Thomas and Andrew Hunt insisted on manifest mastery: eliminate duplication and make knowledge explicit. In Vibe Coding, this becomes even more important because generating alternatives has become trivial, while choosing the right abstraction remains rare.
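A minimal sketch of how such an explainability gate could run in CI. The field names ('lines_changed', 'ai_assisted', 'disclosed', 'author_summary') and the size threshold are illustrative assumptions, not a standard:

```python
def explainability_gate(pr, max_lines=400):
    """Return policy violations for a PR record (hypothetical schema).

    Assumed fields: 'lines_changed' (int), 'ai_assisted' (bool),
    'disclosed' (bool), 'author_summary' (str). The threshold is
    illustrative, not a recommendation.
    """
    violations = []
    if pr["lines_changed"] > max_lines:
        violations.append("too large to be defensible in a single review")
    if pr["ai_assisted"] and not pr["disclosed"]:
        violations.append("AI assistance used but not disclosed")
    if pr["ai_assisted"] and not pr.get("author_summary", "").strip():
        violations.append("missing author's explanation of intent")
    return violations
```

The gate encodes the policy above mechanically; what it cannot automate is judging whether the author’s summary actually demonstrates understanding, which remains a human review task.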

An emerging profile combines three unglamorous but highly strategic competencies: formulating good constraints; auditing systemic coherence; aggressively simplifying after initial generation. Whoever masters this will use Claude Code and Vertex as laboratories to find new patterns beyond individual intuition. Whoever doesn’t will turn these platforms into sophisticated crutches that produce premium-looking technical debt.

The distinction between outskilling and newskilling lies exactly here: in the first case, teams outsource basic reasoning and lose muscle; in the second case, they preserve human fundamentals while expanding exploratory capacity through well-governed agents.

Companies should shape career paths, evaluation, and training around this distinction: less textual throughput and more verifiable architectural discernment under pressure.

Conclusion

The article’s central thesis is simple but operationally demanding: the problem isn’t using AI to write code; it’s allowing it to replace the human cycle of understanding, deciding, and taking responsibility. When that happens, the loss doesn’t show up first in throughput; it shows up in silent structural degradation, in costlier corrective reviews, and in an inability to explain why a seemingly small change destabilized the system.

The signals described above point in this direction. GitClear recorded an eightfold increase in duplication between 2020 and 2025, while DORA showed only a 0.17x individual gain accompanied by a 0.1x instability increase. These numbers reinforce that textual productivity and engineering quality are not equivalent. Without criteria for comprehensible authorship, maintainability, and architectural coherence, automation tends to push cost forward, precisely where it becomes hardest to absorb.

The next step for technical leaders and executives isn’t to slow adoption but to change what management is optimizing for. Tools like Claude Code and Vertex should be evaluated by their ability to expand investigation capacity, reduce relevant uncertainty, and accelerate validation—not by raw output volume.

This requires explicit authorial explainability policies; invariant-oriented review; structural metrics; training focused on well-formulated constraints; systemic auditing; post-generation simplification. In upcoming cycles, competitive difference will be less about who generates more code—and more about who preserves enough understanding to evolve systems under pressure without accumulating invisible entropy.

Further Reading

Recommended Books

  • The Pragmatic Programmer: From Journeyman to Master – David Thomas and Andrew Hunt. Published by Addison-Wesley Professional (1999; 2nd ed. 2019). This “bible” of software engineering is essential for contrasting classic development principles (such as DRY, Don’t Repeat Yourself) with today’s generation of AI-cloned code, emphasizing deep understanding and maintenance.
  • Clean Code: A Handbook of Agile Software Craftsmanship – Robert C. Martin (Uncle Bob). Published by Prentice Hall (2008). Essential for understanding principles for writing clean code that is readable and easy to maintain—crucial skills for developers who need to audit and refactor AI-generated code.
  • Refactoring: Improving the Design of Existing Code – Martin Fowler. Published by Addison-Wesley Professional (1999; 2nd ed. 2018). This book provides techniques and patterns for improving an existing codebase’s structure without changing its external behavior, a must-have skill for dealing with the complexity and code churn introduced by automatically generated code.

Reference Links

  • GitClear Research & Learning – A leading source of productivity and quality metrics in studies of AI’s impact on code development.
  • DORA – DevOps Research and Assessment – The largest authority on software delivery performance metrics, maintained by Google; provides insights into how efficiency and development quality are affected by new tools and practices.
  • Communications of the ACM (CACM) – The portal of the Association for Computing Machinery, a primary source of peer-reviewed studies on AI’s impact on computer science research and software engineering practice.
