Why AI Makes Experienced Developers Slower (But Hopefully Not Forever)
The slowdown before the breakthrough.
We believe AI coding tools are transformative, 10xing output speed at the same quality, and this is backed up by the dozens of dev teams we’ve spoken to. But a rigorous new study reveals something different.
The study found that experienced developers worked slower, not faster, even as those same developers believed the tools were speeding them up while maintaining quality.
A randomized controlled trial from METR tracked 16 experienced open-source developers working on their own repositories across 246 real tasks. These devs were far from amateurs; we would call them open-source gurus. Each coding task was randomly assigned to either allow or disallow AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet), and the researchers measured completion time and compared the two conditions.
When the developers used AI, they took 19% longer to complete tasks. Yet before starting, they predicted AI would speed them up by 24%. What’s more surprising is that even after experiencing the actual slowdown, they still believed AI had sped them up by 20%, right up until they saw the time-log data. The perception gap is striking: developers were systematically and dramatically wrong about their own productivity.
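To make those percentages concrete, here’s a back-of-the-envelope sketch using a hypothetical 100-minute task (our own illustration of the headline numbers, not the study’s raw data):

```python
# Hypothetical 100-minute task, plugging in the study's headline percentages.
baseline_minutes = 100

predicted = baseline_minutes * (1 - 0.24)  # devs expected a 24% speedup -> ~76 min
perceived = baseline_minutes * (1 - 0.20)  # devs felt a 20% speedup     -> ~80 min
observed  = baseline_minutes * (1 + 0.19)  # measured 19% slowdown       -> 119 min

print(f"expected {predicted:.0f} min, felt like {perceived:.0f} min, actually took {observed:.0f} min")
# expected 76 min, felt like 80 min, actually took 119 min
```

Roughly a 40-minute gap between how long the work felt and how long it actually took.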
This Matters More Than You Think
This isn’t another “AI hype is overblown” hot take. As we said, we think the tools work, the models are impressive, and Cursor and Claude are powerful enough to recode Slack in 30 hours! But something fundamental is happening that neither developers nor their tools are accounting for.
Let’s start with the basics. These weren’t junior developers playing with new toys. They were veterans with an average of 10+ years of software engineering experience and 5 years contributing to their fully mature repositories. Nerd stats: 1,500+ commits, 23,000+ stars, over 1 million lines of code.
These are the people who should benefit most from AI augmentation as they know the codebase, they know the patterns, and they have the experience to prompt effectively and evaluate outputs critically. If AI can’t speed them up, who can it speed up?
The answer, based on prior research, has always been less experienced developers. Studies on GitHub Copilot consistently showed that junior developers got the bigger productivity gains because of skill compression: AI lets newbies approach expert-level performance by giving them expert-level scaffolding.
But this study flips that assumption and suggests there’s a ceiling effect, or worse, an inversion. The very expertise that should make AI tools more effective might be making them less effective.
The Five Hidden Costs
After analyzing hours and hours of screen recordings, the study identifies five contributing factors that explain why developers feel faster but actually slow down.
Context switching overhead. Moving between the code editor and the AI chat interface introduces friction that compounds over time. Each switch breaks flow and the cognitive load of managing the AI assistant as a separate tool adds up in ways developers don’t consciously track.
Validation tax. Every AI suggestion requires evaluation. Is this correct? Does it fit the codebase style? Will it pass review? Does it handle edge cases? That evaluation work is invisible but expensive, transforming the developer’s role from writing code to reviewing code that looks plausible but might be subtly wrong. In fact, it is this near-wrongness that is probably so taxing, because subtle bugs are harder to catch than obvious ones.
Scope creep from capability. When AI makes it easy to generate more code, developers generate more code, without the elegance that might normally go into it. The result is code that is functionally equivalent but unnecessarily complex or brute-force, and that feels cheap to create but isn’t.
High quality bar friction. Mature open-source projects have implicit requirements around documentation, test coverage, linting, formatting, and code style that take humans years to internalize. AI doesn’t know these requirements, so it generates code that “works” but fails review, and fixing AI output to meet unstated standards takes time.
Task redefinition. With AI, developers attempt more ambitious solutions or explore more options because the cost feels lower. This isn’t necessarily bad, but it means the task itself expands. What would have been a straightforward fix becomes a refactor, and what would have been one approach becomes three prototypes.
Together these factors create a productivity illusion where developers feel like they’re moving faster because they’re generating code faster. But as we’ve said before, a faster rocket engine pointed in the wrong direction doesn’t get you to Alpha Centauri faster.
The Benchmark Paradox
Here’s where it gets genuinely confusing. On SWE-Bench Verified, frontier models solve complex programming tasks; agentic coding systems demonstrate remarkable capabilities; and anecdotal reports flood social media with developers claiming they shipped features in hours that would have taken days.
So which is real? The answer is both, and the gap between them reveals something important about how we measure AI capabilities.
Benchmarks optimize for completion, not production quality. SWE-Bench measures whether code passes author-written tests but doesn’t measure documentation quality, code review standards, maintainability, or adherence to implicit style guides. In mature codebases, those “soft” requirements consume substantial time, and AI generates code that “works” but doesn’t meet production standards.
Also benchmarks allow unlimited attempts. Frontier models in benchmark settings can sample millions of tokens, try hundreds of approaches, and iterate until something passes tests. Real developers using Cursor don’t do that. They prompt a few times, evaluate the output, and move on. This usage gap alone accounts for a lot of the difference in our opinion.
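To see how much a retry budget matters, here’s a rough sketch with made-up numbers (the per-attempt success rate below is illustrative, not a SWE-Bench figure): if each attempt independently has some chance of producing a passing patch, the “solved” rate climbs fast with more attempts, while a developer who prompts two or three times in Cursor sees something much closer to the single-shot rate.

```python
# Illustrative only: how an unlimited retry budget inflates benchmark-style "solved" rates.
def pass_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts passes the tests."""
    return 1 - (1 - p) ** k

p = 0.25  # made-up single-attempt success rate
for k in (1, 3, 10, 100):
    print(f"{k:>3} attempts: {pass_at_k(p, k):.0%} of tasks 'solved'")
#   1 attempts: 25%
#   3 attempts: 58%
#  10 attempts: 94%
# 100 attempts: 100%
```

In practice attempts aren’t truly independent, but the direction of the effect is the point: benchmark scores reward a persistence that real workflows don’t have.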
And of course anecdotes are biased. People who have great experiences with AI tools post about it while people who struggle quietly stop using them. As this study shows, people systematically overestimate their own speedup, meaning the developer who tweets “AI helped me ship a feature in 2 hours” might have actually spent 3 hours but it felt faster because they were generating code quickly. Or might be lying.
Finally, quality standards vary dramatically. Prototypes and personal projects have far lower quality bars than production codebases like the ones in this study, which serve millions of users. Critical infrastructure that powers the internet has an extreme quality bar, where every line matters and every edge case counts.
Why Experience Makes It Worse
The study specifically notes that more experienced developers didn’t show adaptation effects. You’d expect that, with practice, developers would learn to use AI more effectively, figure out better prompting strategies, and develop intuition for when AI helps and when it hinders. That didn’t happen, and the slowdown persisted regardless of developers’ prior experience with AI tools.
This suggests the problem isn’t just a learning curve but something structural about how these tools integrate into expert workflows. Expert developers have internalized patterns, shortcuts, and mental models that make them fast. Remember, Zuckerberg was talking about 10x developers decades ago. They know exactly where to look in the codebase, can predict what will break, and understand the architecture deeply.
AI disrupts that flow by offering suggestions that seem plausible but require validation, generating code in styles that almost match but not quite, and solving problems in ways that technically work but don’t fit the existing architecture. For a junior developer, that’s all upside since they don’t have expert intuition to disrupt. For a senior developer, it’s friction. They’re constantly context-switching between their mental model and the AI’s output, validating instead of creating, and managing a tool instead of solving the problem. Plus, that tool isn’t deterministic the way coding itself is. As we’ve said before, vibe-coding feels more like managing helpful interns, with all the pluses and minuses, than it does like programming.
Design Your AI Coding Environment Carefully
Interestingly, a separate study from Google a year ago found the opposite result: AI tools reduced time on task by approximately 21% in a randomized controlled trial with 96 engineers. How do we reconcile that with METR’s 19% slowdown?
The Google study used enterprise tasks with presumably lower quality bars than elite open-source projects, and the developers were working in a corporate environment with different standards and constraints. The AI features they used (code completion, smart paste, natural language to code) were integrated directly into their workflow rather than requiring separate tools like Cursor. Most tellingly, the Google study found that developers who spent more hours per day on coding benefited more from AI, which is the opposite of what you’d expect if AI helps beginners and hinders experts.
Context is everything. AI coding tools aren’t universally helpful or harmful. They’re helpful in some settings and harmful in others, and properly designing those settings is key.
The Uncomfortable Truth About AI Augmentation
This study forces us to confront something the AI hype cycle has been avoiding. Augmentation is hard and it’s not enough to build a smart tool. You have to build a tool that integrates into expert workflows without breaking what makes them expert in the first place.
Right now, AI coding assistants are like having a very smart intern who doesn’t know your codebase, doesn’t understand your quality standards, and needs constant supervision. For some tasks, that’s valuable, but for others, it’s slower than just doing it yourself.
The developers in this study kept using AI even though it slowed them down. Some reported they found it more enjoyable, others viewed it as an investment in future skills, and some thought they were faster even though the measurements showed otherwise. This is the same pattern we’ve seen with every productivity tool that feels good but doesn’t measure well. Email feels productive, Slack feels productive, and endless meetings feel productive (ok they don’t but we threw that in there to see if you are still reading), but they all create the sensation of progress without creating actual progress.
What This Means Going Forward
First, we need honest accounting. Stop measuring productivity by “lines of code generated” or “suggestions accepted” and start measuring time to mergeable releases in production-quality codebases, because the gap between those metrics is where the truth lives.
Second, we need better integration. The future of AI coding isn’t chat interfaces and separate tools but deeply integrated systems that understand codebase context, quality standards, and implicit requirements. The tools need to meet developers where they are, not force developers to adapt to the tools.
Third, we need to stop treating AI as universally applicable. Junior developers might benefit enormously while experts working on critical infrastructure might not. Prototyping and exploration might be accelerated while production refinement might be slowed, and that’s fine as long as we’re clear about when to use AI and when to step back.
Fourth, we need to fix the perception problem. Developers systematically overestimate AI’s impact on their productivity, which means organizations making decisions based on developer feedback are getting bad data. The only way to know if AI is helping is to measure objectively, not ask subjectively.
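If you want to run that kind of honest accounting on your own team, a minimal version is not much code: randomly decide per task whether AI is allowed, log wall-clock time to a merged PR, and compare the two groups. Here’s a sketch, assuming a hypothetical tasks.csv with condition and hours_to_merge columns (the file and column names are ours, not from the study):

```python
# Minimal sketch of objective accounting. Assumes a hypothetical tasks.csv with
# columns: task_id, condition ("ai" or "no_ai"), hours_to_merge.
import csv
from statistics import median

hours = {"ai": [], "no_ai": []}
with open("tasks.csv", newline="") as f:
    for row in csv.DictReader(f):
        hours[row["condition"]].append(float(row["hours_to_merge"]))

med_ai, med_no_ai = median(hours["ai"]), median(hours["no_ai"])
change = (med_ai / med_no_ai - 1) * 100
print(f"median time to merge: {med_ai:.1f}h with AI vs {med_no_ai:.1f}h without ({change:+.0f}%)")
# A positive percentage means AI-assisted tasks took longer to merge.
```

The randomization step matters: without it, you’re comparing the tasks people chose to use AI on, which is exactly the kind of self-selection that produces the perception gap in the first place.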
The METR study isn’t the final word on AI coding tools. It’s a snapshot of early 2025 capabilities in one important setting. The tools will get better, the integration will improve, and the slowdown will reverse.
But right now, in mature codebases with high quality standards and experienced developers, AI is making people slower while making them feel faster. That gap between perception and reality is dangerous because it leads to bad decisions, wasted resources, and frustrated teams who can’t figure out why shipping is taking longer despite all these “productivity” tools.
Maybe that’s the most important lesson. In the age of AI, trust data over feelings, even your own.
Want to dive deeper?
We invite you to bring your questions directly to us! To thank you for your support, we are hosting an end-of-year AMA. Join us, Wednesday December 17th at 11 am CST, to connect, reflect, and get practical career and product advice to help you start the new year with confidence.
Please submit your questions when you register using the button below so we can focus on what matters most to you.
We hope to see you there. Bring a friend if you think they would enjoy the AMA.





