
AI coding: what the research actually says

4 min read · 1 Feb 2026
AI · software development · coding · research · maintainability

There’s a common refrain in engineering circles: AI-generated code is sloppy. It repeats itself. It doesn’t follow conventions. It creates maintenance nightmares. I’ve heard variations of this from developers who’ve never seriously used AI coding tools, and from some who have.

The trouble with this claim is that it’s been almost entirely anecdotal. Until now.

Proper science arrives

“Echoes of AI: Investigating the Downstream Effects of AI Assistants on Software Maintainability” is one of the first rigorous studies to examine this question properly. It’s peer-reviewed. It’s pre-registered, meaning the researchers committed to their methodology before seeing the results, which prevents the common scientific sin of adjusting your hypothesis after you’ve seen the data. It’s a randomised controlled trial. In short, it’s proper science.

The researchers set up two groups. One used AI coding tools. The other didn’t. Both groups implemented a non-trivial piece of software. Then, critically, the researchers measured how difficult it was for a different developer - one who wasn’t using AI - to maintain the resulting code.

Maintainability isn’t the only measure of code quality, but it’s one of the most important. The reason is economic: maintenance typically accounts for around 80% of software’s total lifecycle cost. Code that’s hard to understand is code that’s expensive to change, debug, and extend. If AI tools were producing code that was measurably harder to maintain, that would be a serious problem.
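A back-of-envelope sketch makes the stakes concrete. The figures here are illustrative assumptions, not data from the study: a notional 80/20 split between maintenance and initial build, the study's 30% build speed-up, and a hypothetical 20% maintainability penalty for comparison.

```python
# Back-of-envelope lifecycle cost sketch. Assumptions (not study data):
# maintenance ~80% of lifecycle cost, build ~20%; AI gives a 30% cheaper
# build; the "penalty" scenario imagines code that is 20% harder to maintain.

def lifecycle_cost(build_cost: float, maintenance_cost: float) -> float:
    """Total lifecycle cost is build plus maintenance."""
    return build_cost + maintenance_cost

BUILD, MAINT = 100.0, 400.0  # the rough 80/20 split in arbitrary units

baseline = lifecycle_cost(BUILD, MAINT)              # 100 + 400 = 500
ai_equal = lifecycle_cost(BUILD * 0.7, MAINT)        # 30% cheaper build, same maintainability
ai_worse = lifecycle_cost(BUILD * 0.7, MAINT * 1.2)  # same speed-up, 20% maintainability penalty

print(baseline, ai_equal, ai_worse)  # 500.0 470.0 550.0
```

The asymmetry is the point: a 30% build saving trims total cost by only 6%, while even a modest maintainability penalty would more than wipe that out. That is why "no discernible difference in maintainability" is the study's most consequential finding, not the speed number.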

The numbers

The AI-assisted group completed their implementation 30% faster than the control group. For developers who were already habitual AI coding tool users, that figure rose to 55% faster. This aligns with what I’ve written previously about AI changing the economics of software development.

But speed means nothing if the output is rubbish. Here’s where it gets interesting.

There was no discernible difference in maintainability between the two groups. The code produced with AI assistance was just as easy to understand and modify as the code produced without it. By this measure, and it’s an important measure, the quality was equivalent.

In fact, there was a small improvement in maintainability when the original code was produced by experienced AI users. The researchers hypothesise that this is because well-directed AI tends to produce “boring”, idiomatic code. It follows conventions. It doesn’t get clever. And boring, conventional code is exactly what you want when someone else has to maintain it.

What this means

The sceptics have a position: AI coding tools are a shortcut that produces inferior results. That position now needs to contend with evidence to the contrary.

To be clear, this is one study. It examines one dimension of quality. There will be more research, and it may reveal nuances or limitations. That’s how science works. But the direction of travel is notable. A rigorous examination of this particular question found no quality penalty and a significant speed improvement.

I’ve argued that AI is changing the economics of building software. If building gets 50% cheaper, the whole calculus of software investment shifts. This research supports that argument with data rather than just logic.

The question for holdouts

If you’re still resistant to AI coding tools, ask yourself: what evidence would change your mind?

If the answer is “none”, then you’re not making a technical judgment. You’re making an emotional one. That’s your prerogative, but it’s worth being honest about it.

If the answer is “rigorous research showing it doesn’t harm quality”, then that research is starting to arrive. A pre-registered, peer-reviewed RCT found equivalent maintainability and 30-55% faster development. The burden of proof is shifting.

I’m not arguing that AI coding tools are magic, or that they work well for every task, or that they don’t require skill to use effectively. The study itself suggests that experienced users get better results. Like any tool, they reward competence.

But the claim that AI-generated code is inherently lower quality? That’s looking increasingly hard to defend.

Further reading

This article complements my series on AI coding, which covers the economics, the skills that still matter, and practical recommendations for adopting these tools. If you’re making decisions about AI adoption in your engineering team, start there.
