News
GitHub Research Claims Copilot Code Quality Gains in Addition to Productivity
GitHub says new research proves its Copilot AI tool can improve code quality, following earlier reports that said it boosts developer productivity.
"Our findings overall show that code authored with GitHub Copilot has increased functionality and improved readability, is of better quality, and receives higher approval rates," said Microsoft-owned GitHub in a blog post this week.
It's the latest of several research-based reports from GitHub that address the effectiveness of the original "AI pair programmer" that unleashed GenAI on software development, fundamentally changing the space and spearheading a wave of Copilots throughout Microsoft's products and services.
While GitHub's reports have been positive, a few others haven't. For example, a recent study from Uplevel Data Labs said, "Developers with Copilot access saw a significantly higher bug rate while their issue throughput remained consistent."
And earlier this year a "Coding on Copilot" whitepaper from GitClear said, "We find disconcerting trends for maintainability. Code churn -- the percentage of lines that are reverted or updated less than two weeks after being authored -- is projected to double in 2024 compared to its 2021, pre-AI baseline. We further find that the percentage of 'added code' and 'copy/pasted code' is increasing in proportion to 'updated,' 'deleted,' and 'moved' code. In this regard, AI-generated code resembles an itinerant contributor, prone to violate the DRY-ness [don't repeat yourself] of the repos visited."
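For readers unfamiliar with the metric, GitClear's definition of code churn (the share of lines reverted or updated within two weeks of being authored) can be sketched in a few lines of Python. This is purely a hypothetical illustration of the definition quoted above; the record fields and function name are assumptions, not GitClear's actual tooling.

```python
from datetime import timedelta

# Lines changed again within this window after authoring count as "churn."
CHURN_WINDOW = timedelta(days=14)

def churn_percentage(line_records):
    """Illustrative only: line_records is an iterable of dicts with
    'authored_at' (datetime) and 'changed_at' (datetime, or None if the
    line was never reverted or updated)."""
    total = churned = 0
    for rec in line_records:
        total += 1
        changed = rec.get("changed_at")
        if changed is not None and changed - rec["authored_at"] < CHURN_WINDOW:
            churned += 1
    return 100.0 * churned / total if total else 0.0
```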
GitHub has an answer for those contrarian reports: "We hypothesize that other studies might not have found an improvement in code quality with GitHub Copilot, not because of the tool itself, but because developers may have lacked the opportunity or incentive to focus on quality." The company characterized its new research as the first controlled study to examine GitHub Copilot's impact on code quality.
Of course, the subject of GitHub Copilot's effectiveness is of intense interest, and reports vary in their conclusions, with other brand-new research published on Springer Nature Link saying, "The findings of the study suggest that GitHub Copilot can be a valuable asset to development processes, resulting in enhancements in satisfaction, performance, efficiency, and monetization dimensions. However, areas for improvement include communication features, unit testing, and addressing potential security concerns. This study demonstrates Copilot's potential as an effective tool for enhancing software development productivity and quality, providing valuable insights for future research and industry adoption."
And in August, GitHub topped research firm Gartner's inaugural Magic Quadrant report on vendors of AI code assistants, leading in both completeness of vision and ability to execute.
"GitHub has an extensive developer community and GitHub Copilot has high user engagement, which enables it to gather feedback quickly and continuously innovate," Gartner said. "GitHub's high customer retention rates and annual contract value retention underscore its ability to maintain and grow its customer base."
As for this week's report, GitHub listed the key findings as:
- Increased functionality: developers with GitHub Copilot access had a 56% greater likelihood of passing all 10 unit tests in the study, indicating that GitHub Copilot helps developers write more functional code by a wide margin.
- Improved readability: in blind reviews, code written with GitHub Copilot had significantly fewer code readability errors, allowing developers to write 13.6% more lines of code, on average, without encountering readability problems.
- Overall better quality code: readability improved by 3.62%, reliability by 2.94%, maintainability by 2.47%, and conciseness by 4.16%. All numbers were statistically significant. These quality improvements were consistent with those found in the 2024 DORA Report.
- Higher approval rates: developers were 5% more likely to approve code written with GitHub Copilot, meaning that such code is ready to be merged sooner, speeding up the time to fix bugs or deploy new features.
That DORA report (2024 Accelerate State of DevOps) only mentions Copilot once in passing, but it does address AI-assisted coding in general, including a chart on developers' trust in the quality of AI-generated code.
Here's GitHub's bottom line on the new research:
So, what do these findings say about how GitHub Copilot improves code quality? While the number of commits and lines of code changed was significantly higher for the GitHub Copilot group, the average commit size was slightly smaller. This suggests that GitHub Copilot enabled developers to iterate on the code to improve its quality. Our hypothesis is that because developers spent less time making their code functional, they were able to focus more on refining its quality. This aligns with our previous findings that developers felt more confident using GitHub Copilot. It also demonstrates that with the greater confidence GitHub Copilot gave them, they were likely empowered to iterate without the fear of causing errors in the code.
That "GitHub Copilot group" refers to some of 243 developers with at least five years of Python experience who were randomly assigned to use the tool in the first phase of the study. "In the second phase, developers were randomly assigned submissions to review using a provided rubric. They were blind to whether the code was authored with GitHub Copilot," GitHub said.
About the Author
David Ramel is an editor and writer at Converge 360.