A Year In, GitHub Measures AI-Based Copilot's Productivity Boost

There's no doubt that GitHub's "AI pair programmer," Copilot, has shaken up the dev world, but by how much?

The company sought to answer that question quantitatively with a multi-pronged research effort aimed at measuring the product's impact on developer productivity and happiness.

As readers of Visual Studio Magazine probably know, Microsoft-owned GitHub launched a technical preview of Copilot in the summer of 2021, making it generally available a year later.

GitHub calls it an "AI pair programmer" for its ability to provide advanced code-completion functionality and suggestions similar to IntelliSense/IntelliCode. It goes beyond those Microsoft offerings thanks to Codex, a cutting-edge AI system developed by Microsoft partner OpenAI. Codex lets Copilot turn natural-language prompts, such as code comments, into code.

Using a combination of surveys and experiments, GitHub recently sought to measure its effect on developers who used it.

"Because AI-assisted development is a relatively new field, as researchers we have little prior research to draw upon," said GitHub's Eirini Kalliamvakou in a Sept. 7 blog post. "We wanted to measure GitHub Copilot's effects, but what are they? After early observations and interviews with users, we surveyed more than 2,000 developers to learn at scale about their experience using GitHub Copilot. We designed our research approach with three points in mind:

  • Look at productivity holistically. At GitHub we like to think broadly and sustainably about developer productivity and the many factors that influence it. We used the SPACE productivity framework to pick which aspects to investigate.
  • Include developers' first-hand perspective. We conducted multiple rounds of research including qualitative (perceptual) and quantitative (observed) data to assemble the full picture. We wanted to verify: (a) Do users' actual experiences confirm what we infer from telemetry? (b) Does our qualitative feedback generalize to our large user base?
  • Assess GitHub Copilot's effects in everyday development scenarios. When setting up our studies, we took extra care to recruit professional developers, and to design tests around typical tasks a developer might work through in a given day."

The company said it found expected and unexpected answers, which it broke down into two main findings.

Developer Productivity Goes Beyond Speed
To find out whether developers using GitHub Copilot saw benefits beyond faster task completion, the company surveyed them and found:

  • Improving developer satisfaction. Between 60 and 75 percent of users reported they feel more fulfilled with their job, feel less frustrated when coding, and are able to focus on more satisfying work when using GitHub Copilot. That's a win for developers feeling good about what they do!
  • Conserving mental energy. Developers reported that GitHub Copilot helped them stay in the flow (73 percent) and preserve mental effort during repetitive tasks (87 percent). That's developer happiness right there, since we know from previous research that context switches and interruptions can ruin a developer's day, and that certain types of work are draining.
Benefits (source: GitHub).

But Speed Is Important, Too
Developers said they complete tasks, especially repetitive ones, faster when using GitHub Copilot. The company counted this among its expected findings; 90 percent of respondents reported it. To observe and measure that effect in practice, GitHub ran a controlled experiment in which two groups of developers, one using Copilot and one not, were timed on how long it took, on average, to write an HTTP server in JavaScript.

The experiment found:

  • The group that used GitHub Copilot had a higher rate of completing the task (78 percent, compared to 70 percent in the group without Copilot).
  • The striking difference was that developers who used GitHub Copilot completed the task significantly faster -- 55 percent faster than the developers who didn't use GitHub Copilot. Specifically, the developers using GitHub Copilot took on average 1 hour and 11 minutes to complete the task, while the developers who didn't use GitHub Copilot took on average 2 hours and 41 minutes. These results are statistically significant (P=.0017) and the 95 percent confidence interval for the percentage speed gain is [21 percent, 89 percent].
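The headline figure can be checked against the reported averages with simple arithmetic. The sketch below is only an illustration and assumes that "55 percent faster" means the relative reduction in average completion time compared with the group that did not use Copilot; the function and variable names are hypothetical, and the confidence interval cannot be reproduced from the two averages alone.

```python
# Quick arithmetic check of the Copilot experiment's headline figure.
# Assumption: "55 percent faster" is read as the relative reduction in
# average completion time versus the group without Copilot.

def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours-and-minutes duration to total minutes."""
    return hours * 60 + minutes

with_copilot = to_minutes(1, 11)     # 71 minutes (Copilot group average)
without_copilot = to_minutes(2, 41)  # 161 minutes (control group average)

# Fraction of time saved relative to the baseline (no Copilot) group.
time_saved = (without_copilot - with_copilot) / without_copilot
# Ratio of baseline time to Copilot time.
speedup = without_copilot / with_copilot

print(f"time saved: {time_saved:.1%}")  # time saved: 55.9%
print(f"speedup: {speedup:.2f}x")       # speedup: 2.27x
```

Read the other way, the Copilot group needed about 44 percent of the baseline time on average, which is consistent with the reported 55 percent reduction.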

The conclusion? "GitHub Copilot supports faster completion times, conserves developers' mental energy, helps them focus on more satisfying work, and ultimately find more fun in the coding they do."

Controlled Experiment (source: GitHub).

That conclusion was backed up by developer quotes such as:

  • "(With Copilot) I have to think less, and when I have to think it's the fun stuff. It sets off a little spark that makes coding more fun and more efficient." -- Senior Software Engineer
  • "The engineers' satisfaction with doing edgy things and us giving them edgy tools is a factor for me. Copilot makes things more exciting." -- CTO, Large Engineering Org

"With the advent of GitHub Copilot, we're not alone in exploring the impact of AI-powered code completion tools!" Kalliamvakou said. "In the realm of productivity, we recently saw an evaluation with 24 students, and Google's internal assessment of ML-enhanced code completion. More broadly, the research community is trying to understand GitHub Copilot's implications in a number of contexts: education, security, labor market, as well as developer practices and behaviors. We are all currently learning by trying GitHub Copilot in a variety of settings. This is an evolving field, and we're excited for the findings that the research community -- including us -- will uncover in the months to come."

While GitHub's effort paints a pretty picture of Copilot, the product has also drawn negative publicity.

For example, the Free Software Foundation (FSF) last year deemed GitHub Copilot to be "unacceptable and unjust."

More recently, the Software Freedom Conservancy (SFC) piled on. The group, which like the FSF is a strong advocate of strict free and open source software (FOSS), listed many grievances about GitHub's behavior, especially the release of Copilot as a paid service, given that its AI model is trained on public source code hosted in GitHub repos.

Copilot has also revived existential angst among developers who fear AI is coming for their jobs.

Also, a 2021 security study concluded that "developers should remain awake" after finding that roughly 40 percent of the Copilot-generated programs it examined contained security vulnerabilities.

However, GitHub has been ploughing ahead in improving Copilot and broadening its reach, just last week making it available to teachers who are verified on GitHub Global Campus.

About the Author

David Ramel is an editor and writer for Converge360.
