GitHub Copilot, an AI Pair Programmer, Is Coming to VS Code/Visual Studio

The new "AI pair programmer" is a step in what GitHub calls the third revolution of software development: the use of AI in coding.

Powered by a new AI system developed by OpenAI, GitHub Copilot is expected to come soon to Visual Studio Code and, later, to the full-fledged Visual Studio IDE.

For now, in the limited technical preview stage, it's rough around the edges, but supposedly getting smarter all the time. Its first implementation comes in a VS Code extension now being used by a small group of testers. If the technical preview proves successful, plans call for it to become publicly available for VS Code and Visual Studio as a for-pay product.

"We've been building GitHub Copilot together with the incredibly talented team at OpenAI for the last year, and we're so excited to be able to show it off today," said Nat Friedman, CEO of GitHub, on a June 29 Hacker News post. "Hundreds of developers are using it every day internally, and the most common reaction has been the head exploding emoji. If the technical preview goes well, we'll plan to scale this up as a paid product at some point in the future."

As an AI pair programmer, it provides code-completion functionality and suggestions similar to IntelliSense/IntelliCode, though it goes beyond those Microsoft offerings thanks to Codex, the new AI system developed by Microsoft partner OpenAI. IntelliCode is powered by a large-scale transformer model specialized for code (GPT-C). OpenAI Codex, on the other hand, has been described as an improved descendant of GPT-3 (Generative Pre-trained Transformer) that can translate natural language into code.

[Figure: GitHub Copilot (source: GitHub).]

Like IntelliCode, Codex is trained on high-quality code repos on GitHub, taking into account local project context and other factors in order to suggest code completion for individual lines or whole functions.

"OpenAI Codex has broad knowledge of how people use code and is significantly more capable than GPT-3 in code generation, in part, because it was trained on a data set that includes a much larger concentration of public source code," Friedman said in a June 29 blog post. "GitHub Copilot works with a broad set of frameworks and languages, but this technical preview works especially well for Python, JavaScript, TypeScript, Ruby and Go."
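Completions of this kind typically start from a signature and docstring that the developer writes, with the tool proposing the body. The following minimal Python sketch illustrates the interaction; the suggested body here is hand-written for illustration, not actual Copilot output.

```python
# Hypothetical illustration of comment-to-code completion: the developer
# supplies the signature and docstring, and the assistant fills in the body.

def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Convert RGB components (0-255) to a '#rrggbb' hex string."""
    # --- suggested completion begins here ---
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

print(rgb_to_hex(255, 165, 0))  # -> #ffa500
```

The docstring matters: it acts as the natural-language prompt that Codex translates into code.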

Microsoft and OpenAI entered a partnership back in 2019, a recent manifestation of which is new "no code" natural language development functionality for Microsoft's Power Apps, powered by GPT-3. Now, OpenAI Codex is instrumental in the new offering from GitHub, which is owned by Microsoft.

As mentioned, some functionality is lacking in the limited technical preview (the VS Code extension shows 325 installations as of this writing). "GitHub Copilot doesn't actually test the code it suggests, so the code may not even compile or run," the project site's FAQ states. "GitHub Copilot can only hold a very limited context, so even single source files longer than a few hundred lines are clipped and only the immediately preceding context is used. And GitHub Copilot may suggest old or deprecated uses of libraries and languages. You can use the code anywhere, but you do so at your own risk."
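The context limitation described in the FAQ amounts to keeping only the tail of the current file as the prompt. This hypothetical Python sketch (the function name and line budget are assumptions, not Copilot internals) shows what such clipping looks like:

```python
# Hypothetical sketch of the context clipping described in the FAQ:
# only the immediately preceding lines of a long file fit in the prompt.

def clip_context(source: str, max_lines: int = 300) -> str:
    """Keep only the last max_lines lines of a long source file."""
    lines = source.splitlines()
    return "\n".join(lines[-max_lines:])

# A 1,000-line file gets trimmed to its final 300 lines.
long_file = "\n".join(f"line {i}" for i in range(1, 1001))
prompt = clip_context(long_file)
print(prompt.splitlines()[0])  # -> line 701
```

One practical consequence: code defined near the top of a long file may be invisible to the model when it suggests completions near the bottom.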

Furthermore, the FAQ says that in trying to understand a developer's intent, GitHub Copilot suggests code that may not always work or make sense, so the suggestions should be carefully tested, reviewed and vetted, like all code.

But even with those caveats for the technical preview, GitHub is hoping for great improvements based on promising initial findings. "We recently benchmarked against a set of Python functions that have good test coverage in open source repos," the FAQ says. "We blanked out the function bodies and asked GitHub Copilot to fill them in. The model got this right 43 percent of the time on the first try, and 57 percent of the time when allowed 10 attempts. And it's getting smarter all the time."
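The benchmark described above is a fill-in-the-body evaluation: blank out a function, sample completions, and count a function as solved if any sampled body passes the repo's tests. This loose Python sketch mirrors that scoring logic; all names are hypothetical, and real test execution is replaced by a stand-in.

```python
# Loose sketch of the fill-in-the-body benchmark scoring described above.
# This is not GitHub's actual harness; running real test suites is
# replaced here by precomputed pass/fail booleans.

def run_tests(candidate_passes: bool) -> bool:
    # Stand-in for executing the repo's test suite against one candidate body.
    return candidate_passes

def evaluate(problems: list) -> tuple:
    """problems: one list per blanked-out function, holding pass/fail
    results for up to 10 sampled completions.
    Returns (first-try accuracy, best-of-10 accuracy)."""
    first_try = sum(run_tests(samples[0]) for samples in problems)
    any_of_k = sum(any(run_tests(s) for s in samples) for samples in problems)
    n = len(problems)
    return first_try / n, any_of_k / n

# Toy data: 3 blanked-out functions, 10 sampled completions each.
results = [
    [True] + [False] * 9,                # solved on the first attempt
    [False, False, True] + [False] * 7,  # solved on the third attempt
    [False] * 10,                        # never solved
]
print(evaluate(results))  # first-try vs. best-of-10 pass rates
```

Under this scoring, allowing more attempts can only raise the pass rate, which is why the 10-attempt figure (57 percent) exceeds the first-try figure (43 percent).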

Over on Hacker News, the inevitable questions about code ownership, licensing, conflicts of interest and other legalese arose. "In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler," Friedman replied. "On the training question specifically, you can find OpenAI's position, as submitted to the USPTO here. We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!"

Other questions and answers culled from Friedman's comment thread include:

  • Q: Have there yet been reports of the AI writing code that has security bugs? Is that something folks are on the lookout for?
    Friedman: I haven't seen any reports of this, but it's certainly something we want to guard against.
  • Q: Might this end up putting GPL code into projects with an incompatible license?
    Friedman: It shouldn't do that, and we are taking steps to avoid reciting training data in the output: link1 link2. In terms of the permissibility of training on public code, the jurisprudence here -- broadly relied upon by the machine learning community -- is that training ML models is fair use. We are certain this will be an area of discussion in the US and around the world and we're eager to participate.
  • Q: This is obviously controversial, since we are thinking about how this could displace a large portion of developers. How do you see Copilot being more augmentative than disruptive to the developer ecosystem? Also, how do you see it as different from regular code-completion tools like Tabnine? [Note: this theme was also featured in Twitter comments]
    Friedman: We think that software development is entering its third wave of productivity change. The first was the creation of tools like compilers, debuggers, garbage collectors, and languages that made developers more productive. The second was open source where a global community of developers came together to build on each other's work. The third revolution will be the use of AI in coding. The problems we spend our days solving may change. But there will always be problems for humans to solve.

The technical preview is restricted because the tool requires state-of-the-art AI hardware, which limits capacity. Interested developers are invited to apply here. "If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future. We want to use the preview to learn how people use GitHub Copilot and what it takes to operate it at scale," the FAQ states.

OpenAI says it will release Codex through its API later this summer so developers can explore many more capabilities of the new AI system, which can be infused into their own apps.

About the Author

David Ramel is an editor and writer for Converge360.
