GitHub Copilot AI Spawns Open Source Alternatives

GitHub Copilot, described as an "AI pair programmer," debuted this year with a splash, amazing developers with its ability to supply chunks of code when a user is typing in Visual Studio Code and even generate whole applications solely through typed commands.

[Image: Turning Words into Code (source: OpenAI).]

That debut came in June, when GitHub announced the tool, powered by a new AI system from Microsoft partner OpenAI called Codex, which has been described as an improved descendant of GPT-3 (Generative Pre-trained Transformer) that can translate natural language into code. Since then Codex has been steadily improved and offered as an API.

[Image: GitHub Copilot (source: GitHub).]

GitHub said that after trials and testing of the technical preview are complete, it intends to offer Copilot as a commercial product for VS Code and the full-fledged Visual Studio IDE.

A for-pay product plan apparently didn't sit well with some in the industry, as open source alternatives have sprung up.

Take, for example, GPT Code Clippy: The Open Source version of GitHub Copilot.

[Image: Demo of the VS Code Extension in Animated Action Using One of the GPT-Code Clippy Models (source: Code.AI).]

The wiki for the GPT-Code-Clippy (GPT-CC) project, which is hosted in a GitHub repo, says: "GPT-Code-Clippy (GPT-CC) is a community effort to create an open-source version of GitHub Copilot, an AI pair programmer based on GPT-3, called GPT-Codex. GPT-CC is fine-tuned on our GPT Code Clippy dataset sourced from publicly available code on GitHub. It was created to allow researchers to easily study large deep learning models that are trained on code to better understand their abilities and limitations. GPT-CC uses the GPT-Neo model as the base language model, which has been pretrained on the Pile dataset and we use the Causal Language Modelling objective to train the model."
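For readers unfamiliar with the term, a causal language modelling objective scores a model on how well it predicts each token from only the tokens that came before it. The following is a minimal, dependency-free sketch of that idea, not code from the GPT-CC project. The function name `causal_lm_loss` and the dictionary-based probability format are assumptions made for illustration; a real training setup would compute this over batches of tensors in a deep learning framework.

```python
import math


def causal_lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each token given the tokens before it.

    token_ids: list of integer token ids for one sequence.
    next_token_probs: hypothetical model output, where next_token_probs[i]
        is a dict mapping token id -> probability for the token at
        position i + 1, given only token_ids[: i + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to predict a next token")

    total = 0.0
    for i in range(len(token_ids) - 1):
        target = token_ids[i + 1]                 # the token that actually came next
        p = next_token_probs[i].get(target, 0.0)  # probability the model gave it
        total += -math.log(max(p, 1e-12))         # clamp to avoid log(0)
    return total / (len(token_ids) - 1)


# Illustrative values only: a three-token sequence and made-up probabilities.
example_loss = causal_lm_loss(
    [1, 2, 3],
    [{2: 0.5, 3: 0.5}, {3: 0.25, 1: 0.75}],
)
```

The lower the loss, the more probability the model placed on the tokens that really followed; a model that predicted every next token with certainty would score zero.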

The "we" in that quote refers to the repo's owner, CodedotAI. A YouTube channel describes Code.AI as "a community dedicated for all things related to AI for code. In this community we not only discuss deep learning or code generation, we also discuss things like evolutionary computation and code documentation! It is a great place to find fellow like-minded researchers and developers, build a team of collaborators, find a project to work on, or brainstorm project and research ideas! On this channel we post video recordings of community events such as paper reading clubs and podcasts! If you are interested in AI on code please join us using this link: !"

The project's GitHub repo explains the dataset criteria and search tool used for training, along with different GPT-CC models available, training details and more.

"Our ultimate aim is to not only develop an open-source version of Github's Code Copilot, but one which is of comparable performance and ease of use," the wiki states. "To that end, we are continually expanding our dataset and developing better models." Along those lines, action items for the team include:

  • Pretrain the model from scratch with the dataset we have curated from GitHub: We believe this would be quite a straightforward process if we have the computing resources.
  • Experiment with the use of GPT-J in code generation as recommended by Evaluating Large Language Models Trained on Code
  • Expand the capabilities of GPT Code Clippy to other languages especially underrepresented ones
  • Devising a custom loss function that penalises uncompilable code
  • Devise ways to update version and updates to programming languages
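The action item about penalising uncompilable code can be illustrated with a rough sketch. This is not the project's design, which the wiki does not describe. It assumes the generated code is Python and uses a syntax check via the standard library's `ast.parse` as a stand-in for "compilable"; the names `compiles` and `penalized_loss` and the fixed `penalty` value are hypothetical. A fixed penalty like this is also not differentiable, so it would serve as a scoring or filtering signal rather than a drop-in training loss.

```python
import ast


def compiles(source):
    """Return True if `source` parses as Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as embedded null bytes on some versions.
        return False


def penalized_loss(base_loss, generated_source, penalty=1.0):
    """Add a fixed penalty to `base_loss` when the generated code fails to parse.

    `base_loss` is whatever score the model already received (assumed given);
    `penalty` is an arbitrary illustrative constant.
    """
    return base_loss + (0.0 if compiles(generated_source) else penalty)


# Illustrative use: a valid snippet keeps its score, a broken one is penalised.
good_score = penalized_loss(2.0, "x = 1")
bad_score = penalized_loss(2.0, "def f(:", penalty=0.5)
```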

While GPT Code Clippy seems to be fairly popular -- nine contributors, 207 stars, 20 forks -- it's not the only GitHub Copilot alternative that has arisen since June.

The GPT-3 DEMO site, for example, lists GitHub Copilot and GPT-Code-Clippy, along with:

  • CodeVox: a voice and natural language code creation tool from Andrew Mayne, who works for OpenAI and created the project in a hackathon.
  • Tabnine: "Tabnine's AI code completion IDE plugin completes code based on millions of programs in all languages and on your own context, empowering developers to code better and faster. Deep Tabnine is based on GPT-2, which uses the Transformer network architecture. This architecture was first developed to solve problems in natural language processing. Although modeling code and modeling natural language might appear to be unrelated tasks, modeling code requires understanding English in some unexpected ways."

Other media outlets have published similar roundups of Copilot alternatives, which seem to be mainly existing products. Taken from several sources, these include:

  • Second Mate: "An open-source, mini imitation of GitHub Copilot using EleutherAI GPT-Neo-2.7B (via Huggingface Model Hub) for Emacs. This is a much smaller model so will likely not be as effective as Copilot, but can still be interesting to play around with!"
  • Atom: Wikipedia: "Atom is a free and open-source text and source code editor for macOS, Linux, and Microsoft Windows with support for plug-ins written in JavaScript, and embedded Git Control. Developed by GitHub, Atom is a desktop application built using web technologies."
  • Captain Stack: "This feature is somewhat similar to Github Copilot's code suggestion. But instead of using AI, it sends your search query to Google, then retrieves StackOverflow answers and autocompletes them for you."
  • YouCompleteMe: "a code-completion engine for Vim."
  • Clara: Analytics India Magazine says: "Clara is an alternative to Github Copilot for VSCode. Features wise, it supports close to 50 programming languages and gives developers the snippers at an instant. Check out the source code on Github."
  • Kite: "Kite adds AI powered code completions to your code editor, giving developers superpowers."
  • Asm-Dude: "Assembly syntax highlighting and code assistance for assembly source files and the disassembly window for Visual Studio 2015, 2017 and 2019. This extension can be found in the visual studio extensions gallery or download latest installer AsmDude.vsix (v1.9.6.14). If assembly is too much of a hassle but you still want access to specific machine instructions, consider Intrinsics-Dude." This comes as a Visual Studio extension.

Other products that have been identified as alternatives to GitHub Copilot include Make, Spacemacs, Rust-analyzer and more. Some pundits and sites lump long-existing tools in with AI-driven open source knockoffs of GitHub Copilot, producing a lot of so-called options that at first glance fall quite short of GitHub Copilot's capabilities. So take them with a grain of salt.

Meanwhile, much buzz still surrounds GitHub Copilot, which was decried by the nonprofit FSF (Free Software Foundation) as "unacceptable and unjust" and which has caused existential angst among developers who fear their jobs will be replaced by advanced AI coding systems, along the lines of: "Build an ASP.NET Core MVC web site optimized for selling cars."

Security and ethical concerns have also been raised about GitHub Copilot, so it will be interesting to check out the final product when it emerges from the technical preview.

About the Author

David Ramel is an editor and writer for Converge360.
