GitHub Copilot AI Spawns Open Source Alternatives
GitHub Copilot, described as an "AI pair programmer," debuted this year with a splash. It amazed developers with its ability to suggest chunks of code as they type in Visual Studio Code, and even to generate whole applications from typed commands alone.
That debut came in June, when GitHub announced the tool. Copilot is powered by Codex, a new AI system from Microsoft partner OpenAI. Codex has been described as an improved descendant of GPT-3 (Generative Pre-trained Transformer 3) that can translate natural language into code. Since then, Codex has been steadily improved and offered as an API.
GitHub said that once trials and testing of the technical preview are complete, it intends to offer Copilot as a commercial product for VS Code and the full-fledged Visual Studio IDE.
The plan for a paid product apparently didn't sit well with some in the industry, as open source alternatives have since sprung up.
Take, for example, GPT Code Clippy: The Open Source version of GitHub Copilot.
The wiki for the GPT-Code-Clippy (GPT-CC) project, which is hosted in a GitHub repo, says: "GPT-Code-Clippy (GPT-CC) is a community effort to create an open-source version of GitHub Copilot, an AI pair programmer based on GPT-3, called GPT-Codex. GPT-CC is fine-tuned on our GPT Code Clippy dataset sourced from publicly available code on GitHub. It was created to allow researchers to easily study large deep learning models that are trained on code to better understand their abilities and limitations. GPT-CC uses the GPT-Neo model as the base language model, which has been pretrained on the Pile dataset and we use the Causal Language Modelling objective to train the model."
The "we" in that description refers to the organization that owns the repo, CodedotAI. Its YouTube channel describes Code.AI as "a community dedicated for all things related to AI for code. In this community we not only discuss deep learning or code generation, we also discuss things like evolutionary computation and code documentation! It is a great place to find fellow like-minded researchers and developers, build a team of collaborators, find a project to work on, or brainstorm project and research ideas! On this channel we post video recordings of community events such as paper reading clubs and podcasts! If you are interested in AI on code please join us using this link: https://discord.gg/68NZFfxHxD !"
The project's GitHub repo explains the dataset criteria and the search tool used for training, along with the different GPT-CC models available, training details and more.
"Our ultimate aim is to not only develop an open-source version of Github's Code Copilot, but one which is of comparable performance and ease of use," the wiki states. "To that end, we are continually expanding our dataset and developing better models." Along those lines, action items for the team include:
- Pretrain the model from scratch with the dataset we have curated from GitHub: We believe this would be quite a straightforward process if we have the computing resources.
- Experiment with the use of GPT-J in code generation, as recommended by the paper "Evaluating Large Language Models Trained on Code"
- Expand the capabilities of GPT Code Clippy to other languages, especially underrepresented ones
- Devise a custom loss function that penalizes uncompilable code
- Devise ways to keep up with new versions of, and updates to, programming languages
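The wiki says GPT-CC is trained with the "Causal Language Modelling objective," and one of the action items above is a custom loss function. For readers unfamiliar with the term, the sketch below illustrates the general idea only: the model is scored on how well it predicts each token from the tokens that come before it, never from the ones after it. This is not GPT-CC's training code, which fine-tunes the GPT-Neo neural network; the `causal_lm_loss` function, the string tokens and the lookup-table "models" are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy interface: a "model" takes the tokens seen so far and
# returns a probability for each candidate next token.
NextTokenModel = Callable[[List[str]], Dict[str, float]]


def causal_lm_loss(tokens: List[str], predict_next: NextTokenModel) -> float:
    """Average negative log-likelihood of each token given only its prefix."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to predict anything")
    total = 0.0
    for i in range(1, len(tokens)):
        prefix = tokens[:i]                # the model never sees tokens[i:] ("causal")
        probs = predict_next(prefix)
        p = probs.get(tokens[i], 1e-12)    # tiny floor avoids log(0) for unseen tokens
        total += -math.log(p)
    return total / (len(tokens) - 1)


if __name__ == "__main__":
    code = ["def", "f", "(", ")"]

    # A model that knows nothing: uniform over a 4-token vocabulary.
    def uniform(prefix: List[str]) -> Dict[str, float]:
        return {t: 0.25 for t in code}

    print(causal_lm_loss(code, uniform))   # equals log(4), about 1.386

    # A model that has "memorized" this snippet, so every prediction is right.
    table = {"def": {"f": 1.0}, "f": {"(": 1.0}, "(": {")": 1.0}}
    print(causal_lm_loss(code, lambda prefix: table[prefix[-1]]))  # prints 0.0
```

Lower is better: a model that always assigns probability 1 to the actual next token scores 0. A loss that penalizes uncompilable code, as the team proposes, would presumably add an extra term on top of a baseline like this, though the wiki does not say how.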
GPT Code Clippy seems to be fairly popular, with nine contributors, 207 stars and 20 forks, but it's not the only GitHub Copilot alternative to have arisen since June.
The GPT-3 DEMO site, for example, lists GitHub Copilot and GPT-Code-Clippy, along with:
- CodeVox: a voice and natural language code creation tool from Andrew Mayne, who works for OpenAI and created the project in a hackathon.
- Tabnine: "Tabnine’s AI code completion IDE plugin completes code based on millions of programs in all languages and on your own context, empowering developers to code better and faster. Deep Tabnine is based on GPT-2, which uses the Transformer network architecture. This architecture was first developed to solve problems in natural language processing. Although modeling code and modeling natural language might appear to be unrelated tasks, modeling code requires understanding English in some unexpected ways."
Other media outlets have published similar roundups of Copilot alternatives, many of which appear to be existing products rather than new projects. Drawn from several sources, these include:
- Second Mate: "An open-source, mini imitation of GitHub Copilot using EleutherAI GPT-Neo-2.7B (via Huggingface Model Hub) for Emacs. This is a much smaller model so will likely not be as effective as Copilot, but can still be interesting to play around with!"
- Captain Stack: "This feature is somewhat similar to Github Copilot's code suggestion. But instead of using AI, it sends your search query to Google, then retrieves StackOverflow answers and autocompletes them for you."
- YouCompleteMe: "a code-completion engine for Vim."
- Clara: Analytics India Magazine says: "Clara is an alternative to Github Copilot for VSCode. Features wise, it supports close to 50 programming languages and gives developers the [snippets] at an instant. Check out the source code on Github."
- Kite: "Kite adds AI powered code completions to your code editor, giving developers superpowers."
- Asm-Dude: "Assembly syntax highlighting and code assistance for assembly source files and the disassembly window for Visual Studio 2015, 2017 and 2019. This extension can be found in the visual studio extensions gallery or download latest installer AsmDude.vsix. If assembly is too much of a hassle but you still want access to specific machine instructions, consider Intrinsics-Dude."
Other products that have been identified as alternatives to GitHub Copilot include Make, Spacemacs, Rust-analyzer and more. Some pundits and sites lump long-existing tools in with AI-driven open source knockoffs of GitHub Copilot, producing lists of so-called options that, at first glance, fall well short of GitHub Copilot's capabilities. So take such lists with a grain of salt.
Meanwhile, much buzz still surrounds GitHub Copilot. The nonprofit Free Software Foundation (FSF) decried it as "unacceptable and unjust," and it has caused existential angst among developers who fear their jobs will be taken over by advanced AI coding systems that can act on a command along the lines of: "Build an ASP.NET Core MVC web site optimized for selling cars."
Security and ethical concerns have also been raised about GitHub Copilot, so it will be interesting to check out the final product when it emerges from the technical preview.
David Ramel is an editor and writer for Converge360.