Open Source Codeium Challenges GitHub Copilot, Strips Out Non-Permissive GPL Code -- Visual Studio Magazine

By David Ramel
04/24/2023

Free and open source Codeium has launched an assault on the front-running, for-pay GitHub Copilot tool in the coding assistant space.

Along with being free of OpenAI hegemony, a key selling point in that assault is that Codeium, while providing similar code-completion capabilities, does not emit code with non-permissive licensing such as GPL (General Public License). Even though the GPL license guarantees end users the four freedoms to run, study, share and modify software, it's described as a non-permissive license.

All that is explained in last Thursday's (April 20) blog post titled "GitHub Copilot Emits GPL. Codeium Does Not."

Basically, Codeium says permissive licenses (for example MIT, BSD and Apache) let people use code for commerce or any other reason, but non-permissive licenses such as GPL prohibit such usage without consent. Codeium, developed by the deep learning specialist company Exafunction, uses the MIT license. Exafunction's GitHub repos include code for using Codeium in Vim and Neovim, the Chrome browser, Emacs and more.

Last week's post discusses the legal ramifications of violating GPL licenses, regardless of intent, an area of software licensing that the Codeium team said has become muddled in the wake of startling new advancements in generative AI and large language models (LLMs). Those LLMs are the "secret sauce" behind the machine learning tech that powers generative AI constructs like ChatGPT and GPT-4 from Microsoft partner OpenAI, the clear leader in advanced AI.

The post states:

Clearly a developer copy-pasting GPL code without consent is bad and grounds for legal action, but what about a generative code model? Is it wrong for such a model to "learn" from this data? The argument to do so is clear -- GPL-licensed OSS is some of the highest quality code that is publicly available, and just like any machine learning model, better quality training data almost always means better quality LLMs. The argument to not do so is perhaps less clear -- researchers say LLMs rarely spit out training data verbatim unless interacted with adversarially, but theoretically, they could. In which case, who is responsible for this clear legal infringement? The developer of the LLM or the user who unknowingly ends up accepting the LLM's suggestions and committing the code to their team's codebase? Honestly, there is no clear answer, but that's the scary part -- no user or company should be subject to legal action, even potentially, just for using an AI code assistant tool.

While GitHub Copilot is trained on GPL-licensed code, GitHub uses filters to screen out potentially problematic non-permissive code, but Codeium claims those filters don't work, noting that "we at Codeium have removed GPL licensed code from our training data, guaranteeing peace of mind to our users."
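To make the idea of removing GPL-licensed code from training data concrete, here is a minimal sketch of an allowlist filter keyed on license metadata. It is purely illustrative: the `Repo` type, the `filter_permissive` function, the sample repository names and the exact license list are all assumptions made for this example, not a description of Codeium's actual pipeline, which the post does not detail.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed allowlist: SPDX identifiers for the permissive licenses
# the article names (MIT, BSD and Apache).
PERMISSIVE_LICENSES = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


@dataclass
class Repo:
    """Hypothetical record for one repository in a training corpus."""
    name: str
    spdx_license: Optional[str]  # None when no license metadata is available


def filter_permissive(repos: List[Repo]) -> List[Repo]:
    """Keep only repos whose license is on the permissive allowlist.

    Repos with GPL-family or missing license metadata are dropped,
    since an allowlist admits nothing it cannot positively identify.
    """
    return [r for r in repos if r.spdx_license in PERMISSIVE_LICENSES]


# Made-up sample corpus for illustration only.
corpus = [
    Repo("example/web-utils", "MIT"),
    Repo("example/kernel-module", "GPL-3.0-only"),
    Repo("example/http-client", "Apache-2.0"),
    Repo("example/unlabeled", None),
]

kept = filter_permissive(corpus)
print([r.name for r in kept])  # ['example/web-utils', 'example/http-client']
```

An allowlist is used here rather than a GPL blocklist so that repositories with unknown or missing license metadata are excluded by default; whether Codeium takes that approach is not stated in the post.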

With the licensing angle fleshed out, a comparison of GitHub Copilot and Codeium turns to features and functionality. Here, Codeium rounded up salient points for its comparison and boiled them down into the graphic below.

GitHub Copilot vs. Codeium *(source: Codeium).*

As can be seen, besides being free, Codeium reportedly works in more IDEs and with more programming languages, while sporting similar code-generation functionality. The relative quality of that generated code, though, is measured subjectively. A comparison conducted by Codeium awarded both a 9/10 score, saying, "it appears that Github Copilot and Codeium had roughly similar consistency in addressing the goals across the tasks, with similar rates of manual intervention necessary."

That latter observation comes in a comparison among Codeium and three similar tools: GitHub Copilot, Replit and Tabnine. Unsurprisingly, Codeium comes out on top, with the team providing the following graphic:

Computed Cumulative Comparison Scores *(source: Codeium).*

In addition to code completion and related capabilities to explain, refactor and translate code, Codeium comes with search and chat functionality. Chat is the newest capability and is only available on the Codeium extension for Visual Studio Code.

VS Code Extension *(source: Codeium).*

With more than 66,000 installs, the tool promises:

Unlimited single and multi-line code completions forever
IDE-integrated chat: no need to leave VSCode to ChatGPT, and use convenient suggestions such as Refactor and Explain
Support for 70+ programming languages: Javascript, Python, Typescript, PHP, Go, Java, C, C++, Rust, Ruby, and more.
Support through our Discord Community

Codeium also comes in an enterprise offering, which is fully self-hosted and comes with additional features including local personalization on private repositories, with the team noting that enterprises often have higher requirements on data handling and security than do individual developers. However, the enterprise offering only includes code completion, not the newer search and chat functionality. The enterprise offering is priced per-seat, with exact pricing dependent on the size of an organization and any custom needs.

"We are committed to keep improving our data sanitization and filtering processes as well as maintaining a fresh training dataset (with up-to-date license metadata)," Codeium said last week. "We're also going to be taking this approach to remove potentially insecure code practices from our training data. This is possible because we are one of the very few companies that are building AI applications in a fully integrated manner independent of OpenAI -- the training, the models, the serving, the integrations, and the product."

About the Author

David Ramel is an editor and writer at Converge 360.
