New VS Code Tool: StarCoderEx (AI Code Generator)
StarCoder, a new open-access large language model (LLM) for code generation from ServiceNow and Hugging Face, is now available for Visual Studio Code, positioned as an alternative to GitHub Copilot.
StarCoder is a transformer-based LLM that generates code from natural language descriptions. It is a prime example of the "generative AI" craze popularized by ChatGPT, the sentient-sounding, AI-supercharged chatbot from Microsoft partner OpenAI, whose Codex model powered early versions of Copilot.
Available as a VS Code extension called StarCoderEx, it can be used to generate code from natural language descriptions in the editor or in the command palette.
It stems from an open scientific collaboration between Hugging Face (machine learning specialist) and ServiceNow (digital workflow company) called BigCode.
While not strictly open source, it's parked in a GitHub repo, which describes it thusly:
StarCoder is a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages as well as text extracted from GitHub issues and commits and from notebooks.
"The StarCoder model is designed to level the playing field so developers from organizations of all sizes can harness the power of generative AI and maximize the business impact of automation with the proper governance, safety, and compliance protocols," said a May 4 news release from ServiceNow. "This new LLM marks the next major milestone in the BigCode Project, an ambitious initiative to develop state-of-the-art AI systems for code in an open and responsible manner with the support of the open-scientific AI research community."
On the same day, Hugging Face published a blog post about the project, which involves two LLMs: StarCoderBase and StarCoder. StarCoderBase, a model with nearly 15 billion parameters, was trained on 1 trillion tokens. Fine-tuning StarCoderBase on a further 35 billion Python tokens produced a new model called StarCoder.
"We found that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks and matches or surpasses closed models such as code-cushman-001 from OpenAI (the original Codex model that powered early versions of GitHub Copilot). With a context length of over 8,000 tokens, the StarCoder models can process more input than any other open LLM, enabling a wide range of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant. In addition, the models can be used to autocomplete code, make modifications to code via instructions, and explain a code snippet in natural language. We take several important steps towards a safe open model release, including an improved PII redaction pipeline, a novel attribution tracing tool, and make StarCoder publicly available under an improved version of the OpenRAIL license. The updated license simplifies the process for companies to integrate the model into their products. We believe that with its strong performance, the StarCoder models will serve as a solid foundation for the community to use and adapt it to their use-cases and products."
Hugging Face set up a StarCoder - Code Completion Playground that lets users try out the model by entering a natural language description and seeing the generated code, along with a HuggingChat site that lets users chat with a prompted version of the model, for demonstration purposes only.
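For readers who would rather script a request than use the playground, the following is a minimal sketch of how a completion request might be assembled in Python. It is not taken from the extension or from Hugging Face's documentation: the endpoint URL, the payload shape, the function name `build_completion_request` and the placeholder token are all assumptions that should be checked against current Hugging Face documentation. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint for a hosted StarCoder checkpoint; verify before use.
API_URL = "https://api-inference.huggingface.co/models/bigcode/starcoder"


def build_completion_request(prompt, token, max_new_tokens=64):
    """Build (but do not send) a POST request asking the model to continue `prompt`.

    `build_completion_request` is a hypothetical helper, and the payload keys
    ("inputs", "parameters", "max_new_tokens") are an assumed request shape.
    """
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # The prompt is a code prefix for the model to continue.
    req = build_completion_request("def fibonacci(n):", token="hf_your_token_here")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # Sending would be: urllib.request.urlopen(req), given a valid token.
```

The example prompt is a code prefix rather than a plain-English instruction, since StarCoder is a completion model trained on code.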
When asked about StarCoder, the HuggingChat site responded with: "Starcoder is a natural language processing tool built specifically for developers. Its core capabilities include generating code snippets, providing documentation links, suggesting variable names etc., while keeping track of user interactions over time."
The Hugging Face team also conducted an experiment to see if StarCoder could act as a tech assistant in addition to generating code. They built a Tech Assistant Prompt that lets the model answer programming-related requests, as shown in the graphic above.
"The model was trained on GitHub code," Hugging Face said. "As such it is not an instruction model and commands like 'Write a function that computes the square root.' do not work well. However, by using the Tech Assistant prompt you can turn it into a capable technical assistant."
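The general idea of turning a completion model into an assistant by prompting it with dialogues can be sketched as follows. This is an illustration only: the preamble text, the example turn and the helper `build_assistant_prompt` are hypothetical stand-ins, not the actual Tech Assistant Prompt that Hugging Face built.

```python
# Hypothetical preamble for illustration; NOT the real Tech Assistant Prompt.
PREAMBLE = (
    "Below is a series of dialogues between a human and an AI technical "
    "assistant. The assistant is helpful and answers programming questions "
    "with working code.\n"
)

# Hypothetical example turn showing the model the expected dialogue format.
EXAMPLE_TURNS = [
    (
        "How do I reverse a list in Python?",
        "Use slicing: `items[::-1]` returns a reversed copy.",
    ),
]


def build_assistant_prompt(question, examples=EXAMPLE_TURNS):
    """Wrap `question` in a dialogue so a completion model continues as the assistant."""
    parts = [PREAMBLE]
    for human, assistant in examples:
        parts.append(f"Human: {human}\nAssistant: {assistant}\n")
    # End on an open "Assistant:" turn so the model's completion is the answer.
    parts.append(f"Human: {question}\nAssistant:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_assistant_prompt("Write a function that computes the square root."))
```

The bare instruction is never sent on its own. Instead, the model receives a document that already looks like a conversation and is left to complete the assistant's next turn.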
The model is licensed under the BigCode OpenRAIL-M v1 license agreement.
As of this writing, the VS Code extension -- with the tagline: "Extension for using alternative GitHub Copilot (StarCoder API) in VSCode" -- has been downloaded 1,890 times since its debut last Friday, May 5. It has earned an average 3.0 rating (scale 0-5) from four reviewers.
David Ramel is an editor and writer for Converge360.