
New VS Code Tool: StarCoderEx (AI Code Generator)

StarCoder, a new open-access large language model (LLM) for code generation from ServiceNow and Hugging Face, is now available for Visual Studio Code, positioned as an alternative to GitHub Copilot.

StarCoder is a transformer-based LLM that generates code from natural language descriptions, a perfect example of the "generative AI" craze popularized by ChatGPT, the sentient-sounding chatbot from Microsoft partner OpenAI (whose Codex model powered early versions of GitHub Copilot).

Available as a VS Code extension called StarCoderEx, it can be used to generate code from natural language descriptions in the editor or in the command palette.

StarCoderEx (source: Lisoveliy).

It stems from an open scientific collaboration between Hugging Face (machine learning specialist) and ServiceNow (digital workflow company) called BigCode.

While not strictly open source, it's parked in a GitHub repo, which describes it thusly:

"StarCoder is a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages as well as text extracted from GitHub issues and commits and from notebooks."

"The StarCoder model is designed to level the playing field so developers from organizations of all sizes can harness the power of generative AI and maximize the business impact of automation with the proper governance, safety, and compliance protocols," said a May 4 news release from ServiceNow. "This new LLM marks the next major milestone in the BigCode Project, an ambitious initiative to develop state-of-the-art AI systems for code in an open and responsible manner with the support of the open-scientific AI research community."

On the same day, Hugging Face published a blog post about the project, which involves both the StarCoder and StarCoderBase LLMs. The company trained StarCoderBase, a model with nearly 15 billion parameters, on 1 trillion tokens, then fine-tuned it on 35 billion Python tokens, which resulted in a new model called StarCoder.

"We found that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks and matches or surpasses closed models such as code-cushman-001 from OpenAI (the original Codex model that powered early versions of GitHub Copilot). With a context length of over 8,000 tokens, the StarCoder models can process more input than any other open LLM, enabling a wide range of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant. In addition, the models can be used to autocomplete code, make modifications to code via instructions, and explain a code snippet in natural language. We take several important steps towards a safe open model release, including an improved PII redaction pipeline, a novel attribution tracing tool, and make StarCoder publicly available under an improved version of the OpenRAIL license. The updated license simplifies the process for companies to integrate the model into their products. We believe that with its strong performance, the StarCoder models will serve as a solid foundation for the community to use and adapt it to their use-cases and products."

Hugging Face set up a StarCoder - Code Completion Playground that lets users try out the model by entering a natural language description and seeing the generated code, along with a HuggingChat site that lets users chat with a prompted version of the model, for demonstration purposes only.

When asked about StarCoder, the HuggingChat site responded with: "Starcoder is a natural language processing tool built specifically for developers. Its core capabilities include generating code snippets, providing documentation links, suggesting variable names etc., while keeping track of user interactions over time."

Tech Assistant Chat Examples (source: Hugging Face).

The Hugging Face team also conducted an experiment to see if StarCoder could act as a tech assistant in addition to generating code. They built a Tech Assistant Prompt that lets the model answer programming-related requests, as shown in the graphic above.
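
The general idea of steering a code-completion model with a dialogue-style prompt can be sketched in a few lines of Python. This is a minimal illustration, not Hugging Face's actual Tech Assistant Prompt: the preamble wording, the "Human:" and "Assistant:" labels, and the build_assistant_prompt helper are all assumptions made for this example. It only assembles the text that would be sent to a model; no model is called.

```python
from typing import List, Tuple

# Hypothetical preamble written for this sketch. It is NOT the text of
# Hugging Face's Tech Assistant Prompt.
PREAMBLE = (
    "Below is a series of dialogues between a human and an AI technical "
    "assistant. The assistant tries to be helpful, polite, and accurate."
)


def build_assistant_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Assemble a dialogue-style prompt for a plain completion model.

    `examples` holds (request, reply) pairs shown to the model as earlier
    turns. The prompt ends with an open "Assistant:" line so that the
    model's most natural continuation is an answer to `question`.
    The speaker labels are an assumption of this sketch.
    """
    parts = [PREAMBLE, ""]
    for request, reply in examples:
        parts.append(f"Human: {request}")
        parts.append(f"Assistant: {reply}")
        parts.append("")  # blank line between dialogue turns
    parts.append(f"Human: {question}")
    parts.append("Assistant:")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = build_assistant_prompt(
        [("How do I reverse a list in Python?", "Use slicing: my_list[::-1].")],
        "Write a function that computes the square root.",
    )
    print(demo)
```

The resulting string would be passed to the model as ordinary input text. Because the model continues text rather than following instructions, the example turns give it a pattern to complete.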

"The model was trained on GitHub code," Hugging Face said. "As such it is not an instruction model and commands like 'Write a function that computes the square root.' do not work well. However, by using the Tech Assistant prompt you can turn it into a capable technical assistant."

The model is licensed under the BigCode OpenRAIL-M v1 license agreement.

As of this writing, the VS Code extension -- with the tagline: "Extension for using alternative GitHub Copilot (StarCoder API) in VSCode" -- has been downloaded 1,890 times since its debut last Friday, May 5. It has earned an average 3.0 rating (scale 0-5) from four reviewers.

About the Author

David Ramel is an editor and writer at Converge 360.
