Visual Studio Uses RUBICON to Improve AI Conversations

AI assistants have improved coding productivity, but how do you know if they're getting better?

Even if a user bothers to rate an AI interaction with a simple thumbs-up or thumbs-down click, that doesn't give tool creators much insight into how to improve the conversations.

So Microsoft created RUBICON, a rubric-based evaluation system that helps improve the quality of human-AI conversations in domain-specific settings, and last week published a paper on it. The system is already being used in the company's flagship Visual Studio IDE.

The Microsoft Research paper, titled "RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations," was published last week by authors Param Biyani, Yasharth Bajpai, Arjun Radhakrishna, Gustavo Soares and Sumit Gulwani.

The researchers posit that generative AI has transformed AI assistants in software development, and that as tools like GitHub Copilot become more advanced and specialized, it gets harder to evaluate their impact on user experience. The developers creating these AI assistants have a hard time gauging whether modifications to the tools actually improve that experience. Microsoft's solution leverages the concept of rubrics to evaluate conversation quality. Often used in education, rubrics are sets of guidelines or criteria for evaluating or grading assignments, projects or performances; here they are used to assess conversation quality.

"Traditional feedback mechanisms, such as simple thumbs-up or thumbs-down ratings, fall short in capturing the complexities of interactions within specialized settings, where nuanced data is often sparse," the authors said in a July 15 blog post introducing the paper. "RUBICON leverages large language models to generate rubrics for assessing conversation quality. It employs a selection process to choose the subset of rubrics based on their performance in scoring conversations. In our experiments, RUBICON effectively learns to differentiate conversation quality, achieving higher accuracy and yield rates than existing baselines."
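To make the "selection process" idea concrete, here is a minimal Python sketch of picking a subset of rubrics by how well they separate good conversations from bad ones. It is not the algorithm from the paper: the greedy strategy, the 0-10 score scale, the threshold, the rubric names and all of the data are assumptions made up for illustration, and in a real system the scores would come from an LLM grading each conversation against each rubric.

```python
from typing import Dict, List, Tuple

def accuracy(rubrics: List[str], scores: Dict[str, List[float]],
             labels: List[bool], threshold: float = 5.0) -> float:
    """Fraction of conversations whose mean score over `rubrics`
    lands on the correct side of `threshold` for its good/bad label."""
    correct = 0
    for i, is_good in enumerate(labels):
        mean = sum(scores[r][i] for r in rubrics) / len(rubrics)
        if (mean >= threshold) == is_good:
            correct += 1
    return correct / len(labels)

def select_rubrics(scores: Dict[str, List[float]], labels: List[bool],
                   max_rubrics: int = 3) -> Tuple[List[str], float]:
    """Greedily add the rubric that most improves accuracy; stop when
    no remaining rubric improves it (a hypothetical selection rule)."""
    selected: List[str] = []
    best = 0.0
    while len(selected) < max_rubrics:
        candidates = [r for r in scores if r not in selected]
        if not candidates:
            break
        top = max(candidates,
                  key=lambda r: accuracy(selected + [r], scores, labels))
        top_acc = accuracy(selected + [top], scores, labels)
        if top_acc <= best:
            break
        selected.append(top)
        best = top_acc
    return selected, best

# Made-up scores: one value per conversation for each candidate rubric.
scores = {
    "asks_clarifying_questions": [9, 8, 2, 6, 7, 3],
    "uses_code_context":         [8, 4, 6, 2, 9, 1],
    "polite_tone":               [7, 7, 7, 6, 7, 7],
}
# Made-up labels: True means the conversation was judged good.
labels = [True, True, False, False, True, False]

selected, best = select_rubrics(scores, labels)
print(selected, best)
```

With this toy data, the sketch keeps the two rubrics that together classify every conversation correctly and drops "polite_tone," which scores high on good and bad conversations alike and so tells them apart poorly.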

Without knowing it, Visual Studio users are already benefiting from RUBICON.

"RUBICON-generated rubrics serve as a framework for understanding user needs, expectations, and conversational norms," the blog post said. "These rubrics have been successfully implemented in Visual Studio IDE, where they have guided analysis of over 12,000 debugging conversations, offering valuable insights into the effectiveness of modifications made to the assistant and facilitating rapid iteration and improvement. For example, the rubrics 'The AI gave a solution too quickly, rather than asking the user for more information and trying to find the root cause of the issue,' or 'The AI gave a mostly surface-level solution to the problem,' have indicated issues where the assistant prematurely offered solutions without gathering sufficient information. These findings led to adjustments in the AI's behavior, making it more investigative and collaborative."
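As a rough illustration of how rubric results could be used to judge a change to an assistant, the Python sketch below compares how often a negative rubric fires across conversations from two versions. The short rubric keys ("too_quick", "surface_level"), the set-of-flags data layout and every value are hypothetical; the blog post does not describe how the Visual Studio team stores or aggregates its results.

```python
from typing import List, Set

def flag_rate(conversations: List[Set[str]], rubric: str) -> float:
    """Share of conversations in which `rubric` was flagged."""
    if not conversations:
        return 0.0
    hits = sum(1 for flags in conversations if rubric in flags)
    return hits / len(conversations)

# Made-up data: each set holds the negative rubrics flagged for one conversation.
old_version = [{"too_quick"}, {"too_quick", "surface_level"}, set(), {"too_quick"}]
new_version = [set(), {"surface_level"}, set(), {"too_quick"}]

old_rate = flag_rate(old_version, "too_quick")
new_rate = flag_rate(new_version, "too_quick")
print(f"too_quick flagged: old={old_rate:.0%}, new={new_rate:.0%}")
```

A drop in the flag rate between versions would suggest, under these assumptions, that the newer assistant jumps to solutions less often.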

To illustrate, the researchers showed how the AI in Visual Studio helps a developer debug a program by providing detailed explanations and relevant code examples (Figure 1), and how its responses improve when they are guided by context (Figure 2).

Figure 1: Contrasting interactions with two versions of the Visual Studio Debugging Assistant for the same task. On the left, the assistant makes assumptions without seeking clarification. On the right, the assistant proactively investigates the error, collaborates with the developer to gather essential information, and achieves a practical solution. (source: Microsoft).
Figure 2: Context awareness significantly improves the AI assistant's efficacy. The response on the left is generic, superficially referring to the developer's code and restating the obvious, providing little value. The reply on the right directs the developer toward a specific solution, the toJSON method. (source: Microsoft).

"Developers of AI assistants value clear insights into the performance of their interfaces," last week's post said. "RUBICON represents a valuable step toward developing a refined evaluation system that is sensitive to domain-specific tasks, adaptable to changing usage patterns, efficient, easy-to-implement, and privacy-conscious. A robust evaluation system like RUBICON can help to improve the quality of these tools without compromising user privacy or data security. As we look ahead, our goal is to broaden the applicability of RUBICON beyond just debugging in AI assistants like GitHub Copilot. We aim to support additional tasks like migration and scaffolding within IDEs, extending its utility to other chat-based Copilot experiences across various products."

About the Author

David Ramel is an editor and writer at Converge 360.

