New Microsoft Sandbox Uses Natural Language LLMs for SQL Queries

Microsoft has found a new use for the natural language processing capabilities of large language models (LLMs): generating SQL queries.

The company has set up a sandbox where developers and data pros can use its Semantic Kernel SDK to experiment with and test the ability of LLMs -- GPT-4 in this case -- to generate SQL queries from natural language expressions.

Called NL2SQL, the sandbox project is housed in the Natural Language to SQL Console GitHub repo.

The company emphasized the experimental nature of the project, noting there are alternatives for such functionality (like WikiSQL and Spider) and cautioning that it won't necessarily lead to a viable production product.

"While other approaches exist in this space, this sample serves to showcase the capability (and limitations) of LLM using Semantic Kernel for dotnet," the GitHub repo says. "Whether or not this approach provides an adequate or cost-effective solution for any particular use-case depends on its specific context and associated expectations."

An Aug. 4 announcement expanded on that point, stating that the project focuses on the natural abilities and limitations of GPT-4 -- an advanced LLM from partner OpenAI -- in producing relevant SQL queries, and promising to share the approach, learnings and some best practices.

Figure: A Query (with Typo) Translated to SQL and Execution Results (source: Microsoft).

In fact, some standard best practices are already being shared:

  • Least privilege -- Restrict to read-only access on relevant tables or views and utilize column and row-level security as appropriate.
  • Credential management -- Do not expose secrets and connection strings.
  • Injection prevention -- Never directly inject user input into SQL statements.
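As a rough illustration of how these three practices might look in code, here is a minimal Python sketch. It is not taken from Microsoft's sample, which targets .NET. It assumes SQLite as a stand-in database, a hypothetical `customers` table, and a hypothetical environment variable named `NL2SQL_DB_PATH`.

```python
import os
import sqlite3
from pathlib import Path


def database_path_from_env(var: str = "NL2SQL_DB_PATH") -> str:
    """Credential management: read the location from the environment
    (hypothetical variable name) instead of hardcoding it in source."""
    return os.environ[var]  # raises KeyError if the variable is not set


def open_read_only(path: str) -> sqlite3.Connection:
    """Least privilege: open an existing database file in read-only mode.

    With mode=ro, SQLite rejects every statement that would modify data.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def find_customers(conn: sqlite3.Connection, city: str) -> list:
    """Injection prevention: pass user input as a bound parameter,
    never by concatenating it into the SQL text."""
    sql = "SELECT name FROM customers WHERE city = ? ORDER BY name"
    return conn.execute(sql, (city,)).fetchall()
```

With this setup, a value such as `Oslo' OR '1'='1` is compared as a literal city name rather than interpreted as SQL, and any write attempted over the read-only connection raises `sqlite3.OperationalError`.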

"Avoid inadvertent disclosure by capturing/describing database schema at design-time to allow for review/refinement," Microsoft said. "This approach aligns with least privilege as describing schema requires higher elevation than those needed to query data."
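One way to picture design-time schema capture is a small script, run once by someone with elevated access, that writes the schema to a file for review before it is ever shown to a model. The Python sketch below is an assumption-laden illustration, not the sample's actual tooling: it uses SQLite's catalog as a stand-in, and the function names and JSON layout are hypothetical.

```python
import json
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Return {table_or_view: [{"name": ..., "type": ...}, ...]}.

    Only structure is captured; no rows are read from any table.
    """
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') "
        "AND name NOT LIKE 'sqlite^_%' ESCAPE '^' "  # skip internal tables
        "ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        columns = conn.execute(
            "SELECT name, type FROM pragma_table_info(?) ORDER BY cid",
            (table,),
        ).fetchall()
        schema[table] = [{"name": n, "type": t} for n, t in columns]
    return schema


def save_schema(schema: dict, path: str) -> None:
    """Write the description to a JSON file that can be reviewed and trimmed."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2, sort_keys=True)
```

The saved file can then be edited by hand, for example to drop sensitive tables or columns, and the query-time account never needs permission to inspect the catalog itself.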

Yet more advice: "Restrict access only to the desired data. Do not rely on schema definition or query criteria to control access."

Furthermore, the company's approach follows some basic principles:

  • Synchronizing an existing database to vector-storage is a non-starter as there is no desire to introduce consistency considerations or any type of data-movement.
  • Injecting data into the prompt-frame is also a non-starter (due to the token limit).
  • Prompts cannot be hardcoded to a specific database schema or platform.
  • Must discriminate across multiple schemas (in order to support multiple data-sources or to decompose a large schema).
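To make the last two principles concrete, the following Python sketch builds a prompt from a schema description supplied at run time and picks one schema out of several. It is only an illustration under stated assumptions: the article does not say how the sample chooses a schema, so the keyword-overlap scoring here is a deliberately naive stand-in, and the function names, prompt wording and example schemas are all hypothetical.

```python
import re


def _words(text: str) -> set:
    """Lowercase alphabetic tokens; 'order_id' yields {'order', 'id'}."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_schema(question: str, schemas: dict) -> str:
    """Pick the schema whose table and column names overlap most with the
    question. Naive stand-in for whatever ranking a real system would use."""
    asked = _words(question)

    def score(name: str) -> int:
        vocabulary = set()
        for table, columns in schemas[name].items():
            vocabulary |= _words(table)
            for column in columns:
                vocabulary |= _words(column)
        return len(asked & vocabulary)

    # sorted() makes ties resolve deterministically (alphabetical order)
    return max(sorted(schemas), key=score)


def build_prompt(question: str, schema: dict, platform: str) -> str:
    """Assemble a prompt from schema metadata only: no table rows are
    included, and neither the schema nor the platform is hardcoded."""
    lines = [f"Target SQL dialect: {platform}", "Schema:"]
    for table, columns in sorted(schema.items()):
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("Respond with a single read-only SELECT statement.")
    return "\n".join(lines)
```

For example, given one schema describing `orders` and another describing `employees`, a question about employee salaries would select the second, and only that schema's table and column names would be placed in the prompt.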

Microsoft said it had heard from many in the community who want to use the Semantic Kernel SDK to query their relational database using natural language expressions.

The open source SDK helps developers easily combine AI services with conventional programming languages in their applications.

The sandbox project's GitHub repo includes a ready-made Visual Studio solution that developers can use to put the tech through its paces.

The new experimental sandbox isn't Microsoft's first use of AI for working with SQL Server. The company previously announced an AI-powered "Copilot" for the latest release of SQL Server Developer Tools (SSDT) in Visual Studio, as detailed in the June Visual Studio Magazine article, "Even SQL Server Developer Tools Gets an AI Copilot."

About the Author

David Ramel is an editor and writer at Converge 360.