
New Microsoft Sandbox Uses Natural Language LLMs for SQL Queries

Microsoft has found a new use for the natural language processing capabilities of large language models (LLMs): generating SQL queries.

The company has set up a sandbox where developers and data pros can use its Semantic Kernel SDK to experiment with and test the ability of LLMs (GPT-4 in this case) to generate SQL queries from natural language expressions.

Called NL2SQL, the sandbox project is housed in the Natural Language to SQL Console GitHub repo.

The company emphasized the experimental nature of the project. It noted that other text-to-SQL work exists, such as the WikiSQL and Spider datasets, and cautioned that the sample won't necessarily lead to a viable production product.

"While other approaches exist in this space, this sample serves to showcase the capability (and limitations) of LLM using Semantic Kernel for dotnet," the GitHub repo says. "Whether or not this approach provides an adequate or cost-effective solution for any particular use-case depends on its specific context and associated expectations."

An Aug. 4 announcement expanded on that point. It said the project focuses on the inherent abilities and limitations of GPT-4, an advanced LLM from Microsoft partner OpenAI, in producing relevant SQL queries. The announcement also promised to share the approach, lessons learned and some best practices.

A Query (with Typo) Translated to SQL and Execution Results (source: Microsoft).
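The translate-then-execute flow in that screenshot can be sketched in Python, with SQLite standing in for SQL Server. This is a minimal sketch under stated assumptions, not the NL2SQL sample's code (which is .NET). `fake_llm` is a hypothetical stand-in for a GPT-4 call made through Semantic Kernel. `is_single_select` is a deliberately crude guard, not a SQL parser.

```python
import sqlite3
from typing import Callable, List, Tuple


def is_single_select(sql: str) -> bool:
    """Crude guard, not a parser: exactly one statement, and it starts with SELECT."""
    body = sql.strip().rstrip(";").strip()
    if not body or ";" in body:
        return False
    return body.split(None, 1)[0].upper() == "SELECT"


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a GPT-4 call through Semantic Kernel; returns a canned answer.
    return "SELECT count(*) FROM customers;"


def answer(question: str, schema: str, llm: Callable[[str], str],
           conn: sqlite3.Connection) -> Tuple[str, List[tuple]]:
    """Ask the model for SQL, check it, then run it on the (ideally read-only) connection."""
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    sql = llm(prompt).strip()
    if not is_single_select(sql):
        raise ValueError(f"refusing to run generated SQL: {sql!r}")
    return sql, conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
    # The typo in "custmers" is left for the model to cope with.
    sql, rows = answer("How many custmers do we have?",
                       "customers(id INTEGER, name TEXT)", fake_llm, conn)
    print(sql, rows)  # SELECT count(*) FROM customers; [(2,)]
```

The string check only rejects obviously unsuitable output early. The read-only connection and database permissions are what actually limit the damage a bad query can do.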

In fact, some standard best practices are already being shared:

  • Least privilege -- Restrict to read-only access on relevant tables or views and use column- and row-level security as appropriate.
  • Credential management -- Do not expose secrets and connection strings.
  • Injection prevention -- Never directly inject user-input into SQL statements.
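The first and third practices can be sketched with Python's built-in sqlite3 module standing in for a production database. The connection is opened read-only via SQLite's `mode=ro` URI parameter. User text is passed as a bound parameter rather than spliced into the statement. The `customers` table and the helper names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only (least privilege): writes fail."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def find_customer(conn: sqlite3.Connection, user_text: str) -> list:
    """Look up customers by name. The user text is bound to the ? placeholder,
    so it is always treated as a value, never as SQL."""
    sql = "SELECT id, name FROM customers WHERE name = ?"
    return conn.execute(sql, (user_text,)).fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "demo.db")

        # Setup uses a normal read-write connection (a design-time/admin step).
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
        setup.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
        setup.commit()
        setup.close()

        ro = open_read_only(path)
        print(find_customer(ro, "Ada"))           # [(1, 'Ada')]
        print(find_customer(ro, "x' OR '1'='1"))  # [] (matched as a literal name)
        try:
            ro.execute("DELETE FROM customers")
        except sqlite3.OperationalError as exc:
            print("write rejected:", exc)
        ro.close()
```

The classic `' OR '1'='1` payload matches nothing here, because it is compared as a literal name and never parsed as SQL.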

"Avoid inadvertent disclosure by capturing/describing database schema at design-time to allow for review/refinement," Microsoft said. "This approach aligns with least privilege as describing schema requires higher elevation than those needed to query data."

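Design-time capture can be roughly illustrated with SQLite again as the stand-in. The function below renders every user table as a one-line description that a person can review, trim and store, so the runtime path needs only that stored text. The output format is an assumption, since the article doesn't describe the format the NL2SQL sample uses. SQLite has no separate privilege for reading schema metadata, so the elevation point Microsoft makes applies to servers such as SQL Server rather than to this demo.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render each user table as 'table(col TYPE, ...)', one per line, for review."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    lines = []
    for (table,) in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        col_list = ", ".join(f"{row[1]} {row[2]}".strip() for row in cols)
        lines.append(f"{table}({col_list})")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    # Run once at design time; save the text (e.g. to a file) for a person to review.
    print(describe_schema(conn))
```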
Yet more advice: "Restrict access only to the desired data. Do not rely on schema definition or query criteria to control access."
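One way to restrict access independently of whatever SQL a model produces is to have the connection itself refuse out-of-policy reads. The sketch below uses sqlite3's `set_authorizer` hook. The `ALLOWED` policy and the table names are hypothetical. On SQL Server the equivalent would be permissions, views and row-level security rather than a client-side callback.

```python
import sqlite3

# Hypothetical policy: the only table and columns the query path may read.
ALLOWED = {"orders": {"id", "total"}}


def restrict_reads(conn: sqlite3.Connection, allowed: dict) -> None:
    """Deny any column read outside `allowed`, regardless of the SQL submitted."""

    def authorizer(action, arg1, arg2, db_name, trigger_or_view):
        if action == sqlite3.SQLITE_READ:
            # For SQLITE_READ, arg1 is the table name and arg2 the column name.
            permitted = allowed.get(arg1)
            if permitted is None or (arg2 and arg2 not in permitted):
                return sqlite3.SQLITE_DENY  # the statement fails to prepare
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL, card_number TEXT)")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, 9.5, 'xxxx')")
    restrict_reads(conn, ALLOWED)

    print(conn.execute("SELECT id, total FROM orders").fetchall())  # [(1, 9.5)]
    for sql in ("SELECT card_number FROM orders", "SELECT name FROM customers"):
        try:
            conn.execute(sql)
        except sqlite3.DatabaseError as exc:
            print("denied:", sql, "->", exc)
```

The check runs when each statement is prepared, so a generated `SELECT * FROM orders` is rejected as well, because it would read `card_number`.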

Furthermore, the company's approach follows some basic principles:

  • Synchronizing an existing database to vector-storage is a non-starter as there is no desire to introduce consistency considerations or any type of data-movement.
  • Injecting data into the prompt-frame is also a non-starter (due to the token limit).
  • Prompts cannot be hardcoded to a specific database schema or platform.
  • Must discriminate across multiple schemas (in order to support multiple data-sources or to decompose a large schema).
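The last two principles are not hardcoding the schema and choosing among several schemas. They can be illustrated with a prompt template that receives the schema at runtime, plus a selector that picks the most relevant schema for a question. The article doesn't say how the sample ranks schemas. The word-overlap score here is a simple stand-in for the semantic-similarity ranking a real system might use, and the template wording is hypothetical.

```python
import re

# Hypothetical prompt template: the schema is a runtime parameter, not hardcoded text.
PROMPT_TEMPLATE = (
    "You translate questions into a single read-only SQL SELECT statement.\n"
    "Schema ({name}):\n{schema}\n\n"
    "Question: {question}\nSQL:"
)


def _tokens(text: str) -> set:
    """Lowercased word/identifier tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def pick_schema(question: str, schemas: dict) -> str:
    """Return the name of the schema sharing the most tokens with the question."""
    q = _tokens(question)
    return max(schemas, key=lambda name: len(q & _tokens(schemas[name])))


def build_prompt(question: str, schemas: dict) -> str:
    """Fill the template with whichever schema best matches the question."""
    name = pick_schema(question, schemas)
    return PROMPT_TEMPLATE.format(name=name, schema=schemas[name], question=question)


if __name__ == "__main__":
    schemas = {
        "hr": "employees(id INTEGER, name TEXT, salary REAL)",
        "sales": "customers(id INTEGER, name TEXT)\n"
                 "orders(id INTEGER, customer_id INTEGER, total REAL)",
    }
    print(build_prompt("Which customers placed the most orders?", schemas))
```

Only the selected schema goes into the prompt. This also keeps the prompt small, which matters given the token limit mentioned above.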

Microsoft said it had heard from many in the community who want to use the Semantic Kernel SDK to query their relational database using natural language expressions.

The open source SDK helps developers easily combine AI services with conventional programming languages in their applications.

The sandbox project's GitHub repo includes a ready-made Visual Studio solution that developers can use to put the tech through its paces.

The new experimental sandbox isn't Microsoft's first use of AI for working with SQL Server. The company previously announced an AI-powered "Copilot" for the latest release of SQL Server Developer Tools (SSDT) in Visual Studio, as detailed in the June Visual Studio Magazine article "Even SQL Server Developer Tools Gets an AI Copilot."

About the Author

David Ramel is an editor and writer at Converge 360.
