
New Microsoft Sandbox Uses Natural Language LLMs for SQL Queries

Microsoft has found a new use for the natural language processing capabilities of large language models (LLMs): generating SQL queries.

The company has set up a sandbox where developers and data pros can use its Semantic Kernel SDK to experiment with and test the ability of LLMs -- GPT-4 in this case -- to generate SQL queries from natural language expressions.

Called NL2SQL, the sandbox project is housed in the Natural Language to SQL Console GitHub repo.

The company emphasized the experimental nature of the project, noting that other work exists in this space (such as the WikiSQL and Spider text-to-SQL datasets) and cautioning that the project won't necessarily lead to a viable production product.

"While other approaches exist in this space, this sample serves to showcase the capability (and limitations) of LLM using Semantic Kernel for dotnet," the GitHub repo says. "Whether or not this approach provides an adequate or cost-effective solution for any particular use-case depends on its specific context and associated expectations."

An Aug. 4 announcement expounded on that notion, stating that the project's focus is to zoom in on the natural abilities and limitations of GPT-4 -- an advanced LLM from partner OpenAI -- in producing relevant SQL queries, and promising to share the approach, learnings and some best practices.

[Image: A Query (with Typo) Translated to SQL and Execution Results (source: Microsoft).]

In fact, some standard best practices are already being shared:

  • Least privilege -- Restrict to read-only access on relevant tables or views and utilize column and row-level security as appropriate.
  • Credential management -- Do not expose secrets and connection strings.
  • Injection prevention -- Never directly inject user-input into SQL statements.
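
As a rough illustration of the three practices above, here is a minimal Python sketch using the standard library's sqlite3 module. It is not taken from Microsoft's repo (which targets .NET); the environment variable name, table and helper functions are hypothetical, and SQLite's query-only mode merely stands in for proper read-only database permissions.

```python
import os
import sqlite3


def connection_target() -> str:
    # Credential management: read the target from the environment instead of
    # hardcoding it. NL2SQL_DEMO_DB is a hypothetical variable name.
    return os.environ.get("NL2SQL_DEMO_DB", ":memory:")


def make_read_only(conn: sqlite3.Connection) -> sqlite3.Connection:
    # Least privilege (stand-in): SQLite rejects writes on this connection
    # once query_only is on. A server database would use a read-only login.
    conn.execute("PRAGMA query_only = ON")
    return conn


def find_users_by_city(conn: sqlite3.Connection, city: str) -> list:
    # Injection prevention: the user's value is bound as a parameter and is
    # never concatenated into the SQL text.
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", (city,)
    )
    return [row[0] for row in rows]


def build_demo_connection() -> sqlite3.Connection:
    # Demo data lives in memory so the sketch has no external dependencies.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("Ada", "London"), ("Linus", "Helsinki")],
    )
    conn.commit()
    return make_read_only(conn)
```

With this setup, a classic injection string such as `London' OR '1'='1` is treated as an ordinary (non-matching) city name, and any attempt to modify the table through the returned connection fails.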

"Avoid inadvertent disclosure by capturing/describing database schema at design-time to allow for review/refinement," Microsoft said. "This approach aligns with least privilege as describing schema requires higher elevation than those needed to query data."

Yet more advice: "Restrict access only to the desired data. Do not rely on schema definition or query criteria to control access."
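
The design-time schema capture Microsoft describes might look something like the following Python sketch, which again uses sqlite3 purely for illustration. The allow-list, the hidden-column set and the function names are assumptions made for this example, not part of the NL2SQL project. The idea is that a privileged process produces a reviewable description once, and only that description (never the live catalog) is given to the model.

```python
import json
import sqlite3


def describe_schema(conn, allowed_tables, hidden_columns=frozenset()):
    """Return a reviewable description of only the allowed tables and columns."""
    description = {}
    for table in sorted(allowed_tables):
        # pragma_table_info is SQLite's table-valued form of PRAGMA table_info,
        # so the table name can be passed as a bound parameter.
        columns = conn.execute(
            "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table,)
        ).fetchall()
        description[table] = [
            {"name": name, "type": col_type}
            for name, col_type in columns
            if (table, name) not in hidden_columns
        ]
    return description


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical tables: one meant for querying, one that should stay hidden.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, ssn TEXT)")
    conn.execute("CREATE TABLE audit_log (id INTEGER, detail TEXT)")
    return conn


def snapshot_json() -> str:
    # The JSON output is what a human would review and refine before it is
    # ever placed in a prompt.
    schema = describe_schema(
        build_demo_db(),
        allowed_tables={"customers"},
        hidden_columns={("customers", "ssn")},
    )
    return json.dumps(schema, indent=2)
```

Note that hiding a column from the description does not protect it. As the advice above says, access still has to be restricted in the database itself.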

Furthermore, the company's approach follows some basic principles:

  • Synchronizing an existing database to vector-storage is a non-starter as there is no desire to introduce consistency considerations or any type of data-movement.
  • Injecting data into the prompt-frame is also a non-starter (due to the token limit).
  • Prompts cannot be hardcoded to a specific database schema or platform.
  • The approach must discriminate across multiple schemas (in order to support multiple data sources or to decompose a large schema).
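
To make the last two principles concrete, here is a small Python sketch of a prompt template that takes the schema and SQL dialect as parameters, plus a selector that picks one schema description per question. The schema texts, template wording and word-overlap scoring are all assumptions for illustration; the article does not describe how the actual project chooses a schema, and a real system would likely use something more robust than keyword matching. Only schema descriptions, not table data, go into the prompt.

```python
import re

# Hypothetical schema descriptions, one per data source.
SCHEMAS = {
    "sales": "TABLE orders(id, customer_id, total, placed_on)\n"
             "TABLE customers(id, name, city)",
    "hr": "TABLE employees(id, name, department, hired_on)\n"
          "TABLE departments(id, title)",
}

# The template names no specific schema or platform; both are filled in later.
PROMPT_TEMPLATE = (
    "Generate one {dialect} SELECT statement for the question.\n"
    "Use only these tables and columns:\n{schema}\n"
    "Question: {question}\nSQL:"
)


def tokens(text):
    # Lowercase alphabetic words; "customer_id" yields "customer" and "id".
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_schema(question, schemas):
    """Return the name of the best-matching schema, or None if nothing overlaps."""
    question_words = tokens(question)
    scores = {
        name: len(question_words & tokens(text)) for name, text in schemas.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_prompt(question, schemas, dialect="SQLite"):
    name = pick_schema(question, schemas)
    if name is None:
        raise ValueError("no schema matches the question")
    return PROMPT_TEMPLATE.format(
        dialect=dialect, schema=schemas[name], question=question
    )
```

Calling `build_prompt("Which customers placed orders in June?", SCHEMAS)` selects the sales schema here, while a question that shares no words with any schema raises an error instead of guessing.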

Microsoft said it had heard from many in the community who want to use the Semantic Kernel SDK to query their relational databases using natural language expressions.

The open source SDK helps developers easily combine AI services with conventional programming languages in their applications.

The sandbox project's GitHub repo includes a ready-made Visual Studio solution that developers can use to put the tech through its paces.

The new experimental sandbox isn't Microsoft's first use of AI for working with SQL Server. The company earlier announced an AI-powered "Copilot" for the latest release of SQL Server Developer Tools (SSDT) in Visual Studio, as detailed in the June Visual Studio Magazine article, "Even SQL Server Developer Tools Gets an AI Copilot."

About the Author

David Ramel is an editor and writer at Converge 360.
