New Microsoft Sandbox Uses Natural Language LLMs for SQL Queries

Microsoft has found a new use for the natural language processing capabilities of large language models (LLMs): generating SQL queries.

The company has set up a sandbox in which developers and data pros can use its Semantic Kernel SDK to experiment with and test the ability of LLMs -- GPT-4 in this case -- to generate SQL queries from natural language expressions.

Called NL2SQL, the sandbox project is housed in the Natural Language to SQL Console GitHub repo.

The company emphasized the experimental nature of the project, noting there are alternatives for such functionality (like WikiSQL and Spider) and cautioning that it won't necessarily lead to a viable production product.

"While other approaches exist in this space, this sample serves to showcase the capability (and limitations) of LLM using Semantic Kernel for dotnet," the GitHub repo says. "Whether or not this approach provides an adequate or cost-effective solution for any particular use-case depends on its specific context and associated expectations."

An Aug. 4 announcement expounded on that notion, stating that the project's focus is to zoom in on the natural abilities and limitations of GPT-4 -- an advanced LLM from partner OpenAI -- in producing relevant SQL queries, and promising to share the approach, learnings and some best practices.

[Figure: A Query (with Typo) Translated to SQL and Execution Results (source: Microsoft).]

In fact, some standard best practices are already being shared:

  • Least privilege -- Restrict to read-only access on relevant tables or views and utilize column and row-level security as appropriate.
  • Credential management -- Do not expose secrets and connection strings.
  • Injection prevention -- Never directly inject user-input into SQL statements.
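To make the three practices above concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not taken from Microsoft's sample, which targets Semantic Kernel for dotnet; the table, the function names and the NL2SQL_DEMO_DB environment variable are all hypothetical and exist only for illustration.

```python
import os
import sqlite3
import tempfile

def make_demo_db():
    """Create a tiny demo database (hypothetical schema) and return its path."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products (name, price) VALUES (?, ?)",
        [("widget", 9.5), ("gadget", 24.0)],
    )
    conn.commit()
    conn.close()
    return path

def db_path_from_env(default):
    """Credential management: read the location from the environment, not from source code.
    NL2SQL_DEMO_DB is a hypothetical variable name."""
    return os.environ.get("NL2SQL_DEMO_DB", default)

def open_read_only(path):
    """Least privilege: open the database read-only so generated SQL cannot modify data."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)

def find_products(conn, name):
    """Injection prevention: user input is passed as a bound parameter, never concatenated."""
    return conn.execute(
        "SELECT name, price FROM products WHERE name = ?", (name,)
    ).fetchall()

def try_write(conn):
    """Return True if a write succeeds; on a read-only connection it should fail."""
    try:
        conn.execute("DELETE FROM products")
        return True
    except sqlite3.OperationalError:
        return False
```

On a production database server, the read-only restriction would more likely be enforced through a dedicated login with read permissions on specific tables or views, rather than through a connection flag as in this sketch.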

"Avoid inadvertent disclosure by capturing/describing database schema at design-time to allow for review/refinement," Microsoft said. "This approach aligns with least privilege as describing schema requires higher elevation than those needed to query data."

Yet more advice: "Restrict access only to the desired data. Do not rely on schema definition or query criteria to control access."
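The design-time schema capture that Microsoft describes might look something like the following sketch, again in Python with sqlite3 rather than the dotnet tooling in the repo. The idea illustrated is that a more privileged connection describes only an allow-listed set of tables once, the result is saved for human review, and the query-time component never needs schema-reading rights. The function names and the allow-list are assumptions, not part of the sample.

```python
import json
import sqlite3

def describe_schema(conn, allowed_tables):
    """Design-time step: capture column names and types for allow-listed tables only."""
    schema = {}
    for table in allowed_tables:
        # PRAGMA arguments cannot be bound as parameters, so the table name
        # must come from a reviewed allow-list, never from user input.
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [{"name": row[1], "type": row[2]} for row in rows]
    return schema

def to_reviewable_json(schema):
    """Serialize the captured schema so a person can review and refine it."""
    return json.dumps(schema, indent=2, sort_keys=True)
```

Tables left off the allow-list never appear in the description, which reduces the chance of inadvertent disclosure. As the advice above notes, though, the description is not an access control: the database permissions still have to restrict what can be queried.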

Furthermore, the company's approach follows some basic principles:

  • Synchronizing an existing database to vector-storage is a non-starter as there is no desire to introduce consistency considerations or any type of data-movement.
  • Injecting data into the prompt-frame is also a non-starter (due to the token limit).
  • Prompts cannot be hardcoded to a specific database schema or platform.
  • Must discriminate across multiple schemas (in order to support multiple data-sources or to decompose a large schema).
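A rough sketch of how the last two principles could fit together follows. It picks one of several schema descriptions for a question and then fills a generic prompt template with that schema and the target SQL dialect, so nothing in the template is tied to a particular database. The word-overlap scoring is a deliberately simple stand-in chosen for illustration, not the method used in Microsoft's sample, and the schemas, names and prompt wording are all hypothetical.

```python
import re

# Hypothetical schema descriptions, as might be captured at design time.
SCHEMAS = {
    "sales": {
        "description": "customer orders and sales totals",
        "tables": {"orders": ["id", "customer_id", "total"], "customers": ["id", "name"]},
    },
    "hr": {
        "description": "employees and departments",
        "tables": {"employees": ["id", "name", "department_id"], "departments": ["id", "name"]},
    },
}

def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def pick_schema(question, schemas):
    """Choose the schema whose description, tables and columns share the most words with the question."""
    question_words = tokenize(question)

    def score(name):
        words = tokenize(schemas[name]["description"])
        for table, columns in schemas[name]["tables"].items():
            words |= tokenize(table) | tokenize(" ".join(columns))
        return len(question_words & words)

    # Sorting the names makes ties resolve deterministically.
    return max(sorted(schemas), key=score)

def build_prompt(question, schema_name, schemas, dialect):
    """Fill a generic template; the schema and dialect are parameters, not hardcoded text."""
    table_lines = [
        f"{table}({', '.join(columns)})"
        for table, columns in schemas[schema_name]["tables"].items()
    ]
    return (
        f"Generate a single read-only {dialect} query.\n"
        "Tables:\n" + "\n".join(table_lines) + "\n"
        f"Question: {question}\n"
    )
```

Only table and column names reach the prompt, never rows of data, which is in keeping with the token-limit principle above.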

Microsoft said it had heard from many in the community who want to use the Semantic Kernel SDK to query their relational databases using natural language expressions.

The open source SDK helps developers easily combine AI services with conventional programming languages in their applications.

The sandbox project's GitHub repo includes a ready-made Visual Studio solution that developers can use to put the tech through its paces.

The new experimental sandbox isn't Microsoft's first use of AI for working with SQL Server. The company earlier announced an AI-powered "Copilot" for the latest release of SQL Server Developer Tools (SSDT) in Visual Studio, as detailed in the June Visual Studio Magazine article, "Even SQL Server Developer Tools Gets an AI Copilot."

About the Author

David Ramel is an editor and writer at Converge 360.
