ML.NET 3.0 Boosts Deep Learning, Data Processing for .NET-Based AI Apps

Microsoft shipped ML.NET 3.0, enhancing deep learning and data processing scenarios in the company's machine learning framework, which lets devs create AI-infused apps entirely within the .NET ecosystem.

The ability for devs to use C# and F# instead of the usual data science stalwarts -- Python and R -- is a main selling point of the open source ML.NET framework, which was created to help developers build custom ML models and integrate them into apps. That can be done with tooling such as a command-line interface (CLI) and Model Builder, or by creating constructs like the large language models (LLMs) that power ChatGPT and Microsoft's ubiquitous "Copilot" AI assistants.

ML.NET comes in a NuGet package that has been downloaded more than 5.3 million times.

Figure: ML.NET (source: Microsoft).

In announcing ML.NET 3.0 yesterday (Nov. 27), Microsoft emphasized two main areas of interest: deep learning and data processing.

Deep Learning
This ML subset uses artificial neural networks, loosely modeled on the human brain, to "learn" from inputs such as large amounts of data, including unstructured data.

Microsoft said deep learning scenarios were substantially expanded in the v3.0 release with new capabilities in three areas: object detection, named entity recognition and question answering.

Object detection in ML.NET 3.0 is an advanced form of image classification that not only categorizes entities within images but also locates them, making it ideal for scenarios with images containing multiple objects of different types. In v3.0, the object detection capabilities are boosted via integrations with TorchSharp and ONNX models, with Microsoft specifically noting TorchSharp-powered object detection APIs. The company said those represent a significant step in leveraging deep learning techniques within the ML.NET framework.

Microsoft explained that the object detection API builds on techniques developed at Microsoft Research and uses a transformer-based neural network architecture. That approach reflects modern trends in deep learning, particularly in computer vision, the company said.

TorchSharp is also instrumental in the enhancements to named entity recognition (NER) and question answering (QA), two common ML areas that are part of the natural language processing (NLP) space. Both scenarios are unlocked in ML.NET 3.0 by building on the TorchSharp RoBERTa text classification features introduced previously.

"Both the NER and QA trainers are included in the Microsoft.ML.TorchSharp 3.0.0 package and the Microsoft.ML.TorchSharp namespace," Microsoft said.

Data Processing
Here, scenarios are improved via many enhancements and bug fixes to DataFrame -- a structure for storing and manipulating data -- along with new IDataView interoperability features.

"The important steps of loading, inspecting, transforming, and visualizing your data are much more powerful," Microsoft said.

Specific items of note include:

  • Enhanced IDataView <-> DataFrame conversions: Added support for String and VBuffer column types, with String values handled as ReadOnlyMemory<char> and VBuffer supporting all backing primitives.
  • Increased column data capacity: Columns can now store more than 2 GB of data, removing the previous limitation.
  • Apache Arrow integration: Recognizes Apache Arrow Date64 column data.
  • Expanded data loading capabilities: Includes import and export functionality for SQL databases using ADO.NET. Also, data can be loaded from any IEnumerable collection and exported to System.Data.DataTable.
  • Appending data between DataFrames: Allows appending data from one DataFrame to another when column names match, easing constraints on column ordering.
  • Handling of duplicate column names: Enhancements in DataFrame.LoadCsv to manage duplicate column names, offering options to rename them.
  • Improved arithmetic performance and null value handling: Optimizations in column cloning, binary comparison scenarios, and arithmetic operations.
  • Debugger enhancements: Better readability for columns with long names in the debugger.
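
As a rough illustration of a few of the DataFrame items above, the following minimal sketch assumes a project that references the Microsoft.Data.Analysis NuGet package in a version that includes these features. The CSV text and variable names are invented for illustration and do not come from Microsoft's announcement. The sketch loads CSV text with a duplicated header, passes the DataFrame through IDataView and back, and exports the result to a System.Data.DataTable.

```csharp
// Assumes the Microsoft.Data.Analysis NuGet package (a version released
// with or after ML.NET 3.0). All data below is illustrative only.
using System;
using System.Data;
using Microsoft.Data.Analysis;
using Microsoft.ML;

// CSV text whose header repeats the "Score" column name.
const string csv = "Name,Score,Score\nAda,90,85\nGrace,88,92";

// renameDuplicatedColumns asks the loader to rename clashing headers.
DataFrame scores = DataFrame.LoadCsvFromString(csv, renameDuplicatedColumns: true);

// Print the column names and inferred types the loader ended up with.
foreach (DataFrameColumn column in scores.Columns)
{
    Console.WriteLine($"{column.Name}: {column.DataType.Name}");
}

// DataFrame implements IDataView, so it can be passed to ML.NET as-is.
IDataView view = scores;

// Materialize the IDataView back into a DataFrame (string columns included).
DataFrame roundTripped = view.ToDataFrame();

// Export to a System.Data.DataTable for use with ADO.NET-style code.
DataTable table = roundTripped.ToTable();
Console.WriteLine($"{table.Rows.Count} rows, {table.Columns.Count} columns");
```

The sketch makes no claim about the exact names the loader gives renamed duplicate columns or about the inferred column types. Printing them, as it does, is the simplest way to check both against the installed package version.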

Microsoft also noted new tensor primitive integrations that don't affect development tasks directly but do provide notable performance improvements. AutoML, which automates the process of applying machine learning to data, was also enhanced, providing a boost to associated experiences in Model Builder and the ML.NET CLI.

Much more about all of the above and other changes can be found in the release notes.

Going forward, the dev team is now working on plans for .NET 9 and ML.NET 4.0, though Model Builder and the ML.NET CLI are expected to be updated much sooner in order to consume the ML.NET 3.0 release.

"We know we will continue expanding deep learning scenarios and integrations, and we know we will keep making enhancements to DataFrame," Microsoft said. "We will keep expanding the APIs available in System.Numerics.Tensors and integrating them into ML.NET. Stay tuned for more detailed ML.NET 4.0 plans."

About the Author

David Ramel is an editor and writer at Converge 360.
