
ML.NET 3.0 Boosts Deep Learning, Data Processing for .NET-Based AI Apps

Microsoft shipped ML.NET 3.0, enhancing deep learning and data processing scenarios in the company's machine learning framework, which lets devs create AI-infused apps entirely within the .NET ecosystem.

The ability for devs to use C# and F# instead of the usual data science stalwarts -- Python and R -- is a main selling point of the open source ML.NET framework, created to help developers build custom ML models and integrate them into apps. That's done with tools such as a command-line interface (CLI) and Model Builder, or by creating constructs like the large language models (LLMs) that power ChatGPT and Microsoft's ubiquitous "Copilot" AI assistants.

ML.NET comes in a NuGet package that has been downloaded more than 5.3 million times.

ML.NET (source: Microsoft).

In announcing ML.NET 3.0 yesterday (Nov. 27), Microsoft emphasized two main points of interest: deep learning and data processing.

Deep Learning
This subset of ML uses artificial neural networks, loosely modeled on human brain behaviors, to "learn" from inputs such as large amounts of data, including unstructured data.

Microsoft said deep learning scenarios were substantially expanded in the v3.0 release with new capabilities in three areas: object detection, named entity recognition and question answering.

Object detection in ML.NET 3.0 is an advanced form of image classification that not only categorizes entities within images but also locates them, making it ideal for scenarios with images containing multiple objects of different types. In v3.0, the object detection capabilities are boosted via integrations with TorchSharp and ONNX models, with Microsoft specifically noting TorchSharp-powered object detection APIs. The company said those represent a significant step in leveraging deep learning techniques within the ML.NET framework.

In discussing advanced neural network architecture, Microsoft explained that the underlying technology of the object detection API includes techniques developed at Microsoft Research, utilizing a transformer-based neural network architecture. This approach is indicative of modern trends in deep learning, particularly in computer vision, the company said.
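
The difference between classifying an image and detecting objects in it can be sketched in a few lines. The following is a language-neutral illustration in plain Python, not ML.NET or TorchSharp code: the `Detection` type, the `confident` helper, the 0.5 threshold and the sample values are all assumptions made up for this example. It only shows the shape of the result, where each detected object carries a label, a confidence score and a bounding box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical type, not an ML.NET class)."""
    label: str    # predicted class
    score: float  # model confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def confident(detections, threshold=0.5):
    """Keep detections whose score meets the (assumed) threshold."""
    return [d for d in detections if d.score >= threshold]


# Made-up output for a single image that contains several objects.
detections = [
    Detection("dog", 0.92, (34, 50, 210, 300)),
    Detection("bicycle", 0.81, (180, 90, 460, 330)),
    Detection("cat", 0.12, (5, 5, 40, 40)),
]

kept = confident(detections)
labels = [d.label for d in kept]
```

A plain image classifier would return a single label for the whole picture, while the list above can describe several objects of different types along with where each one sits.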

TorchSharp is also instrumental in the enhancements to named entity recognition (NER) and question answering (QA), two common ML areas that are part of the natural language processing (NLP) space. Both scenarios are unlocked in ML.NET 3.0 by building on the previously introduced TorchSharp RoBERTa text classification features.

"Both the NER and QA trainers are included in the Microsoft.ML.TorchSharp 3.0.0 package and the Microsoft.ML.TorchSharp namespace," Microsoft said.

Data Processing
Here, scenarios are improved via many enhancements and bug fixes to DataFrame -- a structure for storing and manipulating data -- along with new IDataView interoperability features.

"The important steps of loading, inspecting, transforming, and visualizing your data are much more powerful," Microsoft said.

Specific items of note include:

  • Enhanced IDataView <-> DataFrame conversions: Added support for String and VBuffer column types, with String values handled as ReadOnlyMemory<char> and VBuffer supporting all backing primitives.
  • Increased column data capacity: Columns can now store more than 2 GB of data, removing the previous limitation.
  • Apache Arrow integration: Recognizes Apache Arrow Date64 column data.
  • Expanded data loading capabilities: Includes import and export functionality for SQL databases using ADO.NET. Also, data can be loaded from any IEnumerable collection and exported to System.Data.DataTable.
  • Appending data between DataFrames: Allows appending data from one DataFrame to another when column names match, easing constraints on column ordering.
  • Handling of duplicate column names: Enhancements in DataFrame.LoadCsv to manage duplicate column names, offering options to rename them.
  • Improved arithmetic performance and null value handling: Optimizations in column cloning, binary comparison scenarios, and arithmetic operations.
  • Debugger enhancements: Better readability for columns with long names in the debugger.
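
Two of the items above, appending by column name and renaming duplicate columns, are easier to picture with a small example. The following is a minimal, language-neutral sketch in plain Python that does not use the DataFrame API. The helper names `append_by_name` and `rename_duplicates`, and the `.1`, `.2` suffix scheme, are assumptions chosen for illustration, so the actual DataFrame behavior may differ in its details.

```python
from collections import Counter


def rename_duplicates(headers):
    """Return headers with repeated names made unique.

    The ".N" suffix scheme is a hypothetical choice for this sketch.
    """
    seen = Counter()
    result = []
    for name in headers:
        count = seen[name]
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] += 1
    return result


def append_by_name(target, source):
    """Append the rows of `source` to `target`, matching columns by name.

    Both tables are dicts mapping column name -> list of values, so the
    order in which the columns were declared does not matter.
    """
    if set(target) != set(source):
        raise ValueError("column names must match")
    return {name: target[name] + source[name] for name in target}


left = {"id": [1, 2], "city": ["Oslo", "Lima"]}
right = {"city": ["Kyiv"], "id": [3]}  # same names, different order

combined = append_by_name(left, right)
unique_headers = rename_duplicates(["price", "qty", "price"])
```

Matching on names rather than positions is what lets the second table list its columns in a different order and still be appended correctly.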

Microsoft also noted new tensor primitive integrations that don't affect development tasks directly but do provide notable performance improvements. AutoML, which automates the process of applying machine learning to data, was also enhanced, providing a boost to associated experiences in Model Builder and the ML.NET CLI.

Much more about all of the above and other changes can be found in the release notes.

The dev team is now working on plans for .NET 9 and ML.NET 4.0, though Model Builder and the ML.NET CLI are expected to be updated much sooner in order to consume the ML.NET 3.0 release.

"We know we will continue expanding deep learning scenarios and integrations, and we know we will keep making enhancements to DataFrame," Microsoft said. "We will keep expanding the APIs available in System.Numerics.Tensors and integrating them into ML.NET. Stay tuned for more detailed ML.NET 4.0 plans."

About the Author

David Ramel is an editor and writer at Converge 360.
