ML.NET 3.0 Boosts Deep Learning, Data Processing for .NET-Based AI Apps
Microsoft shipped ML.NET 3.0, enhancing deep learning and data processing scenarios in the company's machine learning framework that lets devs create AI-infused apps completely within the .NET ecosystem.
The ability for devs to use C# and F# instead of the usual data science stalwarts -- Python and R -- is a main selling point of the open source ML.NET framework, created to help developers build custom ML models and integrate them into apps. That's done with tools such as a command-line interface (CLI) and Model Builder, or by creating constructs like the large language models (LLMs) that power ChatGPT and Microsoft's ubiquitous "Copilot" AI assistants.
ML.NET comes in a NuGet package that has been downloaded more than 5.3 million times.
In announcing ML.NET 3.0 yesterday (Nov. 27), Microsoft emphasized two main areas of interest: deep learning and data processing.
Deep Learning
This ML subset uses artificial neural networks loosely based on human brain behaviors in order to "learn" from inputs such as large amounts of data, even unstructured data.
Microsoft said deep learning scenarios were substantially expanded in the v3.0 release with new capabilities in three areas: object detection, named entity recognition and question answering.
Object detection in ML.NET 3.0 is an advanced form of image classification that not only categorizes entities within images but also locates them, making it ideal for scenarios with images containing multiple objects of different types. In v3.0, the object detection capabilities are boosted via integrations with TorchSharp and ONNX models, with Microsoft specifically noting TorchSharp-powered object detection APIs. The company said those represent a significant step in leveraging deep learning techniques within the ML.NET framework.
In discussing advanced neural network architecture, Microsoft explained that the underlying technology of the object detection API includes techniques developed at Microsoft Research, utilizing a transformer-based neural network architecture. This approach is indicative of modern trends in deep learning, particularly in computer vision, the company said.
TorchSharp is also instrumental in the enhancements to named entity recognition (NER) and question answering (QA), two common ML areas that are part of the natural language processing (NLP) space. Enhancements for both of those scenarios are unlocked in ML.NET 3.0 by leveraging the TorchSharp-based RoBERTa text classification features introduced previously.
"Both the NER and QA trainers are included in the Microsoft.ML.TorchSharp 3.0.0 package and the Microsoft.ML.TorchSharp namespace," Microsoft said.
Data Processing
Here, scenarios are improved via many enhancements and bug fixes to DataFrame -- a structure for storing and manipulating data -- along with new IDataView interoperability features.
"The important steps of loading, inspecting, transforming, and visualizing your data are much more powerful," Microsoft said.
Specific items of note include:
- Enhanced IDataView <-> DataFrame conversions: Added support for String and VBuffer column types, with String values handled as ReadOnlyMemory<char> and VBuffer supporting all backing primitives.
- Increased column data capacity: Columns can now store more than 2 GB of data, removing the previous limitation.
- Apache Arrow integration: Recognizes Apache Arrow Date64 column data.
- Expanded data loading capabilities: Includes import and export functionality for SQL databases using ADO.NET. Also, data can be loaded from any IEnumerable collection and exported to System.Data.DataTable.
- Appending data between DataFrames: Allows appending data from one DataFrame to another when column names match, easing constraints on column ordering.
- Handling of duplicate column names: Enhancements in DataFrame.LoadCsv to manage duplicate column names, offering options to rename them.
- Improved arithmetic performance and null value handling: Optimizations in column cloning, binary comparison scenarios, and arithmetic operations.
- Debugger enhancements: Better readability for columns with long names in the debugger.
Microsoft also noted new tensor primitive integrations that don't affect development tasks directly but do provide notable performance improvements. AutoML, which automates the process of applying machine learning to data, was also enhanced, providing a boost to associated experiences in Model Builder and the ML.NET CLI.
Much more about all of the above and other changes can be found in the release notes.
Going forward, the dev team is now working on plans for .NET 9 and ML.NET 4.0, though Model Builder and the ML.NET CLI are expected to be updated much sooner in order to consume the ML.NET 3.0 release.
"We know we will continue expanding deep learning scenarios and integrations, and we know we will keep making enhancements to DataFrame," Microsoft said. "We will keep expanding the APIs available in System.Numerics.Tensors and integrating them into ML.NET. Stay tuned for more detailed ML.NET 4.0 plans."
About the Author
David Ramel is an editor and writer at Converge 360.