News

'.NET for Apache Spark' Debuts for C#/F# Big Data

Almost five years after the debut of Apache Spark, .NET developers are on track to more easily use the popular Big Data processing framework in C# and F# projects.

The preview project, called .NET for Apache Spark, was unveiled yesterday (April 24). Its development will be conducted in the open under the direction of the .NET Foundation.

Spark is described as a unified analytics engine for large-scale data processing, compatible with Apache Hadoop data whether batched or streamed.

Currently, Spark offers native APIs for Scala and Java, along with interop-layer extensions for Python and R. While .NET coders have been able to use Spark through Mobius, an earlier set of C# and F# language bindings and extensions, the new project seeks to improve on that scheme while paving the way for additional language support. Microsoft promised to work closely with the open source Spark community to help the project succeed where similar efforts such as Mobius, which it said were hindered by a lack of communication, fell short.

".NET for Apache Spark provides high performance APIs for using Spark from C# and F#," said Microsoft in an announcement post. "With [these] .NET APIs, you can access all aspects of Apache Spark including Spark SQL, DataFrames, Streaming, MLLib etc.," it said. ".NET for Apache Spark lets you reuse all the knowledge, skills, code, and libraries you already have as a .NET developer."

The project's origin is explained in a Spark Project Improvement Proposal (SPIP) titled .NET bindings for Apache Spark created on Feb. 27. It says: "Apache Spark provides programming language support for Scala/Java (native), and extensions for Python and R. While a variety of other language extensions are possible to include in Apache Spark, .NET would bring one of the largest developer community to the table. Presently, no good Big Data solution exists for .NET developers in open source. This SPIP aims at discussing how we can bring Apache Spark goodness to the .NET development platform."

Microsoft yesterday said: "The C#/F# language binding to Spark will be written on a new Spark interop layer which offers easier extensibility. This new layer of Spark interop was written keeping in mind best practices for language extension and optimizes for interop and performance. Long term this extensibility can be used for adding support for other languages in Spark."

Project backers will build on that extensibility, outlined in another SPIP, Interop Support for Spark Language Extensions, created last December, which says:

There is a desire for third party language extensions for Apache Spark. ... Presently, Apache Spark supports Python and R via a tightly integrated interop layer. It would seem that much of that existing interop layer could be refactored into a clean surface for general (third party) language bindings ...

Microsoft addressed the aforementioned lack of communication with the open source Spark community in its SPIP, stating:

We recognize that earlier attempts at this goal (specifically Mobius https://github.com/Microsoft/Mobius) were unsuccessful primarily due to the lack of communication with the Spark community. Therefore, another goal of this proposal is to not only develop .NET bindings for Spark in open source, but also continuously seek feedback from the Spark community via posted Jira's (like this one) and the Spark developer mailing list. Our hope is that through these engagements, we can build a community of developers that are eager to contribute to this effort or want to leverage the resulting .NET bindings for Spark in their respective Big Data applications.

Yesterday's announcement of the first preview also provided a peek at further development, which will include performance improvements such as Apache Arrow optimizations. Specifically, the project's roadmap calls for upcoming features such as:

  • Simplified getting started experience, documentation and samples
  • Native integration with developer tools such as Visual Studio, Visual Studio Code, Jupyter notebooks
  • .NET support for user-defined aggregate functions
  • .NET idiomatic APIs for C# and F# (e.g., using LINQ for writing queries)
  • Out-of-the-box support for Azure Databricks, Kubernetes, etc.
  • Make .NET for Apache Spark part of Spark Core

Source code for the preview project, along with detailed instructions for using it, can be found on GitHub, where it has already garnered 446 stars at the time of this writing (climbing by the minute), with Microsoft's Terry Kim and Rahul Potharaju listed as primary contributors.

About the Author

David Ramel is an editor and writer at Converge 360.
