'.NET for Apache Spark' Debuts for C#/F# Big Data -- Visual Studio Magazine

By David Ramel
04/25/2019

Almost four years after the debut of Apache Spark, .NET developers are on track to more easily use the popular Big Data processing framework in C# and F# projects.

The preview project, called .NET for Apache Spark, was unveiled yesterday (April 24). Its development will be conducted in the open under the direction of the .NET Foundation.

Spark is described as a unified analytics engine for large-scale data processing, compatible with Apache Hadoop data whether batched or streamed.

Currently, Spark is accessible via an interop layer with APIs for the Java, Python, Scala and R programming languages. While .NET coders have been able to use Spark through the Mobius C# and F# language binding and extensions, the new project seeks to improve on that scheme while paving the way for more language support. Microsoft promised to work closely with the open source Spark community to help the project succeed where similar efforts such as Mobius fell short, hindered, it said, by a lack of communication.

".NET for Apache Spark provides high performance APIs for using Spark from C# and F#," said Microsoft in an announcement post. "With [these] .NET APIs, you can access all aspects of Apache Spark including Spark SQL, DataFrames, Streaming, MLLib etc.," it said. ".NET for Apache Spark lets you reuse all the knowledge, skills, code, and libraries you already have as a .NET developer."

The project's origin is explained in a Spark Project Improvement Proposal (SPIP) titled .NET bindings for Apache Spark created on Feb. 27. It says: "Apache Spark provides programming language support for Scala/Java (native), and extensions for Python and R. While a variety of other language extensions are possible to include in Apache Spark, .NET would bring one of the largest developer community to the table. Presently, no good Big Data solution exists for .NET developers in open source. This SPIP aims at discussing how we can bring Apache Spark goodness to the .NET development platform."

Microsoft yesterday said: "The C#/F# language binding to Spark will be written on a new Spark interop layer which offers easier extensibility. This new layer of Spark interop was written keeping in mind best practices for language extension and optimizes for interop and performance. Long term this extensibility can be used for adding support for other languages in Spark."

Project backers will work on that extensibility, which was outlined in yet another SPIP titled Interop Support for Spark Language Extensions created last December that says:

There is a desire for third party language extensions for Apache Spark. Some notable examples include:

C#/F# from project Mobius https://github.com/Microsoft/Mobius

Haskell from project sparkle https://github.com/tweag/sparkle

Julia from project Spark.jl https://github.com/dfdx/Spark.jl

Presently, Apache Spark supports Python and R via a tightly integrated interop layer. It would seem that much of that existing interop layer could be refactored into a clean surface for general (third party) language bindings....
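
To make the idea of a general interop surface more concrete, here is a minimal toy sketch in Python. It is not Spark's actual interop protocol and not code from .NET for Apache Spark; every name in it (ToyBackend, ToyDataset, RemoteProxy, filter_greater_than) is hypothetical and invented for illustration. It only shows the general pattern a language binding might follow: the guest language turns method calls into serialized requests, and the engine side dispatches them by name and hands back either a plain value or a handle to an object it keeps.

```python
import json


class ToyBackend:
    """Hypothetical engine side: owns the real objects and dispatches calls by name."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def register(self, obj):
        # Store the object and return an opaque integer handle for it.
        handle = self._next_id
        self._next_id += 1
        self._objects[handle] = obj
        return handle

    def handle_request(self, request_json):
        # A request names a target handle, a method, and positional arguments.
        request = json.loads(request_json)
        target = self._objects[request["handle"]]
        result = getattr(target, request["method"])(*request["args"])
        # Plain values are returned directly; any other object stays on the
        # backend and only its handle crosses the boundary.
        if isinstance(result, (int, float, str, bool, list, type(None))):
            return json.dumps({"value": result})
        return json.dumps({"handle": self.register(result)})


class ToyDataset:
    """Hypothetical stand-in for an engine-side dataset."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter_greater_than(self, threshold):
        return ToyDataset(r for r in self._rows if r > threshold)

    def count(self):
        return len(self._rows)

    def collect(self):
        return list(self._rows)


class RemoteProxy:
    """Hypothetical binding side: forwards any method call as a serialized request."""

    def __init__(self, backend, handle):
        self._backend = backend
        self._handle = handle

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        def call(*args):
            request = json.dumps(
                {"handle": self._handle, "method": name, "args": list(args)}
            )
            response = json.loads(self._backend.handle_request(request))
            if "handle" in response:
                return RemoteProxy(self._backend, response["handle"])
            return response["value"]

        return call


backend = ToyBackend()
data = RemoteProxy(backend, backend.register(ToyDataset([1, 5, 10, 20])))
large = data.filter_greater_than(4)  # returns another proxy, not the rows
print(large.count())    # 3
print(large.collect())  # [5, 10, 20]
```

In this sketch the proxy class knows nothing about datasets, which is the appeal of a shared surface: a binding for another language would only need to reproduce the request format, not reimplement each engine feature.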

Microsoft addressed the aforementioned lack of communication with the open source Spark community in its SPIP, stating:

We recognize that earlier attempts at this goal (specifically Mobius https://github.com/Microsoft/Mobius) were unsuccessful primarily due to the lack of communication with the Spark community. Therefore, another goal of this proposal is to not only develop .NET bindings for Spark in open source, but also continuously seek feedback from the Spark community via posted Jira’s (like this one) and the Spark developer mailing list. Our hope is that through these engagements, we can build a community of developers that are eager to contribute to this effort or want to leverage the resulting .NET bindings for Spark in their respective Big Data applications.

Yesterday's announcement of the first preview also provided a peek into further development, which will include improving performance on benchmarks, for example through Arrow optimizations. Specifically, the project's roadmap calls for upcoming features such as:

Simplified getting started experience, documentation and samples
Native integration with developer tools such as Visual Studio, Visual Studio Code, Jupyter notebooks
.NET support for user-defined aggregate functions
.NET idiomatic APIs for C# and F# (e.g., using LINQ for writing queries)
Out of the box support with Azure Databricks, Kubernetes etc.
Make .NET for Apache Spark part of Spark Core

Source code for the preview project and detailed instructions on using it and interacting with it can be found on GitHub, where it has already garnered 446 stars at the time of this writing (climbing by the minute), with Microsoft's Terry Kim and Rahul Potharaju listed as primary contributors.

About the Author

David Ramel is an editor and writer at Converge 360.
