News

'.NET for Apache Spark' Debuts for C#/F# Big Data

Almost four years after the debut of Apache Spark, .NET developers are on track to more easily use the popular Big Data processing framework in C# and F# projects.

The preview project, called .NET for Apache Spark, was unveiled yesterday (April 24). Its development will be conducted in the open under the direction of the .NET Foundation.

Spark is described as a unified analytics engine for large-scale data processing, compatible with Apache Hadoop data whether batched or streamed.

Currently, Spark is accessible via an interop layer with APIs for the Java, Python, Scala and R programming languages. While .NET coders have been able to use Spark with Mobius C# and F# language binding and extensions, the new project seeks to improve on that scheme while paving the way to add more language support. Microsoft promised to work closely with the open source Spark community to help the project succeed beyond similar efforts such as Mobius, which it said were hindered by a lack of communication.

".NET for Apache Spark provides high performance APIs for using Spark from C# and F#," said Microsoft in an announcement post. "With [these] .NET APIs, you can access all aspects of Apache Spark including Spark SQL, DataFrames, Streaming, MLLib etc.," it said. ".NET for Apache Spark lets you reuse all the knowledge, skills, code, and libraries you already have as a .NET developer."

The project's origin is explained in a Spark Project Improvement Proposal (SPIP) titled .NET bindings for Apache Spark created on Feb. 27. It says: "Apache Spark provides programming language support for Scala/Java (native), and extensions for Python and R. While a variety of other language extensions are possible to include in Apache Spark, .NET would bring one of the largest developer community to the table. Presently, no good Big Data solution exists for .NET developers in open source. This SPIP aims at discussing how we can bring Apache Spark goodness to the .NET development platform."

Microsoft yesterday said: "The C#/F# language binding to Spark will be written on a new Spark interop layer which offers easier extensibility. This new layer of Spark interop was written keeping in mind best practices for language extension and optimizes for interop and performance. Long term this extensibility can be used for adding support for other languages in Spark."

Project backers will work on that extensibility, which was outlined in yet another SPIP titled Interop Support for Spark Language Extensions created last December that says:

There is a desire for third party language extensions for Apache Spark. Some notable examples include: Presently, Apache Spark supports Python and R via a tightly integrated interop layer. It would seem that much of that existing interop layer could be refactored into a clean surface for general (third party) language bindings...."

Microsoft addressed the aforementioned lack of communication with the open source Spark community in its SPIP, stating:

We recognize that earlier attempts at this goal (specifically Mobius https://github.com/Microsoft/Mobius) were unsuccessful primarily due to the lack of communication with the Spark community. Therefore, another goal of this proposal is to not only develop .NET bindings for Spark in open source, but also continuously seek feedback from the Spark community via posted Jira’s (like this one) and the Spark developer mailing list. Our hope is that through these engagements, we can build a community of developers that are eager to contribute to this effort or want to leverage the resulting .NET bindings for Spark in their respective Big Data applications.

Yesterday's announcement of the first preview also provided a peek into further development, which will include improving benchmarking performance, such as Arrow optimizations. Specifically, the project's roadmap calls for upcoming features such as:

  • Simplified getting started experience, documentation and samples
  • Native integration with developer tools such as Visual Studio, Visual Studio Code, Jupyter notebooks
  • .NET support for user-defined aggregate functions
  • .NET idiomatic APIs for C# and F# (e.g., using LINQ for writing queries)
  • Out of the box support with Azure Databricks, Kubernetes etc.
  • Make .NET for Apache Spark part of Spark Core

Source code for the preview project and detailed instructions on using it and interacting with it can be found on GitHub, where it has already garnered 446 stars at the time of this writing (climbing by the minute), with Microsoft's Terry Kim and Rahul Potharaju listed as primary contributors.

About the Author

David Ramel is an editor and writer at Converge 360.

comments powered by Disqus

Featured

  • Microsoft Highlights Visual Studio Live! Event Lineup and Longtime Developer Community Role

    A Microsoft MVP Blog post on Visual Studio Live!'s longevity arrives as the 2026 conference series continues with upcoming stops at Microsoft HQ, San Diego and Orlando.

  • Using Local AI to Cut Copilot Usage-Based Billing Shock

    After being gobsmacked by the new billing plan using almost all my monthly credits in one or two days, I tried pushing some Copilot-style coding work onto local models in VS Code. What I found was less "free AI" and more "pick your pain": cloud charges on one side, heavy local resource use and long waits on the other.

  • .NET 11 Preview 5 Focuses on Performance, Productivity and Safer Code

    .NET 11 Preview 5 focuses on under-the-hood runtime performance gains, streamlined APIs and language features that reduce boilerplate, plus built‑in security checks and incremental ASP.NET Core and EF Core improvements aimed at everyday developer productivity.

  • VS Code 1.124 Focuses on Agent Autonomy and Parallel Sessions

    Microsoft's June 2026 VS Code update turns on Autopilot by default and adds background sending for agent sessions.

Subscribe on YouTube