Will Big Data Be Big with Developers? -- Visual Studio Magazine

By Andrew J. Brust
12/01/2012

.NET developers are database developers. Whether using ADO.NET, the Entity Framework or data binding, .NET devs work with transactional data as a matter of course. But data analytics work is another matter. In fact, very few application and enterprise developers do analytics. Can Microsoft change that?

Microsoft opened the big data world to its ecosystem about a year ago with the announcement of its "Project Isotope" Hadoop on Windows initiative. A year later, though still in preview form, the technology has a brand (HDInsight) and significant integration with .NET and Visual Studio, and is clearly strategic to Microsoft. And developers are in the crosshairs: HDInsight was featured at BUILD, the flagship Microsoft developer conference.

Why does Microsoft think developers will take to analytics with big data now, when they didn't do so with business intelligence (BI) before? And given the overwhelming orientation of the big data world to Linux and Java, how does Microsoft expect to succeed in the space with Windows and .NET? At first glance, this looks to be a fool's errand. Is Microsoft naïve and tone deaf, or is it on to something?

Unboxing HDInsight
Before we judge whether developers will flock to HDInsight or shun it, let's get a sense of what the product is and what developer tools it features. HDInsight is based on the open source Apache Hadoop project, which provides processing and analysis of huge data sets (up to petabyte scale) by distributing the storage and compute workloads across numerous servers in a cluster. While this may sound straightforward -- and similar in principle to products such as SQL Server Parallel Data Warehouse (PDW) -- Hadoop can be pretty hard to work with.

Hadoop is natively queried through imperative Java code, using a two-pass approach called MapReduce. In this framework, a Map function first preprocesses the data, and a Reduce function then aggregates it. Multiple Mappers run in parallel across various nodes in the cluster, passing their output to multiple Reducer nodes to finish the work, also in parallel. A component included in most Hadoop distributions (including HDInsight) called "Pig" provides a data transformation language abstraction layer over Java-based MapReduce code. "Hive," another such component, provides a SQL-like abstraction over it.
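To make the two-pass idea concrete, here is a minimal in-memory sketch of a word count written in the MapReduce style. It is plain Python, not Hadoop or HDInsight code; the function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are made up for illustration, and the grouping step that a real cluster performs between Mappers and Reducers is simulated here in a single process.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_pairs):
    """Group emitted values by key. A real framework does this between
    the Map and Reduce passes; here it is simulated in memory."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: aggregate all values emitted for one key."""
    return word, sum(counts)


def run_job(lines):
    """Run the map, shuffle and reduce steps sequentially over the input."""
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(run_job(sample))
```

On a cluster, each Mapper would run `map_words` over its own slice of the input, and each Reducer would receive all the values for a subset of the keys. Layers such as Pig and Hive generate this kind of code from a higher-level script or query.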

What does Microsoft bring to the party? With HDInsight, developers can write MapReduce code in C# instead of Java, or use a LINQ provider to manipulate MapReduce indirectly through Hive. A NuGet package provides the C# MapReduce support, and a single-node developer version of HDInsight allows local debugging of such code in Visual Studio. A command-line utility provides deployment of the assembly to the local Hadoop instance. Deployment directly from Visual Studio to remote clusters, including the Windows Azure HDInsight implementation, seems a safe bet for future releases.

Bringing Hadoop to Windows (including to developers' own PCs) and providing integration and debugging support for C# and LINQ is a neat trick. It goes a long way toward making Hadoop an enterprise developer-friendly technology. Microsoft's alternate JavaScript-based framework for MapReduce code makes it friendly to Node.js and JavaScript developers, too. But will HDInsight appeal to the Linux- and Java-focused big data pros out there? Probably not, but therein lies the real value.

A Bigger Tent
Big data is a huge industry phenomenon right now, but the "data scientists" and MapReduce developers who enable its implementation are an exclusive bunch. These professionals are in short supply, and they don't come cheap. In other words, big data is a specialty at the height of its hype cycle, ripe for disruption.

We've seen this move before. Microsoft democratized Windows development with Visual Basic, enterprise development with .NET, relational database development with SQL Server and BI with a combination of that product plus SharePoint and Office. Every time Microsoft has disrupted an elite specialization, it's done so with developers in its ecosystem. Now it's trying again with big data and HDInsight.

Hadoop is different from past disrupted areas, though, because it's already developer-focused. But the developers who typify the Hadoop faithful right now work in lab environments -- whether in academic organizations, big Internet companies or startups. Even in the enterprise, big data practitioners work in lab-like organizations; they're not, by and large, typical developers from IT and business units.

But for big data to be big, that needs to change; the skill set needs to be ubiquitous and mainstream. Business developers are database developers. Microsoft thinks they can be big data developers too. And if they're also Windows client/Phone/Server/Azure developers, that would be "big" for Redmond, indeed.

About the Author

Andrew Brust is Research Director for Big Data and Analytics at Gigaom Research. He is co-author of "Programming Microsoft SQL Server 2012" (Microsoft Press); an advisor to NYTECH, the New York Technology Council; co-moderator of Big On Data - New York's Data Intelligence Meetup; a Microsoft Regional Director and MVP; and conference co-chair of Visual Studio Live!
