Data Driver

How to Grab a Slice of the Big Data Pie

Cloudera Inc.'s recent announcement of its SQL-on-Hadoop tool is one of the latest examples of vendors trying to make Big Data analytics more accessible. But "more accessible" is a long way from "easy," and it will be a while before your average Excel jockey can take over the reins of a typical company's Big Data initiatives.

So data developers are still key, and those with Hadoop and related Big Data skills are commanding top dollar to meet an insatiable demand for their services. But the very top dollar goes to the very top developers, and those folks might have to grow beyond the traditional programmer role.

While doing research for an upcoming article, I asked some experts in the field what developers can do to make themselves more marketable in this growing field.

"A general background on Hadoop is certainly a must," said Joe Nicholson, vice president of marketing at Datameer Inc., which makes prebuilt analytics applications--yet another path to that aforementioned accessibility. "But probably more important is understanding Big Data in terms of what the correlation of various data sources, new and old, can uncover to drive new business use cases.

"This is especially true of 'new' data sources like social media, machine and Web logs and text data sources like e-mail," Nicholson continued. "There is a wealth of new insights that are possible with the analysis of the new data sources combined with traditional, structured data, and these new use cases are becoming mission critical as businesses seek new competitive advantages. This is especially true when looking for insights, patterns and relationships across all types of data."

It also helps to show your work, as noted by Jon Rooney, director of developer marketing at Splunk Inc., another Big Data vendor. "There's no substitute for hands-on experience," Rooney said. "Developers who show experience by writing code and posting their work on places like GitHub are always marketable."

That sentiment is echoed by Will Cole, a product manager at Stack Overflow. Besides taking courses and attending meetups, he said, "the more concrete way to market yourself is to build side projects or contribute to open source projects where you can take what you've learned and show some working production results you've achieved."

In fact, some companies looking for the best coding talent use services such as the one provided by Gild Inc., which measures the quality of code posted on GitHub and participation in developer forums and question-and-answer sites such as Stack Overflow. Ironically enough, it does so with Big Data analysis, as I reported in an article on the Application Development Trends site.

Beyond showing your work, posting good code on developer-related social sites and answering questions in forums, a new way of thinking is required for developers looking to become top-notch Big Data rock stars, according to Bill Yetman. He is senior director of engineering at, where he has held various software engineering/development roles. "Developers need to approach new technologies and their careers with a 'learning mindset,' " Yetman said. "Always be willing to pick up something new, embrace it and master it. Developers who love to learn will always stay up to date and be marketable."

But it might not be that easy for some positions. "A software engineer can't simply become a data scientist in the same way a Java developer can become a Ruby developer," noted Mark A. Herschberg, CTO at Madison Logic in New York. He's in the process of starting a data science team at the B2B lead generation company, and he points out the distinction between a software engineer and a data scientist.

"A good data scientist has a combination of three different skills: data modeling, programming and business analysis," Herschberg said. "The data modeling is the hardest. Most candidates have a master's degree or PhD in math or science and have worked with various statistical models. They have programming skills--not so much the type to let you build a scalable enterprise system, but in that they can access the database and move data around. They are probably better at R and SciPy (a Python library) than at building a Web application. They also are familiar with tools like Hadoop and NoSQL databases. Finally, they have some basic business sense, so [they] will know how to ask meaningful business questions of the data.

"If a software engineer is serious about moving into data science, he or she should probably begin by taking some classes in advanced statistics and data modeling," Herschberg said.

Several sources noted that with the extreme skills shortage, companies are resorting to all kinds of ways to find talent, including contracting, outsourcing and retraining existing staff.

For those taking the last route, some advice was offered by Yetman, who writes a Tech Roots Blog at, including a recent post titled "Adventures in Big Data: How do you start?"

"If you are looking for developers within your organization for a Big Data project, find the guys who are always playing with new technologies just for the fun of it," Yetman told me. "Recruit them to work on your project."

Hmm, maybe that's the best advice of all: have fun with it.

Are you having fun yet in your Big Data adventures? Share your thoughts here or drop me a line.

Posted by David Ramel on 05/03/2013
