Data Driver

How to Grab a Slice of the Big Data Pie

Cloudera Inc.'s recent announcement of its SQL-on-Hadoop tool is one of the latest examples of vendors trying to make Big Data analytics more accessible. But "more accessible" is a long way from "easy," and it will be a while before your average Excel jockey can take over the reins of a typical company's Big Data initiatives.

So data developers are still key, and those with Hadoop and related Big Data skills are commanding top dollar to meet an insatiable demand for their services. But the very top dollar goes to the very top developers, and those folks might have to grow beyond the traditional programmer role.

While doing research for an upcoming article, I asked some experts in the field what developers can do to make themselves more marketable in this growing field.

"A general background on Hadoop is certainly a must," said Joe Nicholson, vice president of marketing at Datameer Inc., which makes prebuilt analytics applications--yet another path to that aforementioned accessibility. "But probably more important is understanding Big Data in terms of what the correlation of various data sources, new and old, can uncover to drive new business use cases.

"This is especially true of 'new' data sources like social media, machine and Web logs and text data sources like e-mail," Nicholson continued. "There is a wealth of new insights that are possible with the analysis of the new data sources combined with traditional, structured data, and these new use cases are becoming mission critical as businesses seek new competitive advantages. This is especially true when looking for insights, patterns and relationships across all types of data."

It also helps to show your work, as noted by Jon Rooney, director of developer marketing at Splunk Inc., another Big Data vendor. "There's no substitute for hands-on experience," Rooney said. "Developers who show experience by writing code and posting their work on places like GitHub are always marketable."

That sentiment is echoed by Will Cole, a product manager at Stack Overflow. Besides taking courses and attending meetups, he said, "the more concrete way to market yourself is to build side projects or contribute to open source projects where you can take what you've learned and show some working production results you've achieved."

In fact, some companies look for the best coding talent through services such as the one provided by Gild Inc., which measures the quality of code posted on GitHub and participation in developer forums and question-and-answer sites such as Stack Overflow. It does so using--ironically enough--Big Data analysis, as I reported in an article on the Application Development Trends site.

Beyond showing your work, posting good code on developer-related social sites and answering questions in forums, a new way of thinking is required for developers looking to become top-notch Big Data rock stars, according to Bill Yetman. He is senior director of engineering at Ancestry.com, where he has held various software engineering/development roles. "Developers need to approach new technologies and their careers with a 'learning mindset,' " Yetman said. "Always be willing to pick up something new, embrace it and master it. Developers who love to learn will always stay up to date and be marketable."

But it might not be that easy for some positions. "A software engineer can't simply become a data scientist in the same way a Java developer can become a Ruby developer," noted Mark A. Herschberg, CTO at Madison Logic in New York. He's in the process of starting a data science team at the B2B lead generation company, and he points out the distinction between a software engineer and a data scientist.

"A good data scientist has a combination of three different skills: data modeling, programming and business analysis," Herschberg said. "The data modeling is the hardest. Most candidates have a master's degree or PhD in math or science and have worked with various statistical models. They have programming skills--not so much the type to let you build a scalable enterprise system, but in that they can access the database and move data around. They are probably better at R and SciPy (a Python library) than at building a Web application. They also are familiar with tools like Hadoop and NoSQL databases. Finally, they have some basic business sense, so [they] will know how to ask meaningful business questions of the data.

"If a software engineer is serious about moving into data science, he or she should probably begin by taking some classes in advanced statistics and data modeling," Herschberg said.

Several sources noted that with the extreme skills shortage, companies are resorting to all kinds of ways to find talent, including contracting, outsourcing and retraining existing staff.

For those taking that last route, Yetman offered some advice. He writes a Tech Roots Blog at Ancestry.com, including a recent post titled "Adventures in Big Data: How do you start?"

"If you are looking for developers within your organization for a Big Data project, find the guys who are always playing with new technologies just for the fun of it," Yetman told me. "Recruit them to work on your project."

Hmm, maybe that's the best advice of all: have fun with it.

Are you having fun yet in your Big Data adventures? Share your thoughts here or drop me a line.

Posted by David Ramel on 05/03/2013

