The Evolving Definition of 'Big Data' -- Visual Studio Magazine

The Evolving Definition of 'Big Data'

By John K. Waters
04/10/2012

While there's lots of talk about big data these days (a lot of talk), there currently is no good, authoritative definition of big data, according to Microsoft Regional Director and Visual Studio Magazine columnist Andrew Brust.

"It's still working itself out," Brust says. "Like any product in a good hype cycle, the malleability of the term is being used by people to suit their agendas. And that's okay; there's a definition evolving."

Still, Brust, who will be speaking about big data and Microsoft at the upcoming Visual Studio Live! New York conference, says that a few consistent big data characteristics have emerged.

For one, it can't be big data if it isn't...well...big.

"We're talking about at least hundreds of terabytes," Brust explains. "Definitely not gigabytes. If it's not petabytes, we're getting close, and people are talking about exabytes and zettabytes. For now at least, if it's too big for a transactional system, you can legitimately call it big data. But that threshold is going to change as transactional systems evolve."

But big data also has "velocity," meaning that it's coming in an unrelenting stream. And it comes from a wide range of sources, including unstructured, non-relational sources -- click-stream data from Web sites, blogs, tweets, follows, comments and all the assets that come out of social media, for example.

Also, the big data conversation almost always includes Hadoop, Brust Says. The Hadoop Framework is an open source distributed computing platform designed to allow implementations of MapReduce to run on large clusters of commodity hardware. Google's MapReduce is a programming model for processing and generating large data sets. It supports parallel computations over large data sets on unreliable computer clusters.

"The truth is, we've always had Big Data, we just haven't kept it," says Brust, who is also the founder and CEO of Blue Badge Insights. "It hasn't been archived and used for analysis later on. But because storage has become so much cheaper, and because of Hadoop, we can now use inexpensive commodity hardware to do distributed processing on that data, and it's now financially feasible to hold the data and analyze it."

"Ultimately the value Microsoft is trying to provide is to connect the open-source Big Data world (Hadoop) with the more enterprise friendly Microsoft BI (business intelligence) world," Brust says.

For more on this topic, check out the links below or see Andrew Brust speak at an upcoming Visual Studio Live! event:

Microsoft Says It's Serious About Hadoop

Big Data and SQL Server: Disruption or Harmony?

Under the Hood With SQL Server 2012

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].

Printable Format

comments powered by Disqus

Featured

You Can Now Apply for Early-Stage AI Agent 'Computer Use' in Copilot Studio

On the way to autonomous AI, Microsoft announced an early access research preview of "computer use" for Copilot Studio wherein AI agents visually interact with any app or website -- clicking, typing, and navigating like a human.
KaleidoSearch Embeds dtSearch Engine

dtSearch, a specialist in enterprise and developer text retrieval software and document filters, announced it's now powering an upgrade to KaleidoSearch from Contegra Systems.
.NET 10 Preview 3 Adds Native Container Publishing

While Preview 3 comes with a wide array of incremental improvements across performance, libraries, CLI tooling, ASP.NET Core/Blazor and more, it doesn't introduce any major new features.
Trending Model Context Protocol for AI Agents Gets C# SDK

The Model Context Protocol (MCP) for agentic AI has gained much traction since being introduced by Anthropic last November, and now it has a C# SDK.
As Agentic AI Explodes, Microsoft Announces MS365 Copilot Agent Debugging

Microsoft announced agent debugging functionality for Microsoft 365 Copilot directly from the AI tool itself, no Visual Studio 2022 or Visual Studio Code needed.