News

The Evolving Definition of 'Big Data'

Although big data is the subject of a great deal of talk these days, there is currently no good, authoritative definition of the term, according to Microsoft Regional Director and Visual Studio Magazine columnist Andrew Brust.

"It's still working itself out," Brust says. "Like any product in a good hype cycle, the malleability of the term is being used by people to suit their agendas. And that's okay; there's a definition evolving."

Still, Brust, who will be speaking about big data and Microsoft at the upcoming Visual Studio Live! New York conference, says that a few consistent big data characteristics have emerged.

For one, it can't be big data if it isn't...well...big.

"We're talking about at least hundreds of terabytes," Brust explains. "Definitely not gigabytes. If it's not petabytes, we're getting close, and people are talking about exabytes and zettabytes. For now at least, if it's too big for a transactional system, you can legitimately call it big data. But that threshold is going to change as transactional systems evolve."

But big data also has "velocity," meaning that it's coming in an unrelenting stream. And it comes from a wide range of sources, including unstructured, non-relational sources -- click-stream data from Web sites, blogs, tweets, follows, comments and all the assets that come out of social media, for example.

Also, the big data conversation almost always includes Hadoop, Brust says. Hadoop is an open source distributed computing framework designed to run implementations of MapReduce on large clusters of commodity hardware. MapReduce, a programming model introduced by Google, is used for processing and generating large data sets. It supports parallel computation over those data sets on clusters of unreliable machines.
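To make the MapReduce model concrete, here is a minimal single-process sketch in Python of the classic word-count example. It is not Hadoop's API and does no distribution or fault handling; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, not names from any library. It only shows the shape of the model: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory collection."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data streams"]))
```

In a real cluster, the map and reduce calls would run in parallel on different machines, and the framework would handle the shuffle and any machine failures; here everything runs in one process purely for illustration.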

"The truth is, we've always had big data; we just haven't kept it," says Brust, who is also the founder and CEO of Blue Badge Insights. "It hasn't been archived and used for analysis later on. But because storage has become so much cheaper, and because of Hadoop, we can now use inexpensive commodity hardware to do distributed processing on that data, and it's now financially feasible to hold the data and analyze it."

"Ultimately, the value Microsoft is trying to provide is to connect the open-source big data world (Hadoop) with the more enterprise-friendly Microsoft BI (business intelligence) world," Brust says.


About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
