
The Evolving Definition of 'Big Data'

While there's a lot of talk about big data these days, there is currently no good, authoritative definition of the term, according to Microsoft Regional Director and Visual Studio Magazine columnist Andrew Brust.

"It's still working itself out," Brust says. "Like any product in a good hype cycle, the malleability of the term is being used by people to suit their agendas. And that's okay; there's a definition evolving."

Still, Brust, who will be speaking about big data and Microsoft at the upcoming Visual Studio Live! New York conference, says that a few consistent big data characteristics have emerged.

For one, it can't be big data if it isn't...well...big.

"We're talking about at least hundreds of terabytes," Brust explains. "Definitely not gigabytes. If it's not petabytes, we're getting close, and people are talking about exabytes and zettabytes. For now at least, if it's too big for a transactional system, you can legitimately call it big data. But that threshold is going to change as transactional systems evolve."

But big data also has "velocity," meaning that it's coming in an unrelenting stream. And it comes from a wide range of sources, including unstructured, non-relational sources -- click-stream data from Web sites, blogs, tweets, follows, comments and all the assets that come out of social media, for example.

Also, the big data conversation almost always includes Hadoop, Brust says. The Hadoop framework is an open source distributed computing platform designed to run implementations of MapReduce on large clusters of commodity hardware. MapReduce, introduced by Google, is a programming model for processing and generating large data sets. It supports parallel computation over those data sets on clusters of unreliable machines.
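To make the MapReduce model a little more concrete, here is a minimal single-process sketch of its three steps (map, group by key, reduce) applied to a word count. It is an illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical, and a real cluster would spread the map and reduce calls across many machines and handle failures along the way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group step: collect all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine the values for one key into a single total."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big", "data stream"]))
```

Because each map call looks at only one record and each reduce call at only one key, the work can in principle be split across many inexpensive machines, which is the property that frameworks such as Hadoop build on.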

"The truth is, we've always had big data; we just haven't kept it," says Brust, who is also the founder and CEO of Blue Badge Insights. "It hasn't been archived and used for analysis later on. But because storage has become so much cheaper, and because of Hadoop, we can now use inexpensive commodity hardware to do distributed processing on that data, and it's now financially feasible to hold the data and analyze it."

"Ultimately, the value Microsoft is trying to provide is to connect the open-source big data world (Hadoop) with the more enterprise-friendly Microsoft BI (business intelligence) world," Brust says.


About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
