Data Driver


SQL Continues to Crash the Big Data Party

Regardless of the future of the Microsoft ecosystem (and those latest quarterly numbers should slow the naysayers some), data developers can rest easy knowing their SQL Server skills are transferable in the New Data Order.

The latest example is Facebook's open sourcing yesterday of Presto, a new distributed SQL query engine for Big Data.

It was designed to improve upon existing solutions for Big Data analytics such as Hadoop MapReduce and Hive, Facebook's Martin Traverso said in a post announcing the move to GitHub. "Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed," Traverso said. "It supports standard ANSI SQL, including complex queries, aggregations, joins and window functions."
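To make that feature list concrete, here is a small, hypothetical query that combines a join, an aggregation and a window function. It is a sketch run against SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), not against Presto. The tables `users` and `views` and their data are invented for illustration. The point is only that this is ordinary ANSI-style SQL of the kind the announcement describes, so the same skills carry over.

```python
import sqlite3

# Hypothetical sample data (illustrative only): users and their daily page views.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE views (user_id INTEGER, day TEXT, views INTEGER);
INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'DE');
INSERT INTO views VALUES
  (1, '2013-11-01', 5), (1, '2013-11-02', 3),
  (2, '2013-11-01', 7),
  (3, '2013-11-01', 2), (3, '2013-11-02', 4);
""")

# Join + aggregation in a CTE, then a window function over the aggregated rows:
# total views per user, ranked within each country.
QUERY = """
WITH totals AS (
  SELECT u.country, v.user_id, SUM(v.views) AS total_views
  FROM views AS v
  JOIN users AS u ON u.user_id = v.user_id
  GROUP BY u.country, v.user_id
)
SELECT country,
       user_id,
       total_views,
       RANK() OVER (PARTITION BY country ORDER BY total_views DESC) AS country_rank
FROM totals
ORDER BY country, country_rank
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Computing the totals in a common table expression first keeps the window function working on plain columns, which reads the same way in most SQL dialects.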

Traverso said Presto delivers up to 10 times better CPU efficiency and latency than equivalent Hive/MapReduce tools for most queries. Presto doesn't run on Windows, but its SQL coverage is broad. "It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest)," he said.
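For readers curious how an "approximate distinct count" works, below is a minimal HyperLogLog sketch in Python. It follows the textbook algorithm (hash each value, use the first p bits to pick a register, keep the longest run of leading zeros seen in the remaining bits, then combine the registers with a harmonic mean). It is an illustrative assumption, not Presto's implementation; the class name, the default precision `p=12` and the use of SHA-1 as the hash are choices made here for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for approximate distinct counts (illustrative)."""

    def __init__(self, p: int = 12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 for this sketch")
        self.p = p
        self.m = 1 << p  # number of registers; std. error is about 1.04 / sqrt(m)
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the item's hash.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                 # first p bits select a register
        rest = h & ((1 << width) - 1)    # remaining bits
        # Position of the leftmost 1-bit in `rest` (1-based); width + 1 if all zeros.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Raw estimate: alpha * m^2 / sum(2^-register).
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: switch to linear counting when many registers are empty.
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i}")
        hll.add(f"user-{i}")  # duplicates leave the registers unchanged
    print(round(hll.count()))  # close to 100,000; typical error is a few percent
```

The appeal for interactive analytics is memory: with p=12 the sketch keeps 4,096 small registers no matter how many rows it sees, trading a small, bounded error for not having to hold every distinct value.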

Yes, SQL isn't going anywhere. It has withstood challenges in one form or another from other relational database management systems such as Oracle, offshoots such as MySQL, hybrids like the NoSQL movement and so on. The Big Data onslaught seemed to be stealing much of its mindshare, but the pendulum is swinging back. The problem was that specialized Hadoop-based solutions often proved too cumbersome for quickly and efficiently gleaning meaningful analytics from the vast new data stores.

"This enormous knowledge gap in accessing Big Data in Hadoop has prompted an avalanche of vendors to offer SQL-on-Hadoop solutions, which increase the accessibility of Hadoop and allow organizations to reuse their investment learning in SQL," stated a Gigaom report titled "Sector RoadMap: SQL-on-Hadoop platforms in 2013."

"SQL is widely known by most business analysts," the report continued. "Many nontechnical staff without a programming background can write SQL and use traditional business intelligence (BI) tools like Tableau, MicroStrategy, and Business Objects to query data."

Further evidence of SQL's solid positioning came in a recent presentation by Roger Magoulas, research director at O'Reilly Media, at the Strata Conference + Hadoop World event. He spoke about "the state of data science as a profession." An O'Reilly salary survey conducted last year reported that the top tool used by the responding data scientists was SQL. "I guess it's not a surprise ... we heard some of the other speakers talk about it ... that SQL is still the top thing being used," Magoulas said. His accompanying slide proclaimed "SQL Rules" and indicated that 71 percent of respondents reported it as their data science tool of choice. Hadoop came in at a surprising No. 5, at 35 percent. SQL users also fared well in the salary findings, though on average they earned less than the far smaller group of Hadoop specialists.

You can expect to soon hear about more SQL-related Big Data initiatives, joining high-profile efforts such as Teradata's Enterprise Access for Hadoop; Cloudera's Impala; IBM's Big SQL Technology Preview; Hortonworks' Stinger; and many, many more. Stay tuned.

What do you think of the SQL-on-Hadoop and other SQL-related Big Data technologies? Comment here or drop me a line.

Posted by David Ramel on 11/07/2013 at 12:50 PM

