HDInsight Gets Hadoop Upgrade
Microsoft today announced that its cloud-based Hadoop service, HDInsight, now supports Hadoop 2.4, the latest version of the Big Data software.
Unveiled in October 2012, HDInsight exemplifies Microsoft's embrace of the Big Data movement and -- more generally -- its increasing involvement in open source technologies of all kinds. Microsoft partners with Hadoop heavyweight Hortonworks Inc. to provide the 100 percent Hadoop-compatible service on its Microsoft Azure platform, based on the Hortonworks Hadoop distribution.
Apache Hadoop 2.4, the latest update of the open source framework that's synonymous with Big Data, was released in April with enhancements to the often-criticized Hadoop Distributed File System (HDFS). The release also includes improvements to YARN -- sometimes referred to as "yet another resource negotiator" -- which is described as the successor to the even-more-criticized MapReduce technology, a key component of the original Hadoop ecosystem.
Various industry efforts aim to improve upon the constraints of the batch-oriented MapReduce with more modern analytics features such as interactive queries on streaming data. YARN offers more interaction patterns with HDFS data and provides a more generalized processing platform beyond the MapReduce technology.
"This update includes interactive querying with Hive using advancements based on SQL Server technology, which we are also contributing back to the Hadoop ecosystem through project Stinger," Microsoft said in an announcement on the SQL Server Blog. "With this update to HDInsight, customers can use the speed and scale of the cloud to gain a 100x response time improvement."
Hive is a Hadoop-based data warehousing project also under the auspices of the Apache Software Foundation that allows data queries with its own SQL-like language. Stinger is a community project shepherded by Hortonworks to improve upon Hive with faster performance, increased scale and broader SQL support.
As noted by Oliver Chiu on the Microsoft Azure Blog, HDInsight is also getting an easy-to-use Web UI, letting developers graphically query Hive data.
The SQL Server team used the HDInsight announcement to highlight Microsoft's growing interaction with the open source community.
"We have fully embraced the Hadoop ecosystem and have prioritized contributing back to the community and Apache Hadoop-related projects, for example, Tez, Stinger and Hive," the post said. "All told, we've contributed 30,000 lines of code and put in 10,000-plus engineering hours to support these projects, including the porting of Hadoop to Windows. We've done this in partnership with Hortonworks, a relationship that ensures our Hadoop solutions are based on compatible implementations of Hadoop. One of the results of that partnership is the engineering work that has led to the Hortonworks Data Platform for Windows and Azure HDInsight."
The news came during the ongoing Hadoop Summit, at which T. K. Rengarajan, Microsoft corporate vice president of Data Platform, delivered the keynote address today.
David Ramel is the editor of Visual Studio Magazine.