Big Data on Windows Gets Boost with Hortonworks Update
Microsoft has stopped offering its own version of Hadoop for on-premises deployments.
Hadoop on Windows got its latest update, including a new data OS, storage options and speedier queries for data warehousing.
The Hortonworks Data Platform (HDP) 2.0 for Windows supports the Apache Hadoop 2.2 platform for "big data"-type implementations. It comes with various Hadoop components (Hive, Pig and Sqoop, among others) and is certified to work with Windows Server 2008 R2 and Windows Server 2012 R2. Hadoop, as a big data implementation, is typically used to analyze terabytes of data, such as querying piles of Web clickstream data for insights.
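To give a sense of what a clickstream query involves, here is a minimal sketch in plain Python of the map-and-reduce pattern that Hadoop jobs follow. It does not use any Hadoop or HDP API. The record format (user ID and URL separated by a tab), the sample data and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical clickstream records in an assumed "user_id<TAB>url" format.
SAMPLE_CLICKS = [
    "u1\t/home",
    "u2\t/home",
    "u1\t/products",
    "u3\t/home",
    "u2\t/checkout",
]


def map_click(line):
    """Map step: emit a (url, 1) pair for one click record."""
    _user_id, url = line.split("\t")
    return (url, 1)


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each URL."""
    totals = defaultdict(int)
    for url, count in pairs:
        totals[url] += count
    return dict(totals)


def count_page_views(lines):
    """Run the map step over every record, then reduce the results."""
    return reduce_counts(map_click(line) for line in lines)


if __name__ == "__main__":
    for url, views in sorted(count_page_views(SAMPLE_CLICKS).items()):
        print(url, views)
```

On a real cluster, the map and reduce steps would run in parallel across many machines and read from distributed storage. This single-process version shows only the shape of the computation.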
This HDP 2.0 release is notable for supporting Apache Hadoop YARN, which Hortonworks describes as the "data operating system for Hadoop 2.0" that provides support for batch, interactive, online and streaming data processing. YARN is an Apache Software Foundation subproject that repackages the resource management capabilities of MapReduce. It's compatible with MapReduce, as well as other workloads, and is designed to support larger clusters, according to Hortonworks' description.
Another feature of HDP 2.0 for Windows is its support of the Stinger initiative to speed up queries to Apache Hive, which is a data warehouse solution for Hadoop. Hortonworks describes Stinger as speeding up and scaling up interactive Hive queries written in SQL. Some of the speed improvements promised by Stinger will rely on introducing an Apache Software Foundation incubator project called "Tez," but for now Stinger aims at supporting "interactive SQL queries at petabyte scale."
A third highlight in HDP 2.0 for Windows is support for Apache HBase 0.96, which is Hadoop's nonrelational distributed storage solution. The HBase 0.96 release fixes more than 2,000 issues, according to Hortonworks' description.
It's possible to perform in-place upgrades of clusters from HDP 1.3 to HDP 2.0, according to Hortonworks. Moreover, after the upgrade, existing Hive, Pig and MapReduce jobs will be capable of running on HDP 2.0 for Windows, the company claims.
Hortonworks and Microsoft have been long-time collaborators on enabling Apache Hadoop compatibility on Windows Server. Hadoop is an open source Apache Software Foundation project for processing large datasets across computer clusters. Much of the Hadoop work was pioneered at Yahoo, and some of those Yahoo team members later joined Hortonworks.
Microsoft touts Hortonworks' HDP solution as "the industry's only 100% Apache Hadoop-based distribution for Windows." Microsoft stopped offering its own version for on-premises deployments, which was called "HDInsight Server for Windows," according to an article by veteran Microsoft watcher Mary Jo Foley. Instead, it currently offers Hadoop cloud computing capabilities via its "Windows Azure HDInsight" service.
Organizations considering moving their on-premises HDP deployments to an external datacenter can "easily" do that, according to Microsoft, and that includes moving to Microsoft's Windows Azure service. In addition, Microsoft's Power BI for Office 365 business intelligence tools can be used with either HDP 2.0 for Windows or the Windows Azure HDInsight service.
Kurt Mackie is senior news producer for the 1105 Enterprise Computing Group.