Big Data on Windows Gets Boost with Hortonworks Update

Microsoft has stopped offering its own version of Hadoop for on-premises deployments.

Hadoop on Windows got its latest update, including a new data OS, storage options and speedier queries for data warehousing.

The Hortonworks Data Platform (HDP) 2.0 for Windows supports the Apache Hadoop 2.2 platform for "big data"-type implementations. It comes with various Hadoop components (Hive, Pig and Sqoop, among others) and is certified to work with Windows Server 2008 R2 and Windows Server 2012 R2. Hadoop, as a big data implementation, is typically used to analyze terabytes of data, such as querying piles of Web clickstream data for insights.

This HDP 2.0 release is notable for supporting Apache Hadoop YARN, which Hortonworks describes as the "data operating system for Hadoop 2.0" that provides support for batch, interactive, online and streaming data processing. YARN is an Apache Software Foundation subproject that repackages the resource management capabilities of MapReduce. It's compatible with MapReduce, as well as other workloads, and is designed to support larger clusters, according to Hortonworks' description.

Another feature of HDP 2.0 for Windows is its support for the Stinger initiative, which aims to speed up queries in Apache Hive, a data warehouse solution for Hadoop. Hortonworks describes Stinger as an effort to make interactive, SQL-based Hive queries faster and more scalable. Some of the speed improvements Stinger promises will depend on "Tez," an Apache Software Foundation incubator project, but for now Stinger's stated aim is to support "interactive SQL queries at petabyte scale."

A third highlight in HDP 2.0 for Windows is support for Apache HBase 0.96, which is Hadoop's nonrelational distributed storage solution. The HBase 0.96 release fixes more than 2,000 issues, according to Hortonworks' description.

It's possible to perform in-place upgrades of clusters from HDP 1.3 to HDP 2.0, according to Hortonworks. Moreover, after the upgrade, existing Hive, Pig and MapReduce jobs will be capable of running on HDP 2.0 for Windows, the company claims.

Hortonworks and Microsoft have been long-time collaborators on enabling Apache Hadoop compatibility on Windows Server. Hadoop is an open source Apache Software Foundation product for processing large datasets across computer clusters. Much of the Hadoop work was initially pioneered at Yahoo, and some of those Yahoo team members later joined Hortonworks.

Microsoft touts Hortonworks' HDP as "the industry's only 100% Apache Hadoop-based distribution for Windows." According to an article by veteran Microsoft watcher Mary Jo Foley, Microsoft has stopped offering its own on-premises version, which was called "HDInsight Server for Windows." Instead, it now offers Hadoop cloud computing capabilities through its "Windows Azure HDInsight" service.

According to Microsoft, organizations considering moving their on-premises HDP deployments to an external datacenter, including Microsoft's Windows Azure service, can do so "easily." In addition, Microsoft's Power BI for Office 365 business intelligence tools can be used with either HDP 2.0 for Windows or the Windows Azure HDInsight service.

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.