
Big Data on Windows Gets Boost with Hortonworks Update

Microsoft has stopped offering its own version of Hadoop for on-premises deployments.

Hadoop on Windows got its latest update, including a new data OS, storage options and speedier queries for data warehousing.

The Hortonworks Data Platform (HDP) 2.0 for Windows supports the Apache Hadoop 2.2 platform for "big data"-type implementations. It comes with various Hadoop components (Hive, Pig and Sqoop, among others) and is certified to work with Windows Server 2008 R2 and Windows Server 2012 R2. Hadoop, as a big data implementation, is typically used to analyze terabytes of data, such as querying piles of Web clickstream data for insights.

This HDP 2.0 release is notable for supporting Apache Hadoop YARN, which Hortonworks describes as the "data operating system for Hadoop 2.0" that provides support for batch, interactive, online and streaming data processing. YARN is an Apache Software Foundation subproject that repackages the resource management capabilities of MapReduce. It's compatible with MapReduce, as well as other workloads, and is designed to support larger clusters, according to Hortonworks' description.

Another feature of HDP 2.0 for Windows is its support for the Stinger initiative to speed up queries to Apache Hive, which is a data warehouse solution for Hadoop. Hortonworks describes Stinger as speeding up and scaling up interactive Hive queries using SQL terms. Some of the speed improvements promised by Stinger will rely on introducing an Apache Software Foundation incubator project called "Tez," but for now Stinger aims at supporting "interactive SQL queries at petabyte scale."

A third highlight in HDP 2.0 for Windows is support for Apache HBase 0.96, which is Hadoop's nonrelational distributed storage solution. The HBase 0.96 release fixes more than 2,000 issues, according to Hortonworks' description.

It's possible to perform in-place upgrades of clusters from HDP 1.3 to HDP 2.0, according to Hortonworks. Moreover, after the upgrade, existing Hive, Pig and MapReduce jobs will be capable of running on HDP 2.0 for Windows, the company claims.

Hortonworks and Microsoft are longtime collaborators on enabling Apache Hadoop compatibility on Windows Server. Hadoop is an open source Apache Software Foundation product for processing large datasets across computer clusters. Much of the Hadoop work was pioneered at Yahoo, and some of those Yahoo team members later joined Hortonworks.

Microsoft touts Hortonworks' HDP solution as "the industry's only 100% Apache Hadoop-based distribution for Windows." Microsoft stopped offering its own version for on-premises deployments, which was called "HDInsight Server for Windows," according to an article by veteran Microsoft watcher Mary Jo Foley. Instead, it currently offers Hadoop cloud computing capabilities via its "Windows Azure HDInsight" service.

Organizations considering moving their on-premises HDP deployments to an external datacenter can "easily" do so, according to Microsoft, and the options include Microsoft's Windows Azure service. In addition, Microsoft's Power BI for Office 365 business intelligence tools can be used with either HDP 2.0 for Windows or the Windows Azure HDInsight service.

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.
