
Big Data on Windows Gets Boost with Hortonworks Update

Microsoft has stopped offering its own version of Hadoop for on-premises deployments.

Hadoop on Windows got its latest update, including a new data OS, storage options and speedier queries for data warehousing.

The Hortonworks Data Platform (HDP) 2.0 for Windows supports the Apache Hadoop 2.2 platform for "big data" implementations. It comes with various Hadoop components (Hive, Pig and Sqoop, among others) and is certified to work with Windows Server 2008 R2 and Windows Server 2012 R2. Hadoop is typically used to analyze terabytes of data, such as querying piles of Web clickstream data for insights.

This HDP 2.0 release is notable for supporting Apache Hadoop YARN, which Hortonworks describes as the "data operating system for Hadoop 2.0" that provides support for batch, interactive, online and streaming data processing. YARN is an Apache Software Foundation subproject that repackages the resource management capabilities of MapReduce. It's compatible with MapReduce, as well as other workloads, and is designed to support larger clusters, according to Hortonworks' description.

Another feature of HDP 2.0 for Windows is its support of the Stinger initiative to speed up queries to Apache Hive, which is a data warehouse solution for Hadoop. Hortonworks describes Stinger as speeding up and scaling up interactive Hive queries using SQL terms. Some of the speed improvements promised by Stinger will rely on introducing an Apache Software Foundation incubator project called "Tez," but for now Stinger aims at supporting "interactive SQL queries at petabyte scale."

A third highlight in HDP 2.0 for Windows is support for Apache HBase 0.96, which is Hadoop's nonrelational distributed storage solution. The HBase 0.96 release fixes more than 2,000 issues, according to Hortonworks' description.

It's possible to perform in-place upgrades of clusters from HDP 1.3 to HDP 2.0, according to Hortonworks. Moreover, after the upgrade, existing Hive, Pig and MapReduce jobs will be capable of running on HDP 2.0 for Windows, the company claims.

Hortonworks and Microsoft have been longtime collaborators on enabling Apache Hadoop compatibility on Windows Server. Hadoop is an open source Apache Software Foundation product for processing large datasets across computer clusters. Much of the Hadoop work was pioneered at Yahoo, and some of those Yahoo team members later joined Hortonworks.

Microsoft touts Hortonworks' HDP solution as "the industry's only 100% Apache Hadoop-based distribution for Windows." Microsoft stopped offering its own version for on-premises deployments, which was called "HDInsight Server for Windows," according to an article by veteran Microsoft watcher Mary Jo Foley. Instead, it currently offers Hadoop cloud computing capabilities via its "Windows Azure HDInsight" service.

Organizations considering moving their on-premises HDP solutions to an external datacenter, including Microsoft's Windows Azure service, can "easily" do so, according to Microsoft. In addition, Microsoft's Power BI for Office 365 business intelligence tools can be used with either HDP 2.0 for Windows or the Windows Azure HDInsight service.

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.
