Big Data on Windows Gets Boost with Hortonworks Update

Microsoft has stopped offering its own version of Hadoop for on-premises deployments.

Hadoop on Windows got its latest update, including a new data OS, storage options and speedier queries for data warehousing.

The Hortonworks Data Platform (HDP) 2.0 for Windows supports the Apache Hadoop 2.2 platform for "big data"-type implementations. It comes with various Hadoop components (Hive, Pig and Sqoop, among others) and is certified to work with Windows Server 2008 R2 and Windows Server 2012 R2. Hadoop, as a big data implementation, is typically used to analyze terabytes of data, such as querying piles of Web clickstream data for insights.

This HDP 2.0 release is notable for supporting Apache Hadoop YARN, which Hortonworks describes as the "data operating system for Hadoop 2.0" that provides support for batch, interactive, online and streaming data processing. YARN is an Apache Software Foundation subproject that repackages the resource management capabilities of MapReduce. It's compatible with MapReduce, as well as other workloads, and is designed to support larger clusters, according to Hortonworks' description.

Another feature of HDP 2.0 for Windows is its support of the Stinger initiative to speed up queries to Apache Hive, which is a data warehouse solution for Hadoop. Hortonworks describes Stinger as speeding up and scaling up interactive Hive queries using SQL terms. Some of the speed improvements promised by Stinger will rely on introducing an Apache Software Foundation incubator project called "Tez," but for now Stinger aims at supporting "interactive SQL queries at petabyte scale."

A third highlight in HDP 2.0 for Windows is support for Apache HBase 0.96, which is Hadoop's nonrelational distributed storage solution. The HBase 0.96 release fixes more than 2,000 issues, according to Hortonworks' description.

It's possible to perform in-place upgrades of clusters from HDP 1.3 to HDP 2.0, according to Hortonworks. Moreover, after the upgrade, existing Hive, Pig and MapReduce jobs will be capable of running on HDP 2.0 for Windows, the company claims.

Hortonworks and Microsoft are longtime collaborators on enabling Apache Hadoop compatibility on Windows Server. Hadoop is an open source Apache Software Foundation project for processing large datasets across computer clusters. Much of the Hadoop work was pioneered at Yahoo, and some of those Yahoo team members later joined Hortonworks.

Microsoft touts Hortonworks' HDP solution as "the industry's only 100% Apache Hadoop-based distribution for Windows." Microsoft stopped offering its own version for on-premises deployments, which was called "HDInsight Server for Windows," according to an article by veteran Microsoft watcher Mary Jo Foley. Instead, it currently offers Hadoop cloud computing capabilities via its "Windows Azure HDInsight" service.

Organizations considering moving their premises-built HDP solutions to an external datacenter can "easily" do that, according to Microsoft, and that includes Microsoft's Windows Azure service. In addition, Microsoft's Power BI for Office 365 business intelligence tools can be used with either HDP 2.0 for Windows or the Windows Azure HDInsight service.

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.
