Cloudera Releases Desktop UI for Hadoop
Cloudera, a commercial distributor of the Hadoop data storage and processing platform, late last week released a unified graphical user interface for applications built on the open-source framework for data-intensive distributed computing.
The Cloudera Desktop UI is designed to simplify cluster management and job development for developers using Hadoop, an Apache Software Foundation project that has rapidly gained in popularity among enterprise developers over the past year. With the release of the Cloudera Desktop, the company is seeking to extend the reach of Hadoop.
"We’re trying to expand the user base for Hadoop to non-developers and beginning developers," said Jeff Hammerbacher, VP of products at Cloudera. "And rather than delivering separate management and developer tools, we thought giving both a uniform look and feel would be the most useful."
Hadoop is a Java-based open-source framework for data-intensive distributed computing, offered under the Apache license. It is designed to support parallel computations over large data sets on clusters of unreliable computers, in which any individual node may fail. It’s based on Google’s MapReduce, a programming model for processing and generating large data sets, which divides an application into multiple units of work, each of which can be executed on any node in a server cluster. Hadoop includes the Hadoop Distributed File System (HDFS), which is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.
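To make the MapReduce model concrete, here is a minimal single-process sketch of a word count in Python. It is an illustration only, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and everything runs in one process rather than across a cluster. It shows how a job splits into independent map tasks whose output is grouped by key and then reduced.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is an independent unit of work. Here the map calls run
    # one after another; on a cluster they could run on different nodes.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input, split into two chunks.
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because no map call depends on another, a framework can schedule them on whichever nodes are available and rerun any that fail, which is what makes the model suit unreliable clusters.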
Yahoo has been one of the biggest proponents of Hadoop. According to Shelton Shugar, Yahoo’s senior VP of engineering, cloud computing, and data infrastructure, Hadoop runs on tens of thousands of servers inside the company, which makes Yahoo’s Hadoop deployment the largest in the world.
Built on Cloudera’s own commercial Hadoop distribution, this first release of Cloudera Desktop comes with four components:
- File browser, which is essentially a Finder- or Windows Explorer-like interface for copying and browsing data files stored in a Hadoop distributed file system. "It allows users to interact with a file system that’s running on clusters of tens to thousands of nodes the same way they would interact with a file system running locally on their laptop," Hammerbacher said.
- Job submission tool or job designer, a tool for creating MapReduce jobs and submitting them to the cluster from the Web interface. "You can do it without having to download any client software," Hammerbacher said. "We think this is going to help bring Hadoop to more people inside the enterprise."
- Job browser, which lets users view and drill down into jobs running on Hadoop clusters. "The job browser provides a way to visualize what jobs are running, and then drill down and get more detail at the task level," he said.
- Cluster health dashboard, for monitoring the condition of a Hadoop cluster and alerting operators in case of problems. "We expect people to have this open at all times," Hammerbacher said. "It provides a quick glance at how everything is doing with my cluster."
Cloudera Desktop runs inside the Web browser -- which isn’t surprising, given that the company’s founders come from Facebook, Yahoo, and Google. The browser-based UI eliminates the need to update client software, Hammerbacher said, and "you don’t have to punch a new hole through the firewall -- you can just use port 80 like everybody else."
The company is also working on a Desktop API, which will allow developers to create Hadoop apps that integrate the UI. Cloudera is currently working with partners to stabilize the API before publishing it for general use, Hammerbacher said.
Although this release of the Cloudera Desktop is aimed at novices and non-developers, it offers seasoned coders a way to make the case for the platform, Hammerbacher said. "If you’re a developer who really likes working with Hadoop but you’re having a hard time selling it internally, Cloudera Desktop gives you a way to turn to your compatriots and say, 'Look, this Hadoop thing isn’t rocket science. Just use this UI. I’ll build your jobs for you and you can go run them.'"
The initial release of Cloudera Desktop will work only with the Cloudera distro of Hadoop. "That’s primarily because the changes we had to make to the open source Hadoop project to enable it to display this information to this interface were substantial," Hammerbacher said. "We’re still working with the open source community to give those changes back to the project. We have every intention of making it possible in the next 6 to 12 months for Cloudera Desktop to run against any distribution of Hadoop."
Cloudera Desktop is free; the company expects it to drive adoption of its own open source Hadoop distribution. Cloudera launched its own distribution of Hadoop last spring. Hammerbacher calls it the "open-source core of the company." The company has promised to deliver several commercial products based on that distro over the next few months.
The company launched Cloudera Desktop at Hadoop World, held last week in New York. It's the company's first release of a Hadoop-based commercial product. It comes on the heels of news that Doug Cutting, the creator of Hadoop, would be leaving Yahoo to join Cloudera. Yahoo hired Cutting in 2007 to work full-time on the Hadoop project. Cutting created Lucene, an open-source information retrieval library, and, with Mike Cafarella, Nutch, an open-source search engine based on it. Both projects are now managed through the Apache Software Foundation. Yahoo released the source code for its internal distribution of Hadoop to developers this summer (see Yahoo Releases Hadoop Source Code).
Cutting announced the move in August. In a blog posting, he explained his reasoning: "Going forward, Cloudera presents an opportunity to work with a wider range of Hadoop users. I hope to help synthesize these many voices into a project that best serves all."