Hadoop+Spark Heat Template

Parent page: OpenStack VM Setups

Description
Creates a cluster with Hadoop and Spark installed and configured. The cluster is configured to allow jobs to be submitted through the YARN job scheduler. An example folder named big-data-examples is also included, containing examples of MapReduce and Spark jobs; examining the Makefiles shows how to build and run the various examples.
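For example, once the cluster is active, a job can be submitted to YARN from the head node with commands along these lines (a minimal sketch: the jar, script, and HDFS path names are placeholders for illustration, not files shipped with the template). The first line runs a MapReduce jar, the second a PySpark script:
[name@server ~]$ hadoop jar wordcount.jar WordCount /user/ubuntu/input /user/ubuntu/output
[name@server ~]$ spark-submit --master yarn --deploy-mode cluster wordcount.py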
Type
Heat Template
URL
https://raw.githubusercontent.com/cgeroux/heat-hadoop-spark/master/hadoop%2Bspark.yaml
or, for a configuration that includes the Ganglia cluster-monitoring software:
https://raw.githubusercontent.com/cgeroux/heat-hadoop-spark/ganglia/hadoop%2Bspark.yaml
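As a sketch, either template can also be launched from the command line by passing its URL to the OpenStack client; the stack name hadoop-spark-cluster below is a placeholder, and any parameters the template requires (key pair, node count, etc.) would be supplied with additional --parameter options:
[name@server ~]$ openstack stack create -t https://raw.githubusercontent.com/cgeroux/heat-hadoop-spark/master/hadoop%2Bspark.yaml hadoop-spark-cluster
On older clouds the equivalent heat stack-create command can be used instead.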
Compatible Images
ubuntu-server-14.04-amd64 (Arbutus)
Ubuntu_14.04_Trusty-amd64-20150708 (East)
Minimum Required OpenStack Version
Kilo
Notes
Creation time can vary from 5-10 minutes to more than an hour, depending on the number of nodes in your cluster.
This template is known not to work with the newer Ubuntu Xenial image.
With this configuration, the logs from an application are stored in HDFS. You can view the logs from your application using
[name@server ~]$  yarn logs -applicationId <applicationId>
where <applicationId> is replaced with your application ID. The application ID can be found on the YARN job scheduler page, a link to which is provided on the OpenStack stack overview page under the Hadoop+Spark cluster stack. It is also printed when you submit your job and has the form application_#############_####. The Spark logs can be quite verbose, and it can be difficult to identify the output written by your job; the standard output from your job is prefixed with "stdout" and the number of characters that follow in that output.
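For example, with a hypothetical application ID (the ID shown here is illustrative only), the full log can be retrieved and paged through with:
[name@server ~]$ yarn logs -applicationId application_1234567890123_0001 | less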