Hadoop+Spark Heat Template
Parent page: OpenStack VM Setups
- Creates a cluster with Hadoop and Spark installed and configured. It is configured to allow jobs to be submitted using the YARN job scheduler. An example folder named big-data-examples is also included, containing examples of MapReduce and Spark jobs; examining their Makefiles shows how to build and use the various examples.
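As a rough sketch of the workflow described above, a Spark example might be built and submitted to YARN as follows. The directory, jar name, class, and HDFS path below are placeholders for illustration only; the Makefiles in big-data-examples give the actual build targets and invocations.

```shell
# Hypothetical session; names and paths are assumptions --
# consult the example's Makefile for the real build and run steps.
cd ~/big-data-examples        # assumed location of the examples folder
make                          # build the example, per its Makefile
spark-submit --master yarn --deploy-mode cluster \
  --class WordCount example.jar hdfs:///user/ubuntu/input
```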
- Heat Template
- or for a configuration that includes the Ganglia cluster monitoring software:
- Compatible Images
- ubuntu-server-14.04-amd64 (Arbutus)
- Minimum Required OpenStack Version
- Creation time can vary from 5-10 minutes to more than an hour, depending on the number of nodes in your cluster.
- Known not to work with the newer Ubuntu 16.04 (Xenial) image.
- The logs from an application (with this specific configuration) are stored in HDFS. You can view the logs from your applications using
[name@server ~]$ yarn logs -applicationId <applicationId>
where <applicationId> should be replaced with your application ID. Your application ID can be found on the YARN job scheduler page, a link to which is provided on the OpenStack stack overview page under the Hadoop+Spark cluster stack. The application ID is also printed when you submit your job and has the form
application_#############_####. The Spark logs can be quite verbose, and it can be difficult to identify the output written by your job. The standard output from your job is prefixed with "stdout" and the number of characters that follow in that output.