Running jobs






Overview

What's a job?

On computers we are most often familiar with graphical user interfaces (GUIs). There are windows, menus, buttons; we click here and there and the system responds. On Compute Canada servers the environment is different. To begin with, you control it by typing, not clicking. This is called a command line interface. Furthermore, a program you would like to run may not begin immediately, but may instead be placed on a waiting list. It starts only when the necessary CPU cores are available; without this queueing, jobs would interfere with one another and performance would suffer.

You prepare a small text file called a job script that basically says what program to run, where to get the input, and where to put the output. You submit this job script to a piece of software called the scheduler which decides when and where it will run. Once the job has finished you can retrieve the results of the calculation. Normally there is no interaction between you and the program while the job is running, although you can check on its progress if you wish.

Here's a very simple job script:

File : simple_job.sh

#!/bin/bash
#SBATCH --account=your-project-cpu
#SBATCH --time=00:01:00
echo 'Hello, world!'
sleep 30


It runs the programs echo and sleep; there is no input, and the output will go to a default location. The lines starting with #SBATCH are directives to the scheduler, describing the resources the job needs. You must substitute your (or your supervisor's) project name where your-project appears above.

The job scheduler

The job scheduler is a piece of software with multiple responsibilities. It must

  • maintain a database of all jobs from the time they are submitted until they have finished,
  • enforce policies regarding limits and priorities,
  • ensure resources are not overloaded, for example by only assigning a CPU core to one job at a time,
  • decide which jobs to run and on which compute nodes,
  • launch them on those nodes,
  • clean up after each job finishes, and
  • maintain logs for accounting and troubleshooting.

On Compute Canada systems, these responsibilities are handled by the Slurm Workload Manager.

Requesting resources

You use the job script to ask for the resources needed to run your job. Two resources associated with every job are the time needed to complete the work and the number of processors. In the example above, the time requested is one minute. The number of processors defaults to one if you specify nothing. We describe below how to request more than one processor, or other resources such as the amount of memory, or special types of processors like GPUs.

It is important to specify those parameters well. If you ask for less than the job needs, it will be killed for exceeding the requested time or memory limit. If you ask for more than the job needs, it may wait longer than necessary before it starts, and once running it will prevent others from using those resources.
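For example, a script for a job that needs four CPU cores, 8000 MB of memory, and one hour of run time might begin with directives like the following. This is only a minimal sketch: your_program is a placeholder, and the values should be adapted to your own work.

#!/bin/bash
#SBATCH --account=your-project-cpu   # replace with your project, as above
#SBATCH --time=01:00:00              # one hour
#SBATCH --cpus-per-task=4            # four CPU cores on one node
#SBATCH --mem=8000M                  # memory for the job
./your_program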

A basic SLURM job

We can submit the simple_job.sh shown above with sbatch:

[someuser@host ~]$ sbatch simple_job.sh
Submitted batch job 1234
[someuser@host ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1234 mem12_sho simple_j someuser  R       0:03      1 zeno001
[someuser@host ~]$ cat slurm-1234.out
Hello, world!

Look at the ST column in the output of squeue to determine the status of your jobs. The two most common states are "PD" for "pending" and "R" for "running". When the job has finished it no longer appears in the squeue output.

Notice that each job is assigned a "job ID", a unique identification number printed when you submit the job (1234 in this example). You can have more than one job in the system at a time, and the ID number can be used to distinguish them even if they have the same name. Finally, because we didn't specify anywhere else to put it, the output is placed in a file named after the job ID number, slurm-1234.out.

You can also specify options to sbatch on the command line. So for example,

[someuser@host ~]$ sbatch --account=another-project-cpu simple_job.sh 

will charge the work to "another-project" instead of "your-project". Any #SBATCH directive in the job script can be overridden on the command line in this way.

Choosing where the output goes

If you want the output file to have a more distinctive name than slurm-1234.out, you can use -o to change it. The following script sets a job name which will appear in squeue output, and sends the output to a file prefixed with the job name and containing the job ID number.


File : name_output.sh

#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --job-name=test
#SBATCH -o test-%J.out
echo 'Hello, world!'


Error output will normally appear in the same file, just as it would if you were typing commands interactively. If you wish you can split the standard error channel (stderr) from the standard output channel (stdout) by using the -e option much like -o, but this is only rarely necessary.
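For instance, to collect error messages in their own file you could add a second directive alongside -o; this is just a sketch, and the file names are arbitrary:

#SBATCH -o test-%J.out    # standard output
#SBATCH -e test-%J.err    # standard error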

Examples of job scripts

MPI job

This example script launches four MPI processes, each with 1024 MB of memory. The run time is limited to 5 minutes.


File : mpi_job.sh

#!/bin/bash
#SBATCH --ntasks 4               # number of MPI processes
#SBATCH --mem-per-cpu=1024M      # memory; default unit is megabytes
#SBATCH --time 0-00:05           # time (D-HH:MM)
srun ./mpi_program


One can have detailed control over the location of MPI tasks by requesting specific numbers of cores per node (for example). Hybrid MPI/threaded jobs are also possible. For more on these and other options relating to distributed parallel jobs, see Running MPI jobs with SLURM.
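As a rough sketch of that kind of control, the following variant of the script above asks for two MPI processes on each of two nodes, instead of letting the scheduler place the four processes freely; mpi_program is a placeholder, as before.

#!/bin/bash
#SBATCH --nodes=2                # number of nodes
#SBATCH --ntasks-per-node=2      # MPI processes per node
#SBATCH --mem-per-cpu=1024M      # memory; default unit is megabytes
#SBATCH --time=0-00:05           # time (D-HH:MM)
srun ./mpi_program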

Threaded or OpenMP job

This example script launches a single process with six CPU cores. Bear in mind that the application ompHello must be compiled with the appropriate flag, e.g. gcc -fopenmp ... or icc -openmp ...


File : openmp_job.sh

#!/bin/bash
#SBATCH --time=0-0:5
#SBATCH --cpus-per-task=6
export OMP_NUM_THREADS=$SLURM_CPUS_ON_NODE
./ompHello


For more on writing and running parallel programs with OpenMP, see OpenMP.

GPU job

This example is a serial GPU job with one GPU allocated, a memory limit of 4000 MB, and a run-time limit of 5 hours. The output file name will include the name of the first node used and the job ID number.


File : simple_gpu_job.sh

#!/bin/bash
#SBATCH --gres=gpu:1              # request gpu "generic resource"
#SBATCH --mem=4000M               # total memory
#SBATCH --time 0-05:00            # time (D-HH:MM)
#SBATCH -o %N-%j.out              # output, %N for node name, %j for jobID
nvidia-smi


For more on running GPU jobs, see ...

Array job

Also known as a task array, an array job is a way to submit a whole set of jobs with one command. The individual jobs in the array are distinguished by an environment variable, SLURM_ARRAY_TASK_ID, which is set to a different value for each instance of the job.

sbatch --array=0-7 ...      # SLURM_ARRAY_TASK_ID will take values from 0 to 7 inclusive
sbatch --array=1,3,5,7 ...  # SLURM_ARRAY_TASK_ID will take the listed values
sbatch --array=1-7:2 ...    # Step size of 2: another way to get 1, 3, 5, 7
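As a minimal sketch of how the variable is typically used, an array job script might select a different input file for each instance. The file name array_job.sh, the program my_program, and the input_*.dat files are placeholders here.

File : array_job.sh

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --array=0-7              # submit 8 instances of this script
# Each instance processes a different input file, selected with
# the value of SLURM_ARRAY_TASK_ID (0 through 7 here).
./my_program input_${SLURM_ARRAY_TASK_ID}.dat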

Interactive jobs

Though batch submission is the most common and most efficient way to take advantage of our systems, interactive jobs are also supported. These can be useful for things like:

  • Data exploration at the command line
  • Interactive "console tools" like R and IPython
  • Significant software development, debugging, or compiling

You can start an interactive session on a compute node with srun. In this example we request two tasks (that is, two CPU cores) for an hour:

name@head$ srun --time=1:0:0 --ntasks=2 --pty bash 
name@node01$ ...             # do some work
name@node01$ exit            # log out of the compute node

For more details see Interactive jobs.

Monitoring jobs

By default squeue will show all jobs in the system. It may run much faster if you ask only about your own jobs with

squeue -u <username>

You can show only running jobs, or only pending jobs:

squeue -u <username> -t RUNNING
squeue -u <username> -t PENDING

You can show detailed information for a specific job with scontrol:

scontrol show jobid -dd <jobid>

Find information about a completed job with sacct, and optionally, control what it prints using --format:

sacct -j <jobid>
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

You can ask to be notified by email of certain job conditions by supplying options to sbatch:

#SBATCH --mail-user=<email_address>
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=REQUEUE
#SBATCH --mail-type=ALL

Controlling jobs

Use scancel with the job ID to cancel a job:

 scancel <jobid>

You can also use it to cancel all your jobs, or all your pending jobs:

scancel -u <username>
scancel -t PENDING -u <username>

Troubleshooting

Avoid hidden characters in job scripts

Preparing a job script with a word processor instead of a text editor is a common cause of trouble. Best practice is to prepare your job script on the cluster using an editor such as nano, vim, or emacs. If you prefer to prepare or alter the script off-line, then:

  • Windows users:
    • Use a text editor such as Notepad or Notepad++.
    • After uploading the script, use dos2unix to change Windows end-of-line characters to Linux end-of-line characters (see the example after this list).
  • Mac users:
    • Open a terminal window and use an editor such as nano, vim, or emacs.
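For example, after uploading a job script prepared on Windows, you would convert it on the cluster with:

[someuser@host ~]$ dos2unix simple_job.sh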

External links

  • A "Rosetta stone" mapping commands and directives from PBS/Torque, SGE, LSF, and LoadLeveler, to SLURM.
  • Some SLURM tutorial materials: