
Job Submission Structure

A job file, after invoking a shell (e.g., #!/bin/bash), consists of two bodies of commands. The first is the set of directives to the scheduler, indicated by lines starting with #SBATCH. These are interpreted by the shell as comments, but the Slurm scheduler understands them as directives. They include the job's resource requests, such as the number of nodes, the number of cores, the maximum time the job will run for, email notifications, and so forth. Following this, and only following this, come the shell commands. These should include any modules the job needs to load and the commands the job will run, which can include calls to other scripts.

The following should be considered when writing a job submission script:

  • All scripts must invoke a shell. In most cases this will be the bash shell: #!/bin/bash
  • The submission system has default values: a default walltime of 10 minutes (this will almost certainly need to be changed), a default partition (physical), a default number of tasks (1), a default number of CPUs per task (1), and a default number of nodes (1).
  • If a shared-memory multithreaded job is being submitted, or one that launches subprocesses to make use of the additional cores requested, set the number of tasks to 1 and the number of CPUs per task to the number of threads desired, up to the number of cores on the node that the job runs on. For example: #SBATCH --ntasks=1 followed by #SBATCH --cpus-per-task=8 on a new line.
  • If a distributed-memory message passing job is being submitted, the job can request more than a single compute node and multiple tasks (e.g., #SBATCH --ntasks=8). If the job has more tasks than available cores, the scheduler will make an effort to have the cores contiguous. To force a specific number of tasks, and hence cores, on each node, use the --ntasks-per-node= option.
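
As an illustration of the shared-memory case above, here is a minimal sketch of a job script. The job name and the commented-out module and program lines are placeholders, not names taken from Spartan. It assumes an OpenMP-style program that reads OMP_NUM_THREADS, and it falls back to one thread so the script can also be tried outside the scheduler.

```shell
#!/bin/bash
# Scheduler directives come first; the shell treats them as comments.
#SBATCH --job-name=threads-demo      # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=0-01:00:00

# Shell commands follow the directives.
# module load <module-name>          # placeholder: load what your program needs

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# default to 1 so the script also runs outside a job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# ./my_threaded_program              # hypothetical executable
```

Tying the thread count to SLURM_CPUS_PER_TASK means the directive is the only place the core count needs to be changed.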

Certain applications have their own built-in parallelisation approaches, which nevertheless require Slurm directives. MATLAB, for example, uses a "parallel pool" (parpool) approach. See /usr/local/common/MATLAB/parpoolm on Spartan for more information.

Job Arrays

Job arrays are great for kicking off a large number of independent jobs at once with the same job script, for instance when batch processing a series of files where each file can be processed independently of the others. Consider an example with an array of files, data_1.dat to data_50.dat, to process with myProgram:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --array=1-50
myProgram data_${SLURM_ARRAY_TASK_ID}.dat

There are two components in use here: first the scheduler directive (#SBATCH --array=1-50) that defines the array tasks, then the variable that identifies each task (${SLURM_ARRAY_TASK_ID}, used here in data_${SLURM_ARRAY_TASK_ID}.dat).

This will create 50 jobs, each calling myProgram with a different data file. These jobs will run in any order, as soon as resources are available (potentially, all at the same time!).

Directives may be set as a range (e.g., #SBATCH --array=0-31), as comma separated values (#SBATCH --array=1,2,5,19,27), or with a step value (e.g., #SBATCH --array=1-7:2).
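
When input files are not numbered consecutively, one common approach is to map the array index to a line in a list of file names. The sketch below assumes a hypothetical files.txt with one input path per line; the helper pick_input is illustrative only and is not part of Slurm.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --array=1-3

# Default to task 1 so the script can be tried outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Print line number $2 of the file $1.
pick_input() {
    sed -n "${2}p" "$1"
}

# files.txt is a hypothetical list with one input path per line.
if [ -f files.txt ]; then
    input_file="$(pick_input files.txt "$task_id")"
    echo "Task ${task_id} would process: ${input_file}"
    # myProgram "$input_file"
fi
```

The range given to --array should match the number of lines in the list.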

See the examples on Spartan at: /usr/local/common/Array and /usr/local/common/Octave.

Job Dependencies

It is not unusual for a user to make the launch of one job dependent on the status of another job. The most common example is when a user wishes to make the output of one job the input of a second job. Both jobs can be submitted at the same time, but the second job can be prevented from running until the first job has completed successfully. In other words, there is a conditional dependency on the job.

Several conditional directives can be placed on a job; they are tested before the job is initiated and are summarised as after, afterany, afterok, afternotok, and singleton. These can be specified at submission time (e.g., sbatch --dependency=afterok:$jobid1 job2.slurm). Multiple jobs can be listed as dependencies with colon separated values (e.g., sbatch --dependency=afterok:$jobid1:$jobid2 job3.slurm).

Some dependency types

Directive                      Description
after:jobid[:jobid...]         job can begin after the specified jobs have started
afterany:jobid[:jobid...]      job can begin after the specified jobs have terminated
afternotok:jobid[:jobid...]    job can begin after the specified jobs have failed
afterok:jobid[:jobid...]       job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats)
singleton                      job can begin execution after all previously launched jobs with the same name and user have ended

See examples on Spartan at: /usr/local/common/depend.
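
The $jobid1 variable used in the dependency examples has to be captured when the first job is submitted. A minimal sketch, run from a login node rather than inside a job, is shown here. It relies on sbatch --parsable, which prints the job ID, followed by a semicolon and the cluster name if one is present. The script names job1.slurm and job2.slurm and the helper extract_job_id are hypothetical.

```shell
#!/bin/bash
# Keep only the job ID from "jobid" or "jobid;cluster".
extract_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Only attempt submission where the Slurm client is installed.
if command -v sbatch >/dev/null 2>&1; then
    jobid1="$(extract_job_id "$(sbatch --parsable job1.slurm)")"
    # job2 starts only if job1 finishes with an exit code of zero.
    sbatch --dependency=afterok:"${jobid1}" job2.slurm
fi
```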

Interactive Jobs

  • As an alternative to submitting a batch job, you can perform interactive work using the sinteractive command, which allocates a compute node and logs you in to it. This is handy for testing and debugging. Use our new interactive partition by running sinteractive -p interactive --time=01:00:00 to get an interactive session for 1 hour.
  • See examples, including with X-windows forwarding, at /usr/local/common/Interact
  • There is also the FastX service to allow you to use a graphical session on Spartan.
  • Jupyter Notebooks or RStudio can also run on Spartan, through an OpenOnDemand service that starts these applications and lets you access them in a web browser.

Job memory allocation

By default the scheduler will set memory equal to the total amount on a compute node divided by the number of cores requested. In some cases this might not be enough (e.g., a very large dataset that needs to be loaded by a job with a low level of parallelisation).

Additional memory can be allocated with the --mem=[mem][M|G|T] directive (entire job) or --mem-per-cpu=[mem][M|G|T] (per core allocated to the job).
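
As a rough sketch of how the per-core form adds up, the script below requests 8G for each of 4 CPUs, which is 32G for the job. The figures are arbitrary examples and the helper total_mem_gb is illustrative only. Note that --mem and --mem-per-cpu are alternatives, so a script should request one or the other, not both.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=8G     # 4 CPUs x 8G = 32G in total

# Multiply CPU count ($1) by memory per CPU in GB ($2).
total_mem_gb() {
    echo $(( $1 * $2 ))
}

# Default to 4 CPUs so the script can be tried outside Slurm.
cpus="${SLURM_CPUS_PER_TASK:-4}"
echo "Memory requested: $(total_mem_gb "$cpus" 8)G across ${cpus} CPUs"
```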

Job Script Examples and Generator

A compilation of example job scripts for various software packages exists on Spartan at /usr/local/common. A copy of this repository is kept at https://gitlab.unimelb.edu.au/hpc-public/spartan-examples. These examples include both scheduler directives and application tests. Additional examples may be added by contacting our helpdesk.

There is also a simple web-based job script generator to help compose jobs.

Job Output and Errors

By default Slurm combines all job output information into a single file named with the job ID. This can be seen when a job starts running, with a name like slurm-17439270.out, for example. The information in this file will include output from scripts run in a job and error information. Sometimes it is desirable to separate the output and error information, in which case directives like the following can be used in the jobscript:

#SBATCH -o slurm.%N.%j.out # STDOUT 
#SBATCH -e slurm.%N.%j.err # STDERR

This will create two files, one for output and one for error, which specify the jobID and the compute nodes that the job ran on.

Sometimes the word "Killed" can be seen in the job error or output log, for example:

/var/spool/slurm/job8953391/slurm_script: line 7: 12235 Killed                  python /home/scrosby/examples/mem_use/mem_use.py

This is normally due to your job exceeding the memory you requested. By default, jobs on Spartan are allocated a certain amount of RAM per CPU requested. This is equal to the memory of the node divided by the number of cores allocated. Increasing memory allocated to a job can be achieved with the #SBATCH --mem=[mem][M|G|T] directive (this is per node that your job runs on, in megabytes, gigabytes, terabytes) or #SBATCH --mem-per-cpu=[mem][M|G|T] for memory per core.

Sometimes job output does not appear in the output file straight away because of buffering. Buffering is where output is held in a queue until the buffer is flushed. This normally happens automatically, but if the output isn't large, or if the processor is busy doing other things, it can take some time for the buffer to be flushed. To disable buffering in a job and get the output immediately, run the command that produces the output through stdbuf. For example, instead of python myscript.py in a jobscript, use stdbuf -o0 -e0 python myscript.py
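
A small wrapper can make this easier to apply consistently within a jobscript. The sketch below assumes GNU coreutils stdbuf is available on the compute node; the function name run_unbuffered is hypothetical. For Python programs specifically, python -u or setting PYTHONUNBUFFERED=1 are alternatives worth trying if output still appears late.

```shell
#!/bin/bash
# Run any command with unbuffered stdout (-o0) and stderr (-e0).
run_unbuffered() {
    stdbuf -o0 -e0 "$@"
}

run_unbuffered echo "this line is written immediately"
# run_unbuffered python myscript.py
```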

Scheduler Commands and Directives

Slurm User Commands

User Command      Slurm Command
Job submission    sbatch [script_file]
Job delete        scancel [job_id]
Job status        squeue -j [job_id]
Job status        squeue -u [user_name]
Node list         sinfo -N
Queue list        squeue
Cluster status    sinfo

Slurm Job Commands

Job Specification        Slurm Command
Script directive         #SBATCH
Partition                -p [partition]
Job Name                 --job-name=[name]
Nodes                    -N [min[-max]]
Task (MPI rank) Count    -n [count]
Wall Clock Limit         -t [days-hh:mm:ss]
Event Address            --mail-user=[address]
Event Notification       --mail-type=[events]
Memory (per node) Size   --mem=[mem][M|G|T]
Memory (per CPU) Size    --mem-per-cpu=[mem][M|G|T]

Slurm Environment Variables

Environment Command    Environment variable
Job ID                 $SLURM_JOBID
Submit Directory       $SLURM_SUBMIT_DIR
Submit Host            $SLURM_SUBMIT_HOST
Node List              $SLURM_JOB_NODELIST
Job Array Index        $SLURM_ARRAY_TASK_ID
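
These variables are set by Slurm inside a running job and can be used directly in a jobscript. The sketch below logs where a job was submitted from and where it is running. The fallback values after ":-" are assumptions added so the script can also be tried outside the scheduler.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:05:00

# Fall back to placeholder values when not running under Slurm.
job_id="${SLURM_JOBID:-none}"
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
submit_host="${SLURM_SUBMIT_HOST:-$(hostname)}"
node_list="${SLURM_JOB_NODELIST:-$(hostname)}"

# Work from the directory the job was submitted from.
cd "$submit_dir" || exit 1
echo "Job ${job_id} running on ${node_list}, submitted from ${submit_host}:${submit_dir}"
```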