
Job Submission

Job Submission Structure

A job file, after invoking a shell (e.g., #!/bin/bash), consists of two bodies of commands. The first is the directives to the scheduler, indicated by lines starting with #SBATCH. These are interpreted by the shell as comments, but the Slurm scheduler understands them as directives. These directives include the resource requests of the job, such as the number of nodes, the number of cores, the maximum amount of time the job will run for, email notifications, and so forth. Following this, and only following this, come the shell commands. These should include any modules that the job needs to load, and the commands that the job will run, which can include calls to other scripts.
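
As a minimal sketch (the module name, program and resource values here are placeholders; substitute whatever your job actually requires), a job script might look like this:

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0-01:00:00

# Shell commands follow the scheduler directives
module load MyApplication/1.0.0    # placeholder module name and version
my_application input.dat           # placeholder command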

The following should be considered when writing a job submission script:

  • All scripts must invoke a shell. In most cases this will be the bash shell; #!/bin/bash
  • The submission system has default values: a default walltime of 10 minutes (this will almost certainly need to be changed), a default partition (cascade), a default number of tasks (1), a default number of CPUs per task (1), and a default number of nodes (1).
  • If a shared-memory multithreaded job is being submitted, or subprocesses are launched to make use of additional cores, the job should be set to have the number of tasks = 1 and a number of CPUs per task equal to the number of threads desired, up to a maximum of the number of cores on the node the job is running on. For example: #SBATCH --ntasks=1 followed by #SBATCH --cpus-per-task=8 on a new line.
  • If a distributed-memory message-passing (e.g., MPI) job is being submitted, the job can request more than a single compute node and multiple tasks (e.g., #SBATCH --ntasks=8). If the job has more tasks than the cores available on a single node, the scheduler will make an effort to keep the allocated cores contiguous. To force a specific number of tasks per node, use the --ntasks-per-node= option. Sketches of both cases follow this list.
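
As sketches only (the program name is a placeholder), the resource requests for these two cases might look like this:

# Shared-memory (multithreaded) job: one task with eight cores on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Distributed-memory (MPI) job: eight tasks spread over two nodes, four per node
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
srun ./my_mpi_program    # or the launcher provided by your MPI module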

Certain applications have their own built-in parallelisation approaches, which nevertheless require Slurm directives. MATLAB, for example, uses a "parallel pool" (parpool) approach. See our MATLAB page for more information.

Job Script Examples and Generator

A compilation of example job scripts for various software packages exists on Spartan at /apps/examples/. A copy of this repository is kept at https://gitlab.unimelb.edu.au/hpc-public/spartan-examples. These examples include both scheduler directives and application tests. Additional examples may be added by contacting our helpdesk.

A simple web-based job script generator is also available to help compose jobs.

Job memory allocation

By default the scheduler will allocate memory equal to 4000MB multiplied by the number of cores requested. In some cases this might not be enough (e.g., a very large dataset that needs to be loaded with a low level of parallelisation).

Additional memory can be allocated with the --mem=[mem][M|G|T] directive (memory per node allocated to the job) or --mem-per-cpu=[mem][M|G|T] (memory per core allocated to the job).
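
For example (the values are illustrative only; note that --mem and --mem-per-cpu are mutually exclusive, so use one or the other):

# Request 32 GB in total per node for the job
#SBATCH --mem=32G

# Or instead request 8 GB for each core allocated to the job
#SBATCH --mem-per-cpu=8G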

Interactive Jobs

  • As an alternative to submitting a batch job, you can perform interactive work using the sinteractive command. This is handy for testing and debugging. It will allocate a compute node and log you in to it.

Example

Spartan has an interactive partition, which provides instant access to up to 8 CPU cores and 96GB of RAM for up to 2 days. To get an interactive job for 1 hour with 1 CPU, run:

sinteractive -p interactive --time=01:00:00 --cpus-per-task=1
Note that there are limits on the number of jobs you can run, and on the maximum memory and CPU you can request, on the interactive partition. These limits are listed in the CPU/Memory/GPU quotas section below.

The interactive partition has CPU, RAM and time limits. If you need more resources for your interactive job, you can request an interactive job on the cascade partition. To run an interactive job with 8 CPU cores and 128GB of RAM for 7 days, run:

sinteractive -p cascade --time=7-0:0:0 --cpus-per-task=8 --mem=128G

  • See examples, including with X11 forwarding, at /apps/examples/common/Interact. An X11 client is required for local visualisation (e.g., Xming or MobaXterm for MS-Windows, XQuartz for macOS).
  • There is also the OnDemand service to allow you to use a graphical session on Spartan.
  • Jupyter Notebooks and RStudio can also be run on Spartan through the OpenOnDemand service, which starts these applications and allows you to access them in a web browser.

Job Arrays

Job arrays are great for kicking off a large number of independent jobs at once with the same job script; for instance, batch processing a series of files where each file can be processed independently of the others. Consider an example of an array of files, data_1.dat to data_50.dat, to be processed with myProgram:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --array=1-50
myProgram data_${SLURM_ARRAY_TASK_ID}.dat

There are two components in use here: first the scheduler directive for the array tasks (#SBATCH --array=1-50), then the environment variable that identifies each task and is used to select its data file (data_${SLURM_ARRAY_TASK_ID}.dat).

This will create 50 jobs, each calling myProgram with a different data file. These jobs will run in any order, as soon as resources are available (potentially, all at the same time!).

Directives may be set as a range (e.g., #SBATCH --array=0-31), as comma separated values (#SBATCH --array=1,2,5,19,27), or with a step value (e.g., #SBATCH --array=1-7:2).
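
If the input files are not numbered sequentially, one common pattern (sketched here with a hypothetical filelist.txt containing one filename per line) is to use the array index to pick a line from that list:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --array=1-50
# Select the line of filelist.txt corresponding to this task's array index
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
myProgram "${INPUT}"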

See the examples on Spartan at: /apps/examples/Array and /apps/examples/Octave.

Job Dependencies

It is not unusual for a user to make the launch of one job dependent on the status of another job. The most common example is when a user wishes to make the output of one job the input of a second job. Both jobs can be launched simultaneously, but the second job can be prevented from running until the first job has completed successfully. In other words, there is a conditional dependency on the job.

Several conditional directives can be placed on a job, which are tested prior to the job being initiated; these are summarised as after, afterany, afterok, afternotok, and singleton. These can be specified at submission time (e.g., sbatch --dependency=afterok:$jobid1 job2.slurm). Multiple jobs can be listed as dependencies with colon-separated values (e.g., sbatch --dependency=afterok:$jobid1:$jobid2 job3.slurm).
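
As a minimal sketch of chaining jobs from the command line or a wrapper script (job1.slurm and job2.slurm are placeholder script names), sbatch's --parsable option can be used to capture the first job's ID:

#!/bin/bash
# Submit the first job and capture its job ID
jobid1=$(sbatch --parsable job1.slurm)
# Submit the second job; it will only start once the first exits with code zero
sbatch --dependency=afterok:${jobid1} job2.slurm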

Some dependency types

Directive Description
after:jobid[:jobid...] job can begin after the specified jobs have started
afterany:jobid[:jobid...] job can begin after the specified jobs have terminated
afternotok:jobid[:jobid...] job can begin after the specified jobs have failed
afterok:jobid[:jobid...] job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats).
singleton job can begin execution after all previously launched jobs with the same name and user have ended.

See examples on Spartan at: /apps/examples/depend.

Job Output and Errors

By default Slurm combines all job output into a single file named after the job ID. This file appears when the job starts running, with a name like slurm-17439270.out, for example. The information in this file will include output from scripts run in the job as well as error information. Sometimes it is desirable to separate the output and error information, in which case directives like the following can be used in the jobscript:

#SBATCH -o "slurm-%N.%j.out" # STDOUT
#SBATCH -e "slurm-%N.%j.err" # STDERR

This will create two files, one for output and one for error, whose names include the compute node the job ran on (%N) and the job ID (%j).

Sometimes, the word "Killed" can be seen in the job error or output log, for example:

/var/spool/slurm/job8953391/slurm_script: line 7: 12235 Killed                  python /home/scrosby/examples/mem_use/mem_use.py

This is normally due to your job exceeding the memory you requested. By default, jobs on Spartan are allocated a certain amount of RAM per CPU requested, equal to the memory of the node divided by the number of cores on the node. The memory allocated to a job can be increased with the #SBATCH --mem=[mem][M|G|T] directive (this is per node that your job runs on, in megabytes, gigabytes, or terabytes) or #SBATCH --mem-per-cpu=[mem][M|G|T] for memory per core.

Sometimes job output does not appear in the output file immediately due to buffering. Buffering is where output is placed in a queue, awaiting the buffer to be flushed. This normally happens automatically, but if the output isn't large, or if the processor is busy doing other things, it can take some time for the buffer to be flushed. To disable buffering and get the output immediately, run the command that produces the output with stdbuf; e.g., instead of python myscript.py in a jobscript, use stdbuf -o0 -e0 python myscript.py.
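
For example, in a jobscript (myscript.py is a placeholder for your own program):

# Flush stdout and stderr immediately rather than letting them buffer
stdbuf -o0 -e0 python myscript.py
# For Python programs specifically, the interpreter's -u flag achieves a similar effect
python -u myscript.py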

Monitoring job memory, CPU and GPU utilisation

As a result of feedback obtained from the 2020 Spartan HPC user survey, a job monitoring system was developed. This allows users to monitor the memory, CPU and GPU usage of their jobs via a simple command line script. For more details, please see Job Monitoring.

CPU/Memory/GPU quotas

On public partitions of Spartan (cascade, interactive, long) CPU, memory and GPU quotas have been implemented. This ensures no one user or project can use all the resources in these partitions. The limits are currently set at 17% of the resources in each partition.
 

Note

If a job is not running due to "QOSMaxCpuPerUserLimit", your running jobs already use the full per-user CPU quota for that partition. If a job is not running due to "QOSMaxMemPerUserLimit", your running jobs already use the full per-user memory quota for that partition.

Partition | Running jobs | CPU Quota (CPU cores) per user | Memory Quota (MB RAM) per user | GPUs per user | CPU Quota (CPU cores) per project | Memory Quota (MB RAM) per project | GPUs per project
cascade,sapphire | No limit | 1400 | 14486111 | - | 1400 | 14486111 | -
interactive | 1 | 8 | 73728 | - | - | - | -
long | No limit | 36 | 372500 | - | 36 | 372500 | -
bigmem | No limit | 72 | 3010000 | - | 72 | 3010000 | -
gpu-a100-short | 1 | 8 | 123750 | 1 | - | - | -
gpu-a100 | No limit | 384 | 5940000 | 48 | 384 | 5940000 | 48

GPU Partitions

Spartan hosts a GPGPU service based on NVIDIA A100 (Ampere architecture) GPUs. More information can be found on our GPU page.

Scheduler Commands and Directives

Slurm User Commands

User Command Slurm Command
Job submission sbatch [script_file]
Job delete scancel [job_id]
Job status squeue -j [job_id]
Job status squeue --me
Node list sinfo -N
Queue list squeue
Cluster status sinfo

Slurm Job Commands

Job Specification Slurm Command
Script directive #SBATCH
Partition -p [partition]
Job Name --job-name=[name]
Nodes -N [min[-max]]
Task (MPI rank) Count -n [count]
Wall Clock Limit -t [days-hh:mm:ss]
Event Address --mail-user=[address]
Event Notification --mail-type=[events]
Memory (per node) Size --mem=[mem][M|G|T]
Memory (per CPU) Size --mem-per-cpu=[mem][M|G|T]

Slurm Environment Variables

Job Information Environment Variable
Job ID $SLURM_JOBID
Submit Directory $SLURM_SUBMIT_DIR
Submit Host $SLURM_SUBMIT_HOST
Node List $SLURM_JOB_NODELIST
Job Array Index $SLURM_ARRAY_TASK_ID
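
As an illustrative sketch, these variables can be used directly inside a job script, for example:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00
# Run from the directory the job was submitted from
cd "${SLURM_SUBMIT_DIR}"
echo "Job ${SLURM_JOBID} is running on: ${SLURM_JOB_NODELIST}"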