Job Submission Structure
A job file, after invoking a shell (e.g.,
#!/bin/bash), consists of two bodies of commands. The first is the directives to the scheduler, indicated by lines starting with
#SBATCH. The shell interprets these as comments, but the Slurm scheduler understands them as directives. These directives include the job's resource requests, such as the number of nodes, the number of cores, the maximum time the job will run for, email notifications, and so forth. Following this, and only following this, come the shell commands. These should include any modules the job needs to load, and the commands the job will run, which can include calls to other scripts.
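As a sketch of this structure, a minimal job script might look like the following (the module name and program are placeholders; substitute your own):

```shell
#!/bin/bash
# Scheduler directives: read by Slurm, treated as comments by the shell
#SBATCH --job-name=myjob
#SBATCH --ntasks=1
#SBATCH --time=0-01:00:00        # walltime of 1 hour

# Shell commands: load any required modules, then run the job
module load foo/1.0.0            # placeholder module name
./myprogram input.dat            # placeholder command
```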
The following should be considered when writing a job submission script:
- All scripts must invoke a shell. In most cases this will be the bash shell;
- The submission system has default values; a default walltime of 10 minutes (this will almost certainly need to be changed), a default partition (cascade), a default number of tasks (1), a default number of CPUs per task (1), and a default number of nodes (1).
- If a shared-memory multithreaded job is being submitted, or subprocesses are spawned to make use of additional cores, the job should set the number of tasks to 1 and set a number of CPUs per task equal to the number of threads desired, up to a maximum of the number of cores on the node the job is running on. For example:
#SBATCH --ntasks=1 followed by
#SBATCH --cpus-per-task=8 on a new line.
- If a distributed-memory message-passing job is being submitted, the job can request more than a single compute node and multiple tasks. If the job has more tasks than the cores available on a node, the scheduler will make an effort to keep the cores contiguous (e.g.,
#SBATCH --ntasks=8). To force a specific number of tasks per node, use the
#SBATCH --ntasks-per-node directive.
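For the shared-memory case, the requested core count can be passed to the application through Slurm's SLURM_CPUS_PER_TASK environment variable; a sketch (the program name is a placeholder):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Tell an OpenMP program to use as many threads as cores requested
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_threaded_program            # placeholder command
```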
Certain applications have their own built-in parallelisation approaches, which nevertheless require Slurm directives. MATLAB, for example, uses a "parallel pool" (parpool) approach. See our MATLAB page for more information.
Job Script Examples and Generator
A compilation of example job scripts for various software packages exists on Spartan at
/apps/examples/. A copy of this repository is kept at https://gitlab.unimelb.edu.au/hpc-public/spartan-examples. These examples include both scheduler directives and application tests. Additional examples may be added by contacting our helpdesk.
There is also a simple web-based job script generator to help compose jobs.
Job memory allocation
By default the scheduler will allocate memory equal to 4000MB multiplied by the number of cores requested. In some cases this might not be enough (e.g., a very large dataset that needs to be loaded by a job with a low level of parallelisation).
Additional memory can be allocated with the
--mem=[mem][M|G|T] directive (memory per node allocated to the job) or
--mem-per-cpu=[mem][M|G|T] (per core allocated to the job).
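For example, a job could request 32GB per node, or 4GB per core (use one form or the other, not both):

```shell
#SBATCH --mem=32G                # 32GB per node allocated to the job
#SBATCH --mem-per-cpu=4G         # or: 4GB per core allocated to the job
```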
- An alternative to submitting a batch job is to perform interactive work using the
sinteractive command. This is handy for testing and debugging. It will allocate a compute node and log you in to it.
Spartan has an
interactive partition, which provides near-instant access to up to 8 CPU cores and 96GB RAM, for up to 2 days; for example, an interactive job for 1 hour with 1 CPU. The
interactive partition has CPU, RAM and time limits. If you need more resources for your interactive job, you can request an interactive job on the cascade partition instead; for example, an interactive job with 8 CPU cores and 128GB RAM for 7 days.
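Assuming sinteractive forwards the standard sbatch-style resource options (check sinteractive --help on Spartan to confirm), requests like those above might look like:

```shell
# 1 hour, 1 CPU, on the default interactive partition
sinteractive --time=1:00:00 --cpus-per-task=1

# 8 CPU cores, 128GB RAM, 7 days, on the cascade partition
sinteractive --partition=cascade --time=7-00:00:00 --cpus-per-task=8 --mem=128G
```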
- See examples, including with X11-windows forwarding, at
/apps/examples/common/Interact. An X11 client is required for local visualisation (e.g., Xming or MobaXterm for MS-Windows, XQuartz for macOS).
- There is also the OnDemand service, which allows you to use a graphical session on Spartan.
- Jupyter Notebooks or RStudio can also run on Spartan, through the OpenOnDemand service, which starts these applications and allows you to access them in a web browser.
Job arrays are great for kicking off a large number of independent jobs at once with the same job script; for instance, batch processing a series of files where each file can be processed independently of the others. Consider an example array of files,
data_1.dat to data_50.dat, to process with myProgram.
There are two components in use here: first, the scheduler directive (
#SBATCH --array=1-50) for the array tasks; then the variable which identifies the task (${SLURM_ARRAY_TASK_ID}).
This will create 50 jobs, each calling
myProgram with a different data file. These jobs will run in any order, as soon as resources are available (potentially, all at the same time!).
Directives may be set as a range (e.g.,
#SBATCH --array=0-31), as comma-separated values (e.g.,
#SBATCH --array=1,2,5,19,27), or with a step value (e.g.,
#SBATCH --array=0-31:4).
See the examples on Spartan at:
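Putting the two components together, a sketch of an array jobscript for the example above (myProgram is a placeholder):

```shell
#!/bin/bash
#SBATCH --array=1-50             # 50 array tasks, IDs 1..50

# Each task receives its own ID in SLURM_ARRAY_TASK_ID,
# which selects the matching data file for that task
./myProgram data_${SLURM_ARRAY_TASK_ID}.dat
```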
It is not unusual for a user to make the launch of one job dependent on the status of another job. The most common example is when a user wishes to make the output of one job the input of a second job. Both jobs can be launched simultaneously, but the second job is prevented from running until the first job has completed successfully. In other words, there is a conditional dependency between the jobs.
Several conditional directives can be placed on a job, which are tested before the job is initiated; these range from after to
singleton and are summarised in the table below. Dependencies can be specified at submission time (e.g.,
sbatch --dependency=afterok:$jobid1 job2.slurm). Multiple jobs can be listed as dependencies with colon separated values (e.g.,
sbatch --dependency=afterok:$jobid1:$jobid2 job3.slurm).
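Job IDs for dependencies can be captured at submission time with sbatch's --parsable option, which prints just the job ID; a sketch (the .slurm filenames are placeholders):

```shell
# Submit the first job and capture its job ID
jobid1=$(sbatch --parsable job1.slurm)

# Submit the second job, to start only if the first completes successfully
sbatch --dependency=afterok:${jobid1} job2.slurm
```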
Some dependency types
|after:jobid[:jobid...]||job can begin after the specified jobs have started|
|afterany:jobid[:jobid...]||job can begin after the specified jobs have terminated|
|afternotok:jobid[:jobid...]||job can begin after the specified jobs have failed|
|afterok:jobid[:jobid...]||job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats).|
|singleton||job can begin execution after all previously launched jobs with the same name and user have ended.|
See examples on Spartan at:
Job Output and Errors
By default Slurm combines all job output information into a single file with the job ID. This can be seen when a job starts running with a name like
slurm-17439270.out, for example. The information in this file will include output from scripts run in the job as well as error information. Sometimes it is desirable to separate the output and error information, in which case directives like the following can be used in the jobscript:
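For example, using Slurm's filename patterns %j (job ID) and %N (node name):

```shell
#SBATCH --output=slurm-%j-%N.out   # standard output
#SBATCH --error=slurm-%j-%N.err    # standard error
```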
This will create two files, one for output and one for error, which specify the jobID and the compute nodes that the job ran on.
Sometimes, the word "Killed" can be seen in the job error or output log, for example
This is normally due to your job exceeding the memory you requested. By default, jobs on Spartan are allocated a certain amount of RAM per CPU requested. This is equal to the memory of the node divided by the number of cores allocated. Increasing memory allocated to a job can be achieved with the
#SBATCH --mem=[mem][M|G|T] directive (this is per node that your job runs on, in megabytes, gigabytes, terabytes) or
#SBATCH --mem-per-cpu=[mem][M|G|T] for memory per core.
Sometimes job output does not end up in the output file straight away due to buffering. Buffering is where the output is placed in a queue, awaiting the buffer to be flushed. This normally happens automatically, but if the output isn't large, or if the processor is busy doing other things, it can take some time for the buffer to be flushed. To disable buffering in a job and get the output immediately, run the command that produces the output with
stdbuf. For example, instead of
python myscript.py in a jobscript, replace it with
stdbuf -o0 -e0 python myscript.py
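As a quick local illustration, stdbuf (from GNU coreutils) is simply placed in front of the command to be run:

```shell
# -o0 disables stdout buffering, -e0 disables stderr buffering
stdbuf -o0 -e0 echo "unbuffered output"
```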
Monitoring job memory, CPU and GPU utilisation
As a result of the feedback obtained by the 2020 Spartan HPC user survey, a job monitoring system was developed.
This allows users to monitor the memory, CPU and GPU usage of their jobs via a simple command line script.
For more details, please see the Job Monitoring page.
On public partitions of Spartan (cascade, interactive, long) CPU, memory and GPU quotas have been implemented. This ensures no one user or project can use all the resources in these partitions. The limits are currently set at 17% of the resources in each partition.
If a job is not running due to "QOSMaxCpuPerUserLimit", it means that the project's running jobs exceed the current CPU quota for that partition. If a job is not running due to "QOSMaxMemPerUserLimit", it means that the project's running jobs exceed the current memory quota for that partition.
|Partition||Running jobs||CPU Quota (CPU cores) - per user||Memory Quota (MB RAM) - per user||GPUs - per user||CPU Quota (CPU cores) - per project||Memory Quota (MB RAM) - per project||GPUs - per project|
Spartan hosts a GPGPU service, based on NVIDIA A100 (Ampere) GPUs. More information can be found on our GPU page.
Scheduler Commands and Directives
Slurm User Commands
|User Command||Slurm Command|
Slurm Job Commands
|Job Specification||Slurm Command|
|Task (MPI rank) Count||--ntasks|
|Wall Clock Limit||--time|
|Memory (per node) Size||--mem|
|Memory (per CPU) Size||--mem-per-cpu|
Slurm Environment Variables
|Environment Command||Environment variable|
|Job Array Index||$SLURM_ARRAY_TASK_ID|