Prerequisite: You'll need a basic understanding of the Linux operating system and command-line environment to use Spartan. Linux is used on virtually all contemporary supercomputers as the most effective operating environment for performance. You don't need to be an expert, and there are many resources out there to help you. This tutorial is a good place to start. We also provide a great deal of content in our training courses.
Accounts and Projects
Prerequisite: Access to Spartan requires an account. Go to Karaage to request a Spartan account using your University of Melbourne login.
Accounts are associated with a particular project; all users must either join an existing project or create a new one.
New projects are subject to approval by the Head of Research Compute Services. Projects must demonstrate an approved research goal or goals, or demonstrate potential to support research activity, so please be explicit in your new project description. Projects require a Principal Investigator, who must be a University of Melbourne researcher, and may have additional Research Collaborators from anywhere around the world.
A project leader may invite people to join their project. To do so, they should log in to Karaage, go to their Karaage project list, select the appropriate project, and select the "Invite a new user" option. The user will then receive an invitation link to join the project and set up an account.
However, if the user belongs to an institution that does not have a SAML login process (e.g., international researchers), it is worthwhile contacting us. The sysadmins will then add the person to the project manually and reset their password.
Prerequisite: To log in to Spartan you will need a Secure Shell (SSH) client. This ensures that your connection and password to Spartan are safe.
Mac and Linux
Mac and Linux computers will already have one installed; just run
ssh yourUsername@spartan.hpc.unimelb.edu.au in your terminal.
Note that your password for Spartan is created during sign-up, and is different to your university password.
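If you connect often, an entry in your local ~/.ssh/config file can shorten the command. Here, spartan is an arbitrary alias of your choosing, and yourUsername is a placeholder for your Spartan username:

```
Host spartan
    HostName spartan.hpc.unimelb.edu.au
    User yourUsername
```

With this in place, ssh spartan is equivalent to the full command above.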
Windows
Download an SSH client such as PuTTY, set the hostname to spartan.hpc.unimelb.edu.au, and select Open. You'll be asked for your Spartan username and password.
Create a Sample Job
Spartan has some shared example code that we can borrow. We'll use the Python example, which searches a Twitter dataset. Please note that this is an example of a CPU job. If you are using GPUs, you will be better off using the TensorFlow example in /usr/local/common/TensorFlow/simple, following the README.md file in that directory.
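For reference, the directives a GPU job script typically adds look something like the following skeleton. This is a hypothetical fragment only: the partition name is an example and the current GPU partition and resource names should be checked against the Spartan documentation.

```
#!/bin/bash
# Hypothetical GPU job skeleton -- partition name is an example only
#SBATCH --partition=gpgpu
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-02:00:00
```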
Returning to the Python example, copy the example into your home directory, and change working directory:
$ cp -r /usr/local/common/Python ~/
$ cd ~/Python
The dataset is in minitwitter.csv, and the analysis code is in twitter_search_541635.py. The files ending in .slurm contain instructions for the scheduler. For example, 2019twitter_one_node_eight_cores.slurm requests 8 cores on a single node and a wall time of 12 hours, the maximum time the job will run for.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=0-12:00:00

# Load required modules
module purge
module load spartan_2019
module load foss/2019b
module load python/3.7.4

# Launch multiple process python code
echo "Searching for mentions"
time srun -n 8 python3 twitter_search_541635.py -i minitwitter.csv -m
echo "Searching for topics"
time srun -n 8 python3 twitter_search_541635.py -i minitwitter.csv -t
echo "Searching for the keyword 'jumping'"
time srun -n 8 python3 twitter_search_541635.py -i minitwitter.csv -s jumping
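The script above launches the same Python program as eight tasks via srun. As a rough illustration of how each task can claim its own share of the data, here is a minimal, hypothetical sketch; the real twitter_search_541635.py may divide work quite differently. SLURM_PROCID and SLURM_NTASKS are environment variables that srun sets for each task (the defaults below let the sketch run outside Slurm too):

```python
import os

def search_lines(lines, keyword, rank, ntasks):
    """Each task scans a strided share of the lines for the keyword."""
    return sum(1 for i, line in enumerate(lines)
               if i % ntasks == rank and keyword in line.lower())

if __name__ == "__main__":
    # srun sets these for each launched task; default to a single task
    rank = int(os.environ.get("SLURM_PROCID", 0))
    ntasks = int(os.environ.get("SLURM_NTASKS", 1))
    sample = ["just JUMPING around", "hello world", "keep jumping"]
    print(f"task {rank} of {ntasks} found",
          search_lines(sample, "jumping", rank, ntasks), "matches")
```

Each task handles every ntasks-th line, so together the eight tasks cover the whole file without overlap.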
Submit a Sample Job
First off, when you connect to Spartan, you're connecting to the login node, not an actual compute node. Please don't run jobs on the login node!
The login node is a shared resource. All users require access to it at all times to view or move their files, create job submission scripts, view the status of the queue or their jobs, and submit their jobs to the queue. If you run compute-intensive jobs on this node, you will reduce, or even prevent, the ability of other users to do these fundamental tasks. Spartan's sysadmins will kill your job, and if you continue to do so, may suspend your account.
Instead, use the scheduling tool Slurm and scripts like the one above. They tell Slurm where to run your job, how many cores you need, and how long it will take. Slurm will then allocate resources for your job, placing it in a queue if they're not yet available.
Go ahead and launch your job using
$ sbatch 2019twitter_one_node_eight_cores.slurm
Submitted batch job 18563731
Check Status and Review Output
Check how the job is progressing using
$ squeue -j 18563731
   JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
18563731  physical 2019twit   lev  R  0:02     1 spartan-bm113
When complete, an output file is created which logs the output from your job, for the above this has the filename