Prerequisite: You'll need a basic understanding of the Linux operating system and command-line environment to use Spartan. Linux is used on virtually all contemporary supercomputers as the most effective operating environment for performance. You don't need to be an expert, and there are many resources out there to help you. This tutorial is a good place to start. We also provide a great deal of content in our training courses.
Accounts and Projects
Prerequisite: Access to Spartan requires an account. Go to Karaage to request a Spartan account using your University of Melbourne login.
Accounts are associated with a particular project; all users must either join an existing project or create a new one.
New projects are subject to approval by the Head of Research Compute Services. Projects must demonstrate an approved research goal or goals, or demonstrate potential to support research activity, so please be explicit in your new project description. Projects require a Principal Investigator, who must be a University of Melbourne researcher, and may have additional Research Collaborators from anywhere around the world.
A project leader may invite people to join their project. To do so, they should log in to Karaage, go to their Karaage project list, select the appropriate project, and select the "Invite a new user" option. The user will then receive an invitation link to join the project and set up an account.
However, if the user belongs to an institution that does not have a SAML login process (e.g., international researchers), it is worthwhile contacting us. The sysadmins will then add the person to the project manually and reset their password.
Prerequisite: To log in to Spartan you will need a Secure Shell (SSH) client. This ensures that your connection and password to Spartan are safe.
Mac and Linux
Mac and Linux computers will already have one installed; just run
ssh yourUsername@spartan.hpc.unimelb.edu.au in your terminal.
Note that your password for Spartan is created during sign-up, and is different to your university password.
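If you connect often, an entry in your local ~/.ssh/config file can shorten the command. Here, spartan is an arbitrary alias of your choosing, and yourUsername is a placeholder for your Spartan username:

```
Host spartan
    HostName spartan.hpc.unimelb.edu.au
    User yourUsername
```

With this in place, ssh spartan is equivalent to the full command above.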
Windows
Download an SSH client such as PuTTY, set the hostname to spartan.hpc.unimelb.edu.au, and select Open. You'll be asked for your Spartan username and password.
Create a Sample Job
Spartan has some shared example code that we can borrow. We'll use the Python example, which searches a Twitter dataset. Please note that this is an example of a CPU job. If you are using GPUs, you will be better off using the TensorFlow example in /usr/local/common/TensorFlow/simple, following the README.md file in that directory.
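For reference, the directives a GPU job script typically adds look something like the following skeleton. This is a hypothetical fragment only: the partition name is an example and the current GPU partition and resource names should be checked against the Spartan documentation.

```
#!/bin/bash
# Hypothetical GPU job skeleton -- partition name is an example only
#SBATCH --partition=gpgpu
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-02:00:00
```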
Returning to the Python example, copy the example into your home directory, and change working directory:
$ cp -r /usr/local/common/Python ~/
$ cd ~/Python
The dataset is in minitwitter.csv, and the analysis code is in twitter_search_541635.py. The files ending in .slurm contain instructions for the scheduler. For example, 2019twitter_one_node_eight_cores.slurm requests 8 cores on a single node and a wall time of 12 hours, the maximum time the job will run for.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=0-12:00:00

# Load required modules
module purge
module load spartan_2019
module load foss/2019b
module load python/3.7.4

# Launch multiple process python code
echo "Searching for mentions"
time srun -n 8 python3 twitter_search_541635.py -i minitwitter.csv -m
echo "Searching for topics"
time srun -n 8 python3 twitter_search_541635.py -i minitwitter.csv -t
echo "Searching for the keyword 'jumping'"
time srun -n 8 python3 twitter_search_541635.py -i minitwitter.csv -s jumping
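The script above launches the same Python program as eight tasks via srun. As a rough illustration of how each task can claim its own share of the data, here is a minimal, hypothetical sketch; the real twitter_search_541635.py may divide work quite differently. SLURM_PROCID and SLURM_NTASKS are environment variables that srun sets for each task (the defaults below let the sketch run outside Slurm too):

```python
import os

def search_lines(lines, keyword, rank, ntasks):
    """Each task scans a strided share of the lines for the keyword."""
    return sum(1 for i, line in enumerate(lines)
               if i % ntasks == rank and keyword in line.lower())

if __name__ == "__main__":
    # srun sets these for each launched task; default to a single task
    rank = int(os.environ.get("SLURM_PROCID", 0))
    ntasks = int(os.environ.get("SLURM_NTASKS", 1))
    sample = ["just JUMPING around", "hello world", "keep jumping"]
    print(f"task {rank} of {ntasks} found",
          search_lines(sample, "jumping", rank, ntasks), "matches")
```

Each task handles every ntasks-th line, so together the eight tasks cover the whole file without overlap.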
Submit a Sample Job
First off, when you connect to Spartan, you're connecting to the login node, not an actual compute node. Please don't run jobs on the login node!
The login node is a shared resource. All users require access to it at all times to view or move their files, create job submission scripts, view the status of the queue or their jobs, and submit their jobs to the queue. If you run compute-intensive jobs on this node, you will reduce, or even prevent, the ability of other users to do these fundamental tasks. Spartan's sysadmins will kill your job, and if you continue to do so, may suspend your account.
Instead, use the scheduling tool Slurm and scripts like the one above. They tell Slurm where to run your job, how many cores you need, and how long it will take. Slurm will then allocate resources for your job, placing it in a queue if they're not yet available.
Go ahead and launch your job using
$ sbatch 2019twitter_one_node_eight_cores.slurm
Submitted batch job 18563731
Check Status and Review Output
Check how the job is progressing using
$ squeue -j 18563731
   JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
18563731  physical 2019twit   lev  R  0:02     1 spartan-bm113
When complete, an output file is created which logs the output from your job, for the above this has the filename