What's special about Spartan?

Most modern HPC systems are built around a cluster of commodity computers tied together with very fast networking. This allows computation to run across multiple cores in parallel, with the cores quickly sharing data among themselves as needed.

For certain jobs, this architecture is essential to achieving high performance. For others, however, this is not the case, and each node can run without communicating with the others in the cluster. This class of problems is often described as embarrassingly parallel. That is, they can be run as independent parallel tasks by splitting up the data or calculation into discrete chunks. In this case, high-speed networking is unnecessary, and the resources are better spent on more cores to achieve high performance.

Spartan combines both approaches. By default, jobs run in the flexible cloud partition. For those that need it, there's a physical partition running on bare-metal hardware, interconnected with low-latency 25/50/100 Gb Mellanox ethernet.

How do I get an account?

Access to Spartan requires an account, which you can request using Karaage.

Accounts are associated with a particular project; you can either join an existing project or create a new one.

New projects are subject to approval by the Head of Research Compute Services. Projects must demonstrate an approved research goal or goals, or demonstrate potential to support research activity. Projects require a Principal Investigator and may have additional Research Collaborators.

How do I access Spartan once I have an account?

You'll need an SSH client. Mac and Linux computers already have one installed; just use the command ssh at your terminal.

For Windows, you'll need to download an SSH client such as PuTTY, set the hostname to Spartan's login address, and select Open. You'll be asked for your Spartan username and password.

My password isn't working!

  1. Make sure you're using your Spartan password that you set in Karaage. Your Spartan password is not necessarily the same as your central university password.

  2. You can request a password reset here.

  3. If you are still having trouble, contact the University of Melbourne Service Desk on +61 3 8344 0999 or ext 40999 or email or

How do I add people to a project?

If you are a project leader, you may invite people to join your project. Log in to Karaage, go to your Karaage project list, select the appropriate project, and select the "Invite a new user" option. The user will then receive an invitation link to join the project and set up an account.

However, if they belong to an institution that does not have a SAML login process (e.g., international researchers), it is worth contacting HPC support. The sysadmins can then add the person to the project manually and reset their password.

What are Spartan's specifications?

Please see Hardware and Partitions for more details.

What software is installed?

Spartan uses a modules system (lmod) to load and unload different packages, including different versions of the same software. You can check what's currently installed using the module avail command, and load a module with the module load command.

Typically you don't load modules directly unless you're in an interactive session on a compute node (launched with sinteractive). Instead, you load the modules in your Slurm script before executing your particular software.
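As a rough sketch of what loading a module inside a Slurm script might look like: the MATLAB module name is borrowed from the GUI example elsewhere in this FAQ, the time limit is an arbitrary placeholder, and the guard is only there so the script fails gracefully when run somewhere without the modules system.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00

# Module name is an assumption; run `module avail` to see what is installed.
MODULE_NAME="MATLAB"

if command -v module >/dev/null 2>&1; then
    module load "$MODULE_NAME"   # load the software before running it
    module list                  # record what was loaded in the job log
else
    echo "module command not found; this script expects to run on Spartan" >&2
fi
```

Listing the loaded modules in the job log makes it easier to tell later which software version a result came from.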

What if the software I need is not installed?

Get in contact with us and we can install it for you. Generally speaking, you should avoid compiling software on Spartan, unless you wrote it from scratch for your own use. By letting us handle it, we can make sure that:

  • It works
  • Software licenses are managed
  • Code is compiled with the appropriate flags to maximize performance
  • Other users can also make use of the software.

Where do I go for help?

First off, we encourage researchers that are new to HPC to undertake training with us. It's free! And we can tailor a specific training program for you, for instance around a specific software package, if there is the demand. Check here for a calendar of when the next event is planned, along with the other training programs offered in coordination with ResBaz. Sign up to be notified of our next training events at:

Second, check the documentation here, as well as for the software you're running on Spartan (like Slurm).

Finally, if you ever get stuck, please feel free to contact HPC support. We're here to help make your research more productive and enjoyable, and we'll do everything we can to help.

How do I get access to GPUs?

Spartan includes a partition with GPUs (as well as a private physics-gpu partition and a private deeplearn partition).

The gpgpu partition includes four Nvidia P100 GPUs per node.

They can be specified in your job script with #SBATCH --partition gpgpu.

You'll also need to include a generic resource request in your job script, for example #SBATCH --gres=gpu:2 will request two GPUs for your job.
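Putting the two directives together, a minimal GPU job script might look like the sketch below. The time limit is a placeholder, and your project may need further settings to be granted access to the partition; the script simply reports which GPUs Slurm has made visible to the job.

```shell
#!/bin/bash
#SBATCH --partition gpgpu
#SBATCH --gres=gpu:2
#SBATCH --time=0-00:05:00

# Slurm normally sets CUDA_VISIBLE_DEVICES for jobs that request GPUs;
# fall back to "none" so the script still runs outside a GPU job.
GPU_LIST="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${GPU_LIST}"

# Show GPU details only where the NVIDIA tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
fi
```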

A range of GPU-accelerated software such as TensorFlow is available on Spartan, as well as CUDA for developing your own GPU applications.

N.B. The GPGPU partition is not automatically available to all Spartan users, and a dedicated project must be created to request access. See here for more details.

How do I submit a job?

You'll need your data files and scripts, the software you want to run installed on Spartan, and a job script so that Spartan knows how to put everything together. Check out Getting Started for an example.
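For orientation, here is a minimal sketch of a job script. The job name, time limit and the file name hello.slurm are all illustrative assumptions, not Spartan requirements.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=0-00:05:00

# Report which machine the job ran on.
MESSAGE="Hello from $(uname -n)"
echo "$MESSAGE"
```

If this were saved as hello.slurm, it could be submitted with sbatch hello.slurm and monitored with squeue -u $USER. By default Slurm writes the job's output to a file named slurm-<jobid>.out in the directory you submitted from.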

Do I need to know how to use Linux?

Just the basics to get started. We cover this in our introductory training course, and there are many online resources available to get you started, such as this tutorial.

How do I create a multi-core job?

There are two options here. If you want to run a single instance of your program and have that program access 8 cores, you can do this:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

This is the typical approach if your program makes use of multi-threading or subprocesses to make use of the additional cores.
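A sketch of how a threaded program might pick up the core count from Slurm rather than hard-coding it. The program name my_threaded_program is hypothetical, and OMP_NUM_THREADS only matters for software that uses OpenMP.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
# default to 1 so the script also runs outside a job.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$THREADS"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# ./my_threaded_program   # hypothetical: replace with your own program
```

Reading the value from the environment keeps the thread count and the resource request in step if you later change --cpus-per-task.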

Alternatively, if you'd like to run multiple instances (tasks) of your program, each on their own core:

#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1

This approach might be used for jobs where there are multiple instances of a program running at once, but they communicate with each other (e.g. using OpenMPI) and so should be kept within a single node so that communication between tasks is quick.

Keep in mind the number of cores that actually exist within a node: 12 for the cloud partition and up to 72 for the physical partition. You can't request more than this.

How do I create a multi-node job?

Here's an example of a job with two nodes, each using 12 cores.

#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=12

Note that you can't have a single instance of a program running across different nodes. Instead, you would usually run two instances of a program (one on each node), and have them pass messages between each other so they can work in parallel using a framework like OpenMPI.

For multi-node jobs, it is usually preferable to use the physical partition, because this partition has faster networking between nodes.
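The following sketch shows one way to check how a multi-node request was actually laid out. It uses srun to start one copy of hostname per task; in a real job you would launch your MPI program there instead. The choice of the physical partition follows the advice above, and everything else is a placeholder.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=12
#SBATCH --partition physical

# These variables are set by Slurm inside a job; the defaults let the
# script run (as a single local task) outside the scheduler.
NODE_COUNT="${SLURM_JOB_NUM_NODES:-1}"
TASK_COUNT="${SLURM_NTASKS:-1}"
echo "Job spans ${NODE_COUNT} node(s) with ${TASK_COUNT} task(s)"

# Only call srun from inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun hostname   # prints one hostname per task
fi
```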

What other options are there for running my job?

Many different permutations of cores, memory, nodes, tasks and dependencies are possible to suit different use cases. Refer to the documentation for Slurm (the job manager we use on Spartan) for details.

How do I create a job array?

Job arrays are great for kicking off a large number of independent jobs at once, for instance when you're batch processing a series of files and the processing for each file can be performed independently of any other.

Say we have an array of files, data_1.dat to data_50.dat to process with myProgram:

#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --array=1-50

myProgram data_${SLURM_ARRAY_TASK_ID}.dat

This will create 50 jobs, each calling myProgram with a different data file. These jobs will run in any order, as soon as resources are available (potentially all at the same time!).
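A slightly more defensive variant of the array script is sketched below. It checks that the input file exists and writes each task's results to its own file; the result_N.out naming is an assumption made for illustration.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:15:00
#SBATCH --array=1-50

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task;
# default to 1 so the script can be tried outside the scheduler.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
INPUT="data_${TASK_ID}.dat"
OUTPUT="result_${TASK_ID}.out"   # hypothetical output naming

if [ -f "$INPUT" ]; then
    myProgram "$INPUT" > "$OUTPUT"
else
    echo "Input file ${INPUT} not found; skipping task ${TASK_ID}" >&2
fi
```

Slurm also accepts a throttle on the array range, e.g. --array=1-50%10, which limits the array to 10 tasks running at any one time.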

How do I request more memory?

By default the scheduler will set memory equal to the total amount on a compute node divided by the number of cores requested. In some cases this might not be enough (e.g., a very large dataset that needs to be loaded by a job with a low level of parallelisation).

Additional memory can be allocated with the --mem=[mem][M|G|T] directive (entire job) or --mem-per-cpu=[mem][M|G|T] (per core allocated to the job).
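To make the difference between the two directives concrete, here is a sketch that requests memory per core and works out the job's total. The figures (4 cores, 8 GB each) are arbitrary, and it assumes Slurm reports SLURM_MEM_PER_CPU in megabytes.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=8G

# Inside a job Slurm exports these values; the defaults mirror the
# directives above so the arithmetic can be checked outside a job.
CPUS="${SLURM_CPUS_PER_TASK:-4}"
MEM_PER_CPU_MB="${SLURM_MEM_PER_CPU:-8192}"

# Total memory for the job = cores x memory per core.
TOTAL_MB=$(( CPUS * MEM_PER_CPU_MB ))
echo "Total memory available to this job: ${TOTAL_MB} MB"
```

With --mem-per-cpu the total grows as you add cores, whereas --mem fixes the total for the job regardless of the core count.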

Are there more examples I can look at?

If you go to /usr/local/common/ on Spartan there are examples for a wide range of programs. You can copy these into your home directory and run them for yourself.

How do I make my program run fast on Spartan?

Spartan, like almost all modern HPC systems, delivers high-performance by combining lots of smaller computers (nodes) together in a cluster. Each core within a node probably isn't much faster than on your own personal computer, so improved performance is dependent on using parallel processing (MPI or OpenMP) or job arrays.

How do I cite Spartan in my publications?

If you use Spartan to obtain results, we'd very much appreciate it if you'd cite our service, including the DOI below. This makes it easy for us to demonstrate research impact, helping to secure ongoing funding for expansion and user support.

Lev Lafayette, Greg Sauter, Linh Vu, Bernard Meade, "Spartan Performance and Flexibility: An HPC-Cloud Chimera", OpenStack Summit, Barcelona, October 27, 2016.

How do I set up passwordless SSH login?

A passwordless SSH for Spartan will make your life easier. You won't even need to remember your password!

If you have a *nix system (e.g., UNIX, Linux, MacOS X) open up a terminal on your local system and generate a keypair.

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa): 
Created directory '/home/user/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/
The key fingerprint is:
43:51:43:a1:b5:fc:8b:b7:0a:3a:a9:b1:0f:66:73:a8 user@localhost

Now append the new public key to ~/.ssh/authorized_keys on Spartan (you'll be asked for your password one last time).

$ cat .ssh/ | ssh 'cat >> .ssh/authorized_keys'

Depending on your version of SSH you might also have to do the following changes:

  • Put the public key in .ssh/authorized_keys2
  • Change the permissions of .ssh to 700
  • Change the permissions of .ssh/authorized_keys2 to 640

You can now SSH to Spartan without having to enter your password!

How do I set up passwordless SSH login from a MS-Windows system?

Painfully. :)

1) Download additional software called PuTTYgen

2) Launch the PuTTYgen tool. If you are on Windows 7 or higher, right-click on it and select Run as Administrator.

3) Select the parameters; the default value (SSH-2 RSA) is fine.

4) Select Generate

5) Add the public key to your authorized_keys file in ~/.ssh on Spartan (create it if it doesn't exist). Make sure you don't have any weird line breaks or anything like that. Make sure the permissions on the file are 0644.

chmod 644 ~/.ssh/authorized_keys

6) Back on PuTTYgen save the Private Key and Public Key. Make sure you save Public Key as .txt while Private Key as .ppk.

7) Configure PuTTY to use the newly generated key. Start PuTTY, go to Connection > SSH > Auth, and add the location of the Private Key that you saved previously.

8) Open PuTTY and log in as usual. If all the steps above have been followed, you will not need a password.

Screenshots and a Youtube video on how to do this can be found on

How can I avoid typing the full hostname every time I connect?

An SSH config file will also make your life easier. It allows you to create aliases (i.e. shortcuts) for a given hostname.

Create the text file in your ~/.ssh directory with your preferred text editor, for example, nano.

nano .ssh/config

Enter the following (replacing username with your actual username of course!):

Host *
ServerAliveInterval 120
Host spartan
       User username

Now to connect to Spartan, you need only type ssh spartan.

Can I run interactive GUI applications on Spartan?

Yes. HPC systems are optimised for command line batch processing, but some workflows and software packages benefit from access to a graphical user interface (GUI). One option is to use X forwarding, which allows your application to display directly on your own machine.

Here's an example:

  1. Install X Windows on your machine. This is built-in for Linux, available via XQuartz for OS X, and MobaXterm for Windows.
  2. SSH into Spartan using the -X flag, e.g. ssh -X
  3. Open an interactive session on a compute node using the --x11 flag, i.e. sinteractive --x11. You can set the partition, wall time and core count as usual for the sinteractive command.
  4. Start your GUI application within the interactive session, which will then be forwarded to your local machine. For example, to start MATLAB:
$ module load MATLAB
$ matlab
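If the application fails to open a window, a quick check of the DISPLAY environment variable (which SSH sets when X forwarding is active) can narrow down the problem. This is a small diagnostic sketch, not an official Spartan tool.

```shell
#!/bin/bash
# DISPLAY is set by ssh when X forwarding has been negotiated.
if [ -n "${DISPLAY:-}" ]; then
    X_STATUS="X forwarding looks active (DISPLAY=${DISPLAY})"
else
    X_STATUS="DISPLAY is not set; reconnect with ssh -X and use sinteractive --x11"
fi
echo "$X_STATUS"
```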

What does "Killed" mean when seen in the output of a job on Spartan?

Sometimes, the word "Killed" can be seen in the job output log, for example

/var/spool/slurm/job8953391/slurm_script: line 7: 12235 Killed                  python /home/scrosby/examples/mem_use/

This is normally due to your job exceeding the memory you requested. By default, jobs on Spartan are allocated a certain amount of RAM per CPU requested.

Partition   Default memory allocated per CPU core (GB RAM)
cloud       6
physical    6
gpgpu       4
snowy       3.5

You can increase this by following the steps in "How do I request more memory?".
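To confirm that memory was the cause, you can compare what a finished job requested with what it actually used. The sketch below uses Slurm's accounting command sacct; the default job ID is taken from the log line above and should be replaced with your own, and it assumes job accounting is enabled on the cluster.

```shell
#!/bin/bash
# Job ID to inspect: first argument, or the ID from the example log above.
JOB_ID="${1:-8953391}"

# ReqMem is the memory requested; MaxRSS is the peak memory actually used.
FIELDS="JobID,JobName,ReqMem,MaxRSS,State"

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOB_ID" --format="$FIELDS"
else
    echo "sacct not found; run this on a Spartan login node" >&2
fi
```

If MaxRSS is close to or above ReqMem, the job most likely ran out of memory and needs a larger request.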

What do "MaxCpuPerAccount" and "MaxMemoryPerAccount" mean, and why is my job not running?

To ensure fair use of the public partitions on Spartan (cloud, physical, snowy), we have implemented CPU and memory quotas. This ensures no one project can use all the resources in these partitions. The limits are currently set at 15% of the resources in each partition.

If your job is not running due to "MaxCpuPerAccount", it means that your project's running jobs exceed the current CPU quota for that partition.
If your job is not running due to "MaxMemoryPerAccount", it means that your project's running jobs exceed the current memory quota for that partition.

Partition   CPU Quota (CPU cores)   Memory Quota (GB RAM)
cloud       300                     2500
physical    200                     4620
snowy       200                     794
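One way to see which of these reasons applies to your waiting jobs is to ask squeue to print the reason column. This is a sketch using standard squeue format codes (%i job ID, %P partition, %T state, %r reason).

```shell
#!/bin/bash
# Fall back to `id -un` in case USER is not set in the environment.
WHOAMI="${USER:-$(id -un)}"

# Columns: job ID, partition, state, and the reason the job is waiting.
FORMAT="%i %P %T %r"

if command -v squeue >/dev/null 2>&1; then
    squeue -u "$WHOAMI" -t PENDING -o "$FORMAT"
else
    echo "squeue not found; run this on a Spartan login node" >&2
fi
```

A reason of MaxCpuPerAccount or MaxMemoryPerAccount in the last column means the job is waiting on the project quota rather than on free nodes.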

What do "QOSGrpGRES" and "QOSGrpCpuLimit" mean, and why is my GPGPU job not running?

To ensure fair use of the GPGPU partition on Spartan, quotas are implemented to ensure that no one participant in the GPGPU project (LaTrobe, Deakin, St Vincents, UoM, UoM Engineering, UoM MDHS) can use all of the available GPUs.

If your GPGPU sponsor is using more than their quota of GPUs, your job will be held with the message "QOSGrpGRES". Similarly, if your GPGPU sponsor is using more than their quota of CPUs, your job will be held with the message "QOSGrpCpuLimit". Your job will run once enough currently running jobs end.

Why is my job taking a long time to run?

Spartan is a very busy system, often with 100% worker node allocation. The batch system, Slurm, runs jobs in order of priority, with the priority of jobs being determined by:

  • Job size: a higher priority is given to larger jobs
  • Wait time: the priority of your job increases the longer your job is in the queue
  • Fairshare: to ensure everyone has the same access to resources, a fairshare system is in place. This takes into account the resources used by a user's jobs in the last 14 days. The more resources used by your jobs in the last 14 days, the lower the priority of your new jobs
  • Backfilling: where there is a gap in the allocated resources that a waiting job is small enough to fit into, the scheduler will use that job to fill the gap to ensure maximum resource allocation.

On Spartan, the calculated priority is dominated by the fairshare component, so the most common reason for your job taking a long time to start is the amount of resources you consumed in the last 14 days.

You can see your job priority, and what makes up the priority, by using the sprio command.

# sprio -j 12409951
       12409951 physical        4240       3000       1233          6          1          0

Why is my job not outputting anything to the output file?

Sometimes a job output does not end up in the output file due to buffering. Buffering is where the output is put into a queue, awaiting the buffer to be flushed. This normally happens automatically, but if the output isn't large, or if the processor is busy doing other things, it can take some time for the buffer to be flushed.

To bypass buffering and get the output immediately, prefix the command that produces the output in your script with stdbuf.

For example, if python is how you launch your script, replace it with stdbuf -o0 -e0 python.
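In a job script, that might be wrapped up as in the sketch below. The helper function run_unbuffered and the script name my_script.py are hypothetical, and stdbuf is assumed to be available (it ships with GNU coreutils).

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00

# Hypothetical helper: run any command with stdout and stderr unbuffered.
run_unbuffered() {
    stdbuf -o0 -e0 "$@"
}

# Demonstration with a harmless command.
run_unbuffered echo "unbuffered output works"

# In a real job you would launch your own program instead, e.g.:
# run_unbuffered python my_script.py   # my_script.py is hypothetical
```

Unbuffered output is written in many small pieces, which can slow down programs that print a lot, so it is best reserved for debugging or for jobs whose progress you need to watch.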