GPU
Spartan has 31 GPU nodes, each with four 80GB Nvidia A100 GPUs, 495,000MB of RAM, and 32 CPU cores. They are available to all University of Melbourne researchers with a Spartan account.
Access
Unlike the old LIEF GPGPU platform, you do not need to specify a QoS in your Slurm submission scripts. Remove any QoS directive before you submit, or set it to "normal".
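A sketch of a minimal submission script along these lines; the job name, CPU count, and walltime are placeholders, and the exact CUDA module name depends on what is installed on the cluster:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test        # placeholder job name
#SBATCH --partition=gpu-a100       # the A100 GPU partition
#SBATCH --gres=gpu:1               # request a single GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # illustrative CPU allocation
#SBATCH --time=01:00:00            # illustrative walltime

# Load a CUDA toolkit module (check `module avail` for the exact name)
module load CUDA

# Report the GPU allocated to this job
nvidia-smi
```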
This will request 1 GPU on the gpu-a100 partition.
Specialist partitions, such as feit-gpu-a100, still require the appropriate QoS.
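For a specialist partition, the QoS directive stays in your script. A sketch of the relevant lines; the QoS name here is illustrative, so use the one issued to your group:

```shell
#SBATCH --partition=feit-gpu-a100   # specialist partition
#SBATCH --qos=feit                  # hypothetical QoS name; substitute your group's QoS
```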
Maximum job length
We have two partitions: gpu-a100-short, which supports jobs of up to 1 GPU and 4 hours of walltime, and gpu-a100, which supports jobs of up to 7 days of walltime.
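A quick test job can target the short partition; a sketch of the relevant directives, with the walltime set to that partition's 4-hour cap:

```shell
#SBATCH --partition=gpu-a100-short   # for short jobs (max 1 GPU, 4 hours)
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00              # at or below the partition's walltime limit
```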
Comparative speeds
We have benchmarked the A100 nodes against the old P100 nodes: the A100 nodes are approximately 3 to 4 times as fast. The exact speedup will vary depending on how your application uses the GPUs.
Known issues
This section will be updated regularly as researchers report issues.
Example
The Nvidia A100 (Ampere series) has a restriction on which cuDNN versions it supports. From the cuDNN release notes