
Status and Specs

Spartan Daily Weather Report

2024-12-20

  • GPFS usage: 1982.60TB used, 365.11TB free (84% used)
  • Spartan utilisation on the cascade partition is 100.00% node allocation, 79.35% CPU allocation.
  • Spartan utilisation on the sapphire partition is 100.00% node allocation, 91.31% CPU allocation.
  • Total jobs queued/pending on the public partitions: 879
  • Spartan utilisation on the gpu-a100 partition is 100.00% node allocation. Total queued/pending: 10
  • Spartan utilisation on the gpu-h100 partition is 100.00% node allocation. Total queued/pending: 67
  • GPU card usage on the gpu-a100 partition is: 103 / 116 cards in use (88.79%)
  • GPU card usage on the gpu-h100 partition is: 37 / 40 cards in use (92.50%)
  • 1 node out.

Wait Time

How long will my job take to start?

[Interactive plot: average wait time, filtered by Partition, CPUs Requested, and Wall Time]


  • How to Interpret
    • This plot provides data on how long previous jobs have taken to start, which can be used as guidance on how long your job might take to start.
    • Note however that "Past performance is no guarantee of future results"; wait times can fluctuate quickly due to changes in usage or outages, and wait time could be considerably more or less than the historic average.
    • Daily averages are shown, but points may be missing for days where there were no jobs matching the selected characteristics.
    • This data is acquired using the sacct command in Slurm to get a list of all recent jobs and their start/end times (see the example query below).
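
For reference, a query along the following lines can pull this information; the options shown here are illustrative rather than the exact query used to build the plot.

    # List jobs completed in the last 7 days with their submit and start times
    sacct --allusers \
          --starttime=$(date -d '7 days ago' +%F) \
          --state=COMPLETED \
          --format=JobID,Partition,ReqCPUS,Timelimit,Submit,Start \
          --parsable2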

Specifications

Spartan has a number of partitions available for general usage. A full list of partitions can be viewed with the command sinfo -s.
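
For example:

    # One line per partition; NODES(A/I/O/T) = allocated/idle/other/total
    sinfo -s

    # More detail for a single partition: availability, time limit, node count, features
    sinfo -p cascade -o "%P %a %l %D %f"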

Partition | Nodes | Cores per node | Memory per node (MB) | Processor | Peak Performance (DP TFlops) | Slurm node types | Extra notes
interactive | 4 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 2.53 | physg5,avx512 | Max walltime of 2 days
long | 2 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 1.26 | physg5,avx512 | Max walltime of 90 days
bigmem | 8 | 72 | 2970000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 5.07 | physg5,avx512 | Max walltime of 14 days
cascade | 74 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 46.89 (0.63 per node) | physg5,avx512 | Max walltime of 30 days
sapphire | 78 | 128 | 977000 | Intel(R) Xeon(R) Gold 6448H | 143.75 (1.84 per node) | sr,avx512 | Max walltime of 30 days
gpu-a100 | 29 | 32 | 495000 | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | - | avx512 | 4 x 80GB A100 Nvidia GPUs per node. Max walltime of 7 days
gpu-a100-short | 2 | 32 | 495000 | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | - | avx512 | 4 x 80GB A100 Nvidia GPUs per node. Max walltime of 4 hrs
gpu-h100 | 10 | 64 | 950000 | Intel(R) Xeon(R) Platinum 8462Y+ @ 2.80GHz | - | avx512 | 4 x 80GB H100 SXM5 Nvidia GPUs per node. Max walltime of 7 days
Total | | | | | 199.5 (CPU) + 1358 (GPU) | |

sapphire and cascade

Each node is connected by high-speed 50Gb networking with 1.5 µsec latency, making these partitions well suited to multi-node jobs (e.g. those using OpenMPI).

You can constrain your jobs to use particular groups of nodes (e.g. just the Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz nodes) by adding #SBATCH --constraint=physg4 to your submit script.
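
A minimal multi-node submit script along these lines should work; the job name, resource numbers, and the executable are placeholders, and the constraint value should be taken from the Slurm node types column in the table above.

    #!/bin/bash
    #SBATCH --job-name=mpi-example       # placeholder job name
    #SBATCH --partition=cascade          # cascade or sapphire
    #SBATCH --nodes=2                    # multi-node job over the low-latency fabric
    #SBATCH --ntasks-per-node=72         # one task per core on a cascade node
    #SBATCH --time=1-00:00:00            # 1 day, well under the 30-day limit
    #SBATCH --constraint=physg5          # optional: pin to one node group/feature

    srun ./my_mpi_program                # hypothetical MPI executable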

gpu-a100

The gpu-a100 partition has modern, 80GB Nvidia A100 GPUs. See the GPU page for more details.

gpu-h100

The gpu-h100 partition has modern, 80GB Nvidia H100 SXM5 GPUs. See the GPU page for more details.
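
A sketch of a GPU submit script, assuming the standard Slurm --gres syntax; the resource numbers and program name are placeholders, and the partition can be swapped between gpu-a100, gpu-a100-short and gpu-h100 as appropriate.

    #!/bin/bash
    #SBATCH --partition=gpu-a100         # or gpu-h100 / gpu-a100-short
    #SBATCH --gres=gpu:1                 # request one GPU card
    #SBATCH --cpus-per-task=8            # CPU cores to drive the GPU (placeholder)
    #SBATCH --mem=64G                    # memory request (placeholder)
    #SBATCH --time=0-04:00:00            # within the 7-day (or 4-hour) limit

    nvidia-smi                           # confirm the allocated GPU is visible
    ./my_gpu_program                     # hypothetical GPU executable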

AVX-512

A number of nodes support the AVX-512 instruction set extensions. These include all of the nodes in the cascade partition, and the login nodes. To submit a job on the cascade partition that makes use of these instructions, add #SBATCH --constraint=avx512 to your submission script.
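
To check which nodes advertise the avx512 feature, or to confirm the extension on a node you are logged in to, standard sinfo and Linux facilities can be used, for example:

    # List node names and their Slurm features in the cascade partition
    sinfo -p cascade -o "%N %f"

    # Show the AVX-512 variants reported by the CPU on the current node
    grep -o 'avx512[a-z_]*' /proc/cpuinfo | sort -u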

Private partitions

Spartan hosts a number of private partitions, where the hardware is owned by individual research groups and faculties.

FEIT research staff and students: feit-gpu-a100

Computing and Information Systems research staff and students: deeplearn

Faculty of Science research staff and students: fos-gpu-l40s

Partition information for the private partitions can be found on the Specs for private partitions page.

Other Partitions

There are also special partitions that sit outside the normal walltime constraints. In particular, the interactive partition should be used for quick test cases; it has a maximum walltime of 2 days.

Storage system

Spartan uses IBM Spectrum Scale (previously known as GPFS). This is a highly scalable, parallel and robust filesystem.

The total Spartan storage is broken up into 2 areas:

Location | Capacity | Disk type
/data/gpfs | 2.34PB | 10K SAS
/data/scratch | 575TB | Flash

/home is on the University's NetApp NFS platform, backed by SSD.
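
Free space on these filesystems can be checked from a login node with standard tools, for example:

    df -h /data/gpfs /data/scratch /home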