Status and Specs

Spartan Daily Weather Report

2023-09-22

  • GPFS usage: 1820.33TB used, 527.38TB free (77% used)
  • Spartan utilisation on the cascade partition is 100.00% node allocation, 85.21% CPU allocation.
  • Total jobs queued/pending on the public partitions: 51
  • Spartan utilisation on the gpu-a100 partition is 100.00% node allocation. Total queued/pending: 24
  • GPU card usage on the gpu-a100 partition is: 86 / 116 cards in use (74.13%)
  • 26 nodes out of service.

Wait Time

How long will my job take to start?

[Interactive plot: historical job wait times, filterable by Partition, CPUs Requested, and Wall Time]

  • How to Interpret
    • This plot provides data on how long previous jobs have taken to start, which can be used as guidance on how long your job might take to start.
    • Note however that "Past performance is no guarantee of future results"; wait times can fluctuate quickly due to changes in usage or outages, and wait time could be considerably more or less than the historic average.
    • Daily averages are shown, but points may be missing for days where there were no jobs matching the selected characteristics.
    • This data is acquired using the sacct command in Slurm to get a list of all recent jobs and their start/end times.

Specifications

Spartan has a number of partitions available for general usage. A full list of partitions can be viewed with the command sinfo -s.
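For a closer look at a single partition, sinfo's output-format flags can be combined with the summary command mentioned above. The format specifiers are standard Slurm; the exact output depends on the cluster.

```shell
# List all partitions in summary form:
sinfo -s

# Show one partition with node count, CPUs, memory, and features
# (%P partition, %D nodes, %c CPUs per node, %m memory in MB, %f features):
sinfo -p cascade -o "%P %D %c %m %f"
```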

| Partition | Nodes | Cores per node | Memory per node (MB) | Processor | Peak Performance (DP TFlops) | Slurm node types | Extra notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| interactive | 3 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 15.2 | physg5,avx512 | Max walltime of 2 days |
| long | 2 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 10.1 | physg5,avx512 | Max walltime of 90 days |
| bigmem | 4 | 72 | 2970000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 10.1 | physg5,avx512 | Max walltime of 14 days |
| cascade | 14 | 72 | 1519000 | Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz | 58.1 | physg4,avx512 | Max walltime of 30 days |
| cascade | 79 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 354.8 | physg5,avx512 | Max walltime of 30 days |
| gpu-a100 | 29 | 32 | 495000 | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | | avx512 | 4 80GB Nvidia A100 GPUs per node. Max walltime of 7 days |
| gpu-a100-short | 2 | 32 | 495000 | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | | avx512 | 4 80GB Nvidia A100 GPUs per node. Max walltime of 4 hrs |
| Total | | | | | 582.2 (CPU) + 1358 (GPU) | | |

Total includes private partitions (including mig, msps2, and punim0396)

cascade

Each node is connected by high-speed 50Gb networking with 1.5 µs latency, making this partition suited to multi-node jobs (e.g. those using OpenMPI).

You can constrain your jobs to use different groups of nodes (e.g. just the Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz nodes) by adding #SBATCH --constraint=physg4 to your submit script.
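A constrained multi-node job might look like the following sketch. The #SBATCH directives are standard Slurm; the module and program names are placeholders, not site-specific values.

```shell
#!/bin/bash
#SBATCH --partition=cascade
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=72
#SBATCH --time=1:00:00
#SBATCH --constraint=physg4   # optional: pin to the Gold 6154 node group

module load openmpi           # placeholder module name
srun ./my_mpi_program         # placeholder executable
```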

gpu-a100

The gpu-a100 partition has modern, 80GB Nvidia A100 GPUs. See the GPU page for more details.
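A single-GPU job on this partition could be requested roughly as below; --gres is the standard Slurm way to request GPUs, though the GPU page should be checked for site specifics, and the CPU/time values here are arbitrary examples.

```shell
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --gres=gpu:1          # one of the four A100s on a node
#SBATCH --cpus-per-task=8
#SBATCH --time=1:00:00

nvidia-smi                    # confirm the allocated GPU is visible
```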

AVX-512

A number of nodes make use of the AVX-512 extended instructions. These include all of the nodes in the cascade partition, and the login nodes. To submit a job on the cascade partition that makes use of these instructions add #SBATCH --constraint=avx512 in your submission script.
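To actually benefit from AVX-512, code must also be compiled for it. The GCC flags below are standard compiler options, not taken from the source; the source file name is a placeholder.

```shell
# On an AVX-512 node (e.g. inside a job on the cascade partition),
# -march=native lets GCC use every instruction the node supports:
gcc -O3 -march=native -o myprog myprog.c

# Or request the AVX-512 foundation instructions explicitly:
gcc -O3 -mavx512f -o myprog myprog.c
```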

Private partitions

Spartan hosts a number of private partitions, where the hardware is owned by individual research groups.

FEIT research staff and students: information on the deeplearn and feit-gpu-a100 partitions can be found on the Specs for private partitions page.

Other Partitions

There are also special partitions that sit outside the normal walltime constraints. In particular, interactive should be used for quick test cases and has a maximum walltime of 2 days.

Storage system

Spartan uses IBM Spectrum Scale (previously known as GPFS). This is a highly scalable, parallel and robust filesystem.

The total Spartan storage is broken up into two areas:

| Location | Capacity | Disk type |
| --- | --- | --- |
| /data/gpfs | 2.34PB | 10K SAS |
| /data/scratch | 575TB | Flash |

/home is on the University's NetApp NFS platform, backed by SSD.