Status

Spartan Daily Weather Report

2021-04-22

  • GPFS usage: 1292.48TB used, 1055.22TB free (55% used)
  • Spartan is very busy on the physical partition, with 100% node allocation and 99% CPU allocation.
  • Spartan is very busy on the snowy partition, with 100% node allocation and 92% CPU allocation.
  • Total jobs pending/queued on the public partitions: 5243
  • Spartan is busy on the gpgpu partition, with 81% node allocation. Total jobs pending/queued: 4
  • GPU usage in the gpgpu partition: 171 / 256 cards in use (66.79%)
  • 1 node out (cryosparc testing)

Current Usage

How busy is Spartan today?

  • How to Interpret
    • This plot indicates how many CPUs (cores) are in use on Spartan.
    • Utilization may occasionally exceed 100% as resources are added/removed from service (utilization is always relative to the most recent available CPU count).
    • This data is acquired by using the sacct command in Slurm to get a list of all recent jobs and their start/end times, then counting how many cores are allocated relative to total capacity (see the sketch below).
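
As a rough illustration of the kind of query involved (the exact commands, time window, and field list used by the dashboard are assumptions), the following lists recent jobs with their allocated core counts, alongside the cluster-wide CPU totals that utilisation is measured against:

    # List recent jobs with their allocated cores and start/end times.
    sacct --allusers --allocations --parsable2 --noheader \
          --starttime=now-1days --endtime=now \
          --format=JobID,Partition,AllocCPUS,Start,End,State

    # Cluster-wide CPU counts, reported as allocated/idle/other/total;
    # the last number is the denominator for the utilisation figure.
    sinfo -h -o "%C"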

Wait Time

How long will my job take to start?

(Interactive plot, filterable by Partition, CPUs Requested, and Wall Time.)


  • How to Interpret
    • This plot provides data on how long previous jobs have taken to start, which can be used as guidance on how long your job might take to start.
    • Note however that "Past performance is no guarantee of future results"; wait times can fluctuate quickly due to changes in usage or outages, and wait time could be considerably more or less than the historic average.
    • Daily averages are shown, but points may be missing for days where there were no jobs matching the selected characteristics.
    • This data is acquired using the sacct command in Slurm to get a list of all recent jobs and their submit and start times (see the sketch below).
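
A rough sketch of that kind of query (standard sacct fields; the dashboard's exact filters and time window are assumptions). The queue wait of each job is its Start time minus its Submit time:

    # Pull recent jobs with their requested CPUs, submit and start times;
    # jobs that have not yet started report Start as "Unknown".
    sacct --allusers --allocations --parsable2 --noheader \
          --starttime=now-7days --endtime=now \
          --format=JobID,Partition,ReqCPUS,Timelimit,Submit,Start |
    while IFS='|' read -r jobid part cpus limit submit start; do
        [ "$start" = "Unknown" ] && continue
        # Wait time in seconds = start time minus submit time (GNU date).
        echo "$jobid waited $(( $(date -d "$start" +%s) - $(date -d "$submit" +%s) )) s"
    done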

Specifications

Spartan has a number of partitions available for general usage. A full list of partitions can be viewed with the command sinfo -s.
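
For example (the -o format string below is a standard sinfo option; the field selection is just illustrative):

    # One-line summary of every partition.
    sinfo -s

    # More detail for a single partition: name, availability, time limit,
    # node count, and node state.
    sinfo -p physical -o "%P %a %l %D %T"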

Partition | Nodes | Cores per node | Memory per node (MB) | Processor | Peak Performance (DP TFlops) | Slurm node types | Extra notes
interactive | 2 | 72 | 745000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 10.1 | physg5,avx512 | Max walltime of 2 days
long | 2 | 32 | 239000 | Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz | 2.35 | avx2 | Max walltime of 90 days
physical | 12 | 72 | 1519000 | Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz | 58.1 | physg4,avx512 |
physical | 41 | 72 | 745000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 208 | physg5,avx512 |
snowy | 29 | 32 | 239000 | Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz | 34.15 | avx2 |
gpgpu | 73 | 24 | 111000 | Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 61.5 (CPU) + 1358 (GPU) | avx2 | 4 Nvidia P100 GPUs per node
Total | | | | | 374 (CPU) + 1358 (GPU) | |

The Total row includes private partitions (such as mig, msps2, and punim0396).

Physical

Each node is connected by high-speed 50Gb networking with 1.5 µsec latency, making this partition suited to multi-node jobs (e.g. those using OpenMPI).

You can constrain your jobs to use a particular group of nodes (e.g. just the Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz nodes) by adding #SBATCH --constraint=physg4 to your submit script.
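
A minimal multi-node submit script along those lines might look like the following; only the partition and constraint lines come from this page, while the node/task counts, walltime, module name, and program are placeholders:

    #!/bin/bash
    #SBATCH --partition=physical
    #SBATCH --constraint=physg4       # restrict to the Gold 6154 (physg4) nodes
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=72      # physg4 nodes have 72 cores each
    #SBATCH --time=1:00:00

    # Placeholder: load whichever MPI module your build actually requires.
    module load OpenMPI

    srun ./my_mpi_program             # placeholder executable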

GPGPU

The GPGPU partition is funded through a LIEF grant. See the GPU page for more details.

AVX-512

A number of nodes support the AVX-512 extended instruction set, including all of the nodes in the physical partition and the login nodes. To submit a job on the physical partition that makes use of these instructions, add #SBATCH --constraint=avx512 to your submission script.
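
For instance (only the partition and constraint directives are shown; the rest of the script is up to you):

    #SBATCH --partition=physical
    #SBATCH --constraint=avx512       # only schedule on AVX-512 capable nodes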

Other Partitions

There are also special partitions that sit outside the normal walltime constraints. In particular, interactive and shortgpgpu should be used for quick test cases: interactive has a maximum walltime of 2 days, and shortgpgpu has a maximum walltime of one hour.
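
For example, a quick test could be run interactively or batched to shortgpgpu; the GPU request below assumes shortgpgpu uses the standard --gres syntax, and the walltimes are placeholders within the stated limits:

    # Interactive test session (must fit within the 2-day interactive limit).
    salloc --partition=interactive --time=1:00:00

    # Batch script header for a short GPU test (one-hour limit applies).
    #SBATCH --partition=shortgpgpu
    #SBATCH --gres=gpu:1              # assumed gres syntax for one GPU
    #SBATCH --time=0:15:00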

Deeplearn

The deeplearn partition was purchased for use by Computing and Information Systems research staff and students.

Node name | Nodes | Cores/node | Memory/node (MB) | Processor | GPU type | GPU memory | Slurm node type
spartan-gpgpu[072-075] | 4 | 28 | 234000 | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | v100 | 16GB per GPU | dlg1
spartan-gpgpu[078-082] | 5 | 28 | 174000 | Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz | v100 | 16GB per GPU | dlg2
spartan-gpgpu[086-088] | 3 | 24 | 175000 | Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz | v100sxm2 | 32GB per GPU | dlg3

To restrict your job to a specific node type, add a constraint to your submission script; e.g. to target the V100SXM2 nodes, add #SBATCH --constraint=dlg3.
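
A sketch submit script pinned to those nodes; the GPU count, walltime, and command are placeholders, and any account/QoS requirements for the partition are not shown:

    #!/bin/bash
    #SBATCH --partition=deeplearn
    #SBATCH --constraint=dlg3         # V100SXM2 (32GB) nodes only
    #SBATCH --gres=gpu:1              # request a single GPU
    #SBATCH --time=4:00:00

    srun python train.py              # placeholder command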

Storage system

Spartan uses IBM Spectrum Scale (previously known as GPFS), a highly scalable, parallel, and robust filesystem.

The total Spartan storage on Spectrum Scale is broken up into two areas:

Location | Capacity | Disk type
/data/gpfs | 2.1PB | 10K SAS
/data/scratch | 525TB | Flash

/home is on the University's NetApp NFS platform, backed by SSD.