Status and Specs
Spartan Daily Weather Report
2024-11-21
- GPFS usage: 1961.83TB used, 385.87TB free (83% used)
- Spartan utilisation on the cascade partition is 98.64% node allocation, 42.11% CPU allocation.
- Spartan utilisation on the sapphire partition is 62.82% node allocation, 29.04% CPU allocation.
- Total jobs queued/pending on the public partitions: 2035
- Spartan utilisation on the gpu-a100 partition is 100.00% node allocation. Total queued/pending: 171
- Spartan utilisation on the gpu-h100 partition is 100.00% node allocation. Total queued/pending: 89
- GPU card usage on the gpu-a100 partition is: 112 / 116 cards in use (96.55%)
- GPU card usage on the gpu-h100 partition is: 40 / 40 cards in use (100.00%)
- 0 nodes out.
Wait Time
How long will my job take to start?
[Plot: daily average wait times for recently started jobs]
How to Interpret
- This plot provides data on how long previous jobs have taken to start, which can be used as guidance on how long your job might take to start.
- Note however that "Past performance is no guarantee of future results"; wait times can fluctuate quickly due to changes in usage or outages, and wait time could be considerably more or less than the historic average.
- Daily averages are shown, but points may be missing for days where there were no jobs matching the selected characteristics.
- This data is acquired using the `sacct` command in Slurm to get a list of all recent jobs and their start/end times.
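As a rough sketch of this kind of query (the exact fields and time window used for the plot are assumptions), standard `sacct` options can list recent jobs with their submit and start times; the wait time is the difference between the two:

```bash
# List all jobs from the past 7 days with submit/start times;
# each job's wait time is Start minus Submit.
sacct --allusers \
      --starttime=$(date -d '7 days ago' +%F) \
      --format=JobID,Partition,Submit,Start,State \
      --parsable2
```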
Specifications
Spartan has a number of partitions available for general usage. A full list of partitions can be viewed with the command `sinfo -s`.
Partition | Nodes | Cores per node | Memory per node (MB) | Processor | Peak Performance (DP TFlops) | Slurm node types | Extra notes |
---|---|---|---|---|---|---|---|
interactive | 4 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 2.53 | physg5,avx512 | Max walltime of 2 days |
long | 2 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 1.26 | physg5,avx512 | Max walltime of 90 days |
bigmem | 8 | 72 | 2970000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 5.07 | physg5,avx512 | Max walltime of 14 days |
cascade | 74 | 72 | 710000 | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz | 46.89 (0.63 per node) | physg5,avx512 | Max walltime of 30 days |
sapphire | 78 | 128 | 977000 | Intel(R) Xeon(R) Gold 6448H | 143.75 (1.84 per node) | sr,avx512 | Max walltime of 30 days |
gpu-a100 | 29 | 32 | 495000 | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | | avx512 | 4 80GB A100 Nvidia GPUs per node. Max walltime of 7 days |
gpu-a100-short | 2 | 32 | 495000 | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz | | avx512 | 4 80GB A100 Nvidia GPUs per node. Max walltime of 4 hrs |
gpu-h100 | 10 | 64 | 950000 | Intel(R) Xeon(R) Platinum 8462Y+ @ 2.80GHz | | avx512 | 4 80GB H100 SXM5 Nvidia GPUs per node. Max walltime of 7 days |
Total | | | | | 199.5 (CPU) + 1358 (GPU) | | |
sapphire and cascade
Each node is connected by high-speed 50Gb networking with 1.5 µsec latency, making these partitions well suited to multi-node jobs (e.g. those using OpenMPI).
You can constrain your jobs to use different groups of nodes (e.g. just the Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz nodes) by adding `#SBATCH --constraint=physg4` to your submit script, as in the sketch below.
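A minimal submit script sketch, assuming a hypothetical MPI program `my_mpi_program` and illustrative resource requests (only the `--constraint` line is taken from this page):

```bash
#!/bin/bash
#SBATCH --job-name=mpi-example
#SBATCH --constraint=physg4        # restrict the job to the physg4 node group
#SBATCH --nodes=2                  # illustrative: a two-node MPI job
#SBATCH --ntasks-per-node=4        # illustrative rank count
#SBATCH --time=01:00:00

# Launch the (placeholder) MPI program across the allocated nodes.
srun ./my_mpi_program
```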
gpu-a100
The gpu-a100 partition has modern, 80GB Nvidia A100 GPUs. See the GPU page for more details.
gpu-h100
The gpu-h100 partition has modern, 80GB Nvidia H100 SXM5 GPUs. See the GPU page for more details.
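As an illustrative sketch using generic Slurm GPU syntax (the resource numbers are assumptions; see the GPU page for Spartan's recommended form), a single-GPU job on gpu-a100 might look like:

```bash
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --gres=gpu:1               # request one A100 card
#SBATCH --cpus-per-task=8          # illustrative CPU request
#SBATCH --time=04:00:00

# Placeholder workload: report the GPU visible to the job.
nvidia-smi
```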
AVX-512
A number of nodes make use of the AVX-512 extended instructions. These include all of the nodes in the cascade partition, and the login nodes. To submit a job on the cascade partition that makes use of these instructions, add `#SBATCH --constraint=avx512` to your submission script, as sketched below.
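A minimal sketch of that directive in context (the other lines are illustrative placeholders):

```bash
#!/bin/bash
#SBATCH --partition=cascade
#SBATCH --constraint=avx512        # only schedule on AVX-512 capable nodes
#SBATCH --time=00:10:00

# Sanity check: list the AVX-512 feature flags reported by the node's CPU.
grep -m1 -o 'avx512[a-z]*' /proc/cpuinfo | sort -u
```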
Private partitions
Spartan hosts a number of private partitions, where the hardware is owned by individual research groups and faculties.
- FEIT research staff and students: feit-gpu-a100
- Computing and Information Systems research staff and students: deeplearn
- Faculty of Science research staff and students: fos-gpu-l40s
The partition information for the private partitions can be found on the Specs for private partitions page.
Other Partitions
There are also special partitions which sit outside the normal walltime constraints. In particular, `interactive` should be used for quick test cases; it has a maximum walltime of 2 days.
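For example, a generic Slurm one-liner to get an interactive shell there (resource values are illustrative; Spartan may also offer a dedicated wrapper command):

```bash
# Request a one-hour interactive shell on the interactive partition.
srun --partition=interactive --time=01:00:00 --cpus-per-task=1 --pty bash
```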
Storage system
Spartan uses IBM Spectrum Scale (previously known as GPFS). This is a highly scalable, parallel and robust filesystem.
The total Spartan storage is broken up into 2 areas:
Location | Capacity | Disk type |
---|---|---|
/data/gpfs | 2.34PB | 10K SAS |
/data/scratch | 575TB | Flash |
/home is on the University's NetApp NFS platform, backed by SSD.
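To check current usage of these areas from a login node, generic POSIX tools suffice (paths from the table above; any Spartan-specific quota commands are not assumed here):

```bash
# Capacity and usage of the two GPFS areas and your NFS home.
df -h /data/gpfs /data/scratch "$HOME"

# Size of a directory tree you own (can be slow on large trees).
du -sh "$HOME"
```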