Realtime job monitoring

Once your job is running, you can connect to the node it is running on in two ways, both from the login node: SSH to the node, or connect to the job itself with srun.

SSH to job

Using squeue, you can find out which node your job is running on.

squeue -j 28715643 -o %i,%N
JOBID,NODELIST
28715643,spartan-bm083

The job is running on spartan-bm083, so you can ssh to it from the login node while your job is running.

[user456@spartan-login1 ~]$ ssh spartan-bm083
Last login: Thu Aug 26 17:20:22 2021
[user456@spartan-bm083 ~]$ 
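
The lookup and the SSH step above can be combined. The following is a minimal bash sketch, not a site-provided tool: `job_node` is a hypothetical helper name, and it assumes `squeue` and `scontrol` are available on the login node. It prints the first node allocated to a job, expanding compressed node lists such as `spartan-bm[083-084]` for multi-node jobs.

```shell
#!/usr/bin/env bash
# job_node is a hypothetical helper name, not part of Slurm.
# It prints the first node allocated to the given job ID.
job_node() {
    local jobid=$1 nodelist
    # -h drops the header row; %N prints only the node list.
    nodelist=$(squeue -j "$jobid" -h -o %N) || return 1
    # A job that has not started yet has an empty node list.
    if [ -z "$nodelist" ]; then
        echo "job $jobid has no node allocated yet" >&2
        return 1
    fi
    # Expand a compressed list and keep only the first host.
    scontrol show hostnames "$nodelist" | head -n 1
}
```

With the function defined in your shell, `ssh "$(job_node 28715643)"` would connect to the job's first node.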

If you have multiple jobs running on the same node, your SSH session is placed at random into the container of one of those jobs.
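
To check whether this applies to you before connecting, you can list your own jobs on that node. This is a rough bash sketch; `jobs_on_node` is a hypothetical helper name, and it relies on the `-u` (user) and `-w` (node list) filters of `squeue`.

```shell
#!/usr/bin/env bash
# jobs_on_node is a hypothetical helper name, not part of Slurm.
# It prints the IDs of your jobs on the given node, one per line.
jobs_on_node() {
    local node=$1
    # -u filters by user, -w by node, -h drops the header, %i prints the job ID.
    squeue -u "${USER:-$(id -un)}" -w "$node" -h -o %i
}
```

If `jobs_on_node spartan-bm083` prints more than one job ID, connecting by job ID with srun, as described in the next section, avoids the ambiguity.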

Use srun to connect to job

Using srun, you can connect directly to a job running on the worker node. From the login node, run the following with your job ID:

srun --interactive --jobid 28715643 --pty /bin/bash

This gives you a bash terminal inside your job. You can also run a command directly, e.g.

srun --interactive --jobid 28715643 --pty nvidia-smi
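
If you run such commands often, the srun invocation can be wrapped in a small function. This is only a convenience sketch: `job_exec` is a hypothetical name, and it passes exactly the flags shown above.

```shell
#!/usr/bin/env bash
# job_exec is a hypothetical helper name, not part of Slurm.
# Usage: job_exec JOBID COMMAND [ARGS...]
job_exec() {
    local jobid=$1
    shift
    # Run the remaining arguments as a command inside the running job.
    srun --interactive --jobid "$jobid" --pty "$@"
}
```

For example, `job_exec 28715643 nvidia-smi` is equivalent to the last command above, and `job_exec 28715643 /bin/bash` opens a shell inside the job.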