Slurm: Accessing an existing job
In some cases, you might want to access the resources of an existing allocation, e.g. when running multiple (interactive) jobs on the same allocation or when troubleshooting a non-interactive job. This can be done on the HoreKa system using Slurm's srun command. After allocating resources, e.g. with sbatch or salloc, you can access the resources of this allocation with
$ srun --jobid=<JOBID> --pty /usr/bin/bash
This creates a new job step and executes /usr/bin/bash in pseudo-terminal mode on task zero of that job step, i.e. it gives you an interactive bash shell inside the previously allocated job.
The job id is printed immediately after a job allocation and can be retrieved later with the squeue command.
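As a minimal sketch of how the job ID could be captured in a script: sbatch normally prints a message of the form `Submitted batch job <JOBID>`, and salloc prints `salloc: Granted job allocation <JOBID>`. The helper name `extract_jobid` is hypothetical (it is not part of Slurm) and simply picks the first number out of such a message.

```shell
# extract_jobid is a hypothetical helper, not a Slurm command.
# It reads a submission message on stdin and prints the first number it finds.
extract_jobid() {
    grep -oE '[0-9]+' | head -n 1
}

# Example with a literal message, so no Slurm installation is needed:
echo "Submitted batch job 1809325" | extract_jobid
```

On the cluster this could be combined as `JOBID=$(sbatch job.sh | extract_jobid)`, where `job.sh` is a placeholder for your own batch script. Alternatively, `sbatch --parsable` prints the job ID without the surrounding text, and `squeue -u "$USER"` lists your jobs with their IDs at any later point.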
If all allocated CPU resources are already in use, srun will not grant the new job step access to them. However, you can pass the argument --overlap to srun to allow job steps to share the same resources.
If you need to access specific nodes within the job allocation, you can do so with the argument --nodelist=<NODELIST>.
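If you do not know which nodes belong to an allocation, the following sketch shows one way to list them. It assumes that `squeue` and `scontrol` are available on the login node; the function name `job_nodes` is hypothetical.

```shell
# job_nodes is a hypothetical helper: prints one hostname per line for a job ID.
job_nodes() {
    # -h suppresses the header; %N is the squeue format field for the node list,
    # which Slurm reports in compressed form, e.g. hkn[0313-0314].
    nodelist=$(squeue -h -j "$1" -o "%N") || return 1
    # scontrol expands the compressed list into individual hostnames.
    scontrol show hostnames "$nodelist"
}
```

Calling `job_nodes <JOBID>` on the cluster should then print the hostnames that are valid values for --nodelist.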
A full example:
$ srun --jobid=1809325 --nodelist=hkn0313 --overlap --pty /usr/bin/bash
This executes a bash shell on node hkn0313, which is currently allocated by the job with the ID 1809325, and allows it to overlap with already running tasks.
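The options described above can be assembled by a small wrapper. The following is only a sketch: `attach_cmd` is a hypothetical name, it always adds --overlap (an assumption that suits troubleshooting a busy job), and it merely prints the command so that you can review it before running it.

```shell
# attach_cmd is a hypothetical helper: prints the srun command line for
# attaching to an existing job. Arguments: <JOBID> [<NODE>]
attach_cmd() {
    jobid="$1"
    node="${2:-}"
    # --overlap is always added here; drop it if the job still has free CPUs.
    cmd="srun --jobid=$jobid --overlap"
    # Restrict the job step to a specific node only if one was given.
    if [ -n "$node" ]; then
        cmd="$cmd --nodelist=$node"
    fi
    echo "$cmd --pty /usr/bin/bash"
}

# Prints the command for the example job and node used in this section:
attach_cmd 1809325 hkn0313
```

Without a second argument, the --nodelist option is omitted and Slurm selects the node for the job step.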
Keep in mind that your additional job step will be cancelled if the original allocation is released. This can happen when a task running in the allocation finishes or is cancelled.