Slurm: Important Commands

Start time of job or resources : squeue --start

Any user can run this command to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. The command squeue is explained in detail on the squeue-manpage.
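
A minimal sketch of how the output looks for a pending job (the job shown here is made up; the START_TIME column contains Slurm's estimate):

$ squeue --start --user=$USER
  JOBID PARTITION     NAME     USER ST          START_TIME  NODES SCHEDNODES   NODELIST(REASON)
1760900    normal   my_job   vu1498 PD 2020-04-14T18:05:00      2     (null)   (Priority)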

Access

By default, this command can be run by any user.


List of your submitted jobs : squeue

Displays information about active, pending and/or recently completed jobs. By default, the command lists all of your own active and pending jobs. The command squeue is explained in detail on the squeue-manpage.

Access

By default, this command can be run by any user.

Flags

Flag        Description
-l, --long  Report more of the available information for the selected jobs or job steps, subject to any constraints specified.
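
Beyond -l, squeue also accepts a user-defined output format; a small sketch using standard format specifiers (%i jobid, %P partition, %j name, %T state, %M elapsed time, %D node count):

$ squeue --format="%.10i %.9P %.20j %.8T %.10M %.6D"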

Examples

squeue example on HoreKa (only your own jobs are displayed):

$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1760728    normal       sh   vu1498  R      33:43      1 fh2n1274
           1760673    normal       sh   vu1498  R    2:56:48      1 fh2n0501
           1760585    normal       sh   vu1498  R    5:37:21      1 fh2n1402
           1760583    normal       sh   vu1498  R    5:37:38      1 fh2n1396
           1760065    normal  fh2_400   vu1498  R   11:28:57     20 fh2n[0481-0483,1433-1448,1451]
           1756096    normal fh2_256R   vu1498  R   12:16:45     16 fh2n[0217-0232]
           1759242    normal fh2_256R   vu1498  R 1-07:37:46     16 fh2n[0145,0149-0153,0155-0156,0165,0209-0215]
           1756685    normal  fh2_64R   vu1498  R 1-16:01:52      4 fh2n[1518-1520,1526]
           1756683    normal  fh2_64R   vu1498  R 1-16:02:18      4 fh2n[1466,1469-1470,1479]

$ squeue -l
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
           1760728    normal       sh   vu1498  RUNNING      34:25  12:00:00      1 fh2n1274
           1760673    normal       sh   vu1498  RUNNING    2:57:30  12:00:00      1 fh2n0501
           1760585    normal       sh   vu1498  RUNNING    5:38:03  12:00:00      1 fh2n1402
           1760583    normal       sh   vu1498  RUNNING    5:38:20  12:00:00      1 fh2n1396
           1760065    normal  fh2_400   vu1498  RUNNING   11:29:39 2-00:00:00     20 fh2n[0481-0483,1433-1448,1451]
           1756096    normal fh2_256R   vu1498  RUNNING   12:17:27 2-00:00:00     16 fh2n[0217-0232]
           1759242    normal fh2_256R   vu1498  RUNNING 1-07:38:28 2-00:00:00     16 fh2n[0145,0149-0153,0155-0156,0165,0209-0215]
           1756685    normal  fh2_64R   vu1498  RUNNING 1-16:02:34 2-00:00:00      4 fh2n[1518-1520,1526]
           1756683    normal  fh2_64R   vu1498  RUNNING 1-16:03:00 2-00:00:00      4 fh2n[1466,1469-1470,1479]

  • The output of squeue shows how many of your jobs are running or pending and how many nodes your jobs are using.
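
If you only need the counts, you can combine standard squeue flags with wc (a sketch; -h suppresses the header line, -t filters by job state):

$ squeue -h -t RUNNING | wc -l    # number of your running jobs
$ squeue -h -t PENDING | wc -l    # number of your pending jobs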


Show free resources : sinfo_t_idle

The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates downtime, reservations, and node state information when determining the available backfill window. The sinfo command itself can only be used by the administrator.

SCC has prepared a special script (sinfo_t_idle) that reports how many nodes are idle and available for immediate use on the system. You can use this information to submit jobs that fit within the idle resources and thus get quick job turnaround times.

Access

By default, this command can be used by any user (sinfo can only be used by the administrator).

Example

The following command displays what resources are available for immediate use, for each partition.

$ sinfo_t_idle
Partition develop   :      2 nodes idle
Partition normal    :      0 nodes idle
Partition long      :      0 nodes idle
Partition xnodes    :      0 nodes idle
Partition visu      :      9 nodes idle

In the example above, a job requesting one node in the partition visu can start immediately.
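
A job that fits into those idle nodes should start right away; for example (script name and time limit are illustrative):

$ sbatch --partition=visu --nodes=1 --time=00:30:00 my_job.sh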


Detailed job information : scontrol show job

scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the scontrol-manpage.

Display the state of all your jobs in normal mode: scontrol show job

Display the state of a job with <jobid> in normal mode: scontrol show job <jobid>

Access

End users can use scontrol show job to view the status of their own jobs only.

Arguments

Option  Example
-d      Display the state of the job with jobid 8370992 in detailed mode: scontrol -d show job 8370992

Example for scontrol show job

Here is an example from HoreKa.

$ squeue    # show my own jobs (the userid is replaced here)
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1760162    normal   15grad   ab1234  R   11:37:33     28 fh2n[0485-0487,0537-0544,1481-1488,1553-1558,1561-1562,1568]

$
$ # now, see what's up with my running job with jobid 1760162
$ 
$ scontrol show job 1760162
JobId=1760162 JobName=15grad
   UserId=ab1234(17105) GroupId=fh2-project-cxyz(500411) MCS_label=N/A
   Priority=1 Nice=0 Account=fh2-project-hivxyz QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=11:39:27 TimeLimit=3-00:00:00 TimeMin=N/A
   SubmitTime=2020-04-13T22:14:11 EligibleTime=2020-04-13T22:14:11
   AccrueTime=2020-04-13T22:14:11
   StartTime=2020-04-14T04:17:01 EndTime=2020-04-17T04:17:01 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2020-04-14T04:17:01
   Partition=normal AllocNode:Sid=fh2n1992:530
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=fh2n[0485-0487,0537-0544,1481-1488,1553-1558,1561-1562,1568]
   BatchHost=fh2n0485
   NumNodes=28 NumCPUs=560 NumTasks=560 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=560,mem=875G,node=28,billing=560
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=32000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/pfs/work5/fh2-project-cxyz/ab1234/15grad/runVOF_Fh1_collated.sh
   WorkDir=/pfs/work5/fh2-project-cxyz/ab1234/15grad
   StdErr=/pfs/work5/fh2-project-cxyz/ab1234/15grad/slurm-1760162.out
   StdIn=/dev/null
   StdOut=/pfs/work5/fh2-project-cxyz/ab1234/15grad/slurm-1760162.out
   Power=

You can use standard Linux pipe commands to filter the very detailed scontrol show job output.

  • Is the job already running?
$ scontrol show job 1760162 | grep -i state
   JobState=RUNNING Reason=None Dependency=(null)
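
  • When will the job start and end? (the matched line is the StartTime/EndTime line shown in the full output above)
$ scontrol show job 1760162 | grep -i starttime
   StartTime=2020-04-14T04:17:01 EndTime=2020-04-17T04:17:01 Deadline=N/A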


Cancel Slurm Jobs

The scancel command is used to cancel jobs. The command scancel is explained in detail on the scancel-manpage.

Canceling own jobs : scancel

scancel is used to signal or cancel jobs, job arrays or job steps. The basic forms are:

$ scancel [-i] <job-id>
$ scancel -t <job_state_name>

Flag               Description                                                    Example
-i, --interactive  Interactive mode.                                              Cancel the job 987654 interactively: scancel -i 987654
-t, --state        Restrict the scancel operation to jobs in a certain state;     Cancel all jobs in state "PENDING": scancel -t "PENDING"
                   "job_state_name" may be "PENDING", "RUNNING" or "SUSPENDED".
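
A few common invocations, sketched (the jobid is the one from the table above; -u restricts the operation to your own jobs):

$ scancel 987654                  # cancel a single job
$ scancel -i 987654               # ask for confirmation before cancelling
$ scancel -t PENDING -u $USER     # cancel all of your pending jobs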

Last update: February 25, 2021