Slurm: Important Commands

Start time of job or resources : squeue --start

Any user can run squeue --start to display the estimated start time of a pending job. The estimate draws on several types of analysis: historical usage, the earliest available reservable resources, and the priority-based backlog. The squeue command is explained in detail on the squeue manpage.

Access

By default, this command can be run by any user.

List of your submitted jobs : squeue

squeue displays information about active, pending and/or recently completed jobs. By default it lists all of your own active and pending jobs. The squeue command is explained in detail on the squeue manpage.

Access

By default, this command can be run by any user.

Flags

Flag	Description
-l, --long	Report more of the available information for the selected jobs or job steps, subject to any constraints specified.

Examples

squeue

$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         normal       sh   vu1498  R      33:43      1 fh2n1274
         normal       sh   vu1498  R    2:56:48      1 fh2n0501
         normal       sh   vu1498  R    5:37:21      1 fh2n1402
         normal       sh   vu1498  R    5:37:38      1 fh2n1396
         normal  fh2_400   vu1498  R   11:28:57     20 fh2n[0481-0483,1433-1448,1451]
         normal fh2_256R   vu1498  R   12:16:45     16 fh2n[0217-0232]
         normal fh2_256R   vu1498  R 1-07:37:46     16 fh2n[0145,0149-0153,0155-0156,0165,0209-0215]
         normal  fh2_64R   vu1498  R 1-16:01:52      4 fh2n[1518-1520,1526]
         normal  fh2_64R   vu1498  R 1-16:02:18      4 fh2n[1466,1469-1470,1479]

$ squeue -l
JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
         normal       sh   vu1498  RUNNING      34:25  12:00:00      1 fh2n1274
         normal       sh   vu1498  RUNNING    2:57:30  12:00:00      1 fh2n0501
         normal       sh   vu1498  RUNNING    5:38:03  12:00:00      1 fh2n1402
         normal       sh   vu1498  RUNNING    5:38:20  12:00:00      1 fh2n1396
         normal  fh2_400   vu1498  RUNNING   11:29:39 2-00:00:00     20 fh2n[0481-0483,1433-1448,1451]
         normal fh2_256R   vu1498  RUNNING   12:17:27 2-00:00:00     16 fh2n[0217-0232]
         normal fh2_256R   vu1498  RUNNING 1-07:38:28 2-00:00:00     16 fh2n[0145,0149-0153,0155-0156,0165,0209-0215]
         normal  fh2_64R   vu1498  RUNNING 1-16:02:34 2-00:00:00      4 fh2n[1518-1520,1526]
         normal  fh2_64R   vu1498  RUNNING 1-16:03:00 2-00:00:00      4 fh2n[1466,1469-1470,1479]

The output of squeue shows how many jobs of yours are running or pending and how many nodes are in use by your jobs.
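
If you want these totals as numbers rather than reading them off the listing, a small helper can add them up. The following is only a sketch: the function name summarize_jobs is hypothetical, and it assumes its input has one job per line in the form "STATE NODES", as produced by the squeue format string "%T %D" with the header suppressed (-h).

```shell
# Hypothetical helper: summarize "STATE NODES" lines read from stdin.
# Assumed input source: squeue -h -u "$USER" -o "%T %D"
summarize_jobs() {
    awk '
        $1 == "RUNNING" { running++; nodes += $2 }   # count job and its nodes
        $1 == "PENDING" { pending++ }                # pending jobs hold no nodes yet
        END {
            printf "running=%d pending=%d nodes_in_use=%d\n", running, pending, nodes
        }
    '
}
```

On a login node this could be used as: squeue -h -u "$USER" -o "%T %D" | summarize_jobs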



Show free resources : sinfo_t_idle
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window. The sinfo command can only be used by the administrator.
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. 
Access
By default, this command can be used by any user (sinfo can only be used by the administrator). 
Example
The following command displays how many nodes are available for immediate use in each partition.
$ sinfo_t_idle
Partition develop   :      2 nodes idle
Partition normal    :      0 nodes idle
Partition long      :      0 nodes idle
Partition xnodes    :      0 nodes idle
Partition visu      :      9 nodes idle

In the above example, a request for 1 node in the partition visu can start immediately.
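
The same check can be scripted before submitting a job. This is a minimal sketch, not part of the SCC tooling: the function names idle_nodes and fits_now are hypothetical, and the parsing assumes the exact "Partition <name> : <n> nodes idle" layout shown above.

```shell
# Hypothetical helper: print the idle node count of one partition.
# $1 = partition name; sinfo_t_idle-style text is read from stdin.
idle_nodes() {
    awk -v part="$1" '$1 == "Partition" && $2 == part { print $4 }'
}

# Hypothetical helper: succeed if the partition has enough idle nodes.
# $1 = partition name, $2 = number of nodes requested; reads stdin.
fits_now() {
    fits_idle=$(idle_nodes "$1")
    # An unknown partition yields an empty count, treated as 0 here.
    [ "${fits_idle:-0}" -ge "$2" ]
}
```

For example: sinfo_t_idle | fits_now visu 1 && echo "a 1-node job in visu can start now"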


Detailed job information : scontrol show job
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for one specified job. Detailed information is available for active, pending and recently completed jobs. The scontrol command is explained in detail on the scontrol manpage.
Display the state of all your jobs in normal mode: scontrol show job
Display the state of a job with <jobid> in normal mode: scontrol show job <jobid>
Access
End users can use scontrol show job to view the status of their own jobs only. 
Arguments

Option	Example
-d	Display the state of the job with jobid 8370992 in detailed mode: scontrol -d show job 8370992


Example for scontrol show job
Here is an example from HoreKa.
$ squeue    # show my own jobs (here the userid is replaced!)
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1760162    normal   15grad   ab1234  R   11:37:33     28 fh2n[0485-0487,0537-0544,1481-1488,1553-1558,1561-1562,1568]

$
$ # now, see what's up with my job with jobid 1760162
$ 
$ scontrol show job 1760162
JobId=1760162 JobName=15grad
   UserId=ab1234(17105) GroupId=fh2-project-cxyz(500411) MCS_label=N/A
   Priority=1 Nice=0 Account=fh2-project-hivxyz QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=11:39:27 TimeLimit=3-00:00:00 TimeMin=N/A
   SubmitTime=2020-04-13T22:14:11 EligibleTime=2020-04-13T22:14:11
   AccrueTime=2020-04-13T22:14:11
   StartTime=2020-04-14T04:17:01 EndTime=2020-04-17T04:17:01 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2020-04-14T04:17:01
   Partition=normal AllocNode:Sid=fh2n1992:530
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=fh2n[0485-0487,0537-0544,1481-1488,1553-1558,1561-1562,1568]
   BatchHost=fh2n0485
   NumNodes=28 NumCPUs=560 NumTasks=560 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=560,mem=875G,node=28,billing=560
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=32000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/pfs/work5/fh2-project-cxyz/ab1234/15grad/runVOF_Fh1_collated.sh
   WorkDir=/pfs/work5/fh2-project-cxyz/ab1234/15grad
   StdErr=/pfs/work5/fh2-project-cxyz/ab1234/15grad/slurm-1760162.out
   StdIn=/dev/null
   StdOut=/pfs/work5/fh2-project-cxyz/ab1234/15grad/slurm-1760162.out
   Power=

You can use standard Linux pipe commands to filter the very detailed scontrol show job output.

Is the job already running?

$ scontrol show job 1760162 | grep -i state
   JobState=RUNNING Reason=None Dependency=(null)
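
When a script needs only the value of one field rather than the whole matching line, the key=value layout can be split further. The sketch below is an illustration under assumptions: the function name job_field is hypothetical, and it handles only values without spaces (a Command with arguments, for instance, would be cut at the first space).

```shell
# Hypothetical helper: print the value of one key from
# "scontrol show job" text read on stdin.
# $1 = key name, e.g. JobState or Partition.
job_field() {
    tr ' ' '\n' |                  # put every key=value pair on its own line
        sed -n "s/^$1=//p" |       # keep only the requested key, drop "key="
        head -n 1                  # first match only
}
```

For example: scontrol show job 1760162 | job_field JobState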



Cancel Slurm Jobs
The scancel command is used to cancel jobs. The command scancel is explained in detail on the scancel-manpage.
Canceling own jobs : scancel
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:
$ scancel [-i] <job-id>
$ scancel -t <job_state_name>




Flag	Description	Example
-i, --interactive	Interactive mode	Cancel the job 987654 interactively: scancel -i 987654
-t, --state	Restrict the scancel operation to jobs in a certain state. "job_state_name" may have a value of either "PENDING", "RUNNING" or "SUSPENDED".	Cancel all jobs in state "PENDING": scancel -t "PENDING"
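
Because scancel -t acts on every matching job, it can be worth checking the state name before running it. The wrapper below is a sketch only: the function name cancel_by_state and the SCANCEL override variable are hypothetical, and the added -u "$USER" (scancel's --user option) is an assumption made here to keep the operation limited to your own jobs.

```shell
# Hypothetical wrapper: cancel all of your jobs in one of the
# three states accepted by "scancel -t" according to the table above.
# $1 = PENDING, RUNNING or SUSPENDED.
cancel_by_state() {
    case "$1" in
        PENDING|RUNNING|SUSPENDED) ;;
        *)
            echo "unsupported state: $1" >&2
            return 2
            ;;
    esac
    # Set SCANCEL="echo scancel" for a dry run that only prints the command.
    ${SCANCEL:-scancel} -t "$1" -u "$USER"
}
```

A dry run such as SCANCEL="echo scancel" followed by cancel_by_state PENDING prints the command that would be executed without cancelling anything.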


Last update: February 25, 2021
