
Batch system

As described in the Hardware Overview chapter, users only have direct access to the two login nodes of the Future Technologies Partition. Access to the compute nodes is only possible through the so-called batch system. The batch system on FTP is Slurm.

Slurm is an open-source, fault-tolerant, and highly scalable job scheduling system for large and small Linux clusters. Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
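In day-to-day use these three functions are exposed through a small set of command-line tools. A brief, non-exhaustive sketch (the script name and job ID are placeholders):

```bash
sinfo                 # show the queues (partitions) and the state of their nodes
sbatch my_job.sh      # hand a job script to the scheduler; prints the assigned job ID
squeue -u $USER       # monitor your own pending and running jobs
scancel 123456        # cancel a job by its ID
```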

Any kind of calculation on the compute nodes of the FTP requires the user to define the work as a single command or a sequence of commands, together with the required run time, number of CPU cores, and amount of main memory, and to submit all of this, i.e. the batch job, to the resource and workload manager. All job submission and control is therefore done through Slurm commands, as sketched below. Slurm queues and runs user jobs based on fair-share policies.
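A minimal job script might look like the following sketch. The partition name `a64fx` is taken from the queue tables below and the resource values match its defaults; the application name is a placeholder.

```bash
#!/bin/bash
#SBATCH --partition=a64fx     # queue to submit to (see the tables below)
#SBATCH --nodes=1             # number of compute nodes
#SBATCH --ntasks=48           # number of tasks (CPU cores)
#SBATCH --time=00:30:00       # maximum run time (hh:mm:ss)
#SBATCH --mem=28000mb         # main memory per node

# Launch the parallel application (placeholder name) on the allocated nodes
srun ./my_parallel_app
```

The script is handed to the scheduler with `sbatch`, which replies with a job ID and places the job in the queue.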

FTP-A64 ARM batch system queues

| Queue | Node type(s) | Access policy | Minimum resources | Default resources | Maximum resources |
|---|---|---|---|---|---|
| a64fx | A64FX | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=48, mem=28000mb | time=24:00:00, nodes=8, ntasks=48, mem=28000mb |
| nvidia100_2 | ARM-A100 | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=80, mem-per-cpu=6350mb | time=24:00:00, nodes=4, ntasks=80, mem=522400mb |
| dual_a_max | Dual ARM Altra Max | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=256, mem-per-cpu=2035mb | time=24:00:00, nodes=6, ntasks=256, mem=520960mb |
| grace_grace | Grace-Grace | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=144, mem-per-cpu=3402mb | time=24:00:00, nodes=6, ntasks=144, mem=489960mb |
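Resource requests can also be given on the command line instead of inside the script. As an illustrative example, a single full A64FX node within the limits above could be requested like this (the script name is a placeholder):

```bash
sbatch --partition=a64fx --nodes=1 --ntasks=48 --time=02:00:00 --mem=28000mb job.sh
```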

FTP-X86 batch system queues

| Queue | Node type(s) | Access policy | Minimum resources | Default resources | Maximum resources |
|---|---|---|---|---|---|
| intel-clv100 | Cascade Lake + NVIDIA V100 | Exclusive | nodes=1, ntasks=1 | time=00:30:00, ntasks=80, mem-per-cpu=192000mb | time=24:00:00, nodes=4, ntasks=80, mem=192000mb |
| amd-milan-mi100 | AMD Milan + MI100 | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=2, mem=8025mb | time=24:00:00, nodes=1, ntasks=64, mem=513600mb |
| amd-milan-mi100 | AMD Milan + MI210 | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=2, mem=8025mb | time=24:00:00, nodes=1, ntasks=64, mem=513600mb |
| amd-milan-mi250 | AMD Milan + MI250 | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=1, mem=8025mb | time=24:00:00, nodes=2, ntasks=256, mem=1027200mb |
| amd-milan-graphcore | AMD Milan + Graphcore | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=2, mem=8025mb | time=24:00:00, nodes=1, ntasks=128, mem=513600mb |
| intel-spr | Intel Sapphire Rapids | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=2, mem=6420mb | time=24:00:00, nodes=2, ntasks=80, mem=513600mb |
| intel-spr-hbm | Intel Sapphire Rapids + HBM | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=2, mem=8075mb | time=24:00:00, nodes=2, ntasks=80, mem=646000mb |
| intel-spr-pvc | Intel Sapphire Rapids + Ponte Vecchio | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=1, mem=6420mb | time=24:00:00, nodes=1, ntasks=80, mem=513600mb |
| amd-milan-mi300 | AMD Instinct MI300A Accelerators | Shared | nodes=1, ntasks=1 | time=00:30:00, ntasks=2, mem=5356mb | time=24:00:00, nodes=1, ntasks=192, mem=514176mb |
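On the shared queues several jobs can run side by side on one node, so only the resources actually needed should be requested. As an illustrative sketch (the exact interactive workflow on FTP may differ), an interactive shell on a shared Sapphire Rapids node could be requested with:

```bash
# Request an interactive session matching the intel-spr default resources
srun --partition=intel-spr --nodes=1 --ntasks=2 --mem=6420mb --time=00:30:00 --pty /bin/bash
```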

Last update: August 6, 2024