Hardware Overview
The Future Technologies Partition (FTP) is currently split into two separate cluster installations with different hardware components, different cluster management software stacks, and different login nodes.
FTP is a distributed-memory parallel computer consisting of multiple individual servers called "nodes". The nodes are divided into two clusters: one supports the ARM instruction set, the other x86. Each node has either two x86 processors or up to two ARM processors, at least 32 GiB of local memory, local NVMe or SATA SSD disks, and two high-performance network adapters. All nodes are connected by an extremely fast, low-latency InfiniBand 4X HDR interconnect. In addition, two large parallel file systems are connected to FTP.
The operating system installed on every node is Red Hat Enterprise Linux (RHEL) 8.x. On top of this operating system, a set of (open source) software components such as the Slurm workload manager has been installed. Some of these components are of special interest to end users and are briefly discussed here; others matter mostly to system administrators and are therefore not covered by this documentation.
The server systems in FTP have different roles and offer different services.
Login Nodes
The login nodes are the only nodes directly accessible to end users. They can be used for interactive logins, file management, software development, and interactive pre- and post-processing. Two nodes are dedicated as login nodes.
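For illustration only (the hostname below is a placeholder, not the actual address of an FTP login node; use the address published for the respective cluster), an interactive session is opened with ssh:

```bash
# Open an interactive session on a login node.
# "ftp-login.example.org" is a placeholder hostname.
ssh myusername@ftp-login.example.org
```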
Compute Nodes
The majority of the nodes are dedicated to computations. These nodes are not directly accessible to users; instead, calculations have to be submitted to the so-called batch system. The batch system manages all compute nodes and executes the queued jobs according to their priority, as soon as the required resources become available. A single job may use hundreds of compute nodes and many thousands of CPU cores at once. A minimal batch script sketch is shown below.
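As a minimal sketch of a Slurm batch job (the job name and resource requests are illustrative, and any partition or account options are site-specific and omitted here):

```bash
#!/bin/bash
#SBATCH --job-name=example        # arbitrary job name
#SBATCH --nodes=2                 # number of compute nodes
#SBATCH --ntasks-per-node=4       # tasks (e.g. MPI ranks) per node
#SBATCH --time=00:30:00           # wall-clock time limit

# Launch the application on all allocated nodes.
srun ./my_parallel_app
```

Such a script is submitted from a login node with `sbatch jobscript.sh`; `squeue` then shows the job waiting in the queue until the batch system starts it.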
FTP-A64 ARM cluster
Login node
|                  | Login node             |
|------------------|------------------------|
| No. Nodes        | 1                      |
| CPU              | 1x Ampere Altra Q80-33 |
| Total Cores      | 80                     |
| CPU Clock        | 3.3 GHz                |
| Instruction Set  | ARMv8.2+               |
| Memory           | 512 GiB                |
| Accelerator Type | -                      |
| No. Accelerators | -                      |
| Disk (usable)    | 890 GB                 |
| Disk type        | SATA SSD               |
| InfiniBand       | 200 GBit/s BlueField-2 |
Ampere Altra CPU
|                  | ARM-A100               | Dual Socket ARM Altra Max   |
|------------------|------------------------|-----------------------------|
| No. Nodes        | 4                      | 6                           |
| CPU              | 1x Ampere Altra Q80-33 | 2x Ampere Altra Max M128-30 |
| Total Cores      | 80                     | 256                         |
| CPU Clock        | 3.3 GHz                | 3.0 GHz                     |
| Instruction Set  | ARMv8.2+               | ARMv8.2+                    |
| Memory           | 512 GiB                | 512 GiB                     |
| Accelerator Type | NVIDIA A100 PCIe 40GB  | -                           |
| No. Accelerators | 2                      | -                           |
| Disk (usable)    | 890 GB                 | 1.8 TB                      |
| Disk type        | SATA SSD               | SATA SSD                    |
| InfiniBand       | 200 GBit/s BlueField-2 | 200 GBit/s ConnectX-6       |
Fujitsu CPU
|                  | A64FX nodes           |
|------------------|-----------------------|
| No. Nodes        | 8                     |
| CPU              | 1x Fujitsu A64FX      |
| Total Cores      | 48                    |
| CPU Clock        | 1.8 GHz               |
| Instruction Set  | ARMv8.2-A + SVE       |
| Memory           | 32 GiB                |
| Accelerator Type | -                     |
| No. Accelerators | -                     |
| Disk (usable)    | 372 GB                |
| Disk type        | NVMe                  |
| InfiniBand       | 100 GBit/s ConnectX-6 |
NVIDIA Superchips
|                  | Grace-Grace            | Grace-Hopper           |
|------------------|------------------------|------------------------|
| No. Nodes        | 6                      | 2                      |
| CPU              | 2x Grace CPU Superchip | 1x Grace CPU Superchip |
| Total Cores      | 144                    | 72                     |
| CPU Clock        | 3.4 GHz                | 3.1 GHz                |
| Instruction Set  | ARMv9 + SVE2           | ARMv9 + SVE2           |
| Memory           | 480 GB                 | 480 GB                 |
| Accelerator Type | -                      | GH200                  |
| No. Accelerators | -                      | 1                      |
| Disk (usable)    | 3.8 TB                 | 1.92 TB                |
| Disk type        | NVMe                   | NVMe                   |
| InfiniBand       | 200 GBit/s ConnectX-6  | 200 GBit/s ConnectX-6  |
FTP-X86 cluster
Login node
|                            | Login node              |
|----------------------------|-------------------------|
| No. Nodes                  | 1                       |
| CPU                        | 2x Intel Xeon Gold 6230 |
| Sockets                    | 2                       |
| Total Cores                | 40                      |
| Total Threads              | 40                      |
| CPU Base Clock             | 2.1 GHz                 |
| Instruction Set            | x86_64                  |
| Memory                     | 192 GB                  |
| Accelerator type           | -                       |
| No. Accelerators           | -                       |
| Accelerator TFLOPS (FP64)  | -                       |
| Accelerator TFLOPS (FP32)  | -                       |
| Accelerator TFLOPS (FP16)  | -                       |
| Accelerator Mem b/w (GB/s) | -                       |
| Disk (usable)              | 890 GB                  |
| Disk type                  | NVMe                    |
Graphcore Accelerators
|                            | AMD EPYC Milan + Graphcore |
|----------------------------|----------------------------|
| No. Nodes                  | 1                          |
| CPU                        | 2x AMD EPYC 7543           |
| Sockets                    | 2                          |
| Total Cores                | 64                         |
| Total Threads              | 128                        |
| CPU Base Clock             | 2.8 GHz                    |
| Instruction Set            | x86_64                     |
| Memory                     | 512 GB                     |
| Accelerator type           | Graphcore IPU-M2000        |
| No. Accelerators           | 4 (= 16 IPUs)              |
| Accelerator TFLOPS (FP64)  | -                          |
| Accelerator TFLOPS (FP32)  | -                          |
| Accelerator TFLOPS (FP16)  | 250                        |
| Accelerator Mem b/w (GB/s) |                            |
| Disk (usable)              | 3 TB                       |
| Disk type                  | NVMe                       |
AMD Instinct Accelerators
|                            | AMD EPYC Milan + MI100 | AMD EPYC Milan + MI210 | AMD EPYC Milan + MI250 |
|----------------------------|------------------------|------------------------|------------------------|
| No. Nodes                  | 1                      | 1                      | 2                      |
| CPU                        | 2x AMD EPYC 7543       | 2x AMD EPYC 7543       | 2x AMD EPYC 7713       |
| Sockets                    | 2                      | 2                      | 2                      |
| Total Cores                | 64                     | 64                     | 128                    |
| Total Threads              | 128                    | 128                    | 128                    |
| CPU Base Clock             | 2.8 GHz                | 2.8 GHz                | 2.0 GHz                |
| Instruction Set            | x86_64                 | x86_64                 | x86_64                 |
| Memory                     | 512 GB                 | 512 GB                 | 1 TB                   |
| Accelerator type           | AMD MI100              | AMD MI210              | AMD MI250              |
| No. Accelerators           | 4                      | 4                      | 4                      |
| Accelerator TFLOPS (FP64)  | 11.5                   | 22.6                   | 45.3                   |
| Accelerator TFLOPS (FP32)  | 23                     | 22.6                   | 45.3                   |
| Accelerator TFLOPS (FP16)  | 184.6                  | 181                    | 362.1                  |
| Accelerator Mem b/w (GB/s) | 1228                   | 1600                   | 3276                   |
| Disk (usable)              | 3 TB                   | 3 TB                   | 1.8 TB                 |
| Disk type                  | NVMe                   | NVMe                   | NVMe                   |
AMD Instinct MI300A APU
|                            | AMD Instinct MI300A Accelerators          |
|----------------------------|-------------------------------------------|
| No. Nodes                  | 1                                         |
| CPU                        | 4x AMD Instinct MI300A Accelerator (Zen4) |
| Sockets                    | 4                                         |
| Total Cores                | 96                                        |
| Total Threads              | 192                                       |
| CPU Base Clock             |                                           |
| Instruction Set            | x86_64                                    |
| Memory                     | 512 GB                                    |
| Accelerator type           | Instinct MI300A                           |
| No. Accelerators           | 4                                         |
| Accelerator TFLOPS (FP64)  | 61.3                                      |
| Accelerator TFLOPS (FP32)  | 122.6                                     |
| Accelerator TFLOPS (FP16)  | 980.6                                     |
| Accelerator Mem b/w (GB/s) | 5300                                      |
| Disk (usable)              | 29 TB                                     |
| Disk type                  | NVMe                                      |
Intel Sapphire Rapids CPU
|                            | Intel Sapphire Rapids        | Intel Sapphire Rapids + HBM |
|----------------------------|------------------------------|-----------------------------|
| No. Nodes                  | 2                            | 2                           |
| CPU                        | 2x Intel Xeon Platinum 8480+ | 2x Intel Xeon CPU Max 9470  |
| Sockets                    | 2                            | 2                           |
| Total Cores                | 112                          | 104                         |
| Total Threads              | 224                          | 208                         |
| CPU Base Clock             | 3.8 GHz                      | 3.5 GHz                     |
| Instruction Set            | x86_64                       | x86_64                      |
| Memory                     | 512 GB                       | 640 GB                      |
| Accelerator type           | -                            | -                           |
| No. Accelerators           | -                            | -                           |
| Accelerator TFLOPS (FP64)  | -                            | -                           |
| Accelerator TFLOPS (FP32)  | -                            | -                           |
| Accelerator TFLOPS (FP16)  | -                            | -                           |
| Accelerator Mem b/w (GB/s) | -                            | -                           |
| Disk (usable)              | 890 GB                       | 890 GB                      |
| Disk type                  | NVMe                         | NVMe                        |
Intel Ponte Vecchio Accelerator
|                            | Intel Sapphire Rapids + Ponte Vecchio |
|----------------------------|---------------------------------------|
| No. Nodes                  | 1                                     |
| CPU                        | 2x Intel Xeon Platinum 8460Y+         |
| Sockets                    | 2                                     |
| Total Cores                | 80                                    |
| Total Threads              | 160                                   |
| CPU Base Clock             | 3.7 GHz                               |
| Instruction Set            | x86_64                                |
| Memory                     | 512 GB                                |
| Accelerator type           | Intel Ponte Vecchio XT                |
| No. Accelerators           | 4                                     |
| Accelerator TFLOPS (FP64)  | 52                                    |
| Accelerator TFLOPS (FP32)  | 52                                    |
| Accelerator TFLOPS (FP16)  | 52                                    |
| Accelerator Mem b/w (GB/s) | 3276                                  |
| Disk (usable)              | 890 GB                                |
| Disk type                  | NVMe                                  |
Intel Gaudi 2 AI Accelerator
|                            | Intel Gaudi 2 AI Accelerator |
|----------------------------|------------------------------|
| No. Nodes                  | 3                            |
| CPU                        | 2x Intel Xeon Platinum 8360Y |
| Sockets                    | 2                            |
| Total Cores                | 72                           |
| Total Threads              | 144                          |
| CPU Base Clock             | 2.4 GHz                      |
| Instruction Set            | x86_64                       |
| Memory                     | 1024 GB                      |
| Accelerator type           | Intel Gaudi 2                |
| No. Accelerators           | 8                            |
| Accelerator TFLOPS (FP64)  | -                            |
| Accelerator TFLOPS (FP32)  | -                            |
| Accelerator TFLOPS (FP16)  | -                            |
| Accelerator Mem b/w (GB/s) | -                            |
| Disk (usable)              | 1.8 TB                       |
| Disk type                  | NVMe                         |
Interconnect
An important component of FTP is the InfiniBand 4X HDR 200 GBit/s interconnect. All nodes are attached to this high-throughput, very low-latency (~1 microsecond) network. InfiniBand is ideal for communication-intensive applications, for example those that perform many collective MPI operations.
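As a hedged sketch of how such an application is typically built and launched across nodes (the module name and the task geometry below are assumptions, not the cluster's actual configuration):

```bash
# Build and launch an MPI application across several nodes; ranks on
# different nodes then communicate over the InfiniBand fabric.
# The module name is an assumption -- check `module avail` for the real one.
module load openmpi
mpicc -O2 -o my_mpi_app my_mpi_app.c

# Spread ranks over 4 nodes; inter-node MPI traffic (including
# collectives) runs over the low-latency interconnect.
srun --nodes=4 --ntasks-per-node=64 ./my_mpi_app
```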