Future Projects
The NHR@KIT Center is currently participating in the following collaborative projects within the NHR alliance:
Identity management "as a service"
Reliable and secure access to high-performance computers and data exchange across multiple sites have become essential for many researchers. However, operating one's own identity management system with direct connection to other sites and large federations involves a great deal of effort and is not feasible for all organizations and institutions. In particular, many identity management systems do not yet support contemporary security mechanisms such as multi-factor authentication.
The software component Reg-App, developed in large part by KIT, already forms the backbone for secure, federated authentication of users for HoreKa, the Future Technologies Partition, and the high-performance computers operated as part of the Baden-Württemberg implementation concept for high-performance computing (bwHPC), including the bwUniCluster.
As part of the NHR networking project "Reg-App - A federated identity management system as a service", NHR@KIT is coordinating the development of a service offering in which the federated identity management system is offered to other sites as an externally hosted service. In the first stage of expansion, its use is to be tested in the context of idm.nrw and HPC.NRW. In addition, a project group management feature is planned, which will enable the persons responsible for the computing time projects set up on the NHR systems to manage the list of members independently.
Containers and Container management
Installation and configuration of application software highly depends on the specific software environment users encounter on an HPC system. This often makes it difficult to deploy specific application versions and to switch between different resources.
In recent years, container environments have emerged as a promising tool to circumvent these dependencies. They allow users to create the required software environments on their own, decouple them from the system environment to a large extent and thus make them executable on many different resources without further changes.
Within the framework of this joint project, NHR@KIT is involved in the evaluation of new container technologies, the provision of jointly maintained and standardized containers within the NHR network, the consideration of security-relevant aspects, the transfer of knowledge to users and administrators and other aspects of containers and container management on HPC systems.
Cx "as a service"
Many scientific application codes have grown over many years or even decades. Legacy issues such as outdated programming techniques, a "monolithic" design or lack of automation make further development difficult and constantly increase the effort required to validate new program versions. In addition, porting to new and innovative hardware architectures such as the ones provided with the Future Technologies Partition - which could possibly enable much higher computing power than established architectures - is only possible to a limited extent.
In order to preserve the investments already made in the existing code bases and make them ready for the future, the area of Software Sustainability is becoming increasingly important. In particular, automation in the form of Continuous Integration, Continuous Testing, Continuous Deployment and Continuous Benchmarking - CI/CT/CD/CB or Cx for short - is of great importance in this context.
NHR@KIT coordinates the area of "Sustainable Software Development - Cx as a Service" within the NHR Alliance and offers training courses and workshops as well as an infrastructure for CI/CT/CD.
Job-specific performance monitoring
Scalable, continuous performance monitoring is an important tool for sustainable and efficient use of high-performance computing resources. The operators of the resources can react to inefficient use by providing consulting and support services. Users, too, show a steadily increasing interest in such data.
Due to the high complexity of HPC system environments, a large number of components are required to collect the necessary data and give users and administrators user-friendly access to it. Data collection must take place directly on the individual compute nodes, and the data has to be enriched with metadata (e.g. from the job scheduler). The large volume of the collected data requires the use of powerful, scalable data storage systems. Some of the components used must be adaptable to the individual needs and circumstances of the various operators. The exact nature of the collected data also varies between centers, which makes it difficult to use standard tools and to provide training to users.
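The enrichment step described above can be illustrated with a minimal Python sketch. It is not taken from any monitoring stack used at the NHR centers; the `Job` and `Sample` records, their fields, and the helper functions are hypothetical and only show how node-level measurements might be joined with job scheduler metadata to obtain job-specific values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical job record as it might be exported by a job scheduler."""
    job_id: str
    user: str
    nodes: frozenset  # names of the compute nodes allocated to the job
    start: float      # start time, Unix seconds
    end: float        # end time, Unix seconds


@dataclass(frozen=True)
class Sample:
    """Hypothetical measurement collected on a single compute node."""
    node: str
    timestamp: float
    metric: str
    value: float


def find_job(sample: Sample, jobs: list) -> Optional[Job]:
    """Return the job that ran on the sample's node at the sample's time."""
    for job in jobs:
        if sample.node in job.nodes and job.start <= sample.timestamp < job.end:
            return job
    return None


def enrich(samples: list, jobs: list) -> list:
    """Attach job metadata (job ID, user) to every node-level sample."""
    enriched = []
    for s in samples:
        job = find_job(s, jobs)
        enriched.append({
            "node": s.node,
            "timestamp": s.timestamp,
            "metric": s.metric,
            "value": s.value,
            "job_id": job.job_id if job else None,
            "user": job.user if job else None,
        })
    return enriched


def mean_per_job(enriched: list, metric: str) -> dict:
    """Average one metric over all samples that belong to each job."""
    sums = {}
    for record in enriched:
        if record["metric"] != metric or record["job_id"] is None:
            continue
        total, count = sums.get(record["job_id"], (0.0, 0))
        sums[record["job_id"]] = (total + record["value"], count + 1)
    return {job_id: total / count for job_id, (total, count) in sums.items()}
```

In a production setup the linear search in `find_job` would likely be replaced by an index or by a join inside the storage system, since the data volume is large.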
In this collaborative project, NHR@KIT is working with other centers to ensure interoperability and common standards for system-wide and continuous job-specific performance monitoring environments.
Coordination with State-Level Networks
Even before the formation of the German Tier-2 centers in the National High Performance Computing Alliance (NHR), various Tier-3 HPC networks already existed in some German federal states, among them bwHPC (Baden-Württemberg), HKHLR (Hesse), HLRN (Berlin, Brandenburg, Bremen, Hamburg, Mecklenburg-Western Pomerania, Lower Saxony, Schleswig-Holstein), hpc.nrw (North Rhine-Westphalia), and KONWIHR (Bavaria). These networks promote High-Performance Computing on a broad scale, and many of them already had connections to Tier-2 or even Tier-1 sites.
Users usually make their first contact with HPC resources at the Tier-3 level, where they learn the basics of operation and the foundations are laid for a later scientific career in the Computational Sciences. However, state-level networks can hardly meet the demand for HPC experts, and they cannot be held solely responsible for the goal of "vertical permeability", the successful transition from Tier-3 to Tier-2 or Tier-1.
Within the framework of this networking project, NHR@KIT coordinates the interlocking of NHR with the state-level networks, in particular with bwHPC, for example by coordinating training events and training multipliers ("Train-The-Trainer").
Highly Scalable Numerical Solvers and Numerical Libraries
Within the framework of ongoing and already completed research projects, highly scalable numerical solution algorithms and high-performance numerical software libraries have been developed at some NHR sites (e.g. HyTeG, Ginkgo). These lay the foundations for future Exascale Systems and promise to significantly accelerate existing applications, especially by efficiently exploiting current hardware technologies such as GPGPUs and other accelerators.
As part of this joint NHR project, these algorithms and libraries are being made available to all users. This goal includes, for example, ensuring compatibility with existing solvers and supporting researchers in adopting them in existing simulation software.