OpenMP

OpenMP is a directive-based parallel programming model. The specification supports the C, C++ and Fortran programming languages. OpenMP originally targeted CPU-based shared-memory systems; support for accelerators such as GPUs was added in more recent versions, starting with OpenMP 4.0. For more information, visit the OpenMP homepage. Recommended reading:

OpenMP Specifications

A wide variety of training material on using OpenMP on CPUs is already available, so this documentation focuses on offloading computations to accelerators such as GPUs.

OpenMP with accelerators

The fundamental idea of an OpenMP program that uses accelerators is to execute all compute-intensive parts on the accelerator(s). Data transfers between host memory and device memory should be avoided as far as possible, because they are typically much slower than accesses to memory on the device itself.
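One way to avoid repeated transfers is to enclose several kernels in a single target data region. The following is a minimal sketch under that assumption; the function name scale_then_shift and its parameters are hypothetical and only serve as illustration.

```c
#include <stddef.h>

// Hypothetical helper: scales x[0:n] and then shifts it, using two kernels.
// The enclosing target data region keeps x[] on the device between them.
void scale_then_shift(double *x, size_t n, double factor, double offset) {
    // One host-to-device copy on entry, one device-to-host copy on exit
    #pragma omp target data map(tofrom: x[0:n])
    {
        // First kernel: x[] is already present on the device
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++) {
            x[i] *= factor;
        }

        // Second kernel: reuses the device copy, no transfer in between
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++) {
            x[i] += offset;
        }
    }
}
```

Without the enclosing region, each kernel would need its own map clause for x[] and would copy the array to the device and back.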

The host CPU can execute instructions asynchronously to the accelerator. In many cases, however, it is easier to run the computation exclusively on the accelerator and to use the CPU only for auxiliary tasks such as network communication and I/O.
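Asynchronous execution can be expressed with the nowait clause on a target construct and a later taskwait. The sketch below assumes a hypothetical helper, increment_while_host_sums, in which the host sums one array while the device increments another. Whether the two actually overlap depends on the compiler and runtime.

```c
#include <stddef.h>

// Hypothetical helper: increments a[0:n] on the device while the host
// sums b[0:m]. Returns the host-side sum.
double increment_while_host_sums(double *a, size_t n,
                                 const double *b, size_t m) {
    double host_sum = 0.0;

    // nowait makes the target region a deferred task,
    // so the host thread may continue immediately
    #pragma omp target teams distribute parallel for nowait map(tofrom: a[0:n])
    for (size_t i = 0; i < n; i++) {
        a[i] += 1.0;
    }

    // Host work that does not touch a[]
    for (size_t j = 0; j < m; j++) {
        host_sum += b[j];
    }

    // Wait for the deferred target task before a[] is used on the host again
    #pragma omp taskwait

    return host_sum;
}
```

The host must not read or write a[] between the target construct and the taskwait, since the device copy may still be in use.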

The parts of the code that are to be offloaded to an accelerator device are marked with directives of the form #pragma omp target ... (in C and C++; Fortran uses !$omp target ...). The compiler and runtime convert them into code for the chosen accelerator hardware, offload the generated sections to the accelerator during execution, and handle the required memory transfers.
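As an illustration of such a directive, the following sketch offloads a single loop. The function name axpy and its parameters are hypothetical; the map clauses state which arrays are copied to the device and which are copied back.

```c
#include <stddef.h>

// Hypothetical example kernel: y = alpha * x + y
void axpy(double alpha, const double *x, double *y, size_t n) {
    // x[] is only read on the device (to), y[] is read and written (tofrom)
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

The scalar alpha needs no map clause; since OpenMP 4.5 scalars referenced in a target region are firstprivate by default.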

If written properly, OpenMP code remains valid C, C++ or Fortran and yields the same results when run on CPUs or when built with a compiler that does not support OpenMP, since such a compiler ignores the directives.
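One way to keep a source file valid without OpenMP support is to guard every use of the OpenMP runtime library with the _OPENMP macro. The sketch below assumes two hypothetical helpers, device_count and sum_of_squares; it is one possible pattern, not the only one.

```c
#ifdef _OPENMP
    #include <omp.h>
#endif
#include <stddef.h>

// Hypothetical helper: number of available devices, 0 without OpenMP
int device_count(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Hypothetical helper: sum of x[i]^2. Without OpenMP the pragma is
// ignored and the loop runs sequentially on the host.
double sum_of_squares(const double *x, size_t n) {
    double sum = 0.0;

    #pragma omp target teams distribute parallel for \
        reduction(+: sum) map(to: x[0:n]) map(tofrom: sum)
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * x[i];
    }

    return sum;
}
```

For floating-point reductions the summation order on the device may differ from the sequential order, so results can differ in the last digits for general input data.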

Code sample

The following simple C program shows the typical sequence of an OpenMP program that offloads heavy calculations to the accelerator:

#ifdef _OPENMP
    #include <omp.h>
#else
    #error "OpenMP support required"
#endif
#include <stdio.h>
#include <stdlib.h>
#include <err.h>

int main(int argc, char *argv[]) {
    const int size = 1 << 16;

    // Platform information
    printf("OpenMP version (yyyymm): %i\n", _OPENMP);
    printf("Number of OpenMP devices: %i\n", omp_get_num_devices());

    printf("* Allocate memory on the host\n");
    double *a = (double *) malloc(size * sizeof(double));
    if (a == NULL) {
        errx(1, "malloc a[] failed");
    }

    printf("* Preprocess / initialize data on the host\n");
    printf("  e.g. read data from storage\n");
    for (int i = 0; i < size; i++) {
        a[i] = 0.;
    }

    printf("* Copy data from the host to the device\n");
    #pragma omp target enter data map(to: a[0:size])
    {}

    printf("* Compute on the device\n");
    #pragma omp target teams distribute parallel for simd
    for (int i = 0; i < size; i++) {
        a[i]++;
    }

    printf("* Transfer data back from the device to the host\n");
    #pragma omp target update from(a[0:size])
    {}

    printf("* Delete data on the device\n");
    #pragma omp target exit data map(release: a[0:size])
    {}

    printf("* Postprocess data on the host\n");
    printf("  e.g. write data to storage\n");
    for (int i = 0; i < size; i++) {
        if (a[i] != 1.) {
            errx(2, "Computation on GPU failed");
        }
    }

    printf("* Free memory on the host\n");
    free(a);

    return 0;
}
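The final check in the sample verifies the values in a[], but not where they were computed: an OpenMP runtime may fall back to executing target regions on the host if no device is available. A minimal sketch for detecting this, assuming a hypothetical helper named target_region_ran_on_host, could look as follows.

```c
#ifdef _OPENMP
    #include <omp.h>
#endif

// Hypothetical helper: returns 1 if a target region is executed on the
// host (or if the code was compiled without OpenMP), 0 if it is executed
// on an accelerator device.
int target_region_ran_on_host(void) {
    int on_host = 1;  // default when compiled without OpenMP

#ifdef _OPENMP
    // The flag is set inside the target region and copied back to the host
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif

    return on_host;
}
```

Calling this helper once at program start and printing the result makes a silent host fallback visible.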

Compiling OpenMP code

OpenMP programs can be compiled with several compilers. The typical commands for each of them are listed below:

GNU Compiler Collection

## Load GNU compiler environment
$ module add \
    compiler/gnu

## Compile C, C++ or Fortran source code with OpenMP support
$ gcc      ... -fopenmp ${C_SOURCE}   -o ${EXECUTABLE}
$ g++      ... -fopenmp ${CXX_SOURCE} -o ${EXECUTABLE}
$ gfortran ... -fopenmp ${F90_SOURCE} -o ${EXECUTABLE}

LLVM Compiler

## Load LLVM compiler and CUDA environment
$ module add \
    compiler/llvm/ \
    devel/cuda/11.4

## Compile C or C++ source code with OpenMP support
$ clang   ... -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda --cuda-path=${CUDA_PATH} -L${LLVM_LIB_DIR} ${C_SOURCE} -o ${EXECUTABLE}
$ clang++ ... -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda --cuda-path=${CUDA_PATH} -L${LLVM_LIB_DIR} ${CXX_SOURCE} -o ${EXECUTABLE}

NVIDIA High Performance Computing (HPC) SDK

## Load NVIDIA HPC SDK environment
$ module add \
    toolkit/nvidia-hpc-sdk

## Compile C, C++ or Fortran source code with OpenMP support
$ nvc       ... -mp -target=gpu ${C_SOURCE}   -o ${EXECUTABLE}
$ nvc++     ... -mp -target=gpu ${CXX_SOURCE} -o ${EXECUTABLE}
$ nvfortran ... -mp -target=gpu ${F90_SOURCE} -o ${EXECUTABLE}