
OpenACC

OpenACC is a directive-based parallel programming model. The specification supports the C, C++ and Fortran programming languages. For more information and additional resources, visit the OpenACC homepage.

OpenACC fundamentals

The fundamental idea of an OpenACC program is to execute all compute-intensive parts on the accelerator while keeping data transfers between host memory and device memory to a minimum.
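One way to keep transfers low is to wrap several kernels in a single data region. The following is a minimal sketch, not a definitive implementation; the function name `scale_then_shift` and its parameters are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiplies every element of x by `factor` and then
 * adds `offset`, using two device kernels that share one data region. */
void scale_then_shift(double *x, size_t n, double factor, double offset) {
    assert(x != NULL);

    /* x is copied to the device once on entry and back once on exit. */
    #pragma acc data copy(x[0:n])
    {
        /* First kernel: works on the device copy of x. */
        #pragma acc parallel loop present(x[0:n])
        for (size_t i = 0; i < n; i++) {
            x[i] *= factor;
        }

        /* Second kernel: reuses the same device copy, so no transfer
         * between host and device happens between the two loops. */
        #pragma acc parallel loop present(x[0:n])
        for (size_t i = 0; i < n; i++) {
            x[i] += offset;
        }
    }
}
```

Without the enclosing data region, each loop would need its own data clauses and the array could be moved to the device and back twice.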

The host CPU can execute instructions asynchronously to the accelerator. In many cases, however, it is easier to have the computation run exclusively on the accelerator and to use the CPU only for auxiliary tasks such as network communication and I/O.
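Asynchronous execution can be sketched with the `async` clause and the `wait` directive. The function name `increment_async_and_sum` and the choice of queue number 1 are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increments every element of a[] on the device while
 * the host sums b[]. Returns the host-side sum of b[]. */
double increment_async_and_sum(double *a, size_t n, const double *b, size_t m) {
    assert(a != NULL && b != NULL);

    /* Enqueue the kernel and its data movement on async queue 1.
     * The host does not have to wait for completion here. */
    #pragma acc parallel loop copy(a[0:n]) async(1)
    for (size_t i = 0; i < n; i++) {
        a[i] += 1.0;
    }

    /* Host work that does not touch a[] can overlap with the kernel. */
    double sum = 0.0;
    for (size_t j = 0; j < m; j++) {
        sum += b[j];
    }

    /* Block until everything on queue 1 has finished, including the
     * copy of a[] back to the host. */
    #pragma acc wait(1)

    return sum;
}
```

The host must not read or write `a` between the asynchronous launch and the `wait`, because the device may still be working on it.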

The parts of the code that are to be offloaded to an accelerator device are marked with directives of the form #pragma acc .... The compiler and runtime automatically translate them into code for the chosen accelerator hardware, offload the generated sections to the accelerator during execution, and handle the required memory transfers.
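As a minimal illustration of such a directive, the sketch below offloads a single loop. The function name `saxpy` and its signature are hypothetical; the data clauses shown are one reasonable choice, not the only one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: computes y = alpha * x + y element-wise. */
void saxpy(size_t n, float alpha, const float *x, float *y) {
    assert(x != NULL && y != NULL);

    /* copyin: x is only read on the device, so it is not copied back.
     * copy:   y is read and written, so it is copied in and out. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

A compiler that is invoked without OpenACC support ignores the directive and runs the loop on the host, so the same source yields the same result in both cases.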

Code sample

The following simple C program shows the typical sequence of an OpenACC program:

#ifdef _OPENACC
    #include <openacc.h>
#else
    #error "OpenACC support required"
#endif
#include <stdio.h>
#include <stdlib.h>
#include <err.h>

int main(int argc, char *argv[]) {
    const unsigned int size = 1 << 16;

    // Platform information
    acc_device_t devicetype = acc_device_default;
    int num_devices = acc_get_num_devices(devicetype);
    int device_num  = acc_get_device_num(devicetype);
    acc_set_device_num(device_num, devicetype);
    // 201306 -> 2.0
    // 201510 -> 2.5
    // 201711 -> 2.6
    printf("OpenACC version (yyyymm): %i\n", _OPENACC);
    printf("Number of OpenACC devices: %i\n", num_devices);
    printf("OpenACC Device number: %i\n", device_num);
    // acc_get_property, acc_get_property_string introduced with OpenACC 2.6
    #if _OPENACC >= 201711
    // acc_get_property returns size_t, so store and print it as such
    size_t      mem             = acc_get_property(       device_num, acc_device_current, acc_property_memory);
    size_t      free_mem        = acc_get_property(       device_num, acc_device_current, acc_property_free_memory);
    const char *property_name   = acc_get_property_string(device_num, acc_device_current, acc_property_name);
    const char *property_vendor = acc_get_property_string(device_num, acc_device_current, acc_property_vendor);
    const char *property_driver = acc_get_property_string(device_num, acc_device_current, acc_property_driver);
    printf("Memory on OpenACC device: %zu\n", mem);
    printf("Free Memory on OpenACC device: %zu\n", free_mem);
    if (property_name != NULL) {
        printf("OpenACC device name: %s\n", property_name);
    }
    if (property_vendor != NULL) {
        printf("OpenACC device vendor: %s\n", property_vendor);
    }
    if (property_driver != NULL) {
        printf("OpenACC device driver: %s\n", property_driver);
    }
    #endif

    printf("* Allocate memory on the host\n");
    double *a = (double *) malloc(size * sizeof(double));
    if (a == NULL) {
        errx(1, "malloc a[] failed");
    }

    printf("* Preprocess / initialize data on the host\n");
    printf("  e.g. read data from storage\n");
    for (unsigned int i = 0; i < size; i++) {
        a[i] = 0.;
    }

    printf("* Copy data from the host to the device\n");
    #pragma acc enter data copyin(a[0:size])
    {}

    printf("* Compute on the device\n");
    #pragma acc parallel loop present(a[0:size]) wait
    for (unsigned int i = 0; i < size; i++) {
        a[i]++;
    }

    printf("* Transfer data back from the device to the host\n");
    #pragma acc update host(a[0:size])
    {}

    printf("* Delete data on the device\n");
    #pragma acc exit data delete(a[0:size])
    {}

    printf("* Postprocess data on the host\n");
    printf("  e.g. write data to storage\n");
    for (unsigned int i = 0; i < size; i++) {
        if (a[i] != 1.) {
            errx(2, "Computation on GPU failed");
        }
    }

    printf("* Free memory on the host\n");
    free(a);

    return 0;
}
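
The program above transfers the result with `update host` and then frees the device copy with `exit data delete`. When the device copy is no longer needed after the transfer, the two steps can be combined with a `copyout` clause on `exit data`. The following is a sketch under that assumption; the function name `increment_on_device` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds 1.0 to every element of a[] on the device. */
void increment_on_device(double *a, size_t n) {
    assert(a != NULL);

    /* Allocate a[] on the device and copy the host values to it. */
    #pragma acc enter data copyin(a[0:n])

    /* Compute on the device copy. */
    #pragma acc parallel loop present(a[0:n])
    for (size_t i = 0; i < n; i++) {
        a[i] += 1.0;
    }

    /* Copy the result back to the host and release the device copy
     * in a single step. */
    #pragma acc exit data copyout(a[0:n])
}
```

Keeping `update host` separate, as in the program above, is the better fit when the data should stay on the device for further kernels.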

Compiling OpenACC code

OpenACC programs can be compiled with several compilers. In each case, load the corresponding environment module and pass the compiler's OpenACC flag:

  • GNU Compiler Collection

    ## Load GNU compiler environment
    $ module add compiler/gnu
    
    ## Compile C, C++ or Fortran source code with OpenACC support
    $ gcc      ... -fopenacc   ${C_SOURCE} -o ${EXECUTABLE}
    $ g++      ... -fopenacc ${CXX_SOURCE} -o ${EXECUTABLE}
    $ gfortran ... -fopenacc ${F90_SOURCE} -o ${EXECUTABLE}
    
  • NVIDIA High Performance Computing (HPC) SDK

    ## Load NVIDIA HPC SDK environment
    $ module add toolkit/nvidia-hpc-sdk
    
    ## Compile C, C++ or Fortran source code with OpenACC support
    $ nvc       ... -acc -target=gpu   ${C_SOURCE} -o ${EXECUTABLE}
    $ nvc++     ... -acc -target=gpu ${CXX_SOURCE} -o ${EXECUTABLE}
    $ nvfortran ... -acc -target=gpu ${F90_SOURCE} -o ${EXECUTABLE}
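
To check whether the OpenACC flag took effect for a given build, the `_OPENACC` macro can be inspected at compile time. The sketch below assumes nothing beyond that macro; the function names `openacc_version` and `report_openacc_build` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the value of the _OPENACC macro (yyyymm),
 * or 0 if this file was compiled without OpenACC support. */
long openacc_version(void) {
#ifdef _OPENACC
    return (long) _OPENACC;
#else
    return 0L;
#endif
}

/* Hypothetical helper: writes a one-line build report to `out`. */
void report_openacc_build(FILE *out) {
    assert(out != NULL);

    long version = openacc_version();
    if (version > 0) {
        fprintf(out, "Built with OpenACC, _OPENACC = %ld\n", version);
    } else {
        fprintf(out, "Built without OpenACC; directives are ignored\n");
    }
}
```

Calling `report_openacc_build(stdout)` at program start is one way to spot a missing compiler flag before a long run.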
    

Last update: November 30, 2022