[Glean] CUDA C++ Function Execution Space

CUDA C++ Function Execution Space [1]

__global__

The __global__ execution space specifier declares a function as being a kernel. Such a function is:

  • Executed on the device,
  • Callable from the host,
  • Callable from the device for devices of compute capability 5.0 or higher (through CUDA Dynamic Parallelism).

__device__

The __device__ execution space specifier declares a function that is:

  • Executed on the device,
  • Callable from the device only.

__host__

The __host__ execution space specifier declares a function that is:

  • Executed on the host,
  • Callable from the host only.

A function declared without any execution space specifier is compiled for the host only, so __host__ is the implicit default.

__host__ __device__

Using the __host__ and __device__ execution space specifiers together declares a function that is compiled for both the host and the device. Such a function is:

  • Executed on the host or the device,
  • Callable from the host or the device.
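
Because a __host__ __device__ function is compiled twice, its body can branch on the __CUDA_ARCH__ macro, which is defined only while the device version is being compiled. The following is a minimal sketch of that idea, not a definitive pattern. The names square, running_on_device, and square_kernel are hypothetical, and the #ifndef __CUDACC__ guard is an assumed portability idiom that lets the same file build with a plain C++ compiler when nvcc is not available.

```cpp
#include <cstdio>

// Assumption: when this file is built without nvcc, the CUDA specifiers are
// undefined, so map them to nothing. This is a portability idiom, not a
// requirement of the CUDA toolkit.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: one source definition, compiled for host and device.
__host__ __device__ inline float square(float x) {
    return x * x;
}

// Hypothetical helper: reports which build of the function is running.
// __CUDA_ARCH__ is defined only during device compilation.
__host__ __device__ inline bool running_on_device() {
#ifdef __CUDA_ARCH__
    return true;
#else
    return false;
#endif
}

#ifdef __CUDACC__
// Hypothetical kernel that calls the device builds of both helpers.
__global__ void square_kernel(float x) {
    printf("square(%f) = %f, on device: %d\n", x, square(x),
           static_cast<int>(running_on_device()));
}
#endif
```

Called from host code, square(3.0f) and running_on_device() use the host build. Launching square_kernel<<<1, 1>>>(3.0f) and then calling cudaDeviceSynchronize() exercises the device build of the same two functions.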

Example

#include <cstdio>

__host__ void host_function() {
    printf("Hello from the host!\n");
}

__device__ void device_function() {
    printf("Hello from the device function!\n");
}

__host__ __device__ void host_device_function() {
    printf("Hello from the host or the device!\n");
}

__global__ void kernel() {
    printf("Hello from the kernel!\n");
    // A __device__ function cannot be launched with <<<...>>>;
    // it can only be called from code that already runs on the device.
    device_function();
    host_device_function(); // Runs the device version
}

int main() {
    host_function(); // Call the host function
    kernel<<<1, 1>>>(); // Launch the kernel on the device
    cudaDeviceSynchronize(); // Wait for the kernel to finish
    host_device_function(); // Runs the host version
    return 0;
}

Summary

We summarize the function execution spaces in CUDA C++ in the following table:

Execution Space      Description     Callable from Host  Callable from Device  Executed on Host  Executed on Device
__global__           Kernel          Yes                 Yes                   No                Yes
__device__           Device          No                  Yes                   No                Yes
__host__             Host            Yes                 No                    Yes               No
__host__ __device__  Host or Device  Yes                 Yes                   Yes               Yes

References