[Glean] CUDA C++ Function Execution Space

Published: April 02, 2024 by SingularityKChen (Last updated: June 27, 2024)

Categories: Glean

CUDA C++ Function Execution Space¹

`__global__`

The __global__ execution space specifier declares a function as being a kernel. Such a function is:

Executed on the device,
Callable from the host,
Callable from the device for devices of compute capability 5.0 or higher.
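
As a rough illustration of the properties above, the following sketch launches a kernel from the host and reads its result back. The kernel name `fill_with_index`, the array size, and the launch configuration are made up for this example, and error handling is kept minimal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A kernel must return void, so results go out through a device pointer.
__global__ void fill_with_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        out[i] = i;
    }
}

int main() {
    const int n = 8;  // illustrative problem size
    int host_out[n] = {0};
    int *dev_out = nullptr;

    if (cudaMalloc(&dev_out, n * sizeof(int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Host-side launch: one block of n threads. Control returns to the host
    // without waiting for the kernel to finish.
    fill_with_index<<<1, n>>>(dev_out, n);
    cudaError_t err = cudaGetLastError();  // reports launch errors, if any

    // A blocking cudaMemcpy on the default stream runs after the kernel completes.
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_out, dev_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_out);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        std::printf("%d ", host_out[i]);
    }
    std::printf("\n");
    return 0;
}
```

Assuming the file is saved under a name such as `fill.cu`, it can be built with `nvcc fill.cu -o fill`. Because the launch is asynchronous, the host only sees the kernel's output after a synchronizing call, which here is the blocking `cudaMemcpy`.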

`__device__`

The __device__ execution space specifier declares a function that is:

Executed on the device,
Callable from the device only.

`__host__`

The __host__ execution space specifier declares a function that is:

Executed on the host,
Callable from the host only.

`__host__ __device__`

Using the __host__ and __device__ execution space specifiers together declares a function that is compiled for both the host and the device. Such a function is:

Executed on the host or the device,
Callable from the host or the device.
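
One practical consequence is that a single helper can be shared by host code and kernels. The sketch below assumes two hypothetical helpers, `blend` and `compiled_for_device`, and uses the `__CUDA_ARCH__` macro, which is defined only during the device compilation pass, to tell the two builds apart. The `#ifndef __CUDACC__` guard is an assumption for projects that also build the same file with a plain C++ compiler; it is not needed when everything goes through nvcc.

```cpp
// Assumption: when this file is built without nvcc, define the specifiers
// away so that the same source still compiles as ordinary C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: compiled once for the host and once for the device.
__host__ __device__ inline float blend(float a, float b, float t) {
    return a + t * (b - a);  // linear interpolation between a and b
}

// Hypothetical helper: reports which compilation pass produced this copy.
__host__ __device__ inline bool compiled_for_device() {
#ifdef __CUDA_ARCH__
    return true;   // device pass: __CUDA_ARCH__ is defined
#else
    return false;  // host pass, or a plain C++ build
#endif
}
```

Called from `main()`, `compiled_for_device()` returns false; called from inside a kernel, it returns true. `blend(0.0f, 10.0f, 0.5f)` gives 5.0f on either side.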

Example

#include <cstdio>

__device__ void device_function() {
    printf("Hello from a device function!\n");
}

__host__ __device__ void host_device_function() {
    printf("Hello from the host or the device!\n");
}

__global__ void kernel() {
    printf("Hello from the kernel!\n");
    device_function();       // A __device__ function can only be called from device code
    host_device_function();  // Runs on the device here
}

__host__ void host_function() {
    printf("Hello from the host!\n");
}

int main() {
    host_function();          // Call the host function
    kernel<<<1, 1>>>();       // Launch the kernel on the device
    cudaDeviceSynchronize();  // Wait for the kernel to finish
    host_device_function();   // Runs on the host here
    return 0;
}

Summary

We summarize the function execution spaces in CUDA C++ in the following table:

Execution Space	Description	Callable from Host	Callable from Device	Executed on Host	Executed on Device
`__global__`	Kernel	Yes	Yes (compute capability 5.0 or higher)	No	Yes
`__device__`	Device	No	Yes	No	Yes
`__host__`	Host	Yes	No	Yes	No
`__host__ __device__`	Host or Device	Yes	Yes	Yes	Yes

References

¹ CUDA C++ Programming Guide 12.4
