
Function gpu_launch_sized_workgroup_mem 

pub fn gpu_launch_sized_workgroup_mem<T>() -> *mut T
🔬This is a nightly-only experimental API. (gpu_launch_sized_workgroup_mem #135513)
Available on AMD GPU or NVIDIA GPU only.

Returns the pointer to workgroup memory allocated at launch-time on GPUs.

Workgroup memory is a memory region shared between all threads in the same workgroup. It is faster to access than other memory, but a pointer into it is valid only within the workgroup that obtained it. Workgroup memory can be allocated statically, at compile time, or after compilation, when a GPU kernel is launched. gpu_launch_sized_workgroup_mem returns the pointer to the memory that is allocated at launch time. The size of this memory can differ between launches of the same kernel, depending on what is specified at launch. The alignment, however, is fixed by the kernel itself at compile time.

The returned pointer points to the start of the workgroup memory region that is allocated at launch time. All calls to gpu_launch_sized_workgroup_mem within a workgroup return the same address, regardless of the generic type T, so the returned pointers all alias the same memory. The returned pointer is aligned to at least the alignment of T.
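The aliasing rule can be illustrated on the host with a small model. This is only a sketch: it does not call gpu_launch_sized_workgroup_mem, and the names ModelRegion and model_workgroup_mem, the 16-byte alignment, and the 64-byte size are all assumptions made up for the example. It shows that pointers obtained for different types start at the same address and therefore observe each other's writes.

```rust
use std::mem::align_of;

/// Backing storage for the host-side model.
/// The 16-byte alignment and 64-byte size are arbitrary choices for this sketch.
#[repr(C, align(16))]
struct ModelRegion([u8; 64]);

/// Hypothetical stand-in for `gpu_launch_sized_workgroup_mem::<T>()`:
/// every call returns the start of the same region, reinterpreted as `*mut T`.
fn model_workgroup_mem<T>(base: *mut ModelRegion) -> *mut T {
    // The model only supports types whose alignment the region satisfies.
    assert!(align_of::<T>() <= align_of::<ModelRegion>());
    base.cast::<T>()
}

/// Returns the start address as seen through three different element types.
fn addresses_for_three_types() -> [usize; 3] {
    let mut region = ModelRegion([0; 64]);
    let base: *mut ModelRegion = &mut region;
    [
        model_workgroup_mem::<u8>(base) as usize,
        model_workgroup_mem::<f32>(base) as usize,
        model_workgroup_mem::<u64>(base) as usize,
    ]
}

/// Writes through the `f32` view and reads the same bytes through the `u32` view.
fn write_f32_read_u32(value: f32) -> u32 {
    let mut region = ModelRegion([0; 64]);
    let base: *mut ModelRegion = &mut region;
    let as_f32 = model_workgroup_mem::<f32>(base);
    let as_u32 = model_workgroup_mem::<u32>(base);
    // SAFETY: both pointers are in bounds of `region`, suitably aligned,
    // and only this thread accesses the memory.
    unsafe {
        as_f32.write(value);
        as_u32.read()
    }
}

fn main() {
    let [a, b, c] = addresses_for_three_types();
    assert!(a == b && b == c);
    // The `u32` view sees the bit pattern written through the `f32` view.
    assert_eq!(write_f32_read_u32(1.0), 1.0f32.to_bits());
    println!("all views start at the same address");
}
```

On a GPU the same reasoning means that two parts of a kernel requesting different types still share one buffer, so any partitioning of the region has to be done by the kernel itself with explicit offsets.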

If gpu_launch_sized_workgroup_mem is invoked multiple times with types that have different alignments, you may rely on the resulting pointer having the alignment of T only after a call to gpu_launch_sized_workgroup_mem::<T> has occurred in the current program execution.

§Safety

The memory is safe to access from the start (the returned pointer) up to, but not beyond, the size of workgroup memory that was specified when launching the current GPU kernel. This allocated size is not related in any way to T.
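Because the function returns only a pointer and the allocated size is independent of T, the kernel has to learn the byte count some other way. The sketch below assumes it arrives as a value named launch_bytes (a hypothetical kernel argument, not part of this API) and derives how many whole elements of a given type fit. It runs on the host against an ordinary heap buffer standing in for workgroup memory; elements_that_fit and fill_and_sum are illustrative helpers, not library functions.

```rust
use std::mem::size_of;

/// Number of whole `T` values that fit in `launch_bytes` bytes.
/// `launch_bytes` is a hypothetical value carrying the size chosen at launch.
fn elements_that_fit<T>(launch_bytes: usize) -> usize {
    assert!(size_of::<T>() != 0, "zero-sized types are not meaningful here");
    launch_bytes / size_of::<T>()
}

/// Treats a region of `launch_bytes` bytes as `f32` values, fills it with
/// `value` without going past the end, and returns the sum of the elements.
fn fill_and_sum(launch_bytes: usize, value: f32) -> f32 {
    // Host-side stand-in for the launch-sized region, 8-byte aligned and
    // at least `launch_bytes` long. On a GPU the start pointer would come
    // from the intrinsic instead.
    let mut backing = vec![0u64; (launch_bytes + 7) / 8];
    let start = backing.as_mut_ptr().cast::<f32>();
    let len = elements_that_fit::<f32>(launch_bytes);
    // SAFETY: `start` is aligned for `f32`, `len * 4 <= launch_bytes`, which
    // is within `backing`, and no other reference to the buffer is used
    // while `view` is alive.
    let view = unsafe { std::slice::from_raw_parts_mut(start, len) };
    view.fill(value);
    view.iter().sum()
}

fn main() {
    // The same 48-byte region holds a different element count per type.
    assert_eq!(elements_that_fit::<f32>(48), 12);
    assert_eq!(elements_that_fit::<u64>(48), 6);
    assert_eq!(elements_that_fit::<[u8; 5]>(48), 9);
    assert_eq!(fill_and_sum(48, 0.5), 6.0);
    println!("bounds derived from the launch size, not from T");
}
```

Deriving the element count from the byte size, rather than assuming one from T, keeps every access inside the range that the safety requirement covers.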

The caller must synchronize access to workgroup memory between the threads in a workgroup. The usual data race requirements apply.

§Other APIs

CUDA and HIP call this dynamic shared memory, shared between threads in a block. OpenCL and SYCL call this local memory, shared between threads in a work-group. GLSL calls this shared memory, shared between invocations in a work group. DirectX calls this groupshared memory, shared between threads in a thread-group.