Package | Description |
---|---|
org.bytedeco.cuda.cudart | |
org.bytedeco.cuda.global | |

Modifier and Type | Method and Description |
---|---|
cudaLaunchParams | cudaLaunchParams.args(int i, Pointer setter) |
cudaLaunchParams | cudaLaunchParams.args(PointerPointer setter) |
cudaLaunchParams | cudaLaunchParams.blockDim(dim3 setter) |
cudaLaunchParams | cudaLaunchParams.func(Pointer setter) |
cudaLaunchParams | cudaLaunchParams.getPointer(long i) |
cudaLaunchParams | cudaLaunchParams.gridDim(dim3 setter) |
cudaLaunchParams | cudaLaunchParams.position(long position) |
cudaLaunchParams | cudaLaunchParams.sharedMem(long setter) |
cudaLaunchParams | cudaLaunchParams.stream(CUstream_st setter) |

Modifier and Type | Method and Description |
---|---|
static int | cudart.cudaLaunchCooperativeKernelMultiDevice(cudaLaunchParams launchParamsList, int numDevices) Deprecated. |
static int | cudart.cudaLaunchCooperativeKernelMultiDevice(cudaLaunchParams launchParamsList, int numDevices, int flags) Deprecated.
This function is deprecated as of CUDA 11.3.
Invokes kernels as specified in the \p launchParamsList array where each element
of the array specifies all the parameters required to perform a single kernel launch.
These kernels can cooperate and synchronize as they execute. The size of the array is
specified by \p numDevices.
No two kernels can be launched on the same device. All the devices targeted by this
multi-device launch must be identical. All devices must have a non-zero value for the
device attribute ::cudaDevAttrCooperativeMultiDeviceLaunch.
The same kernel must be launched on all devices. Note that any __device__ or __constant__
variables are independently instantiated on every device. It is the application's
responsibility to ensure these variables are initialized and used appropriately.
The size of the grids as specified in blocks, the size of the blocks themselves and the
amount of shared memory used by each thread block must also match across all launched kernels.
The streams used to launch these kernels must have been created via either ::cudaStreamCreate
or ::cudaStreamCreateWithPriority. The NULL stream, ::cudaStreamLegacy, and
::cudaStreamPerThread cannot be used.
The total number of blocks launched per kernel cannot exceed the maximum number of blocks
per multiprocessor as returned by ::cudaOccupancyMaxActiveBlocksPerMultiprocessor (or
::cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of multiprocessors
as specified by the device attribute ::cudaDevAttrMultiProcessorCount. Since the
total number of blocks launched per device has to match across all devices, the maximum
number of blocks that can be launched per device is limited by the device with the
fewest multiprocessors.
The kernel cannot make use of CUDA dynamic parallelism.
The ::cudaLaunchParams structure is defined as:
where:
- ::cudaLaunchParams::func specifies the kernel to be launched. The same function must
be launched on all devices. For templated functions, pass the function symbol as follows:
func_name<template_arg_0,...,template_arg_N> |
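
The block-count limit described above is simple arithmetic, and a small sketch may make it easier to follow. The class and method names below (`CooperativeLaunchLimits`, `maxBlocksForDevice`, `maxBlocksAcrossDevices`) are hypothetical helpers written in plain Java; they are not part of org.bytedeco.cuda. The input numbers are illustrative only. In a real program the per-multiprocessor figure would presumably come from ::cudaOccupancyMaxActiveBlocksPerMultiprocessor and the multiprocessor count from ::cudaDevAttrMultiProcessorCount.

```java
// Hypothetical helper (not part of the JavaCPP CUDA presets): models the
// block-count limit for a multi-device cooperative launch as described above.
public final class CooperativeLaunchLimits {

    private CooperativeLaunchLimits() {}

    /** Upper bound for one device: active blocks per multiprocessor times multiprocessor count. */
    static int maxBlocksForDevice(int maxActiveBlocksPerSm, int multiProcessorCount) {
        return maxActiveBlocksPerSm * multiProcessorCount;
    }

    /**
     * The grid size must match on every device, so the usable limit is the
     * smallest per-device bound. Both arrays are indexed by device.
     */
    static int maxBlocksAcrossDevices(int[] maxActiveBlocksPerSm, int[] multiProcessorCounts) {
        if (maxActiveBlocksPerSm.length == 0
                || maxActiveBlocksPerSm.length != multiProcessorCounts.length) {
            throw new IllegalArgumentException("expected one entry per device in both arrays");
        }
        int limit = Integer.MAX_VALUE;
        for (int i = 0; i < maxActiveBlocksPerSm.length; i++) {
            limit = Math.min(limit,
                    maxBlocksForDevice(maxActiveBlocksPerSm[i], multiProcessorCounts[i]));
        }
        return limit;
    }

    public static void main(String[] args) {
        // Illustrative values: two devices, 2 active blocks per multiprocessor each,
        // with 80 and 68 multiprocessors respectively.
        int limit = maxBlocksAcrossDevices(new int[] {2, 2}, new int[] {80, 68});
        System.out.println(limit); // prints 136
    }
}
```

With these made-up inputs the first device could hold 160 blocks and the second 136, so a grid of at most 136 blocks per device satisfies the constraint on both.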
Copyright © 2024. All rights reserved.