Uses of Class org.bytedeco.cuda.cudart.cudaLaunchParams (JavaCPP Presets for CUDA 12.9-9.10-1.5.12 API)

Packages that use cudaLaunchParams
Package Description

org.bytedeco.cuda.cudart

org.bytedeco.cuda.global

Uses of cudaLaunchParams in org.bytedeco.cuda.cudart

Methods in org.bytedeco.cuda.cudart that return cudaLaunchParams
Modifier and Type	Method and Description
`cudaLaunchParams`	cudaLaunchParams.`args(int i, Pointer setter)`
`cudaLaunchParams`	cudaLaunchParams.`args(PointerPointer setter)`
`cudaLaunchParams`	cudaLaunchParams.`blockDim(dim3 setter)`
`cudaLaunchParams`	cudaLaunchParams.`func(Pointer setter)`
`cudaLaunchParams`	cudaLaunchParams.`getPointer(long i)`
`cudaLaunchParams`	cudaLaunchParams.`gridDim(dim3 setter)`
`cudaLaunchParams`	cudaLaunchParams.`position(long position)`
`cudaLaunchParams`	cudaLaunchParams.`sharedMem(long setter)`
`cudaLaunchParams`	cudaLaunchParams.`stream(CUstream_st setter)`

Uses of cudaLaunchParams in org.bytedeco.cuda.global

Methods in org.bytedeco.cuda.global with parameters of type cudaLaunchParams
Modifier and Type	Method and Description
`static int`	cudart.`cudaLaunchCooperativeKernelMultiDevice(cudaLaunchParams launchParamsList, int numDevices)` Deprecated.
`static int`	cudart.`cudaLaunchCooperativeKernelMultiDevice(cudaLaunchParams launchParamsList, int numDevices, int flags)` Deprecated. This function is deprecated as of CUDA 11.3.

Invokes kernels as specified in the `launchParamsList` array, where each element of the array specifies all the parameters required to perform a single kernel launch. These kernels can cooperate and synchronize as they execute. The size of the array is specified by `numDevices`.

No two kernels can be launched on the same device. All the devices targeted by this multi-device launch must be identical. All devices must have a non-zero value for the device attribute `cudaDevAttrCooperativeMultiDeviceLaunch`.

The same kernel must be launched on all devices. Note that any `__device__` or `__constant__` variables are independently instantiated on every device. It is the application's responsibility to ensure these variables are initialized and used appropriately.

The size of the grids as specified in blocks, the size of the blocks themselves and the amount of shared memory used by each thread block must also match across all launched kernels.

The streams used to launch these kernels must have been created via either `cudaStreamCreate` or `cudaStreamCreateWithPriority`. The NULL stream, `cudaStreamLegacy` and `cudaStreamPerThread` cannot be used.

The total number of blocks launched per kernel cannot exceed the maximum number of blocks per multiprocessor as returned by `cudaOccupancyMaxActiveBlocksPerMultiprocessor` (or `cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`) times the number of multiprocessors as specified by the device attribute `cudaDevAttrMultiProcessorCount`. Since the total number of blocks launched per device has to match across all devices, the maximum number of blocks that can be launched per device will be limited by the device with the least number of multiprocessors.

The kernel cannot make use of CUDA dynamic parallelism.

The `cudaLaunchParams` structure is defined as: `struct cudaLaunchParams { void *func; dim3 gridDim; dim3 blockDim; void **args; size_t sharedMem; cudaStream_t stream; };` where:

- `cudaLaunchParams::func` specifies the kernel to be launched. The same function must be launched on all devices. For templated functions, pass the function symbol as follows: `func_name<template_arg_0,...,template_arg_N>`
- `cudaLaunchParams::gridDim` specifies the width, height and depth of the grid in blocks. This must match across all kernels launched.
- `cudaLaunchParams::blockDim` is the width, height and depth of each thread block. This must match across all kernels launched.
- `cudaLaunchParams::args` specifies the arguments to the kernel. If the kernel has N parameters, then `cudaLaunchParams::args` should point to an array of N pointers. Each pointer, from `cudaLaunchParams::args[0]` to `cudaLaunchParams::args[N - 1]`, points to the region of memory from which the actual parameter will be copied.
- `cudaLaunchParams::sharedMem` is the dynamic shared-memory size per thread block in bytes. This must match across all kernels launched.
- `cudaLaunchParams::stream` is the handle to the stream to perform the launch in. This cannot be the NULL stream, `cudaStreamLegacy` or `cudaStreamPerThread`.

By default, the kernel won't begin execution on any GPU until all prior work in all the specified streams has completed. This behavior can be overridden by specifying the flag `cudaCooperativeLaunchMultiDeviceNoPreSync`. When this flag is specified, each kernel will only wait for prior work in the stream corresponding to that GPU to complete before it begins execution.

Similarly, by default, any subsequent work pushed in any of the specified streams will not begin execution until the kernels on all GPUs have completed. This behavior can be overridden by specifying the flag `cudaCooperativeLaunchMultiDeviceNoPostSync`. When this flag is specified, any subsequent work pushed in any of the specified streams will only wait for the kernel launched on the GPU corresponding to that stream to complete before it begins execution.
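
The matching and capacity rules stated for `cudaLaunchCooperativeKernelMultiDevice` can be checked before the call is made. The following is a minimal sketch in plain Java and does not use the JavaCPP classes: `CooperativeLaunchCheck`, `LaunchSpec`, `DeviceLimits` and `validate` are hypothetical names introduced only for illustration. It assumes that the per-device occupancy value and multiprocessor count were already queried from the runtime.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check; none of these types belong to JavaCPP or CUDA.
final class CooperativeLaunchCheck {

    /** Plain-Java stand-in for the fields of one cudaLaunchParams entry. */
    record LaunchSpec(int device, int gridX, int gridY, int gridZ,
                      int blockX, int blockY, int blockZ, long sharedMem) {
        long totalBlocks() {
            return (long) gridX * gridY * gridZ;
        }
    }

    /** Limits assumed to have been queried from the runtime beforehand. */
    record DeviceLimits(int maxActiveBlocksPerSm, int multiProcessorCount) {
        long maxCooperativeBlocks() {
            return (long) maxActiveBlocksPerSm * multiProcessorCount;
        }
    }

    /** Returns the violated constraints; an empty list means none were found. */
    static List<String> validate(List<LaunchSpec> specs, List<DeviceLimits> limits) {
        if (specs.isEmpty() || specs.size() != limits.size()) {
            throw new IllegalArgumentException("need one DeviceLimits entry per LaunchSpec");
        }
        List<String> violations = new ArrayList<>();
        Set<Integer> seenDevices = new HashSet<>();
        LaunchSpec first = specs.get(0);
        for (int i = 0; i < specs.size(); i++) {
            LaunchSpec s = specs.get(i);
            // No two kernels can be launched on the same device.
            if (!seenDevices.add(s.device())) {
                violations.add("entry " + i + ": device " + s.device() + " is targeted more than once");
            }
            // gridDim, blockDim and sharedMem must match across all launches.
            if (s.gridX() != first.gridX() || s.gridY() != first.gridY() || s.gridZ() != first.gridZ()) {
                violations.add("entry " + i + ": gridDim differs from entry 0");
            }
            if (s.blockX() != first.blockX() || s.blockY() != first.blockY() || s.blockZ() != first.blockZ()) {
                violations.add("entry " + i + ": blockDim differs from entry 0");
            }
            if (s.sharedMem() != first.sharedMem()) {
                violations.add("entry " + i + ": sharedMem differs from entry 0");
            }
            // Total blocks cannot exceed (max active blocks per SM) x (SM count).
            if (s.totalBlocks() > limits.get(i).maxCooperativeBlocks()) {
                violations.add("entry " + i + ": " + s.totalBlocks() + " blocks exceed the limit of "
                        + limits.get(i).maxCooperativeBlocks());
            }
        }
        return violations;
    }
}
```

The sketch covers only the constraints that can be expressed as plain numbers. Requirements such as identical device models, the stream creation rules and the absence of dynamic parallelism would still need to be verified separately.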

Copyright © 2025. All rights reserved.