Optimization for GPU block size determination
This MR optimizes GPU block sizes such that these are always multiples of the hardware's warp (CUDA) or wavefront (HIP) size.
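The alignment constraint can be illustrated with a minimal sketch. The helper name `round_up_to_warp_multiple` and its default of 32 threads are assumptions made for illustration and are not taken from this MR; 32 is the warp size on current NVIDIA GPUs, while AMD wavefronts are commonly 64 wide.

```python
def round_up_to_warp_multiple(num_threads: int, warp_size: int = 32) -> int:
    """Round a thread count up to the nearest multiple of warp_size.

    Hypothetical helper for illustration only; not part of this MR.
    """
    if num_threads < 1 or warp_size < 1:
        raise ValueError("num_threads and warp_size must be positive")
    # Ceiling division, then scale back up to a multiple of warp_size
    return ((num_threads + warp_size - 1) // warp_size) * warp_size
```

With a warp size of 32, a request for 100 threads becomes 128, so no warp in the block is only partially populated.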
Summarized, this MR covers:
- the options `GpuOptions.omit_range_check`, `GpuOptions.block_size`, and `GpuOptions.warp_size`, together with a newly implemented function for determining default values
- `assume_warp_aligned_block_size`, which assures the compiler that block sizes are aligned with the warp size
- the `fit_block_size` and `trim_block_size` member functions of `DynamicBlockSizeLaunchConfiguration`, which compute block sizes based on a user-defined initial block size and the iteration space
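Deriving a block size from an initial block size and the iteration space could look roughly like the sketch below. It is not the implementation from this MR: the function names `trim_block` and `align_block_to_warp`, their signatures, and the strategy of enlarging only the first dimension are all assumptions chosen for illustration. Hardware limits such as the maximum number of threads per block are not checked.

```python
from math import gcd, prod


def trim_block(block: tuple[int, ...], extents: tuple[int, ...]) -> tuple[int, ...]:
    """Shrink each block dimension so it does not exceed the iteration space.

    Hypothetical helper; not the MR's trim_block_size.
    """
    return tuple(min(b, e) for b, e in zip(block, extents))


def align_block_to_warp(block: tuple[int, ...], warp_size: int = 32) -> tuple[int, ...]:
    """Enlarge the first dimension until the total thread count is a
    multiple of warp_size.

    Hypothetical helper; not the MR's fit_block_size.
    """
    x, *rest = block
    other = prod(rest)  # threads contributed by the remaining dimensions
    # Smallest step for x such that x * other is divisible by warp_size
    step = warp_size // gcd(other, warp_size)
    x_aligned = -(-x // step) * step  # round x up to a multiple of step
    return (x_aligned, *rest)


# Usage: start from a user-defined block size and a small iteration space
initial_block = (16, 16, 1)
iteration_space = (10, 10, 1)
trimmed = trim_block(initial_block, iteration_space)
aligned = align_block_to_warp(trimmed)
```

Trimming alone can leave a thread count that is not a multiple of the warp size, which is why the sketch applies an alignment step afterwards.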