This MR optimizes GPU block sizes so that they are always multiples of the hardware's warp (CUDA) or wavefront (HIP) size.
Summarized, this MR

- touches `GpuOptions.omit_range_check`, `GpuOptions.block_size`, and `GpuOptions.warp_size`, and implements a function for determining default values
- introduces `assume_warp_aligned_block_size`, assuring the compiler that block sizes match the warp size
- adds the `fit_block_size` and `trim_block_size` member functions to `DynamicBlockSizeLaunchConfiguration` for computing block sizes based on a user-defined initial block size and the iteration space
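
To illustrate the idea of deriving a warp-aligned block size from an initial block size and the iteration space, here is a minimal sketch. The helper names `trim_to_iteration_space` and `align_to_warp` and the rounding strategy are assumptions made for illustration only; the actual `fit_block_size` and `trim_block_size` in this MR may behave differently.

```python
def trim_to_iteration_space(block_size, iteration_space):
    # Hypothetical helper (not the MR's API): never use more threads
    # per dimension than there are iterations in that dimension.
    return tuple(min(b, n) for b, n in zip(block_size, iteration_space))


def align_to_warp(block_size, warp_size):
    # Hypothetical helper (not the MR's API): round the x extent up to the
    # next multiple of warp_size, so that the total number of threads in
    # the block is a multiple of warp_size as well.
    x, *rest = block_size
    aligned_x = -(-x // warp_size) * warp_size  # ceiling division
    return (aligned_x, *rest)


initial = (128, 2, 1)   # user-defined initial block size
space = (100, 50, 1)    # extents of the iteration space
trimmed = trim_to_iteration_space(initial, space)
aligned = align_to_warp(trimmed, warp_size=32)  # 32 is the CUDA warp size
print(trimmed, aligned)
```

The sketch does not check the hardware limit on threads per block, and rounding up can leave some threads without work, so a range check inside the kernel would still be needed in that case.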