Optimization for GPU block size determination

@@ -72,6 +72,13 @@ These depend on the type of the launch configuration:
while the `AutomaticLaunchConfiguration` permits no modification and computes grid and block size directly from kernel
parameters,
the `ManualLaunchConfiguration` requires the user to manually specify both grid and block size.
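The distinction between the three launch configuration kinds could be sketched roughly as follows. Note that these class bodies are simplified illustrations of the behavior described above, not the actual pystencils classes; the constructor signatures and the fixed default block size of 256 are assumptions for the sketch.

```python
import math

class AutomaticLaunchConfiguration:
    """Grid and block size are computed directly from kernel parameters;
    the user cannot modify them."""
    def __init__(self, iteration_space: int, block_size: int = 256):
        self._block = block_size
        self._grid = math.ceil(iteration_space / block_size)

    def evaluate(self):
        return self._grid, self._block

class ManualLaunchConfiguration:
    """The user must specify both grid and block size explicitly."""
    def __init__(self):
        self.grid_size = None
        self.block_size = None

    def evaluate(self):
        if self.grid_size is None or self.block_size is None:
            raise ValueError("grid and block size must be set manually")
        return self.grid_size, self.block_size

class DynamicBlockSizeLaunchConfiguration:
    """The user sets a block size; the grid size is computed dynamically
    from it and the iteration space."""
    def __init__(self, iteration_space: int):
        self._space = iteration_space
        self.block_size = 256

    def evaluate(self):
        # Trim the block size against the iteration space to avoid
        # spawning unnecessarily large blocks
        block = min(self.block_size, self._space)
        return math.ceil(self._space / block), block
```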

The `DynamicBlockSizeLaunchConfiguration` permits the user to set a block size which is then used to dynamically compute
the grid size. However, the actual block size being used may differ from the user specification. This may be due to
trimming operations between the original block size and the iteration space to avoid spawning unnecessarily large
blocks. In case the `GpuOptions.use_block_size_fitting` option is set, a block fitting algorithm adapts the original
block size such that it aligns with the warp size given by the hardware for improved performance. The algorithm
`GpuLaunchConfiguration.fit_block_size` incrementally increases the trimmed block size until it is rounded to a multiple
of the warp size while considering the maximum number of threads per block.
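The trimming and fitting steps described above could be sketched as follows. This is a simplified one-dimensional sketch of the idea, not the actual `fit_block_size` implementation; the function names and default values for warp size and maximum threads per block are assumptions.

```python
def trim_block_size(block_size: int, iteration_space: int) -> int:
    # Hypothetical helper: trim the block size against the iteration
    # space to avoid spawning unnecessarily large blocks
    return min(block_size, iteration_space)

def fit_block_size(block_size: int, iteration_space: int,
                   warp_size: int = 32, max_threads: int = 1024) -> int:
    # Incrementally increase the trimmed block size until it is a
    # multiple of the warp size, while respecting the maximum number
    # of threads per block
    size = trim_block_size(block_size, iteration_space)
    while size % warp_size != 0 and size < max_threads:
        size += 1
    return min(size, max_threads)
```

For example, a user-specified block size of 100 over a sufficiently large iteration space would be grown to 128, the next multiple of a warp size of 32.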

The `evaluate` method can only be used from within a Python runtime environment.
When exporting pystencils CUDA kernels for external use in C++ projects,