Optimization for GPU block size determination

@@ -72,6 +72,12 @@ These depend on the type of the launch configuration:
while the `AutomaticLaunchConfiguration` permits no modification and computes grid and block size directly from kernel
parameters,
the `ManualLaunchConfiguration` requires the user to manually specify both grid and block size.
The `DynamicBlockSizeLaunchConfiguration` permits the user to set a block size from which the grid size is then computed dynamically.
However, the block size actually used may differ from the user's specification.
One reason is that the block size is trimmed against the iteration space to avoid spawning unnecessarily large blocks.
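The trimming step can be sketched as follows; `trim_block_size` is a hypothetical helper for illustration, not part of the pystencils API:

```python
def trim_block_size(block_size, iter_space):
    # Clamp each block dimension to the corresponding iteration space
    # extent, so no block is larger than the work it has to cover.
    return tuple(min(b, n) for b, n in zip(block_size, iter_space))

# A (128, 8, 1) block over a (50, 50, 1) iteration space is trimmed
# in the first dimension, where 128 threads would mostly idle.
print(trim_block_size((128, 8, 1), (50, 50, 1)))  # -> (50, 8, 1)
```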
Another factor affecting the actual block size is that block sizes matching the warp size given by the hardware
improve performance. Here, the block size is rounded up to a multiple of the warp size while respecting the
maximum number of threads per block.
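The rounding described above amounts to the following arithmetic; `round_to_warp_multiple` is an illustrative sketch, with the warp size and thread limit of typical NVIDIA hardware as assumed defaults:

```python
def round_to_warp_multiple(threads, warp_size=32, max_threads=1024):
    # Round the thread count up to the nearest multiple of the warp size,
    # without exceeding the hardware's per-block thread limit.
    rounded = ((threads + warp_size - 1) // warp_size) * warp_size
    return min(rounded, max_threads)

# 50 threads round up to two full warps of 32 threads each.
print(round_to_warp_multiple(50))  # -> 64
```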
The `evaluate` method can only be used from within a Python runtime environment.
When exporting pystencils CUDA kernels for external use in C++ projects,