Optimization for GPU block size determination

@@ -72,6 +72,13 @@ These depend on the type of the launch configuration:
while the `AutomaticLaunchConfiguration` permits no modification and computes grid and block size directly from kernel
parameters,
the `ManualLaunchConfiguration` requires the user to manually specify both grid and block size.
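The distinction between the three launch configuration kinds could be sketched roughly as follows. Note that these class bodies are simplified illustrations of the behavior described above, not the actual pystencils classes; the constructor signatures and the fixed default block size of 256 are assumptions for the sketch.

```python
import math

class AutomaticLaunchConfiguration:
    """Grid and block size are computed directly from kernel parameters;
    the user cannot modify them."""
    def __init__(self, iteration_space: int, block_size: int = 256):
        self._block = block_size
        self._grid = math.ceil(iteration_space / block_size)

    def evaluate(self):
        return self._grid, self._block

class ManualLaunchConfiguration:
    """The user must specify both grid and block size explicitly."""
    def __init__(self):
        self.grid_size = None
        self.block_size = None

    def evaluate(self):
        if self.grid_size is None or self.block_size is None:
            raise ValueError("grid and block size must be set manually")
        return self.grid_size, self.block_size

class DynamicBlockSizeLaunchConfiguration:
    """The user sets a block size; the grid size is computed dynamically
    from it and the iteration space."""
    def __init__(self, iteration_space: int):
        self._space = iteration_space
        self.block_size = 256

    def evaluate(self):
        # Trim the block size against the iteration space to avoid
        # spawning unnecessarily large blocks
        block = min(self.block_size, self._space)
        return math.ceil(self._space / block), block
```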

The `DynamicBlockSizeLaunchConfiguration` permits the user to set a block size which is then used to dynamically compute
the grid size. However, the actual block size being used may differ from the user specification. This may be due to
trimming operations between the original block size and the iteration space to avoid spawning unnecessarily large
blocks. In case the `GpuOptions.use_block_size_fitting` option is set, a block fitting algorithm adapts the original
block size such that it aligns with the warp size given by the hardware for improved performance. The algorithm
`GpuLaunchConfiguration.fit_block_size` incrementally increases the trimmed block size until it is rounded to a multiple
of the warp size while considering the maximum number of threads per block.
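The trimming and fitting steps described above could be sketched as follows. This is a simplified one-dimensional sketch of the idea, not the actual `fit_block_size` implementation; the function names and default values for warp size and maximum threads per block are assumptions.

```python
def trim_block_size(block_size: int, iteration_space: int) -> int:
    # Hypothetical helper: trim the block size against the iteration
    # space to avoid spawning unnecessarily large blocks
    return min(block_size, iteration_space)

def fit_block_size(block_size: int, iteration_space: int,
                   warp_size: int = 32, max_threads: int = 1024) -> int:
    # Incrementally increase the trimmed block size until it is a
    # multiple of the warp size, while respecting the maximum number
    # of threads per block
    size = trim_block_size(block_size, iteration_space)
    while size % warp_size != 0 and size < max_threads:
        size += 1
    return min(size, max_threads)
```

For example, a user-specified block size of 100 over a sufficiently large iteration space would be grown to 128, the next multiple of a warp size of 32.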

The `evaluate` method can only be used from within a Python runtime environment.
When exporting pystencils CUDA kernels for external use in C++ projects,