Optimization for GPU block size determination

@@ -72,6 +72,12 @@ These depend on the type of the launch configuration:
while the `AutomaticLaunchConfiguration` permits no modification and computes grid and block size directly from kernel
parameters,
the `ManualLaunchConfiguration` requires the user to manually specify both grid and block size.
The `DynamicBlockSizeLaunchConfiguration` permits the user to set a block size from which the grid size is then computed dynamically.
However, the block size actually used may differ from the user's specification.
One reason is that the block size is trimmed against the iteration space to avoid spawning unnecessarily large blocks.
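The trimming step can be sketched as follows; `trim_block_size` is a hypothetical helper for illustration, not part of the pystencils API:

```python
def trim_block_size(block_size, iter_space):
    # Clamp each block dimension to the corresponding iteration space
    # extent, so no block is larger than the work it has to cover.
    return tuple(min(b, n) for b, n in zip(block_size, iter_space))

# A (128, 8, 1) block over a (50, 50, 1) iteration space is trimmed
# in the first dimension, where 128 threads would mostly idle.
print(trim_block_size((128, 8, 1), (50, 50, 1)))  # -> (50, 8, 1)
```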
Another factor affecting the actual block size is that block sizes matching the warp size given by the hardware
improve performance. Here, the block size is rounded up to a multiple of the warp size while respecting the
maximum number of threads per block.
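The rounding described above amounts to the following arithmetic; `round_to_warp_multiple` is an illustrative sketch, with the warp size and thread limit of typical NVIDIA hardware as assumed defaults:

```python
def round_to_warp_multiple(threads, warp_size=32, max_threads=1024):
    # Round the thread count up to the nearest multiple of the warp size,
    # without exceeding the hardware's per-block thread limit.
    rounded = ((threads + warp_size - 1) // warp_size) * warp_size
    return min(rounded, max_threads)

# 50 threads round up to two full warps of 32 threads each.
print(round_to_warp_multiple(50))  # -> 64
```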
The `evaluate` method can only be used from within a Python runtime environment.
When exporting pystencils CUDA kernels for external use in C++ projects,