Commit 8af40ae4 authored by Frederik Hennig

Clarify some doc comments; clarify launch grid specification

parent fa7860cd
1 merge request: !430 Jupyter Inspection Framework, Book Theme, and Initial Drafts for Codegen Reference Guides
Pipeline #70933 passed
......@@ -78,19 +78,26 @@ kfunc(f=f_arr, g=g_arr)
### Modifying the Launch Grid
The `kernel.compile()` invocation in the above code produces a {any}`CupyKernelWrapper` callable object.
-Its interface allows us to customize the GPU launch grid.
-We can manually set both the number of threads per block, and the number of blocks on the grid:
+This object holds the kernel's launch grid configuration
+(i.e. the number of thread blocks, and the number of threads per block.)
+Pystencils specifies a default value for the block size and if possible,
+the number of blocks is automatically inferred in order to cover the entire iteration space.
+In addition, the wrapper's interface allows us to customize the GPU launch grid,
+by manually setting both the number of threads per block, and the number of blocks on the grid:
```{code-cell} ipython3
kfunc.block_size = (16, 8, 8)
kfunc.num_blocks = (1, 2, 2)
```
-In most cases, the number of blocks is automatically inferred from the block size
-in order to cover the entire iteration space, so it does not need to be specified.
-Setting a launch grid that is larger than the iteration space is also possible,
-but will cause any threads working outside of the iteration bounds to idle.
+For most kernels, setting only the `block_size` is sufficient since pystencils will
+automatically compute the number of blocks;
+for exceptions to this, see [](#manual_launch_grids).
+If `num_blocks` is set manually and the launch grid thus specified is too small, only
+a part of the iteration space will be traversed by the kernel;
+similarly, if it is too large, it will cause any threads working outside of the iteration bounds to idle.
(manual_launch_grids)=
### Manual Launch Grids and Non-Cuboid Iteration Patterns
In some cases, it will be unavoidable to set the launch grid size manually;
......
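Reviewer note: the relationship between block size, number of blocks, and iteration space described in the hunk above can be illustrated with a small, library-independent sketch. The helper names `infer_num_blocks` and `grid_report` are hypothetical and are not part of pystencils. The sketch assumes a cuboid iteration space, one thread per iteration point, and the same axis order for all tuples; how pystencils actually infers the grid is not shown in this commit.

```python
from math import prod


def infer_num_blocks(iteration_shape, block_size):
    """Smallest number of blocks per axis whose threads cover the iteration space."""
    # Ceiling division per axis: -(-n // b) equals ceil(n / b) for positive integers.
    return tuple(-(-n // b) for n, b in zip(iteration_shape, block_size))


def grid_report(iteration_shape, block_size, num_blocks):
    """Count idle threads and untraversed points for a manually chosen launch grid."""
    # Threads launched along each axis.
    launched = [b * g for b, g in zip(block_size, num_blocks)]
    # Iteration points that actually receive a thread.
    covered = prod(min(t, n) for t, n in zip(launched, iteration_shape))
    return {
        "idle_threads": prod(launched) - covered,
        "untraversed_points": prod(iteration_shape) - covered,
    }


shape = (60, 30, 30)   # hypothetical iteration space
block = (16, 8, 8)     # block size used in the documentation example

auto_grid = infer_num_blocks(shape, block)
print(auto_grid, grid_report(shape, block, auto_grid))

# The manual grid from the documentation example is too small for this shape,
# so part of the iteration space is left untraversed.
print(grid_report(shape, block, (1, 2, 2)))
```

A grid that is too large shows up as a nonzero `idle_threads` count, a grid that is too small as a nonzero `untraversed_points` count; both can occur at once if the grid is too large along one axis and too small along another.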
......@@ -33,7 +33,12 @@ class _AUTO_TYPE:
AUTO = _AUTO_TYPE()
-"""Special value that can be passed to some options for invoking automatic behaviour."""
+"""Special value that can be passed to some options for invoking automatic behaviour.
+Currently, these options permit `AUTO`:
+- `ghost_layers <CreateKernelConfig.ghost_layers>`
+"""
@dataclass
......
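Reviewer note: the `AUTO` value documented in the hunk above follows the common sentinel-object pattern. The sketch below shows how such a sentinel is typically consumed. `DemoConfig` and `resolve_ghost_layers` are hypothetical names introduced for illustration only; they do not reproduce `CreateKernelConfig` or any pystencils logic.

```python
from __future__ import annotations

from dataclasses import dataclass


class _AutoType:
    """Hypothetical stand-in for the `_AUTO_TYPE` class shown in the diff."""

    def __repr__(self) -> str:
        return "AUTO"


# A single module-level instance, compared by identity (`is AUTO`).
AUTO = _AutoType()


@dataclass
class DemoConfig:
    # Either an explicit integer, or AUTO to request automatic behaviour.
    ghost_layers: int | _AutoType = AUTO


def resolve_ghost_layers(cfg: DemoConfig, inferred: int) -> int:
    """Return the explicit setting, or the inferred value if AUTO was requested."""
    if cfg.ghost_layers is AUTO:
        return inferred
    return cfg.ghost_layers


print(resolve_ghost_layers(DemoConfig(), inferred=2))
print(resolve_ghost_layers(DemoConfig(ghost_layers=0), inferred=2))
```

Using a dedicated sentinel rather than `None` keeps "decide automatically" distinct from any meaning `None` or `0` may already carry for an option.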
......@@ -87,7 +87,7 @@ class Target(Flag):
"""
GPU = CUDA
-"""Alias for backward compatibility."""
+"""Alias for `Target.CUDA`, for backward compatibility."""
SYCL = _GPU | _SYCL
"""SYCL kernel target.
......
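Reviewer note: the aliasing behaviour that the clarified docstring refers to is standard for Python's `enum.Flag`. The sketch below uses a hypothetical `DemoTarget` enum with made-up bit names; only the shape of the `GPU = CUDA` and `SYCL = ... | ...` definitions is taken from the diff, and the real `Target` members may be defined differently.

```python
from enum import Flag, auto


class DemoTarget(Flag):
    # Hypothetical single-bit flags; the real Target enum may differ.
    GPU_BIT = auto()
    CUDA_BIT = auto()
    SYCL_BIT = auto()

    # Composite targets built from the bits above.
    CUDA = GPU_BIT | CUDA_BIT
    # Same value as CUDA, so the enum machinery makes GPU an alias of CUDA.
    GPU = CUDA
    SYCL = GPU_BIT | SYCL_BIT


# An alias is the very same member object, not a copy.
print(DemoTarget.GPU is DemoTarget.CUDA)

# Flag membership tests check whether the bits are contained.
print(DemoTarget.GPU_BIT in DemoTarget.SYCL)
print(DemoTarget.CUDA_BIT in DemoTarget.SYCL)
```

Because an alias resolves to the original member, code written against the old name keeps working while new code can use the more specific one.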