Commit 8af40ae4 authored by Frederik Hennig

Clarify some doc comments; clarify launch grid specification

parent fa7860cd
1 merge request: !430 Jupyter Inspection Framework, Book Theme, and Initial Drafts for Codegen Reference Guides
Pipeline #70933 passed
@@ -78,19 +78,26 @@ kfunc(f=f_arr, g=g_arr)
 ### Modifying the Launch Grid
 The `kernel.compile()` invocation in the above code produces a {any}`CupyKernelWrapper` callable object.
-Its interface allows us to customize the GPU launch grid.
-We can manually set both the number of threads per block, and the number of blocks on the grid:
+This object holds the kernel's launch grid configuration
+(i.e. the number of thread blocks, and the number of threads per block.)
+Pystencils specifies a default value for the block size and if possible,
+the number of blocks is automatically inferred in order to cover the entire iteration space.
+In addition, the wrapper's interface allows us to customize the GPU launch grid,
+by manually setting both the number of threads per block, and the number of blocks on the grid:
 ```{code-cell} ipython3
 kfunc.block_size = (16, 8, 8)
 kfunc.num_blocks = (1, 2, 2)
 ```
-In most cases, the number of blocks is automatically inferred from the block size
-in order to cover the entire iteration space, so it does not need to be specified.
-Setting a launch grid that is larger than the iteration space is also possible,
-but will cause any threads working outside of the iteration bounds to idle.
+For most kernels, setting only the `block_size` is sufficient since pystencils will
+automatically compute the number of blocks;
+for exceptions to this, see [](#manual_launch_grids).
+If `num_blocks` is set manually and the launch grid thus specified is too small, only
+a part of the iteration space will be traversed by the kernel;
+similarily, if it is too large, it will cause any threads working outside of the iteration bounds to idle.
+(manual_launch_grids)=
 ### Manual Launch Grids and Non-Cuboid Iteration Patterns
 In some cases, it will be unavoidable to set the launch grid size manually;
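The updated docs say that pystencils infers the number of blocks so the launch grid covers the whole iteration space, and that a manually set grid which is too small skips part of the space while one that is too large leaves threads idle. Below is a minimal sketch of that arithmetic, assuming a simple per-dimension ceiling division. `infer_num_blocks`, `covers`, `idle_threads`, and `uncovered_points` are hypothetical helpers written for illustration only, not part of the pystencils API, and the iteration space `(20, 16, 16)` is an arbitrary example.

```python
import math


def infer_num_blocks(iteration_space, block_size):
    # Hypothetical helper (not pystencils API): the smallest grid whose blocks
    # cover the iteration space, i.e. ceil(extent / block extent) per dimension.
    if len(iteration_space) != len(block_size):
        raise ValueError("iteration space and block size must have the same rank")
    return tuple((n + b - 1) // b for n, b in zip(iteration_space, block_size))


def _grid_extent(block_size, num_blocks):
    # Number of threads launched along each dimension.
    return tuple(b * g for b, g in zip(block_size, num_blocks))


def _reached_points(iteration_space, extent):
    # Iteration points that have a thread assigned to them.
    return math.prod(min(n, e) for n, e in zip(iteration_space, extent))


def covers(iteration_space, block_size, num_blocks):
    # True if every point of the iteration space is reached by some thread.
    extent = _grid_extent(block_size, num_blocks)
    return all(e >= n for n, e in zip(iteration_space, extent))


def idle_threads(iteration_space, block_size, num_blocks):
    # Threads launched outside the iteration bounds; they do no useful work.
    extent = _grid_extent(block_size, num_blocks)
    return math.prod(extent) - _reached_points(iteration_space, extent)


def uncovered_points(iteration_space, block_size, num_blocks):
    # Iteration points skipped because the grid is too small.
    extent = _grid_extent(block_size, num_blocks)
    return math.prod(iteration_space) - _reached_points(iteration_space, extent)


space = (20, 16, 16)  # arbitrary example iteration space
block = (16, 8, 8)    # block size from the docs example

auto_grid = infer_num_blocks(space, block)
print(auto_grid)                                    # (2, 2, 2)
print(idle_threads(space, block, auto_grid))        # 3072

manual_grid = (1, 2, 2)                             # num_blocks from the docs example
print(covers(space, block, manual_grid))            # False
print(uncovered_points(space, block, manual_grid))  # 1024
```

With the docs' manual `num_blocks = (1, 2, 2)`, the first dimension spans only 16 of the 20 points, so 1024 points are never visited. The inferred grid covers everything at the cost of 3072 idle threads.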
@@ -33,7 +33,12 @@ class _AUTO_TYPE:
 AUTO = _AUTO_TYPE()
-"""Special value that can be passed to some options for invoking automatic behaviour."""
+"""Special value that can be passed to some options for invoking automatic behaviour.
+Currently, these options permit `AUTO`:
+- `ghost_layers <CreateKernelConfig.ghost_layers>`
+"""
 @dataclass
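The new docstring records that `AUTO` is a sentinel accepted only by some options, currently `ghost_layers`. Below is a minimal sketch of how such a sentinel is typically consumed. `_AutoType`, `ExampleConfig`, and `resolve_ghost_layers` are hypothetical names, and the sketch does not reproduce how pystencils actually resolves `ghost_layers`. The design point it illustrates is the identity check against a single shared instance, which keeps `AUTO` distinct from ordinary values such as `0` or `None`.

```python
from dataclasses import dataclass


class _AutoType:
    # Hypothetical stand-in for a sentinel type such as `_AUTO_TYPE`.
    def __repr__(self):
        return "AUTO"


AUTO = _AutoType()  # single shared instance, compared by identity


@dataclass
class ExampleConfig:
    # Hypothetical option that accepts either an explicit int or AUTO.
    ghost_layers: object = AUTO


def resolve_ghost_layers(config, inferred):
    # Only the AUTO sentinel triggers the automatic behaviour;
    # any other value, including 0, is taken as given.
    if config.ghost_layers is AUTO:
        return inferred
    return config.ghost_layers


print(resolve_ghost_layers(ExampleConfig(), inferred=1))                # 1
print(resolve_ghost_layers(ExampleConfig(ghost_layers=0), inferred=1))  # 0
```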
@@ -87,7 +87,7 @@ class Target(Flag):
 """
 GPU = CUDA
-"""Alias for backward compatibility."""
+"""Alias for `Target.CUDA`, for backward compatibility."""
 SYCL = _GPU | _SYCL
 """SYCL kernel target.
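The revised docstring spells out that `Target.GPU` is an alias of `Target.CUDA`. In a Python `enum.Flag`, binding a second name to an existing member's value does not create a new member; the second name refers to the same object. The miniature enum below is a hypothetical illustration with made-up bit values, not pystencils' actual `Target` definition.

```python
from enum import Flag


class ExampleTarget(Flag):
    # Hypothetical miniature with arbitrary bit values; not pystencils' Target.
    _GPU = 1
    _CUDA = 2
    _SYCL = 4

    CUDA = _GPU | _CUDA  # value 3
    GPU = CUDA           # same value as CUDA, so an alias, not a new member
    SYCL = _GPU | _SYCL  # value 5


print(ExampleTarget.GPU is ExampleTarget.CUDA)   # True
print(ExampleTarget._GPU in ExampleTarget.SYCL)  # True
```

Because the alias resolves to the very same member, old code that still passes `GPU` behaves exactly as if it passed `CUDA`.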