Commit 51c03215 authored by Frederik Hennig's avatar Frederik Hennig
extend GPU section of composer guide. Fix and clarify some aspects in the API docs and user guide.

parent c8c1a548
Branch: improved_comm
Merge request !24: Extend Support for CUDA and HIP kernel invocations
Pipeline #76138 failed
...@@ -40,6 +40,7 @@ Helper Methods and Builders

.. autoclass:: SfgFunctionSequencer
   :members:
   :inherited-members:

.. autoclass:: SfgNodeBuilder
   :members:
...@@ -50,6 +51,12 @@ Helper Methods and Builders

.. autoclass:: SfgSwitchBuilder
   :members:

.. module:: pystencilssfg.composer.class_composer

.. autoclass:: SfgMethodSequencer
   :members:
   :inherited-members:

Context and Cursor
==================
......
...@@ -12,7 +12,7 @@ different configuration sources:

  the generator script to set some of its configuration options; see [Command-Line Options](#cmdline_options)
- **Project Configuration:** When embedded into a larger project, using a build system such as CMake, generator scripts
  may be configured globally within that project by the use of a *configuration module*.
  Settings specified inside that configuration module are always overridden by the two other configuration sources listed above.
  For details on configuration modules, refer to the guide on [Project and Build System Integration](#guide_project_integration).

(inline_config)=
...@@ -60,14 +60,26 @@ set {any}`cfg.outer_namespace <SfgConfig.outer_namespace>`.

### Code Style and Formatting
Pystencils-sfg gives you some options to affect its output code style.
These are controlled by the options in the {any}`cfg.code_style <CodeStyle>` category.

Furthermore, pystencils-sfg uses `clang-format` to beautify generated code.
The behaviour of the clang-format integration is managed by the
{any}`cfg.clang_format <ClangFormatOptions>` category,
where you can set options to skip or enforce formatting,
or change the formatter binary.
To set the code style used by `clang-format`, either create a `.clang-format` file
in any of the parent folders of your generator script,
or modify the {any}`cfg.clang_format.code_style <ClangFormatOptions.code_style>` option.

:::{seealso}
[Clang-Format Style Options](https://clang.llvm.org/docs/ClangFormatStyleOptions.html)
:::

Clang-format will, by default, sort `#include` statements alphabetically and separate
local and system header includes.
To override this, you can set a custom sorting key for `#include` sorting via
{any}`cfg.code_style.includes_sorting_key <CodeStyle.includes_sorting_key>`.
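For illustration, here is a minimal sketch of such a sorting key in plain Python.
It assumes the key callable receives each include as a string such as `<vector>` or `"my_header.hpp"`;
the exact argument type passed by pystencils-sfg may differ, so treat this purely as a model of the ordering logic:

```python
def includes_sorting_key(header: str) -> tuple[int, str]:
    """Order local ("...") includes before system (<...>) includes,
    alphabetically within each group."""
    is_system = header.startswith("<")
    # Sort first by group (local = 0, system = 1), then by the bare header name
    return (1 if is_system else 0, header.strip('<>"').lower())

includes = ['<vector>', '"my_kernels.hpp"', '<cstdint>', '"util/grid.hpp"']
print(sorted(includes, key=includes_sorting_key))
# → ['"my_kernels.hpp"', '"util/grid.hpp"', '<cstdint>', '<vector>']
```

A key of this shape lets you encode arbitrary grouping policies, since Python's `sorted` compares the returned tuples lexicographically.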
(cmdline_options)=
## Command-Line Options
......
...@@ -283,7 +283,7 @@ The composer gives us access to the default kernel namespace (`<current_namespac

via `sfg.kernels`.
To add a kernel,

- either pass its assignments and the pystencils code generator configuration directly to {any}`kernels.create() <KernelsAdder.create>`,
- or create the kernel separately through {any}`pystencils.create_kernel <pystencils.codegen.create_kernel>` and register it using
  {any}`kernels.add() <KernelsAdder.add>`.
...@@ -344,9 +344,63 @@ cause them to be added to its signature.

We don't want to expose this complexity, but instead hide it by using appropriate data structures.
The next section explains how that is achieved in pystencils-sfg.
#### Mapping Fields to Data Structures

Pystencils kernels operate on n-dimensional contiguous or strided arrays.
There exist many classes with diverse APIs modelling such arrays throughout the scientific
computing landscape, including [Kokkos Views][kokkos_view], [C++ std::mdspan][mdspan],
[SYCL buffers][sycl_buffer], and many framework-specific custom-built classes.
Using the protocols behind {any}`sfg.map_field <SfgBasicComposer.map_field>`,
it is possible to automatically emit code
that extracts the indexing information required by a kernel from any of these classes,
as long as a suitable API reflection is available.

:::{seealso}
[](#field_data_structure_reflection) for instructions on how to set up field API
reflection for a custom nd-array data structure.
:::

Pystencils-sfg natively provides field extraction for a number of C++ STL classes,
such as `std::vector` and `std::span` (for 1D fields) and `std::mdspan`.
Import any of them from `pystencilssfg.lang.cpp.std` and create an instance for a given
field using `.from_field()`.
Then, inside the wrapper function, pass the symbolic field and its associated data structure to
{any}`sfg.map_field <SfgBasicComposer.map_field>`
before calling the kernel:
```{code-cell} ipython3
import pystencils as ps
from pystencilssfg.lang.cpp import std

with SourceFileGenerator() as sfg:
    # Create symbolic fields
    f, g = ps.fields("f, g: double[1D]")

    # Create data structure reflections
    f_vec = std.vector.from_field(f)
    g_span = std.span.from_field(g)

    # Create the kernel
    asm = ps.Assignment(f(0), g(0))
    khandle = sfg.kernels.create(asm, "my_kernel")

    # Create the wrapper function
    sfg.function("call_my_kernel")(
        sfg.map_field(f, f_vec),
        sfg.map_field(g, g_span),
        sfg.call(khandle)
    )
```
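Conceptually, each `sfg.map_field` call emits code that pulls the base pointer, shape, and strides a pystencils kernel expects out of the given container.
The following plain-Python model sketches that extraction for a contiguous, `std::vector`-like 1D container;
the class and function names here are hypothetical and only illustrate the idea, not the actual pystencils-sfg internals:

```python
from dataclasses import dataclass

@dataclass
class FieldExtraction:
    """The indexing information a kernel needs for one field (conceptual model)."""
    data_ptr: int    # base address of the array data
    shape: tuple     # extent in each dimension
    strides: tuple   # element strides in each dimension

def extract_from_vector(vec: list) -> FieldExtraction:
    # A contiguous 1D container: shape is its length, stride is one element.
    # In generated C++ this would read vec.data() and vec.size() instead.
    return FieldExtraction(data_ptr=id(vec), shape=(len(vec),), strides=(1,))

info = extract_from_vector([0.0] * 32)
print(info.shape, info.strides)  # → (32,) (1,)
```

For multidimensional containers such as `std::mdspan`, the same idea applies per dimension, reading extents and strides from the container's accessors.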
## GPU Kernels

Pystencils also allows us to generate kernels for the CUDA and HIP GPU programming models.
This section describes how to generate GPU kernels through pystencils-sfg,
how to invoke them with various launch configurations,
and how GPU execution streams are reflected.

### Generate and Invoke CUDA and HIP Kernels

To generate a kernel targeting either of these, set the
{any}`target <pystencils.codegen.config.CreateKernelConfig.target>`
code generator option to either `Target.CUDA` or `Target.HIP`.
...@@ -364,7 +418,7 @@ with SourceFileGenerator(sfg_config) as sfg:

    cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)

    # Create fields, assemble assignments
    f, g = ps.fields("f, g: double[128, 128]")
    asm = ps.Assignment(f(0), g(0))

    # Register kernel
...@@ -384,61 +438,130 @@ When investigating the generated `.cu` file, you can see that the GPU launch con

This behavior can be changed by modifying options in the {any}`gpu <pystencils.codegen.config.GpuOptions>`
category of the `CreateKernelConfig`.
### Adapting the Launch Configuration

GPU kernel invocations usually require the user to provide a launch grid, defined
by the GPU thread block size and the number of blocks on the grid.
In the simplest case (seen above), pystencils-sfg will emit code that automatically
computes these parameters from the size of the arrays passed to the kernel,
using a default block size defined by pystencils.

The code generator also permits customization of the launch configuration.
You may provide a custom block size to override the default, in which case the
grid size will still be computed by dividing the array sizes by your block size.
Otherwise, you can also take full control of both block and grid size.
Instructions for both cases are given in the following.

#### User-Defined Block Size for Auto-Computed Grid Size

To merely modify the block size argument while still automatically inferring the grid size,
pass a variable or expression of type `dim3` to the `block_size` parameter of `gpu_invoke`.
Pystencils-sfg exposes two versions of `dim3`, which differ primarily in their associated
runtime headers:

- {any}`pystencilssfg.lang.gpu.cuda.dim3 <CudaAPI.dim3>` for CUDA, and
- {any}`pystencilssfg.lang.gpu.hip.dim3 <HipAPI.dim3>` for HIP.

The following snippet selects the correct `dim3` type according to the kernel target;
it then creates a variable of that type and turns it into an argument to the kernel invocation:
```{code-cell} ipython3
:tags: [remove-cell]

target = ps.Target.HIP
cfg = ps.CreateKernelConfig(target=target)
f, g = ps.fields("f, g: double[128, 128]")
asm = ps.Assignment(f(0), g(0))
```

```{code-cell} ipython3
with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)

    # Select dim3 reflection
    match target:
        case ps.Target.CUDA:
            from pystencilssfg.lang.gpu import cuda as gpu_api
        case ps.Target.HIP:
            from pystencilssfg.lang.gpu import hip as gpu_api

    # Create dim3 variable and pass it to kernel invocation
    block_size = gpu_api.dim3(const=True).var("block_size")

    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, block_size=block_size)
    )
```
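As a side note, the automatically inferred grid size follows the usual covering rule:
one block per `block_size`-sized tile of the iteration space, rounded up.
A small plain-Python sketch of that computation (an assumption about the emitted code's behaviour, not a verbatim reproduction of it):

```python
def auto_grid_size(array_shape: tuple, block_size: tuple) -> tuple:
    """Blocks per dimension: ceil(extent / block), so every element is covered."""
    return tuple((extent + block - 1) // block
                 for extent, block in zip(array_shape, block_size))

print(auto_grid_size((128, 128), (16, 16)))  # → (8, 8)
print(auto_grid_size((130, 100), (16, 16)))  # → (9, 7)
```

When extents are not multiples of the block size, the last block in a dimension is only partially filled; the generated kernel guards against out-of-bounds threads accordingly.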
#### Manual Launch Configurations

To take full control of the launch configuration, we must disable its automatic inference
by setting the {any}`gpu.manual_launch_grid <pystencils.codegen.config.GpuOptions.manual_launch_grid>`
code generator option to `True`.
Then, we must pass `dim3` arguments for both `block_size` and `grid_size` to the kernel invocation:

```{code-cell} ipython3
from pystencilssfg.lang.gpu import cuda

with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...

    # Configure for manual launch config
    cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)
    cfg.gpu.manual_launch_grid = True

    # Register kernel
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)

    # Create dim3 variables
    block_size = cuda.dim3(const=True).var("block_size")
    grid_size = cuda.dim3(const=True).var("grid_size")

    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, block_size=block_size, grid_size=grid_size)
    )
```
### Using Streams

CUDA and HIP kernels can be enqueued into streams for concurrent execution.
This is mirrored in pystencils-sfg:
all overloads of `gpu_invoke` take an optional `stream` argument.
The `stream_t` data types of both CUDA and HIP are made available
through the respective API reflections:

- {any}`lang.gpu.cuda.stream_t <CudaAPI.stream_t>` reflects `cudaStream_t`, and
- {any}`lang.gpu.hip.stream_t <HipAPI.stream_t>` reflects `hipStream_t`.

Here is an example that creates a variable of the HIP stream type
and passes it to `gpu_invoke`:

```{code-cell} ipython3
:tags: [remove-cell]

cfg = ps.CreateKernelConfig(target=ps.Target.HIP)
f, g = ps.fields("f, g: double[128, 128]")
asm = ps.Assignment(f(0), g(0))
```

```{code-cell} ipython3
from pystencilssfg.lang.gpu import hip

with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)

    stream = hip.stream_t(const=True).var("stream")

    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, stream=stream)
    )
```
:::{admonition} To Do
- Defining classes, their fields, constructors, and methods
:::
......
...@@ -56,7 +56,7 @@ class CudaAPI(ProvidesGpuRuntimeAPI):

cuda = CudaAPI
"""Alias for `CudaAPI`"""


class HipAPI(ProvidesGpuRuntimeAPI):

...@@ -72,4 +72,4 @@ class HipAPI(ProvidesGpuRuntimeAPI):

hip = HipAPI
"""Alias for `HipAPI`"""