Consolidate codegen and JIT modules.

Merged: Frederik Hennig requested to merge `fhennig/codegen-module` into `v2.0-dev` (all threads resolved).
---
jupytext:
formats: md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
mystnb:
execution_mode: cache
---
```{code-cell} ipython3
:tags: [remove-cell]
import sympy as sp
import pystencils as ps
import numpy as np
import matplotlib.pyplot as plt
```
(guide_gpukernels)=
# CUDA Code Generation for GPUs
Pystencils offers code generation for Nvidia GPUs using the CUDA programming model,
as well as just-in-time compilation and execution of CUDA kernels from within Python
based on the [cupy] library.
This section gives a detailed introduction to the creation of
GPU kernels with pystencils.
## Generate, Compile and Run CUDA Kernels
In order to obtain a CUDA implementation of a symbolic kernel, nothing more is required
than setting the {any}`target <CreateKernelConfig.target>` code generator option to
{any}`Target.CUDA`:
```{code-cell} ipython3
f, g = ps.fields("f, g: float64[3D]")
update = ps.Assignment(f.center(), 2 * g.center())
cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)
kernel = ps.create_kernel(update, cfg)
ps.show_code(kernel)
```
The `kernel` object returned by the code generator in the above snippet is an instance
of the {py:class}`GpuKernelFunction` class.
It extends {py:class}`KernelFunction` with some GPU-specific information.
In particular, it defines the {any}`threads_range <GpuKernelFunction.threads_range>`
property, which tells us with how many threads the kernel expects to be launched:
```{code-cell} ipython3
kernel.threads_range
```
If a GPU is available and [cupy] is installed in the current environment,
the kernel can be compiled and run immediately.
To execute the kernel, a {any}`cupy.ndarray` has to be passed for each field;
this is the GPU analogue to {any}`numpy.ndarray`:
```{code-cell} ipython3
:tags: [raises-exception]
import cupy as cp
rng = cp.random.default_rng(seed=42)
f_arr = rng.random((16, 16, 16))
g_arr = cp.zeros_like(f_arr)
kfunc = kernel.compile()
kfunc(f=f_arr, g=g_arr)
```
### Modifying the Launch Grid
The `kernel.compile()` invocation in the above code produces a {any}`CupyKernelWrapper` callable object.
Its interface allows us to customize the GPU launch grid.
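While the exact wrapper interface for overriding the launch grid is best looked up in the {any}`CupyKernelWrapper` API reference, the underlying arithmetic is simple: the grid size (in blocks) per dimension is the ceiling division of the thread range by the block size, so that the grid of blocks covers all threads. The helper names below are purely illustrative and not part of the pystencils API:

```python
def ceil_div(a: int, b: int) -> int:
    """Smallest integer greater than or equal to a / b."""
    return -(a // -b)

def launch_grid(threads_range, block_size):
    """Compute the number of blocks per dimension needed to cover
    `threads_range` threads with blocks of `block_size` threads.
    Illustrative only; the actual wrapper computes this internally."""
    return tuple(ceil_div(t, b) for t, b in zip(threads_range, block_size))

# A 16x16x16 thread range with 8x8x4 blocks needs a 2x2x4 grid:
print(launch_grid((16, 16, 16), (8, 8, 4)))  # (2, 2, 4)
```

Note that when the thread range is not evenly divisible by the block size, the last block in a dimension is only partially filled; generated kernels guard against such out-of-range threads with a bounds check.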
## API Reference
```{eval-rst}
.. autosummary::
:toctree: autoapi
:nosignatures:
:template: autosummary/recursive_class.rst
pystencils.backend.kernelfunction.GpuKernelFunction
pystencils.backend.jit.gpu_cupy.CupyKernelWrapper
```
:::{admonition} To Do:
- GPU kernels in general: Selecting the CUDA target, compiling and running on cupy arrays
- Setting the launch grid
- Indexing options and iteration spaces
- Fast approximation functions
- Fp16 on GPU
:::
[cupy]: https://cupy.dev "CuPy Homepage"