Consolidate codegen and JIT modules.

Merged: Frederik Hennig requested to merge `fhennig/codegen-module` into `v2.0-dev` (all threads resolved).
---
jupytext:
formats: md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
mystnb:
execution_mode: cache
---
```{code-cell} ipython3
:tags: [remove-cell]
import sympy as sp
import pystencils as ps
import numpy as np
import matplotlib.pyplot as plt
```
(guide_gpukernels)=
# CUDA Code Generation for GPUs
Pystencils offers code generation for Nvidia GPUs using the CUDA programming model,
as well as just-in-time compilation and execution of CUDA kernels from within Python
based on the [cupy] library.
This section gives a detailed introduction to the creation of
GPU kernels with pystencils.
## Generate, Compile and Run CUDA Kernels
In order to obtain a CUDA implementation of a symbolic kernel, nothing more is required
than setting the {any}`target <CreateKernelConfig.target>` code generator option to
{any}`Target.CUDA`:
```{code-cell} ipython3
f, g = ps.fields("f, g: float64[3D]")
update = ps.Assignment(f.center(), 2 * g.center())
cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)
kernel = ps.create_kernel(update, cfg)
ps.show_code(kernel)
```
The `kernel` object returned by the code generator in the above snippet is an instance
of the {py:class}`GpuKernelFunction` class.
It extends {py:class}`KernelFunction` with some GPU-specific information.
In particular, it defines the {any}`threads_range <GpuKernelFunction.threads_range>`
property, which tells us with how many threads the kernel expects to be launched:
```{code-cell} ipython3
kernel.threads_range
```
If a GPU is available and [cupy] is installed in the current environment,
the kernel can be compiled and run immediately.
To execute the kernel, a {any}`cupy.ndarray` has to be passed for each field;
this is the GPU analogue to {any}`numpy.ndarray`:
```{code-cell} ipython3
:tags: [raises-exception]
import cupy as cp
rng = cp.random.default_rng(seed=42)
f_arr = rng.random((16, 16, 16))
g_arr = cp.zeros_like(f_arr)
kfunc = kernel.compile()
kfunc(f=f_arr, g=g_arr)
```
### Modifying the Launch Grid
The `kernel.compile()` invocation in the above code produces a {any}`CupyKernelWrapper` callable object.
Its interface allows us to customize the GPU launch grid.
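While the exact wrapper interface for overriding the launch grid is best looked up in the {any}`CupyKernelWrapper` API reference, the underlying arithmetic is simple: the grid size (in blocks) per dimension is the ceiling division of the thread range by the block size, so that the grid of blocks covers all threads. The helper names below are purely illustrative and not part of the pystencils API:

```python
def ceil_div(a: int, b: int) -> int:
    """Smallest integer greater than or equal to a / b."""
    return -(a // -b)

def launch_grid(threads_range, block_size):
    """Compute the number of blocks per dimension needed to cover
    `threads_range` threads with blocks of `block_size` threads.
    Illustrative only; the actual wrapper computes this internally."""
    return tuple(ceil_div(t, b) for t, b in zip(threads_range, block_size))

# A 16x16x16 thread range with 8x8x4 blocks needs a 2x2x4 grid:
print(launch_grid((16, 16, 16), (8, 8, 4)))  # (2, 2, 4)
```

Note that when the thread range is not evenly divisible by the block size, the last block in a dimension is only partially filled; generated kernels guard against such out-of-range threads with a bounds check.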
## API Reference
```{eval-rst}
.. autosummary::
:toctree: autoapi
:nosignatures:
:template: autosummary/recursive_class.rst
pystencils.backend.kernelfunction.GpuKernelFunction
pystencils.backend.jit.gpu_cupy.CupyKernelWrapper
```
:::{admonition} To Do:
- GPU kernels in general: Selecting the CUDA target, compiling and running on cupy arrays
- Setting the launch grid
- Indexing options and iteration spaces
- Fast approximation functions
- Fp16 on GPU
:::
[cupy]: https://cupy.dev "CuPy Homepage"