small extension to the user guide

6b3f5288 · Frederik Hennig · 72793960 · 6b3f5288 · 6b3f5288
Commit 6b3f5288 authored 4 months ago by Frederik Hennig
--- a/docs/source/user_manual/gpu_kernels.md
+++ b/docs/source/user_manual/gpu_kernels.md
@@ -54,7 +54,19 @@ It extends {py:class}`Kernel` with some GPU-specific information.
 If a GPU is available and [CuPy][cupy] is installed in the current environment,
 the kernel can be compiled and run immediately.
-To execute the kernel, a {any}`cupy.ndarray` has to be passed for each field.
+To execute the kernel, a {any}`cupy.ndarray` has to be passed for each field:
+```{code-cell} ipython3
+:tags: [raises-exception]
+import cupy as cp
+rng = cp.random.default_rng(seed=42)
+f_arr = rng.random((16, 16, 16))
+g_arr = cp.zeros_like(f_arr)
+kfunc = kernel.compile()
+kfunc(f=f_arr, g=g_arr)
+```
 :::{note}
 [CuPy][cupy] is a Python library for numerical computations on GPU arrays,
@@ -69,18 +81,6 @@ and to allocate and manage the data these kernels can be executed on.
 For more information on CuPy, refer to [their documentation][cupy-docs].
 :::
-```{code-cell} ipython3
-:tags: [raises-exception]
-import cupy as cp
-rng = cp.random.default_rng(seed=42)
-f_arr = rng.random((16, 16, 16))
-g_arr = cp.zeros_like(f_arr)
-kfunc = kernel.compile()
-kfunc(f=f_arr, g=g_arr)
-```
 (indexing_and_launch_config)=
 ## Modify the Indexing Scheme and Launch Configuration
@@ -130,6 +130,25 @@ kfunc(f=f_arr, g=g_arr)
 In any case, pystencils will automatically compute the grid size from the shapes of the kernel's array arguments
 and the given thread block size.
+:::{attention}
+Because of the way GPU architectures split thread blocks into warps,
+pystencils maps the kernel's *fastest* spatial coordinate onto the `x` block and thread
+indices, the second-fastest onto `y`, and the slowest coordinate onto `z`.
+This can mean that, when using `cupy` arrays with the default memory layout
+(corresponding to the `"numpy"` field layout specifier),
+the *thread coordinates* and the *spatial coordinates*
+map to each other in *opposite order*; e.g.
+| Spatial Coordinate | Thread Index  |
+|--------------------|---------------|
+| `x` (slowest)      | `threadIdx.z` |
+| `y`                | `threadIdx.y` |
+| `z` (fastest)      | `threadIdx.x` |
+:::
 (manual_launch_grids)=
 ### Manual Launch Grids and Non-Cuboid Iteration Patterns

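Note on the `{attention}` block added to `gpu_kernels.md`: the reversed mapping in its table can be reproduced in plain Python. The sketch below is illustrative and does not use pystencils or CuPy. It assumes a C-contiguous (row-major) array, which is the default layout of a `cupy.ndarray`, and the helper names `c_order_strides` and `thread_index_mapping` are hypothetical.

```python
def c_order_strides(shape):
    """Element strides of a C-contiguous (row-major) array with this shape."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return tuple(strides)


def thread_index_mapping(shape, axis_names=("x", "y", "z")):
    """Hypothetical helper: pair each spatial axis with a thread index.

    The axis with the smallest stride (fastest) gets threadIdx.x,
    the next one threadIdx.y, and the slowest one threadIdx.z.
    Only meant for arrays with up to three dimensions.
    """
    strides = c_order_strides(shape)
    # Order the axes from fastest (smallest stride) to slowest.
    fastest_first = sorted(range(len(shape)), key=lambda axis: strides[axis])
    return {
        axis_names[axis]: "threadIdx." + "xyz"[rank]
        for rank, axis in enumerate(fastest_first)
    }


# Same shape as the arrays in the documentation example.
mapping = thread_index_mapping((16, 16, 16))
print(mapping)
```

For a row-major array the last axis has stride 1, so spatial `z` lands on `threadIdx.x` and spatial `x` on `threadIdx.z`, matching the table in the note.
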
--- a/tests/kernelcreation/test_gpu.py
+++ b/tests/kernelcreation/test_gpu.py
@@ -67,11 +67,7 @@ def test_indexing_options(
                kernel.launch_config.grid_size = (32, 16, 1)
    elif indexing_scheme == "linear3d":
-        kernel.launch_config.block_size = (
+        kernel.launch_config.block_size = (10, 8, 8)
-            10,
-            8,
-            8,
-        )  # must fit the src_arr shape (without ghost layers)
    kernel(src=src_arr, dst=dst_arr)
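
Note on the block size `(10, 8, 8)` used in the `linear3d` branch: the user guide states that the grid size is computed from the array shapes and the thread block size, but this diff does not show how. The sketch below is one plausible reading, not the pystencils implementation. It assumes ceiling division per axis, a row-major array whose last axis maps to the thread `x` index, and no ghost layers. The function name `grid_size_for` and the example shape are hypothetical.

```python
def grid_size_for(shape, block_size):
    """Hypothetical helper: blocks needed per thread axis (x, y, z).

    `shape` is the iteration-space shape in array axis order;
    `block_size` is given in thread-index order (x, y, z).
    """
    # Reverse the shape so the fastest array axis lines up with thread x.
    extents = tuple(reversed(shape))
    # Ceiling division: enough blocks to cover every cell on each axis.
    return tuple(-(-extent // block) for extent, block in zip(extents, block_size))


# Illustrative shape (not taken from the test) with the block size from the diff.
grid = grid_size_for((12, 24, 40), (10, 8, 8))
print(grid)
```

With these assumptions the 40-cell axis is covered by 4 blocks of 10 threads, the 24-cell axis by 3 blocks of 8, and the 12-cell axis by 2 blocks of 8.
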