Fix output of warp-level kernel in reduction user guide
All threads resolved!
+25 −24
@@ -88,8 +88,8 @@ create the kernel object via the {any}`create_kernel` function.
For this example, we assume a kernel configuration for CPU platforms with no optimizations explicitly enabled.
@@ -111,14 +111,14 @@ but will be incorporated in the reduction computation.
Since our reduction result is a single scalar value, it is sufficient to set up an array comprising a singular value.
@@ -128,11 +128,11 @@ Similar to the CPU section, a base variant for NVIDIA GPUs without
The steps for running the generated code on NVIDIA GPUs are identical but the fields and the write-back pointer
@@ -159,17 +159,17 @@ which are not supported yet.
@@ -190,13 +190,14 @@ we employ a block fitting algorithm to obtain a block size that is also optimize