Reduction Support
This MR introduces reductions to pystencils for scalar data types and thus covers #55.
User interface
- Adds reduction assignment classes to the sympyextensions module: AddReductionAssignment, SubReductionAssignment, MulReductionAssignment, MinReductionAssignment, MaxReductionAssignment
These can be used as follows:
import pystencils as ps
r = ps.TypedSymbol("r", "double")
x, y = ps.fields("x, y: double[3D]", layout="fzyx")
assign_dot_prod = ps.AddReductionAssignment(r, x.center() * y.center())
- Alternatively, you can also make use of the reduction_assignment or reduction_assignment_from_str functions:
from pystencils.sympyextensions import reduction_assignment, reduction_assignment_from_str
from pystencils.sympyextensions.reduction import ReductionOp
assign_dot_prod = reduction_assignment(r, ReductionOp.Add, x.center() * y.center())
assign_dot_prod = reduction_assignment_from_str(r, "+", x.center() * y.center())
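As a rough illustration of what the dot-product reduction above computes, the following plain-Python sketch folds the element-wise products into a scalar. It does not use pystencils: SketchOp, COMBINE and reduce_into are hypothetical names, and the operator strings other than "+" are assumptions made for this example.

```python
from enum import Enum
from functools import reduce
import operator


class SketchOp(Enum):
    """Hypothetical stand-in for the five reduction kinds listed above."""
    ADD = "+"
    SUB = "-"
    MUL = "*"
    MIN = "min"
    MAX = "max"


# Binary function that combines the running result with one new value.
COMBINE = {
    SketchOp.ADD: operator.add,
    SketchOp.SUB: operator.sub,
    SketchOp.MUL: operator.mul,
    SketchOp.MIN: min,
    SketchOp.MAX: max,
}


def reduce_into(r, op, values):
    """Fold every value into the scalar r using the given reduction op."""
    return reduce(COMBINE[op], values, r)


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
products = [a * b for a, b in zip(x, y)]

# Selecting the operation by enum member or by string gives the same result.
dot = reduce_into(0.0, SketchOp.ADD, products)
dot_from_str = reduce_into(0.0, SketchOp("+"), products)
largest = reduce_into(float("-inf"), SketchOp.MAX, products)
```

The starting value passed as r plays the role of the initial content of the reduction symbol; the generated kernels perform the same fold over all cells of the iteration space.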
Supported Backends
Generic CPUs
- Add reduction support for OpenMP
SIMD: SSE3, AVX2, AVX512
- Include a generated header file with horizontal operations that apply a binary operation between a scalar variable and a SIMD vector: the SIMD vector is first reduced to a scalar, and the binary operation is then applied to that scalar and the other operand.
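The horizontal operation can be pictured with a small plain-Python emulation. This is only a sketch of the idea, not the generated header: the functions horizontal and vectorized_sum and the fixed lane count are assumptions for illustration.

```python
from functools import reduce
import operator

LANES = 4  # assumed vector width for the sketch (e.g. 4 doubles in AVX2)


def horizontal(op, scalar, vector):
    """Reduce the vector lanes to one scalar, then combine with `scalar`."""
    lane_result = reduce(op, vector)
    return op(scalar, lane_result)


def vectorized_sum(values):
    """Sum `values` with a lane-wise accumulator and one horizontal add."""
    acc = [0.0] * LANES                      # vector of neutral elements
    full = len(values) - len(values) % LANES
    for i in range(0, full, LANES):          # emulated vectorized main loop
        acc = [a + v for a, v in zip(acc, values[i:i + LANES])]
    r = horizontal(operator.add, 0.0, acc)   # collapse lanes into the scalar
    for v in values[full:]:                  # scalar remainder loop
        r += v
    return r


data = [float(i) for i in range(1, 11)]      # 1.0 .. 10.0
total = vectorized_sum(data)
lane_max = horizontal(max, 2.0, [1.0, 5.0, 3.0, 4.0])
```

Keeping a per-lane accumulator and collapsing it once after the loop means the comparatively expensive horizontal step runs only once per kernel invocation rather than once per iteration.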
CUDA
- Employ atomic reduction operations in all threads when the block size does not align with the warp size
- Optimization when the block size aligns with the warp size: perform warp-level reductions and execute the atomic operation only on the first thread of each warp
- Include a header file with manual implementations for atomic operations that are not directly supported for floating point numbers: atomicMul, atomicMax, and atomicMin. These functions make use of a compare-and-swap (CAS) mechanism.
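The two CUDA strategies can be emulated sequentially in plain Python to show the control flow. This is a hedged sketch, not the code from the MR: warp_reduce mimics a shuffle-down tree reduction, Word.cas stands in for a 64-bit atomicCAS, and all names are hypothetical. Because the emulation is single-threaded, the CAS never fails here; on a GPU the loop retries whenever another thread changed the cell in between.

```python
import struct

WARP_SIZE = 32


def warp_reduce(lane_values, op):
    """Emulate a shuffle-down tree reduction; lane 0 ends with the result."""
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Every lane combines its value with the lane `offset` positions up;
        # lanes without such a partner keep their own value.
        vals = [
            op(vals[i], vals[i + offset]) if i + offset < WARP_SIZE else vals[i]
            for i in range(WARP_SIZE)
        ]
        offset //= 2
    return vals[0]


def float_to_bits(f):
    return struct.unpack("<Q", struct.pack("<d", f))[0]


def bits_to_float(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]


class Word:
    """A 64-bit memory cell offering a compare-and-swap primitive."""

    def __init__(self, value):
        self.bits = float_to_bits(value)

    def cas(self, expected, new):
        old = self.bits
        if old == expected:
            self.bits = new
        return old


def atomic_max(cell, value):
    """CAS loop: retry until the update was applied to an unchanged cell."""
    old = cell.bits
    while True:
        assumed = old
        new = float_to_bits(max(bits_to_float(assumed), value))
        old = cell.cas(assumed, new)
        if old == assumed:
            return bits_to_float(old)


threads = [float(i) for i in range(64)]       # two warps worth of values
result = Word(float("-inf"))
for w in range(0, len(threads), WARP_SIZE):
    warp_max = warp_reduce(threads[w:w + WARP_SIZE], max)
    atomic_max(result, warp_max)              # issued by lane 0 only
final = bits_to_float(result.bits)
```

With the warp-level path, the number of atomic operations drops from one per thread to one per warp, which is the motivation for the optimization described above.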
Internal Changes
- Freeze handling for newly introduced ReductionAssignment nodes
- Add PsVecHorizontal vectorization node for conducting a binary operation between a scalar symbol and an extraction of a vector value (obtained by performing a reduction within a vector lane)
- Add dataclass ReductionInfo holding essential information about a reduction (i.e. reduction operation, initial value and the write-back pointer for exporting the reduction result) and create a corresponding lookup table for symbols in KernelCreationContext
- Introduce NumericLimitsFunctions for initializing neutral elements for reductions making use of min/max operations
- Adapt Platform.select_function so that its return type is PsExpression | tuple[tuple[PsStructuralNode, ...], PsAstNode]: it returns either a PsExpression that replaces the function call, or a tuple holding the structural nodes that are added before the replacement together with the PsAstNode that replaces the function call. The structural nodes allow adding preparatory code for the replacement, as needed for the warp-level reductions for GPU platforms
- Add ReductionFunctions.WriteBackToPtr function that is replaced with platform-dependent code in Platform.select_function
- Slightly adapt CPU/GPU JIT modules to support the handling of write-back pointers used for reductions
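To make the bookkeeping described above more concrete, here is a minimal plain-Python sketch of a reduction-info record, a neutral-element lookup, and a per-symbol table. It is not the pystencils implementation: SketchOp, SketchReductionInfo, the field names, the "_wb" pointer naming, and the choice of 0 as the starting value for subtraction are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
import sys


class SketchOp(Enum):
    ADD = "+"
    SUB = "-"
    MUL = "*"
    MIN = "min"
    MAX = "max"


def neutral_element(op):
    """Initial value that leaves the result unchanged under `op`."""
    if op in (SketchOp.ADD, SketchOp.SUB):
        return 0.0                      # assumption: Sub starts from 0 as well
    if op is SketchOp.MUL:
        return 1.0
    if op is SketchOp.MIN:
        return sys.float_info.max       # largest finite double
    return -sys.float_info.max          # lowest finite double, used for MAX


@dataclass(frozen=True)
class SketchReductionInfo:
    op: SketchOp
    init_val: float
    writeback_ptr: str                  # name of the pointer receiving the result


# Lookup table from reduction symbol name to its reduction info.
reduction_data = {}


def add_reduction(symbol_name, op):
    """Register a reduction for a symbol and return its info record."""
    info = SketchReductionInfo(op, neutral_element(op), f"{symbol_name}_wb")
    reduction_data[symbol_name] = info
    return info


info_min = add_reduction("r_min", SketchOp.MIN)
info_mul = add_reduction("r_mul", SketchOp.MUL)
```

The min/max cases are where numeric limits come in: starting a minimum search from the largest representable value guarantees that the first real value replaces it.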
Edited by Richard Angersbach