Reduction Support

Open Richard Angersbach requested to merge rangersbach/reductions into v2.0-dev

This MR introduces reductions to pystencils for scalar data types and thus covers #55.

User interface

  • Adds reduction assignment classes to the sympyextensions module: AddReductionAssignment, SubReductionAssignment, MulReductionAssignment, MinReductionAssignment, MaxReductionAssignment

These can be used as follows:

import pystencils as ps

r = ps.TypedSymbol("r", "double")
x, y = ps.fields("x, y: double[3D]", layout="fzyx")

assign_dot_prod = ps.AddReductionAssignment(r, x.center() * y.center())

  • Alternatively, you can use the reduction_assignment or reduction_assignment_from_str functions:

from pystencils.sympyextensions import reduction_assignment, reduction_assignment_from_str
from pystencils.sympyextensions.reduction import ReductionOp

assign_dot_prod = reduction_assignment(r, ReductionOp.Add, x.center() * y.center())

assign_dot_prod = reduction_assignment_from_str(r, "+", x.center() * y.center())
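
As a point of reference for what such an assignment is meant to compute, here is a minimal plain-Python sketch. It is not pystencils code. It assumes that a reduction folds the right-hand side over all cells into r, starting from an initial value; the helper reference_reduction and the OPS table are hypothetical, and only the "+" spelling is taken from the example above (the other operator strings are assumptions).

```python
import operator
from functools import reduce

# Hypothetical table: operator string -> binary function (illustration only).
# Only "+" appears in the MR text; the other spellings are assumptions.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "min": min,
    "max": max,
}


def reference_reduction(op, init, values):
    """Fold `values` into `init` with the binary operation named by `op`."""
    return reduce(OPS[op], values, init)


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
products = [a * b for a, b in zip(x, y)]  # per-cell right-hand side

dot = reference_reduction("+", 0.0, products)                  # sum of products
largest = reference_reduction("max", float("-inf"), products)  # largest product
```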

Supported Backends

Generic CPUs

  • Add reduction support for OpenMP

SIMD: SSE3, AVX2, AVX512

  • Include a generated header file with horizontal operations that apply a binary operation to a scalar variable and a SIMD vector: the vector is first reduced to a scalar, and the binary operation is then applied to that scalar and the other operand
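
The two steps of such a horizontal operation can be sketched in plain Python. This is only an illustration of the idea, not the generated header: the function name horizontal is hypothetical, a list stands in for a SIMD register, and the sketch is shown for add and max only, since it does not model how subtraction is handled.

```python
import operator
from functools import reduce


def horizontal(op, scalar, lanes):
    """Reduce `lanes` to one scalar with `op`, then combine it with `scalar`."""
    lane_result = reduce(op, lanes)  # step 1: collapse the vector to a scalar
    return op(scalar, lane_result)   # step 2: binary op with the other operand


acc = 1.5
vec = [1.0, 2.0, 3.0, 4.0]  # stands in for one 4-wide SIMD register

h_add = horizontal(operator.add, acc, vec)
h_max = horizontal(max, acc, vec)
```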

CUDA

  • Employ atomic reduction operations in all threads when the block size does not align with the warp size
  • Optimization when the block size aligns with the warp size: perform warp-level reductions and issue the atomic operation only from the first thread of each warp
  • Include a header file with manual implementations for atomic operations that are not directly supported for floating point numbers: atomicMul, atomicMax, and atomicMin. These functions make use of a CAS mechanism.
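
The warp-level reduction and the CAS-based atomics can be mimicked in plain Python to show the control flow. This is a simulation under stated assumptions, not the CUDA header from this MR: AtomicCell, warp_reduce and atomic_update are hypothetical names, a lock-protected object stands in for a memory word, and one Python thread stands in for the first thread of each warp. Real device code would compare bit patterns rather than float values.

```python
import operator
import threading

WARP_SIZE = 32


def warp_reduce(lanes, op):
    """Tree reduction in the style of a shuffle-down loop; lane 0 holds the result."""
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Lane i combines its value with the old value of lane i + offset.
        vals = [op(vals[i], vals[i + offset]) if i + offset < WARP_SIZE else vals[i]
                for i in range(WARP_SIZE)]
        offset //= 2
    return vals[0]


class AtomicCell:
    """Toy stand-in for a memory word that supports compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def atomic_update(cell, op, operand):
    """CAS loop: retry until no other thread changed the cell in between."""
    old = cell.load()
    while True:
        seen = cell.compare_and_swap(old, op(old, operand))
        if seen == old:
            return old  # like CUDA atomics, return the previous value
        old = seen


result = AtomicCell(float("-inf"))  # neutral element for max


def run_warp(lanes):
    partial = warp_reduce(lanes, max)    # reduce within the warp first
    atomic_update(result, max, partial)  # one atomic per warp ("first thread")


warps = [[float(w * WARP_SIZE + i) for i in range(WARP_SIZE)] for w in range(4)]
threads = [threading.Thread(target=run_warp, args=(lanes,)) for lanes in warps]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the warp-level step, each warp issues one atomic update instead of 32, which is the point of the optimization described above.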

Internal Changes

  • Freeze handling for newly introduced ReductionAssignment nodes
  • Add PsVecHorizontal vectorization node for conducting a binary operation between a scalar symbol and an extraction of a vector value (obtained by performing a reduction within a vector lane)
  • Add dataclass ReductionInfo holding essential information about a reduction (i.e. reduction operation, initial value and the write-back pointer for exporting the reduction result) and create corresponding lookup table for symbols in KernelCreationContext
  • Introduce NumericLimitsFunctions for initializing neutral elements for reductions making use of min/max operations
  • Adapt Platform.select_function so that its return type is PsExpression | tuple[tuple[PsStructuralNode, ...], PsAstNode]: it returns either a PsExpression that replaces the function call, or a tuple holding the structural nodes to be added before the replacement together with the PsAstNode that replaces the function call. The structural nodes allow adding preparatory code for the replacement, as needed for the warp-level reductions on GPU platforms
  • Add ReductionFunctions.WriteBackToPtr function that is replaced with platform-dependent code in Platform.select_function
  • Slightly adapt CPU/GPU Jit modules to support the handling of write-back pointers used for reductions
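
To make the bookkeeping described above more concrete, here is a rough plain-Python sketch of a reduction record, its neutral elements, and a per-symbol lookup table. All names (Op, ReductionInfoSketch, neutral_element, add_reduction, the "_ptr" suffix) are hypothetical and do not reproduce the classes in this MR. Treating 0 as the neutral element for subtraction is an assumption, and the largest finite double stands in for the numeric limits used with min/max.

```python
import sys
from dataclasses import dataclass
from enum import Enum


class Op(Enum):  # hypothetical stand-in for the reduction operations
    Add = "+"
    Sub = "-"
    Mul = "*"
    Min = "min"
    Max = "max"


@dataclass(frozen=True)
class ReductionInfoSketch:
    op: Op                # reduction operation
    init_val: float       # initial (neutral) value of the local accumulator
    writeback_ptr: str    # name of the pointer the result is exported through


def neutral_element(op):
    """Value that leaves any operand unchanged under `op`."""
    if op in (Op.Add, Op.Sub):  # Sub -> 0 is an assumption of this sketch
        return 0.0
    if op is Op.Mul:
        return 1.0
    if op is Op.Min:
        return sys.float_info.max   # largest finite double
    if op is Op.Max:
        return -sys.float_info.max  # lowest finite double
    raise ValueError(f"unsupported reduction operation: {op}")


# Lookup table keyed by symbol name, as a kernel-creation context might keep it.
reduction_data = {}


def add_reduction(symbol_name, op):
    """Register a reduction for `symbol_name`; each symbol may be reduced only once."""
    if symbol_name in reduction_data:
        raise ValueError(f"{symbol_name} already has a reduction")
    info = ReductionInfoSketch(op, neutral_element(op), f"{symbol_name}_ptr")
    reduction_data[symbol_name] = info
    return info


info = add_reduction("r", Op.Min)
```
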
Edited by Richard Angersbach