Reductions on HIP
Reductions are currently implemented only for the CUDA GPU target (!438, merged). They should also be made available for the HIP platform.
Tasks
- Implement resolution of per-thread and warp-level reductions in HipPlatform
- Implement handling of warp sizes for AMD GPUs
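
To illustrate why the warp size needs explicit handling: AMD GPUs run 64-lane wavefronts on GCN/CDNA hardware, whereas CUDA warps are always 32 lanes wide. The following is a minimal host-side sketch in plain Python that emulates a shuffle-down tree reduction. The function names `shuffle_down` and `warp_reduce_sum` are hypothetical and are neither pystencils API nor HIP intrinsics; the emulation assumes that a lane reading out of range keeps its own value.

```python
def shuffle_down(values, offset):
    """Emulate a shuffle-down: lane i reads the value held by lane i + offset.

    Assumption: lanes whose source would be out of range keep their own value.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def warp_reduce_sum(values, assumed_warp_size):
    """Tree reduction over one warp; the reduced value ends up in lane 0."""
    lanes = list(values)
    offset = assumed_warp_size // 2
    while offset > 0:
        shifted = shuffle_down(lanes, offset)
        lanes = [a + b for a, b in zip(lanes, shifted)]
        offset //= 2
    return lanes[0]


# One 64-lane wavefront; lane i holds the value i.
wavefront = list(range(64))

# Reduction that starts at half the actual hardware width.
correct = warp_reduce_sum(wavefront, assumed_warp_size=64)

# Reduction with a hard-coded CUDA-style width of 32 on the same wavefront.
truncated = warp_reduce_sum(wavefront, assumed_warp_size=32)

print(correct, truncated)
```

With a hard-coded width of 32, lane 0 only accumulates lanes 0 to 31, so the upper half of the wavefront is silently dropped. This suggests that the starting offset of the generated reduction should be derived from the target's actual warp size rather than from a fixed constant.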
Prerequisites
It would be nice to have a CI task for testing pystencils on HIP + AMD Hardware, e.g. using the testcluster, before integrating this feature.