This PR probably still needs some clean-up. However, it would be good to get some feedback already.
What works:
What does not work:
mirror and wrap (apparently they have been removed from CUDA's API but are still present in pycuda). Now there's only:
cudaBoundaryModeZero = 0: Zero boundary mode
cudaBoundaryModeClamp = 1: Clamp boundary mode
cudaBoundaryModeTrap = 2: Trap boundary mode
Wtf is trap boundary mode? Nothing is documented, so we can only experiment.
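For reference while experimenting, here is a rough CPU-side sketch of what I would expect the zero and clamp modes to do on an out-of-range 1D read. This is not code from this PR: `BoundaryMode` and `read_1d` are hypothetical names, the enum only mirrors the three values listed above, and the trap branch is a placeholder since its real behaviour still has to be checked on the GPU.

```python
from enum import IntEnum


class BoundaryMode(IntEnum):
    """Hypothetical Python mirror of the three values listed above."""
    ZERO = 0
    CLAMP = 1
    TRAP = 2


def read_1d(data, x, mode):
    """Emulate a 1D read at index x on the CPU (illustration only)."""
    n = len(data)
    if 0 <= x < n:
        # In-range reads behave the same in every mode.
        return data[x]
    if mode == BoundaryMode.ZERO:
        # Assumed: out-of-range reads return zero.
        return 0
    if mode == BoundaryMode.CLAMP:
        # Assumed: the coordinate is clamped to the valid range.
        return data[min(max(x, 0), n - 1)]
    # Trap: behaviour not established here, so this raise is only a placeholder.
    raise IndexError(f"out-of-range read at {x} in trap mode")
```

Comparing the GPU output against something like this for a few out-of-range indices should at least tell us whether zero and clamp behave as assumed.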
What kind of works: