WIP: CUDA autotune
This PR introduces the following changes:
One drawback: the test calls are only correct if the input and output fields do not overlap, so in-place kernels are not supported.
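To illustrate the drawback, here is a minimal pure-Python sketch. It assumes the autotuner runs the kernel once per candidate block size and keeps the fastest one. The names `autotune`, `scale_out_of_place` and `scale_in_place` are hypothetical and are not taken from this PR. The real implementation launches CUDA kernels and may differ in detail.

```python
import time


def scale_out_of_place(src, dst, block_size):
    # Hypothetical kernel: dst = 2 * src. block_size only mimics a launch
    # parameter; the result does not depend on it.
    for start in range(0, len(src), block_size):
        for i in range(start, min(start + block_size, len(src))):
            dst[i] = 2 * src[i]


def scale_in_place(field, block_size):
    # Hypothetical in-place kernel: input and output are the same field.
    for start in range(0, len(field), block_size):
        for i in range(start, min(start + block_size, len(field))):
            field[i] = 2 * field[i]


def autotune(kernel, block_sizes, *fields):
    # Assumed tuning strategy: one test call per candidate, pick the fastest.
    timings = {}
    for block_size in block_sizes:
        t0 = time.perf_counter()
        kernel(*fields, block_size=block_size)
        timings[block_size] = time.perf_counter() - t0
    return min(timings, key=timings.get)


# Out-of-place: repeated test calls leave src untouched, so dst stays correct.
src = [1, 2, 3, 4]
dst = [0, 0, 0, 0]
autotune(scale_out_of_place, [1, 2, 4], src, dst)

# In-place: each of the three test calls doubles the field again.
inplace = [1, 2, 3, 4]
autotune(scale_in_place, [1, 2, 4], inplace)
```

After tuning, `dst` holds the expected `[2, 4, 6, 8]`, while `inplace` has been scaled by 8 instead of 2, because every test call reads the output of the previous one.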