WIP: Cuda autotune
This PR introduces the following changes:
One drawback: the test calls are only correct if the input and output fields do not overlap, which rules out in-place kernels.
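A rough illustration of why overlapping fields are a problem, under the assumption that the autotuner runs the kernel once per candidate configuration on the real fields. This is a plain-Python sketch, not the code in this PR: `run_trials`, `scale_kernel`, and the candidate block sizes are hypothetical names chosen for the example.

```python
def scale_kernel(src, dst, block_size):
    # Hypothetical kernel: dst[i] = 2 * src[i], processed in chunks of block_size.
    n = len(src)
    for start in range(0, n, block_size):
        for i in range(start, min(start + block_size, n)):
            dst[i] = 2 * src[i]


def run_trials(kernel, src, dst, block_sizes):
    # Assumed autotune behaviour: one test call per candidate block size,
    # always on the same fields.
    for block_size in block_sizes:
        kernel(src, dst, block_size)
    return dst


candidates = [1, 2, 4]

# Out-of-place: src is never written, so every test call recomputes the same dst.
src = [1, 2, 3, 4]
dst = [0, 0, 0, 0]
out_of_place = run_trials(scale_kernel, src, dst, candidates)

# In-place: input and output are the same field, so each test call
# consumes the previous call's output and the result drifts.
field = [1, 2, 3, 4]
in_place = run_trials(scale_kernel, field, field, candidates)

print(out_of_place)
print(in_place)
```

With separate fields the result does not depend on how many test calls were made. With a shared field each extra test call doubles the data again, so the final result no longer matches a single kernel invocation.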