WIP: Cuda autotune
@@ -175,9 +175,15 @@ def read_config():
@@ -219,6 +225,10 @@ def get_cache_config():
This PR introduces the following changes:
One drawback: the test calls are only correct if the input and output fields do not overlap, so in-place kernels are not supported.
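To illustrate why overlapping fields are a problem, here is a minimal pure-Python sketch. It is not the code from this PR: `autotune`, `out_of_place`, `in_place`, and the `block_size` parameter are hypothetical names, and the sketch assumes the tuner simply runs the kernel once per candidate configuration on the real arguments.

```python
import time


def autotune(kernel, candidates, *args):
    """Hypothetical tuner: call the kernel once per candidate, return the fastest."""
    best_time, best_candidate = None, None
    for block_size in candidates:
        start = time.perf_counter()
        kernel(*args, block_size=block_size)  # trial call on the real fields
        elapsed = time.perf_counter() - start
        if best_time is None or elapsed < best_time:
            best_time, best_candidate = elapsed, block_size
    return best_candidate


def out_of_place(src, dst, block_size):
    # Reads src, writes dst: repeating the call gives the same dst every time.
    for i in range(len(src)):
        dst[i] = src[i] + 1.0


def in_place(field, block_size):
    # Reads and writes the same field: every trial call changes the data again.
    for i in range(len(field)):
        field[i] = field[i] + 1.0


src, dst = [0.0, 1.0, 2.0], [0.0, 0.0, 0.0]
autotune(out_of_place, [32, 64, 128], src, dst)
print(dst)  # [1.0, 2.0, 3.0], as after a single call

field = [0.0, 1.0, 2.0]
autotune(in_place, [32, 64, 128], field)
print(field)  # [3.0, 4.0, 5.0], the update was applied once per trial
```

With three candidates the in-place field is updated three times instead of once. One possible workaround, not part of this PR, would be to run the trial calls on copies of the fields.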