Fix width-one iteration slices on GPU
@@ -66,7 +66,8 @@ def create_cuda_kernel(assignments: NodeCollection, config: CreateKernelConfig):
When iteration_slice was specified, the GPU code generator handled integer slice components incorrectly, setting start == stop instead of stop == start + 1, which produced an empty iteration range rather than a range of width one. This MR fixes that and adds tests.
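
To illustrate the intended behaviour, here is a minimal sketch in plain Python. It is not the code changed by this MR: the helpers normalize_slice_component and iteration_widths are hypothetical names introduced only for this example, and the sketch assumes unit-stride slices.

```python
def normalize_slice_component(component, extent):
    """Return (start, stop) for one iteration-slice component.

    Hypothetical helper for illustration only; not the pystencils code.
    """
    if isinstance(component, int):
        # Resolve negative indices relative to the axis extent.
        start = component + extent if component < 0 else component
        # An integer selects exactly one cell: stop is start + 1, not start.
        return start, start + 1
    # slice.indices clamps the bounds to the extent and resolves None and negatives.
    start, stop, step = component.indices(extent)
    if step != 1:
        raise NotImplementedError("this sketch only handles unit-stride slices")
    return start, stop


def iteration_widths(iteration_slice, shape):
    """Number of cells iterated along each axis (hypothetical helper)."""
    bounds = [normalize_slice_component(c, n) for c, n in zip(iteration_slice, shape)]
    return [stop - start for start, stop in bounds]
```

With this normalization, an integer component always yields a width of one, whereas the old start == stop behaviour would have yielded a width of zero along that axis.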
This also reveals a more fundamental problem with iteration slices on GPU; see #103.