Fix width-one iteration slices on GPU
When iteration_slice was specified, the GPU code generator handled integer slice components incorrectly, setting start == stop instead of stop == start + 1. This MR fixes that and adds tests.
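For illustration, here is a minimal sketch of the intended behaviour. It is not the actual generator code: the helper names `normalize_slice_component` and `iteration_count` are hypothetical, and the sketch assumes half-open ranges, so that start == stop covers zero cells while stop == start + 1 covers exactly one.

```python
def normalize_slice_component(component, dim_size):
    """Return a (start, stop, step) triple for one slice component.

    Hypothetical helper for illustration only; not the generator's API.
    """
    if isinstance(component, int):
        # Wrap negative indices the same way Python indexing does.
        start = component if component >= 0 else dim_size + component
        # An integer selects exactly one cell, so the half-open range
        # must have width one: stop == start + 1 (not stop == start).
        return (start, start + 1, 1)
    # Proper slice objects are resolved against the dimension size.
    return component.indices(dim_size)


def iteration_count(start, stop, step):
    """Number of cells visited by a half-open range."""
    return len(range(start, stop, step))


fixed = normalize_slice_component(3, 10)
print(fixed, iteration_count(*fixed))  # width-one range
print(iteration_count(3, 3, 1))        # the old start == stop case is empty
```

Under these assumptions, the old behaviour corresponds to an empty range along that dimension, while the corrected triple visits the single requested cell.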
This also reveals a more fundamental problem with iteration slices on GPU; see #103.