This already worked on the CPU; we just needed to remove the check. On the GPU, we use itertools.product to generate the nested loops needed.
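
For reference, here is a minimal sketch of how itertools.product can stand in for a nested loop of arbitrary depth. The helper name nested_indices and the example shape are hypothetical and not taken from this change; the actual GPU code path may look different.

```python
import itertools


def nested_indices(shape):
    """Return an iterator over every index tuple of an N-deep nested loop.

    `shape` is a hypothetical sequence of loop extents, one per loop level.
    """
    # product(range(a), range(b), ...) visits tuples in the same order as
    # nested for-loops would, with the last range varying fastest.
    return itertools.product(*(range(n) for n in shape))


# Equivalent to: for i in range(2): for j in range(3): ...
indices = list(nested_indices((2, 3)))
```

Because the number of ranges is determined at runtime, the same call covers any loop depth without writing each level by hand.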