Disallow OpenMP + blocking + cacheline-zero
@@ -107,6 +107,12 @@ def create_kernel(assignments,
The loop over the blocks is OpenMP-collapsed, so different threads may work on different blocks simultaneously. If the innermost block size is not a multiple of the cache line size and non-temporal stores are enabled on an architecture that only supports cacheline-zeroing (!230 (merged)), a thread zeroing a cache line that straddles a block boundary would also erase data belonging to a neighboring thread's block. So we disallow this problematic combination.
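The condition being rejected can be sketched as a small standalone check. This is only an illustration of the logic described above, not the code added to `create_kernel`: the function name `check_blocking_combination` and all of its parameters are hypothetical, and the sizes are assumed to be given in elements and bytes as noted in the comments.

```python
def check_blocking_combination(openmp, inner_block_size, itemsize,
                               cacheline_size, nontemporal,
                               cacheline_zero_only):
    """Raise ValueError for the unsafe combination (hypothetical helper).

    inner_block_size: innermost block size in elements (assumed unit)
    itemsize:         bytes per element
    cacheline_size:   cache line size in bytes
    """
    # A block whose byte length is not a multiple of the cache line size
    # shares at least one cache line with the neighboring block.
    block_bytes = inner_block_size * itemsize
    misaligned = block_bytes % cacheline_size != 0

    # Only the full combination is a problem: parallel blocks, stores that
    # zero whole cache lines, and blocks that share a cache line.
    if openmp and nontemporal and cacheline_zero_only and misaligned:
        raise ValueError(
            "Blocking with OpenMP and cacheline-zeroing non-temporal stores "
            "requires the innermost block size to be a multiple of the "
            "cache line size"
        )
```

With 8-byte elements and a 64-byte cache line, an innermost block of 8 elements passes, while a block of 6 elements is rejected when all three features are enabled. Disabling any one of OpenMP, non-temporal stores, or cacheline-zeroing makes the misaligned block acceptable in this sketch.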