@holzer pointed out that SVE has nontemporal stores, which I overlooked when I implemented SVE support three years ago. So there are actually eight different kinds of stores we have to support (store, stream, mask-store, mask-stream, scatter, stream-scatter, mask-scatter, mask-stream-scatter).
(Mask-)stream-scatter requires SVE2; adding support for that introduced changes to a number of unrelated files. SVE2 is a superset of SVE, so the automatic detection won't return SVE if SVE2 is also available.
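
The eight kinds are simply all combinations of three independent properties (masked, nontemporal, scattered). The following is a minimal sketch of that enumeration, not code from this PR; `StoreKind`, `name`, `needsSVE2` and `allStoreKinds` are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one store variant (illustration only).
struct StoreKind {
    bool mask;     // only write elements selected by a mask
    bool stream;   // nontemporal ("streaming") store
    bool scatter;  // strided/indexed store instead of a contiguous one
};

// Build the name used in the description above, e.g. "mask-stream-scatter".
std::string name(const StoreKind& k) {
    std::string s;
    if (k.mask) s += "mask-";
    if (k.stream) s += k.scatter ? "stream-" : "stream";
    if (k.scatter) s += "scatter";
    else if (!k.stream) s += "store";
    return s;
}

// Per the description, only the nontemporal scatter variants need SVE2.
bool needsSVE2(const StoreKind& k) {
    return k.stream && k.scatter;
}

// Enumerate all 2^3 = 8 combinations of the three properties.
std::vector<StoreKind> allStoreKinds() {
    std::vector<StoreKind> kinds;
    for (unsigned bits = 0; bits < 8; ++bits) {
        kinds.push_back({(bits & 1u) != 0, (bits & 2u) != 0, (bits & 4u) != 0});
    }
    return kinds;
}
```

Iterating over `allStoreKinds()` and calling `needsSVE2` on each entry flags exactly two variants, stream-scatter and mask-stream-scatter.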
The added test coverage revealed a few bugs on master that I also fixed:
- `maskStoreS` on RISC-V-V just wouldn't compile.
- `maskStore` (via `blendv`) on ARM Neon and POWER VSX was zeroing the masked-out elements when nontemporal mode was selected (which, due to the lack of real nontemporal stores, maps to an emulation via cacheline zeroing). This led to incorrect results.
- `maskStore` on POWER VSX was ignoring the mask for the last vector of each cacheline. This led to even more incorrect results when nontemporal mode was selected.
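
To illustrate why cacheline zeroing and masked stores interact badly, here is a scalar sketch of one plausible failure mode. It is an assumption about the mechanism, not the actual code or fix from this PR: the cacheline size of eight doubles and all names (`Line`, `Mask`, `blend`, `maskStoreZeroFirst`, `maskStoreKeepOld`) are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLine = 8;  // doubles per 64-byte cacheline (assumed)
using Line = std::array<double, kLine>;
using Mask = std::array<bool, kLine>;

// Scalar stand-in for a vector blend: take `value` where the mask is set,
// otherwise keep `old`.
Line blend(const Line& old, const Line& value, const Mask& mask) {
    Line r{};
    for (std::size_t i = 0; i < kLine; ++i) {
        r[i] = mask[i] ? value[i] : old[i];
    }
    return r;
}

// Problematic ordering: the cacheline is zeroed before the old contents are
// read, so the blend sees zeros and masked-out elements are lost.
void maskStoreZeroFirst(Line& mem, const Line& value, const Mask& mask) {
    mem.fill(0.0);                  // emulated nontemporal store: zero the line
    mem = blend(mem, value, mask);  // "old" values are already gone
}

// Mask-respecting ordering: read the old contents first, then zero, then blend.
void maskStoreKeepOld(Line& mem, const Line& value, const Mask& mask) {
    const Line old = mem;           // preserve masked-out elements
    mem.fill(0.0);
    mem = blend(old, value, mask);
}
```

With memory holding 1 to 8, a value vector of all 9s, and a mask selecting every other element, `maskStoreZeroFirst` leaves zeros in the unselected slots while `maskStoreKeepOld` leaves the original values there.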