Neon intrinsics
This MR implements neon intrinsics to enable vectorization for the ARM architecture.
This may also become useful once ARM HPC clusters actually get deployed, though these might end up using SVE instead of NEON. For that case, additional work is needed because SVE's vector width is determined at runtime.
Edited by Michael Kuron