SVE vectorization
To continue my vectorization spree, here is the version with ARM SVE instructions. Tested in QEMU 5.2. Compiler support is still a bit wonky -- GCC 10 generates very bloated assembly and produces incorrect code for non-native vector sizes, while Clang 11 misses some obvious optimizations.
Edited by Michael Kuron