Enhance AES-NI RNG
I had most of the code for the AES RNG on ARM lying around for years and finally took the time to make it work. All the tests in test_random.py pass on my Mac, on an emulated processor with SVE in QEMU, and on an x86 machine with an AVX512-capable CPU. The CI jobs for emulated architectures are currently failing due to issues with the CI runners, but that shouldn't hold up merging this pull request.
There are some more drive-by improvements in this pull request:
- I had to make a change to the ARM SME attributes as there seems to have been a change in Apple's compiler some time after !441 (merged). The same change came to Linux in https://i10git.cs.fau.de/pycodegen/pycodegen/-/merge_requests/17.
- the AVX version of the AES-NI RNG now takes fewer cycles due to a simpler transpose implementation
- ARM64 and PowerPC native CPU detection now works reliably
- speed up ARM emulation by setting `QEMU_CPU=pauth-impdef=on` (thanks to https://lists.nongnu.org/archive/html/qemu-discuss/2022-04/msg00027.html)
- retry failing multiarch jobs for higher reliability
- undef macros at the end of header files (see !480 (merged))
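
To illustrate the kind of pitfall behind the native CPU detection item above: the same 64-bit ARM hardware is reported as `arm64` on macOS and as `aarch64` on Linux, and PowerPC shows up as `ppc64` or `ppc64le`. The following is only a minimal sketch of normalizing those strings, not the code in this merge request; `normalize_arch` and the labels it returns are hypothetical names chosen for the example.

```python
import platform

# Hypothetical alias table: the keys are strings that platform.machine()
# commonly returns, the values are illustrative labels (not project identifiers).
_ARCH_ALIASES = {
    "arm64": "arm64",      # macOS on Apple silicon
    "aarch64": "arm64",    # Linux on 64-bit ARM
    "ppc64": "ppc64",      # big-endian 64-bit PowerPC
    "ppc64le": "ppc64",    # little-endian 64-bit PowerPC
    "x86_64": "x86_64",    # Linux and macOS on x86
    "amd64": "x86_64",     # Windows reports "AMD64"
}


def normalize_arch(machine=None):
    """Map a platform.machine() string to one label per architecture family."""
    if machine is None:
        machine = platform.machine()
    # Compare case-insensitively; fall back to "unknown" for anything else.
    return _ARCH_ALIASES.get(machine.lower(), "unknown")


if __name__ == "__main__":
    print(normalize_arch())
```

Returning `"unknown"` instead of raising lets a caller fall back to a generic, non-vectorized code path on architectures the table does not cover.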
Edited by Michael Kuron