AES-NI vectorization improvements
- Michael Kuron authored
+ 2
− 2
@@ -49,8 +49,8 @@ QUALIFIERS __m128 _my_cvtepu32_ps(const __m128i v)
!30 (merged) didn't implement an SSE-vectorized _mm_cvtepu64_pd equivalent because the Stack Overflow solution didn't work. That turned out to be due to a bad optimization in GCC 5+ in fast-math mode. None of the other compilers (Clang, Intel, MSVC) have that issue, so we just disable fast-math for that function.
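For illustration, here is a minimal sketch of how such a conversion can look. It is not the code from this merge request: the function name `cvtepu64_pd_sketch` and the macro `NO_FAST_MATH_SKETCH` are hypothetical, the SSE2-only masking is my own choice, and the use of GCC's `optimize("no-fast-math")` function attribute is an assumption about how fast-math could be disabled for a single function. The trick is to embed the high and low 32-bit halves into the mantissas of 2^84 and 2^52 and then subtract the combined offset, which only works if the compiler keeps the subtraction and addition in the written order.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Assumption: on GCC 5+, switch fast-math off for the one function that
// depends on exact floating-point evaluation order. Other compilers get
// an empty macro.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 5)
#define NO_FAST_MATH_SKETCH __attribute__((optimize("no-fast-math")))
#else
#define NO_FAST_MATH_SKETCH
#endif

// Hypothetical name: converts two unsigned 64-bit integers to doubles.
NO_FAST_MATH_SKETCH
inline __m128d cvtepu64_pd_sketch(const __m128i x)
{
    const __m128i magic_hi  = _mm_castpd_si128(_mm_set1_pd(19342813113834066795298816.0)); // 2^84
    const __m128i magic_lo  = _mm_castpd_si128(_mm_set1_pd(4503599627370496.0));           // 2^52
    const __m128d magic_sum = _mm_set1_pd(19342813118337666422669312.0);                   // 2^84 + 2^52
    const __m128i low_mask  = _mm_set1_epi64x(0x00000000FFFFFFFFLL);

    // High 32 bits go into the mantissa of 2^84: value is 2^84 + hi * 2^32.
    const __m128i xH = _mm_or_si128(_mm_srli_epi64(x, 32), magic_hi);
    // Low 32 bits go into the mantissa of 2^52: value is 2^52 + lo.
    const __m128i xL = _mm_or_si128(_mm_and_si128(x, low_mask), magic_lo);

    // (xH - (2^84 + 2^52)) + xL == hi * 2^32 + lo. The subtraction is exact;
    // reassociating these operations, as fast-math permits, breaks the result.
    const __m128d f = _mm_sub_pd(_mm_castsi128_pd(xH), magic_sum);
    return _mm_add_pd(f, _mm_castsi128_pd(xL));
}
```

Scoping the attribute to this one function leaves the rest of the translation unit free to be compiled with fast-math.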
Also, we now use fused multiply-add if available.
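As a rough sketch of what "if available" can mean at compile time, the hypothetical helper below (`madd_pd_sketch` is not a name from this merge request) uses the FMA intrinsic when the compiler defines `__FMA__` and falls back to a separate multiply and add otherwise. Relying on `__FMA__` is an assumption that holds for GCC and Clang with `-mfma`; other compilers may need a different check.

```cpp
#include <immintrin.h>  // SSE2 and, where enabled, FMA intrinsics

// Hypothetical helper: computes a * b + c for both lanes.
inline __m128d madd_pd_sketch(const __m128d a, const __m128d b, const __m128d c)
{
#ifdef __FMA__
    return _mm_fmadd_pd(a, b, c);            // fused: one rounding step
#else
    return _mm_add_pd(_mm_mul_pd(a, b), c);  // fallback: two rounding steps
#endif
}
```

The two paths can differ in the last bit of the result, because the fused version rounds only once.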