Draft: Fused-multiply-add vectorization
assigned to @kuron
added 1 commit
- 8bebd100 - Fix FMA insertion failures in remainder loops and with RNG
added 1 commit
- 6e3a147c - don't convert plain negation into fused-multiply-add
The current state works, but the visitor that inserts the FMAs into the AST is severely broken. For example, it loses field alignment information (which makes test_alignment_and_correct_ghost_layers fail on AVX) as well as the OpenMP thread count. Nevertheless, it already improves the compressible D3Q19 TRT kernel from 28 MLUPS to 40 MLUPS on a single core of my Apple M1.
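To illustrate the two concerns raised above, the following is a minimal sketch of an FMA-insertion pass over a toy expression tree. It is not the pystencils implementation: the node classes `Sym`, `Const`, `Mul`, `Add`, `FMA`, `Loop` and the functions `fuse` and `fuse_loop` are all hypothetical. It assumes that a product with `-1` (a plain negation) should be left alone rather than fused, mirroring the commit message. It also shows one way a rewriting visitor can avoid dropping node metadata such as alignment or thread count: rebuild nodes with `dataclasses.replace` instead of constructing them from scratch.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union

# Hypothetical toy AST; pystencils' real AST classes are different.
@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Mul:
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class Add:
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class FMA:
    a: "Expr"  # computes a * b + c
    b: "Expr"
    c: "Expr"

Expr = Union[Sym, Const, Mul, Add, FMA]

@dataclass(frozen=True)
class Loop:
    """Hypothetical loop node carrying metadata a rewrite must not lose."""
    body: Tuple[Expr, ...]
    aligned_fields: Tuple[str, ...] = ()
    num_threads: int = 1

def is_plain_negation(m: Mul) -> bool:
    # Assumption: -x is represented as Mul((Const(-1), x)); leave it unfused.
    return len(m.args) == 2 and any(
        isinstance(x, Const) and x.value == -1 for x in m.args)

def fuse(expr: Expr) -> Expr:
    """Rewrite a*b + c into FMA(a, b, c), bottom-up."""
    if isinstance(expr, Mul):
        return Mul(tuple(fuse(x) for x in expr.args))
    if isinstance(expr, FMA):
        return FMA(fuse(expr.a), fuse(expr.b), fuse(expr.c))
    if isinstance(expr, Add):
        args = [fuse(x) for x in expr.args]
        if len(args) >= 2:
            for i, term in enumerate(args):
                if (isinstance(term, Mul) and len(term.args) >= 2
                        and not is_plain_negation(term)):
                    rest = args[:i] + args[i + 1:]
                    a, *bs = term.args
                    b = bs[0] if len(bs) == 1 else Mul(tuple(bs))
                    # Fuse the remaining addends too, chaining FMAs.
                    c = rest[0] if len(rest) == 1 else fuse(Add(tuple(rest)))
                    return FMA(a, b, c)
        return Add(tuple(args))
    return expr

def fuse_loop(loop: Loop) -> Loop:
    # replace() copies every field not overridden, so aligned_fields and
    # num_threads survive. Writing Loop(body=...) here would silently reset
    # them to their defaults.
    return replace(loop, body=tuple(fuse(e) for e in loop.body))
```

Using `replace` keeps the pass independent of whatever metadata fields a node type later gains, which is the failure mode described in the comment.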