AVX512VL and AVX10 support

Merged Michael Kuron requested to merge avx10 into master

AVX512VL provides the 256-bit (and 128-bit) versions of the AVX512F instructions. It is primarily useful on processors that have only one AVX512 vector unit and drastically reduce their clock frequency when executing 512-bit instructions. For pystencils, this mostly means scatter/gather support (up to 30% improvement as per !241 (merged)) and no reduced clock frequencies (up to 45% improvement on Xeon Bronze 31xx/32xx, Silver 41xx/42xx, Gold 51xx/52xx). I suppose we never bothered implementing it because it offers no advantage on Xeon Gold 61xx/62xx and Platinum with their two AVX512 units, nor on the newer x3xx and x4xx (or the Ice Lake/Tiger Lake/Rocket Lake desktop/laptop processors), which no longer clock down.

Many of Intel's future processors, however, will use AVX10-256, which is in a sense halfway between AVX2 and AVX512, instead of AVX512. AVX10.1-128 and AVX10.1-256 are essentially a rebranded AVX512VL that can be enabled without AVX512F. Supporting them just needs a few changed ifdefs and a CPU detection that is aware of them. The /proc/cpuinfo flag name is just a guess, but a very likely one.
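To illustrate what "awareness of the CPU detection" could look like, here is a minimal sketch, not the code in this merge request. The function names are hypothetical, and the AVX10 flag names (`avx10_1_256`, `avx10_1_512`) are placeholder guesses passed in as parameters, since the real /proc/cpuinfo spelling is not confirmed here. Only `avx512f` and `avx512vl` are flag names known to appear in /proc/cpuinfo.

```python
def parse_cpuinfo_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def vector_capabilities(flags,
                        avx10_256_flag="avx10_1_256",   # ASSUMPTION: guessed flag name
                        avx10_512_flag="avx10_1_512"):  # ASSUMPTION: guessed flag name
    """Report which AVX512-style vector widths the given flags allow."""
    # 512-bit: classic AVX512F, or AVX10.1-512 (described above as equivalent).
    has_512 = "avx512f" in flags or avx10_512_flag in flags
    # 256-bit with AVX512 features: AVX512VL on top of AVX512F,
    # or AVX10.1-256, which does not require AVX512F.
    has_256 = (
        ("avx512f" in flags and "avx512vl" in flags)
        or avx10_256_flag in flags
        or avx10_512_flag in flags
    )
    return {"avx512_256bit": has_256, "avx512_512bit": has_512}
```

On Linux this could be fed with `parse_cpuinfo_flags(open("/proc/cpuinfo").read())`. Keeping the AVX10 flag names as parameters makes it easy to correct them once the actual kernel spelling is known.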

AVX10.1-512 is the same as AVX512F and enabling one always enables the other. I've adapted the ifdefs nonetheless just in case.

All information is based on https://www.phoronix.com/news/GCC-Lands-Initial-AVX10.1.

Edited by Michael Kuron
