Neon intrinsics
This MR implements NEON intrinsics to enable vectorization on the ARM architecture.
This may also become useful once ARM HPC clusters actually get deployed, though these might end up using SVE instead of NEON. For that case, additional work is needed because SVE's vector width is determined at runtime.
Activity
- Resolved by Michael Kuron
- Resolved by Michael Kuron
added 21 commits
- a9c9cd00...74bb2c23 - 20 commits from branch pycodegen:master
- 1ebc2749 - Merge branch 'master' into Neon_Intrinsics
added 5 commits
- b2f36017...facd3ab4 - 4 commits from branch pycodegen:master
- bad62938 - Merge branch 'master' into Neon_Intrinsics
Seems like that builtin only exists on clang. According to the ARM ISA manual, the instruction is STNP, and grepping through the gcc source code suggests that gcc does not know it. So you will need a wrapper function that resorts to a regular store, something like

    #if defined(__has_builtin) && __has_builtin(__builtin_nontemporal_store)
    __builtin_nontemporal_store(..., ...);
    #else
    ...
    #endif
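The wrapper suggested above could look roughly like the following minimal sketch. The helper name `store_double_nt` and the scalar `double` type are assumptions made for illustration and are not taken from this MR. One detail differs from the snippet in the comment: a compiler that lacks `__has_builtin` still has to parse `__has_builtin(...)` inside the `#if` expression and may reject it, so the sketch first defines a fallback macro that always evaluates to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_builtin would choke on `__has_builtin(...)` inside
   an #if, so give it a fallback definition that always answers "no". */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper (not from the MR): store one double, using a
   non-temporal store where the compiler offers the builtin. */
static inline void store_double_nt(double *dst, double value)
{
    assert(dst != NULL);
#if __has_builtin(__builtin_nontemporal_store)
    __builtin_nontemporal_store(value, dst);  /* clang builtin: (value, address) */
#else
    *dst = value;                             /* regular store, e.g. on gcc */
#endif
}
```

Either branch leaves the same value in memory; the builtin is only a hint about cache behaviour, so callers do not need to know which path was compiled.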
I wouldn't want to make use of that. There is exactly the instruction we are searching for
However, as far as I can see, this is not available for SIMD types yet, so there is no intrinsic for it. I think we should drop NT stores for now until this is more accessible on the ARM side.
I also saw that every store intrinsic is also available as a lane version, for example vst1q_f64 and vst1q_lane_f64. I didn't see any difference in the implementation of those instructions, so they are not interesting for us, right?
enabled an automatic merge when the pipeline for 30514b76 succeeds
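For reference on the lane-store question raised above: as documented in the Arm intrinsics reference, `vst1q_f64` writes the whole 128-bit vector (two doubles), while `vst1q_lane_f64` writes only the one selected element to the given address. The sketch below contrasts the two. The helper names `store_full` and `store_lane1` are hypothetical, and the scalar fallback is an assumption added only so the example also compiles on non-AArch64 machines.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_F64 1
#else
#define HAVE_NEON_F64 0
#endif

/* Hypothetical helper: write both elements of src to dst[0] and dst[1]. */
static void store_full(double *dst, const double src[2])
{
    assert(dst != NULL && src != NULL);
#if HAVE_NEON_F64
    float64x2_t v = vld1q_f64(src);
    vst1q_f64(dst, v);            /* stores lanes 0 and 1 */
#else
    dst[0] = src[0];              /* scalar stand-in for the full store */
    dst[1] = src[1];
#endif
}

/* Hypothetical helper: write only element 1 of src, to dst[0]. */
static void store_lane1(double *dst, const double src[2])
{
    assert(dst != NULL && src != NULL);
#if HAVE_NEON_F64
    float64x2_t v = vld1q_f64(src);
    vst1q_lane_f64(dst, v, 1);    /* stores lane 1 only; dst[1] is untouched */
#else
    dst[0] = src[1];              /* scalar stand-in for the lane store */
#endif
}
```

Since the generated kernels store whole vectors, the lane variants would only matter for partial writes such as loop remainders, which is consistent with leaving them out here.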
added 27 commits
- 30514b76...b2b2e912 - 26 commits from branch pycodegen:master
- 28db0932 - Merge i10git.cs.fau.de:pycodegen/pystencils into Neon_Intrinsics
added 6 commits
- 28db0932...6effd8d3 - 2 commits from branch pycodegen:master
- 069b11e7 - Merge https://i10git.cs.fau.de/pycodegen/pystencils into Neon_Intrinsics
- 1c7399a0 - Merge branch 'Neon_Intrinsics' of https://i9git.cs.fau.de/holzer/pystencils into Neon_Intrinsics
- 457ea9fe - Merge branch 'Neon_Intrinsics' of https://i10git.cs.fau.de/holzer/pystencils into Neon_Intrinsics
- 54fb9d89 - Merge https://i10git.cs.fau.de/pycodegen/pystencils into Neon_Intrinsics
mentioned in commit f19bd423
mentioned in merge request !228 (merged)