Neon intrinsics

Merged Markus Holzer requested to merge holzer/pystencils:Neon_Intrinsics into master
All threads resolved!

This MR implements NEON intrinsics to enable vectorization on the ARM architecture.

This may also become useful once ARM HPC clusters actually get deployed, though these might end up using SVE instead of NEON. For that case, additional work is needed because SVE's vector width is determined at runtime.

Edited by Michael Kuron

Activity

  • Markus Holzer added 21 commits

  • Markus Holzer added 1 commit

  • Michael Kuron resolved all threads

  • Looks good. Now we only need tests and ideally a CI runner (such as a cheap Raspberry Pi 4 with 8 GB).

  • Markus Holzer added 1 commit

    • b2f36017 - Removed unused and broken _mm_setzero

  • Markus Holzer added 5 commits

  • Markus Holzer added 1 commit

    • a57ddc7b - Set default alignment to 64 byte

  • Michael Kuron approved this merge request

  • In the meantime, I have tested the NEON intrinsics. Everything works fine except for the non-temporal stores. I still need to test SVE.

  • SVE probably needs a bit more work because of its flexible vector width. Since none of us have hardware access, we could also remove it for now.

    What's the problem with the non-temporal stores?

  • I am not sure if I need to set additional flags but __builtin_nontemporal_store is just not known by the compiler.

    With SVE I would agree.

  • Seems like that builtin only exists on clang. According to the ARM ISA manual, the instruction is STNP, and grepping through the gcc source code suggests that gcc does not know it. So you will need a wrapper function that falls back to a regular store, something like

    #ifndef __has_builtin
    #define __has_builtin(x) 0
    #endif
    #if __has_builtin(__builtin_nontemporal_store)
        __builtin_nontemporal_store(..., ...);
    #else
        ...
    #endif
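
    The wrapper idea above can be sketched as a small compilable helper. This is only an illustration under assumptions: it stores a scalar double rather than a SIMD vector, and the name store_maybe_nt is hypothetical and not part of pystencils. The __has_builtin stub is defined first so that the check also parses on compilers that do not provide the macro.

```c
/* Compilers without __has_builtin get a stub so the check below always parses. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper (not from pystencils): store `value` to `dst`,
 * using a non-temporal store when the compiler offers the builtin. */
static inline void store_maybe_nt(double *dst, double value)
{
#if __has_builtin(__builtin_nontemporal_store)
    /* clang: hints that the stored data should bypass the cache */
    __builtin_nontemporal_store(value, dst);
#else
    /* fallback: ordinary store, same result in memory */
    *dst = value;
#endif
}
```

    Either branch leaves the same value in memory; the builtin is only a performance hint, so callers do not need to know which path was compiled.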
  • I wouldn't want to make use of that. There is exactly the instruction we are searching for

    https://developer.arm.com/documentation/100076/0100/a64-instruction-set-reference/a64-data-transfer-instructions/stnp

    However, as far as I can see, this is not available for SIMD types yet, so there is no intrinsic for it. I think we should drop NT stores for now until this is more accessible on the ARM side.

    I also saw that every store intrinsic is also available as a lane version, for example vst1q_f64 and vst1q_lane_f64. I didn't see any difference in the implementation of those instructions, so they are not interesting for us, right?

  • The lane versions are for extracting individual elements from a vector. So NEON's vst1q_f32 is like SSE's _mm_store_ps, while vst1q_lane_f32 corresponds to _mm_extract_ps.
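
    To make the difference concrete, here is a small sketch contrasting the two stores. The helper names store_all_lanes and store_lane_2 are hypothetical. The scalar branches are an assumption added only so the snippet also compiles on non-ARM machines; they mimic what the intrinsics write to memory.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: write all four lanes of the vector loaded from src. */
static void store_all_lanes(float *dst, const float src[4])
{
#if defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(src);
    vst1q_f32(dst, v);             /* full store: writes 4 floats */
#else
    for (size_t i = 0; i < 4; ++i) /* scalar stand-in for the full store */
        dst[i] = src[i];
#endif
}

/* Hypothetical helper: write only lane 2 of the vector loaded from src. */
static void store_lane_2(float *dst, const float src[4])
{
#if defined(__ARM_NEON)
    float32x4_t v = vld1q_f32(src);
    vst1q_lane_f32(dst, v, 2);     /* lane store: writes 1 float (lane 2) */
#else
    dst[0] = src[2];               /* scalar stand-in for the lane store */
#endif
}
```

    The lane index passed to vst1q_lane_f32 has to be a compile-time constant, which is why it is hard-coded here.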

  • Markus Holzer added 1 commit

    • 30514b76 - Removed SVE and NT Stores for arm

  • Markus Holzer marked this merge request as ready

  • Ok, this makes sense. I have now removed SVE and NT Stores.

  • Markus Holzer enabled an automatic merge when the pipeline for 30514b76 succeeds

  • Markus Holzer canceled the automatic merge

  • Markus Holzer added 27 commits

    • 30514b76...b2b2e912 - 26 commits from branch pycodegen:master
    • 28db0932 - Merge i10git.cs.fau.de:pycodegen/pystencils into Neon_Intrinsics

  • Markus Holzer added 6 commits

  • Markus Holzer added 1 commit

  • Michael Kuron changed the description

  • merged

  • Michael Kuron mentioned in commit f19bd423

  • Michael Kuron mentioned in merge request !228 (merged)
