This MR introduces the first wave of vectorization infrastructure into the new backend.
Alongside this, several changes and additions are made to the AST, the symbol table, typification, constant folding, code printing, as well as the `Target` API.
- Introduce the `ast.vector` module for SIMD-related AST nodes
- Move `PsVectorMemAcc` to `ast.vector`, rename it to `PsVecMemAcc`, and allow its stride to be an expression
- Introduce `PsVecBroadcast` for scalar-to-vector broadcasts
- Split `CAstPrinter` into the generic `BasePrinter` and the C-specific subclass `CAstPrinter`
- Introduce `IRAstPrinter`, a subclass of `BasePrinter` that prints the entire IR to pseudocode (including untyped nodes, the vector IR, and all non-C constructs)
- Change `PsAstNode.__str__` to call the `IRAstPrinter`
- Extend `duplicate_symbol` to allow changing the duplicate's data type
- Introduce `get_new_symbol`, which always returns a new symbol, even if the given name is already taken
- Typify `PsVecBroadcast` and vector memory accesses
- Fix `EliminateConstants` to correctly process vector constants and vector types
- Extend `EliminateConstants` to fold `PsCast`s and `PsVecBroadcast`s of constants

Introduce the `AstVectorizer` transformer, which takes a scalar IR subtree and transforms it into a SIMD version of itself along a given iteration axis.
At this point, the `AstVectorizer` can translate constants, symbols, arithmetic and math functions, type casts, and memory accesses with either lane-invariant or affine indices.
Vectorization and masking of conditionals and loops are left for future work.
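The following sketch illustrates the distinction between lane-invariant and affine indices. It is plain Python and independent of pystencils. The tuple-based expression encoding and the names `split_affine`, `VecMemAccess`, `BroadcastLoad`, and `vectorize_index` are hypothetical.

An index is written as `stride * axis + offset`, where both parts must not depend on the iteration counter. A zero stride means every lane reads the same element. This sketch assumes such a value is loaded once and then broadcast. Any other stride gives a strided vector access whose stride may itself be a symbolic expression.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical toy encoding of index expressions (not the pystencils IR):
# an int is a constant, a str is a symbol, ("+", a, b) / ("*", a, b) are binary ops.
Expr = Union[int, str, tuple]


def depends_on(expr: Expr, axis: str) -> bool:
    """True if the expression references the iteration counter `axis`."""
    if isinstance(expr, str):
        return expr == axis
    if isinstance(expr, tuple):
        return depends_on(expr[1], axis) or depends_on(expr[2], axis)
    return False


def add(a: Expr, b: Expr) -> Expr:
    # Build a + b, simplifying trivial cases
    if a == 0:
        return b
    if b == 0:
        return a
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return ("+", a, b)


def mul(a: Expr, b: Expr) -> Expr:
    # Build a * b, simplifying trivial cases
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    if isinstance(a, int) and isinstance(b, int):
        return a * b
    return ("*", a, b)


def split_affine(expr: Expr, axis: str) -> tuple:
    """Return (stride, offset) with expr == stride * axis + offset.

    Stride and offset do not depend on `axis` (they are lane-invariant);
    raises ValueError if expr is not affine in `axis`.
    """
    if not depends_on(expr, axis):
        return 0, expr
    if expr == axis:
        return 1, 0
    op, lhs, rhs = expr
    if op == "+":
        s1, o1 = split_affine(lhs, axis)
        s2, o2 = split_affine(rhs, axis)
        return add(s1, s2), add(o1, o2)
    if op == "*":
        if depends_on(lhs, axis) and depends_on(rhs, axis):
            raise ValueError("index is not affine in the vectorization axis")
        if depends_on(rhs, axis):
            lhs, rhs = rhs, lhs  # make lhs the axis-dependent factor
        s, o = split_affine(lhs, axis)
        return mul(s, rhs), mul(o, rhs)
    raise ValueError(f"unsupported operator {op!r}")


@dataclass
class VecMemAccess:
    base: Expr    # index accessed by lane 0
    stride: Expr  # index distance between neighboring lanes


@dataclass
class BroadcastLoad:
    index: Expr   # same index for every lane: load once, then broadcast


def vectorize_index(index: Expr, axis: str):
    """Classify a scalar memory index for vectorization along `axis`."""
    stride, _offset = split_affine(index, axis)
    if stride == 0:
        return BroadcastLoad(index)
    return VecMemAccess(base=index, stride=stride)
```

For example, `vectorize_index(("*", ("+", "i", "j"), "n"), "i")` yields a strided access with the symbolic stride `"n"`. A quadratic index such as `i * i` is rejected.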
Introduce the `LoopVectorizer`, which internally uses the `AstVectorizer` to transform single scalar loops into SIMD versions of themselves, with optional handling of trailing iterations.
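The sketch below emulates the loop structure that such a transformation produces, using plain Python lists instead of SIMD registers. The iteration space is split into full vector iterations and a trailing rest. The function names, the `handle_trailing` flag, and the choice of a scalar remainder loop for the trailing iterations are assumptions for illustration. They are not the `LoopVectorizer` API.

```python
from __future__ import annotations


def split_iterations(start: int, stop: int, width: int) -> tuple[range, range]:
    """Split the scalar iteration space [start, stop) for SIMD width `width`.

    Returns the start indices of the full vector iterations and the
    trailing scalar iterations that do not fill a whole vector.
    """
    n_full = (stop - start) // width
    vec_stop = start + n_full * width
    return range(start, vec_stop, width), range(vec_stop, stop)


def axpy_vectorized(a: float, x: list, y: list, width: int = 4,
                    handle_trailing: bool = True) -> list:
    """Emulate a vectorized version of: for i in range(n): out[i] = a * x[i] + y[i]."""
    n = len(x)
    out = list(y)
    vector_starts, trailing = split_iterations(0, n, width)

    # Vector loop: each iteration processes `width` consecutive lanes at once
    for i in vector_starts:
        lanes = range(i, i + width)
        xv = [x[k] for k in lanes]           # contiguous vector load
        yv = [y[k] for k in lanes]
        av = [a] * width                      # scalar-to-vector broadcast of a
        res = [ai * xi + yi for ai, xi, yi in zip(av, xv, yv)]  # lane-wise arithmetic
        for k, r in zip(lanes, res):          # contiguous vector store
            out[k] = r

    if handle_trailing:
        # Scalar remainder loop for the iterations that do not fill a vector
        for i in trailing:
            out[i] = a * x[i] + y[i]
    elif len(trailing) > 0:
        # Without trailing-iteration handling, the trip count must be a multiple of width
        raise ValueError("trip count is not a multiple of the vector width")
    return out
```

With `width=4` and ten iterations, the vector loop covers indices 0 to 7 and the remainder loop handles 8 and 9. If trailing iterations are not handled, this sketch raises an error instead of silently skipping them.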
- Rename `MaterializeVectorIntrinsics` to `SelectIntrinsics`
- Change `GenericVectorCPU` to directly receive AST nodes
- Move `Target` from `.enums` to `.target`; deprecate `.enums`
- Introduce the `AVX512_FP16` target
- `Target`
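To give an idea of what intrinsic selection involves on x86, the toy function below maps an abstract vector operation, element type, and lane count to an Intel intrinsic name. It uses the naming scheme `_mm<register width>_<operation>_<type suffix>`. The intrinsic names it produces (for example `_mm256_add_pd`, `_mm512_set1_ps`, `_mm512_mul_ph`) are real. The lookup tables and `select_intrinsic` itself are hypothetical and cover only a small subset. The `ph` intrinsics require a CPU with AVX512-FP16 support.

```python
# Element type -> (size in bits, x86 intrinsic type suffix)
_ELEMENT_TYPES = {
    "float64": (64, "pd"),
    "float32": (32, "ps"),
    "float16": (16, "ph"),  # half-precision intrinsics need AVX512-FP16
}

# Register width in bits -> intrinsic name prefix (SSE, AVX, AVX-512)
_PREFIXES = {128: "_mm", 256: "_mm256", 512: "_mm512"}

# Abstract vector operation -> intrinsic operation name
_OPS = {
    "add": "add",
    "sub": "sub",
    "mul": "mul",
    "div": "div",
    "broadcast": "set1",  # scalar-to-vector broadcast
    "load": "loadu",      # unaligned contiguous load
    "store": "storeu",    # unaligned contiguous store
}


def select_intrinsic(op: str, dtype: str, lanes: int) -> str:
    """Return the x86 intrinsic name for `op` on `lanes` elements of `dtype`."""
    if op not in _OPS:
        # e.g. gather/scatter or type casts are not covered by this toy table
        raise NotImplementedError(f"no intrinsic mapping for {op!r}")
    elem_bits, suffix = _ELEMENT_TYPES[dtype]
    width = elem_bits * lanes
    if width not in _PREFIXES:
        raise ValueError(f"{lanes} x {dtype} does not fill a 128/256/512-bit register")
    return f"{_PREFIXES[width]}_{_OPS[op]}_{suffix}"
```

For instance, four `float64` lanes fill a 256-bit AVX register, so `select_intrinsic("add", "float64", 4)` returns `"_mm256_add_pd"`.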

This MR provides basic infrastructure for vectorized code generation and a test suite for the basic functionality.
At this point, kernel vectorization is not yet part of the `create_kernel` pipeline.
Only a limited set of intrinsics is implemented for x86 so far (gather/scatter and type casts, for example, are still missing), and platforms for other hardware (ARM, RISC-V, PPC, ...) are missing altogether.
Masked vectorization will also follow in the future.