Set up an MWE:
git clone https://i10git.cs.fau.de/walberla/example_app.git
cd example_app/apps/example_app_codegen/
# copy .cpp, .py, .prm files, update list of generated files in CMakeLists.txt, chmod +x the python file and add shebang '#!/usr/bin/python3'
cd $(git rev-parse --show-toplevel)
mkdir build
cd build
Compile with Clang in debug mode with lbmpy 0.4.4:
VERSION=0.4.4 DEPS="/work/jgrad/walberla_deps" PYTHONPATH="${DEPS}/${VERSION}/lbmpy:${DEPS}/${VERSION}/pystencils:${DEPS}/devel/walberla/python/" CC=clang CXX=clang++ cmake .. -DWALBERLA_DIR=/work/jgrad/walberla_deps/devel/walberla -DWALBERLA_BUILD_WITH_CODEGEN=ON -DCMAKE_BUILD_TYPE=Debug
VERSION=0.4.4 DEPS="/work/jgrad/walberla_deps" PYTHONPATH="${DEPS}/${VERSION}/lbmpy:${DEPS}/${VERSION}/pystencils:${DEPS}/devel/walberla/python/" make -j$(nproc)
Then compile the AVX binary separately with:
(cd /work/jgrad/walberla_deps/devel/example_app/build/apps/example_app_codegen && /usr/bin/ccache /usr/bin/clang++ -DBOOST_ALL_NO_LIB -I/work/jgrad/walberla_deps/devel/example_app/build/walberla/src -I/work/jgrad/walberla_deps/devel/walberla/src -I/work/jgrad/walberla_deps/devel/example_app/build/apps/example_app_codegen/default_codegen -isystem /work/jgrad/walberla_deps/devel/example_app/src -isystem /work/jgrad/walberla_deps/devel/example_app/build/src -isystem /work/jgrad/walberla_deps/0.4.4/pystencils/pystencils/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -Wall -Wconversion -Wshadow -Wno-c++11-extensions -Qunused-arguments -pthread -pthread -g -std=gnu++17 -DWALBERLA_BUILD_WITH_AVX -mavx2 -o CMakeFiles/ExampleAppCodegen.dir/ExampleAppAVX.cpp.o -c /work/jgrad/walberla_deps/devel/example_app/apps/example_app_codegen/ExampleApp.cpp)
(cd /work/jgrad/walberla_deps/devel/example_app/build/apps/example_app_codegen && /tikhome/jgrad/.local/lib/python3.8/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/ExampleAppCodegen.dir/link.txt --verbose=1
/usr/bin/clang++ -Wall -Wconversion -Wshadow -Wno-c++11-extensions -Qunused-arguments -pthread -pthread -g CMakeFiles/ExampleAppCodegen.dir/ExampleAppAVX.cpp.o -o ExampleAppCodegenAVX -Wl,-rpath,/usr/lib/x86_64-linux-gnu/openmpi/lib ../../walberla/src/blockforest/libblockforest.a ../../walberla/src/core/libcore.a ../../walberla/src/field/libfield.a ../../walberla/src/lbm/liblbm.a ../../walberla/src/geometry/libgeometry.a ../../walberla/src/timeloop/libtimeloop.a ../../walberla/src/gui/libgui.a libLatticeModelGenerated.a ../../walberla/src/domain_decomposition/libdomain_decomposition.a ../../walberla/src/vtk/libvtk.a ../../walberla/src/boundary/libboundary.a ../../walberla/src/blockforest/libblockforest.a ../../walberla/src/core/libcore.a ../../walberla/src/field/libfield.a ../../walberla/src/lbm/liblbm.a ../../walberla/src/geometry/libgeometry.a ../../walberla/src/timeloop/libtimeloop.a ../../walberla/src/gui/libgui.a libLatticeModelGenerated.a ../../walberla/src/domain_decomposition/libdomain_decomposition.a ../../walberla/src/vtk/libvtk.a ../../walberla/src/boundary/libboundary.a ../../walberla/src/blockforest/libblockforest.a ../../walberla/src/core/libcore.a ../../walberla/src/field/libfield.a ../../walberla/src/lbm/liblbm.a ../../walberla/src/geometry/libgeometry.a ../../walberla/src/timeloop/libtimeloop.a ../../walberla/src/gui/libgui.a libLatticeModelGenerated.a ../../walberla/src/domain_decomposition/libdomain_decomposition.a ../../walberla/src/vtk/libvtk.a ../../walberla/src/boundary/libboundary.a /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so /usr/lib/libpfft.so /usr/lib/x86_64-linux-gnu/libfftw3.so /usr/lib/x86_64-linux-gnu/libfftw3_mpi.so ../../walberla/extern/lodepng/liblodepng.a /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so /usr/lib/libpfft.so /usr/lib/x86_64-linux-gnu/libfftw3.so /usr/lib/x86_64-linux-gnu/libfftw3_mpi.so)
Run the binaries with the parameter files:
apps/example_app_codegen/ExampleAppCodegen ../apps/example_app_codegen/ExampleApp.prm
apps/example_app_codegen/ExampleAppCodegenAVX ../apps/example_app_codegen/ExampleApp.prm
The AVX binary will fail at random with a SIGSEGV, because the fields
are allocated with 8-byte alignment, although 32-byte alignment is
required to safely load doubles from memory with aligned AVX2
instructions. The src/field/Field.impl.h file has ifdefs to select the
correct alignment if AVX2 is defined, however:
- the alignment value is 16 instead of 32
- the sizeof(T) < alignment conditional uses T=const float [13], but it was probably meant to test a hypothetical type T_underlying=const float
- the conditional evaluates to false but takes the true branch in GDB (in the ESPResSo bridge, the false branch is taken)
- the allocator_ shared pointer should dereference to a walberla::field::AllocateAligned<unsigned char, 16> object, but instead it dereferences to a generic allocator with 8-byte alignment
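The size comparison in the second point can be reproduced outside of waLBerla. The following is a minimal Python sketch using ctypes, not the actual code in Field.impl.h: the names T, T_underlying and ALIGNMENT are hypothetical stand-ins, and it assumes a 4-byte float and the array length of 13 reported by GDB.

```python
import ctypes

# Assumption: 32 bytes is the alignment needed for aligned AVX2 loads;
# the report states that the ifdef currently selects 16 instead.
ALIGNMENT = 32

# Hypothetical stand-ins for the types seen in GDB.
T = ctypes.c_float * 13        # mimics T = const float [13]
T_underlying = ctypes.c_float  # mimics the element type const float

size_array = ctypes.sizeof(T)
size_element = ctypes.sizeof(T_underlying)

# The same "sizeof(...) < alignment" test, applied to both types.
array_is_smaller = size_array < ALIGNMENT
element_is_smaller = size_element < ALIGNMENT

print(f"sizeof(T)            = {size_array}, smaller than alignment: {array_is_smaller}")
print(f"sizeof(T_underlying) = {size_element}, smaller than alignment: {element_is_smaller}")
```

Under these assumptions the array type is larger than both 16 and 32 bytes, so the comparison is false for it, while it is true for the element type. This matches the observation that the conditional evaluates to false.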
GDB setup:
gdb --args apps/example_app_codegen/ExampleAppCodegenAVX ../apps/example_app_codegen/ExampleApp.prm
(gdb) b /work/jgrad/walberla_deps/devel/walberla/src/field/Field.impl.h:341
(gdb) run
(gdb) tui e
Then in GDB, the execution was stepped through to check the values in the conditional as well as the allocated pointer, which is often 8-byte aligned instead of 16-byte or 32-byte aligned:
(gdb) print mem
$1 = (double *) 0x15554d528028
(gdb) python print(0x15554d528028 / 32)
733003551745.25
Then run continue until the SIGSEGV is hit.
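The division by 32 in the GDB session only shows the misalignment as a fractional part. A small Python sketch makes the check explicit for each alignment of interest. The helper name misalignment is hypothetical; the address is the one printed by GDB above.

```python
def misalignment(address: int, alignment: int) -> int:
    """Return how many bytes the address lies past the previous aligned boundary."""
    return address % alignment


mem = 0x15554d528028  # value of the mem pointer printed by GDB

# Check the alignments discussed in this report.
offsets = {alignment: misalignment(mem, alignment) for alignment in (8, 16, 32)}

for alignment, offset in offsets.items():
    status = "aligned" if offset == 0 else f"off by {offset} bytes"
    print(f"{alignment}-byte boundary: {status}")
```

For this address the pointer sits on an 8-byte boundary but is 8 bytes past both the 16-byte and the 32-byte boundaries, which is consistent with the fractional part .25 (8/32) in the GDB output.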