Deterministic code generation
When generating kernels with pystencils 1.x and 2.0-dev, assignments that do not depend on each other appear in a random order, which also changes the names of all variables that follow the pattern `xi_<number>`. Running the same code generation script twice therefore produces two different C++ and CUDA source files. This behavior negates the benefit of version control, since code diffs become difficult to read:
Diff example:
```diff
--- a/src/walberla_bridge/generated_kernels/CollideSweepSinglePrecisionThermalized.cpp
+++ b/src/walberla_bridge/generated_kernels/CollideSweepSinglePrecisionThermalized.cpp
@@ -49,47 +49,43 @@ namespace pystencils {
   const float xi_0 = ((1.0f) / (rho));
-  const float xi_10 = xi_0 * 0.5f;
-  const float u_0 = xi_0 * (vel0Term - xi_11 - xi_8) + xi_10 * xi_271;
-  const float xi_17 = u_0 * xi_271;
-  const float xi_28 = xi_17 * 0.16666666666666666f;
-  const float xi_29 = -xi_28;
-  const float xi_30 = xi_17 * 0.083333333333333329f;
+  const float xi_7 = xi_0 * 0.5f;
+  const float u_0 = xi_0 * (vel0Term + xi_13 + xi_8 + xi_9) + xi_246 * xi_7;
+  const float xi_25 = u_0 * xi_246;
+  const float xi_37 = xi_25 * 0.16666666666666666f;
+  const float xi_38 = xi_25 * 0.083333333333333329f;
+  const float xi_39 = omega_shear * xi_38;
...
```
This behavior also makes regression testing and bug hunting more tedious. For example, when migrating from pystencils 1.3.3 to 1.3.7 in ESPResSo, we noticed numerical instabilities in single-precision collision kernels in both C++ and CUDA; the same issue appears with 2.0-dev. It is hard to tell from the code diff whether this is caused by a regression in the mathematical expressions of the collision kernel (which would be a pystencils issue), or by the compiler re-ordering assignments in a way that introduces excessive precision loss (which would be an issue with ESPResSo's compiler options or with our codegen script). While one could manually edit the generated code to promote `float` to `double` in some of the assignments to reduce precision loss, this is not practical in GPU kernels, where double precision is much slower, nor in vectorized CPU kernels, where promotion to double precision requires non-trivial changes to the code.
My question is: is there a way to improve determinism in the code generation workflow? Could this be achieved on the user's side by adapting the codegen script, or would it require changes to pystencils' design? The `AssignmentCollection.topological_sort()` method seems to be a good candidate, although my understanding of this specific aspect of the code generation process is quite shallow, and I'm unsure whether replacing a Python `list` of `sp.Assignment` with an `AssignmentCollection` is always safe.
The lack of consistency in variable naming and ordering forces us to "earmark" generated kernels. We will also have to ship the waLBerla bridge with the C++/CUDA output of the codegen script for selected CPU and GPU architectures, since we cannot rely on the generated code being exactly the same for all users who run the codegen script, which would be an obstacle to e.g. Fedora's Reproducible Package Builds initiative.
Thank you for your time!