How To Use the Composer#

The composer API is the interface by which C++ code is constructed in pystencils-sfg. It is exposed through the ubiquitous composer object returned by the SourceFileGenerator upon entry into its managed region. This guide is meant to illustrate the various constructions possible through the composer, starting from things as simple as #include directives and plain code strings, up to entire classes and their members.

Basic Functionality#

Prelude Comment#

You can equip your generated files with a prelude comment that will be printed at their very top:

import datetime

now = datetime.datetime.now()

with SourceFileGenerator() as sfg:
    sfg.prelude(f"This file was generated using pystencils-sfg at {now}.")
/**
 * This file was generated using pystencils-sfg at 2025-04-04 07:15:43.092762.
 */

#pragma once

#define RESTRICT __restrict__
/**
 * This file was generated using pystencils-sfg at 2025-04-04 07:15:43.092762.
 */

#include "demo.hpp"

#include Directives#

Use sfg.include to add #include directives to your generated files. For a system-header include, delimit the header name with <>. If the directive should be printed into the implementation file instead of the header, set private=True:

with SourceFileGenerator() as sfg:
    sfg.include("my_header.hpp")
    sfg.include("<memory>")
    sfg.include("detail_header.hpp", private=True)
#pragma once

#include "my_header.hpp"
#include <memory>

#define RESTRICT __restrict__
#include "demo.hpp"
#include "detail_header.hpp"

Plain Code Strings#

It is always possible to print out plain code strings verbatim. Use sfg.code() to write code directly to the generated header file. To emit the code to the implementation file instead, use sfg.code(..., impl=True).

with SourceFileGenerator() as sfg:
    sfg.code("int THE_ANSWER;")
    sfg.code("int THE_ANSWER = 42;", impl=True)
#pragma once

#define RESTRICT __restrict__

int THE_ANSWER;
#include "demo.hpp"

int THE_ANSWER = 42;

Defining Functions#

Free functions can be declared and defined using the sfg.function sequencer. It uses builder syntax to declare the various properties of the function in arbitrary order via a sequence of calls. This sequence must end with a plain pair of parentheses ( ... ) within which the function body will be defined. For example, the following will create a function getValue with return type int32 which is marked with the nodiscard attribute:

with SourceFileGenerator() as sfg:
    sfg.function("getValue").returns("int32").attr("nodiscard")(
        "return 42;"
    )
#pragma once

#include <cstdint>

#define RESTRICT __restrict__

[[nodiscard]] int32_t getValue();
#include "demo.hpp"

[[nodiscard]] int32_t getValue() { return 42; }

For a list of all available function qualifiers, see the reference of SfgFunctionSequencer.

Populate the Function Body#

The function body sequencer takes an arbitrary list of arguments of different types, which are then interpreted as C++ code. The simplest arguments are plain strings, which are printed verbatim, in order, each on its own line:

with SourceFileGenerator() as sfg:
    sfg.function("factorial").params(
        sfg.var("n", "uint64")
    ).returns("uint64")(
        "if(n == 0) return 1;",
        "else return n * factorial(n - 1);"
    )
#pragma once

#include <cstdint>

#define RESTRICT __restrict__

uint64_t factorial(uint64_t n);
#include "demo.hpp"

uint64_t factorial(uint64_t n) {
  if (n == 0)
    return 1;
  else
    return n * factorial(n - 1);
}

However, to make life easier, the composer offers several facilities for modelling C++ code programmatically.

Note

Observe that the code generated from the above snippet contains line breaks after the if() and else keywords that were not part of the input. This happens because pystencils-sfg passes its generated code through clang-format for beautification.

Conditionals#

To emit an if-else conditional statement, use sfg.branch. The syntax of sfg.branch mimics the C++ if () {} else {} construct by a sequence of two (or three, with an else-branch) pairs of parentheses:

with SourceFileGenerator() as sfg:
    sfg.function("factorial").params(
        sfg.var("n", "uint64")
    ).returns("uint64")(
        sfg.branch("n == 0")(  # Condition
            #   then-block
            "return 1;"
        )(
            #   else-block
            "return n * factorial(n - 1);"
        )
    )
#pragma once

#include <cstdint>

#define RESTRICT __restrict__

uint64_t factorial(uint64_t n);
#include "demo.hpp"

uint64_t factorial(uint64_t n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

Variables and Automatic Collection of Function Parameters#

Pystencils-sfg’s versatile expression system can keep track of free variables in a function body and automatically expose them as function parameters. To cast a code string as an expression depending on variables, we need to do two things:

  • Create an object for each variable using sfg.var. This method takes the name and data type of the variable.

  • Create the expression through sfg.expr by interpolating a Python format string (see str.format) with variables or other expressions.

For example, here’s the expression in the else-block of the factorial function modelled this way:

n = sfg.var("n", "uint64")
...
sfg.expr("return {0} * factorial({0} - 1);", n)

Using this, we can omit the manually specified parameter list for factorial:

with SourceFileGenerator() as sfg:
    n = sfg.var("n", "uint64")

    sfg.function("factorial").returns("uint64")(
        sfg.branch(sfg.expr("{} == 0", n))(  # Condition
            #   then-block
            "return 1;"
        )(
            #   else-block, with interpolated expression
            sfg.expr("return {0} * factorial({0} - 1);", n)
        )
    )
#pragma once

#include <cstdint>

#define RESTRICT __restrict__

uint64_t factorial(uint64_t n);
#include "demo.hpp"

uint64_t factorial(uint64_t n) {
  if (n == 0) {
    return 1;
  } else {
    return n * factorial(n - 1);
  }
}

Manual Parameter Lists#

When function parameters are collected from the function body, the composer will always order them alphabetically. If this is not desired, e.g. if a generated function is expected to have a specific interface with a fixed parameter order, you will need to specify the parameter list manually using .params(...).
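
For instance, the following sketch (the function name and body are purely illustrative) specifies the parameter order as (m, n), whereas automatic collection would order the parameters alphabetically as (n, m):

m, n = sfg.vars("m, n", "int32")

sfg.function("linear_combine").params(m, n).returns("int32")(
    sfg.expr("return 2 * {} + {};", m, n)
)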

Variables of C++ Class Type#

sfg.var should only be used for the most basic data types: it parses its second argument as a data type using create_type, which is restricted to primitive and fixed-width C types. For more complex C++ classes, class templates, and their APIs, pystencils-sfg provides its own modelling system, implemented in pystencilssfg.lang. This system is used, for instance, by pystencilssfg.lang.cpp.std, which mirrors (a small part of) the C++ standard library.

To create a variable of a class template represented using the pystencilssfg.lang modelling system, first instantiate the class (with any template arguments, as well as optional const and ref qualifiers) and then call var on it:

from pystencilssfg.lang.cpp import std

data = std.vector("float64", const=True, ref=True).var("data")
str(data), str(data.dtype)
('data', 'const std::vector< double >&')
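
Such a variable can then be used like any other composer variable, for instance as an explicit function parameter or inside an expression. Here is a minimal sketch (the function name is illustrative only):

from pystencilssfg.lang.cpp import std

data = std.vector("float64", const=True, ref=True).var("data")

sfg.function("get_size").params(data).returns("uint64")(
    sfg.expr("return {}.size();", data)
)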

Initializing Variables#

To emit an initializer statement for a variable, use sfg.init:

from pystencilssfg.lang.cpp import std

result = std.tuple("int32", "int32").var("result")
n, m = sfg.vars("n, m", "int32")

sfg.init(result)(
    sfg.expr("{} / {}", n, m),
    sfg.expr("{} % {}", n, m)
)

This will be recognized by the parameter collector: variables that are defined using init before they are used will be considered bound and will not end up in the function signature. Also, any variables passed to the braced initializer-expression (by themselves or inside sfg.expr) will be found and tracked by the parameter collector:

from pystencilssfg.lang.cpp import std

with SourceFileGenerator() as sfg:
    result = std.tuple("int32", "int32").var("result")
    n, m = sfg.vars("n, m", "int32")

    sfg.function("div_rem").params(n, m).returns(result.dtype)(
        sfg.init(result)(
            sfg.expr("{} / {}", n, m),
            sfg.expr("{} % {}", n, m)
        ),
        sfg.expr("return {}", result)
    )
#pragma once

#include <cstdint>
#include <tuple>

#define RESTRICT __restrict__

std::tuple<int32_t, int32_t> div_rem(int32_t n, int32_t m);
#include "demo.hpp"

std::tuple<int32_t, int32_t> div_rem(int32_t n, int32_t m) {
  std::tuple<int32_t, int32_t> result{n / m, n % m};
  return result;
}

Namespaces#

C++ uses namespaces to structure code and group entities. By default, pystencils-sfg emits all code into the global namespace. For instructions on how to change the outermost namespace used by the SourceFileGenerator, see Generator Script Configuration and Command-Line Interface.

Starting from the outermost namespace, nested namespaces can be entered and exited during code generation. To enter a new namespace, use sfg.namespace in one of two ways:

  • Simply calling sfg.namespace("my_namespace") and ignoring its return value will cause the generator script to use the given namespace for the rest of its execution;

  • Calling sfg.namespace("my_namespace") in a with statement will activate the given namespace only for the duration of the managed block.

To illustrate, the following snippet activates the namespace mylib::generated for the entire length of the generator script, and then enters and exits the nested namespace mylib::generated::detail:

with SourceFileGenerator() as sfg:
    sfg.namespace("mylib::generated")

    sfg.code("/* Generated code in outer namespace */")

    with sfg.namespace("detail"):
        sfg.code("/* Implementation details in the inner namespace */")

    sfg.code("/* More code in the outer namespace */")
#pragma once

#define RESTRICT __restrict__

namespace mylib::generated {
/* Generated code in outer namespace */

namespace detail {
/* Implementation details in the inner namespace */
} // namespace detail

/* More code in the outer namespace */
} // namespace mylib::generated
#include "demo.hpp"

namespace mylib::generated {
namespace detail {} // namespace detail
} // namespace mylib::generated

Kernels and Parameter Mappings#

The original purpose of pystencils-sfg is to simplify the embedding of pystencils-generated numerical kernels into C++ applications. This section discusses how to register kernels with the source file generator, how to call them in wrapper code, and how to automatically map symbolic pystencils fields onto nd-array data structures.

Registering Kernels#

In the generated files, kernels are organized in kernel namespaces. The composer gives us access to the default kernel namespace (<current_namespace>::kernels) via sfg.kernels.

To add a kernel, use one of the two methods of the kernel namespace:

  • kernels.create() takes a pystencils assignment (or a list of assignments), a kernel name, and optionally a CreateKernelConfig; it runs the pystencils code generator and registers the resulting kernel under that name;

  • kernels.add() registers a kernel that was already created through pystencils.create_kernel() under the given name.

Both functions return a kernel handle, through which the kernel may later be invoked.

You may create and access custom-named kernel namespaces using sfg.kernel_namespace(). This gives you a KernelsAdder object with the same interface as sfg.kernels.

Note

A kernel namespace is not a regular namespace; if you attempt to create both a regular and a kernel namespace with the same name, the composer will raise an error.

Here’s an example with two kernels registered in different kernel namespaces, once using create and once using add:

import pystencils as ps

with SourceFileGenerator() as sfg:
    #   Create symbolic fields
    f, g = ps.fields("f, g: double[2D]")

    #   Define and create the first kernel
    asm1 = ps.Assignment(f(0), g(0))
    cfg1 = ps.CreateKernelConfig()
    cfg1.cpu.openmp.enable = True
    khandle_1 = sfg.kernels.create(asm1, "first_kernel", cfg1)

    #   Define the second kernel and its codegen configuration
    asm2 = ps.Assignment(f(0), 3.0 * g(0))
    cfg2 = ps.CreateKernelConfig(target=ps.Target.CUDA)

    #   Create and register the second kernel at a custom namespace
    kernel2 = ps.create_kernel(asm2, cfg2)
    khandle_2 = sfg.kernel_namespace("gpu_kernels").add(kernel2, "second_kernel")
#pragma once

#define RESTRICT __restrict__
#include "demo.hpp"
#include <cmath>
#include <cstdint>

namespace kernels {
void first_kernel(double *RESTRICT const _data_f,
                  double *RESTRICT const _data_g, const int64_t _size_f_0,
                  const int64_t _size_f_1, const int64_t _stride_f_0,
                  const int64_t _stride_f_1, const int64_t _stride_g_0,
                  const int64_t _stride_g_1) {
#pragma omp parallel for schedule(static)
  for (int64_t ctr_0 = 0LL; ctr_0 < _size_f_0; ctr_0 += 1LL) {
    for (int64_t ctr_1 = 0LL; ctr_1 < _size_f_1; ctr_1 += 1LL) {
      _data_f[ctr_0 * _stride_f_0 + ctr_1 * _stride_f_1] =
          _data_g[ctr_0 * _stride_g_0 + ctr_1 * _stride_g_1];
    }
  }
}
} // namespace kernels

namespace gpu_kernels {
__global__ void
second_kernel(double *RESTRICT const _data_f, double *RESTRICT const _data_g,
              const int64_t _size_f_0, const int64_t _size_f_1,
              const int64_t _stride_f_0, const int64_t _stride_f_1,
              const int64_t _stride_g_0, const int64_t _stride_g_1) {
  const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
      blockIdx.y * blockDim.y + threadIdx.y;
  const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
      blockIdx.x * blockDim.x + threadIdx.x;
  const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
  const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
  if (ctr_0 < _size_f_0 && ctr_1 < _size_f_1) {
    _data_f[ctr_0 * _stride_f_0 + ctr_1 * _stride_f_1] =
        3.0 * _data_g[ctr_0 * _stride_g_0 + ctr_1 * _stride_g_1];
  }
}
} // namespace gpu_kernels

Writing Kernel Wrapper Functions#

By default, kernel definitions are only visible in the generated implementation file; kernels are not meant to be called directly, but through wrapper functions. This serves to hide their fairly lengthy and complicated low-level interfaces.

Invoking CPU Kernels#

To call a CPU kernel from a function, use sfg.call on a kernel handle:

sfg.function("kernel_wrapper")(
    sfg.call(khandle)
)

This will expose all parameters of the kernel to the wrapper function and, in turn, cause them to be added to its signature. Usually, we do not want to expose this complexity, but instead hide it behind appropriate data structures. The next section explains how that is achieved in pystencils-sfg.

Mapping Fields to Data Structures#

Pystencils kernels operate on n-dimensional contiguous or strided arrays. Many classes with diverse APIs model such arrays throughout the scientific computing landscape, including Kokkos Views, C++ std::mdspan, SYCL buffers, and many framework-specific custom-built classes. Using the protocols behind sfg.map_field, it is possible to automatically emit code that extracts the indexing information required by a kernel from any of these classes, as long as a suitable API reflection is available.

See also

Reflecting Field Data Structures for instructions on how to set up field API reflection for a custom nd-array data structure.

Pystencils-sfg natively provides field extraction for a number of C++ standard library classes, such as std::vector and std::span (for 1D fields) and std::mdspan. Import any of them from pystencilssfg.lang.cpp.std and create an instance for a given field using .from_field(). Then, inside the wrapper function, pass the symbolic field and its associated data structure to sfg.map_field before calling the kernel:

import pystencils as ps
from pystencilssfg.lang.cpp import std

with SourceFileGenerator() as sfg:
    #   Create symbolic fields
    f, g = ps.fields("f, g: double[1D]")

    #   Create data structure reflections
    f_vec = std.vector.from_field(f)
    g_span = std.span.from_field(g)

    #   Create the kernel
    asm = ps.Assignment(f(0), g(0))
    khandle = sfg.kernels.create(asm, "my_kernel")

    #   Create the wrapper function
    sfg.function("call_my_kernel")(
        sfg.map_field(f, f_vec),
        sfg.map_field(g, g_span),
        sfg.call(khandle)
    )
#pragma once

#include <span>
#include <vector>

#define RESTRICT __restrict__

void call_my_kernel(std::vector<double> &f, std::span<double> g);
#include "demo.hpp"
#include <cmath>
#include <cstdint>

namespace kernels {
void my_kernel(double *RESTRICT const _data_f, double *RESTRICT const _data_g,
               const int64_t _size_f_0, const int64_t _stride_f_0,
               const int64_t _stride_g_0) {
  for (int64_t ctr_0 = 0LL; ctr_0 < _size_f_0; ctr_0 += 1LL) {
    _data_f[ctr_0 * _stride_f_0] = _data_g[ctr_0 * _stride_g_0];
  }
}
} // namespace kernels

void call_my_kernel(std::vector<double> &f, std::span<double> g) {
  double *RESTRICT const _data_f{f.data()};
  const int64_t _size_f_0{int64_t(f.size())};
  const int64_t _stride_f_0{int64_t(1)};
  double *RESTRICT const _data_g{g.data()};
  /* g.size() == _size_g_0 */
  const int64_t _stride_g_0{int64_t(1)};
  kernels::my_kernel(_data_f, _data_g, _size_f_0, _stride_f_0, _stride_g_0);
}

GPU Kernels#

Pystencils also allows us to generate kernels for the CUDA and HIP GPU programming models. This section describes how to generate GPU kernels through pystencils-sfg, how to invoke them with various launch configurations, and how GPU execution streams are reflected.

Generate and Invoke CUDA and HIP Kernels#

To generate a kernel targeting either of these, set the target code generator option to either Target.CUDA or Target.HIP. After registering a GPU kernel, its invocation can be rendered using sfg.gpu_invoke. Here is an example using CUDA:

from pystencilssfg import SfgConfig
sfg_config = SfgConfig()
sfg_config.extensions.impl = "cu"

with SourceFileGenerator(sfg_config) as sfg:
    #   Configure the code generator to use CUDA
    cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)

    #   Create fields, assemble assignments
    f, g = ps.fields("f, g: double[128, 128]")
    asm = ps.Assignment(f(0), g(0))

    #   Register kernel
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)

    #   Invoke it
    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle)
    )
#pragma once

#define RESTRICT __restrict__

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g);
#include "demo.hpp"
#include <cstdint>
#include <cuda_runtime.h>
#include <tuple>

namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
                           double *RESTRICT const _data_g) {
  const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
      blockIdx.y * blockDim.y + threadIdx.y;
  const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
      blockIdx.x * blockDim.x + threadIdx.x;
  const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
  const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
  if (ctr_0 < 128LL && ctr_1 < 128LL) {
    _data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
  }
}
} // namespace kernels

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g) {
  {
    const dim3 __block_size{8, 8, 4};
    const std::tuple<uint32_t, uint32_t, uint32_t> __work_items{128LL, 128LL,
                                                                1LL};
    const dim3 __grid_size{
        (std::get<0>(__work_items) + __block_size.x - 1) / __block_size.x,
        (std::get<1>(__work_items) + __block_size.y - 1) / __block_size.y,
        (std::get<2>(__work_items) + __block_size.z - 1) / __block_size.z};
    /* clang-format off */
    /* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
    kernels::gpu_kernel<<< __grid_size, __block_size, 0 >>>(_data_f, _data_g);
    /* clang-format on */
  }
}

In this snippet, we used the generator configuration to change the suffix of the generated implementation file to .cu.

When investigating the generated .cu file, you can see that the GPU launch configuration parameters, grid size and block size, are computed automatically from the array sizes. This behavior can be changed by modifying options in the gpu category of the CreateKernelConfig.

Adapting the Launch Configuration#

GPU kernel invocations usually require the user to provide a launch grid, defined by the GPU thread block size and the number of blocks on the grid. In the simplest case (seen above), pystencils-sfg will emit code that automatically computes these parameters from the size of the arrays passed to the kernel, using a default block size defined by pystencils.
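
For example, with the default block size (8, 8, 4) seen in the generated code above and a work item extent of (128, 128, 1), the emitted ceiling division (work_items + block_size - 1) / block_size yields a grid size of (16, 16, 1).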

The code generator also permits customization of the launch configuration. You may provide a custom block size to override the default, in which case the grid size will still be computed by dividing the array sizes by your block size. Alternatively, you can take full control of both block and grid size. Instructions for both cases are given below.

User-Defined Block Size for Auto-Computed Grid Size#

To merely modify the block size argument while still automatically inferring the grid size, pass a variable or expression of type dim3 to the block_size parameter of gpu_invoke. Pystencils-sfg exposes two versions of dim3, which differ primarily in their associated runtime headers:

  • cuda.dim3 from pystencilssfg.lang.gpu, associated with the CUDA runtime header <cuda_runtime.h>;

  • hip.dim3 from pystencilssfg.lang.gpu, associated with the HIP runtime header <hip/hip_runtime.h>.

The following snippet selects the correct dim3 type according to the kernel target; it then creates a variable of that type and passes it as an argument to the kernel invocation:

with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)

    #   Select dim3 reflection
    match target:
        case ps.Target.CUDA:
            from pystencilssfg.lang.gpu import cuda as gpu_api
        case ps.Target.HIP:
            from pystencilssfg.lang.gpu import hip as gpu_api
    
    #   Create dim3 variable and pass it to kernel invocation
    block_size = gpu_api.dim3(const=True).var("block_size")

    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, block_size=block_size)
    )
#pragma once

#include <hip/hip_runtime.h>

#define RESTRICT __restrict__

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g, const dim3 block_size);
#include "demo.hpp"
#include "pystencils_runtime/hip.h"
#include <cstdint>
#include <tuple>

namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
                           double *RESTRICT const _data_g) {
  const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
      blockIdx.y * blockDim.y + threadIdx.y;
  const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
      blockIdx.x * blockDim.x + threadIdx.x;
  const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
  const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
  if (ctr_0 < 128LL && ctr_1 < 128LL) {
    _data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
  }
}
} // namespace kernels

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g, const dim3 block_size) {
  {
    const dim3 __block_size{block_size};
    const std::tuple<uint32_t, uint32_t, uint32_t> __work_items{128LL, 128LL,
                                                                1LL};
    const dim3 __grid_size{
        (std::get<0>(__work_items) + __block_size.x - 1) / __block_size.x,
        (std::get<1>(__work_items) + __block_size.y - 1) / __block_size.y,
        (std::get<2>(__work_items) + __block_size.z - 1) / __block_size.z};
    /* clang-format off */
    /* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
    kernels::gpu_kernel<<< __grid_size, __block_size, 0 >>>(_data_f, _data_g);
    /* clang-format on */
  }
}

Manual Launch Configurations#

To take full control of the launch configuration, we must disable its automatic inference by setting the gpu.manual_launch_grid code generator option to True. Then, we must pass dim3 arguments for both block_size and grid_size to the kernel invocation:

with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...

    #   Configure for manual launch config
    cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)
    cfg.gpu.manual_launch_grid = True

    #   Register kernel
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)
    
    #   Create dim3 variables
    from pystencilssfg.lang.gpu import cuda
    block_size = cuda.dim3(const=True).var("block_size")
    grid_size = cuda.dim3(const=True).var("grid_size")

    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, block_size=block_size, grid_size=grid_size)
    )
#pragma once

#include <cuda_runtime.h>

#define RESTRICT __restrict__

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g, const dim3 block_size,
                    const dim3 grid_size);
#include "demo.hpp"
#include <cstdint>

namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
                           double *RESTRICT const _data_g) {
  const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
      blockIdx.y * blockDim.y + threadIdx.y;
  const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
      blockIdx.x * blockDim.x + threadIdx.x;
  const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
  const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
  if (ctr_0 < 128LL && ctr_1 < 128LL) {
    _data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
  }
}
} // namespace kernels

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g, const dim3 block_size,
                    const dim3 grid_size) {
  /* clang-format off */
  /* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
  kernels::gpu_kernel<<< grid_size, block_size, 0 >>>(_data_f, _data_g);
  /* clang-format on */
}

Using Streams#

CUDA and HIP kernels can be enqueued into streams for concurrent execution. This is mirrored in pystencils-sfg; all overloads of gpu_invoke take an optional stream argument. The stream_t data types of both CUDA and HIP are made available through the respective API reflections:

  • cuda.stream_t from pystencilssfg.lang.gpu, reflecting cudaStream_t;

  • hip.stream_t from pystencilssfg.lang.gpu, reflecting hipStream_t.

Here is an example that creates a variable of the HIP stream type and passes it to gpu_invoke:

from pystencilssfg.lang.gpu import hip

with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)

    stream = hip.stream_t(const=True).var("stream")

    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, stream=stream)
    )
#pragma once

#include <hip/hip_runtime.h>

#define RESTRICT __restrict__

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g, const hipStream_t stream);
#include "demo.hpp"
#include "pystencils_runtime/hip.h"
#include <cstdint>
#include <tuple>

namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
                           double *RESTRICT const _data_g) {
  const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
      blockIdx.y * blockDim.y + threadIdx.y;
  const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
      blockIdx.x * blockDim.x + threadIdx.x;
  const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
  const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
  if (ctr_0 < 128LL && ctr_1 < 128LL) {
    _data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
  }
}
} // namespace kernels

void kernel_wrapper(double *RESTRICT const _data_f,
                    double *RESTRICT const _data_g, const hipStream_t stream) {
  {
    const dim3 __block_size{8, 8, 4};
    const std::tuple<uint32_t, uint32_t, uint32_t> __work_items{128LL, 128LL,
                                                                1LL};
    const dim3 __grid_size{
        (std::get<0>(__work_items) + __block_size.x - 1) / __block_size.x,
        (std::get<1>(__work_items) + __block_size.y - 1) / __block_size.y,
        (std::get<2>(__work_items) + __block_size.z - 1) / __block_size.z};
    /* clang-format off */
    /* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
    kernels::gpu_kernel<<< __grid_size, __block_size, 0, stream >>>(_data_f, _data_g);
    /* clang-format on */
  }
}

To Do

  • Defining classes, their fields, constructors, and methods