How To Use the Composer#
The composer API is the interface by which C++ code is constructed in pystencils-sfg.
It is exposed through the ubiquitous composer object returned by the SourceFileGenerator
upon entry into its managed region.
This guide is meant to illustrate the various constructions possible through the composer,
starting from things as simple as #include directives and plain code strings,
up to entire classes and their members.
Basic Functionality#
Prelude Comment#
You can equip your generated files with a prelude comment that will be printed at their very top:
import datetime
now = datetime.datetime.now()
with SourceFileGenerator() as sfg:
    sfg.prelude(f"This file was generated using pystencils-sfg at {now}.")
/**
* This file was generated using pystencils-sfg at 2025-04-04 07:15:43.092762.
*/
#pragma once
#define RESTRICT __restrict__
/**
* This file was generated using pystencils-sfg at 2025-04-04 07:15:43.092762.
*/
#include "demo.hpp"
#include Directives#
Use sfg.include to add #include directives to your generated files.
For a system-header include, delimit the header name with <>.
If the directive should be printed not into the header, but the implementation file,
set private=True:
with SourceFileGenerator() as sfg:
    sfg.include("my_header.hpp")
    sfg.include("<memory>")
    sfg.include("detail_header.hpp", private=True)
#pragma once
#include "my_header.hpp"
#include <memory>
#define RESTRICT __restrict__
#include "demo.hpp"
#include "detail_header.hpp"
Plain Code Strings#
It is always possible to print out plain code strings verbatim.
Use sfg.code() to write code directly to the generated header file.
To emit the code to the implementation file instead, use sfg.code(..., impl=True).
with SourceFileGenerator() as sfg:
    sfg.code("int THE_ANSWER;")
    sfg.code("int THE_ANSWER = 42;", impl=True)
#pragma once
#define RESTRICT __restrict__
int THE_ANSWER;
#include "demo.hpp"
int THE_ANSWER = 42;
Defining Functions#
Free functions can be declared and defined using the sfg.function sequencer.
It uses builder syntax to declare the various properties of the function in arbitrary order via a sequence of calls.
This sequence must end with a plain pair of parentheses ( ... ) within which the function body will be defined.
For example, the following will create a function getValue with return type int32, which is marked with the nodiscard attribute:
with SourceFileGenerator() as sfg:
    sfg.function("getValue").returns("int32").attr("nodiscard")(
        "return 42;"
    )
#pragma once
#include <cstdint>
#define RESTRICT __restrict__
[[nodiscard]] int32_t getValue();
#include "demo.hpp"
[[nodiscard]] int32_t getValue() { return 42; }
For a list of all available function qualifiers, see the reference of SfgFunctionSequencer.
Populate the Function Body#
The function body sequencer takes an arbitrary list of arguments of different types, which are then interpreted as C++ code. The simplest case is plain strings, which will be printed out verbatim, in order, each string argument on its own line:
with SourceFileGenerator() as sfg:
    sfg.function("factorial").params(
        sfg.var("n", "uint64")
    ).returns("uint64")(
        "if(n == 0) return 1;",
        "else return n * factorial(n - 1);"
    )
#pragma once
#include <cstdint>
#define RESTRICT __restrict__
uint64_t factorial(uint64_t n);
#include "demo.hpp"
uint64_t factorial(uint64_t n) {
if (n == 0)
return 1;
else
return n * factorial(n - 1);
}
However, to make life easier, the composer API offers various APIs to model C++ code programmatically.
Note
Observe that the code generated from the above snippet contains line breaks after the if() and else keywords that were not part of the input.
This happens because pystencils-sfg passes its generated code through clang-format for beautification.
Conditionals#
To emit an if-else conditional statement, use sfg.branch.
The syntax of sfg.branch mimics the C++ if () {} else {} construct by a sequence of two (or three, with an else-branch) pairs of parentheses:
with SourceFileGenerator() as sfg:
    sfg.function("factorial").params(
        sfg.var("n", "uint64")
    ).returns("uint64")(
        sfg.branch("n == 0")(  # Condition
            # then-block
            "return 1;"
        )(
            # else-block
            "return n * factorial(n - 1);"
        )
    )
#pragma once
#include <cstdint>
#define RESTRICT __restrict__
uint64_t factorial(uint64_t n);
#include "demo.hpp"
uint64_t factorial(uint64_t n) {
if (n == 0) {
return 1;
} else {
return n * factorial(n - 1);
}
}
Variables and Automatic Collection of Function Parameters#
Pystencils-sfg's versatile expression system can keep track of free variables in a function body and automatically expose them as function parameters. To cast a code string as an expression depending on variables, we need to do two things:
1. Create an object for each variable using sfg.var. This method takes the name and data type of the variable.
2. Create the expression through sfg.expr by interpolating a Python format string (see str.format) with variables or other expressions.
For example, here's the expression in the else-block of the factorial function modelled this way:
n = sfg.var("n", "uint64")
...
sfg.expr("return {0} * factorial({0} - 1);", n)
Using this, we can omit the manually specified parameter list for factorial:
with SourceFileGenerator() as sfg:
    n = sfg.var("n", "uint64")
    sfg.function("factorial").returns("uint64")(
        sfg.branch(sfg.expr("{} == 0", n))(  # Condition
            # then-block
            "return 1;"
        )(
            # else-block, with interpolated expression
            sfg.expr("return {0} * factorial({0} - 1);", n)
        )
    )
#pragma once
#include <cstdint>
#define RESTRICT __restrict__
uint64_t factorial(uint64_t n);
#include "demo.hpp"
uint64_t factorial(uint64_t n) {
if (n == 0) {
return 1;
} else {
return n * factorial(n - 1);
}
}
Manual Parameter Lists#
When function parameters are collected from the function body, the composer will always order them
alphabetically. If this is not desired, e.g. if a generated function is expected to have a specific interface
with a fixed parameter order, you will need to specify the parameter list manually using .params(...).
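For illustration, here is a minimal sketch of pinning down a fixed two-parameter interface; the function numCells and its parameters are made up for this example, and it uses only the builder calls already shown above (sfg.var, .params, .returns, sfg.expr):
with SourceFileGenerator() as sfg:
    rows = sfg.var("rows", "int64")
    cols = sfg.var("cols", "int64")
    # Without .params(...), the collected parameters would be ordered
    # alphabetically (cols, rows); listing them explicitly fixes the order.
    sfg.function("numCells").params(rows, cols).returns("int64")(
        sfg.expr("return {} * {};", rows, cols)
    )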
Variables of C++ Class Type#
sfg.var should only be used for the most basic data types: it parses its second argument as a data type using create_type, which is restricted to primitive and fixed-width C types.
For more complex C++ classes, class templates, and their APIs, pystencils-sfg provides its own modelling system, implemented in pystencilssfg.lang.
This system is used, for instance, by pystencilssfg.lang.cpp.std, which mirrors (a small part of) the C++ standard library.
To create a variable of a class template represented using the pystencilssfg.lang modelling system, first instantiate the class (with any template arguments, as well as optional const and ref qualifiers) and then call var on it:
from pystencilssfg.lang.cpp import std
data = std.vector("float64", const=True, ref=True).var("data")
str(data), str(data.dtype)
('data', 'const std::vector< double >&')
Initializing Variables#
To emit an initializer statement for a variable, use sfg.init:
from pystencilssfg.lang.cpp import std
result = std.tuple("int32", "int32").var("result")
n, m = sfg.vars("n, m", "int32")
sfg.init(result)(
    sfg.expr("{} / {}", n, m),
    sfg.expr("{} % {}", n, m)
)
This will be recognized by the parameter collector: variables that are defined using init before they are used will be considered bound and will not end up in the function signature.
Also, any variables passed to the braced initializer-expression (by themselves or inside sfg.expr) will be found and tracked by the parameter collector:
from pystencilssfg.lang.cpp import std
with SourceFileGenerator() as sfg:
    result = std.tuple("int32", "int32").var("result")
    n, m = sfg.vars("n, m", "int32")
    sfg.function("div_rem").params(n, m).returns(result.dtype)(
        sfg.init(result)(
            sfg.expr("{} / {}", n, m),
            sfg.expr("{} % {}", n, m)
        ),
        sfg.expr("return {};", result)
    )
#pragma once
#include <cstdint>
#include <tuple>
#define RESTRICT __restrict__
std::tuple<int32_t, int32_t> div_rem(int32_t n, int32_t m);
#include "demo.hpp"
std::tuple<int32_t, int32_t> div_rem(int32_t n, int32_t m) {
std::tuple<int32_t, int32_t> result{n / m, n % m};
return result;
}
Namespaces#
C++ uses namespaces to structure code and group entities.
By default, pystencils-sfg emits all code into the global namespace.
For instructions on how to change the outermost namespace used by the SourceFileGenerator, see Generator Script Configuration and Command-Line Interface.
Starting from the outermost namespace, nested namespaces can be entered and exited during
code generation.
To enter a new namespace, use sfg.namespace in one of two ways:
- Simply calling sfg.namespace("my_namespace") and ignoring its return value will cause the generator script to use the given namespace for the rest of its execution;
- Calling sfg.namespace("my_namespace") in a with statement will activate the given namespace only for the duration of the managed block.
To illustrate, the following snippet activates the namespace mylib::generated for the entire length of the generator script, and then enters and exits the nested namespace mylib::generated::detail:
with SourceFileGenerator() as sfg:
    sfg.namespace("mylib::generated")
    sfg.code("/* Generated code in outer namespace */")
    with sfg.namespace("detail"):
        sfg.code("/* Implementation details in the inner namespace */")
    sfg.code("/* More code in the outer namespace */")
#pragma once
#define RESTRICT __restrict__
namespace mylib::generated {
/* Generated code in outer namespace */
namespace detail {
/* Implementation details in the inner namespace */
} // namespace detail
/* More code in the outer namespace */
} // namespace mylib::generated
#include "demo.hpp"
namespace mylib::generated {
namespace detail {} // namespace detail
} // namespace mylib::generated
Kernels and Parameter Mappings#
The original purpose of pystencils-sfg is to simplify the embedding of pystencils-generated numerical kernels into C++ applications. This section discusses how to register kernels with the source file generator, how to call them in wrapper code, and how to automatically map symbolic pystencils fields onto nd-array data structures.
Registering Kernels#
In the generated files, kernels are organized in kernel namespaces.
The composer gives us access to the default kernel namespace (<current_namespace>::kernels) via sfg.kernels.
To add a kernel,
- either pass its assignments and the pystencils code generator configuration directly to kernels.create(), or
- create the kernel separately through pystencils.create_kernel and register it using kernels.add().
Both functions return a kernel handle, through which the kernel may later be invoked.
You may create and access custom-named kernel namespaces using sfg.kernel_namespace().
This gives you a KernelsAdder object with the same interface as sfg.kernels.
Note
A kernel namespace is not a regular namespace; if you attempt to create both a regular and a kernel namespace with the same name, the composer will raise an error.
Here's an example with two kernels being registered in different kernel namespaces, once using add, and once using create.
import pystencils as ps
with SourceFileGenerator() as sfg:
    # Create symbolic fields
    f, g = ps.fields("f, g: double[2D]")
    # Define and create the first kernel
    asm1 = ps.Assignment(f(0), g(0))
    cfg1 = ps.CreateKernelConfig()
    cfg1.cpu.openmp.enable = True
    khandle_1 = sfg.kernels.create(asm1, "first_kernel", cfg1)
    # Define the second kernel and its codegen configuration
    asm2 = ps.Assignment(f(0), 3.0 * g(0))
    cfg2 = ps.CreateKernelConfig(target=ps.Target.CUDA)
    # Create and register the second kernel at a custom namespace
    kernel2 = ps.create_kernel(asm2, cfg2)
    khandle_2 = sfg.kernel_namespace("gpu_kernels").add(kernel2, "second_kernel")
#pragma once
#define RESTRICT __restrict__
#include "demo.hpp"
#include <cmath>
#include <cstdint>
namespace kernels {
void first_kernel(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const int64_t _size_f_0,
const int64_t _size_f_1, const int64_t _stride_f_0,
const int64_t _stride_f_1, const int64_t _stride_g_0,
const int64_t _stride_g_1) {
#pragma omp parallel for schedule(static)
for (int64_t ctr_0 = 0LL; ctr_0 < _size_f_0; ctr_0 += 1LL) {
for (int64_t ctr_1 = 0LL; ctr_1 < _size_f_1; ctr_1 += 1LL) {
_data_f[ctr_0 * _stride_f_0 + ctr_1 * _stride_f_1] =
_data_g[ctr_0 * _stride_g_0 + ctr_1 * _stride_g_1];
}
}
}
} // namespace kernels
namespace gpu_kernels {
__global__ void
second_kernel(double *RESTRICT const _data_f, double *RESTRICT const _data_g,
const int64_t _size_f_0, const int64_t _size_f_1,
const int64_t _stride_f_0, const int64_t _stride_f_1,
const int64_t _stride_g_0, const int64_t _stride_g_1) {
const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
blockIdx.y * blockDim.y + threadIdx.y;
const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
blockIdx.x * blockDim.x + threadIdx.x;
const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
if (ctr_0 < _size_f_0 && ctr_1 < _size_f_1) {
_data_f[ctr_0 * _stride_f_0 + ctr_1 * _stride_f_1] =
3.0 * _data_g[ctr_0 * _stride_g_0 + ctr_1 * _stride_g_1];
}
}
} // namespace gpu_kernels
Writing Kernel Wrapper Functions#
By default, kernel definitions are only visible in the generated implementation file; kernels are not meant to be called directly, but through wrapper functions. This serves to hide their fairly lengthy and complicated low-level function interface.
Invoking CPU Kernels#
To call a CPU kernel from a function, use sfg.call on a kernel handle:
sfg.function("kernel_wrapper")(
    sfg.call(khandle)
)
This will expose all parameters of the kernel into the wrapper function and, in turn, cause them to be added to its signature. We don’t want to expose this complexity, but instead hide it by using appropriate data structures. The next section explains how that is achieved in pystencils-sfg.
Mapping Fields to Data Structures#
Pystencils kernels operate on n-dimensional contiguous or strided arrays.
There exist many classes with diverse APIs modelling such arrays throughout the scientific computing landscape, including Kokkos Views, C++ std::mdspan, SYCL buffers, and many framework-specific custom-built classes.
Using the protocols behind sfg.map_field, it is possible to automatically emit code that extracts the indexing information required by a kernel from any of these classes, as long as a suitable API reflection is available.
See also
Reflecting Field Data Structures for instructions on how to set up field API reflection for a custom nd-array data structure.
Pystencils-sfg natively provides field extraction for a number of C++ STL classes, such as std::vector and std::span (for 1D fields) and std::mdspan.
Import any of them from pystencilssfg.lang.cpp.std and create an instance for a given field using .from_field().
Then, inside the wrapper function, pass the symbolic field and its associated data structure to sfg.map_field before calling the kernel:
import pystencils as ps
from pystencilssfg.lang.cpp import std
with SourceFileGenerator() as sfg:
    # Create symbolic fields
    f, g = ps.fields("f, g: double[1D]")
    # Create data structure reflections
    f_vec = std.vector.from_field(f)
    g_span = std.span.from_field(g)
    # Create the kernel
    asm = ps.Assignment(f(0), g(0))
    khandle = sfg.kernels.create(asm, "my_kernel")
    # Create the wrapper function
    sfg.function("call_my_kernel")(
        sfg.map_field(f, f_vec),
        sfg.map_field(g, g_span),
        sfg.call(khandle)
    )
#pragma once
#include <span>
#include <vector>
#define RESTRICT __restrict__
void call_my_kernel(std::vector<double> &f, std::span<double> g);
#include "demo.hpp"
#include <cmath>
#include <cstdint>
namespace kernels {
void my_kernel(double *RESTRICT const _data_f, double *RESTRICT const _data_g,
const int64_t _size_f_0, const int64_t _stride_f_0,
const int64_t _stride_g_0) {
for (int64_t ctr_0 = 0LL; ctr_0 < _size_f_0; ctr_0 += 1LL) {
_data_f[ctr_0 * _stride_f_0] = _data_g[ctr_0 * _stride_g_0];
}
}
} // namespace kernels
void call_my_kernel(std::vector<double> &f, std::span<double> g) {
double *RESTRICT const _data_f{f.data()};
const int64_t _size_f_0{int64_t(f.size())};
const int64_t _stride_f_0{int64_t(1)};
double *RESTRICT const _data_g{g.data()};
/* g.size() == _size_g_0 */
const int64_t _stride_g_0{int64_t(1)};
kernels::my_kernel(_data_f, _data_g, _size_f_0, _stride_f_0, _stride_g_0);
}
GPU Kernels#
Pystencils also allows us to generate kernels for the CUDA and HIP GPU programming models. This section describes how to generate GPU kernels through pystencils-sfg, how to invoke them with various launch configurations, and how GPU execution streams are reflected.
Generate and Invoke CUDA and HIP Kernels#
To generate a kernel targeting either of these, set the target code generator option to either Target.CUDA or Target.HIP.
After registering a GPU kernel, its invocation can be rendered using sfg.gpu_invoke.
Here is an example using CUDA:
import pystencils as ps
from pystencilssfg import SfgConfig
sfg_config = SfgConfig()
sfg_config.extensions.impl = "cu"
with SourceFileGenerator(sfg_config) as sfg:
    # Configure the code generator to use CUDA
    cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)
    # Create fields, assemble assignments
    f, g = ps.fields("f, g: double[128, 128]")
    asm = ps.Assignment(f(0), g(0))
    # Register kernel
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)
    # Invoke it
    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle)
    )
#pragma once
#define RESTRICT __restrict__
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g);
#include "demo.hpp"
#include <cstdint>
#include <cuda_runtime.h>
#include <tuple>
namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
double *RESTRICT const _data_g) {
const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
blockIdx.y * blockDim.y + threadIdx.y;
const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
blockIdx.x * blockDim.x + threadIdx.x;
const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
if (ctr_0 < 128LL && ctr_1 < 128LL) {
_data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
}
}
} // namespace kernels
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g) {
{
const dim3 __block_size{8, 8, 4};
const std::tuple<uint32_t, uint32_t, uint32_t> __work_items{128LL, 128LL,
1LL};
const dim3 __grid_size{
(std::get<0>(__work_items) + __block_size.x - 1) / __block_size.x,
(std::get<1>(__work_items) + __block_size.y - 1) / __block_size.y,
(std::get<2>(__work_items) + __block_size.z - 1) / __block_size.z};
/* clang-format off */
/* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
kernels::gpu_kernel<<< __grid_size, __block_size, 0 >>>(_data_f, _data_g);
/* clang-format on */
}
}
In this snippet, we used the generator configuration to change the suffix of the generated implementation file to .cu.
When investigating the generated .cu file, you can see that the GPU launch configuration parameters grid size and block size are being computed automatically from the array sizes.
This behavior can be changed by modifying options in the gpu category of the CreateKernelConfig.
Adapting the Launch Configuration#
GPU kernel invocations usually require the user to provide a launch grid, defined by the GPU thread block size and the number of blocks on the grid. In the simplest case (seen above), pystencils-sfg will emit code that automatically computes these parameters from the size of the arrays passed to the kernel, using a default block size defined by pystencils.
The code generator also permits customization of the launch configuration. You may provide a custom block size to override the default, in which case the grid size will still be computed by dividing the array sizes by your block size. Alternatively, you can take full control of both block and grid size. Instructions for both cases are given in the following subsections.
User-Defined Block Size for Auto-Computed Grid Size#
To merely modify the block size argument while still automatically inferring the grid size, pass a variable or expression of type dim3 to the block_size parameter of gpu_invoke.
Pystencils-sfg exposes two versions of dim3, which differ primarily in their associated runtime headers:
- pystencilssfg.lang.gpu.cuda.dim3 for CUDA, and
- pystencilssfg.lang.gpu.hip.dim3 for HIP.
The following snippet selects the correct dim3 type according to the kernel target; it then creates a variable of that type and turns that into an argument to the kernel invocation:
from pystencilssfg.lang.gpu import hip
with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)
    # Select dim3 reflection
    match target:
        case ps.Target.CUDA:
            from pystencilssfg.lang.gpu import cuda as gpu_api
        case ps.Target.HIP:
            from pystencilssfg.lang.gpu import hip as gpu_api
    # Create dim3 variable and pass it to kernel invocation
    block_size = gpu_api.dim3(const=True).var("block_size")
    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, block_size=block_size)
    )
#pragma once
#include <hip/hip_runtime.h>
#define RESTRICT __restrict__
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const dim3 block_size);
#include "demo.hpp"
#include "pystencils_runtime/hip.h"
#include <cstdint>
#include <tuple>
namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
double *RESTRICT const _data_g) {
const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
blockIdx.y * blockDim.y + threadIdx.y;
const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
blockIdx.x * blockDim.x + threadIdx.x;
const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
if (ctr_0 < 128LL && ctr_1 < 128LL) {
_data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
}
}
} // namespace kernels
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const dim3 block_size) {
{
const dim3 __block_size{block_size};
const std::tuple<uint32_t, uint32_t, uint32_t> __work_items{128LL, 128LL,
1LL};
const dim3 __grid_size{
(std::get<0>(__work_items) + __block_size.x - 1) / __block_size.x,
(std::get<1>(__work_items) + __block_size.y - 1) / __block_size.y,
(std::get<2>(__work_items) + __block_size.z - 1) / __block_size.z};
/* clang-format off */
/* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
kernels::gpu_kernel<<< __grid_size, __block_size, 0 >>>(_data_f, _data_g);
/* clang-format on */
}
}
Manual Launch Configurations#
To take full control of the launch configuration, we must disable its automatic inference by setting the gpu.manual_launch_grid code generator option to True.
Then, we must pass dim3 arguments for both block_size and grid_size to the kernel invocation:
from pystencilssfg.lang.gpu import hip
with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    # Configure for manual launch config
    cfg = ps.CreateKernelConfig(target=ps.Target.CUDA)
    cfg.gpu.manual_launch_grid = True
    # Register kernel
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)
    # Create dim3 variables
    from pystencilssfg.lang.gpu import cuda
    block_size = cuda.dim3(const=True).var("block_size")
    grid_size = cuda.dim3(const=True).var("grid_size")
    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, block_size=block_size, grid_size=grid_size)
    )
#pragma once
#include <cuda_runtime.h>
#define RESTRICT __restrict__
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const dim3 block_size,
const dim3 grid_size);
#include "demo.hpp"
#include <cstdint>
namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
double *RESTRICT const _data_g) {
const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
blockIdx.y * blockDim.y + threadIdx.y;
const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
blockIdx.x * blockDim.x + threadIdx.x;
const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
if (ctr_0 < 128LL && ctr_1 < 128LL) {
_data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
}
}
} // namespace kernels
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const dim3 block_size,
const dim3 grid_size) {
/* clang-format off */
/* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
kernels::gpu_kernel<<< grid_size, block_size, 0 >>>(_data_f, _data_g);
/* clang-format on */
}
Using Streams#
CUDA and HIP kernels can be enqueued into streams for concurrent execution.
This is mirrored in pystencils-sfg; all overloads of gpu_invoke take an optional stream argument.
The stream_t data types of both CUDA and HIP are made available through the respective API reflections:
- lang.gpu.cuda.stream_t reflects cudaStream_t, and
- lang.gpu.hip.stream_t reflects hipStream_t.
Here is an example that creates a variable of the HIP stream type and passes it to gpu_invoke:
from pystencilssfg.lang.gpu import hip
with SourceFileGenerator(sfg_config) as sfg:
    # ... define kernel ...
    khandle = sfg.kernels.create(asm, "gpu_kernel", cfg)
    stream = hip.stream_t(const=True).var("stream")
    sfg.function("kernel_wrapper")(
        sfg.gpu_invoke(khandle, stream=stream)
    )
#pragma once
#include <hip/hip_runtime.h>
#define RESTRICT __restrict__
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const hipStream_t stream);
#include "demo.hpp"
#include "pystencils_runtime/hip.h"
#include <cstdint>
#include <tuple>
namespace kernels {
__global__ void gpu_kernel(double *RESTRICT const _data_f,
double *RESTRICT const _data_g) {
const int32_t __c_blockidx_ymblockdim_ypthreadidx_y =
blockIdx.y * blockDim.y + threadIdx.y;
const int32_t __c_blockidx_xmblockdim_xpthreadidx_x =
blockIdx.x * blockDim.x + threadIdx.x;
const int64_t ctr_0 = (int64_t)__c_blockidx_ymblockdim_ypthreadidx_y;
const int64_t ctr_1 = (int64_t)__c_blockidx_xmblockdim_xpthreadidx_x;
if (ctr_0 < 128LL && ctr_1 < 128LL) {
_data_f[ctr_0 * 128LL + ctr_1] = _data_g[ctr_0 * 128LL + ctr_1];
}
}
} // namespace kernels
void kernel_wrapper(double *RESTRICT const _data_f,
double *RESTRICT const _data_g, const hipStream_t stream) {
{
const dim3 __block_size{8, 8, 4};
const std::tuple<uint32_t, uint32_t, uint32_t> __work_items{128LL, 128LL,
1LL};
const dim3 __grid_size{
(std::get<0>(__work_items) + __block_size.x - 1) / __block_size.x,
(std::get<1>(__work_items) + __block_size.y - 1) / __block_size.y,
(std::get<2>(__work_items) + __block_size.z - 1) / __block_size.z};
/* clang-format off */
/* [pystencils-sfg] Formatting may add illegal spaces between angular brackets in `<<< >>>` */
kernels::gpu_kernel<<< __grid_size, __block_size, 0, stream >>>(_data_f, _data_g);
/* clang-format on */
}
}
To Do
Defining classes, their fields, constructors, and methods