
Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Showing 261 additions and 123 deletions
@@ -4,14 +4,6 @@
 Symbolic Language
 *****************
 
-.. toctree::
-    :maxdepth: 2
-    :hidden:
-
-    field
-    sympyextensions
-
 Pystencils allows you to define near-arbitrarily complex numerical kernels in its symbolic
 language, which is based on the computer algebra system `SymPy <https://www.sympy.org>`_.
 The pystencils code generator is able to parse and translate a large portion of SymPy's
@@ -64,7 +56,7 @@ An assignment collection contains two separate lists of assignments:
    into fields.
 
 .. autosummary::
-    :toctree: autoapi
+    :toctree: generated
     :nosignatures:
    :template: autosummary/recursive_class.rst
...
@@ -11,7 +11,7 @@ Type Creation and Conversion
 ----------------------------
 
 .. autosummary::
-    :toctree: autoapi
+    :toctree: generated
     :nosignatures:
 
     create_type
@@ -34,7 +34,7 @@ unless you have very particular needs.
    :parts: 1
 
 .. autosummary::
-    :toctree: autoapi
+    :toctree: generated
     :nosignatures:
    :template: autosummary/entire_class.rst
@@ -82,10 +82,10 @@ Exceptions
 .. currentmodule:: pystencils.types
 
 .. autosummary::
-    :toctree: autoapi
+    :toctree: generated
     :nosignatures:
 
-    pystencils.types.PsTypeError
+    PsTypeError
 
 Implementation Details
...
@@ -46,12 +46,13 @@ use_cython = [
 ]
 doc = [
     'sphinx',
-    'furo',
-    'nbsphinx',
+    'pydata-sphinx-theme==0.15.4',
+    'sphinx-book-theme==1.1.3',  # workaround for https://github.com/executablebooks/sphinx-book-theme/issues/865
     'sphinxcontrib-bibtex',
     'sphinx_autodoc_typehints',
     'pandoc',
     'sphinx_design',
+    'myst-nb'
 ]
 tests = [
     'pytest',
...
@@ -5,6 +5,7 @@ from .defaults import DEFAULTS
 from . import fd
 from . import stencil as stencil
 from .display_utils import get_code_obj, get_code_str, show_code, to_dot
+from .inspection import inspect
 from .field import Field, FieldType, fields
 from .types import create_type, create_numeric_type
 from .cache import clear_cache
@@ -37,7 +38,6 @@ from .sympyextensions.typed_sympy import TypedSymbol, DynamicType
 from .sympyextensions import SymbolCreator
 from .datahandling import create_data_handling
 
-
 __all__ = [
     "Field",
     "FieldType",
@@ -63,6 +63,7 @@ __all__ = [
     "to_dot",
     "get_code_obj",
     "get_code_str",
+    "inspect",
     "AssignmentCollection",
     "Assignment",
     "AddAugmentedAssignment",
...
+from .base_printer import EmissionError
 from .c_printer import emit_code, CAstPrinter
 from .ir_printer import emit_ir, IRAstPrinter
 
-__all__ = ["emit_code", "CAstPrinter", "emit_ir", "IRAstPrinter"]
+__all__ = ["emit_code", "CAstPrinter", "emit_ir", "IRAstPrinter", "EmissionError"]
@@ -189,7 +189,7 @@ class BasePrinter(ABC):
                 pc.indent_level += self._indent_width
                 interior = "\n".join(self.visit(stmt, pc) for stmt in statements) + "\n"
                 pc.indent_level -= self._indent_width
-                return pc.indent("{\n") + interior + pc.indent("}\n")
+                return pc.indent("{\n") + interior + pc.indent("}")
 
             case PsStatement(expr):
                 return pc.indent(f"{self.visit(expr, pc)};")
...
@@ -5,7 +5,7 @@ from pystencils.backend.memory import PsSymbol
 from .base_printer import BasePrinter
 
 from ..kernelfunction import KernelFunction
-from ...types import PsType, PsArrayType, PsScalarType
+from ...types import PsType, PsArrayType, PsScalarType, PsTypeError
 from ..ast.expressions import PsBufferAcc
 from ..ast.vector import PsVecMemAcc
@@ -23,7 +23,10 @@ class CAstPrinter(BasePrinter):
     def visit(self, node: PsAstNode, pc: PrinterCtx) -> str:
         match node:
             case PsVecMemAcc():
-                raise EmissionError("Cannot print vectorized array accesses to C code.")
+                raise EmissionError(
+                    f"Unable to print C code for vector memory access {node}.\n"
+                    f"Vectorized memory accesses must be mapped to intrinsics before emission."
+                )
 
             case PsBufferAcc():
                 raise EmissionError(
@@ -33,7 +36,7 @@ class CAstPrinter(BasePrinter):
             case _:
                 return super().visit(node, pc)
 
     def _symbol_decl(self, symb: PsSymbol):
         dtype = symb.get_dtype()
@@ -52,11 +55,12 @@ class CAstPrinter(BasePrinter):
     def _constant_literal(self, constant: PsConstant):
         dtype = constant.get_dtype()
         if not isinstance(dtype, PsScalarType):
-            raise EmissionError(
-                "Cannot print literals for non-scalar constants."
-            )
+            raise EmissionError("Cannot print literals for non-scalar constants.")
 
         return dtype.create_literal(constant.value)
 
     def _type_str(self, dtype: PsType):
-        return dtype.c_string()
+        try:
+            return dtype.c_string()
+        except PsTypeError:
+            raise EmissionError(f"Unable to print type {dtype} as a C data type.")
@@ -59,7 +59,7 @@ class IRAstPrinter(BasePrinter):
                 stride_code = "" if stride is None else f", stride={stride}"
-                code = f"vec_load< {lanes}{stride_code} >({ptr_code}, {offset_code})"
+                code = f"vec_memacc< {lanes}{stride_code} >({ptr_code}, {offset_code})"
                 return pc.parenthesize(code, Ops.Subscript)
 
             case PsVecBroadcast(lanes, operand):
...
@@ -41,6 +41,7 @@ class CupyKernelWrapper(KernelWrapper):
         self._kfunc: GpuKernelFunction = kfunc
         self._raw_kernel = raw_kernel
         self._block_size = block_size
+        self._num_blocks: tuple[int, int, int] | None = None
         self._args_cache: dict[Any, tuple] = dict()
 
     @property
@@ -59,6 +60,14 @@ class CupyKernelWrapper(KernelWrapper):
     def block_size(self, bs: tuple[int, int, int]):
         self._block_size = bs
 
+    @property
+    def num_blocks(self) -> tuple[int, int, int] | None:
+        return self._num_blocks
+
+    @num_blocks.setter
+    def num_blocks(self, nb: tuple[int, int, int] | None):
+        self._num_blocks = nb
+
     def __call__(self, **kwargs: Any):
         kernel_args, launch_grid = self._get_cached_args(**kwargs)
         device = self._get_device(kernel_args)
@@ -72,7 +81,7 @@ class CupyKernelWrapper(KernelWrapper):
         return devices.pop()
 
     def _get_cached_args(self, **kwargs):
-        key = (self._block_size,) + tuple((k, id(v)) for k, v in kwargs.items())
+        key = (self._block_size, self._num_blocks) + tuple((k, id(v)) for k, v in kwargs.items())
 
         if key not in self._args_cache:
             args = self._get_args(**kwargs)
@@ -185,25 +194,36 @@ class CupyKernelWrapper(KernelWrapper):
         symbolic_threads_range = self._kfunc.threads_range
 
-        threads_range: list[int] = [
-            evaluate_expression(expr, valuation)
-            for expr in symbolic_threads_range.num_work_items
-        ]
-
-        if symbolic_threads_range.dim < 3:
-            threads_range += [1] * (3 - symbolic_threads_range.dim)
-
-        def div_ceil(a, b):
-            return a // b if a % b == 0 else a // b + 1
-
-        #   TODO: Refine this?
-        grid_size = tuple(
-            div_ceil(threads, tpb)
-            for threads, tpb in zip(threads_range, self._block_size)
-        )
-        assert len(grid_size) == 3
-
-        launch_grid = LaunchGrid(grid_size, self._block_size)
+        if self._num_blocks is not None:
+            launch_grid = LaunchGrid(self._num_blocks, self._block_size)
+
+        elif symbolic_threads_range is not None:
+            threads_range: list[int] = [
+                evaluate_expression(expr, valuation)
+                for expr in symbolic_threads_range.num_work_items
+            ]
+
+            if symbolic_threads_range.dim < 3:
+                threads_range += [1] * (3 - symbolic_threads_range.dim)
+
+            def div_ceil(a, b):
+                return a // b if a % b == 0 else a // b + 1
+
+            #   TODO: Refine this?
+            num_blocks = tuple(
+                div_ceil(threads, tpb)
+                for threads, tpb in zip(threads_range, self._block_size)
+            )
+            assert len(num_blocks) == 3
+
+            launch_grid = LaunchGrid(num_blocks, self._block_size)
+
+        else:
+            raise JitError(
+                "Unable to determine launch grid for GPU kernel invocation: "
+                "No manual grid size was specified, and the number of threads could not "
+                "be determined automatically."
+            )
 
         return tuple(args), launch_grid
...
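With this change, the wrapper either takes a user-supplied grid via the new num_blocks property or derives one by dividing the evaluated work-item counts by the block size, rounding up. A usage sketch, assuming `kernel` is a compiled GPU kernel object exposing the properties added above (array and parameter names are illustrative):

    def run_with_manual_grid(kernel, f_arr, g_arr):
        # Threads per block, as before
        kernel.block_size = (32, 8, 1)
        # New: fix the number of blocks instead of letting the wrapper derive
        # them from the kernel's symbolic threads range
        kernel.num_blocks = (16, 16, 1)
        kernel(f=f_arr, g=g_arr)

    # The automatic path uses ceiling division per dimension, matching the
    # div_ceil helper above:
    def div_ceil(a, b):
        return a // b if a % b == 0 else a // b + 1

    assert div_ceil(100, 32) == 4  # 100 work items need 4 blocks of 32 threads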
@@ -139,6 +139,13 @@ class AstFactory:
                 self._typify(self.parse_index(iter_slice) + self.parse_index(1))
             )
             step = self.parse_index(1)
+
+            if normalize_to is not None:
+                upper_limit = self.parse_index(normalize_to)
+                if isinstance(start, PsConstantExpr) and start.constant.value < 0:
+                    start = fold(self._typify(upper_limit.clone() + start))
+                    stop = fold(self._typify(upper_limit.clone() + stop))
+
         else:
             start = self._parse_any_index(
                 iter_slice.start if iter_slice.start is not None else 0
@@ -157,21 +164,21 @@ class AstFactory:
                     f"Invalid value for `slice.step`: {step.constant.value}"
                 )
 
-        if normalize_to is not None:
-            upper_limit = self.parse_index(normalize_to)
-            if isinstance(start, PsConstantExpr) and start.constant.value < 0:
-                start = fold(self._typify(upper_limit.clone() + start))
-
-            if stop is None:
-                stop = upper_limit
-            elif isinstance(stop, PsConstantExpr) and stop.constant.value < 0:
-                stop = fold(self._typify(upper_limit.clone() + stop))
-
-        elif stop is None:
-            raise ValueError(
-                "Cannot parse a slice with `stop == None` if no normalization limit is given"
-            )
+            if normalize_to is not None:
+                upper_limit = self.parse_index(normalize_to)
+                if isinstance(start, PsConstantExpr) and start.constant.value < 0:
+                    start = fold(self._typify(upper_limit.clone() + start))
+
+                if stop is None:
+                    stop = upper_limit
+                elif isinstance(stop, PsConstantExpr) and stop.constant.value < 0:
+                    stop = fold(self._typify(upper_limit.clone() + stop))
+
+            elif stop is None:
+                raise ValueError(
+                    "Cannot parse a slice with `stop == None` if no normalization limit is given"
+                )
 
         assert stop is not None  # for mypy
 
         return start, stop, step
...
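The normalization rule mirrors Python's negative indexing: a negative start or stop is interpreted relative to the given upper limit. A plain-Python sketch of the intended semantics (this is not the pystencils API itself; the symbolic folding in the diff produces the same values as expression nodes):

    def normalize_slice(start, stop, step, upper_limit):
        # Negative bounds count from the end, as in Python sequence slicing
        if start < 0:
            start += upper_limit
        if stop is None:
            stop = upper_limit
        elif stop < 0:
            stop += upper_limit
        return start, stop, step

    # Example: slice(1, -1) over an axis of length 16 iterates indices 1..14
    assert normalize_slice(1, -1, 1, 16) == (1, 15, 1)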
@@ -6,6 +6,7 @@ from functools import reduce
 from operator import mul
 
 from ...defaults import DEFAULTS
+from ...config import _AUTO_TYPE, AUTO
 from ...simp import AssignmentCollection
 from ...field import Field, FieldType
@@ -195,21 +196,25 @@ class FullIterationSpace(IterationSpace):
     def dimensions(self):
         """The dimensions of this iteration space"""
         return self._dimensions
 
+    @property
+    def counters(self) -> tuple[PsSymbol, ...]:
+        return tuple(dim.counter for dim in self._dimensions)
+
     @property
-    def lower(self):
+    def lower(self) -> tuple[PsExpression, ...]:
         """Lower limits of each dimension"""
-        return (dim.start for dim in self._dimensions)
+        return tuple(dim.start for dim in self._dimensions)
 
     @property
-    def upper(self):
+    def upper(self) -> tuple[PsExpression, ...]:
         """Upper limits of each dimension"""
-        return (dim.stop for dim in self._dimensions)
+        return tuple(dim.stop for dim in self._dimensions)
 
     @property
-    def steps(self):
+    def steps(self) -> tuple[PsExpression, ...]:
         """Iteration steps of each dimension"""
-        return (dim.step for dim in self._dimensions)
+        return tuple(dim.step for dim in self._dimensions)
 
     @property
     def archetype_field(self) -> Field | None:
@@ -412,7 +417,7 @@ def create_sparse_iteration_space(
 def create_full_iteration_space(
     ctx: KernelCreationContext,
     assignments: AssignmentCollection,
-    ghost_layers: None | int | Sequence[int | tuple[int, int]] = None,
+    ghost_layers: None | _AUTO_TYPE | int | Sequence[int | tuple[int, int]] = None,
     iteration_slice: None | int | slice | tuple[int | slice, ...] = None,
 ) -> IterationSpace:
     assert not ctx.fields.index_fields
@@ -452,16 +457,7 @@ def create_full_iteration_space(
     #   Otherwise, if an iteration slice was specified, use that
     #   Otherwise, use the inferred ghost layers
 
-    if ghost_layers is not None:
-        ctx.metadata["ghost_layers"] = ghost_layers
-        return FullIterationSpace.create_with_ghost_layers(
-            ctx, ghost_layers, archetype_field
-        )
-    elif iteration_slice is not None:
-        return FullIterationSpace.create_from_slice(
-            ctx, iteration_slice, archetype_field
-        )
-    else:
+    if ghost_layers is AUTO:
         if len(domain_field_accesses) > 0:
             inferred_gls = max(
                 [fa.required_ghost_layers for fa in domain_field_accesses]
@@ -473,3 +469,15 @@ def create_full_iteration_space(
         return FullIterationSpace.create_with_ghost_layers(
             ctx, inferred_gls, archetype_field
         )
+    elif ghost_layers is not None:
+        assert not isinstance(ghost_layers, _AUTO_TYPE)
+        ctx.metadata["ghost_layers"] = ghost_layers
+        return FullIterationSpace.create_with_ghost_layers(
+            ctx, ghost_layers, archetype_field
+        )
+    elif iteration_slice is not None:
+        return FullIterationSpace.create_from_slice(
+            ctx, iteration_slice, archetype_field
+        )
+    else:
+        assert False, "unreachable code"
@@ -259,10 +259,12 @@ def create_cpu_kernel_function(
 
 class GpuKernelFunction(KernelFunction):
+    """Internal representation of a kernel function targeted at CUDA GPUs."""
+
     def __init__(
         self,
         body: PsBlock,
-        threads_range: GpuThreadsRange,
+        threads_range: GpuThreadsRange | None,
         target: Target,
         name: str,
         parameters: Sequence[KernelParameter],
@@ -276,7 +278,8 @@ class GpuKernelFunction(KernelFunction):
         self._threads_range = threads_range
 
     @property
-    def threads_range(self) -> GpuThreadsRange:
+    def threads_range(self) -> GpuThreadsRange | None:
+        """Object exposing the total size of the launch grid this kernel expects to be executed with."""
         return self._threads_range
@@ -284,14 +287,16 @@ def create_gpu_kernel_function(
     ctx: KernelCreationContext,
     platform: Platform,
     body: PsBlock,
-    threads_range: GpuThreadsRange,
+    threads_range: GpuThreadsRange | None,
     function_name: str,
     target_spec: Target,
     jit: JitBase,
 ):
     undef_symbols = collect_undefined_symbols(body)
-    for threads in threads_range.num_work_items:
-        undef_symbols |= collect_undefined_symbols(threads)
+
+    if threads_range is not None:
+        for threads in threads_range.num_work_items:
+            undef_symbols |= collect_undefined_symbols(threads)
 
     params = _get_function_params(ctx, undef_symbols)
     req_headers = _get_headers(ctx, platform, body)
...
+from warnings import warn
+
 from ...types import constify
 from ..exceptions import MaterializationError
 from .generic_gpu import GenericGpu, GpuThreadsRange
@@ -7,7 +9,7 @@ from ..kernelcreation import (
     IterationSpace,
     FullIterationSpace,
     SparseIterationSpace,
-    AstFactory
+    AstFactory,
 )
 
 from ..kernelcreation.context import KernelCreationContext
@@ -43,6 +45,7 @@ GRID_DIM = [
 
 class CudaPlatform(GenericGpu):
+    """Platform for CUDA-based GPUs."""
 
     def __init__(
         self, ctx: KernelCreationContext, indexing_cfg: GpuIndexingConfig | None = None
@@ -57,7 +60,7 @@ class CudaPlatform(GenericGpu):
 
     def materialize_iteration_space(
         self, body: PsBlock, ispace: IterationSpace
-    ) -> tuple[PsBlock, GpuThreadsRange]:
+    ) -> tuple[PsBlock, GpuThreadsRange | None]:
         if isinstance(ispace, FullIterationSpace):
             return self._prepend_dense_translation(body, ispace)
         elif isinstance(ispace, SparseIterationSpace):
@@ -112,6 +115,11 @@ class CudaPlatform(GenericGpu):
                 case MathFunctions.Abs if dtype.width == 16:
                     cfunc = CFunction(" __habs", arg_types, dtype)
 
+                case _:
+                    raise MaterializationError(
+                        f"Cannot materialize call to function {func}"
+                    )
+
             call.function = cfunc
             return call
@@ -123,9 +131,21 @@ class CudaPlatform(GenericGpu):
     def _prepend_dense_translation(
         self, body: PsBlock, ispace: FullIterationSpace
-    ) -> tuple[PsBlock, GpuThreadsRange]:
+    ) -> tuple[PsBlock, GpuThreadsRange | None]:
         dimensions = ispace.dimensions_in_loop_order()
-        launch_config = GpuThreadsRange.from_ispace(ispace)
+
+        if not self._cfg.manual_launch_grid:
+            try:
+                threads_range = GpuThreadsRange.from_ispace(ispace)
+            except MaterializationError as e:
+                warn(
+                    str(e.args[0])
+                    + "\nIf this is intended, set `manual_launch_grid=True` in the code generator configuration.",
+                    UserWarning,
+                )
+                threads_range = None
+        else:
+            threads_range = None
 
         indexing_decls = []
         conds = []
@@ -146,6 +166,8 @@ class CudaPlatform(GenericGpu):
             if not self._cfg.omit_range_check:
                 conds.append(PsLt(ctr, dim.stop))
 
+        indexing_decls = indexing_decls[::-1]
+
         if conds:
             condition: PsExpression = conds[0]
             for cond in conds[1:]:
@@ -155,7 +177,7 @@ class CudaPlatform(GenericGpu):
         body.statements = indexing_decls + body.statements
         ast = body
 
-        return ast, launch_config
+        return ast, threads_range
 
     def _prepend_sparse_translation(
         self, body: PsBlock, ispace: SparseIterationSpace
...
@@ -10,6 +10,7 @@ from ..kernelcreation.iteration_space import (
     SparseIterationSpace,
 )
 
 from .platform import Platform
+from ..exceptions import MaterializationError
 
 
 class GpuThreadsRange:
@@ -48,6 +49,15 @@ class GpuThreadsRange:
     @property
     def dim(self) -> int:
         return self._dim
 
+    def __str__(self) -> str:
+        rep = "GpuThreadsRange { "
+        rep += "; ".join(f"{x}: {w}" for x, w in zip("xyz", self._num_work_items))
+        rep += " }"
+        return rep
+
+    def _repr_html_(self) -> str:
+        return str(self)
+
     @staticmethod
     def _from_full_ispace(ispace: FullIterationSpace) -> GpuThreadsRange:
@@ -56,6 +66,19 @@ class GpuThreadsRange:
             raise NotImplementedError(
                 f"Cannot create a GPU threads range for an {len(dimensions)}-dimensional iteration space"
             )
+
+        from ..ast.analysis import collect_undefined_symbols as collect
+
+        for dim in dimensions:
+            symbs = collect(dim.start) | collect(dim.stop) | collect(dim.step)
+            for ctr in ispace.counters:
+                if ctr in symbs:
+                    raise MaterializationError(
+                        "Unable to construct GPU threads range for iteration space: "
+                        f"Limits of dimension counter {dim.counter.name} "
+                        f"depend on another dimension's counter {ctr.name}"
+                    )
+
         work_items = [ispace.actual_iterations(dim) for dim in dimensions]
         return GpuThreadsRange(work_items)
@@ -63,6 +86,6 @@ class GpuThreadsRange:
 class GenericGpu(Platform):
     @abstractmethod
     def materialize_iteration_space(
-        self, block: PsBlock, ispace: IterationSpace
-    ) -> tuple[PsBlock, GpuThreadsRange]:
+        self, body: PsBlock, ispace: IterationSpace
+    ) -> tuple[PsBlock, GpuThreadsRange | None]:
         pass

@@ -27,7 +27,7 @@ class Platform(ABC):
     @abstractmethod
     def materialize_iteration_space(
-        self, block: PsBlock, ispace: IterationSpace
+        self, body: PsBlock, ispace: IterationSpace
     ) -> PsBlock | tuple[PsBlock, Any]:
         pass
...
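The new check rejects iteration spaces whose per-dimension limits reference another dimension's counter, because such a shape cannot be covered by a rectangular launch grid. A plain-Python illustration of the kind of iteration the check guards against (not pystencils API):

    N = 8
    visited = []
    for y in range(N):
        for x in range(y):  # the inner stop depends on the outer counter y
            visited.append((x, y))
    # This triangular domain has no fixed work-item count per dimension, so
    # GpuThreadsRange.from_ispace now raises MaterializationError for it and
    # the launch grid must be chosen manually instead.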
@@ -8,7 +8,7 @@ from ..kernelcreation import KernelCreationContext
 from ..constants import PsConstant
 from ..ast import PsAstNode
 from ..ast.structural import PsLoop, PsBlock, PsDeclaration
-from ..ast.expressions import PsExpression
+from ..ast.expressions import PsExpression, PsTernary, PsGt
 from ..ast.vector import PsVecBroadcast
 from ..ast.analysis import collect_undefined_symbols
@@ -18,7 +18,7 @@ from .rewrite import substitute_symbols
 class LoopVectorizer:
     """Vectorize loops.
 
     The loop vectorizer provides methods to vectorize single loops inside an AST
     using a given number of vector lanes.
     During vectorization, the loop body is transformed using the `AstVectorizer`,
@@ -64,29 +64,26 @@ class LoopVectorizer:
     @overload
     def vectorize_select_loops(
         self, node: PsBlock, predicate: Callable[[PsLoop], bool]
-    ) -> PsBlock:
-        ...
+    ) -> PsBlock: ...
 
     @overload
     def vectorize_select_loops(
         self, node: PsLoop, predicate: Callable[[PsLoop], bool]
-    ) -> PsLoop | PsBlock:
-        ...
+    ) -> PsLoop | PsBlock: ...
 
     @overload
     def vectorize_select_loops(
         self, node: PsAstNode, predicate: Callable[[PsLoop], bool]
-    ) -> PsAstNode:
-        ...
+    ) -> PsAstNode: ...
 
     def vectorize_select_loops(
         self, node: PsAstNode, predicate: Callable[[PsLoop], bool]
     ) -> PsAstNode:
         """Select and vectorize loops from a syntax tree according to a predicate.
 
         Finds each loop inside a subtree and evaluates ``predicate`` on them.
         If ``predicate(loop)`` evaluates to `True`, the loop is vectorized.
         Loops nested inside a vectorized loop will not be processed.
 
         Args:
@@ -139,7 +136,7 @@ class LoopVectorizer:
         #   Generate vectorized loop body
         simd_body = self._vectorize_ast(loop.body, vc)
 
         if vector_ctr in collect_undefined_symbols(simd_body):
             simd_body.statements.insert(0, vector_counter_decl)
@@ -186,20 +183,31 @@ class LoopVectorizer:
                 trailing_start = self._ctx.get_new_symbol(
                     f"__{scalar_ctr.name}_trailing_start", scalar_ctr.get_dtype()
                 )
 
                 trailing_start_decl = self._type_fold(
                     PsDeclaration(
                         PsExpression.make(trailing_start),
-                        (
-                            (
-                                PsExpression.make(simd_stop)
-                                - simd_start.clone()
-                                - PsExpression.make(PsConstant(1))
-                            )
-                            / PsExpression.make(simd_step)
-                            + PsExpression.make(PsConstant(1))
-                        )
-                        * PsExpression.make(simd_step)
-                        + simd_start.clone(),
+                        PsTernary(
+                            #   If at least one vectorized iteration took place...
+                            PsGt(
+                                PsExpression.make(simd_stop),
+                                simd_start.clone(),
+                            ),
+                            #   start from the smallest non-valid multiple of simd_step, offset from simd_start
+                            (
+                                (
+                                    PsExpression.make(simd_stop)
+                                    - simd_start.clone()
+                                    - PsExpression.make(PsConstant(1))
+                                )
+                                / PsExpression.make(simd_step)
+                                + PsExpression.make(PsConstant(1))
+                            )
+                            * PsExpression.make(simd_step)
+                            + simd_start.clone(),
+                            #   otherwise start at zero
+                            simd_start.clone(),
+                        ),
                     )
                 )
...
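In scalar terms, the new ternary guards the remainder-loop start: only if the vectorized range is non-empty (simd_stop greater than simd_start) does the trailing loop begin behind the last vectorized block; otherwise it falls back to simd_start. A plain-integer sketch of the same arithmetic (illustrative only, outside the expression AST used above):

    def trailing_start(simd_start: int, simd_stop: int, simd_step: int) -> int:
        if simd_stop > simd_start:
            # ceil((simd_stop - simd_start) / simd_step) vectorized steps were taken
            n_steps = (simd_stop - simd_start - 1) // simd_step + 1
            return n_steps * simd_step + simd_start
        return simd_start

    # 10 scalar iterations, 4 lanes: the vectorized loop covers indices 0..7,
    # so the scalar remainder loop starts at 8
    assert trailing_start(0, 8, 4) == 8
    # No full vector iteration possible: the remainder starts at simd_start
    assert trailing_start(0, 0, 4) == 0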
@@ -314,7 +314,7 @@ class BoundaryHandling:
 
     def _create_boundary_kernel(self, symbolic_field, symbolic_index_field, boundary_obj):
         return create_boundary_kernel(symbolic_field, symbolic_index_field, self.stencil, boundary_obj,
-                                      target=self._target,)  # cpu_openmp=self._openmp) TODO: replace
+                                      target=self._target, cpu_openmp=self._openmp)
 
     def _create_index_fields(self):
         dh = self._data_handling
...
@@ -28,6 +28,19 @@ class PsOptionsError(Exception):
     """Indicates an option clash in the `CreateKernelConfig`."""
 
+class _AUTO_TYPE:
+    ...
+
+
+AUTO = _AUTO_TYPE()
+"""Special value that can be passed to some options for invoking automatic behaviour.
+
+Currently, these options permit `AUTO`:
+
+- `ghost_layers <CreateKernelConfig.ghost_layers>`
+"""
+
 
 @dataclass
 class OpenMpConfig:
     """Parameters controlling kernel parallelization using OpenMP."""
@@ -182,6 +195,14 @@ class GpuIndexingConfig:
     block_size: tuple[int, int, int] | None = None
     """Desired block size for the execution of GPU kernels. May be overridden later by the runtime system."""
 
+    manual_launch_grid: bool = False
+    """Always require a manually specified launch grid when running this kernel.
+
+    If set to `True`, the code generator will not attempt to infer the size of
+    the launch grid from the kernel.
+    The launch grid will then have to be specified manually at runtime.
+    """
+
     sycl_automatic_block_size: bool = True
     """If set to `True` while generating for `Target.SYCL`, let the SYCL runtime decide on the block size.
@@ -213,32 +234,43 @@ class CreateKernelConfig:
     function_name: str = "kernel"
     """Name of the generated function"""
 
-    ghost_layers: None | int | Sequence[int | tuple[int, int]] = None
+    ghost_layers: None | _AUTO_TYPE | int | Sequence[int | tuple[int, int]] = None
     """Specifies the number of ghost layers of the iteration region.
 
     Options:
-     - `None`: Required ghost layers are inferred from field accesses
+     - :py:data:`AUTO <pystencils.config.AUTO>`: Required ghost layers are inferred from field accesses
      - `int`: A uniform number of ghost layers in each spatial coordinate is applied
      - ``Sequence[int, tuple[int, int]]``: Ghost layers are specified for each spatial coordinate.
        In each coordinate, a single integer specifies the ghost layers at both the lower and upper iteration limit,
       while a pair of integers specifies the lower and upper ghost layers separately.
 
     When manually specifying ghost layers, it is the user's responsibility to avoid out-of-bounds memory accesses.
+
+    If ``ghost_layers=None`` is specified, the iteration region may otherwise be set using the `iteration_slice` option.
+
+    .. note::
+        At most one of `ghost_layers`, `iteration_slice`, and `index_field` may be set.
     """
 
-    iteration_slice: None | Sequence[slice] = None
+    iteration_slice: None | int | slice | tuple[int | slice] = None
     """Specifies the kernel's iteration slice.
 
-    `iteration_slice` may only be set if ``ghost_layers=None``.
-    If it is set, a slice must be specified for each spatial coordinate.
-    TODO: Specification of valid slices and their behaviour
+    Example:
+        >>> cfg = CreateKernelConfig(
+        ...     iteration_slice=ps.make_slice[3:14, 2:-2]
+        ... )
+        >>> cfg.iteration_slice
+        (slice(3, 14, None), slice(2, -2, None))
+
+    .. note::
+        At most one of `ghost_layers`, `iteration_slice`, and `index_field` may be set.
     """
 
     index_field: Field | None = None
     """Index field for a sparse kernel.
 
     If this option is set, a sparse kernel with the given field as index field will be generated.
+
+    .. note::
+        At most one of `ghost_layers`, `iteration_slice`, and `index_field` may be set.
     """
 
     """Data Types"""
...
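Taken together, the options touched in this file might be combined roughly as follows. This is a sketch against the pystencils 2.0-dev API shown in this diff; the name of the CreateKernelConfig field that carries the GpuIndexingConfig (assumed here to be gpu_indexing) does not appear in the diff and is an assumption:

    import pystencils as ps
    from pystencils.config import AUTO, CreateKernelConfig, GpuIndexingConfig

    # Ghost layers inferred from the field accesses
    cfg_auto = CreateKernelConfig(ghost_layers=AUTO)

    # Alternatively, restrict the iteration region with an explicit slice
    cfg_slice = CreateKernelConfig(iteration_slice=ps.make_slice[3:14, 2:-2])

    # GPU kernel whose launch grid will be supplied manually at runtime
    cfg_manual = CreateKernelConfig(
        target=ps.Target.CUDA,
        gpu_indexing=GpuIndexingConfig(manual_launch_grid=True),  # field name assumed
    )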
@@ -291,7 +291,10 @@ class SerialDataHandling(DataHandling):
     def synchronization_function(self, names, stencil=None, target=None, functor=None, **_):
         if target is None:
             target = self.default_target
-        assert target in (Target.CPU, Target.GPU)
+
+        if not (target.is_cpu() or target == Target.CUDA):
+            raise ValueError(f"Unsupported target: {target}")
+
         if not hasattr(names, '__len__') or type(names) is str:
             names = [names]
@@ -325,7 +328,7 @@ class SerialDataHandling(DataHandling):
                 values_per_cell = values_per_cell[0]
 
             if len(filtered_stencil) > 0:
-                if target == Target.CPU:
+                if target.is_cpu():
                     if functor is None:
                         from pystencils.slicing import get_periodic_boundary_functor
                         functor = get_periodic_boundary_functor
...
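For context, the synchronization function is typically obtained from a data handling object along these lines (a sketch; the keyword names follow the long-standing pystencils data-handling API and the field name is illustrative):

    import pystencils as ps

    dh = ps.create_data_handling(domain_size=(32, 32), periodicity=(True, True))
    dh.add_array("f", values_per_cell=1)

    # With the change above, any CPU target passes the check via target.is_cpu(),
    # while GPU synchronization requires Target.CUDA.
    sync = dh.synchronization_function(["f"])
    sync()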
@@ -988,24 +988,35 @@ def create_numpy_array_with_layout(shape, layout, alignment=False, byte_offset=0
 
 def spatial_layout_string_to_tuple(layout_str: str, dim: int) -> Tuple[int, ...]:
-    if layout_str in ('fzyx', 'zyxf'):
-        assert dim <= 3
-        return tuple(reversed(range(dim)))
-
-    if layout_str in ('fzyx', 'f', 'reverse_numpy', 'SoA'):
+    if dim <= 0:
+        raise ValueError("Dimensionality must be positive")
+
+    layout_str = layout_str.lower()
+
+    if layout_str in ('fzyx', 'zyxf', 'soa', 'aos'):
+        if dim > 3:
+            raise ValueError(f"Invalid spatial dimensionality for layout descriptor {layout_str}: May be at most 3.")
+        return tuple(reversed(range(dim)))
+
+    if layout_str in ('f', 'reverse_numpy'):
         return tuple(reversed(range(dim)))
-    elif layout_str in ('c', 'numpy', 'AoS'):
+    elif layout_str in ('c', 'numpy'):
         return tuple(range(dim))
 
     raise ValueError("Unknown layout descriptor " + layout_str)
 
 
 def layout_string_to_tuple(layout_str, dim):
+    if dim <= 0:
+        raise ValueError("Dimensionality must be positive")
+
     layout_str = layout_str.lower()
     if layout_str == 'fzyx' or layout_str == 'soa':
-        assert dim <= 4
+        if dim > 4:
+            raise ValueError(f"Invalid total dimensionality for layout descriptor {layout_str}: May be at most 4.")
         return tuple(reversed(range(dim)))
     elif layout_str == 'zyxf' or layout_str == 'aos':
-        assert dim <= 4
+        if dim > 4:
+            raise ValueError(f"Invalid total dimensionality for layout descriptor {layout_str}: May be at most 4.")
         return tuple(reversed(range(dim - 1))) + (dim - 1,)
     elif layout_str == 'f' or layout_str == 'reverse_numpy':
         return tuple(reversed(range(dim)))
...
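These layout helpers map a descriptor string to an index permutation. A quick illustration of the behaviour after this change (assuming the functions exactly as shown above):

    # Spatial layouts permute only the spatial axes
    spatial_layout_string_to_tuple('fzyx', dim=3)   # -> (2, 1, 0)
    spatial_layout_string_to_tuple('numpy', dim=2)  # -> (0, 1)

    # Full layouts include the index (field component) dimension
    layout_string_to_tuple('fzyx', dim=4)  # -> (3, 2, 1, 0)
    layout_string_to_tuple('zyxf', dim=4)  # -> (2, 1, 0, 3)

    # New behaviour: invalid dimensionality now raises ValueError instead of
    # failing an assertion
    # layout_string_to_tuple('fzyx', dim=5)  # raises ValueError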