Skip to content
Snippets Groups Projects
Commit 0b7de74d authored by Markus Holzer's avatar Markus Holzer
Browse files

Merged master

parents 826f21ab 37518a47
No related branches found
No related tags found
1 merge request!292Rebase of pystencils Type System
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from pystencils.session import * from pystencils.session import *
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Tutorial 02: Basic Kernel generation with *pystencils* # Tutorial 02: Basic Kernel generation with *pystencils*
Now that you have an [overview of pystencils](01_tutorial_getting_started.ipynb), Now that you have an [overview of pystencils](01_tutorial_getting_started.ipynb),
this tutorial shows in more detail how to formulate, optimize and run stencil kernels. this tutorial shows in more detail how to formulate, optimize and run stencil kernels.
## 1) Kernel Definition ## 1) Kernel Definition
### a) Defining kernels with assignment lists and the `kernel` decorator ### a) Defining kernels with assignment lists and the `kernel` decorator
*pystencils* gets a symbolic formulation of the kernel. This can be either an `Assignment` or a sequence of `Assignment`s that follow a set of restrictions. *pystencils* gets a symbolic formulation of the kernel. This can be either an `Assignment` or a sequence of `Assignment`s that follow a set of restrictions.
Lets first create a kernel that consists of multiple assignments: Lets first create a kernel that consists of multiple assignments:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
src_arr = np.zeros([20, 30]) src_arr = np.zeros([20, 30])
dst_arr = np.zeros_like(src_arr) dst_arr = np.zeros_like(src_arr)
dst, src = ps.fields(dst=dst_arr, src=src_arr) dst, src = ps.fields(dst=dst_arr, src=src_arr)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
grad_x, grad_y = sp.symbols("grad_x, grad_y") grad_x, grad_y = sp.symbols("grad_x, grad_y")
symbolic_description = [ symbolic_description = [
ps.Assignment(grad_x, (src[1, 0] - src[-1, 0]) / 2), ps.Assignment(grad_x, (src[1, 0] - src[-1, 0]) / 2),
ps.Assignment(grad_y, (src[0, 1] - src[0, -1]) / 2), ps.Assignment(grad_y, (src[0, 1] - src[0, -1]) / 2),
ps.Assignment(dst[0, 0], grad_x + grad_y), ps.Assignment(dst[0, 0], grad_x + grad_y),
] ]
kernel = ps.create_kernel(symbolic_description) kernel = ps.create_kernel(symbolic_description)
symbolic_description symbolic_description
``` ```
%% Output %% Output
$\displaystyle \left[ grad_{x} \leftarrow \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2}, \ grad_{y} \leftarrow \frac{{src}_{(0,1)}}{2} - \frac{{src}_{(0,-1)}}{2}, \ {dst}_{(0,0)} \leftarrow grad_{x} + grad_{y}\right]$ $\displaystyle \left[ grad_{x} \leftarrow \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2}, \ grad_{y} \leftarrow \frac{{src}_{(0,1)}}{2} - \frac{{src}_{(0,-1)}}{2}, \ {dst}_{(0,0)} \leftarrow grad_{x} + grad_{y}\right]$
⎡ src_E src_W src_N src_S ⎤ ⎡ src_E src_W src_N src_S ⎤
⎢gradₓ := ───── - ─────, grad_y := ───── - ─────, dst_C := gradₓ + grad_y⎥ ⎢gradₓ := ───── - ─────, grad_y := ───── - ─────, dst_C := gradₓ + grad_y⎥
⎣ 2 2 2 2 ⎦ ⎣ 2 2 2 2 ⎦
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
We created subexpressions, using standard sympy symbols on the left hand side, to split the kernel into multiple assignments. Defining a kernel using a list of `Assignment`s is quite tedious and hard to read. We created subexpressions, using standard sympy symbols on the left hand side, to split the kernel into multiple assignments. Defining a kernel using a list of `Assignment`s is quite tedious and hard to read.
To simplify the formulation of a kernel, *pystencils* offers the `kernel` decorator, that transforms a normal Python function with `@=` assignments into an assignment list that can be passed to `create_kernel`. To simplify the formulation of a kernel, *pystencils* offers the `kernel` decorator, that transforms a normal Python function with `@=` assignments into an assignment list that can be passed to `create_kernel`.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
@ps.kernel @ps.kernel
def symbolic_description_using_function(): def symbolic_description_using_function():
grad_x @= (src[1, 0] - src[-1, 0]) / 2 grad_x @= (src[1, 0] - src[-1, 0]) / 2
grad_y @= (src[0, 1] - src[0, -1]) / 2 grad_y @= (src[0, 1] - src[0, -1]) / 2
dst[0, 0] @= grad_x + grad_y dst[0, 0] @= grad_x + grad_y
symbolic_description_using_function symbolic_description_using_function
``` ```
%% Output %% Output
$\displaystyle \left[ grad_{x} \leftarrow \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2}, \ grad_{y} \leftarrow \frac{{src}_{(0,1)}}{2} - \frac{{src}_{(0,-1)}}{2}, \ {dst}_{(0,0)} \leftarrow grad_{x} + grad_{y}\right]$ $\displaystyle \left[ grad_{x} \leftarrow \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2}, \ grad_{y} \leftarrow \frac{{src}_{(0,1)}}{2} - \frac{{src}_{(0,-1)}}{2}, \ {dst}_{(0,0)} \leftarrow grad_{x} + grad_{y}\right]$
⎡ src_E src_W src_N src_S ⎤ ⎡ src_E src_W src_N src_S ⎤
⎢gradₓ := ───── - ─────, grad_y := ───── - ─────, dst_C := gradₓ + grad_y⎥ ⎢gradₓ := ───── - ─────, grad_y := ───── - ─────, dst_C := gradₓ + grad_y⎥
⎣ 2 2 2 2 ⎦ ⎣ 2 2 2 2 ⎦
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The decorated function can contain any Python code, only the `@=` operator, and the ternary inline `if-else` operator have different meaning. The decorated function can contain any Python code, only the `@=` operator, and the ternary inline `if-else` operator have different meaning.
### b) Ternary 'if' with `Piecewise` ### b) Ternary 'if' with `Piecewise`
The ternary operator maps to `sympy.Piecewise` functions, that can be used to introduce branching into the kernel. Piecewise defined functions must give a value for every input, i.e. there must be a 'otherwise' clause in the end that is indicated by the condition `True`. Piecewise objects are standard sympy terms that can be integrated into bigger expressions: The ternary operator maps to `sympy.Piecewise` functions, that can be used to introduce branching into the kernel. Piecewise defined functions must give a value for every input, i.e. there must be a 'otherwise' clause in the end that is indicated by the condition `True`. Piecewise objects are standard sympy terms that can be integrated into bigger expressions:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
sp.Piecewise((1.0, src[0,1] > 0), (0.0, True)) + src[1, 0] sp.Piecewise((1.0, src[0,1] > 0), (0.0, True)) + src[1, 0]
``` ```
%% Output %% Output
$\displaystyle {src}_{(1,0)} + \begin{cases} 1.0 & \text{for}\: {src}_{(0,1)} > 0 \\0.0 & \text{otherwise} \end{cases}$ $\displaystyle {src}_{(1,0)} + \begin{cases} 1.0 & \text{for}\: {src}_{(0,1)} > 0 \\0.0 & \text{otherwise} \end{cases}$
⎛⎧1.0 for src_N > 0⎞ ⎛⎧1.0 for src_N > 0⎞
src_E + ⎜⎨ ⎟ src_E + ⎜⎨ ⎟
⎝⎩0.0 otherwise ⎠ ⎝⎩0.0 otherwise ⎠
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Piecewise objects are created by the `kernel` decorator for ternary if-else statements. Piecewise objects are created by the `kernel` decorator for ternary if-else statements.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
@ps.kernel @ps.kernel
def kernel_with_piecewise(): def kernel_with_piecewise():
grad_x @= (src[1, 0] - src[-1, 0]) / 2 if src[-1, 0] > 0 else 0.0 grad_x @= (src[1, 0] - src[-1, 0]) / 2 if src[-1, 0] > 0 else 0.0
kernel_with_piecewise kernel_with_piecewise
``` ```
%% Output %% Output
$\displaystyle \left[ grad_{x} \leftarrow \begin{cases} \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2} & \text{for}\: {src}_{(-1,0)} > 0 \\0.0 & \text{otherwise} \end{cases}\right]$ $\displaystyle \left[ grad_{x} \leftarrow \begin{cases} \frac{{src}_{(1,0)}}{2} - \frac{{src}_{(-1,0)}}{2} & \text{for}\: {src}_{(-1,0)} > 0 \\0.0 & \text{otherwise} \end{cases}\right]$
⎡ ⎧src_E src_W ⎤ ⎡ ⎧src_E src_W ⎤
⎢ ⎪───── - ───── for src_W > 0⎥ ⎢ ⎪───── - ───── for src_W > 0⎥
⎢gradₓ := ⎨ 2 2 ⎥ ⎢gradₓ := ⎨ 2 2 ⎥
⎢ ⎪ ⎥ ⎢ ⎪ ⎥
⎣ ⎩ 0.0 otherwise ⎦ ⎣ ⎩ 0.0 otherwise ⎦
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### c) Assignment level optimizations using `AssignmentCollection` ### c) Assignment level optimizations using `AssignmentCollection`
When the kernels get larger and more complex, it is helpful to organize the list of assignment into a more structured way. The `AssignmentCollection` offers optimizating transformation on a list of assignments. It holds two assignment lists, one for subexpressions and one for the main assignments. Main assignments are typically those that write to an array. When the kernels get larger and more complex, it is helpful to organize the list of assignment into a more structured way. The `AssignmentCollection` offers optimizating transformation on a list of assignments. It holds two assignment lists, one for subexpressions and one for the main assignments. Main assignments are typically those that write to an array.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
@ps.kernel @ps.kernel
def somewhat_longer_dummy_kernel(s): def somewhat_longer_dummy_kernel(s):
s.a @= src[0, 1] + src[-1, 0] s.a @= src[0, 1] + src[-1, 0]
s.b @= 2 * src[1, 0] + src[0, -1] s.b @= 2 * src[1, 0] + src[0, -1]
s.c @= src[0, 1] + 2 * src[1, 0] + src[-1, 0] + src[0, -1] - src[0,0] s.c @= src[0, 1] + 2 * src[1, 0] + src[-1, 0] + src[0, -1] - src[0,0]
dst[0, 0] @= s.a + s.b + s.c dst[0, 0] @= s.a + s.b + s.c
ac = ps.AssignmentCollection(main_assignments=somewhat_longer_dummy_kernel[-1:], ac = ps.AssignmentCollection(main_assignments=somewhat_longer_dummy_kernel[-1:],
subexpressions=somewhat_longer_dummy_kernel[:-1]) subexpressions=somewhat_longer_dummy_kernel[:-1])
ac ac
``` ```
%% Output %% Output
AssignmentCollection: dst_C, <- f(src_E, src_C, src_N, src_W, src_S) AssignmentCollection: dst_C, <- f(src_N, src_E, src_W, src_C, src_S)
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
ac.operation_count ac.operation_count
``` ```
%% Output %% Output
{'adds': 8, {'adds': 8,
'muls': 2, 'muls': 2,
'divs': 0, 'divs': 0,
'sqrts': 0, 'sqrts': 0,
'fast_sqrts': 0, 'fast_sqrts': 0,
'fast_inv_sqrts': 0, 'fast_inv_sqrts': 0,
'fast_div': 0} 'fast_div': 0}
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The `pystencils.simp` submodule offers several functions to optimize a collection of assignments. The `pystencils.simp` submodule offers several functions to optimize a collection of assignments.
It also offers functionality to group optimization into strategies and evaluate them. It also offers functionality to group optimization into strategies and evaluate them.
In this example we reduce the number of operations by reusing existing subexpressions to get rid of two unnecessary floating point additions. For more information about assignment collections and simplifications see the [demo notebook](demo_assignment_collection.ipynb). In this example we reduce the number of operations by reusing existing subexpressions to get rid of two unnecessary floating point additions. For more information about assignment collections and simplifications see the [demo notebook](demo_assignment_collection.ipynb).
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
opt_ac = ps.simp.subexpression_substitution_in_existing_subexpressions(ac) opt_ac = ps.simp.subexpression_substitution_in_existing_subexpressions(ac)
opt_ac opt_ac
``` ```
%% Output %% Output
AssignmentCollection: dst_C, <- f(src_E, src_C, src_N, src_W, src_S) AssignmentCollection: dst_C, <- f(src_N, src_E, src_W, src_C, src_S)
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
opt_ac.operation_count opt_ac.operation_count
``` ```
%% Output %% Output
{'adds': 6, {'adds': 6,
'muls': 1, 'muls': 1,
'divs': 0, 'divs': 0,
'sqrts': 0, 'sqrts': 0,
'fast_sqrts': 0, 'fast_sqrts': 0,
'fast_inv_sqrts': 0, 'fast_inv_sqrts': 0,
'fast_div': 0} 'fast_div': 0}
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### d) Ghost layers and iteration region ### d) Ghost layers and iteration region
When creating a kernel with neighbor accesses, *pystencils* automatically restricts the iteration region, such that all accesses are safe. When creating a kernel with neighbor accesses, *pystencils* automatically restricts the iteration region, such that all accesses are safe.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0])) kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]))
ps.show_code(kernel) ps.show_code(kernel)
``` ```
%% Output %% Output
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
When no additional ghost layer information is given, *pystencils* looks at all neighboring field accesses and introduces the required number of ghost layers **for all directions**. In the example above the largest neighbor accesses was ``src[2, 0]``, so theoretically we would need 2 ghost layers only the the end of the x coordinate. When no additional ghost layer information is given, *pystencils* looks at all neighboring field accesses and introduces the required number of ghost layers **for all directions**. In the example above the largest neighbor accesses was ``src[2, 0]``, so theoretically we would need 2 ghost layers only the the end of the x coordinate.
By default *pystencils* introduces 2 ghost layers at all borders of the domain. The next cell shows how to change this behavior. Be careful with manual ghost layer specification, wrong values may lead to SEGFAULTs. By default *pystencils* introduces 2 ghost layers at all borders of the domain. The next cell shows how to change this behavior. Be careful with manual ghost layer specification, wrong values may lead to SEGFAULTs.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
gl_spec = [(0, 2), # 0 ghost layers at the left, 2 at the right border gl_spec = [(0, 2), # 0 ghost layers at the left, 2 at the right border
(1, 0)] # 1 ghost layer at the lower y, one at the upper y coordinate (1, 0)] # 1 ghost layer at the lower y, one at the upper y coordinate
kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]), ghost_layers=gl_spec) kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]), ghost_layers=gl_spec)
ps.show_code(kernel) ps.show_code(kernel)
``` ```
%% Output %% Output
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
## 2 ) Restrictions ## 2 ) Restrictions
### a) Independence Restriction ### a) Independence Restriction
*pystencils* only works for kernels where each array element can be updated independently from all other elements. This restriction ensures that the kernels can be easily parallelized and also be run on the GPU. Trying to define kernels where the results depends on the iteration order, leads to a ValueError. *pystencils* only works for kernels where each array element can be updated independently from all other elements. This restriction ensures that the kernels can be easily parallelized and also be run on the GPU. Trying to define kernels where the results depends on the iteration order, leads to a ValueError.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
invalid_description = [ invalid_description = [
ps.Assignment(dst[1, 0], src[1, 0] + src[-1, 0]), ps.Assignment(dst[1, 0], src[1, 0] + src[-1, 0]),
ps.Assignment(dst[0, 0], src[1, 0] - src[-1, 0]), ps.Assignment(dst[0, 0], src[1, 0] - src[-1, 0]),
] ]
try: try:
invalid_kernel = ps.create_kernel(invalid_description) invalid_kernel = ps.create_kernel(invalid_description)
assert False, "Should never be executed" assert False, "Should never be executed"
except ValueError as e: except ValueError as e:
print(e) print(e)
``` ```
%% Output %% Output
Field dst is written at two different locations Field dst is written at two different locations
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The independence restriction makes sure that the kernel can be safely parallelized by checking the following conditions: If a field is modified inside the kernel, it may only be modified at a single spatial position. In that case the field may also only be read at this position. Fields that are not modified may be read at multiple neighboring positions. The independence restriction makes sure that the kernel can be safely parallelized by checking the following conditions: If a field is modified inside the kernel, it may only be modified at a single spatial position. In that case the field may also only be read at this position. Fields that are not modified may be read at multiple neighboring positions.
Specifically, this rule allows for in-place updates that don't access neighbors. Specifically, this rule allows for in-place updates that don't access neighbors.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
valid_kernel = ps.create_kernel(ps.Assignment(src[0,0], 2*src[0,0] + 42)) valid_kernel = ps.create_kernel(ps.Assignment(src[0,0], 2*src[0,0] + 42))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
If a field stores multiple values per cell, as in the next example, this restriction only applies for accesses with the same index. If a field stores multiple values per cell, as in the next example, this restriction only applies for accesses with the same index.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
v = ps.fields("v(2): double[2D]") v = ps.fields("v(2): double[2D]")
valid_kernel = ps.create_kernel([ps.Assignment(v[0,0](1), 2*v[0,0](1) + 42), valid_kernel = ps.create_kernel([ps.Assignment(v[0,0](1), 2*v[0,0](1) + 42),
ps.Assignment(v[0,1](0), 2*v[1,0](0) + 42)]) ps.Assignment(v[0,1](0), 2*v[1,0](0) + 42)])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
### b) Static Single Assignment Form ### b) Static Single Assignment Form
All assignments that don't write to a field must be in SSA form All assignments that don't write to a field must be in SSA form
1. Each sympy symbol may only occur once as a left-hand-side (fields can be written multiple times) 1. Each sympy symbol may only occur once as a left-hand-side (fields can be written multiple times)
2. A symbol has to be defined before it is used. If it is never defined it is introduced as function parameter 2. A symbol has to be defined before it is used. If it is never defined it is introduced as function parameter
The next cell demonstrates the first SSA restriction: The next cell demonstrates the first SSA restriction:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
@ps.kernel @ps.kernel
def not_allowed(): def not_allowed():
a, b = sp.symbols("a b") a, b = sp.symbols("a b")
a @= src[0, 0] a @= src[0, 0]
b @= a + 3 b @= a + 3
a @= src[-1, 0] a @= src[-1, 0]
dst[0, 0] @= a + b dst[0, 0] @= a + b
try: try:
ps.create_kernel(not_allowed) ps.create_kernel(not_allowed)
assert False assert False
except ValueError as e: except ValueError as e:
print(e) print(e)
``` ```
%% Output %% Output
Assignments not in SSA form, multiple assignments to a Assignments not in SSA form, multiple assignments to a
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Also it is not allowed to write a field at the same location Also it is not allowed to write a field at the same location
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
@ps.kernel @ps.kernel
def not_allowed(): def not_allowed():
dst[0, 0] @= src[0, 1] + src[1, 0] dst[0, 0] @= src[0, 1] + src[1, 0]
dst[0, 0] @= 2 * dst[0, 0] dst[0, 0] @= 2 * dst[0, 0]
try: try:
ps.create_kernel(not_allowed) ps.create_kernel(not_allowed)
assert False assert False
except ValueError as e: except ValueError as e:
print(e) print(e)
``` ```
%% Output %% Output
Field dst is written twice at the same location Field dst is written twice at the same location
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This situation should be resolved by introducing temporary variables This situation should be resolved by introducing temporary variables
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
tmp_var = sp.Symbol("a") tmp_var = sp.Symbol("a")
@ps.kernel @ps.kernel
def allowed(): def allowed():
tmp_var @= src[0, 1] + src[1, 0] tmp_var @= src[0, 1] + src[1, 0]
dst[0, 0] @= 2 * tmp_var dst[0, 0] @= 2 * tmp_var
ast = ps.create_kernel(allowed) ast = ps.create_kernel(allowed)
ps.show_code(ast) ps.show_code(ast)
``` ```
%% Output %% Output
......
import os import os
from collections.abc import Hashable from collections.abc import Hashable
from functools import partial, lru_cache from functools import partial, wraps, lru_cache
from itertools import chain from itertools import chain
from joblib import Memory from joblib import Memory
...@@ -27,6 +27,36 @@ def memorycache_if_hashable(maxsize=128, typed=False): ...@@ -27,6 +27,36 @@ def memorycache_if_hashable(maxsize=128, typed=False):
return wrapper return wrapper
def sharedmethodcache(cache_id: str):
"""Decorator for memoization of instance methods, allowing multiple methods to use the same cache.
This decorator caches results of instance methods per instantiated object of the surrounding class.
It allows multiple methods to use the same cache, by passing them the same `cache_id` string.
Cached values are stored in a dictionary, which is added as a member `self.<cache_id>` to the
`self` object instance. Make sure that this doesn't cause any naming conflicts with other members!
Of course, for this to be useful, said methods must have the same signature (up to additional kwargs)
and must return the same result when called with the same arguments."""
def _decorator(user_method):
@wraps(user_method)
def _decorated_func(self, *args, **kwargs):
objdict = self.__dict__
cache = objdict.setdefault(cache_id, dict())
key = args
for item in kwargs.items():
key += item
if key not in cache:
result = user_method(self, *args, **kwargs)
cache[key] = result
return result
else:
return cache[key]
return _decorated_func
return _decorator
# Disable memory cache: # Disable memory cache:
# disk_cache = lambda o: o # disk_cache = lambda o: o
# disk_cache_no_fallback = lambda o: o # disk_cache_no_fallback = lambda o: o
...@@ -270,7 +270,7 @@ def insert_vector_casts(ast_node, instruction_set, default_float_type='double'): ...@@ -270,7 +270,7 @@ def insert_vector_casts(ast_node, instruction_set, default_float_type='double'):
new_arg = visit_expr(expr.args[0], default_type) new_arg = visit_expr(expr.args[0], default_type)
base_type = get_type_of_expression(expr.args[0]).base_type if type(expr.args[0]) is VectorMemoryAccess \ base_type = get_type_of_expression(expr.args[0]).base_type if type(expr.args[0]) is VectorMemoryAccess \
else get_type_of_expression(expr.args[0]) else get_type_of_expression(expr.args[0])
pw = sp.Piecewise((-new_arg, new_arg < base_type.numpy_dtype.type(0)), pw = sp.Piecewise((-new_arg, new_arg < cast_func(0, base_type.numpy_dtype)),
(new_arg, True)) (new_arg, True))
return visit_expr(pw, default_type) return visit_expr(pw, default_type)
elif expr.func in handled_functions or isinstance(expr, sp.Rel) or isinstance(expr, BooleanFunction): elif expr.func in handled_functions or isinstance(expr, sp.Rel) or isinstance(expr, BooleanFunction):
......
...@@ -311,7 +311,7 @@ class AssignmentCollection: ...@@ -311,7 +311,7 @@ class AssignmentCollection:
if eq.lhs in symbols_to_extract: if eq.lhs in symbols_to_extract:
new_assignments.append(eq) new_assignments.append(eq)
new_sub_expr = [eq for eq in self.subexpressions new_sub_expr = [eq for eq in self.all_assignments
if eq.lhs in dependent_symbols and eq.lhs not in symbols_to_extract] if eq.lhs in dependent_symbols and eq.lhs not in symbols_to_extract]
return self.copy(new_assignments, new_sub_expr) return self.copy(new_assignments, new_sub_expr)
......
...@@ -235,6 +235,9 @@ def subs_additive(expr: sp.Expr, replacement: sp.Expr, subexpression: sp.Expr, ...@@ -235,6 +235,9 @@ def subs_additive(expr: sp.Expr, replacement: sp.Expr, subexpression: sp.Expr,
normalized_replacement_match = normalize_match_parameter(required_match_replacement, len(subexpression.args)) normalized_replacement_match = normalize_match_parameter(required_match_replacement, len(subexpression.args))
if isinstance(subexpression, sp.Number):
return expr.subs({replacement: subexpression})
def visit(current_expr): def visit(current_expr):
if current_expr.is_Add: if current_expr.is_Add:
expr_max_length = max(len(current_expr.args), len(subexpression.args)) expr_max_length = max(len(current_expr.args), len(subexpression.args))
...@@ -263,7 +266,7 @@ def subs_additive(expr: sp.Expr, replacement: sp.Expr, subexpression: sp.Expr, ...@@ -263,7 +266,7 @@ def subs_additive(expr: sp.Expr, replacement: sp.Expr, subexpression: sp.Expr,
return current_expr return current_expr
else: else:
if current_expr.func == sp.Mul and Zero() in param_list: if current_expr.func == sp.Mul and Zero() in param_list:
return Zero() return sp.simplify(current_expr)
else: else:
return current_expr.func(*param_list, evaluate=False) return current_expr.func(*param_list, evaluate=False)
...@@ -359,7 +362,7 @@ def remove_higher_order_terms(expr: sp.Expr, symbols: Sequence[sp.Symbol], order ...@@ -359,7 +362,7 @@ def remove_higher_order_terms(expr: sp.Expr, symbols: Sequence[sp.Symbol], order
if velocity_factors_in_product(expr) <= order: if velocity_factors_in_product(expr) <= order:
return expr return expr
else: else:
return sp.Rational(0, 1) return Zero()
if type(expr) != Add: if type(expr) != Add:
return expr return expr
...@@ -453,6 +456,72 @@ def recursive_collect(expr, symbols, order_by_occurences=False): ...@@ -453,6 +456,72 @@ def recursive_collect(expr, symbols, order_by_occurences=False):
return rec_sum return rec_sum
def summands(expr):
return set(expr.args) if isinstance(expr, sp.Add) else {expr}
def simplify_by_equality(expr, a, b, c):
"""
Uses the equality a = b + c, where a and b must be symbols, to simplify expr
by attempting to express additive combinations of two quantities by the third.
This works on expressions that are reducible to the form
:math:`a * (...) + b * (...) + c * (...)`,
without any mixed terms of a, b and c.
"""
if not isinstance(a, sp.Symbol) or not isinstance(b, sp.Symbol):
raise ValueError("a and b must be symbols.")
c = sp.sympify(c)
if not (isinstance(c, sp.Symbol) or is_constant(c)):
raise ValueError("c must be either a symbol or a constant!")
expr = sp.sympify(expr)
expr_expanded = sp.expand(expr)
a_coeff = expr_expanded.coeff(a, 1)
expr_expanded -= (a * a_coeff).expand()
b_coeff = expr_expanded.coeff(b, 1)
expr_expanded -= (b * b_coeff).expand()
if isinstance(c, sp.Symbol):
c_coeff = expr_expanded.coeff(c, 1)
rest = expr_expanded - (c * c_coeff).expand()
else:
c_coeff = expr_expanded / c
rest = 0
a_summands = summands(a_coeff)
b_summands = summands(b_coeff)
c_summands = summands(c_coeff)
# replace b + c by a
b_plus_c_coeffs = b_summands & c_summands
for coeff in b_plus_c_coeffs:
rest += a * coeff
b_summands -= b_plus_c_coeffs
c_summands -= b_plus_c_coeffs
# replace a - b by c
neg_b_summands = {-x for x in b_summands}
a_minus_b_coeffs = a_summands & neg_b_summands
for coeff in a_minus_b_coeffs:
rest += c * coeff
a_summands -= a_minus_b_coeffs
b_summands -= {-x for x in a_minus_b_coeffs}
# replace a - c by b
neg_c_summands = {-x for x in c_summands}
a_minus_c_coeffs = a_summands & neg_c_summands
for coeff in a_minus_c_coeffs:
rest += b * coeff
a_summands -= a_minus_c_coeffs
c_summands -= {-x for x in a_minus_c_coeffs}
# put it back together
return (rest + a * sum(a_summands) + b * sum(b_summands) + c * sum(c_summands)).expand()
def count_operations(term: Union[sp.Expr, List[sp.Expr], List[Assignment]], def count_operations(term: Union[sp.Expr, List[sp.Expr], List[Assignment]],
only_type: Optional[str] = 'real') -> Dict[str, int]: only_type: Optional[str] = 'real') -> Dict[str, int]:
"""Counts the number of additions, multiplications and division. """Counts the number of additions, multiplications and division.
......
...@@ -59,13 +59,13 @@ def test_alignment_of_different_layouts(): ...@@ -59,13 +59,13 @@ def test_alignment_of_different_layouts():
byte_offset = 8 byte_offset = 8
for tries in range(16): # try a few times, since we might get lucky and get randomly a correct alignment for tries in range(16): # try a few times, since we might get lucky and get randomly a correct alignment
arr = create_numpy_array_with_layout((3, 4, 5), layout=(0, 1, 2), arr = create_numpy_array_with_layout((3, 4, 5), layout=(0, 1, 2),
alignment=True, byte_offset=byte_offset) alignment=8*4, byte_offset=byte_offset)
assert is_aligned(arr[offset, ...], 8*4, byte_offset) assert is_aligned(arr[offset, ...], 8*4, byte_offset)
arr = create_numpy_array_with_layout((3, 4, 5), layout=(2, 1, 0), arr = create_numpy_array_with_layout((3, 4, 5), layout=(2, 1, 0),
alignment=True, byte_offset=byte_offset) alignment=8*4, byte_offset=byte_offset)
assert is_aligned(arr[..., offset], 8*4, byte_offset) assert is_aligned(arr[..., offset], 8*4, byte_offset)
arr = create_numpy_array_with_layout((3, 4, 5), layout=(2, 0, 1), arr = create_numpy_array_with_layout((3, 4, 5), layout=(2, 0, 1),
alignment=True, byte_offset=byte_offset) alignment=8*4, byte_offset=byte_offset)
assert is_aligned(arr[:, 0, :], 8*4, byte_offset) assert is_aligned(arr[:, 0, :], 8*4, byte_offset)
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
""" """
import pytest
from pystencils.session import * from pystencils.session import *
from sympy import poly from sympy import poly
...@@ -22,6 +22,12 @@ def test_field_access_poly(): ...@@ -22,6 +22,12 @@ def test_field_access_poly():
def test_field_access_piecewise(): def test_field_access_piecewise():
try:
a = sp.Piecewise((0, 1 < sp.Max(-0.5, sp.Symbol("test") + 0.5)), (1, True))
a.simplify()
except Exception as e:
pytest.skip(f"Bug in SymPy 1.10: {e}")
else:
dh = ps.create_data_handling((20, 20)) dh = ps.create_data_handling((20, 20))
ρ = dh.add_array('rho') ρ = dh.add_array('rho')
pw = sp.Piecewise((0, 1 < sp.Max(-0.5, ρ.center+0.5)), (1, True)) pw = sp.Piecewise((0, 1 < sp.Max(-0.5, ρ.center+0.5)), (1, True))
......
...@@ -317,9 +317,7 @@ def diffusion_reaction(fluctuations: bool): ...@@ -317,9 +317,7 @@ def diffusion_reaction(fluctuations: bool):
fluct = sp.sqrt(2 * dens * D) * sp.sqrt(1 / length) * stencil_factor fluct = sp.sqrt(2 * dens * D) * sp.sqrt(1 / length) * stencil_factor
# add fluctuations # add fluctuations
fluct *= 2 * (next(rng_symbol_gen) - 0.5) * sp.sqrt(3) fluct *= 2 * (next(rng_symbol_gen) - 0.5) * sp.sqrt(3)
flux.main_assignments[i] = ps.Assignment(flux.main_assignments[i].lhs, flux.main_assignments[i].rhs + fluct)
flux.main_assignments[i] = ps.Assignment(flux.main_assignments[i].lhs,
flux.main_assignments[i].rhs + fluct)
# Add the folding to the flux, so that the random numbers persist through the ghostlayers. # Add the folding to the flux, so that the random numbers persist through the ghostlayers.
fold = {ps.astnodes.LoopOverCoordinate.get_loop_counter_symbol(i): fold = {ps.astnodes.LoopOverCoordinate.get_loop_counter_symbol(i):
...@@ -419,7 +417,7 @@ advection_diffusion_fluctuations.runners = {} ...@@ -419,7 +417,7 @@ advection_diffusion_fluctuations.runners = {}
@pytest.mark.parametrize("density", [27.0, 56.5]) @pytest.mark.parametrize("density", [27.0, 56.5])
@pytest.mark.parametrize("fluctuations", [False, True]) @pytest.mark.parametrize("fluctuations", [False, True])
@pytest.mark.longrun @pytest.mark.longrun
def test_diffusion_reaction(velocity, density, fluctuations): def test_diffusion_reaction(fluctuations, density, velocity):
diffusion_reaction.runner = diffusion_reaction(fluctuations) diffusion_reaction.runner = diffusion_reaction(fluctuations)
diffusion_reaction.runner(density, velocity) diffusion_reaction.runner(density, velocity)
......
from pystencils.cache import sharedmethodcache
class Fib:
def __init__(self):
self.fib_rec_called = 0
self.fib_iter_called = 0
@sharedmethodcache("fib_cache")
def fib_rec(self, n):
self.fib_rec_called += 1
return 1 if n <= 1 else self.fib_rec(n-1) + self.fib_rec(n-2)
@sharedmethodcache("fib_cache")
def fib_iter(self, n):
self.fib_iter_called += 1
f1, f2 = 0, 1
for i in range(n):
f2 = f1 + f2
f1 = f2 - f1
return f2
def test_fib_memoization_1():
fib = Fib()
assert "fib_cache" not in fib.__dict__
f13 = fib.fib_rec(13)
assert fib.fib_rec_called == 14
assert "fib_cache" in fib.__dict__
assert fib.fib_cache[(13,)] == f13
for k in range(14):
# fib_iter should use cached results from fib_rec
fib.fib_iter(k)
assert fib.fib_iter_called == 0
def test_fib_memoization_2():
fib = Fib()
f11 = fib.fib_iter(11)
f12 = fib.fib_iter(12)
assert fib.fib_iter_called == 2
f13 = fib.fib_rec(13)
# recursive calls should be cached
assert fib.fib_rec_called == 1
class Triad:
def __init__(self):
self.triad_called = 0
@sharedmethodcache("triad_cache")
def triad(self, a, b, c=0):
"""Computes the triad a*b+c."""
self.triad_called += 1
return a * b + c
def test_triad_memoization():
triad = Triad()
assert triad.triad.__doc__ == "Computes the triad a*b+c."
t = triad.triad(12, 4, 15)
assert triad.triad_called == 1
assert triad.triad_cache[(12, 4, 15)] == t
t = triad.triad(12, 4, c=15)
assert triad.triad_called == 2
assert triad.triad_cache[(12, 4, 'c', 15)] == t
t = triad.triad(12, 4, 15)
assert triad.triad_called == 2
t = triad.triad(12, 4, c=15)
assert triad.triad_called == 2
import sympy import sympy
import numpy as np import numpy as np
import sympy as sp
import pystencils import pystencils
from pystencils.sympyextensions import replace_second_order_products from pystencils.sympyextensions import replace_second_order_products
from pystencils.sympyextensions import remove_higher_order_terms from pystencils.sympyextensions import remove_higher_order_terms
from pystencils.sympyextensions import complete_the_squares_in_exp from pystencils.sympyextensions import complete_the_squares_in_exp
from pystencils.sympyextensions import extract_most_common_factor from pystencils.sympyextensions import extract_most_common_factor
from pystencils.sympyextensions import simplify_by_equality
from pystencils.sympyextensions import count_operations from pystencils.sympyextensions import count_operations
from pystencils.sympyextensions import common_denominator from pystencils.sympyextensions import common_denominator
from pystencils.sympyextensions import get_symmetric_part from pystencils.sympyextensions import get_symmetric_part
...@@ -176,3 +178,26 @@ def test_get_symmetric_part(): ...@@ -176,3 +178,26 @@ def test_get_symmetric_part():
sym_part = get_symmetric_part(expr, sympy.symbols(f'y z')) sym_part = get_symmetric_part(expr, sympy.symbols(f'y z'))
assert sym_part == expected_result assert sym_part == expected_result
def test_simplify_by_equality():
x, y, z = sp.symbols('x, y, z')
p, q = sp.symbols('p, q')
# Let x = y + z
expr = x * p - y * p + z * q
expr = simplify_by_equality(expr, x, y, z)
assert expr == z * p + z * q
expr = x * (p - 2 * q) + 2 * q * z
expr = simplify_by_equality(expr, x, y, z)
assert expr == x * p - 2 * q * y
expr = x * (y + z) - y * z
expr = simplify_by_equality(expr, x, y, z)
assert expr == x*y + z**2
# Let x = y + 2
expr = x * p - 2 * p
expr = simplify_by_equality(expr, x, y, 2)
assert expr == y * p
...@@ -59,4 +59,6 @@ def test_timeloop(): ...@@ -59,4 +59,6 @@ def test_timeloop():
timeloop.run_time_span(seconds=seconds) timeloop.run_time_span(seconds=seconds)
end = time.perf_counter() end = time.perf_counter()
np.testing.assert_almost_equal(seconds, end - start, decimal=2) # This test case fails often due to time measurements. It is not a good idea to assert here
# np.testing.assert_almost_equal(seconds, end - start, decimal=2)
print("timeloop: ", seconds, " own meassurement: ", end - start)
...@@ -53,7 +53,7 @@ exclude_lines = ...@@ -53,7 +53,7 @@ exclude_lines =
if __name__ == .__main__.: if __name__ == .__main__.:
skip_covered = True skip_covered = True
fail_under = 86 fail_under = 85
[html] [html]
directory = coverage_report directory = coverage_report
...@@ -90,7 +90,7 @@ setuptools.setup(name='pystencils', ...@@ -90,7 +90,7 @@ setuptools.setup(name='pystencils',
author_email='cs10-codegen@fau.de', author_email='cs10-codegen@fau.de',
url='https://i10git.cs.fau.de/pycodegen/pystencils/', url='https://i10git.cs.fau.de/pycodegen/pystencils/',
packages=['pystencils'] + ['pystencils.' + s for s in setuptools.find_packages('pystencils')], packages=['pystencils'] + ['pystencils.' + s for s in setuptools.find_packages('pystencils')],
install_requires=['sympy>=1.6,<=1.9', 'numpy>=1.8.0', 'appdirs', 'joblib'], install_requires=['sympy>=1.6,<=1.10', 'numpy>=1.8.0', 'appdirs', 'joblib'],
package_data={'pystencils': ['include/*.h', package_data={'pystencils': ['include/*.h',
'backends/cuda_known_functions.txt', 'backends/cuda_known_functions.txt',
'backends/opencl1.1_known_functions.txt', 'backends/opencl1.1_known_functions.txt',
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment