****************************************
Constants, Memory Objects, and Functions
****************************************
Memory Objects: Symbols and Buffers
===================================
The Memory Model
----------------
In order to reason about memory accesses, mutability, invariance, and aliasing, the *pystencils* backend uses
a very simple memory model. There are three types of memory objects:
- Symbols (`PsSymbol`), which act as registers for data storage within the scope of a kernel
- Field buffers (`PsBuffer`), which represent a contiguous block of memory the kernel has access to, and
- the *unmanaged heap*, which is a global catch-all memory object which all pointers not belonging to a field
array point into.
All of these objects are disjoint, and cannot alias each other.
Each symbol exists in isolation,
field buffers do not overlap,
and raw pointers are assumed not to point into memory owned by a symbol or field array.
Instead, all raw pointers point into unmanaged heap memory, and are assumed to *always* alias one another:
Each change brought to unmanaged memory by one raw pointer is assumed to affect the memory pointed to by
another raw pointer.
Symbols
-------
In the pystencils IR, instances of `PsSymbol` represent what are generally known as "virtual registers".
These are memory locations that are private to a function, cannot be aliased or pointed to, and will finally reside
either in physical registers or on the stack.
Each symbol has a name and a data type. The data type may initially be `None`, in which case it should soon after be
determined by the `Typifier`.
Unlike their front-end counterpart `sympy.Symbol <sympy.core.symbol.Symbol>`, `PsSymbol` instances are mutable;
their properties can and often will change over time.
As a consequence, they are not comparable by value:
two `PsSymbol` instances with the same name and data type will in general *not* be equal.
In fact, most of the time, it is an error to have two identical symbol instances active.
Creating Symbols
^^^^^^^^^^^^^^^^
During kernel translation, symbols never exist in isolation, but should always be managed by a `KernelCreationContext`.
Symbols can be created and retrieved using `add_symbol <KernelCreationContext.add_symbol>` and `find_symbol <KernelCreationContext.find_symbol>`.
A symbol can also be duplicated using `duplicate_symbol <KernelCreationContext.duplicate_symbol>`, which assigns a new name to the symbol's copy.
The `KernelCreationContext` keeps track of all existing symbols during a kernel translation run
and makes sure that no name or data type conflicts arise.
Never call the constructor of `PsSymbol` directly unless you really know what you are doing.
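As a rough sketch, creating and retrieving symbols through the context might look as follows; the exact method signatures are assumptions here, so consult the `KernelCreationContext` API documentation below:

.. code-block:: python

    # Sketch only -- method signatures are assumptions; see the API reference below.
    ctx = KernelCreationContext()

    ctr = ctx.add_symbol("ctr")             # create a context-managed symbol
    assert ctx.find_symbol("ctr") is ctr    # retrieve it again by name

    ctr_copy = ctx.duplicate_symbol(ctr)    # the copy receives a new, conflict-free name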
Symbol Properties
^^^^^^^^^^^^^^^^^
Symbols can be annotated with arbitrary information using *symbol properties*.
Each symbol property type must be a subclass of `PsSymbolProperty`.
It is strongly recommended to implement property types using frozen
`dataclasses <https://docs.python.org/3/library/dataclasses.html>`_.
For example, this snippet defines a property type that models pointer alignment requirements:
.. code-block:: python

    @dataclass(frozen=True)
    class AlignmentProperty(UniqueSymbolProperty):
        """Require this pointer symbol to be aligned at a particular byte boundary."""

        byte_boundary: int
Inheriting from `UniqueSymbolProperty` ensures that at most one property of this type can be attached to
a symbol at any time.
Properties can be added, queried, and removed using the `PsSymbol` properties API listed below.
Many symbol properties are more relevant to consumers of generated kernels than to the code generator itself.
The above alignment property, for instance, may be added to a pointer symbol by a vectorization pass
to document its assumption that the pointer be properly aligned, in order to emit aligned load and store instructions.
It then becomes the responsibility of the runtime system embedding the kernel to check this prerequisite before calling the kernel.
To make sure this information becomes visible, any properties attached to symbols exposed as kernel parameters will also
be added to their respective `Parameter` instance.
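For illustration, attaching and querying such a property might look roughly as follows; the method names used here are assumptions, so consult the `PsSymbol` API documentation below for the actual interface:

.. code-block:: python

    # Hypothetical sketch -- method names are assumptions; see the API reference below.
    ptr = ctx.add_symbol("field_data_ptr")
    ptr.add_property(AlignmentProperty(byte_boundary=32))

    # Query the attached property later, e.g. when inspecting kernel parameters
    alignment_props = ptr.get_properties(AlignmentProperty)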
Buffers
-------
A buffer, represented by the `PsBuffer` class, models a contiguous, n-dimensional, linearized cuboid block of memory.
Each buffer has a fixed name and element data type,
and will be represented in the IR via three sets of symbols:
- The *base pointer* is a symbol of pointer type which points into the buffer's underlying memory area.
Each buffer has at least one, its primary base pointer, whose pointed-to type must be the same as the
buffer's element type. There may be additional base pointers pointing into subsections of that memory.
These additional base pointers may also have deviating data types, as is for instance required for
type erasure in certain cases.
To communicate its role to the code generation system,
each base pointer needs to be marked as such using the `BufferBasePtr` property.
- The buffer *shape* defines the size of the buffer in each dimension. Each shape entry is either a `symbol <PsSymbol>`
or a `constant <PsConstant>`.
- The buffer *strides* define the step size to go from one entry to the next in each dimension.
Like the shape, each stride entry is also either a symbol or a constant.
The shape and stride symbols must all have the same data type, which will be stored as the buffer's index data type.
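To illustrate how shape and strides describe the linearized layout: the flat element offset of an n-dimensional index into the buffer is the dot product of that index with the strides. The following is plain Python for illustration only, not part of the backend API:

.. code-block:: python

    def linear_offset(index, strides):
        """Flat element offset of an n-dimensional index, given per-dimension strides."""
        return sum(i * s for i, s in zip(index, strides))

    # A 3x4 row-major buffer, with strides measured in elements
    shape = (3, 4)
    strides = (4, 1)
    assert linear_offset((2, 1), strides) == 2 * 4 + 1 * 1  # == 9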
Creating and Managing Buffers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Similarly to symbols, buffers are typically managed by the `KernelCreationContext`, which associates each buffer
to a front-end `Field`. Buffers for fields can be obtained using `get_buffer <KernelCreationContext.get_buffer>`.
The context makes sure to avoid name conflicts between buffers.
API Documentation
=================
.. automodule:: pystencils.codegen.properties
:members:
.. automodule:: pystencils.backend.memory
:members:
.. automodule:: pystencils.backend.constants
:members:
.. autoclass:: pystencils.backend.literals.PsLiteral
:members:
.. automodule:: pystencils.backend.functions
*********
Platforms
*********
.. automodule:: pystencils.backend.platforms
:members:
*******************
AST Transformations
*******************
`pystencils.backend.transformations`
.. automodule:: pystencils.backend.transformations
******************
Kernel Translation
******************
The Kernel Creation Context
---------------------------
.. autoclass:: pystencils.backend.kernelcreation.KernelCreationContext
:members:
Analysis and Constraints Checks
-------------------------------
.. autoclass:: pystencils.backend.kernelcreation.KernelAnalysis
:members:
SymPy Parsing and IR Construction
---------------------------------
.. autoclass:: pystencils.backend.kernelcreation.AstFactory
:members:
.. autoclass:: pystencils.backend.kernelcreation.FreezeExpressions
:members:
Type Checking and Inference
---------------------------
.. autoclass:: pystencils.backend.kernelcreation.typification.TypeContext
:members:
.. autoclass:: pystencils.backend.kernelcreation.Typifier
:members:
import datetime
import re
from pystencils import __version__ as pystencils_version
project = "pystencils"
html_title = "pystencils Documentation"
copyright = (
f"{datetime.datetime.now().year}, Martin Bauer, Markus Holzer, Frederik Hennig"
)
author = "Martin Bauer, Markus Holzer, Frederik Hennig"
version = re.sub(r"(\d+\.\d+)\.\d+(.*)", r"\1\2", pystencils_version)
version = re.sub(r"(\.dev\d+).*?$", r"\1", version)
# The full version, including alpha/beta/rc tags.
release = pystencils_version
language = "en"
default_role = "any"
pygments_style = "sphinx"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
"sphinx_design",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.doctest",
"sphinx.ext.intersphinx",
"sphinx.ext.mathjax",
"sphinx.ext.napoleon",
"sphinx.ext.inheritance_diagram",
"sphinxcontrib.bibtex",
"sphinx_autodoc_typehints",
"myst_nb",
]
templates_path = ["_templates"]
exclude_patterns = []
autodoc_member_order = "bysource"
autodoc_typehints = "description"
intersphinx_mapping = {
"python": ("https://docs.python.org/3.8", None),
"numpy": ("https://docs.scipy.org/doc/numpy/", None),
"matplotlib": ("https://matplotlib.org/", None),
"sympy": ("https://docs.sympy.org/latest/", None),
"cupy": ("https://docs.cupy.dev/en/stable/", None),
}
# -- Options for inheritance diagrams-----------------------------------------
inheritance_graph_attrs = {
"bgcolor": "white",
}
# -- Options for MyST / MyST-NB ----------------------------------------------
nb_execution_mode = "off" # do not execute notebooks by default
myst_enable_extensions = [
"dollarmath",
"colon_fence",
]
# -- Options for HTML output -------------------------------------------------
html_theme = "sphinx_book_theme"
html_static_path = ["_static"]
html_css_files = [
"css/fixtables.css",
]
html_theme_options = {
"logo": {
"image_light": "_static/img/pystencils-logo-light.svg",
"image_dark": "_static/img/pystencils-logo-dark.svg",
}
}
# NbSphinx configuration
nbsphinx_execute = "never"
nbsphinx_codecell_lexer = "python3"
# BibTex
bibtex_bibfiles = ["pystencils.bib"]
# Development Workflow
This page contains instructions on how to get started with developing pystencils.
## Prepare the Git Repository
The pystencils Git repository is hosted at [i10git.cs.fau.de](https://i10git.cs.fau.de), the GitLab instance of the
[Chair for Systems Simulation](https://www.cs10.tf.fau.de/) at [FAU Erlangen-Nürnberg](https://fau.de).
In order to contribute code to pystencils, you will need to acquire an account there; to do so,
please follow the instructions on the GitLab landing page.
### Create a Fork
Only the core developers of pystencils have write-access to the primary repository.
To contribute, you will therefore have to create a fork of that repository
by navigating to the [repository page](https://i10git.cs.fau.de/pycodegen/pystencils)
and selecting *Fork* there.
In this fork, you may freely create branches and develop code, which may later be merged to a primary branch
via merge requests.
### Create a Local Clone
Once you have a fork of the repository, you can clone it to your local machine using the git command-line.
:::{note}
To clone via SSH, which is recommended, you will first have to [register an SSH key](https://docs.gitlab.com/ee/user/ssh.html).
:::
Open up a shell and navigate to a directory you want to work in.
Then, enter
```bash
git clone git@i10git.cs.fau.de:<your-username>/pystencils.git
```
to clone your fork of pystencils.
:::{note}
To keep up to date with the upstream repository, you can add it as a secondary remote to your clone:
```bash
git remote add upstream git@i10git.cs.fau.de:pycodegen/pystencils.git
```
You can point your clone's `master` branch at the upstream master like this:
```bash
git pull --set-upstream upstream master
```
:::
## Set Up the Python Environment
To develop pystencils, you will need at least the following software installed on your machine:
- Python 3.10 or later: Since pystencils' minimum supported version is Python 3.10, we recommend that you work with Python 3.10 directly.
- An up-to-date C++ compiler, used by pystencils to JIT-compile generated code
- [Nox](https://nox.thea.codes/en/stable/), which we use for test automation.
Nox will be used extensively in the instructions on testing below.
- Optionally [CUDA](https://developer.nvidia.com/cuda-toolkit),
  if you have an Nvidia or AMD GPU and plan to work on pystencils' GPU capabilities
Once you have these, set up a [virtual environment](https://docs.python.org/3/library/venv.html) for development.
This ensures that your system's installation of Python is kept clean, and isolates your development environment
from outside influence.
Use the following commands to create a virtual environment at `.venv` and perform an editable install of pystencils into it:
```bash
python -m venv .venv
source .venv/bin/activate
export PIP_REQUIRE_VIRTUALENV=true
pip install -e .[dev]
```
:::{note}
Setting `PIP_REQUIRE_VIRTUALENV` ensures that pip refuses to install packages outside of a virtual environment.
Consider setting this variable globally in your shell's configuration file.
:::
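For example, with bash, you can make this setting permanent by adding it to your shell's configuration file (assuming `~/.bashrc`; adjust for your shell of choice):

```bash
# Make pip refuse to install packages outside of an active virtual environment
echo 'export PIP_REQUIRE_VIRTUALENV=true' >> ~/.bashrc
```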
You are now ready to go! Create a new git branch to work on, open up an IDE, and start coding.
Make sure your IDE recognizes the virtual environment you created, though.
## Static Code Analysis
### PEP8 Code Style
We use [flake8](https://github.com/PyCQA/flake8/tree/main) to check our code for compliance with the
[PEP8](https://peps.python.org/pep-0008/) code style.
You can either run `flake8` directly, or through Nox, to analyze your code with respect to style errors:
::::{grid}
:::{grid-item}
```bash
nox -s lint
```
:::
:::{grid-item}
```bash
flake8 src/pystencils
```
:::
::::
### Static Type Checking
New code added to pystencils is required to carry type annotations,
and its types are checked using [mypy](https://mypy.readthedocs.io/en/stable/index.html#).
To discover type errors, run *mypy* either directly or via Nox:
::::{grid}
:::{grid-item}
```bash
nox -s typecheck
```
:::
:::{grid-item}
```bash
mypy src/pystencils
```
:::
::::
:::{note}
Type checking is currently restricted only to a few modules, which are listed in the `mypy.ini` file.
Most code in the remaining modules is significantly older and is not comprehensively type-annotated.
As more modules are updated with type annotations, this list will expand in the future.
If you think a new module is ready to be type-checked, add an exception clause to `mypy.ini`.
:::
## Running the Test Suite
Pystencils comes with an extensive and steadily growing suite of unit tests.
To run the full testsuite, invoke the Nox `testsuite` session:
```bash
nox -s testsuite
```
:::{seealso}
[](#testing_pystencils)
:::
## Building the Documentation
The pystencils documentation pages are written in MyST Markdown and ReStructuredText,
located at the `docs` folder, and built using Sphinx.
To build the documentation pages of pystencils, simply run the `docs` Nox session:
```bash
nox -s docs
```
This will emit the generated HTML pages to `docs/build/html`.
The `docs` session permits two parameters to customize its execution:
- `--clean`: Clean the page generator's output before building
- `--fail-on-warnings`: Have the build fail (finish with a nonzero exit code) if Sphinx emits any warnings.
These options must be passed to the session command *after a pair of dashes* (`--`); e.g.:
```bash
nox -s docs -- --clean
```
# Contribution Guide
Welcome to the Contributor's Guide to pystencils!
If you are interested in contributing to the development of pystencils, this is the place to start.
Pystencils is an open-source package licensed under the [AGPL v3](https://www.gnu.org/licenses/agpl-3.0.en.html).
As such, the act of contributing to pystencils by submitting a merge request is taken as agreement to the terms of the licence.
:::{toctree}
:maxdepth: 2
dev-workflow
testing
:::
(testing_pystencils)=
# Testing pystencils
The pystencils testsuite is located in the `tests` directory,
constructed using [pytest](https://pytest.org),
and automated through [Nox](https://nox.thea.codes).
On this page, you will find instructions on how to execute and extend it.
## Running the Testsuite
The fastest way to execute the pystencils test suite is through the `testsuite` Nox session:
```bash
nox -s testsuite
```
Several configurations of the testsuite session exist; from these, the above command will
select and execute only those that are available on your machine.
- *Python Versions:* The testsuite session can be run against all major Python versions between 3.10 and 3.13 (inclusive).
To only use a specific Python version, add the `-p 3.XX` argument to your Nox invocation; e.g. `nox -s testsuite -p 3.11`.
- *CuPy:* There exist three variants of `testsuite`, including or excluding tests for the CUDA GPU target: `cpu`, `cupy12` and `cupy13`.
To select one, append `(<variant>)` to the session name; e.g. `nox -s "testsuite(cupy12)"`.
You may also pass options through to pytest via positional arguments after a pair of dashes, e.g.:
```bash
nox -s testsuite -- -k "kernelcreation"
```
During the testsuite run, coverage information is collected using [coverage.py](https://coverage.readthedocs.io/en/7.6.10/),
and the results are exported to HTML.
You can display a detailed overview of code coverage by opening the generated `htmlcov/index.html` page.
## Extending the Test Suite
### Codegen Configurations via Fixtures
In the pystencils test suite, it is often necessary to test code generation features against multiple targets.
To simplify this process, we provide a number of [pytest fixtures](https://docs.pytest.org/en/stable/how-to/fixtures.html)
you can and should use in your tests:
- `target`: Provides code generation targets for your test.
Using this fixture will make pytest create a copy of your test for each target
available on the current machine (see {any}`Target.available_targets`).
- `gen_config`: Provides default code generation configurations for your test.
This fixture depends on `target` and provides a {any}`CreateKernelConfig` instance
with target-specific optimization options (in particular vectorization) enabled.
- `xp`: The `xp` fixture gives you either the *NumPy* (`np`) or the *CuPy* (`cp`) module,
depending on whether `target` is a CPU or GPU target.
These fixtures are defined in `tests/fixtures.py`.
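For illustration, a test using all three fixtures might look like the following sketch (the field setup and assertion here are hypothetical and not taken from the actual test suite):

```Python
import pystencils as ps


def test_double_field(gen_config, xp):
    # Symbolic description: double every cell of `src` and store the result in `dst`
    src, dst = ps.fields("src, dst: double[2D]")
    asm = ps.Assignment(dst[0, 0], 2 * src[0, 0])

    # Generate and compile the kernel using the target-specific configuration
    kernel = ps.create_kernel(asm, gen_config).compile()

    # `xp` is either numpy or cupy, matching the current target
    src_arr = xp.ones((16, 16))
    dst_arr = xp.zeros_like(src_arr)
    kernel(src=src_arr, dst=dst_arr)

    assert xp.allclose(dst_arr, 2.0)
```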
### Overriding Fixtures
Pytest allows you to locally override fixtures, which can be especially handy when you wish
to restrict the target selection of a test.
For example, the following test overrides `target` using a parametrization mark,
and uses this in combination with the `gen_config` fixture, which now
receives the overridden `target` parameter as input:
```Python
@pytest.mark.parametrize("target", [Target.X86_SSE, Target.X86_AVX])
def test_bogus(gen_config):
assert gen_config.target.is_vector_cpu()
```
## Testing with the Experimental CPU JIT
Currently, the testsuite by default still uses the {any}`legacy CPU JIT compiler <LegacyCpuJit>`,
since the new CPU JIT compiler is still in an experimental stage.
To test your code against the new JIT compiler, pass the `--experimental-cpu-jit` option to pytest:
```bash
nox -s testsuite -- --experimental-cpu-jit
```
This will alter the `gen_config` fixture, activating the experimental CPU JIT for CPU targets.
#################################
pystencils v2.0-dev Documentation
#################################
.. note::
You are currently viewing the documentation pages for the development revision |release|
of pystencils 2.0.
These pages have been generated from the branch
`v2.0-dev <https://i10git.cs.fau.de/pycodegen/pystencils/-/tree/v2.0-dev?ref_type=heads>`_.
Pystencils 2.0 is currently under development.
It marks a complete re-design of the package's internal structure;
furthermore, it will bring a set of new features and API changes.
Be aware that many features are still missing or might have brand-new bugs in this version.
If you wish to work with and contribute to this development, please refer to
`the Git repository <https://i10git.cs.fau.de/pycodegen/pystencils/-/tree/v2.0-dev?ref_type=heads>`_.
.. note::
These pages are still under construction; many aspects are still missing.
Do not hesitate to contribute!
Welcome to the documentation and reference guide of *pystencils*!
*Pystencils* offers a symbolic language and code generator for the development of high-performing
numerical kernels for both CPU and GPU targets.
Its features include:
- **Symbolic Algebra:** Design numerical methods on a mathematical level using the full power
of the `SymPy <https://sympy.org>`_ computer algebra system.
Make use of pystencils' discretization engines to automatically derive finite difference and finite volume methods,
and take control of numerical precision using the `versatile type system <page_type_system>`.
- **Kernel Description:** Derive and optimize stencil-based update rules using a symbolic abstraction
of numerical `fields <page_symbolic_language>`.
- **Code Generation:** `Generate and compile <guide_kernelcreation>` high-performance parallel kernels for CPUs and GPUs.
Accelerate your kernels on multicore CPUs using the automatic OpenMP parallelization
and make full use of your cores' SIMD units through the highly configurable vectorizer.
- **Rapid Prototyping:** Run your numerical solvers on `NumPy <https://numpy.org>`_ and `CuPy <https://cupy.dev>`_ arrays
and test them interactively inside `Jupyter <https://jupyter.org/>`_ notebooks.
Quickly set up numerical schemes, apply initial and boundary conditions, evaluate them on model problems
and rapidly visualize the results using matplotlib or VTK.
- **Framework Integration:** Export your kernels and use them inside HPC frameworks
such as `waLBerla`_ to build massively parallel simulations.
.. .. card:: Getting Started: Our Tutorials
.. :link: page_tutorials
.. :link-type: ref
.. New to *pystencils*? Check out our set of tutorials to quickly and interactively learn the basics.
.. .. card:: Reference Guide and APIs
.. :link: page_api
.. :link-type: ref
.. Get an overview of *pystencils*' APIs for mathematical modelling and code generation.
.. .. card:: Migration Guide: 1.3.x to 2.0
.. :link: page_v2_migration
.. :link-type: ref
.. Find advice on migrating your code from *pystencils 1.3.x* to *pystencils 2.0*
.. .. card:: Developers's Reference: Code Generation Backend
.. :link: page_codegen_backend
.. :link-type: ref
.. Dive deep into the core of pystencils' code generation engine.
Topics
------
.. toctree::
:maxdepth: 1
:caption: Getting Started
installation
tutorials/index
.. toctree::
:maxdepth: 1
:caption: User Manual
user_manual/symbolic_language
user_manual/kernelcreation
user_manual/gpu_kernels
user_manual/WorkingWithTypes
.. toctree::
:maxdepth: 1
:caption: API Reference
api/symbolic/index
api/types
api/codegen
api/jit
.. toctree::
:maxdepth: 1
:caption: Topics
contributing/index
migration
backend/index
Projects using pystencils
-------------------------
- `lbmpy <https://pycodegen.pages.i10git.cs.fau.de/lbmpy/>`_
- `waLBerla`_
- `HyTeG Operator Generator (HOG) <https://hyteg.pages.i10git.cs.fau.de/hog/>`_
.. _walberla: https://walberla.net
(_installation)=
# Setup and Installation
## Install pystencils
There are two ways to install the latest development version of pystencils 2.0.
You can either install it directly from our git repository:
```bash
pip install "git+https://i10git.cs.fau.de/pycodegen/pystencils.git@v2.0-dev"
```
Or clone the repository locally and perform an editable install:
```bash
git clone -b v2.0-dev https://i10git.cs.fau.de/pycodegen/pystencils.git
pip install -e pystencils
```
### Feature Groups
In both cases, you can add a set of optional features to your installation by listing them
in square brackets (e.g. `pip install -e pystencils[feature1, feature2]`).
The following feature sets are available:
- `interactive` (**recommended**): Install dependencies for using pystencils interactively from
within Jupyter notebooks.
Setting this flag will cause pip to install `jupyter`, `matplotlib`, and `graphviz`, among others, alongside pystencils.
- `alltrafos` (**recommended**): Install dependencies to enable a wider set of code transformations.
These include [islpy](https://pypi.org/project/islpy/) for polyhedral loop transformations,
and [py-cpuinfo](https://pypi.org/project/py-cpuinfo/) for detecting the current hardware in order
to select optimal vector instructions.
- `use_cython`: Install [Cython](https://cython.org/), which is used internally by pystencils
to accelerate the setup of boundary conditions.
:::{dropdown} For Developers
If you are developing pystencils, we recommend you perform an editable install of your
local clone of the repository, with all optional features:
```bash
pip install -e pystencils[alltrafos,interactive,use_cython,doc,tests]
```
This includes the additional feature groups `doc`, which contains all dependencies required
to build this documentation, and `tests`, which adds `flake8` for code style checking,
`mypy` for static type checking, and `pytest` plus plugins for running the test suite.
:::
### For Nvidia GPUs
If you have an Nvidia graphics processor and CUDA installed, you can use pystencils to directly compile
and execute kernels running on your GPU.
This requires a working installation of [cupy](https://cupy.dev).
Please refer to cupy's [installation manual](https://docs.cupy.dev/en/stable/install.html)
for details about installing cupy.
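For example, on a system with CUDA 12.x, the matching cupy wheel can be installed via pip (pick the wheel matching your installed CUDA version; see the manual linked above):

```bash
pip install cupy-cuda12x
```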
---
jupytext:
formats: md:myst
text_representation:
extension: .md
format_name: myst
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
mystnb:
execution_mode: cache
---
(_page_v2_migration)=
# Version 2.0 Migration Guide
With version 2.0, many APIs of *pystencils* will be changed; old interfaces are being deprecated
and new systems are put in place.
This page is a still-incomplete list of these changes, with advice on how to migrate your code
from pystencils 1.x to pystencils 2.0.
```{code-cell} ipython3
:tags: [remove-cell]
import pystencils as ps
```
## Kernel Creation
### Configuration
The API of {any}`create_kernel`, and the configuration options of the {any}`CreateKernelConfig`, have changed significantly.
The `CreateKernelConfig` class has been refined to be safe to copy and edit incrementally.
The recommended way of setting up the code generator is now *incremental configuration*:
```{code-cell} ipython3
cfg = ps.CreateKernelConfig()
cfg.default_dtype = "float32"
cfg.cpu.openmp.enable = True
cfg.cpu.openmp.num_threads = 8
cfg.ghost_layers = 2
```
- *Data Types:* `CreateKernelConfig` now takes two parameters to control data types in your kernels:
the ``default_dtype`` is applied to all numerical computations, while the ``index_dtype`` is used
for all index calculations and loop counters.
- *CPU Optimization Options:* Should now be set via the {any}`cpu <CpuOptions>` option category and its subcategories.
:::{dropdown} Deprecated options of `CreateKernelConfig`
- ``data_type``: Use ``default_dtype`` instead
- ``cpu_openmp``: Set OpenMP-Options in the `cpu.openmp <OpenMpOptions>` category instead.
- ``cpu_vectorize_info``: Set vectorization options in the `cpu.vectorize <VectorizationOptions>` category instead
- ``gpu_indexing_params``: Set GPU indexing options in the `gpu <GpuOptions>` category instead
:::
### Type Checking
The old type checking system of pystencils' code generator has been replaced by a new type inference and validation
mechanism whose rules are much stricter than before.
While running `create_kernel`, you may now encounter a `TypificationError` where previously your kernels compiled fine.
If this happens, it is probable that you have been doing some illegal, maybe dangerous, or at least unsafe things with data types
(like inserting integers into a floating-point context without casting them, or mixing types of different precisions or signedness).
If you are sure the error is not your fault, please file an issue at our
[bug tracker](https://i10git.cs.fau.de/pycodegen/pystencils/-/issues).
### Type System
The ``pystencils.typing`` module has been entirely replaced by the new {any}`pystencils.types` module,
which is home to a completely new type system.
The primary interaction points with this system are still the {any}`TypedSymbol` class and the {any}`create_type` routine.
Code using any of these two should not require any changes, except:
- *Importing `TypedSymbol` and `create_type`:* Both `TypedSymbol` and `create_type` should now be imported directly
from the ``pystencils`` namespace.
- *Custom data types:* `TypedSymbol` used to accept arbitrary strings as data types.
This is no longer possible; instead, import {any}`pystencils.types.PsCustomType` and use it to describe
  custom data types unknown to pystencils, as in ``TypedSymbol("xs", PsCustomType("std::vector< int >"))``.
All old data type classes (such as ``BasicType``, ``PointerType``, ``StructType``, etc.) have been removed
and replaced by the class hierarchy below {any}`PsType`.
Directly using any of these type classes in the frontend is discouraged unless absolutely necessary;
in most cases, `create_type` suffices.
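As a brief sketch (not an executable cell), migrated code using the new type system might look like this:

```python
from pystencils import TypedSymbol, create_type
from pystencils.types import PsCustomType

# Built-in numeric types are created from string specifications, as before
ctr = TypedSymbol("ctr", create_type("int64"))

# Types unknown to pystencils are wrapped in PsCustomType
xs = TypedSymbol("xs", PsCustomType("std::vector< int >"))
```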
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.7
# kernelspec:
# display_name: .venv
# language: python
# name: python3
# ---
# %%
import pystencils as ps
from pystencils import plot as plt
import numpy as np
import sympy as sp
# %% [markdown]
# # Tutorial 01: Getting Started
#
#
# ## Overview
#
# *pystencils* is a package that can speed up computations on *numpy* arrays. All computations are carried out fully parallel on CPUs (single node with OpenMP, multiple nodes with MPI) or on GPUs.
# It is suited for applications that run the same operation on *numpy* arrays multiple times. It can be used to accelerate computations on images or voxel fields. Its main applications, however, are numerical simulations using finite differences, finite volumes, or lattice Boltzmann methods.
# There already exist a variety of packages to speed up numeric Python code. One could use pure numpy or solutions that compile your code, like *Cython* and *numba*. See [this page](demo_benchmark.ipynb) for a comparison of these tools.
#
# ![Stencil](../_static/img/pystencils_stencil_four_points_with_arrows.svg)
#
# As the name suggests, *pystencils* was developed for **stencil codes**, i.e. operations that update array elements using only a local neighborhood.
# It generates C code, compiles it behind the scenes, and lets you call the compiled C function as if it was a native Python function.
# But let's not dive too deep into the concepts of *pystencils* here; they are covered in detail in the following tutorials. Let's instead look at a simple example that computes the average of the neighbor values of a *numpy* array. Therefore, we first create two rather large arrays for input and output:
# %%
input_arr = np.random.rand(1024, 1024)
output_arr = np.zeros_like(input_arr)
# %% [markdown]
# We first implement a version of this algorithm using pure numpy and benchmark it.
# %%
def numpy_kernel():
output_arr[1:-1, 1:-1] = input_arr[2:, 1:-1] + input_arr[:-2, 1:-1] + \
input_arr[1:-1, 2:] + input_arr[1:-1, :-2]
# %%
# %%timeit
numpy_kernel()
# %% [markdown]
# Now let's see how to run the same algorithm with *pystencils*.
# %%
src, dst = ps.fields(src=input_arr, dst=output_arr)
# Note: A field declaration only stores its arguments' properties!
# On the symbolic side, no actual numeric values are stored.
# The arrays are only used to infer the shape and dtype of the fields.
# The actual assignment of numeric values is done when executing the kernel.
symbolic_description = ps.Assignment(dst[0,0],
(src[1, 0] + src[-1, 0] + src[0, 1] + src[0, -1]) / 4)
symbolic_description
# %%
plt.figure(figsize=(3,3))
ps.stencil.plot_expression(symbolic_description.rhs)
# %% [markdown]
# Here we first have created a symbolic notation of the stencil itself. This representation is built on top of *sympy* and is explained in detail in the next section.
# This description is then compiled and loaded as a Python function.
# %%
kernel = ps.create_kernel(symbolic_description).compile()
# %% [markdown]
# This whole process might seem overly complicated. We have already spent more lines of code than we needed for the *numpy* implementation and don't have anything running yet! However, this multi-stage process of formulating the algorithm symbolically and only actually running it at the end is what makes *pystencils* faster and more flexible than other approaches.
#
# Now, finally, let's benchmark the *pystencils* kernel.
# %%
def pystencils_kernel():
kernel(src=input_arr, dst=output_arr) # Numeric values of `input_arr` and `output_arr` are assigned here.
# %%
# %%timeit
pystencils_kernel()
# %% [markdown]
# This benchmark shows that *pystencils* is a lot faster than pure *numpy*, especially for large arrays.
# If you are interested in performance details and comparison to other packages like Cython, have a look at [this page](demo_benchmark.ipynb).
#
#
# %% [markdown]
# ## Short *sympy* introduction
#
# In this tutorial we continue with a short *sympy* introduction, since the symbolic kernel definition is built on top of this package. If you already know *sympy* you can skip this section.
# You can also read the full [sympy documentation here](http://docs.sympy.org/latest/index.html).
# %%
import sympy as sp
sp.init_printing() # enable nice LaTeX output
# %% [markdown]
# *sympy* is a package for symbolic calculation. So first we need some symbols:
# %%
x = sp.Symbol("x")
y = sp.Symbol("y")
type(x)
# %% [markdown]
# The usual mathematical operations are defined for symbols:
# %%
expr = x**2 * ( y + x + 5) + x**2
expr
# %% [markdown]
# Now we can do all sorts of operations on these expressions: expand them, factor them, substitute variables:
# %%
expr.expand()
# %%
expr.factor()
# %%
expr.subs(y, sp.cos(x))
# %% [markdown]
# We can also build equations and solve them:
# %%
eq = sp.Eq(expr, 1)
eq
# %%
sp.solve(sp.Eq(expr, 1), y)
# %% [markdown]
# A *sympy* expression is represented by an abstract syntax tree (AST), which can be inspected and modified.
# %%
expr
# %%
ps.to_dot(expr, graph_style={'size': "9.5,12.5"} )
# %% [markdown]
# Programmatically, the type of a node is accessible as ``expr.func`` and its children as ``expr.args``.
# With these members a tree can be traversed and modified.
# %%
expr.func
# %%
expr.args
# %% [markdown]
# ## Using *pystencils*
#
#
# ### Fields
#
# *pystencils* is a module to generate code for stencil operations.
# One has to specify an update rule for each element of an array, with optional dependencies to neighbors.
# This is done using pure *sympy* with one addition: **Fields**.
#
# Fields represent a multidimensional array, where some dimensions are considered *spatial* and some are *index* dimensions. Spatial coordinates are given relative to the current cell (i.e. one can specify "the current cell" or "the left neighbor"), whereas index dimensions are used to index multiple values per cell.
# %%
my_field = ps.fields("f(3) : double[2D]")
# %% [markdown]
# Neighbors are labeled according to points on a compass where the first coordinate is west/east, second coordinate north/south and third coordinate top/bottom.
# %%
field_access = my_field[1, 0](1)
field_access
# %% [markdown]
# The result of indexing a field is an instance of ``Field.Access``. This class is a subclass of a *sympy* Symbol and thus can be used wherever normal symbols can be used. It is just like a normal symbol with some additional information attached to it.
# %%
isinstance(field_access, sp.Symbol)
# %% [markdown]
# ### Building our first stencil kernel
#
# Let's start by building a simple filter kernel. We create a field representing an image, then define an edge detection filter on the third pixel component, which is blue for an RGB image.
# %%
img_field = ps.fields("img(4): [2D]")
# %%
w1, w2 = sp.symbols("w_1 w_2")
color = 2
sobel_x = (-w2 * img_field[-1,0](color) - w1 * img_field[-1,-1](color) - w1 * img_field[-1, +1](color) \
+w2 * img_field[+1,0](color) + w1 * img_field[+1,-1](color) - w1 * img_field[+1, +1](color))**2
sobel_x
# %% [markdown]
# We have mixed some standard *sympy* symbols into this expression to possibly give the different directions different weights. The complete expression is still a valid *sympy* expression, so all features of *sympy* work on it. Let's now, for example, fix one weight by substituting it with a constant.
# %%
sobel_x = sobel_x.subs(w1, 0.5)
sobel_x
# %% [markdown]
# Now let's build an executable kernel out of it, which writes the result to a second field. Assignments are created using the *pystencils* `Assignment` class, which takes the left- and right-hand side of the assignment.
# %%
dst_field = ps.fields('dst: [2D]' )
update_rule = ps.Assignment(dst_field[0,0], sobel_x)
update_rule
# %% [markdown]
# Next, we can see *pystencils* in action as it creates a kernel for us.
# %%
from pystencils import create_kernel
ast = create_kernel(update_rule, cpu_openmp=False)
compiled_kernel = ast.compile()
# %% [markdown]
# This compiled kernel is now just an ordinary Python function.
# Now let's grab an image to apply this filter to:
# %%
try:
import requests
import imageio.v2 as imageio
from io import BytesIO
response = requests.get("https://www.python.org/static/community_logos/python-logo-master-v3-TM.png")
img = imageio.imread(BytesIO(response.content)).astype(np.double)
img /= img.max()
plt.imshow(img);
except ImportError:
print("No requests or imageio installed")
img = np.random.random((82, 290, 4))
# %%
filtered_image = np.zeros_like(img[..., 0])
# here we call the compiled stencil function
compiled_kernel(img=img, dst=filtered_image, w_2=0.5)
plt.imshow(filtered_image, cmap='gray');
# %% [markdown]
# ### Digging into *pystencils*
#
# Along the way we have created an ``ast`` object. We can inspect it to see what *pystencils* actually does.
# %%
# TODO nbackend
# ps.to_dot(ast, graph_style={'size': "9.5,12.5"})
# %% [markdown]
# *pystencils* also builds a tree structure of the program, where each `Assignment` node internally again holds a *sympy* AST, which is not printed here. From this representation, *C* code can be generated:
# %%
ps.show_code(ast)
# %% [markdown]
# Behind the scenes this code is compiled into a shared library and made available as a Python function. Before compiling this function we can modify the AST object, for example to parallelize it with OpenMP.
# %%
cfg = ps.CreateKernelConfig()
cfg.cpu.openmp.enable = True
cfg.cpu.openmp.num_threads = 2
ast = ps.create_kernel(
update_rule,
cfg
)
ps.show_code(ast)
# %% [markdown]
# ### Fixed array sizes
#
# Since we already know the arrays to which the kernel should be applied, we can
# create *Field* objects with fixed size, based on a numpy array:
# %%
img_field, dst_field = ps.fields("I(4), dst : [2D]", I=img.astype(np.double), dst=filtered_image)
sobel_x = -2 * img_field[-1,0](1) - img_field[-1,-1](1) - img_field[-1, +1](1) \
+2 * img_field[+1,0](1) + img_field[+1,-1](1) - img_field[+1, +1](1)
update_rule = ps.Assignment(dst_field[0,0], sobel_x)
ast = create_kernel(update_rule)
ps.show_code(ast)
# %% [markdown]
# Compare this code to the version above. In this code the loop bounds and array offsets are constants, which usually leads to faster kernels.
# %% [markdown]
# ### Running on GPU
#
# If you have a GPU and [cupy](https://cupy.dev/) installed, *pystencils* can run your kernel on the GPU as well. You can find more details about this in the GPU tutorial.
# %%
try:
import cupy
from pystencils.gpu import BlockIndexing
gpu_ast = create_kernel(update_rule, target=ps.Target.GPU,
gpu_indexing=BlockIndexing,
gpu_indexing_params={'blockSize': (64, 1, 1)})
ps.show_code(gpu_ast)
except ImportError:
print("Please install cupy for GPU support")
# ---
# jupyter:
# jupytext:
# custom_cell_magics: kql
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.11.2
# kernelspec:
# display_name: .venv
# language: python
# name: python3
# ---
# %%
from pystencils.session import *
# %% [markdown]
# # Tutorial 02: Basic Kernel generation with *pystencils*
#
# Now that you have an [overview of pystencils](01_tutorial_getting_started.ipynb),
# this tutorial shows in more detail how to formulate, optimize and run stencil kernels.
#
# ## 1) Kernel Definition
#
# ### a) Defining kernels with assignment lists and the `kernel` decorator
#
# *pystencils* gets a symbolic formulation of the kernel. This can be either an `Assignment` or a sequence of `Assignment`s that follow a set of restrictions.
#
# Let's first create a kernel that consists of multiple assignments:
# %%
src_arr = np.zeros([20, 30])
dst_arr = np.zeros_like(src_arr)
dst, src = ps.fields(dst=dst_arr, src=src_arr)
# %%
grad_x, grad_y = sp.symbols("grad_x, grad_y")
symbolic_description = [
ps.Assignment(grad_x, (src[1, 0] - src[-1, 0]) / 2),
ps.Assignment(grad_y, (src[0, 1] - src[0, -1]) / 2),
ps.Assignment(dst[0, 0], grad_x + grad_y),
]
kernel = ps.create_kernel(symbolic_description)
symbolic_description
# %% [markdown]
# We created subexpressions, using standard sympy symbols on the left-hand side, to split the kernel into multiple assignments. Defining a kernel using a list of `Assignment`s is quite tedious and hard to read.
# To simplify the formulation of a kernel, *pystencils* offers the `kernel` decorator, which transforms a normal Python function with `@=` assignments into an assignment list that can be passed to `create_kernel`.
# %%
@ps.kernel
def symbolic_description_using_function():
grad_x @= (src[1, 0] - src[-1, 0]) / 2
grad_y @= (src[0, 1] - src[0, -1]) / 2
dst[0, 0] @= grad_x + grad_y
symbolic_description_using_function
# %% [markdown]
# The decorated function can contain any Python code; only the `@=` operator and the ternary inline `if-else` operator have a different meaning.
#
# ### b) Ternary 'if' with `Piecewise`
#
# The ternary operator maps to `sympy.Piecewise` functions, which can be used to introduce branching into the kernel. Piecewise-defined functions must give a value for every input, i.e. there must be an 'otherwise' clause at the end, indicated by the condition `True`. Piecewise objects are standard sympy terms that can be integrated into bigger expressions:
# %%
sp.Piecewise((1.0, src[0,1] > 0), (0.0, True)) + src[1, 0]
# %% [markdown]
# Piecewise objects are created by the `kernel` decorator for ternary if-else statements.
# %%
@ps.kernel
def kernel_with_piecewise():
grad_x @= (src[1, 0] - src[-1, 0]) / 2 if src[-1, 0] > 0 else 0.0
kernel_with_piecewise
# %% [markdown]
# ### c) Assignment level optimizations using `AssignmentCollection`
#
# When the kernels get larger and more complex, it is helpful to organize the list of assignments in a more structured way. The `AssignmentCollection` offers optimizing transformations on a list of assignments. It holds two assignment lists, one for subexpressions and one for the main assignments. Main assignments are typically those that write to an array.
# %%
@ps.kernel
def somewhat_longer_dummy_kernel(s):
s.a @= src[0, 1] + src[-1, 0]
s.b @= 2 * src[1, 0] + src[0, -1]
s.c @= src[0, 1] + 2 * src[1, 0] + src[-1, 0] + src[0, -1] - src[0,0]
dst[0, 0] @= s.a + s.b + s.c
ac = ps.AssignmentCollection(main_assignments=somewhat_longer_dummy_kernel[-1:],
subexpressions=somewhat_longer_dummy_kernel[:-1])
ac
# %%
ac.operation_count
# %% [markdown]
# The `pystencils.simp` submodule offers several functions to optimize a collection of assignments.
# It also offers functionality to group optimizations into strategies and evaluate them.
# In this example we reduce the number of operations by reusing existing subexpressions to get rid of two unnecessary floating point additions. For more information about assignment collections and simplifications see the [demo notebook](demo_assignment_collection.ipynb).
# %%
opt_ac = ps.simp.subexpression_substitution_in_existing_subexpressions(ac)
opt_ac
# %%
# FIXME currently unavailable
# opt_ac.operation_count
# %% [markdown]
# ### d) Ghost layers and iteration region
#
# When creating a kernel with neighbor accesses, *pystencils* automatically restricts the iteration region, such that all accesses are safe.
# %%
kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]))
ps.show_code(kernel)
# %% [markdown]
# When no additional ghost layer information is given, *pystencils* looks at all neighboring field accesses and introduces the required number of ghost layers **for all directions**. In the example above the largest neighbor access was ``src[2, 0]``, so theoretically we would need 2 ghost layers only at the end of the x coordinate.
# By default *pystencils* introduces 2 ghost layers at all borders of the domain. The next cell shows how to change this behavior. Be careful with manual ghost layer specification; wrong values may lead to segfaults.
# %%
gl_spec = [(0, 2), # 0 ghost layers at the left, 2 at the right border
(1, 0)] # 1 ghost layer at the lower y, one at the upper y coordinate
kernel = ps.create_kernel(ps.Assignment(dst[0,0], src[2, 0] + src[-1, 0]), ghost_layers=gl_spec)
ps.show_code(kernel)
# %% [markdown]
# ## 2) Restrictions
#
#
# ### a) Independence Restriction
#
# *pystencils* only works for kernels where each array element can be updated independently of all other elements. This restriction ensures that the kernels can be easily parallelized and also run on the GPU. Trying to define a kernel where the result depends on the iteration order leads to a `KernelConstraintsError`.
# %%
invalid_description = [
ps.Assignment(dst[1, 0], src[1, 0] + src[-1, 0]),
ps.Assignment(dst[0, 0], src[1, 0] - src[-1, 0]),
]
try:
invalid_kernel = ps.create_kernel(invalid_description)
assert False, "Should never be executed"
except ps.KernelConstraintsError as e:
print(e)
# %% [markdown]
# The independence restriction makes sure that the kernel can be safely parallelized by checking the following conditions: If a field is modified inside the kernel, it may only be modified at a single spatial position. In that case the field may also only be read at this position. Fields that are not modified may be read at multiple neighboring positions.
#
# Specifically, this rule allows for in-place updates that don't access neighbors.
# %%
valid_kernel = ps.create_kernel(ps.Assignment(src[0,0], 2*src[0,0] + 42))
# %% [markdown]
# If a field stores multiple values per cell, as in the next example, this restriction only applies for accesses with the same index.
# %%
v = ps.fields("v(2): double[2D]")
valid_kernel = ps.create_kernel([ps.Assignment(v[0,0](1), 2*v[0,0](1) + 42),
ps.Assignment(v[0,1](0), 2*v[1,0](0) + 42)])
# %% [markdown]
# ### b) Static Single Assignment Form
#
# All assignments that don't write to a field must be in SSA form:
# 1. Each sympy symbol may only occur once as a left-hand side (fields can be written multiple times)
# 2. A symbol has to be defined before it is used. If it is never defined, it is introduced as a function parameter
#
# The next cell demonstrates the first SSA restriction:
# %%
@ps.kernel
def not_allowed():
a, b = sp.symbols("a b")
a @= src[0, 0]
b @= a + 3
a @= src[-1, 0]
dst[0, 0] @= a + b
try:
ps.create_kernel(not_allowed)
assert False
except ps.KernelConstraintsError as e:
print(e)
# %% [markdown]
# It is also not allowed to write to a field at the same location more than once:
# %%
@ps.kernel
def not_allowed():
dst[0, 0] @= src[0, 1] + src[1, 0]
dst[0, 0] @= 2 * dst[0, 0]
try:
ps.create_kernel(not_allowed)
assert False
except ps.KernelConstraintsError as e:
print(e)
# %% [markdown]
# This situation should be resolved by introducing temporary variables
# %%
tmp_var = sp.Symbol("a")
@ps.kernel
def allowed():
tmp_var @= src[0, 1] + src[1, 0]
dst[0, 0] @= 2 * tmp_var
ast = ps.create_kernel(allowed)
ps.show_code(ast)
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.7
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---
# %%
import psutil
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import pyplot, cm
from pystencils.session import *
from pystencils.boundaries import add_neumann_boundary, Neumann, Dirichlet, BoundaryHandling
from pystencils.slicing import slice_from_direction
import math
import time
# %matplotlib inline
# %% [markdown]
# Test whether cupy is installed, which is needed to run calculations on the GPU:
# %%
try:
import cupy
gpu = True
except ImportError:
gpu = False
cupy = None
print('No cupy installed')
# %% [markdown]
# # Tutorial 03: Datahandling
# %% [markdown]
# This is a tutorial about the `DataHandling` class of pystencils. This class is an abstraction layer to
# - link numpy arrays to pystencils fields
# - handle CPU-GPU array transfer, such that one can write code that works on CPU and GPU
# - make it possible to write MPI-parallel simulations that run on distributed-memory clusters using the waLBerla library
#
# We will look at a small and easy example to demonstrate the usage of `DataHandling` objects. We will define an averaging kernel, applied to every cell of an array, that writes the average of the neighbor cell values to the center.
# %% [markdown]
# ## 1. Manual
#
# ### 1.1. CPU kernels
#
# In this first part, we set up a scenario manually without a `DataHandling`. In the next sections we then repeat the same setup with the help of the data handling.
#
# One concept of *pystencils* that may be confusing at first is the difference between pystencils fields and numpy arrays. Fields are used to describe the computation *symbolically* with sympy, while numpy arrays hold the actual values that the computation is executed on.
#
# One option to create and execute a *pystencils* kernel is listed below. For reasons that become clear later we call this the **variable-field-size workflow**:
#
# 1. define pystencils fields
# 2. use sympy and the pystencils fields to define an update rule that describes what should be done on *every cell*
# 3. compile the update rule to a real function that can be called from Python. For each field referenced in the symbolic description, the function expects a numpy array, passed as a named parameter
# 4. create some numpy arrays with actual data
# 5. call the kernel - usually many times
#
# Now, let's see how this actually looks in Python code:
# %%
# 1. field definitions
src_field, dst_field = ps.fields("src, dst:[2D]")
# 2. define update rule
update_rule = [ps.Assignment(lhs=dst_field[0, 0],
rhs=(src_field[1, 0] + src_field[-1, 0] +
src_field[0, 1] + src_field[0, -1]) / 4)]
# 3. compile update rule to function
kernel_function = ps.create_kernel(update_rule).compile()
# 4. create numpy arrays and call kernel
src_arr, dst_arr = np.random.rand(30, 30), np.zeros([30, 30])
# 5. call kernel
kernel_function(src=src_arr, dst=dst_arr) # names of arguments have to match names passed to ps.fields()
# %% [markdown]
# This workflow separates the symbolic and the numeric stages very cleanly. The separation also makes it possible to stop after step 3, write the C code to a file, and call the kernel from a C program. Speaking of the C code, let's have a look at the generated sources:
# %%
ps.show_code(kernel_function.ast)
# %% [markdown]
# Even if it looks very ugly and low-level :) let's look at this code in a bit more detail. The code is generated in a way that it works for different array sizes. The size of the array is passed in the `_size_dst_` variables that specify the shape of the array for each dimension. Also, the memory layout (linearization) of the array can be different. That means the array could be stored in row-major or column-major order - if we pass in the array strides correctly, the kernel does the right thing. If you're not familiar with the concept of strides, check out [this stackoverflow post](https://stackoverflow.com/questions/53097952/how-to-understand-numpy-strides-for-layman) or search the numpy documentation for strides - C vs Fortran order.
#
# The goal of *pystencils* is to produce the fastest possible code. One technique to do this is to use all available information already at compile time and generate code that is highly adapted to the specific problem. In our case we already know the shape and strides of the arrays we want to apply the kernel to, so we can make use of this information. This idea leads to the **fixed-field-size workflow**. The main difference there is that we define the arrays first and therefore let *pystencils* know about the array shapes and strides, so that it can generate more specific code:
#
# 1. create numpy arrays that hold your data
# 2. define pystencils fields, this time telling pystencils already which arrays they correspond to, so that it knows about the size and strides
#
# In the other steps, nothing has changed:
#
# 3. define the update rule
# 4. compile update rule to kernel
# 5. run the kernel
# %%
# 1. create arrays first
src_arr, dst_arr = np.random.rand(30, 30), np.zeros([30, 30])
# 2. define symbolic fields - note the additional parameters that link an array to each field
src_field, dst_field = ps.fields("src, dst:[2D]", src=src_arr, dst=dst_arr)
# 3. define update rule
update_rule = [ps.Assignment(lhs=dst_field[0, 0],
rhs=(src_field[1, 0] + src_field[-1, 0] +
src_field[0, 1] + src_field[0, -1]) / 4)]
# 4. compile it
kernel_function = ps.create_kernel(update_rule).compile()
# 5. call kernel
kernel_function(src=src_arr, dst=dst_arr) # names of arguments have to match names passed to ps.fields()
# %% [markdown]
# Functionally, both variants are equivalent. We see the difference only when we look at the generated code
# %%
ps.show_code(kernel_function.ast)
# %% [markdown]
# Compare this to the code above! It looks much simpler. The reason is that all index computations are already simplified since the exact field sizes and strides are known. This kernel now only works on arrays of the previously specified size.
#
# Let's see what happens if we use a different array:
# %%
src_arr2, dst_arr2 = np.random.rand(40, 40), np.zeros([40, 40])
try:
kernel_function(src=src_arr2, dst=dst_arr2)
except ValueError as e:
print(e)
# %% [markdown]
# ### 1.2. GPU simulations
#
# Let's now jump to a seemingly unrelated topic: running kernels on the GPU.
# When creating the kernel, an additional parameter `target=ps.Target.GPU` has to be passed. Also, the compiled kernel cannot be called with numpy arrays directly, but has to be called with `cupy.array`s instead. That means we have to transfer our numpy arrays to the GPU first. From this step we obtain a GPU array; then we can run the kernel, hopefully multiple times so that the data transfer was worth the time. Finally, we transfer the finished result back to the CPU:
# %%
if cupy:
config = ps.CreateKernelConfig(target=ps.Target.GPU)
kernel_function_gpu = ps.create_kernel(update_rule, config=config).compile()
# transfer to GPU
src_arr_gpu = cupy.asarray(src_arr)
dst_arr_gpu = cupy.asarray(dst_arr)
# run kernel on GPU, this is done many times in real setups
kernel_function_gpu(src=src_arr_gpu, dst=dst_arr_gpu)
# transfer result back to CPU
dst_arr = dst_arr_gpu.get()
# %% [markdown]
# ### 1.3. Summary: manual way
#
# - Don't confuse *pystencils* fields and *numpy* arrays
# - fields are symbolic
# - arrays are numeric
# - Use the fixed-field-size workflow whenever possible, since code might be faster. Create arrays first, then create fields from arrays
# - if we run GPU kernels, arrays have to be transferred to the GPU first
#
# As demonstrated in the examples above we have to define 2 or 3 corresponding objects for each grid:
#
# - symbolic pystencils field
# - numpy array on CPU
# - for GPU use also cupy to mirror the data on the GPU
#
# Managing these three objects manually is tedious and error-prone. We'll see in the next section how the data handling object takes care of this problem.
# %% [markdown]
# ## 2. Introducing the data handling - serial version
#
# ### 2.1. Example for CPU simulations
#
# The data handling internally keeps a mapping between symbolic fields and numpy arrays. When we create a field, a corresponding array is automatically allocated as well. Optionally, memory for the array can also be allocated on the GPU. Let's dive right in and see what our example looks like when implemented with a data handling.
# %%
dh = ps.create_data_handling(domain_size=(30, 30))
# fields are now created using the data handling
src_field = dh.add_array('src', values_per_cell=1)
dst_field = dh.add_array('dst', values_per_cell=1)
# kernel is created just like before
update_rule = [ps.Assignment(lhs=dst_field[0, 0],
rhs=(src_field[1, 0] + src_field[-1, 0] + src_field[0, 1] + src_field[0, -1]) / 4)]
kernel_function = ps.create_kernel(update_rule).compile()
# have a look at the generated code - it uses
# the fast version where array sizes are compiled-in
# ps.show_code(kernel_function.ast)
# %% [markdown]
# The data handling has methods to create fields - but where are the corresponding arrays?
# In the serial case you can access them as a member of the data handling, for example to initialize our 'src' array we can write
# %%
src_arr = dh.cpu_arrays['src']
src_arr.fill(0.0)
# %% [markdown]
# This method is nice and easy, but you should not use it if you want your simulation to run on distributed-memory clusters. We'll see why in the last section. So it is a good habit not to access the arrays directly, but to use the data handling instead. We can, for example, also initialize the array with the following code:
# %%
dh.fill('src', 0.0)
# %% [markdown]
# To run the kernels with the same code as before, we would also need the arrays. We could do that by accessing `cpu_arrays`:
# %%
kernel_function(src=dh.cpu_arrays['src'],
dst=dh.cpu_arrays['dst'])
# %% [markdown]
# but to be prepared for MPI-parallel simulations, a method of the data handling should again be used for this.
# Besides, this method is also simpler to use, since it automatically detects which arrays a kernel uses and passes them in.
# %%
dh.run_kernel(kernel_function)
# %% [markdown]
# To access the data read-only, the gather function should be used instead of `cpu_arrays`.
# This function gives you a read-only copy of the domain or part of the domain.
# We will discuss this function later in more detail when we look at MPI parallel simulations.
# For serial simulations keep in mind that modifying the resulting array does not change your original data!
# %%
read_only_copy = dh.gather_array('src', ps.make_slice[:, :], ghost_layers=False)
# %% [markdown]
# ### 2.2. Example for GPU simulations
#
# In this section we have a look at GPU simulations using the data handling. Only minimal changes are required.
# When creating the data handling we can pass a `default_target`. If a GPU target is set, an array on the CPU and one on the GPU is allocated for every added field. This is a useful default; for more fine-grained control, the `add_array` method also takes additional parameters controlling where the array should be allocated (a small sketch follows below).
#
# Additionally we also need to compile a GPU version of the kernel.
# %%
if gpu is False:
dh = ps.create_data_handling(domain_size=(30, 30), default_target=ps.Target.CPU)
else:
dh = ps.create_data_handling(domain_size=(30, 30), default_target=ps.Target.GPU)
# fields are now created using the data handling
src_field = dh.add_array('src', values_per_cell=1)
dst_field = dh.add_array('dst', values_per_cell=1)
# kernel is created just like before
update_rule = [ps.Assignment(lhs=dst_field[0, 0],
rhs=(src_field[1, 0] + src_field[-1, 0] + src_field[0, 1] + src_field[0, -1]) / 4)]
config = ps.CreateKernelConfig(target=dh.default_target)
kernel_function = ps.create_kernel(update_rule, config=config).compile()
dh.fill('src', 0.0)
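# %% [markdown]
# As mentioned above, `add_array` also accepts parameters that control where memory is allocated. The following is only a sketch: it assumes that `add_array` has `cpu` and `gpu` keyword arguments, so check the API documentation of your pystencils version before relying on them.
# %%
# sketch: allocate an additional array that lives on the CPU only
# (assumes `cpu`/`gpu` keyword arguments of `add_array`)
tmp_field = dh.add_array('tmp', values_per_cell=1, cpu=True, gpu=False)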
# %% [markdown]
# The data handling provides functions to transfer data between CPU and GPU:
# %%
if gpu:
dh.to_gpu('src')
dh.run_kernel(kernel_function)
if gpu:
dh.to_cpu('dst')
# %% [markdown]
# Usually one wants to transfer all fields that have been allocated on CPU and GPU at the same time:
# %%
dh.all_to_gpu()
dh.run_kernel(kernel_function)
dh.all_to_cpu()
# %% [markdown]
# We can always include the `all_to_*` functions in our code, since they do nothing if there are no arrays allocated on the GPU. Thus there is only a single point in the code where we can switch between CPU and GPU version: the `default_target` of the data handling.
# %% [markdown]
# ### 2.3. Ghost layers and periodicity
#
# The data handling can also provide periodic boundary conditions. For this, the domain is extended by one layer of cells, the so-called ghost layer or halo layer.
# %%
print("Shape of domain ", dh.shape)
print("Direct access to arrays ", dh.cpu_arrays['src'].shape)
print("Gather ", dh.gather_array('src', ghost_layers=True).shape)
# %% [markdown]
# So the actual arrays are two cells larger than what you asked for. This additional layer is used to copy over the data from the other end of the array, so that the domain is effectively periodic for the stencil algorithm. This copying operation has to be triggered manually, though:
# %%
dh = ps.create_data_handling((4, 4), periodicity=(True, False))
dh.add_array("src")
# get copy function
copy_fct = dh.synchronization_function(['src'])
dh.fill('src', 0.0, ghost_layers=True)
dh.fill('src', 3.0, ghost_layers=False)
before_sync = dh.gather_array('src', ghost_layers=True).copy()
copy_fct() # copy over to get periodicity in x direction
after_sync = dh.gather_array('src', ghost_layers=True).copy()
plt.subplot(1,2,1)
plt.scalar_field(before_sync);
plt.title("Before")
plt.subplot(1,2,2)
plt.scalar_field(after_sync);
plt.title("After");
# %% [markdown]
# ## 3. Going (MPI) parallel - the parallel data handling
#
# ### 3.1. Conceptual overview
# To run MPI-parallel simulations, the waLBerla framework is used. waLBerla has to be compiled against your local MPI library and is thus a bit hard to install. We suggest using the version shipped with conda for testing. For production runs on a cluster, the best option is to build waLBerla yourself against the MPI library installed on your cluster.
#
# Now let's have a look at how to write code that runs MPI-parallel.
# Since the data is distributed, we no longer have access to the full array, but only to the part that is stored locally. The domain is split into so-called blocks, where one process might get one (or sometimes multiple) blocks. To do anything with the local part of the data, we iterate over the **local** blocks to get their contents as numpy arrays. The blocks returned in the loop differ from process to process.
#
# Copy the following snippet to a Python file and run it with multiple processes, e.g.
# ``mpirun -np 4 python myscript.py``; you will see that there are as many blocks as processes.
# %%
dh = ps.create_data_handling(domain_size=(30, 30), parallel=True)
field = dh.add_array('field')
for block in dh.iterate():
    # offset is in global coordinates, where the first inner cell has coordinate (0,0)
# and ghost layers have negative coordinates
print(block.offset, block['field'].shape)
# use offset to create a local array 'my_data' for the part of the domain
#np.copyto(block[field.name], my_data)
# %% [markdown]
# To get some more interesting results here in the notebook, we put multiple blocks onto our single notebook process. This does not make much sense for real simulations, but it is useful for testing and demonstration purposes.
# %%
from waLBerla import createUniformBlockGrid
from pystencils.datahandling import ParallelDataHandling
blocks = createUniformBlockGrid(blocks=(2,1,2), cellsPerBlock=(20, 10, 20),
oneBlockPerProcess=False, periodic=(1, 0, 0))
dh = ParallelDataHandling(blocks)
field = dh.add_array('field')
for block in dh.iterate():
print(block.offset, block['field'].shape)
# %% [markdown]
# Now we see that we have four blocks with (20, 10, 20) cells each, and that the global domain has size (40, 10, 40).
# All subblocks also have a ghost layer around them, which is used to synchronize with their neighboring blocks (over the network). For ghost layer synchronization, the same `synchronization_function` is used that we used above for periodic boundaries, because copying between blocks and copying the ghost layer for periodicity use the same mechanism.
# %%
dh.gather_array('field').shape
# %% [markdown]
# ### 3.2. Parallel example
#
# To illustrate this in more detail, let's run a simple kernel on a parallel domain. waLBerla can only handle 3D domains, so we choose a z-size of 1.
# %%
blocks = createUniformBlockGrid(blocks=(2,4,1), cellsPerBlock=(20, 10, 1),
oneBlockPerProcess=False, periodic=(1, 1, 0))
dh = ParallelDataHandling(blocks, dim=2)
src_field = dh.add_array('src')
dst_field = dh.add_array('dst')
update_rule = [ps.Assignment(lhs=dst_field[0, 0],
rhs=(src_field[1, 0] + src_field[-1, 0] +
src_field[0, 1] + src_field[0, -1]) / 4)]
# 3. compile update rule to function
kernel_function = ps.create_kernel(update_rule).compile()
# %% [markdown]
# Now let's initialize the arrays. To do this, we can get arrays (meshgrids) with the coordinates of the local cells from each block. We use these to initialize a circular shape.
# %%
dh.fill('src', 0.0)
for block in dh.iterate(ghost_layers=False, inner_ghost_layers=False):
x, y = block.midpoint_arrays
    inside_sphere = (x - 20)**2 + (y - 25)**2 < 8**2
block['src'][inside_sphere] = 1.0
plt.scalar_field( dh.gather_array('src') );
# %% [markdown]
# Now we can run our compute kernel on this data as usual. We just have to make sure that the field is synchronized after every step, i.e. that the ghost layers are correctly updated.
# %%
sync = dh.synchronization_function(['src'])
for i in range(40):
sync()
dh.run_kernel(kernel_function)
dh.swap('src', 'dst')
# %%
plt.scalar_field( dh.gather_array('src') );
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.7
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# ---
# %%
from pystencils.session import *
# %% [markdown]
# # Tutorial 04: Advection Diffusion - Simple finite differences discretization
#
# In this tutorial we demonstrate how to use the discretization layer on top of *pystencils*, which defines how continuous differential operators are discretized. The result of this discretization layer are stencil equations, which are then used to generate C or CUDA code.
#
# We are going to discretize the [advection diffusion equation](https://en.wikipedia.org/wiki/Convection%E2%80%93diffusion_equation) without reaction terms:
# %%
domain_size = (200, 80)
dim = len(domain_size)
# create arrays
c_arr = np.zeros(domain_size)
v_arr = np.zeros(domain_size + (dim,))
# create fields
c, v, c_next = ps.fields("c, v(2), c_next: [2d]", c=c_arr, v=v_arr, c_next=c_arr)
# write down advection diffusion pde
# the equation is represented by a single term and an implicit "=0" is assumed.
adv_diff_pde = ps.fd.transient(c) - ps.fd.diffusion(c, sp.Symbol("D")) + ps.fd.advection(c, v)
adv_diff_pde
# %% [markdown]
# It describes how the concentration $c$ of a passive substance, which does not influence the flow field, behaves in a fluid with given velocity $v$.
# To illustrate the two effects, imagine we release a constant stream of dye at some point in a river. The dye is transported with the flow (advection), but its trace also gets wider and wider perpendicular to the flow direction due to diffusion.
# %%
discretize = ps.fd.Discretization2ndOrder(1, 0.01)
discretization = discretize(adv_diff_pde)
discretization.subs(sp.Symbol("D"),1)
# %%
ast = ps.create_kernel([ps.Assignment(c_next.center(), discretization.subs(sp.Symbol("D"), 1))])
kernel = ast.compile()
# %%
y = np.linspace(0, 1, v_arr.shape[1])
v_arr[:, :, 0] = -y * (y - 1.0) * 5
plt.vector_field(v_arr);
# %%
def boundary_handling(c):
    # No concentration at the upper and lower walls and at the left inflow border
c[:, 0] = 0
c[:, -1] = 0
c[0, :] = 0
    # At the outflow border: Neumann boundary by copying the last valid layer
c[-1, :] = c[-2, :]
# Some source inside the domain
c[10: 15, 25:30] = 1.0
c[20: 25, 60:65] = 1.0
c_tmp_arr = np.empty_like(c_arr)
def timeloop(steps=100):
global c_arr, c_tmp_arr
for i in range(steps):
boundary_handling(c_arr)
kernel(c=c_arr, c_next=c_tmp_arr, v=v_arr)
c_arr, c_tmp_arr = c_tmp_arr, c_arr
return c_arr
# %%
if 'is_test_run' in globals():
timeloop(10)
result = None
else:
ani = ps.plot.scalar_field_animation(timeloop, rescale=True, frames=300)
result = ps.jupyter.display_as_html_video(ani)
result
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.7
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---
# %%
from pystencils.session import *
# %% [markdown]
# # Tutorial 05: Phase-field simulation of spinodal decomposition
#
# In this series of demos, we show how to implement simple phase field models using finite differences.
# We implement examples from the book **Programming Phase-Field Modelling** by S. Bulent Biner.
# Specifically, the model for spinodal decomposition implemented in this notebook can be found in Section 4.4 of the book.
#
# First we create a DataHandling instance that manages the numpy arrays and their corresponding symbolic *sympy* fields. We create two arrays, one for the concentration $c$ and one for the chemical potential $\mu$, on a 2D periodic domain.
# %%
dh = ps.create_data_handling(domain_size=(256, 256), periodicity=True)
μ_field = dh.add_array('mu', latex_name='μ')
c_field = dh.add_array('c')
# %% [markdown]
# In the next cell we build up the free energy density, consisting of a bulk and an interface component.
# The bulk free energy is minimal in regions where only either phase 0 or phase 1 is present; areas of mixture are penalized. The interfacial free energy penalizes regions where the gradient of the phase field is large, i.e. it tends to smear out the interface. The strength of these counteracting contributions is balanced by the parameters $A$ for the bulk part and $\kappa$ for the interface part. The ratio of these parameters determines the interface width.
# %%
κ, A = sp.symbols("κ A")
c = c_field.center
μ = μ_field.center
def f(c):
return A * c**2 * (1-c)**2
bulk_free_energy_density = f(c)
grad_sq = sum(ps.fd.diff(c, i)**2 for i in range(dh.dim))
interfacial_free_energy_density = κ/2 * grad_sq
free_energy_density = bulk_free_energy_density + interfacial_free_energy_density
free_energy_density
# %% [markdown]
# In case you wonder what the index $C$ of the concentration means: it indicates that the concentration is a field (array), and the index $C$ indicates that we use the center value of the field when iterating over it. This becomes important when we apply a finite difference discretization to the equation.
#
# The bulk free energy $c^2 (1-c)^2$ is just the simplest polynomial with minima at $c=0$ and $c=1$.
# %%
plt.figure(figsize=(7,4))
plt.sympy_function(bulk_free_energy_density.subs(A, 1), (-0.2, 1.2))
plt.xlabel("c")
plt.title("Bulk free energy");
# %% [markdown]
# To minimize the total free energy we use the Cahn-Hilliard equation
#
# $$\partial_t c = \nabla \cdot \left( M \nabla \frac{\delta F}{\delta c} \right)$$
#
# where the functional derivative $\frac{\delta F}{\delta c}$ in this case is the chemical potential $\mu$.
# A functional derivative is computed as
# $\frac{\delta F}{\delta c} = \frac{\partial F}{\partial c} - \nabla \cdot \frac{\partial F}{\partial \nabla c}$.
# That means we treat $\nabla c$ like a normal variable when calculating derivatives.
#
# We don't have to worry about that in detail, since pystencils offers a function to do just that:
# %%
ps.fd.functional_derivative(free_energy_density, c)
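# %% [markdown]
# As a quick check of the result above, carrying out the functional derivative by hand for this free energy gives
#
# $$\frac{\delta F}{\delta c} = 2 A \, c \, (1-c)(1-2c) - \kappa \nabla^2 c$$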
# %% [markdown]
# For this simple free energy we could do the derivative by hand, but for more complex phase field models this step becomes quite tedious.
#
# If we discretize this term using finite differences, we obtain a computation rule for computing the chemical potential $\mu$ from the free energy.
# %%
discretize = ps.fd.Discretization2ndOrder(dx=1, dt=0.01)
μ_update_eq = ps.fd.functional_derivative(free_energy_density, c)
μ_update_eq = ps.fd.expand_diff_linear(μ_update_eq, constants=[κ]) # pull constant κ in front of the derivatives
μ_update_eq_discretized = discretize(μ_update_eq)
μ_update_eq_discretized
# %% [markdown]
# pystencils computed the finite difference approximation for us. This was only possible because all symbols occurring inside derivatives are pystencils field variables, so neighboring values can be accessed.
# Next we bake this formula into a kernel that writes the chemical potential to a field. For this, we first insert the $\kappa$ and $A$ parameters, build an assignment out of it, and compile the kernel:
# %%
μ_kernel = ps.create_kernel([ps.Assignment(μ_field.center,
μ_update_eq_discretized.subs(A, 1).subs(κ, 0.5))]
).compile()
# %% [markdown]
# Next, we formulate the Cahn-Hilliard equation itself, which is just a diffusion equation:
# %%
M = sp.Symbol("M")
cahn_hilliard = ps.fd.transient(c) - ps.fd.diffusion(μ, M)
cahn_hilliard
# %% [markdown]
# It can be passed right away to the `discretize` function, which by default uses a simple explicit Euler scheme for temporal and second-order finite differences for spatial discretization.
# It returns the update rule for the concentration field.
# %%
c_update = discretize(cahn_hilliard)
c_update
# %% [markdown]
# Again, we build a kernel from this update rule:
# %%
c_kernel = ps.create_kernel([ps.Assignment(c_field.center,
c_update.subs(M, 1))]
).compile()
# %% [markdown]
# Before we run the simulation, the domain has to be initialized. To access a numpy array inside a data handling, we have to iterate over the data handling. This somewhat complicated approach is necessary to be able to switch to distributed-memory parallel simulations without having to alter the code. Basically, this loop says "iterate over the portion of the domain that belongs to my process", which in our serial case here is just the full domain.
#
# As suggested in the book, we initialize everything with $c=0.4$ and add some random noise on top of it.
# %%
def init(value=0.4, noise=0.02):
for b in dh.iterate():
b['c'].fill(value)
np.add(b['c'], noise*np.random.rand(*b['c'].shape), out=b['c'])
# %% [markdown]
# The time loop of the simulation is now rather straightforward. We call the kernels to update the chemical potential and the concentration in an alternating fashion. In between, we have to do synchronization steps for the fields that take care of the periodic boundary condition and, in the parallel case, of the communication between processes.
# %%
def timeloop(steps=100):
c_sync = dh.synchronization_function(['c'])
μ_sync = dh.synchronization_function(['mu'])
for t in range(steps):
c_sync()
dh.run_kernel(μ_kernel)
μ_sync()
dh.run_kernel(c_kernel)
return dh.gather_array('c')
init()
# %% [markdown]
# Now we can run the simulation and see how the phases separate
# %%
if 'is_test_run' in globals():
timeloop(10)
result = None
else:
ani = ps.plot.scalar_field_animation(timeloop, rescale=True, frames=600)
result = ps.jupyter.display_as_html_video(ani)
result
# %% [markdown]
# Now there are a lot of ways to change and play with this model. Here are a few ideas:
#
# - try different initial conditions and/or parameters $\kappa, A$
# - the model can be generalized to 3D, by altering the DataHandling and the plot commands
# - modify the free energy formulation, make one minimum lower than the other, maybe even add another phase
# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext:
# custom_cell_magics: kql
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.11.2
# kernelspec:
# display_name: .venv
# language: python
# name: python3
# ---
# %%
from pystencils.session import *
import pystencils as ps
sp.init_printing()
frac = sp.Rational
# %% [markdown]
# # Tutorial 06: Phase-field simulation of dendritic solidification
#
# This is the second tutorial on phase field methods with pystencils. Make sure to read the previous tutorial first.
#
# In this tutorial we again implement a model described in **Programming Phase-Field Modelling** by S. Bulent Biner.
# This time we implement the model from chapter 4.7 that describes dendritic growth. So get ready for some beautiful snowflake pictures.
#
# We start again by adding all required array fields. This time we explicitly store the change of the phase variable φ in time, since the dynamics are calculated using an Allen-Cahn formulation where a term $\partial_t \phi$ occurs.
# %%
dh = ps.create_data_handling(domain_size=(300, 300), periodicity=True,
default_target=ps.Target.CPU)
φ_field = dh.add_array('phi', latex_name='φ')
φ_field_tmp = dh.add_array('phi_temp', latex_name='φ_temp')
φ_delta_field = dh.add_array('phidelta', latex_name='φ_D')
t_field = dh.add_array('T')
# %% [markdown]
# This model has a lot of parameters that are created here in a symbolic fashion.
# %%
ε, m, δ, j, θzero, α, γ, Teq, κ, τ = sp.symbols("ε m δ j θ_0 α γ T_eq κ τ")
εb = sp.Symbol("\\bar{\\epsilon}")
φ = φ_field.center
φ_tmp = φ_field_tmp.center
T = t_field.center
def f(φ, m):
return φ**4 / 4 - (frac(1, 2) - m/3) * φ**3 + (frac(1,4)-m/2)*φ**2
free_energy_density = ε**2 / 2 * (ps.fd.Diff(φ,0)**2 + ps.fd.Diff(φ,1)**2 ) + f(φ, m)
free_energy_density
# %% [markdown]
# The free energy is again composed of a bulk and interface part.
# %%
plt.figure(figsize=(7,4))
plt.sympy_function(f(φ, m=1), x_values=(-1.05, 1.5) )
plt.xlabel("φ")
plt.title("Bulk free energy");
# %% [markdown]
# Compared to the last tutorial, this bulk free energy also has two minima, but at different values.
#
# Another complexity of the model is its anisotropy. The gradient parameter $\epsilon$ depends on the interface normal.
# %%
def σ(θ):
return 1 + δ * sp.cos(j * (θ - θzero))
θ = sp.atan2(ps.fd.Diff(φ, 1), ps.fd.Diff(φ, 0))
ε_val = εb * σ(θ)
ε_val
# %%
def m_func(T):
return (α / sp.pi) * sp.atan(γ * (Teq - T))
# %% [markdown]
# However, we just insert these parameters into the free energy formulation before doing the functional derivative, to make the dependence of $\epsilon$ on $\nabla \phi$ explicit.
# %%
fe = free_energy_density.subs({
m: m_func(T),
ε: εb * σ(θ),
})
dF_dφ = ps.fd.functional_derivative(fe, φ)
dF_dφ = ps.fd.expand_diff_full(dF_dφ, functions=[φ])
dF_dφ
# %% [markdown]
# Then we insert all the numeric parameters and discretize:
# %%
discretize = ps.fd.Discretization2ndOrder(dx=0.03, dt=1e-5)
parameters = {
τ: 0.0003,
κ: 1.8,
εb: 0.01,
δ: 0.02,
γ: 10,
j: 6,
α: 0.9,
Teq: 1.0,
θzero: 0.2,
sp.pi: sp.pi.evalf()
}
parameters
# %%
dφ_dt = - dF_dφ / τ
φ_eqs = ps.simp.sympy_cse_on_assignment_list([ps.Assignment(φ_delta_field.center,
discretize(dφ_dt.subs(parameters)))])
φ_eqs.append(ps.Assignment(φ_tmp, discretize(ps.fd.transient(φ) - φ_delta_field.center)))
temperature_evolution = -ps.fd.transient(T) + ps.fd.diffusion(T, 1) + κ * φ_delta_field.center
temperature_eqs = [
ps.Assignment(T, discretize(temperature_evolution.subs(parameters)))
]
# %% [markdown]
# When creating the kernels, we pass the default target of the data handling as the target (which may be `Target.CPU` or `Target.GPU`). This makes it possible to switch to a GPU simulation just by changing the parameter of the data handling.
#
# The rest is similar to the previous tutorial.
# %%
φ_kernel = ps.create_kernel(φ_eqs, cpu_openmp=4, target=dh.default_target).compile()
temperature_kernel = ps.create_kernel(temperature_eqs, target=dh.default_target).compile()
# %%
def timeloop(steps=200):
φ_sync = dh.synchronization_function(['phi'])
temperature_sync = dh.synchronization_function(['T'])
dh.all_to_gpu() # this does nothing when running on CPU
for t in range(steps):
φ_sync()
dh.run_kernel(φ_kernel)
dh.swap('phi', 'phi_temp')
temperature_sync()
dh.run_kernel(temperature_kernel)
dh.all_to_cpu()
return dh.gather_array('phi')
def init(nucleus_size=np.sqrt(5)):
    for b in dh.iterate():
        x, y = b.cell_index_arrays
        # center the cell coordinates on the block
        x, y = x - b.shape[0] // 2, y - b.shape[1] // 2
        inside_nucleus = (x**2 + y**2) < nucleus_size**2
        b['phi'].fill(0)
        b['phi'][inside_nucleus] = 1.0
        b['T'].fill(0.0)
def plot():
plt.subplot(1,3,1)
plt.scalar_field(dh.gather_array('phi'))
plt.title("φ")
plt.colorbar()
plt.subplot(1,3,2)
plt.title("T")
plt.scalar_field(dh.gather_array('T'))
plt.colorbar()
plt.subplot(1,3,3)
plt.title("∂φ")
plt.scalar_field(dh.gather_array('phidelta'))
plt.colorbar()
# %%
ps.show_code(φ_kernel)
# %%
timeloop(10)
init()
plot()
print(dh)
# %%
result = None
if 'is_test_run' not in globals():
ani = ps.plot.scalar_field_animation(timeloop, rescale=True, frames=600)
result = ps.jupyter.display_as_html_video(ani)
result
# %%
assert np.isfinite(dh.max('phi'))
assert np.isfinite(dh.max('T'))
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.7
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# ---
# %% nbsphinx="hidden"
from pystencils.session import *
# %% [markdown]
# # Demo: Assignment collections and simplification
#
#
# ## Assignment collections
#
# The assignment collection class helps to formulate and simplify assignments for numerical kernels.
#
# An ``AssignmentCollection`` is an ordered collection of assignments, together with an optional ordered collection of subexpressions, that are required to evaluate the main assignments. There are various simplification rules available that operate on ``AssignmentCollection``s.
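# %% [markdown]
# As a minimal illustration of this structure (with made-up symbols), a collection with one subexpression could look like this:
# %%
x, y, tmp = sp.symbols("x y tmp")
example_ac = ps.AssignmentCollection(
    [ps.Assignment(y, 2 * tmp + x)],                  # main assignments
    subexpressions=[ps.Assignment(tmp, x**2 + 1)])    # evaluated before the main assignments
example_ac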
# %% [markdown]
# We start by defining some stencil update rule. Here we also use the *pystencils* ``Field``, note however that the assignment collection module works purely on the *sympy* level.
# %%
a,b,c = sp.symbols("a b c")
f = ps.fields("f(2) : [2D]")
a1 = ps.Assignment(f[0,0](1), (a**2 +b) * f[0,1] + \
(a**2 - c) * f[1,0] + \
(a**2 - 2*c) * f[-1,0] + \
(a**2) * f[0, -1])
a2 = ps.Assignment(f[0,0](0), (c**2 +b) * f[0,1] + \
(c**2 - c) * f[1,0] + \
(c**2 - 2*c) * f[-1,0] + \
(c**2 - a**2) * f[0, -1])
ac = ps.AssignmentCollection([a1, a2], subexpressions=[])
ac
# %% [markdown]
# *sympy* operations can be applied on an assignment collection: In this example we first expand the collection, then look for common subexpressions.
# %%
expand_all = ps.simp.apply_to_all_assignments(sp.expand)
expandedEc = expand_all(ac)
# %%
ac_cse = ps.simp.sympy_cse(expandedEc)
ac_cse
# %% [markdown]
# Symbols occurring in assignment collections are classified into 3 categories:
# - ``free_symbols``: symbols that occur in right-hand-sides but never on left-hand-sides
# - ``bound_symbols``: symbols that occur on left-hand-sides
# - ``defined_symbols``: symbols that occur on left-hand-sides of a main assignment
# %%
ac_cse.free_symbols
# %%
ac_cse.bound_symbols
# %%
ac_cse.defined_symbols
# %% [markdown]
# Assignment collections can be split up and merged together. For splitting, a list of symbols that occur on the left-hand side of the main assignments has to be passed. The returned assignment collection only contains these main assignments together with all necessary subexpressions.
# %%
ac_f0 = ac_cse.new_filtered([f(0)])
ac_f1 = ac_cse.new_filtered([f(1)])
ac_f1
# %% [markdown]
# Note here that $\xi_4$ is no longer part of the subexpressions, since it is not used in the main assignment of $f_C^1$.
#
# If we merge both collections together, we end up with the original collection.
# %%
ac_f0.new_merged(ac_f1)
# %% [markdown]
# There is also a method that inserts all subexpressions into the main assignments. This is the inverse operation of common subexpression elimination.
# %%
assert sp.simplify(ac_f0.new_without_subexpressions().main_assignments[0].rhs - a2.rhs) == 0
ac_f0.new_without_subexpressions()
# %% [markdown]
# To evaluate an assignment collection, use the ``lambdify`` method. It is very similar to *sympy*'s ``lambdify`` function.
# %%
evalFct = ac_cse.lambdify([f[0,1], f[1,0]], # new parameters of returned function
fixed_symbols={a:1, b:2, c:3, f[0,-1]: 4, f[-1,0]: 5}) # fix values of other symbols
evalFct(2,1)
# %% [markdown]
# ``lambdify`` is rather slow for evaluation. The intended way to evaluate an assignment collection is with *pystencils*, i.e. to create a fast kernel that applies the update at every site of a structured grid. The collection can be passed directly to the `create_kernel` function.
# %%
func = ps.create_kernel(ac_cse).compile()
# %% [markdown]
# ## Simplification Strategies
#
# In the above examples, we already applied simplification rules to assignment collections. Simplification rules are functions that take an assignment collection as their single argument and return a modified/simplified copy of it. The ``SimplificationStrategy`` class holds a list of simplification rules and can apply all of them in the specified order. Additionally, it provides useful printing and reporting functions.
#
# We start by creating a simplification strategy, consisting of the expand and CSE simplifications we have already applied above:
# %%
strategy = ps.simp.SimplificationStrategy()
strategy.add(ps.simp.apply_to_all_assignments(sp.expand))
strategy.add(ps.simp.sympy_cse)
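# %% [markdown]
# Since a simplification rule is just such a function, user-defined rules can be added in the same way. A small sketch (the rule below merely inlines all subexpressions again and only illustrates the interface; below we continue with the original ``strategy``):
# %%
def inline_subexpressions(collection):
    """Toy rule: insert all subexpressions back into the main assignments."""
    return collection.new_without_subexpressions()

custom_strategy = ps.simp.SimplificationStrategy()
custom_strategy.add(ps.simp.sympy_cse)
custom_strategy.add(inline_subexpressions)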
# %% [markdown]
# This strategy can be applied to any assignment collection:
# %%
strategy(ac)
# %% [markdown]
# The strategy can also print the simplification results at each stage.
# The report contains information about the number of operations after each simplification as well as the runtime of each simplification routine.
# %%
strategy.create_simplification_report(ac)
# %% [markdown]
# The strategy can also print the full collection after each simplification...
# %%
strategy.show_intermediate_results(ac)
# %% [markdown]
# ... or only specific assignments for better readability
# %%
strategy.show_intermediate_results(ac, symbols=[f(1)])
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.7
# kernelspec:
# display_name: Python 3
# language: python
# name: python3
# ---
# %%
from pystencils.session import *
import timeit
# %load_ext Cython
# %% [markdown]
# # Demo: Benchmark numpy, Cython, pystencils
#
# In this notebook we compare and benchmark different ways of implementing a simple stencil kernel in Python.
# Our simple example computes the average of the four neighbors in 2D and stores it in a second array. To prevent out-of-bounds accesses, we skip the cells at the border and compute values only in the range `[1:-1, 1:-1]`
# %% [markdown]
# ## Implementations
#
# The first implementation is a pure Python implementation:
# %%
def avg_pure_python(src, dst):
for x in range(1, src.shape[0] - 1):
for y in range(1, src.shape[1] - 1):
dst[x, y] = (src[x + 1, y] + src[x - 1, y] +
src[x, y + 1] + src[x, y - 1]) / 4
# %% [markdown]
# Obviously, this will be a rather slow version, since the loops are written directly in Python.
#
# Next, we use *numpy* functions to delegate the looping to numpy. The first version uses the `roll` function to shift the array by one element in each direction. This version has to allocate a new array for each accessed neighbor.
# %%
def avg_numpy_roll(src, dst):
neighbors = [np.roll(src, axis=a, shift=s) for a in (0, 1) for s in (-1, 1)]
np.divide(sum(neighbors), 4, out=dst)
# %% [markdown]
# Using views, we can get rid of the additional copies:
# %%
def avg_numpy_slice(src, dst):
    dst[1:-1, 1:-1] = (src[2:, 1:-1] + src[:-2, 1:-1] +
                       src[1:-1, 2:] + src[1:-1, :-2]) / 4
# %% [markdown]
# To further optimize the kernel we switch to Cython, to get a compiled C version.
# %% language="cython"
# import cython
#
# @cython.boundscheck(False)
# @cython.wraparound(False)
# def avg_cython(object[double, ndim=2] src, object[double, ndim=2] dst):
# cdef int xs, ys, x, y
# xs, ys = src.shape
# for x in range(1, xs - 1):
# for y in range(1, ys - 1):
# dst[x, y] = (src[x + 1, y] + src[x - 1, y] +
# src[x, y + 1] + src[x, y - 1]) / 4
# %% [markdown]
# If available, we also try the numba just-in-time compiler
# %%
try:
from numba import jit
    @jit(nopython=True)
    def avg_numba(src, dst):
        dst[1:-1, 1:-1] = (src[2:, 1:-1] + src[:-2, 1:-1] +
                           src[1:-1, 2:] + src[1:-1, :-2]) / 4
except ImportError:
pass
# %% [markdown]
# And finally we also create a *pystencils* version of the same stencil code:
# %%
src, dst = ps.fields("src, dst: [2D]")
update = ps.Assignment(dst[0,0],
(src[1, 0] + src[-1, 0] + src[0, 1] + src[0, -1]) / 4)
avg_pystencils = ps.create_kernel(update).compile()
# %%
all_implementations = {
'pure Python': avg_pure_python,
'numpy roll': avg_numpy_roll,
'numpy slice': avg_numpy_slice,
'pystencils': avg_pystencils,
}
if 'avg_cython' in globals():
all_implementations['Cython'] = avg_cython
if 'avg_numba' in globals():
all_implementations['numba'] = avg_numba
# %% [markdown]
# ## Benchmark functions
#
# We implement a short function to get in- and output arrays of a given shape and to measure the runtime.
# %%
def get_arrays(shape):
in_arr = np.random.rand(*shape)
out_arr = np.empty_like(in_arr)
return in_arr, out_arr
def do_benchmark(func, shape):
in_arr, out_arr = get_arrays(shape)
func(src=in_arr, dst=out_arr) # warmup
timer = timeit.Timer('f(src=src, dst=dst)', globals={'f': func, 'src': in_arr, 'dst': out_arr})
calls, time_taken = timer.autorange()
return time_taken / calls
# %% [markdown]
# ## Comparison
# %%
plot_order = ['pystencils', 'Cython', 'numba', 'numpy slice', 'numpy roll', 'pure Python']
plot_order = [p for p in plot_order if p in all_implementations]
def bar_plot(*shape):
names = plot_order
runtimes = tuple(do_benchmark(all_implementations[name], shape) for name in names)
speedups = tuple(runtime / min(runtimes) for runtime in runtimes)
y_pos = np.arange(len(names))
labels = tuple(f"{name} ({round(speedup, 1)} x)" for name, speedup in zip(names, speedups))
plt.text(0.5, 0.5, f"Size {shape}", horizontalalignment='center', fontsize=16,
verticalalignment='center', transform=plt.gca().transAxes)
plt.barh(y_pos, runtimes, log=True)
plt.yticks(y_pos, labels);
plt.xlabel('Runtime of single iteration')
plt.figure(figsize=(8, 8))
plt.subplot(3, 1, 1)
bar_plot(32, 32)
plt.subplot(3, 1, 2)
bar_plot(128, 128)
plt.subplot(3, 1, 3)
bar_plot(2048, 2048)
# %% [markdown]
# All runtimes are plotted logarithmically. Numbers next to the labels show how much slower the version is than the fastest one.