Commit 638649d2 authored by Jan Hönig's avatar Jan Hönig

pystencils-benchmark fixes

parent 9577deec
Branches ve
Pipeline #35862 failed
%% Cell type:markdown id:b79814e5-9a10-4b20-a886-c16db785bc48 tags:
# LLVM-VE Vector Intrinsics
- [Tutorial](https://sx-aurora-dev.github.io/ve-intrinsics-tutorial/)
- [List of intrinsics](https://sx-aurora-dev.github.io/velintrin.html)
- [Assembly manual](https://www.hpc.nec/documents/sdk/pdfs/VectorEngine-as-manual-v1.3.pdf)
## Vector Register
- `__vr`: 256 x 64 bit values
- 64 registers available (**beware of spilling**)
## Intrinsic Functions
- format: `_vel_<asm>_<suffix>`
  - `<asm>`: instruction mnemonic from the [assembly manual](https://www.hpc.nec/documents/sdk/pdfs/VectorEngine-as-manual-v1.3.pdf)
    - may carry one trailing letter that encodes the data type, e.g. `d` for double
  - `<suffix>`: list of return value and arguments
    - `v`: vector
    - `s`: scalar
    - `m` and `M`: mask for 256 and 512 elements
    - `l`: vector length
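
The naming scheme above can be read mechanically. The following is a minimal sketch of that reading; `decode_intrinsic` and `SUFFIX_LETTERS` are hypothetical helpers written for this notebook, not part of LLVM-VE or pystencils, and the sketch assumes the mnemonic and the suffix are separated by the last underscore.

```python
# Hypothetical helper: decodes names of the form _vel_<asm>_<suffix>.
SUFFIX_LETTERS = {
    "v": "vector",
    "s": "scalar",
    "m": "mask (256 elements)",
    "M": "mask (512 elements)",
    "l": "vector length",
}


def decode_intrinsic(name):
    """Split an intrinsic name into its mnemonic and described suffix letters."""
    prefix = "_vel_"
    if not name.startswith(prefix):
        raise ValueError(f"not a _vel_ intrinsic: {name!r}")
    # Assumption: the last underscore separates <asm> from <suffix>.
    asm, sep, suffix = name[len(prefix):].rpartition("_")
    if not sep or not asm or not suffix:
        raise ValueError(f"cannot split {name!r} into <asm> and <suffix>")
    unknown = [ch for ch in suffix if ch not in SUFFIX_LETTERS]
    if unknown:
        raise ValueError(f"unknown suffix letters {unknown} in {name!r}")
    return asm, [SUFFIX_LETTERS[ch] for ch in suffix]


asm, parts = decode_intrinsic("_vel_vld_vssl")
print(asm, parts)
```

The helper only describes the letters in order; whether the first letter is a return value or an argument depends on the instruction.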
## Vector Load/Store
- 64 bit: `_vel_vld_vssl`
- 32 bit: upper, lower or "packed"
- two `s` arguments: stride and base address
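
To make the role of the two scalar arguments concrete, here is a small pure-Python model of a 64-bit strided load. It is a sketch, not the hardware semantics: `vld_model` is a hypothetical name, memory is a plain `bytes` object, the base address is modelled as a byte offset into it, and the stride is assumed to be given in bytes.

```python
import struct

MAX_VL = 256  # a vector register holds 256 x 64 bit values


def vld_model(stride, base, vl, memory):
    """Model a 64-bit vector load: element i is read from byte offset base + i * stride.

    Assumptions: stride is in bytes, base is an offset into `memory`,
    and values are little-endian doubles.
    """
    if not 0 <= vl <= MAX_VL:
        raise ValueError(f"vector length must be in [0, {MAX_VL}], got {vl}")
    return [struct.unpack_from("<d", memory, base + i * stride)[0] for i in range(vl)]


# Eight doubles 0.0 .. 7.0 laid out contiguously.
memory = struct.pack("<8d", *(float(i) for i in range(8)))

contiguous = vld_model(8, 0, 4, memory)     # stride of one double
every_other = vld_model(16, 8, 3, memory)   # stride of two doubles, starting at element 1
print(contiguous, every_other)
```

With a stride of 8 bytes the load is contiguous for doubles; larger strides skip elements.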
## Vector Length and Pass Through Argument
- `l` defines how many elements are updated
- every instruction is also available with an additional `v` argument whose values are passed through to the non-updated elements (pass-through: `pt`)
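
The interplay of vector length and pass-through can be illustrated with a short pure-Python model. This is only a sketch of the behaviour described above: `vadd_model` is a hypothetical function, vectors are plain lists, and elements that are neither updated nor covered by a pass-through vector are marked `None` because this notebook does not specify their value.

```python
def vadd_model(x, y, vl, pt=None):
    """Model an elementwise add that updates only the first `vl` elements.

    Elements at index >= vl are copied from `pt` if it is given,
    otherwise they are left as None (unspecified in this model).
    """
    n = len(x)
    if len(y) != n or (pt is not None and len(pt) != n):
        raise ValueError("all vectors must have the same length")
    if not 0 <= vl <= n:
        raise ValueError("vector length out of range")
    result = list(pt) if pt is not None else [None] * n
    for i in range(vl):
        result[i] = x[i] + y[i]
    return result


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
with_pt = vadd_model(x, y, vl=2, pt=[0.0, 0.0, 0.0, 0.0])
without_pt = vadd_model(x, y, vl=2)
print(with_pt, without_pt)
```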
## NT-Stores
- `nc` variants of memory access intrinsics (e.g. `_vel_vldnc_vssl`)
## Vector Mask
- `__vm256`: 256 bit
- 8 registers
- 0 bit -> no update
- instructions with `m` suffix
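
The "0 bit -> no update" rule can be modelled the same way. In this sketch `masked_update` is a hypothetical helper and the mask is a list of 0/1 flags, one per element; how mask bits map to element indices in a real `__vm256` register is not modelled here.

```python
def masked_update(dest, new, mask):
    """Return dest with element i replaced by new[i] wherever mask[i] is set."""
    if not (len(dest) == len(new) == len(mask)):
        raise ValueError("dest, new and mask must have the same length")
    # A 0 bit keeps the old destination value, a 1 bit takes the new value.
    return [n if bit else d for d, n, bit in zip(dest, new, mask)]


dest = [0.0, 0.0, 0.0, 0.0]
new = [1.0, 2.0, 3.0, 4.0]
mask = [1, 0, 0, 1]
updated = masked_update(dest, new, mask)
print(updated)
```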
## Packed Instructions
- operations on 512 elements of `fp32` or `int32`.
- `p` prefix (e.g. `_vel_pvfadd_vvvl`)
- TBD...
%% Cell type:code id:5d56a49c-5963-4f86-a8d8-0b753a971e52 tags:
``` python
from pystencils.session import *
import pystencils as ps
```
%% Cell type:code id:9c204a47-a39b-450b-9026-9bb63cf54e83 tags:
``` python
config = ps.CreateKernelConfig(cpu_vectorize_info={'instruction_set': 've'})
```
%% Cell type:code id:5bbc4118-e1b7-412b-9dc8-7d7d524786df tags:
``` python
a, b, c = ps.fields(a=np.ones(4000000), b=np.ones(4000000), c=np.ones(4000000))
alpha = sp.symbols('alpha')
```
%% Cell type:code id:85641913-f00a-4f95-af69-c7646ad5b3d0 tags:
``` python
@ps.kernel_config(config)
def vadd():
    a[0] @= b[0] + c[0]
```
%% Cell type:code id:33097862-9f60-4624-a1f2-0fdfd82831bc tags:
``` python
kernel_vadd = ps.create_kernel(**vadd)
ps.show_code(kernel_vadd)
```
%% Output
%% Cell type:code id:99aee06b-1704-4d06-b261-cc07a6a7f9a3 tags:
``` python
@ps.kernel_config(config)
def daxpy():
    b[0] @= alpha * a[0] + b[0]
```
%% Cell type:code id:e79ab7bb-128e-47ee-9d53-8779c4f55d7e tags:
``` python
kernel_daxpy = ps.create_kernel(**daxpy)
ps.show_code(kernel_daxpy)
```
%% Output
%% Cell type:code id:4bb14872-11de-42cc-b5fc-4f995c7a6725 tags:
``` python
@ps.kernel_config(config)
def daxpy_one_off():
    b[0] @= alpha * a[0] + b[0]
```
%% Cell type:code id:457cc1fe-1aef-44bd-9fd1-f914e019c933 tags:
``` python
kernel_daxpy_one_off = ps.create_kernel(**daxpy_one_off)
ps.show_code(kernel_daxpy_one_off)
```
%% Output
%% Cell type:code id:ae05beb9-745f-4885-ad4f-667aae867f05 tags:
``` python
from pystencils_benchmark import kernel_header, kernel_source, generate_benchmark
from pathlib import Path
```
%% Cell type:code id:7dfa86e1-8962-47df-900c-3133883a19c7 tags:
``` python
example_path = Path.cwd() / 'example'
```
%% Cell type:code id:dd39502f-d935-4b88-b8c8-896b1a294bf9 tags:
``` python
generate_benchmark([kernel_vadd, kernel_daxpy, kernel_daxpy_one_off], example_path)
```
%% Cell type:code id:f084548f-3a8e-4d45-8c7e-5845013b222e tags:
``` python
# Examples:
# generate_benchmark(kernel_daxpy, example_path)
generate_benchmark(kernel_daxpy, example_path)
# generate_benchmark(kernel_vadd, example_path)
# generate_benchmark(kernel_daxpy_one_off, example_path)
```
%% Output
---------------------------------------------------------------------------
TemplateNotFound Traceback (most recent call last)
/tmp/ipykernel_28036/2725825833.py in <module>
1 # Examples:
----> 2 generate_benchmark(kernel_daxpy, example_path)
3 # generate_benchmark(kernel_vadd, example_path)
4 # generate_benchmark(kernel_daxpy_one_off, example_path)
5
~/git/pystencils-benchmark/pystencils_benchmark/benchmark.py in generate_benchmark(kernel_ast, path, dialect)
36
37 with open(src_path / 'main.c', 'w+') as f:
---> 38 f.write(kernel_main(kernel_ast))
39
40
~/git/pystencils-benchmark/pystencils_benchmark/benchmark.py in kernel_main(kernel, timing)
99
100 env = Environment(loader=PackageLoader('pystencils_benchmark'), undefined=StrictUndefined)
--> 101 main = env.get_template('main.c').render(**jinja_context)
102 return main
103
~/git/pystencils/venv/lib/python3.9/site-packages/jinja2/environment.py in get_template(self, name, parent, globals)
995 name = self.join_path(name, parent)
996
--> 997 return self._load_template(name, globals)
998
999 @internalcode
~/git/pystencils/venv/lib/python3.9/site-packages/jinja2/environment.py in _load_template(self, name, globals)
956 return template
957
--> 958 template = self.loader.load(self, name, self.make_globals(globals))
959
960 if self.cache is not None:
~/git/pystencils/venv/lib/python3.9/site-packages/jinja2/loaders.py in load(self, environment, name, globals)
123 # first we try to get the source for this template together
124 # with the filename and the uptodate function.
--> 125 source, filename, uptodate = self.get_source(environment, name)
126
127 # try to load the code from the bytecode cache if there is a
~/git/pystencils/venv/lib/python3.9/site-packages/jinja2/loaders.py in get_source(self, environment, template)
325 # Package is a directory.
326 if not os.path.isfile(p):
--> 327 raise TemplateNotFound(template)
328
329 with open(p, "rb") as f:
TemplateNotFound: main.h
%% Cell type:code id:954b5ec7-a88a-4723-bd7f-227a3919b32d tags:
``` python
```
%% Cell type:code id:f51f2aff-f232-4de9-a0c4-94d3682eff86 tags:
``` python
```