Commit 9748ab43 authored by Martin Bauer

Improvements for GPU code generation

- turned on restrict keyword by default (makes large difference on GPUs)
- smarter block indexing: changing block size depending on domain size
  Example: previously the block size became (1, 1, 1) when the requested
  block size was (64, 1, 1) and the domain size (1, 512, 512); now the
  block size is changed automatically to (1, 64, 1) in this case
- added __launch_bounds__ to kernels to allow better optimizations by
  the CUDA compiler
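The block-size adaptation described above can be sketched as a small greedy routine. This is not the code from the commit: `adapt_block_size` and `kernel_signature` are hypothetical names, and the "clip to the domain, then double the first dimensions that still have room" strategy is an assumption chosen so that it reproduces the (64, 1, 1) / (1, 512, 512) example from the message. The second helper illustrates the two CUDA qualifiers the commit mentions: `__restrict__` promises that pointers do not alias, and `__launch_bounds__(N)` promises the kernel is never launched with more than N threads per block, so the compiler can budget registers accordingly.

```python
from math import prod


def adapt_block_size(requested, domain):
    """Hypothetical sketch: fit a CUDA block size to the iteration domain.

    Not the actual implementation from this commit; it only reproduces the
    behaviour described for the (64, 1, 1) / (1, 512, 512) case.
    """
    total = prod(requested)  # threads per block the caller asked for
    # Never make a block wider than the domain in that dimension.
    block = [min(b, d) for b, d in zip(requested, domain)]
    # Hand threads lost by clipping to the first dimensions that still have
    # room. Doubling keeps the result <= the requested thread count, so a
    # non-power-of-two request may end up using fewer threads.
    for i, extent in enumerate(domain):
        while prod(block) * 2 <= total and block[i] * 2 <= extent:
            block[i] *= 2
    return tuple(block)


def kernel_signature(name, block):
    """Hypothetical: CUDA kernel header with __launch_bounds__ and __restrict__."""
    max_threads = prod(block)  # the kernel is never launched with more threads
    return (f"__global__ void __launch_bounds__({max_threads}) "
            f"{name}(double * __restrict__ src, double * __restrict__ dst)")


block = adapt_block_size((64, 1, 1), (1, 512, 512))
print(block)  # (1, 64, 1)
print(kernel_signature("kernel", block))
```

Filling dimensions in order is a deliberate choice here: a round-robin doubling over dimensions 1 and 2 would instead produce a squarer (1, 8, 8) block for the same input, which does not match the example in the commit message.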
parent 3d3c174f
@@ -10,7 +10,7 @@ def test_hash_equivalence():
     exactly the same code (not only functionally equivalent code) should be produced.
     Due to undefined order in sets and dicts this may no be the case.
     """
-    ref_value = "461f0ced7afa3d0499d5bd90d87fcdb0cfc6a5f56ee9fa4f13386c15b8484ca2"
+    ref_value = "5dfbb90b02e4940f05dcca11b43e1bb885d5655566735b52ad8c64f511848420"
     ast = create_lb_ast(stencil='D3Q19', method='srt', optimization={'openmp': False})
     code = generate_c(ast)
     hash_value = sha256(code.encode()).hexdigest()
...
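The test in this diff pins the exact generated source by hashing it with SHA-256, so it fails both when iteration order over sets or dicts leaks into the output and when the generator intentionally emits different code. Updating the reference value is consistent with the emitted code having changed in this commit. A minimal sketch of the same idea follows; `emit_kernel` and the `RESTRICT` spelling are hypothetical stand-ins, not the project's `generate_c`.

```python
from hashlib import sha256


def code_hash(code: str) -> str:
    """SHA-256 hex digest of generated source, mirroring test_hash_equivalence."""
    return sha256(code.encode()).hexdigest()


def emit_kernel(field_names, use_restrict=True):
    """Hypothetical stand-in for a code generator (not the real generate_c)."""
    qualifier = "RESTRICT " if use_restrict else ""
    # sorted() fixes the parameter order; iterating the set directly would
    # depend on string hashing, which is randomized between interpreter runs.
    params = ", ".join(f"double * {qualifier}{name}" for name in sorted(field_names))
    return f"void kernel({params});"


# Same fields given in a different order -> byte-identical code, same hash.
assert code_hash(emit_kernel({"src", "dst"})) == code_hash(emit_kernel({"dst", "src"}))

# Changing the emitted code (here: dropping the qualifier) changes the hash,
# so a stored reference value has to be updated after such a change.
print(code_hash(emit_kernel({"src", "dst"}, use_restrict=False)))
print(code_hash(emit_kernel({"src", "dst"})))
```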