Commit 8af40ae4, authored 7 months ago by Frederik Hennig
Clarify some doc comments; clarify launch grid specification
Parent: fa7860cd
Merge request: !430 (Jupyter Inspection Framework, Book Theme, and Initial Drafts for Codegen Reference Guides)
Pipeline #70933 passed 7 months ago (stages: Code Quality, Unit Tests, legacy_test, docs)
Showing 3 changed files with 20 additions and 8 deletions:
- docs/source/reference/gpu_kernels.md (+13, -6)
- src/pystencils/config.py (+6, -1)
- src/pystencils/target.py (+1, -1)
docs/source/reference/gpu_kernels.md (+13, -6)
@@ -78,19 +78,26 @@ kfunc(f=f_arr, g=g_arr)
 ### Modifying the Launch Grid

 The `kernel.compile()` invocation in the above code produces a {any}`CupyKernelWrapper` callable object.
-Its interface allows us to customize the GPU launch grid.
-We can manually set both the number of threads per block, and the number of blocks on the grid:
+This object holds the kernel's launch grid configuration
+(i.e. the number of thread blocks, and the number of threads per block.)
+Pystencils specifies a default value for the block size and if possible,
+the number of blocks is automatically inferred in order to cover the entire iteration space.
+In addition, the wrapper's interface allows us to customize the GPU launch grid,
+by manually setting both the number of threads per block, and the number of blocks on the grid:

 ```{code-cell} ipython3
 kfunc.block_size = (16, 8, 8)
 kfunc.num_blocks = (1, 2, 2)
 ```

-In most cases, the number of blocks is automatically inferred from the block size
-in order to cover the entire iteration space, so it does not need to be specified.
-Setting a launch grid that is larger than the iteration space is also possible,
-but will cause any threads working outside of the iteration bounds to idle.
+For most kernels, setting only the `block_size` is sufficient since pystencils will
+automatically compute the number of blocks;
+for exceptions to this, see [](#manual_launch_grids).
+If `num_blocks` is set manually and the launch grid thus specified is too small, only
+a part of the iteration space will be traversed by the kernel;
+similarily, if it is too large, it will cause any threads working outside of the iteration bounds to idle.

+(manual_launch_grids)=
 ### Manual Launch Grids and Non-Cuboid Iteration Patterns

 In some cases, it will be unavoidable to set the launch grid size manually;
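To make the launch-grid wording in the revised documentation concrete, here is a minimal sketch of how a block count that covers an iteration space can be derived by ceiling division. It is not pystencils' implementation: `infer_num_blocks`, `covered_extent`, and the example shape are hypothetical names and values chosen for illustration, and the iteration shape is assumed to be given in the same dimension order as the block size.

```python
def infer_num_blocks(iteration_shape, block_size):
    """Fewest blocks per dimension whose threads cover the iteration space."""
    # Ceiling division without floats: -(-n // b) equals ceil(n / b) for positive b.
    return tuple(-(-n // b) for n, b in zip(iteration_shape, block_size))


def covered_extent(num_blocks, block_size):
    """Number of threads launched along each dimension."""
    return tuple(g * b for g, b in zip(num_blocks, block_size))


shape = (40, 20, 10)   # hypothetical iteration space
block = (16, 8, 8)     # block size used in the documentation snippet

grid = infer_num_blocks(shape, block)
extent = covered_extent(grid, block)
# Threads beyond the iteration bounds have no work to do and idle.
idle_per_dim = tuple(e - n for e, n in zip(extent, shape))
print(grid, extent, idle_per_dim)
```

By the same arithmetic, the manual grid from the snippet, `(1, 2, 2)` blocks of `(16, 8, 8)` threads, spans only `(16, 16, 16)` points, so any larger iteration space would be traversed only in part.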
src/pystencils/config.py (+6, -1)
@@ -33,7 +33,12 @@ class _AUTO_TYPE:
 AUTO = _AUTO_TYPE()
-"""Special value that can be passed to some options for invoking automatic behaviour."""
+"""Special value that can be passed to some options for invoking automatic behaviour.
+
+Currently, these options permit `AUTO`:
+
+- `ghost_layers <CreateKernelConfig.ghost_layers>`
+"""


 @dataclass
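The `AUTO` docstring describes a sentinel value that an option may accept in place of an explicit setting. The sketch below shows the general sentinel pattern only; `_AutoType`, `DemoConfig`, `resolve_ghost_layers`, and the choice of `AUTO` as the default are assumptions made for this example and do not reflect how pystencils resolves `ghost_layers`.

```python
from dataclasses import dataclass
from typing import Union


class _AutoType:
    """Hypothetical sentinel type; a single module-level instance is shared."""

    def __repr__(self):
        return "AUTO"


AUTO = _AutoType()


@dataclass
class DemoConfig:
    # Hypothetical option: an explicit count, or AUTO to request inference.
    ghost_layers: Union[int, _AutoType] = AUTO


def resolve_ghost_layers(config, inferred):
    """Return the explicit setting, or the inferred value if AUTO was passed."""
    # Compare by identity: the sentinel is a unique object, not a value.
    if config.ghost_layers is AUTO:
        return inferred
    return config.ghost_layers


print(resolve_ghost_layers(DemoConfig(), inferred=1))
print(resolve_ghost_layers(DemoConfig(ghost_layers=0), inferred=1))
```

A dedicated sentinel keeps "infer this for me" distinct from values such as `0` or `None`, which may carry their own meaning for an option.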
src/pystencils/target.py (+1, -1)
@@ -87,7 +87,7 @@ class Target(Flag):
     """

     GPU = CUDA
-    """Alias for backward compatibility."""
+    """Alias for `Target.CUDA`, for backward compatibility."""

     SYCL = _GPU | _SYCL
     """SYCL kernel target.
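The reworded docstring states that `GPU` is an alias for `Target.CUDA`. As a small illustration of what an alias means in Python's `enum.Flag`, the sketch below uses a hypothetical `DemoTarget` class; it is not the pystencils `Target` definition and omits its composite flag values.

```python
from enum import Flag, auto


class DemoTarget(Flag):
    """Hypothetical miniature; not the pystencils `Target` definition."""

    CUDA = auto()
    GPU = CUDA  # same value as CUDA, so the enum treats GPU as an alias
    SYCL = auto()


# The alias resolves to the very same member object as CUDA.
print(DemoTarget.GPU is DemoTarget.CUDA)
# The canonical name stays CUDA, and iteration skips the alias.
print(DemoTarget.GPU.name)
print([member.name for member in DemoTarget])
```

Because the alias is the same member object, code written against the old name keeps working while reporting the canonical name.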