Add CUDA support
This MR adds CUDA support.
Remaining to-dos:
- Fix compilation problems and add required NVCC flags
- Add launch bound option
Edited by Markus Holzer
Activity
requested review from @hoenig
assigned to @holzer
added 1 commit
- a0a17570 - Working CUDA benchmark Version. CUDA needs '.cu' files, otherwise it doesn't work?
added 16 commits
- a0a17570...ae05d616 - 11 commits from branch master
- c51eee43 - Added CUDA benchmarks
- 96c63098 - Working CUDA benchmark Version. CUDA needs '.cu' files, otherwise it doesn't work?
- ac4b31c1 - Updated import to new pystencils api
- 39b4029c - Exposing the cuda block size option to the generate_benchmark function
- 0028ee62 - Merge branch 'CUDA' of i10git.cs.fau.de:pycodegen/pystencils-benchmark into CUDA
- Resolved by Jan Hönig
It means this:
__global__ void
__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
fooKernel(int *inArr, int *outArr)
{
    // ... Computation of kernel
}
This lets you add __launch_bounds__ as an optional argument. It was pretty important, for example on AMD GPUs, to limit register usage with LBM kernels. In most cases, you don't need the second argument. When using pystencils standalone, the launch bounds option can be added via cupy, but since that is not what we are doing here, we need to add it manually as a tuning parameter.
Edited by Markus Holzer
Would it make sense to also add ROCm support? Mostly this would just be a renaming, for example: #include <cuda_runtime.h> --> #include <hip/hip_runtime.h>. I'm not sure whether it is better to add this in a second MR or directly here.
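To make the "mostly just a renaming" idea concrete, here is a minimal Python sketch of how generated CUDA source could be rewritten for HIP. The helper name cuda_to_hip and the CUDA_TO_HIP table are hypothetical and not part of pystencils-benchmark; the table is a small illustrative subset of the runtime API, and a real port would likely need more entries plus build-system changes.

```python
import re

# Illustrative subset of CUDA -> HIP renames (assumption: not exhaustive).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}


def cuda_to_hip(source: str) -> str:
    """Hypothetical helper: rewrite known CUDA names in `source` to HIP names."""
    # Try longer names first so cudaMemcpyHostToDevice wins over cudaMemcpy.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    # Word boundaries keep unknown names such as cudaMemcpyAsync untouched.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)
```

Calling cuda_to_hip on a generated '.cu' source string would return the same text with the listed names replaced. Names that are not in the table are left alone, so gaps stay visible instead of being silently mangled.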
Edited by Markus Holzer
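For the launch-bounds to-do, one possible shape is to treat the bounds as an optional tuning parameter when the kernel declaration is generated. The Python sketch below is only an illustration: kernel_declaration and its parameters are hypothetical names, not the existing generate_benchmark interface, and the sketch only handles the one- and two-argument forms discussed above.

```python
from typing import Optional, Tuple


def kernel_declaration(name: str, params: str,
                       launch_bounds: Optional[Tuple[int, ...]] = None) -> str:
    """Hypothetical helper: build a __global__ kernel declaration string.

    launch_bounds is (max_threads_per_block,) or
    (max_threads_per_block, min_blocks_per_mp); None omits the qualifier.
    """
    if launch_bounds is None:
        qualifier = ""
    else:
        if not 1 <= len(launch_bounds) <= 2:
            raise ValueError("this sketch expects one or two launch-bound values")
        args = ", ".join(str(value) for value in launch_bounds)
        qualifier = f"__launch_bounds__({args}) "
    return f"__global__ void {qualifier}{name}({params})"
```

With launch_bounds=(256,), the result for the kernel from the comment above would be "__global__ void __launch_bounds__(256) fooKernel(int *inArr, int *outArr)". Leaving the parameter at None keeps the generated code unchanged, so existing benchmarks would not be affected.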