Add CUDA support
This MR adds CUDA support.
Remaining to-dos:
- Fix compilation problems and add required NVCC flags
- Add launch bounds option
Merge request reports
Activity
requested review from @hoenig
assigned to @holzer
added 1 commit
- a0a17570 - Working CUDA benchmark Version. CUDA needs '.cu' files, otherwise it doesn't work?
added 16 commits
- a0a17570...ae05d616 - 11 commits from branch master
- c51eee43 - Added CUDA benchmarks
- 96c63098 - Working CUDA benchmark Version. CUDA needs '.cu' files, otherwise it doesn't work?
- ac4b31c1 - Updated import to new pystencils api
- 39b4029c - Exposing the cuda block size option to the generate_benchmark function
- 0028ee62 - Merge branch 'CUDA' of i10git.cs.fau.de:pycodegen/pystencils-benchmark into CUDA
- Resolved by Jan Hönig
- Resolved by Jan Hönig
- Resolved by Jan Hönig
It means this:

```cpp
__global__ void __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
fooKernel(int *inArr, int *outArr) {
    // ... Computation of kernel
}
```
So that you can add `__launch_bounds__` as an optional argument. It was pretty important, for example on AMD GPUs, to limit register usage with LBM kernels. In most cases you don't need the second argument. However, when using pystencils standalone, the launch bounds option can be added via `cupy`, but since this is not what we are doing here, we need to add it manually as a tuning parameter.

Edited by Markus Holzer
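For illustration, a minimal sketch of how such a tuning parameter could be passed through to the generated kernel. The `launch_bounds` keyword and its tuple form are assumptions made for this sketch, not the final API of `gpu.generate_benchmark`:

```python
from pathlib import Path

import pystencils as ps
import pystencils_benchmark as pb

# Build a simple kernel with pystencils for the GPU target.
src, dst = ps.fields("src, dst: double[1D]")
kernel = ps.create_kernel([ps.Assignment(dst.center, 2 * src.center)],
                          target=ps.Target.GPU)

# Hypothetical: the 'launch_bounds' keyword is an assumed parameter name;
# (256, 2) would end up as __launch_bounds__(256, 2) in the generated kernel.
pb.gpu.generate_benchmark([kernel], Path("benchmark_dir"), launch_bounds=(256, 2))
```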
Would it make sense to also add ROCm support? Mostly this would just be a renaming, for example `#include <cuda_runtime.h>` --> `#include <hip_runtime.h>`. I'm not sure if it is better to add this in a second MR or directly here?
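As a rough illustration of how small such a renaming pass could be, a hedged Python sketch of a post-processing step over the generated source; the helper name `cuda_source_to_hip` and the replacement table are hypothetical and not part of pystencils-benchmark:

```python
# Hypothetical post-processing step: translate generated CUDA source to HIP
# by plain renaming. The table is illustrative, not an exhaustive hipify mapping.
CUDA_TO_HIP = {
    "#include <cuda_runtime.h>": "#include <hip/hip_runtime.h>",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}


def cuda_source_to_hip(source: str) -> str:
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source
```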
Edited by Markus Holzer

added 1 commit
- 9140da63 - Added a parameter to insert a launch bounds to the kernel
added 1 commit
- 4ee400e9 - added the new packages to the setup.cfg and the new templates to the
added 1 commit
- 4b1f3f53 - fixed the _add_launch_bounds and also added some small tests
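For context, a hedged guess at what a helper like `_add_launch_bounds` and one of the small tests could look like; the signature and the string-based approach are assumptions, not the actual implementation in this MR:

```python
def _add_launch_bounds(kernel_code: str, launch_bounds: tuple) -> str:
    # Hypothetical version: the real helper may operate on the pystencils AST
    # rather than on the generated source string.
    arguments = ", ".join(str(bound) for bound in launch_bounds)
    replacement = f"__global__ void __launch_bounds__({arguments})"
    return kernel_code.replace("__global__ void", replacement, 1)


def test_add_launch_bounds():
    code = "__global__ void kernel(double *data) {}"
    assert "__launch_bounds__(256, 2)" in _add_launch_bounds(code, (256, 2))
```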
added 1 commit
- 6e88d389 - Using cuda as a base for the docker container to also test the gpu
added 1 commit
- 82ce1d7d - Fix the missing constants for the gpu main file and added a kernel with
added 1 commit
- 1e542f17 - Skipping compiling and running cuda kernels if cuda or gpu is not
```python
with tempfile.TemporaryDirectory(dir=Path.cwd()) as temp_dir:
    temp_dir = Path(temp_dir)
    pb.gpu.generate_benchmark([kernel_vadd, kernel_daxpy], temp_dir, compiler=compiler, **kwargs)
    if not nvcc_available():
        return
```

changed this line in version 12 of the diff
added 1 commit
- d38a9324 - using pytest skip if there is no nvcc or gpu available
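A minimal sketch of that skip pattern; `nvcc_available` mirrors the helper in the diff above, and implementing it via `shutil.which` is an assumption made for this example:

```python
import shutil

import pytest


def nvcc_available() -> bool:
    # Assumption for this sketch: a discoverable nvcc binary counts as "CUDA available".
    return shutil.which("nvcc") is not None


def test_gpu_benchmark():
    if not nvcc_available():
        pytest.skip("nvcc not available, skipping GPU benchmark compilation")
    # ... generate, compile and run the CUDA benchmark here ...
```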
added 1 commit
- 879ee872 - removed the unused `cuda_block_size` for the `gpu.generate_benchmark`
mentioned in commit fcfbef80