Add CUDA support (!1) · Merge requests · pycodegen / pystencils-benchmark · GitLab

Merged Markus Holzer requested to merge CUDA into master 3 years ago

1 unresolved thread

This MR adds CUDA support.

Remaining to-dos:

Fix compilation problems and add required NVCC flags
Add launch bound option

Edited 1 year ago by Markus Holzer

Activity

Markus Holzer requested review from @hoenig 3 years ago

Markus Holzer assigned to @holzer 3 years ago

Jan Hönig marked the checklist item Fix compilation problems and add required NVCC flags as completed 3 years ago

Jan Hönig marked the checklist item Fix compilation problems and add required NVCC flags as incomplete 3 years ago

Jan Hönig added 1 commit 3 years ago

a0a17570 - Working CUDA benchmark Version. CUDA needs '.cu' files, otherwise it doesn't work?

Jan Hönig marked the checklist item Fix compilation problems and add required NVCC flags as completed 3 years ago

Christoph Alt added 16 commits 1 year ago

a0a17570...ae05d616 - 11 commits from branch master

c51eee43 - Added CUDA benchmarks

96c63098 - Working CUDA benchmark Version. CUDA needs '.cu' files, otherwise it doesn't work?

ac4b31c1 - Updated import to new pystencils api

39b4029c - Exposing the cuda block size option to the generate_benchmark function

0028ee62 - Merge branch 'CUDA' of i10git.cs.fau.de:pycodegen/pystencils-benchmark into CUDA

Jan Hönig @hoenig started a thread on an old version of the diff 1 year ago

Resolved 1 year ago by Jan Hönig
Last reply by Jan Hönig 1 year ago

Jan Hönig @hoenig started a thread on an old version of the diff 1 year ago

Resolved 1 year ago by Jan Hönig

Jan Hönig approved this merge request 1 year ago

Christoph Alt added 2 commits 1 year ago

3300460d - made the gpu test more streamlined with the cpu tests
3e930f35 - removed some code duplication between benchmark and benchmark_gpu

Jan Hönig @hoenig started a thread on an old version of the diff 1 year ago

Resolved 1 year ago by Jan Hönig

Christoph Alt added 2 commits 1 year ago

24f81cf6 - added submodules from cpu and gpu benchmark generation
857f1848 - removed the mutable default argument from the _kernel_header and
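For readers unfamiliar with the pitfall that commit 857f1848 refers to: a mutable default argument in Python is created once and shared between calls. The sketch below only illustrates the general problem and the usual fix; the function names and the `includes` parameter are hypothetical and are not taken from the project's `_kernel_header`.

```python
from typing import List, Optional


def header_with_mutable_default(includes: List[str] = []) -> List[str]:
    # The default list is created once at definition time,
    # so every call without an argument appends to the same object.
    includes.append("<cstdint>")
    return includes


def header_with_none_default(includes: Optional[List[str]] = None) -> List[str]:
    # Usual fix: default to None and build a fresh list on every call.
    # Copying also avoids mutating a list passed in by the caller.
    includes = list(includes) if includes is not None else []
    includes.append("<cstdint>")
    return includes
```

Calling the first variant twice without arguments returns a list that keeps growing, while the second variant returns a fresh one-element list each time.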

Jan Hönig resolved all threads 1 year ago

Jan Hönig @hoenig 1 year ago

Owner

The code looks really nice now and is easily extensible to other platforms/compilers. Is the second to-do in the MR's description also done?

Jan Hönig marked this merge request as ready 1 year ago

Christoph Alt @ob28imeq 1 year ago

Owner

Thank you, and thanks for the review. I am not really sure what is meant there. In principle, it is possible to configure the `cuda_block_size` in the `generate_benchmark` call, but as I see now, this parameter is not used at all.

Markus Holzer @holzer 1 year ago

Author Owner

It means this:

__global__ void
__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
fooKernel(int *inArr, int *outArr)
{
    // ... Computation of kernel
}

The idea is that __launch_bounds__ can be added through an optional argument. This was pretty important on AMD GPUs, for example, to limit register usage with LBM kernels.

In most cases, you don't need the second argument (MIN_BLOCKS_PER_MP). When pystencils is used standalone, it can add the launch bounds option via cupy. Since that is not what we are doing here, we need to add it manually as a tuning parameter.
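As a rough illustration of the "add it manually" route, the qualifier can be spliced into the generated kernel source as text. The sketch is in Python because the benchmark generator is. `add_launch_bounds_sketch` and its `launch_bounds` tuple are hypothetical names for illustration, not the function added in this MR, and the sketch assumes the kernel is declared as `__global__ void name(...)`.

```python
import re
from typing import Optional, Tuple


def add_launch_bounds_sketch(kernel_source: str,
                             launch_bounds: Optional[Tuple[int, ...]] = None) -> str:
    """Insert __launch_bounds__(...) into the first `__global__ void` declaration.

    Hypothetical helper: `launch_bounds` is (max_threads_per_block,) or
    (max_threads_per_block, min_blocks_per_mp).
    """
    if not launch_bounds:
        # Optional tuning parameter: leave the kernel untouched if not given.
        return kernel_source
    if len(launch_bounds) > 2:
        raise ValueError("expected (max_threads_per_block[, min_blocks_per_mp])")

    bounds = ", ".join(str(int(b)) for b in launch_bounds)
    # count=1: only the first kernel declaration in the string is annotated.
    annotated, count = re.subn(
        r"__global__\s+void\s+",
        f"__global__ void __launch_bounds__({bounds}) ",
        kernel_source,
        count=1,
    )
    if count == 0:
        raise ValueError("no `__global__ void` declaration found")
    return annotated
```

For example, `add_launch_bounds_sketch(src, (256,))` would set only the maximum threads per block, which matches the remark that the second argument is rarely needed.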

Edited 1 year ago by Markus Holzer

Markus Holzer approved this merge request 1 year ago

Markus Holzer @holzer 1 year ago

Author Owner

Would it make sense to also add ROCm support? Mostly this would just be a renaming, for example: #include <cuda_runtime.h> --> #include <hip/hip_runtime.h>.

I'm not sure whether it is better to add this in a second MR or directly here.
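To make the "just a renaming" idea concrete, here is a minimal sketch of a text-level CUDA-to-HIP rename applied to generated source. The mapping is deliberately tiny and non-exhaustive, and `CUDA_TO_HIP` and `cuda_to_hip_sketch` are hypothetical names, not part of this project. Note that the HIP runtime header is normally included as `hip/hip_runtime.h`.

```python
import re

# Small, non-exhaustive mapping for illustration only.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Word boundaries ensure that only whole identifiers from the mapping are
# rewritten, so e.g. cudaMallocManaged is left alone unless it is listed.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def cuda_to_hip_sketch(source: str) -> str:
    """Replace the listed CUDA names in `source` with their HIP counterparts."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)
```

A real port would need a much larger mapping and a different compiler invocation, which is one argument for handling it in a separate MR.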

Edited 1 year ago by Markus Holzer

Christoph Alt @ob28imeq 1 year ago

Owner

I think it would be a bit cleaner to do that in another MR.

Markus Holzer @holzer 1 year ago

Author Owner

Alright, fine with me.

Christoph Alt added 1 commit 1 year ago

9140da63 - Added a parameter to insert a launch bounds to the kernel

Christoph Alt added 1 commit 1 year ago

4ee400e9 - added the new packages to the setup.cfg and the new templates to the

Christoph Alt added 1 commit 1 year ago

4b1f3f53 - fixed the _add_launch_bounds and also added some small tests

Christoph Alt added 1 commit 1 year ago

6e88d389 - Using cuda as a base for the docker container to also test the gpu

Christoph Alt added 1 commit 1 year ago

82ce1d7d - Fix the missing constants for the gpu main file and added a kernel with

Christoph Alt added 1 commit 1 year ago

1e542f17 - Skipping compiling and running cuda kernels if cuda or gpu is not

Jan Hönig @hoenig started a thread on an old version of the diff 1 year ago

tests/test_benchmark.py

     with tempfile.TemporaryDirectory(dir=Path.cwd()) as temp_dir:
         temp_dir = Path(temp_dir)
         pb.gpu.generate_benchmark([kernel_vadd, kernel_daxpy], temp_dir, compiler=compiler, **kwargs)
         if not nvcc_available():
             return

Christoph Alt added 1 commit 1 year ago

d38a9324 - using pytest skip if there is no nvcc or gpu available
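The difference from the early `return` in the quoted test is that a skipped test is reported as skipped rather than as passed. A minimal sketch of that pattern follows. The implementation of `nvcc_available` is not shown in this thread, so the `shutil.which` lookup here is an assumption, the test name is hypothetical, and GPU detection is left out.

```python
import shutil


def nvcc_available() -> bool:
    """Return True if an `nvcc` executable is found on PATH (assumed check)."""
    return shutil.which("nvcc") is not None


def test_generate_gpu_benchmark_sketch():
    # pytest is imported lazily so the helper above stays usable without it.
    import pytest

    # Code generation that does not need nvcc could still run before this
    # point, so that part stays tested on machines without CUDA.
    if not nvcc_available():
        pytest.skip("nvcc not found on PATH, skipping CUDA compile and run")

    # ... compile and run the generated benchmark here ...
```

An equivalent alternative is a `@pytest.mark.skipif(not nvcc_available(), reason=...)` decorator, which skips the whole test including the generation step.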

Markus Holzer marked the checklist item Add launch bound option as completed 1 year ago

Christoph Alt added 1 commit 1 year ago

879ee872 - removed the unused `cuda_block_size` for the `gpu.generate_benchmark`

Markus Holzer merged 10 months ago

Markus Holzer mentioned in commit 10 months ago

mentioned in commit fcfbef80
