Add CUDA support

Merged Markus Holzer requested to merge CUDA into master
1 unresolved thread

This MR adds CUDA support.

Remaining to-dos:

  • Fix compilation problems and add required NVCC flags
  • Add launch bound option
Edited by Markus Holzer

Activity

  • Jan Hönig approved this merge request

  • Christoph Alt added 2 commits

    • 3300460d - made the gpu test more streamlined with the cpu tests
    • 3e930f35 - removed some code duplication between benchmark and benchmark_gpu

  • Christoph Alt added 2 commits

    • 24f81cf6 - added submodules from cpu and gpu benchmark generation
    • 857f1848 - removed the mutable default argument from the _kernel_header and
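The mutable-default-argument pitfall that commit 857f1848 refers to can be sketched as follows. The real signature of `_kernel_header` is not shown in this thread, so `kernel_header_buggy` and `kernel_header_fixed` are hypothetical stand-ins that only illustrate the general Python pattern, not the project's code.

```python
def kernel_header_buggy(name, includes=[]):
    # Pitfall: the default list is created once, when the function is defined,
    # and is then shared by every call that omits `includes`.
    includes.append(f"{name}.h")
    return includes


def kernel_header_fixed(name, includes=None):
    # None acts as a sentinel; a fresh list is built on every call, and a
    # caller-supplied list is copied instead of being modified in place.
    includes = [] if includes is None else list(includes)
    includes.append(f"{name}.h")
    return includes


if __name__ == "__main__":
    print(kernel_header_fixed("vadd"))
    print(kernel_header_fixed("daxpy"))
```

With the fixed variant, each call returns only its own header, whereas repeated calls to the buggy variant keep growing the same shared list.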

  • Jan Hönig resolved all threads

  • The code looks really nice now and is easily extensible to other platforms and compilers. Is the second to-do in the MR description also done?

  • Jan Hönig marked this merge request as ready

  • Thank you, and thank you for your review :smile: I am not really sure what is meant there. In principle, it is possible to configure the cuda_block_size within the generate_benchmark call, but as I see now, this parameter is not used at all.

  • It means this:

    __global__ void
    __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
    fooKernel(int *inArr, int *outArr)
    {
        // ... Computation of kernel
    }

    So that you can add __launch_bounds__ as an optional argument. It was pretty important, for example on AMD GPUs, to limit register usage with LBM kernels.

    In most cases, you don't need the second argument. When pystencils is used standalone, it can add the launch bounds option via cupy, but since that is not what we are doing here, we need to add it manually as a tuning parameter.

    Edited by Markus Holzer
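As a rough illustration of "adding it manually as a tuning parameter", the generator could patch the qualifier into the kernel source text. This is a minimal sketch under assumptions: `add_launch_bounds` and its `launch_bounds` tuple are hypothetical names, not the helper that was eventually committed, and only the one- and two-argument forms discussed above are handled.

```python
import re
from typing import Optional, Tuple


def add_launch_bounds(kernel_source: str,
                      launch_bounds: Optional[Tuple[int, ...]] = None) -> str:
    """Insert a __launch_bounds__ qualifier after the first `__global__ void`.

    Hypothetical helper: `launch_bounds` is (max_threads_per_block,) or
    (max_threads_per_block, min_blocks_per_mp); None leaves the source as is.
    """
    if launch_bounds is None:
        return kernel_source  # tuning parameter not set
    if not 1 <= len(launch_bounds) <= 2:
        raise ValueError("expected one or two launch-bound values")
    bounds = ", ".join(str(int(value)) for value in launch_bounds)
    qualifier = f"__launch_bounds__({bounds})"
    patched, count = re.subn(r"__global__\s+void\b",
                             f"__global__ void {qualifier}",
                             kernel_source, count=1)
    if count == 0:
        raise ValueError("no `__global__ void` kernel signature found")
    return patched


source = "__global__ void fooKernel(int *inArr, int *outArr)\n{\n}\n"

if __name__ == "__main__":
    print(add_launch_bounds(source, (256, 2)))
```

Keeping the parameter optional means kernels generated without it are byte-for-byte unchanged, so the bounds can be explored per kernel as a tuning knob.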
  • Markus Holzer approved this merge request

  • Would it make sense to also add ROCm support? Mostly this would just be a renaming, for example: #include <cuda_runtime.h> --> #include <hip/hip_runtime.h>.

    I'm not sure whether it is better to add this in a second MR or directly here.

    Edited by Markus Holzer
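To give a feel for how mechanical such a renaming is, here is a minimal sketch of a text-level CUDA-to-HIP translation. The mapping is a small hand-picked subset chosen for illustration (note that HIP's runtime header lives at `hip/hip_runtime.h`), and `cuda_to_hip` is a hypothetical helper, not part of this MR; a real port would more likely rely on AMD's hipify tooling.

```python
import re

# Hand-picked subset of CUDA -> HIP renames (assumption: far from complete).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first, so cudaMemcpyHostToDevice is not matched as cudaMemcpy.
_PATTERN = re.compile(
    "|".join(re.escape(name)
             for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def cuda_to_hip(source: str) -> str:
    """Replace every known CUDA name in `source` with its HIP counterpart."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


cuda_source = (
    "#include <cuda_runtime.h>\n"
    "cudaMalloc(&d_a, n * sizeof(double));\n"
    "cudaMemcpy(d_a, a, n * sizeof(double), cudaMemcpyHostToDevice);\n"
)

if __name__ == "__main__":
    print(cuda_to_hip(cuda_source))
```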
  • I think it would be a bit cleaner to do that in another MR.

  • Alright, fine for me :smile:

  • Christoph Alt added 1 commit

    • 9140da63 - Added a parameter to insert a launch bounds to the kernel

  • Christoph Alt added 1 commit

    • 4ee400e9 - added the new packages to the setup.cfg and the new templates to the

  • Christoph Alt added 1 commit

    • 4b1f3f53 - fixed the _add_launch_bounds and also added some small tests

  • Christoph Alt added 1 commit

    • 6e88d389 - Using cuda as a base for the docker container to also test the gpu

  • Christoph Alt added 1 commit

    • 82ce1d7d - Fix the missing constants for the gpu main file and added a kernel with

  • Christoph Alt added 1 commit

    • 1e542f17 - Skipping compiling and running cuda kernels if cuda or gpu is not

  • with tempfile.TemporaryDirectory(dir=Path.cwd()) as temp_dir:
        temp_dir = Path(temp_dir)
        pb.gpu.generate_benchmark([kernel_vadd, kernel_daxpy], temp_dir, compiler=compiler, **kwargs)
        if not nvcc_available():
            return
  • Christoph Alt added 1 commit

    • d38a9324 - using pytest skip if there is no nvcc or gpu available
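The difference between returning early and skipping matters for the test report: an early `return` is counted as a pass, while a skip is reported as such. A minimal sketch of the skip approach follows, assuming `nvcc_available()` simply looks for `nvcc` on `PATH` (its real implementation is not shown in this thread). The standard library's `unittest.skipUnless` is used here to keep the example dependency-free; with pytest, the same condition could be passed to `pytest.mark.skipif(not nvcc_available(), reason=...)`.

```python
import shutil
import subprocess
import unittest


def nvcc_available() -> bool:
    """Assumed check: True if an `nvcc` executable is found on PATH."""
    return shutil.which("nvcc") is not None


class TestGpuToolchain(unittest.TestCase):
    # Reported as "skipped" rather than "passed" when nvcc is missing.
    @unittest.skipUnless(nvcc_available(), "nvcc not found on PATH")
    def test_nvcc_runs(self):
        # Only reached on machines with a CUDA toolchain installed.
        subprocess.run(["nvcc", "--version"], check=True, capture_output=True)


if __name__ == "__main__":
    unittest.main()
```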

  • Markus Holzer marked the checklist item Add launch bound option as completed

  • Christoph Alt added 1 commit

    • 879ee872 - removed the unused `cuda_block_size` for the `gpu.generate_benchmark`

  • Markus Holzer mentioned in commit fcfbef80
