diff --git a/benchmarks/README.md b/benchmarks/README.md
index a5ae4406fe729c067cdb6c34a231bf5ec230f181..ef410bcddf874f41509f62caba7c41eb586fb7f6 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -1 +1,155 @@
 # LBMPY benchmarks
+
+This directory provides all scripts needed to reproduce the benchmark comparison
+between lbmpy and [lbmBench](https://github.com/RRZE-HPC/lbm-benchmark-kernels/).
+
+
+## Get started
+Clone this directory and make sure the following dependencies are installed:
+- [Intel Compiler 19](https://software.intel.com/en-us/parallel-studio-xe/choose-download)
+- Python 3 including (this branch of) [pystencils](https://i10git.cs.fau.de/pycodegen/pystencils/),
+  [lbmpy](https://i10git.cs.fau.de/pycodegen/lbmpy/),
+  [Kerncraft](https://github.com/RRZE-HPC/kerncraft/) and their dependencies.
+- [likwid](https://github.com/RRZE-HPC/likwid/) or any similar tool for fixing
+  the CPU frequency and pinning threads.
+- [Jupyter Notebook](https://jupyter.org/install) for creating plots from the
+  result data.
+
+Copy the `config.json` file in `config/` into your local config under `~/.config/pystencils/`
+to use ICC instead of GCC.
+
+If you want to reproduce all results, first delete all `results_*` files
+in this directory, since some of the run scripts append their results to the
+old data, which would distort the resulting graphs.
+If you only want to see the measured results or want to reproduce some of the
+results, keep the result files accordingly.
+
+**Always fix the CPU frequency of your system before running any benchmarks!**
+
+
+## Hardware settings
+The benchmarks can be run on any Skylake (SKL), Haswell (HSW) or Ivy Bridge (IVB) system.
+For comparison, the hardware used in this [report]() is stated below.
+
+| Architecture | Platform              | Fixed frequency |
+|--------------|-----------------------|-----------------|
+| SKL          | Intel Xeon Gold 6148  | 2.40 GHz        |
+| HSW          | Intel Xeon E5-2695 v3 | 2.30 GHz        |
+| IVB          | Intel Xeon E5-2660 v2 | 2.20 GHz        |
+
+## Usage
+All results from this [report]() can be reproduced by running the corresponding
+scripts, and plots can be created with the `plot_results.ipynb` Jupyter notebook.
+
+In general, the workflow consists of **four steps**:
+
+### 1. Get roofline by using the copy benchmark
+> Produces `results_copy.csv`
+
+Change into `copy/` and run
+```
+./run_copy.sh ARCH
+```
+For help, run this script with `--help`.
+
+### 2. Get lbmBench results as reference peak performance
+> Produces `results_lbmBench_[ARCH].csv`, `results_node.csv`, `results_dims.csv`
+
+Change into `lbmBench/` and run all three *create_plotdata_lbmBench...* scripts:
+- Run
+  ```
+  ./create_plotdata_lbmBench_single_core.sh ARCH
+  ```
+  to create the corresponding result file `results_lbmBench_ARCH.csv` for
+  all kernels on a single core.
+  For help, run this script with `--help`.
+- Run
+  ```
+  ./create_plotdata_lbmBench_node_scale.sh ARCH
+  ```
+  to create results for the fastest lbmBench kernel scaling from 1 to 40
+  threads for the corresponding architecture in `results_node.csv`.
+  For help, run this script with `--help`.
+- Run
+  ```
+  ./create_plotdata_lbmBench_dim_scaling.sh ARCH
+  ```
+  to create results for the fastest lbmBench kernel on a single core with a
+  varying problem size for the x dimension
+  (`[10, 25, 50, 100, 125, 150, 175, 200, 250, 300, 400, 500]`).
+  For help, run this script with `--help`.
+
+### 3. Get lbmpy results
+> Produces `results_lbmpy_[ARCH].csv`, `results_node.csv`, `results_dims.csv`
+
+Change into `lbmpy/` and run all three *create_plotdata_lbmpy...* scripts:
+- Run
+  ```
+  ./create_plotdata_lbmpy_single_core.sh ARCH
+  ```
+  to create the corresponding result file `results_lbmpy_ARCH.csv` for all
+  parameter combinations on a single core.
+  For help, run this script with `--help`.
+- Run
+  ```
+  ./create_plotdata_lbmpy_node_scale.sh ARCH
+  ```
+  to create results for the fastest lbmpy kernel scaling from 1 to 40
+  threads for the corresponding architecture in `results_node.csv`.
+  This script uses the `bench-omp-ARCH` binary.
+  If you want to compile the binary for the architecture `ARCH` yourself,
+  run the following commands before executing the main script:
+  ```bash
+  ./lbmpy_bench.py ARCH -f --openmp -o dummy-out
+  mv bench bench-omp-ARCH
+  rm dummy-out
+  ```
+  For help, run this script or `lbmpy_bench.py` with `--help`.
+- Run
+  ```
+  ./create_plotdata_lbmpy_dim_scaling.sh ARCH
+  ```
+  to create results for the fastest lbmpy kernel on a single core with a
+  varying problem size for the x dimension
+  (`[10, 25, 50, 100, 125, 150, 175, 200, 250, 300, 400, 500]`).
+  For help, run this script with `--help`.
+
+### 4. Plot results and visualize differences
+To create plots from the results, open `plot_results.ipynb` in
+Jupyter Notebook and execute the cells.
+
+**Export plots as SVG**
+The code cell of each plot contains a comment block:
+```python
+# For download as SVG
+# iplot(fig, image='svg', filename='YOUR_PLOT_NAME', image_width=1280)
+```
+To download the plot, uncomment the last line and adjust the filename or
+image size if needed.
+
+**Change bar graphs orientation**
+For each bar graph you can switch between horizontal and vertical mode.
+For all lbmBench kernels, this is done by changing the `lbm_orientation`
+variable in the following code cell:
+```python
+##################################################
+# **ADJUST HERE**
+# 'h' or 'v'
+lbm_orientation='v'
+
+##################################################
+```
+Similarly, for all lbmpy kernels the variable is set in the corresponding
+code cell:
+```python
+##################################################
+# **ADJUST HERE**
+# 'h' or 'v'
+lbmpy_orientation='v'
+
+##################################################
+```
+
+In addition to the plots, all lbmpy results are shown in a queryable
+dataframe, along with the average speedup gained from the `split` and `nontemporal`
+options and the parameters of the fastest kernel.
\ No newline at end of file
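The clean-up step from "Get started" (removing stale `results_*` files before a full reproduction) could be sketched in Python as follows. This is a minimal sketch and not part of the repository: `remove_stale_results` and its parameters are hypothetical names, and it assumes the result files sit directly in the directory you pass in. By default it only lists the matches; pass `dry_run=False` to actually delete them.

```python
from pathlib import Path


def remove_stale_results(directory, pattern="results_*", dry_run=True):
    """Find (and optionally delete) old result files in `directory`.

    Hypothetical helper: the name and arguments are illustrative only.
    With dry_run=True nothing is deleted and the matches are just returned.
    """
    # Only regular files directly inside `directory` are considered.
    matches = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

For example, `remove_stale_results(".", dry_run=False)` run from this directory would delete every `results_*` file in it, so keep the default dry run if you only want to re-run some of the benchmarks.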