diff --git a/documentation/README.md b/documentation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a703ca3c8c5d251fdaa9ceb1bc4f104f8f0e951c
--- /dev/null
+++ b/documentation/README.md
@@ -0,0 +1,74 @@
+## Prerequisites
+
+* HPC account at the RRZE
+* SSH key pair for authentication: public key registered with the RRZE, private key for later use
+
+## GitLab configuration
+
+### Setting up authentication
+
+To ensure that a given repository is entitled to run CI jobs as a given user, an authentication strategy using SSH keys is employed.
+
+Most CI configuration happens in the `.gitlab-ci.yml` file.
+However, to make the private SSH key available in the pipeline without exposing it in `.gitlab-ci.yml`, a "secret" CI variable needs to be set.
+To do so, navigate to `Settings > CI/CD > Variables` on your repository's GitLab page.
+Click `Add variable`, use "AUTH_KEY" as `Key` and your SSH private key as `Value`, and confirm the dialog.
+
+A second secret variable, your HPC account name, has to be added.
+Again click `Add variable`, this time supply "AUTH_USER" as `Key` and your HPC username as `Value`, and confirm the dialog.
+
+Together with the public key being deposited on the cluster, this will ensure proper authentication.
+
+### Customizing `.gitlab-ci.yml`
+
+In this file, we configure options for the SLURM submission on the test cluster.
+An example config can be found in `example.gitlab-ci.yml`.
+
+SLURM options can be set either globally in the `variables` section, or on a per-job basis.
+The latter will override global variables with the same name.
+
+```yaml
+variables:
+  SLURM_NODELIST: "phinally"
+  SLURM_TIMELIMIT: "30"
+
+...
+benchmark-broadep2:
+  variables:
+    SLURM_NODELIST: "broadep2" # uses broadep2 instead of phinally for this benchmark
+    SLURM_TIMELIMIT: "10"      # limit time to 10 instead of 30 minutes
+
+```
+
+This configuration already suffices to have the CI jobs running on the node `phinally`.
+
+To pick a node to run your job on, set `SLURM_NODELIST` to the node's hostname.
+`SLURM_NODELIST` can only hold a single entry, as using multiple nodes at once is not available on the test cluster.
+A list of available nodes with their descriptions can be found [here](https://hpc.fau.de/systems-services/systems-documentation-instructions/clusters/test-cluster/).
+
+A few restrictions apply to the SLURM options:
+* The SLURM partition (i.e. `SLURM_PARTITION`) is hardcoded to "work".
+* The number of nodes (`SLURM_NODES`) is hardcoded to 1. On the test cluster, only individual nodes can be used.
+* The time limit of a single job (`SLURM_TIMELIMIT`) is capped at 120 minutes, which is also the default time limit.
+* The default node (i.e. if no `SLURM_NODELIST` is given) is "phinally".
+* To optionally enable LIKWID, `SLURM_CONSTRAINT: "hwperf"` has to be added as a variable.
+
+In fact, almost all other options for the [`salloc` command](https://slurm.schedmd.com/salloc.html) used to submit your job can be customized.
+To do so, pick the argument name to modify, e.g. `--mail-user`, remove the leading dashes, uppercase it, replace the remaining dashes with underscores, and prepend `SLURM_`, leading to e.g. `SLURM_MAIL_USER`.
+This string is then used as the variable name, while the variable value can be set as desired.
+```yaml
+variables:
+  SLURM_MAIL_USER: your@email.address
+...
+```
+
+To disable submission of an individual job, add `NO_SLURM_SUMIT: 1` to its variables.
+
+## Notes
+
+A directory named `gitlab-runner` will be created in your `$WORK` directory.
+It contains the build and execution directories and files for your CI jobs.
+
+It may happen that your CI job fails if the node is occupied with other jobs for more than 24 hours.
+In that case, simply restart the CI job.
+