
Usage on High-Performance Computing (HPC) setups

The Donders provides access to a high-performance computing (HPC) cluster. HPC clusters are systems of interconnected computers (nodes) that work together to solve complex computational problems. They enable parallel processing, so that large-scale simulations, data analysis, and other scientific computing tasks complete more efficiently. Each node typically consists of multiple processors (CPUs/GPUs), memory, and storage.

PRESTUS is designed for HPC deployment: the most efficient way to run simulations is to start multiple simulations (e.g., for different individuals, target locations, or parameter setups) as parallel jobs. PBS (Portable Batch System) and SLURM (Simple Linux Utility for Resource Management) are tools that manage job scheduling, resource allocation, and execution in HPC systems. Both ensure efficient use of cluster resources across multiple users and workloads. The Donders operates both the PBS (historical) and SLURM (more recent) schedulers. Current resource usage can be viewed at https://grafana.dccn.nl/ (intranet required).

The following sections describe workflows at the Donders to start (interactive) jobs.

For more extensive documentation, see the HPC wiki (intranet required).

Donders HPC — PBS workflow

Select an access node: mentat001, mentat002, mentat003, mentat004
Login: ssh abcxyz@mentat004.dccn.nl
Start VNC manager: vncmanager - 2
Connect via VNC (see here) — TigerVNC: enter ip [e.g., mentat004.dccn.nl:12]
Load QSUB: module load qsub

Start interactive job:

qsub -I -l 'nodes=1:gpus=1,feature=cuda,walltime=05:00:00,mem=24gb,reqattr=cudacap>=5.0'

Start MATLAB:
```
module load matlab/R2022b
matlab
```
Note: PBS only supports CUDA 11.2, support for which is dropped starting in MATLAB R2023; see this issue.
Use PRESTUS scripts ending in *_qsub*.
Check job status in terminal: qstat
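
The resource string passed to `qsub` above bundles several requests, of which walltime and memory are the ones most likely to need adjusting. As a minimal sketch, the hypothetical helper `build_qsub_cmd` below (not part of PRESTUS or of the cluster tooling) only prints the command for a given walltime and memory request, so it can be inspected before anything is submitted. The defaults are the values from the workflow above.

```shell
# Hypothetical helper: print the interactive qsub command for a given
# walltime and memory request, without submitting anything.
build_qsub_cmd() {
  walltime="${1:-05:00:00}"   # default taken from the workflow above
  mem="${2:-24gb}"            # default taken from the workflow above
  printf "qsub -I -l 'nodes=1:gpus=1,feature=cuda,walltime=%s,mem=%s,reqattr=cudacap>=5.0'\n" \
    "$walltime" "$mem"
}

build_qsub_cmd                  # prints the command with the default requests
build_qsub_cmd 02:00:00 16gb    # shorter job with less memory
```

Copying the printed line into the terminal then starts the interactive job as in the workflow.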

Donders HPC — SLURM workflow

Select an access node: mentat005, mentat006, mentat007 (previously mentat001s)
Login: ssh abcxyz@mentat007.dccn.nl
Start VNC manager: vncmanager - 2
Connect via VNC (see here) — TigerVNC: enter ip [e.g., mentat007.dccn.nl:12]
Load SLURM: module load slurm
Start interactive job:
- Without MATLAB GUI: srun --mem=8gb --time=01:00:00 --x11 -p interactive --pty bash -i
- With MATLAB GUI: srun --partition=gpu --gres=gpu:1 --mem=8G --time=01:00:00 --x11 --pty /bin/bash -i
Start MATLAB:
```
module load matlab/R2024a
matlab
```
Note: SLURM supports CUDA 12.2 — recent MATLAB versions (up to R2024) should be supported; see this issue.
Use PRESTUS scripts ending in *_slurm*.
Check job status: squeue. For PBS→SLURM command migration see this documentation.
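
Before starting MATLAB, it can help to confirm that the shell is running inside the interactive job rather than on the access node. The following is a minimal sketch, assuming the scheduler exports `SLURM_JOB_ID` inside jobs (standard SLURM behaviour); `describe_session` is a hypothetical helper name, not part of PRESTUS.

```shell
# Hypothetical helper: report whether this shell runs inside a SLURM job.
# Relies on SLURM_JOB_ID, which SLURM sets in the environment of each job.
describe_session() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "inside SLURM job ${SLURM_JOB_ID}"
  else
    echo "not inside a SLURM job (probably an access node)"
  fi
}

describe_session
```

If the second message appears after `srun`, the interactive job has most likely not started yet or has already ended.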

GPU support

TUS simulations are accelerated by GPUs, but requesting GPUs can lead to longer wait times because the current limit is 4 concurrent GPUs per user. To reduce wait times, it is possible to run acoustic simulations first (to confirm targeting, because these require less RAM) and then run thermal simulations with the final protocol. Avoid blanket simulations (e.g., cycling through all participants with all permutations), especially for thermal simulations.

Given that PRESTUS is a MATLAB toolbox, it currently only supports Nvidia GPUs. When Nvidia GPUs are digitally partitioned, there appears to be an issue with identifying the assigned GPU in MATLAB R2024+. For SLURM jobs, MATLAB R2023b is currently deployed by default.

The following settings can be used to specify the HPC GPU setup.

| Field | Default | Explanation |
| --- | --- | --- |
| parameters.hpc_gpu | `"gpu:1"` | Specific GPUs could be requested here (e.g., `"nvidia_a100-sxm4-40gb:1"`), but this is not recommended. `scontrol show nodes \| egrep -o gres/gpu:.*=[0-9] \| egrep -o 'nvidia_.*=' \| sort \| uniq \| sed 's/=//'` lists available GPU types. |
| parameters.hpc_partition | `"gpu"` | The Donders HPC offers a `gpu40g` partition that should be used for the majority of thermal simulations. It consists of nodes whose GPUs have > 40 GB of VRAM. |
| parameters.hpc_reservation | `""` | By default, do not use a reserved queue. |
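
The GPU-type listing command from the table can be tried offline. The following is a minimal sketch that wraps that pipeline (with `.*` wildcards in both patterns) in a hypothetical function, `list_gpu_types`, and feeds it made-up lines imitating `scontrol show nodes` output. The sample lines and the `nvidia_a16` type name are illustrative assumptions, not actual cluster output. It uses `grep -E`, which is equivalent to `egrep`.

```shell
# Hypothetical wrapper around the pipeline from the table: reads
# `scontrol show nodes` output on stdin and prints unique GPU type names.
list_gpu_types() {
  grep -E -o 'gres/gpu:.*=[0-9]' \
    | grep -E -o 'nvidia_.*=' \
    | sort \
    | uniq \
    | sed 's/=//'
}

# Made-up sample lines (assumption: real output has a similar shape).
sample='CfgTRES=cpu=64,mem=500G,gres/gpu=4,gres/gpu:nvidia_a100-sxm4-40gb=4
AllocTRES=cpu=8,mem=24G,gres/gpu=1,gres/gpu:nvidia_a100-sxm4-40gb=1
CfgTRES=cpu=32,mem=250G,gres/gpu=2,gres/gpu:nvidia_a16=2'

# Prints each GPU type once, one per line.
printf '%s\n' "$sample" | list_gpu_types
```

On the cluster, the same function would be used as `scontrol show nodes | list_gpu_types`.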

Benchmark data — Nvidia GPUs

These are potentially unrepresentative benchmarks run on a 256 × 216 × 192 mm grid, with minor variations depending on transducer placement.

Acoustic Simulations

| GPU | Memory Used | Duration | Notes |
| --- | --- | --- | --- |
| A100 80 GB | 12 GB | 14 mins | Slightly smaller grid size |
| A100 80 GB (partitioned 2×40 GB) | 12 GB | 25 mins | |
| A100 40 GB | 12 GB | 19 mins | |
| A16 16 GB | 12 GB | 145 mins | |
| P100 16 GB | 12 GB | 43 mins | Compiled, ~L40s |
| L40S 47 GB | 14 GB | 28 mins | Compiled |

Heating Simulations

| GPU | Memory Used | Duration | Notes |
| --- | --- | --- | --- |
| A100 80 GB | ?? GB | ~12 s/trial (400 trials: ~90 mins) | Slightly smaller grid size |
| A100 40 GB | ?? GB | ~17 s/trial (400 trials: ~115 mins) | |
| A16 16 GB | Out of RAM | ??? | |
| P100 16 GB | Out of RAM | ??? | |
| L40S 47 GB | ?? GB | ~4 s/trial (400 trials: ~45 mins) | |
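
The per-trial durations above can be turned into a rough walltime request. Note that the reported totals exceed the plain product of trials and per-trial time (for example, 400 × 12 s is 80 minutes against a reported ~90 minutes), so some margin seems advisable. The following is a minimal sketch; `estimate_walltime` is a hypothetical helper and the 25 % default margin is an arbitrary assumption, not a measured value.

```shell
# Hypothetical helper: convert trials x seconds-per-trial into an
# HH:MM:SS walltime, padded by a safety margin given in percent.
estimate_walltime() {
  trials="$1"
  secs_per_trial="$2"
  margin_pct="${3:-25}"   # assumed default margin, adjust as needed
  total=$(( trials * secs_per_trial * (100 + margin_pct) / 100 ))
  printf '%02d:%02d:%02d\n' \
    $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

estimate_walltime 400 17 0    # plain product: 01:53:20
estimate_walltime 400 17      # with the default 25 % margin: 02:21:40
```

The resulting string has the `HH:MM:SS` form accepted by the `--time` option of `srun` and the `walltime` request of `qsub`.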