Getting started¶
Using performance tools to identify and remove bottlenecks is part of most modern performance engineering workflows. ReFrame is a regression testing framework for HPC systems that lets you write portable regression tests focusing only on functionality and performance. It has been used in production at CSCS since 2016 and is actively developed. All the tests used in this guide are freely available here.
This page will guide you through writing ReFrame tests to analyze the performance of your code. You should be familiar with ReFrame; the ReFrame tutorial can serve as a starting point. As a reference test code, we will use the SPH-EXA mini-app. This code is based on the smoothed particle hydrodynamics (SPH) method, a particle-based, meshfree, Lagrangian method for simulating multidimensional fluids with arbitrary geometries, commonly employed in astrophysics, cosmology, and computational fluid dynamics. The mini-app is a lightweight, flexible, header-only C++14 code with no external software dependencies. Parallelism is expressed via multiple programming models: OpenMP and HPX for node-level parallelism, MPI for internode communication, and CUDA, OpenACC, and OpenMP target offloading for accelerators. Our reference HPC system is Piz Daint.
The simplest ReFrame test compiles the code, runs it and checks the job
output. Looking into the check class shows how to set up and run the code
with the tool. In this example, we set the three parameters as follows: 24 MPI
tasks, a cube size of 100 particles in the 3D square patch test and
only 1 step of simulation.
Running the test¶
The test can be run from the command-line:
module load reframe
cd hpctools.git/reframechecks/notool/
~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
--keep-stage-files \
-c ./internal_timers_mpi.py
where:

-C points to the site config file,
--system selects the targeted system,
--prefix sets the output directory,
-r runs the selected check,
-c points to the check.
A typical output from ReFrame will look like this:
Reframe version: 2.22
Launched on host: daint101
Reframe paths
=============
Check prefix :
Check search path : 'internal_timers_mpi.py'
Stage dir prefix : /scratch/snx3000tds/piccinal/stage/
Output dir prefix : /scratch/snx3000tds/piccinal/output/
Perf. logging prefix : /scratch/snx3000tds/piccinal/perflogs
[==========] Running 1 check(s)
[----------] started processing sphexa_timers_sqpatch_024mpi_001omp_100n_0steps (Strong scaling study)
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray_classic
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray_classic
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-intel
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-intel
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-pgi
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-pgi
[----------] finished processing sphexa_timers_sqpatch_024mpi_001omp_100n_0steps (Strong scaling study)
[ PASSED ] Ran 5 test case(s) from 1 check(s) (0 failure(s))
By default, the test is run with every programming environment set inside the
check. A single programming environment can be selected with the -p flag
(for instance, -p PrgEnv-gnu).
Sanity checking¶
All of our tests passed. Sanity checking determines whether a test passed or
failed. In this simple example, we check that the job output reached the end
of the first step; this is coded in the self.sanity_patterns attribute of the
check class.
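Conceptually, a sanity pattern is a deferred regular-expression search over the job's standard output: the test passes only if the expected marker is found. The idea can be illustrated with plain Python; the sample output line below is a made-up stand-in, and the mini-app's real output may be worded differently.

```python
import re

# Hypothetical excerpt of a job's stdout (wording is illustrative):
stdout = """\
### Total time for iteration(0) 3.6201s
"""


def sanity_check(output, pattern):
    """Return True if the expected marker appears in the job output."""
    return re.search(pattern, output) is not None


assert sanity_check(stdout, r'Total time for iteration')
```

In a real check, ReFrame's sn.assert_found plays this role, evaluated lazily against self.stdout once the job has completed.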
Performance reporting¶
The mini-app uses the std::chrono library to measure the elapsed time of each
timestep in different parts of the code and reports it in the job output.
ReFrame supports the extraction and manipulation of performance data from the
program output, as well as a comprehensive way of setting performance
thresholds per system and per system partition. In addition to performance
checking, it is possible to print a performance report with the
--performance-report flag. A typical report for the mini-app with
PrgEnv-gnu will look like this:
PERFORMANCE REPORT
-----------------------------------------------
sphexa_timers_sqpatch_024mpi_001omp_100n_0steps
- daint:gpu
- PrgEnv-gnu
* num_tasks: 24
* Elapsed: 3.6201 s
* _Elapsed: 5 s
* domain_build: 0.0956 s
* mpi_synchronizeHalos: 0.4567 s
* BuildTree: 0 s
* FindNeighbors: 0.3547 s
* Density: 0.296 s
* EquationOfState: 0.0024 s
* IAD: 0.6284 s
* MomentumEnergyIAD: 1.0914 s
* Timestep: 0.6009 s
* UpdateQuantities: 0.0051 s
* EnergyConservation: 0.0012 s
* SmoothingLength: 0.0033 s
* %MomentumEnergyIAD: 30.15 %
* %Timestep: 16.6 %
* %mpi_synchronizeHalos: 12.62 %
* %FindNeighbors: 9.8 %
* %IAD: 17.36 %
This report is generated from data collected from the job output and
processed in the self.perf_patterns part of the check. For example, the time
spent in MomentumEnergyIAD is extracted with the seconds_energ method.
Similarly, the percentage of walltime spent in MomentumEnergyIAD is
calculated with the pctg_MomentumEnergyIAD method.
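Under the hood, such extraction boils down to a regular-expression capture over the job output, plus a little arithmetic for the percentages. Here is a plain-Python sketch of the idea; the timer-line format is an assumption modeled on the report above, not the mini-app's verbatim output, and the helpers named here only approximate what seconds_energ and pctg_MomentumEnergyIAD might do.

```python
import re

# Hypothetical job-output excerpt with chrono-based timers (format assumed):
stdout = """\
# domain_build: 0.0956s
# MomentumEnergyIAD: 1.0914s
# Timestep: 0.6009s
"""
elapsed = 3.6201  # total elapsed time in seconds, from the report above

# Capture the MomentumEnergyIAD timing from the output:
match = re.search(r'MomentumEnergyIAD:\s+(\S+)s', stdout)
momentum_energy = float(match.group(1))

# Percentage of walltime spent in that region, rounded as in the report:
pctg = round(100 * momentum_energy / elapsed, 2)
print(momentum_energy, pctg)  # 1.0914 30.15
```

The computed 30.15 % matches the %MomentumEnergyIAD entry in the report above, since that figure is simply the region time divided by the total elapsed time.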
Summary¶
We have covered the basic aspects of a ReFrame test. The next section will extend this test by integrating a set of commonly used performance tools.