Getting started¶
Using performance tools to identify and remove bottlenecks is part of most modern performance engineering workflows. ReFrame is a regression testing framework for HPC systems that lets you write portable regression tests focusing only on functionality and performance. It has been used in production at CSCS since 2016 and is actively developed. All the tests used in this guide are freely available here.
This page will guide you through writing ReFrame tests to analyze the performance of your code. You should be familiar with ReFrame; the ReFrame tutorial can serve as a starting point. As a reference test code, we will use the SPH-EXA mini-app. This code is based on the smoothed particle hydrodynamics (SPH) method, a particle-based, meshfree, Lagrangian method for simulating multidimensional fluids with arbitrary geometries, commonly employed in astrophysics, cosmology, and computational fluid dynamics. The mini-app is a lightweight, flexible, header-only C++14 code with no external software dependencies. Parallelism is expressed via multiple programming models: OpenMP and HPX for node-level parallelism, MPI for internode communication, and CUDA, OpenACC, and OpenMP target offloading for accelerators. Our reference HPC system is Piz Daint.
The simplest ReFrame test compiles the code, runs it and checks the job
output. Looking into the check class shows how to set up and run the code
with the tool. In this example, we set the three parameters as follows: 24 MPI
tasks, a cube size of 100 particles in the 3D square patch test and
only 1 step of simulation.
Running the test¶
The test can be run from the command-line:
module load reframe
cd hpctools.git/reframechecks/notool/
~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
--keep-stage-files \
-c ./internal_timers_mpi.py
where:

-C points to the site config file,
--system selects the targeted system,
--prefix sets the output directory,
-r runs the selected check,
-c points to the check.
A typical output from ReFrame will look like this:
Reframe version: 2.22
Launched on host: daint101
Reframe paths
=============
Check prefix :
Check search path : 'internal_timers_mpi.py'
Stage dir prefix : /scratch/snx3000tds/piccinal/stage/
Output dir prefix : /scratch/snx3000tds/piccinal/output/
Perf. logging prefix : /scratch/snx3000tds/piccinal/perflogs
[==========] Running 1 check(s)
[----------] started processing sphexa_timers_sqpatch_024mpi_001omp_100n_0steps (Strong scaling study)
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray_classic
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-cray_classic
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-intel
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-intel
[ RUN ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-pgi
[ OK ] sphexa_timers_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-pgi
[----------] finished processing sphexa_timers_sqpatch_024mpi_001omp_100n_0steps (Strong scaling study)
[ PASSED ] Ran 5 test case(s) from 1 check(s) (0 failure(s))
By default, the test is run with every programming environment set inside the
check. A single programming environment can be selected with the -p flag
(for instance, -p PrgEnv-gnu).
Sanity checking¶
All of our tests passed. Sanity checking determines whether a test passed or
failed. In this simple example, we check that the job output reached the end
of the first step; this is coded in the self.sanity_patterns attribute of the
check class.
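Conceptually, a sanity pattern is a deferred regular-expression search over the job's standard output: the test passes only if the expected marker is found. The idea can be illustrated with plain Python; the sample output line below is a made-up stand-in, and the mini-app's real output may be worded differently.

```python
import re

# Hypothetical excerpt of a job's stdout (wording is illustrative):
stdout = """\
### Total time for iteration(0) 3.6201s
"""


def sanity_check(output, pattern):
    """Return True if the expected marker appears in the job output."""
    return re.search(pattern, output) is not None


assert sanity_check(stdout, r'Total time for iteration')
```

In a real check, ReFrame's sn.assert_found plays this role, evaluated lazily against self.stdout once the job has completed.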
Performance reporting¶
The mini-app uses the std::chrono library to measure the elapsed time of each
timestep in different parts of the code and reports it in the job output.
ReFrame supports the extraction and manipulation of performance data from the
program output, as well as a comprehensive way of setting performance
thresholds per system and per system partition. In addition to performance
checking, it is possible to print a performance report with the
--performance-report flag. A typical report for the mini-app with
PrgEnv-gnu will look like this:
PERFORMANCE REPORT
-----------------------------------------------
sphexa_timers_sqpatch_024mpi_001omp_100n_0steps
- daint:gpu
- PrgEnv-gnu
* num_tasks: 24
* Elapsed: 3.6201 s
* _Elapsed: 5 s
* domain_build: 0.0956 s
* mpi_synchronizeHalos: 0.4567 s
* BuildTree: 0 s
* FindNeighbors: 0.3547 s
* Density: 0.296 s
* EquationOfState: 0.0024 s
* IAD: 0.6284 s
* MomentumEnergyIAD: 1.0914 s
* Timestep: 0.6009 s
* UpdateQuantities: 0.0051 s
* EnergyConservation: 0.0012 s
* SmoothingLength: 0.0033 s
* %MomentumEnergyIAD: 30.15 %
* %Timestep: 16.6 %
* %mpi_synchronizeHalos: 12.62 %
* %FindNeighbors: 9.8 %
* %IAD: 17.36 %
This report is generated from data collected from the job output and
processed in the self.perf_patterns part of the check. For example, the time
spent in MomentumEnergyIAD is extracted with the seconds_energ method.
Similarly, the percentage of walltime spent in MomentumEnergyIAD is
calculated with the pctg_MomentumEnergyIAD method.
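Under the hood, such extraction boils down to a regular-expression capture over the job output, plus a little arithmetic for the percentages. Here is a plain-Python sketch of the idea; the timer-line format is an assumption modeled on the report above, not the mini-app's verbatim output, and the helpers named here only approximate what seconds_energ and pctg_MomentumEnergyIAD might do.

```python
import re

# Hypothetical job-output excerpt with chrono-based timers (format assumed):
stdout = """\
# domain_build: 0.0956s
# MomentumEnergyIAD: 1.0914s
# Timestep: 0.6009s
"""
elapsed = 3.6201  # total elapsed time in seconds, from the report above

# Capture the MomentumEnergyIAD timing from the output:
match = re.search(r'MomentumEnergyIAD:\s+(\S+)s', stdout)
momentum_energy = float(match.group(1))

# Percentage of walltime spent in that region, rounded as in the report:
pctg = round(100 * momentum_energy / elapsed, 2)
print(momentum_energy, pctg)  # 1.0914 30.15
```

The computed 30.15 % matches the %MomentumEnergyIAD entry in the report above, since that figure is simply the region time divided by the total elapsed time.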
Summary¶
We have covered the basic aspects of a ReFrame test. The next section will extend this test by integrating a set of commonly used performance tools.