Intel® tools

Intel® Inspector

Intel® Inspector is a dynamic memory and threading error checking tool. This page guides you through writing ReFrame tests that analyze our code with this tool. The check's class shows how to set up and run the code with the tool.

Running the test

The test can be run from the command line:

module load reframe
cd hpctools.git/reframechecks/intel/

~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
-p PrgEnv-gnu \
--performance-report \
--keep-stage-files \
-c ./intel_inspector.py

A successful ReFrame output will look like the following:

Reframe version: 2.22
Launched on host: daint101

[----------] started processing sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps (Tool validation)
[ RUN      ] sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[       OK ] sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[----------] finished processing sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps (Tool validation)

[  PASSED  ] Ran 1 test case(s) from 1 check(s) (0 failure(s))

Several analyses are available. This check uses the mi1 (memory leak) analysis, which is selected by setting executable_opts.

The report is then generated with the tool through self.post_run.
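The following minimal sketch shows that setup. It is written in plain Python, so it runs without ReFrame, and it builds the values a check could assign to executable, executable_opts and post_run. The helper name inspector_setup, the binary ./sqpatch.exe and the result directory names are hypothetical and are not taken from the actual check. Only -collect mi1, -r and -report problems are standard inspxe-cl options.

```python
def inspector_setup(target="./sqpatch.exe", result_dir="./rpt",
                    report_dir="./rpt.nid00000"):
    """Return (executable, executable_opts, post_run) for an mi1 run.

    Hypothetical helper: target and report_dir are illustrative; the real
    binary name and node-suffixed result directory depend on the job.
    """
    executable = "inspxe-cl"
    executable_opts = [
        "-collect mi1",        # memory leak analysis
        f"-r {result_dir}",    # result directory written by the tool
        f"-- {target}",        # everything after -- is the analyzed program
    ]
    post_run = [
        # Print the detected problems once the job has finished.
        f"inspxe-cl -report problems -r {report_dir}",
    ]
    return executable, executable_opts, post_run


if __name__ == "__main__":
    exe, opts, post = inspector_setup()
    # ReFrame places the space-joined executable_opts after the executable.
    print(exe, " ".join(opts))
    print(post[0])
```

With srun as the parallel launcher, the job step would then look like srun inspxe-cl -collect mi1 -r ./rpt -- ./sqpatch.exe.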

Performance reporting

A typical output from the --performance-report flag will look like this:

PERFORMANCE REPORT
--------------------------------------------------
sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps
- dom:gpu
   - PrgEnv-gnu
      * num_tasks: 24
      * Elapsed: 8.899 s
      ...
      * Memory not deallocated: 1

This report is built from the data collected by the tool, which the check processes in self.perf_patterns. The number of "Memory not deallocated" problems detected by the tool is extracted with the inspector_not_deallocated method. Inspecting the report with the tool shows that the problem comes from a system library (libpmi.so), so we can assume that the code itself is not at fault.
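Such an extraction can be sketched as a regular expression applied to the text report. The sample text below is an assumption about what inspxe-cl -report problems prints, and its wording may differ between Inspector versions. count_not_deallocated is a hypothetical stand-in for inspector_not_deallocated. Inside a check, a pattern like this would typically be passed to sn.extractsingle from reframe.utility.sanity, together with the file holding the report.

```python
import re

# Assumed excerpt of the inspxe-cl problems summary (format not guaranteed).
SAMPLE_REPORT = """\
1 new problem(s) found
    1 Memory not deallocated problem(s) detected
"""

# Capture the count in front of the "Memory not deallocated" summary line.
NOT_DEALLOCATED = re.compile(r"(\d+) Memory not deallocated problem\(s\) detected")


def count_not_deallocated(report_text):
    """Return the number of 'Memory not deallocated' problems, 0 if none listed."""
    match = NOT_DEALLOCATED.search(report_text)
    return int(match.group(1)) if match else 0


print(count_not_deallocated(SAMPLE_REPORT))  # 1
```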

Intel Inspector (launched with: inspxe-gui rpt.nid00000/)

Intel® VTune

Intel® VTune is Intel’s performance profiler for C, C++, Fortran, Assembly, and Python code.

Running the test

The test can be run from the command line:

module load reframe
cd hpctools.git/reframechecks/intel/

~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
-p PrgEnv-intel \
--performance-report \
--keep-stage-files \
-c ./intel_vtune.py

A successful ReFrame output will look like the following:

Reframe version: 3.0-dev2 (rev: 6d543136)
Launched on host: daint101

[----------] waiting for spawned checks to finish
[       OK ] sphexa_vtune_sqpatch_024mpi_001omp_100n_1steps on daint:gpu using PrgEnv-intel
[       OK ] sphexa_vtune_sqpatch_048mpi_001omp_125n_1steps on daint:gpu using PrgEnv-intel
[       OK ] sphexa_vtune_sqpatch_096mpi_001omp_157n_1steps on daint:gpu using PrgEnv-intel
[----------] all spawned checks have finished

[  PASSED  ] Ran 3 test case(s) from 3 check(s) (0 failure(s))

Several analyses are available, as listed by vtune -h collect:

      hotspots            (used in this check)
      memory-consumption
      uarch-exploration
      memory-access
      threading
      hpc-performance
      system-overview
      graphics-rendering
      io
      fpga-interaction
      gpu-offload
      gpu-hotspots
      throttling
      platform-profiler

The check's class shows how to set up and run the code with the tool. Note that this class is a derived class, hence the call to super().__init__() is required. The hotspots analysis is selected by setting executable_opts.

The report is then generated with the tool through self.post_run. Note that the tool collects performance data per compute node.
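The tool stores its results per compute node, so the post-run step needs one report command per node. The minimal sketch below assumes that the node name is appended to the result directory, as in rpt.nid00034. The helper vtune_setup, the binary ./sqpatch.exe and the node list are hypothetical. Only -collect hotspots, -r and -report summary are standard vtune options.

```python
def vtune_setup(nodes, target="./sqpatch.exe", result_dir="rpt"):
    """Return (executable, executable_opts, post_run) for a hotspots run.

    Hypothetical helper: one 'vtune -report summary' command is emitted per
    compute node, assuming per-node directories named <result_dir>.<node>.
    """
    executable = "vtune"
    executable_opts = [
        "-collect hotspots",   # the analysis used by this check
        f"-r {result_dir}",    # base name of the result directory
        f"-- {target}",        # everything after -- is the profiled program
    ]
    post_run = [f"vtune -report summary -r {result_dir}.{node}" for node in nodes]
    return executable, executable_opts, post_run


if __name__ == "__main__":
    _, _, post = vtune_setup(["nid00034", "nid00035"])
    for cmd in post:
        print(cmd)
```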

Performance reporting

An overview of the performance data for a job on 4 compute nodes will typically look like this:

Intel VTune (overview)

As a result, a typical output from the --performance-report flag will look like this:

sphexa_vtune_sqpatch_096mpi_001omp_157n_1steps
   - PrgEnv-intel
      * num_tasks: 96
      * Elapsed: 5.0388 s
      * _Elapsed: 37 s
      * domain_distribute: 0.207 s
      * mpi_synchronizeHalos: 0.5091 s
      * BuildTree: 0 s
      * FindNeighbors: 0.8136 s
      * Density: 0.5672 s
      * EquationOfState: 0.001 s
      * IAD: 0.9999 s
      * MomentumEnergyIAD: 1.5887 s
      * Timestep: 0.1516 s
      * UpdateQuantities: 0.0087 s
      * EnergyConservation: 0.0028 s
      * SmoothingLength: 0.002 s
      * %MomentumEnergyIAD: 31.53 %
      * %Timestep: 3.01 %
      * %mpi_synchronizeHalos: 10.1 %
      * %FindNeighbors: 16.15 %
      * %IAD: 19.84 %
      * vtune_elapsed_min: 7.408 s
      * vtune_elapsed_max: 9.841 s
      * vtune_elapsed_cput: 8.1791 s
      * vtune_elapsed_cput_efft: 7.985 s
      * vtune_elapsed_cput_spint: 0.3246 s
      * vtune_elapsed_cput_spint_mpit: 0.3191 s
      * %vtune_effective_physical_core_utilization: 87.8 %
      * %vtune_effective_logical_core_utilization: 86.6 %
      * vtune_cput_cn0: 196.3 s
      * %vtune_cput_cn0_efft: 96.0 %
      * %vtune_cput_cn0_spint: 4.0 %
      * vtune_cput_cn1: 195.71 s
      * %vtune_cput_cn1_efft: 97.9 %
      * %vtune_cput_cn1_spint: 2.1 %
      * vtune_cput_cn2: 189.77 s
      * %vtune_cput_cn2_efft: 98.1 %
      * %vtune_cput_cn2_spint: 1.9 %
      * vtune_cput_cn3: 148.46 s
      * %vtune_cput_cn3_efft: 97.6 %
      * %vtune_cput_cn3_spint: 2.4 %

This report is built from the data collected by the tool and processed in the set_vtune_perf_patterns_rpt method of the check.
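Each percentage metric in the report equals the matching timer divided by the Elapsed value. For every compute node, the effective (efft) and spin (spint) shares of the CPU time add up to 100 %. The short worked check below reproduces both facts from the reported numbers. It checks the arithmetic only and is not the code of set_vtune_perf_patterns_rpt.

```python
# Values copied from the performance report above (seconds).
elapsed = 5.0388
timers = {
    "MomentumEnergyIAD": 1.5887,
    "Timestep": 0.1516,
    "mpi_synchronizeHalos": 0.5091,
    "FindNeighbors": 0.8136,
    "IAD": 0.9999,
}

# Share of the elapsed time spent in each timer, rounded to 2 decimals.
percentages = {name: round(100.0 * t / elapsed, 2) for name, t in timers.items()}

# Per-node (effective %, spin %) pairs copied from the report.
per_node = {
    "cn0": (96.0, 4.0),
    "cn1": (97.9, 2.1),
    "cn2": (98.1, 1.9),
    "cn3": (97.6, 2.4),
}

for name, pct in percentages.items():
    print(f"%{name}: {pct} %")

for node, (efft, spint) in per_node.items():
    # Effective and spin time together account for the node's CPU time.
    print(node, round(efft + spint, 1))
```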

Looking at the report with the tool gives more insight into the performance of the code:

Intel VTune Summary view (launched with: vtune-gui rpt.nid00034/rpt.nid00034.vtune)

Intel VTune Hotspots (Src/Assembly view)

Intel® Advisor

Intel® Advisor is a vectorization optimization and thread prototyping tool:

  • Vectorize & thread code for maximum performance

  • Easy workflow + data + tips = faster code faster

  • Prioritize, Prototype & Predict performance gain.

Running the test

The test can be run from the command line:

module load reframe
cd hpctools.git/reframechecks/intel/

~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
-p PrgEnv-gnu \
--performance-report \
--keep-stage-files \
-c ./intel_advisor.py

A successful ReFrame output will look like the following:

Reframe version: 2.22
Launched on host: daint101

[----------] started processing sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps (Tool validation)
[ RUN      ] sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[       OK ] sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[----------] finished processing sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps (Tool validation)

[  PASSED  ] Ran 1 test case(s) from 1 check(s) (0 failure(s))

Several analyses are available. The check's class shows how to set up and run the code with the tool; the survey analysis is selected by setting executable_opts.

The report is then generated with the tool through self.post_run.
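The minimal sketch below covers the Advisor case. It is plain Python, so it runs without ReFrame. The helper advisor_setup and the binary ./sqpatch.exe are hypothetical. The project directory ./rpt is chosen to match the rpt/rpt.advixeproj project opened in the GUI. -collect survey, -project-dir and -report survey are standard advixe-cl options.

```python
def advisor_setup(target="./sqpatch.exe", project_dir="./rpt"):
    """Return (executable, executable_opts, post_run) for a survey run.

    Hypothetical helper: the binary name is illustrative only.
    """
    executable = "advixe-cl"
    executable_opts = [
        "-collect survey",              # find where time is spent in loops
        f"-project-dir {project_dir}",  # Advisor project written by the tool
        f"-- {target}",                 # everything after -- is the analyzed program
    ]
    post_run = [
        # Print the survey results once the job has finished.
        f"advixe-cl -report survey -project-dir {project_dir}",
    ]
    return executable, executable_opts, post_run


if __name__ == "__main__":
    exe, opts, post = advisor_setup()
    print(exe, " ".join(opts))
    print(post[0])
```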

Performance reporting

A typical output from the --performance-report flag will look like this:

PERFORMANCE REPORT
--------------------------------------------------
sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps
- dom:gpu
   - PrgEnv-gnu
      * num_tasks: 24
      * Elapsed: 3.6147 s
      ...
      * advisor_elapsed: 2.13 s
      * advisor_loop1_line: 94 (momentumAndEnergyIAD.hpp)

This report is built from the data collected by the tool and processed in self.perf_patterns. This information (elapsed walltime, source filename and line number) is extracted for MPI rank 0 with the advisor_elapsed, advisor_loop1_filename and advisor_loop1_line methods. Looking at the report with the tool gives more insight into the performance of the code:
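The filename and line number can be pulled out of a survey report line with a regular expression. The "[loop in ... at file:line]" layout and the function name some_function in the sample are assumptions for illustration. loop_location is a hypothetical stand-in for advisor_loop1_filename and advisor_loop1_line, not their actual implementation.

```python
import re

# Assumed loop label from an Advisor survey report (function name made up).
SAMPLE_LINE = "[loop in some_function at momentumAndEnergyIAD.hpp:94]"

# Greedy .* skips the (possibly templated) function name up to the last " at ".
LOOP_LOCATION = re.compile(r"\[loop in .* at (?P<file>[^:\]]+):(?P<line>\d+)\]")


def loop_location(text):
    """Return (filename, line) of the first loop label found, or None."""
    match = LOOP_LOCATION.search(text)
    if match is None:
        return None
    return match.group("file"), int(match.group("line"))


print(loop_location(SAMPLE_LINE))  # ('momentumAndEnergyIAD.hpp', 94)
```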

Intel Advisor (launched with: advixe-gui rpt/rpt.advixeproj)

Intel Advisor (survey analysis)