Intel® tools
Intel® Inspector
Intel® Inspector is a dynamic memory and threading error checking tool. This page
guides you through writing ReFrame tests that analyze our code with this tool.
Looking into the class shows how to set up and run the code with the tool.
Running the test
The test can be run from the command-line:
module load reframe
cd hpctools.git/reframechecks/intel/
~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
-p PrgEnv-gnu \
--performance-report \
--keep-stage-files \
-c ./intel_inspector.py
A successful ReFrame output will look like the following:
Reframe version: 2.22
Launched on host: daint101
[----------] started processing sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps (Tool validation)
[ RUN ] sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[ OK ] sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[----------] finished processing sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps (Tool validation)
[ PASSED ] Ran 1 test case(s) from 1 check(s) (0 failure(s))
Several analyses are available. The ``mi1`` (memory leak) analysis is triggered
by setting the ``executable_opts``. Use ``self.post_run`` to generate the report
with the tool.
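As a rough illustration of the two steps above, the sketch below assembles a collection command and a report command as plain argument lists. It is not taken from the check: the helper names, the result directory ``mi1_result`` and the executable name are hypothetical, and the ``inspxe-cl`` options shown (``-collect``, ``-result-dir``, ``-report``) are assumed to match the installed Inspector version.

```python
# Sketch only: helper names and paths are hypothetical, not the check's code.

def inspector_collect_cmd(executable, analysis="mi1", result_dir="mi1_result"):
    """Return the argument list that wraps `executable` with Intel Inspector."""
    return ["inspxe-cl", "-collect", analysis,
            "-result-dir", result_dir, "--", executable]


def inspector_report_cmd(result_dir="mi1_result", report="summary"):
    """Return the argument list that turns the collected data into a text report."""
    return ["inspxe-cl", "-report", report, "-result-dir", result_dir]


if __name__ == "__main__":
    # Hypothetical executable name, used only for illustration.
    print(" ".join(inspector_collect_cmd("./sqpatch.exe")))
    print(" ".join(inspector_report_cmd()))
```

In a ReFrame check, the first list would roughly correspond to what goes into ``executable`` and ``executable_opts``, and the second to an entry in ``self.post_run``.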
Performance reporting
A typical output from the ``--performance-report`` flag will look like this:
PERFORMANCE REPORT
--------------------------------------------------
sphexa_inspector_sqpatch_024mpi_001omp_100n_0steps
- dom:gpu
- PrgEnv-gnu
* num_tasks: 24
* Elapsed: 8.899 s
...
* Memory not deallocated: 1
This report is generated from the data collected by the tool and processed in
the ``self.perf_patterns`` part of the check.
The number of "Memory not deallocated" problems detected by the tool
is extracted with the ``inspector_not_deallocated`` method.
Looking at the report with the tool shows that the problem comes from a system
library (``libpmi.so``), hence we can assume there is no problem with the code.
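The kind of extraction described above can be sketched with a regular expression. This is only an illustration, not the ``inspector_not_deallocated`` method itself: the function name ``count_not_deallocated`` is hypothetical, and the sample summary text is an assumption about what the tool prints, which may differ between Inspector versions.

```python
import re

# Assumed sample of the text summary printed by the tool; the exact wording
# may differ between Inspector versions.
SAMPLE_REPORT = """\
1 new problem(s) found
    1 Memory not deallocated problem(s) detected
"""


def count_not_deallocated(report_text):
    """Return the number of 'Memory not deallocated' problems, or 0 if absent."""
    match = re.search(
        r"^\s*(\d+)\s+Memory not deallocated problem\(s\) detected",
        report_text,
        flags=re.MULTILINE,
    )
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    print(count_not_deallocated(SAMPLE_REPORT))
```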
Intel® VTune
Intel® VTune is Intel’s performance profiler for C, C++, Fortran, Assembly and Python.
Running the test
The test can be run from the command-line:
module load reframe
cd hpctools.git/reframechecks/intel/
~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
-p PrgEnv-intel \
--performance-report \
--keep-stage-files \
-c ./intel_vtune.py
A successful ReFrame output will look like the following:
Reframe version: 3.0-dev2 (rev: 6d543136)
Launched on host: daint101
[----------] waiting for spawned checks to finish
[ OK ] sphexa_vtune_sqpatch_024mpi_001omp_100n_1steps on daint:gpu using PrgEnv-intel
[ OK ] sphexa_vtune_sqpatch_048mpi_001omp_125n_1steps on daint:gpu using PrgEnv-intel
[ OK ] sphexa_vtune_sqpatch_096mpi_001omp_157n_1steps on daint:gpu using PrgEnv-intel
[----------] all spawned checks have finished
[ PASSED ] Ran 3 test case(s) from 3 check(s) (0 failure(s))
Several analyses are available. The available analysis types are listed by ``vtune -h collect``:

.. code-block:: none

    hotspots <--
    memory-consumption
    uarch-exploration
    memory-access
    threading
    hpc-performance
    system-overview
    graphics-rendering
    io
    fpga-interaction
    gpu-offload
    gpu-hotspots
    throttling
    platform-profiler

Looking into the class shows how to set up and run the code with the tool.
Because this class is a derived class, the call to ``super().__init__()`` is
required. The ``hotspots`` analysis is triggered by setting the
``executable_opts``. Use ``self.post_run`` to generate the report with the tool.
Notice that the tool collects performance data per compute node.
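The role of ``super().__init__()`` in a derived class can be sketched in plain Python. The classes below are hypothetical stand-ins, not the ReFrame classes used by the check, and the executable name, its options and the result directory are assumptions made only for illustration.

```python
# Sketch only: BaseCheck and VtuneCheck are hypothetical stand-ins,
# not the classes used in the actual check.

class BaseCheck:
    def __init__(self):
        # Settings shared by all checks are defined once in the base class.
        self.executable = "./sqpatch.exe"         # assumed executable name
        self.executable_opts = ["-n", "100"]      # assumed application options
        self.post_run = []


class VtuneCheck(BaseCheck):
    def __init__(self, analysis="hotspots", result_dir="vtune_result"):
        # Without this call, the attributes set by BaseCheck would not exist.
        super().__init__()
        # Wrap the original command line with the profiler.
        self.executable_opts = (
            ["-collect", analysis, "-result-dir", result_dir, "--",
             self.executable] + self.executable_opts
        )
        self.executable = "vtune"
        # Command run after the job to produce a text report.
        self.post_run = [f"vtune -report summary -result-dir {result_dir}"]


if __name__ == "__main__":
    check = VtuneCheck()
    print(check.executable, " ".join(check.executable_opts))
```

If the ``super().__init__()`` call were omitted, ``self.executable`` would be undefined when the derived class tries to wrap it.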
Performance reporting
An overview of the performance data for a job running on 4 compute nodes will typically look like this:
As a result, a typical output from the ``--performance-report`` flag will look
like this:
sphexa_vtune_sqpatch_096mpi_001omp_157n_1steps
- PrgEnv-intel
* num_tasks: 96
* Elapsed: 5.0388 s
* _Elapsed: 37 s
* domain_distribute: 0.207 s
* mpi_synchronizeHalos: 0.5091 s
* BuildTree: 0 s
* FindNeighbors: 0.8136 s
* Density: 0.5672 s
* EquationOfState: 0.001 s
* IAD: 0.9999 s
* MomentumEnergyIAD: 1.5887 s
* Timestep: 0.1516 s
* UpdateQuantities: 0.0087 s
* EnergyConservation: 0.0028 s
* SmoothingLength: 0.002 s
* %MomentumEnergyIAD: 31.53 %
* %Timestep: 3.01 %
* %mpi_synchronizeHalos: 10.1 %
* %FindNeighbors: 16.15 %
* %IAD: 19.84 %
* vtune_elapsed_min: 7.408 s
* vtune_elapsed_max: 9.841 s
* vtune_elapsed_cput: 8.1791 s
* vtune_elapsed_cput_efft: 7.985 s
* vtune_elapsed_cput_spint: 0.3246 s
* vtune_elapsed_cput_spint_mpit: 0.3191 s
* %vtune_effective_physical_core_utilization: 87.8 %
* %vtune_effective_logical_core_utilization: 86.6 %
* vtune_cput_cn0: 196.3 s
* %vtune_cput_cn0_efft: 96.0 %
* %vtune_cput_cn0_spint: 4.0 %
* vtune_cput_cn1: 195.71 s
* %vtune_cput_cn1_efft: 97.9 %
* %vtune_cput_cn1_spint: 2.1 %
* vtune_cput_cn2: 189.77 s
* %vtune_cput_cn2_efft: 98.1 %
* %vtune_cput_cn2_spint: 1.9 %
* vtune_cput_cn3: 148.46 s
* %vtune_cput_cn3_efft: 97.6 %
* %vtune_cput_cn3_spint: 2.4 %
This report is generated from the data collected by the tool and processed in
the ``set_vtune_perf_patterns_rpt`` method of the check.
Looking at the report with the tool gives more insight into the performance of the code.
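The percentage entries in the performance report above appear to be each timer divided by the ``Elapsed`` value. This is inferred from the numbers, not taken from the source of the check, and the helper name ``percent_of_elapsed`` is hypothetical. A minimal sketch of the arithmetic, using the values from the report:

```python
# Timers copied from the performance report above (seconds).
ELAPSED = 5.0388
TIMERS = {
    "MomentumEnergyIAD": 1.5887,
    "Timestep": 0.1516,
    "mpi_synchronizeHalos": 0.5091,
    "FindNeighbors": 0.8136,
    "IAD": 0.9999,
}


def percent_of_elapsed(seconds, elapsed=ELAPSED):
    """Return a timer as a percentage of the elapsed time, rounded to 2 digits."""
    return round(100.0 * seconds / elapsed, 2)


if __name__ == "__main__":
    for name, seconds in TIMERS.items():
        print(f"%{name}: {percent_of_elapsed(seconds)} %")
```

For example, ``MomentumEnergyIAD`` takes 1.5887 s out of 5.0388 s, which gives the 31.53 % shown in the report.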
Intel® Advisor
Intel® Advisor is a vectorization optimization and thread prototyping tool:

- Vectorize and thread code for maximum performance
- Easy workflow + data + tips = faster code faster
- Prioritize, prototype and predict performance gain

Running the test
The test can be run from the command-line:
module load reframe
cd hpctools.git/reframechecks/intel/
~/reframe.git/reframe.py \
-C ~/reframe.git/config/cscs.py \
--system daint:gpu \
--prefix=$SCRATCH -r \
-p PrgEnv-gnu \
--performance-report \
--keep-stage-files \
-c ./intel_advisor.py
A successful ReFrame output will look like the following:
Reframe version: 2.22
Launched on host: daint101
[----------] started processing sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps (Tool validation)
[ RUN ] sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[ OK ] sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps on daint:gpu using PrgEnv-gnu
[----------] finished processing sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps (Tool validation)
[ PASSED ] Ran 1 test case(s) from 1 check(s) (0 failure(s))
Several analyses are available. Looking into the class shows how to set up and
run the code with the tool. The ``survey`` analysis is triggered by setting the
``executable_opts``. Use ``self.post_run`` to generate the report with the tool.
Performance reporting
A typical output from the ``--performance-report`` flag will look like this:
PERFORMANCE REPORT
--------------------------------------------------
sphexa_advisor_sqpatch_024mpi_001omp_100n_0steps
- dom:gpu
- PrgEnv-gnu
* num_tasks: 24
* Elapsed: 3.6147 s
...
* advisor_elapsed: 2.13 s
* advisor_loop1_line: 94 (momentumAndEnergyIAD.hpp)
This report is generated from the data collected by the tool and processed in
the ``self.perf_patterns`` part of the check.
This information (elapsed walltime, source filename and line number) is
extracted for MPI rank 0 with the ``advisor_elapsed``,
``advisor_loop1_filename``, and ``advisor_loop1_line`` methods.
Looking at the report with the tool gives more insight into the performance of
the code.
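Extracting a source filename and line number from a report entry can be sketched as follows. This is not the code of the ``advisor_loop1_filename`` or ``advisor_loop1_line`` methods: the function name ``loop_location`` is hypothetical, and the sample entry (including the function name in it) is an assumed layout that depends on the Advisor version and report options.

```python
import re

# Assumed sample of a loop entry in the survey report; the function name
# and the overall layout are hypothetical.
SAMPLE_ROW = "[loop in computeMomentumAndEnergy at momentumAndEnergyIAD.hpp:94]"


def loop_location(row):
    """Return (filename, line) of the loop, or None if the row does not match."""
    match = re.search(r"\[loop in .+ at (\S+):(\d+)\]", row)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(loop_location(SAMPLE_ROW))
```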