GPU Reference Guide

Regression tests

nsys_cuda.py

class reframechecks.nvidia.nsys_cuda.SphExaNsysCudaCheck(*args: Any, **kwargs: Any)

Bases: RegressionTest

This class runs the test code with NVIDIA Nsight Systems (nsys), using at least 2 MPI tasks: https://docs.nvidia.com/nsight-systems/index.html

Available analysis types are listed by: nsys profile --help

Two parameters can be set for the simulation:

Parameters
  • mpitask – number of MPI tasks; the size of the cube in the 3D square patch test is looked up in a dictionary keyed by mpitask (cubesize could instead be exposed as a parameter).

  • steps – number of simulation steps.
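The dictionary lookup described for mpitask can be sketched as follows. This is a minimal illustration and not the check's actual code: the table values, the function name simulation_options and the '-n'/'-s' option names are all assumptions made for the example.

```python
# Hypothetical mapping from the number of MPI tasks to the cube side length;
# the values actually used by the check are not given in this guide.
CUBESIZE_BY_MPITASK = {2: 40, 4: 50, 8: 63}


def simulation_options(mpitask, steps):
    """Return illustrative command-line options for one test variant."""
    if mpitask < 2:
        raise ValueError('the nsys check needs at least 2 MPI tasks')
    # The cube size is not a parameter: it is derived from mpitask.
    cubesize = CUBESIZE_BY_MPITASK[mpitask]
    # '-n' and '-s' are assumed option names, used for illustration only.
    return ['-n', str(cubesize), '-s', str(steps)]


options = simulation_options(mpitask=2, steps=10)
```

Keeping the cube size in a table ties the problem size to the task count, at the cost of a KeyError for task counts missing from the table.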

Typical performance reporting:

Versions:
  • cudatoolkit/10.2.89 has nsys/2019.5.2.16-b54ef97

  • nvhpc/2020_207-cuda-10.2 has nsys/2020.3.1.54-2bd2a65

  • nvidia-nsight-systems/2020.3.1.72 has nsys/2020.3.1.72-e5b8014 <–

Sanity checks

reframechecks.common.sphexa.sanity_nvidia.nsys_perf_patterns(obj)

Dictionary of default nsys_perf_patterns for the tool

reframechecks.common.sphexa.sanity_nvidia.nsys_report_DtoH_KiB(self)

Reports [CUDA memcpy DtoH] Memory Operation (KiB) measured by the tool and averaged over compute nodes

> job.stdout
# CUDA Memory Operation Statistics (KiB)
#
#             Total      Operations            Average            Minimum
# -----------------  --------------  -----------------  -----------------
#         1530313.0             296             5170.0              0.055
#           16500.0              84              196.4             62.500
#           *******
# ...
#            Maximum  Name
#  -----------------  -------------------
#            81250.0  [CUDA memcpy HtoD]
#              250.0  [CUDA memcpy DtoH]
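
One way such a value could be extracted is with a regular expression over the job output. The sketch below assumes that each table row sits on a single line with the columns Total, Operations, Average, Minimum, Maximum, Name (the excerpt above is wrapped), and that the output may contain one table per compute node. The function name memcpy_total_kib is hypothetical.

```python
import re


def memcpy_total_kib(stdout, name):
    """Average the 'Total' column over every row whose Name equals name."""
    pattern = re.compile(
        r'^\s*(?P<total>\d+(?:\.\d+)?)'   # Total (KiB)
        r'\s+\d+'                         # Operations
        r'\s+\S+\s+\S+\s+\S+'             # Average, Minimum, Maximum
        r'\s+' + re.escape(name) + r'\s*$',
        re.MULTILINE)
    totals = [float(m.group('total')) for m in pattern.finditer(stdout)]
    if not totals:
        raise ValueError(f'no row found for {name}')
    return sum(totals) / len(totals)


# Rows taken from the excerpt above, joined onto single lines.
sample = """
 1530313.0   296   5170.0    0.055   81250.0  [CUDA memcpy HtoD]
   16500.0    84    196.4   62.500     250.0  [CUDA memcpy DtoH]
"""
dtoh_kib = memcpy_total_kib(sample, '[CUDA memcpy DtoH]')
```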
reframechecks.common.sphexa.sanity_nvidia.nsys_report_DtoH_pct(self)

Reports [CUDA memcpy DtoH] Time(%) measured by the tool and averaged over compute nodes

> job.stdout
# CUDA Memory Operation Statistics (nanoseconds)
#
# Time(%)      Total Time  Operations         Average  ...
# -------  --------------  ----------  --------------  ...
#     0.9         1385579          84         16495.0  ...
#    ****
#
#             Minimum         Maximum  Name
#      --------------  --------------  -------------------
#                6144           21312  [CUDA memcpy DtoH]

reframechecks.common.sphexa.sanity_nvidia.nsys_report_HtoD_KiB(self)

Reports [CUDA memcpy HtoD] Memory Operation (KiB) measured by the tool and averaged over compute nodes

> job.stdout
# CUDA Memory Operation Statistics (KiB)
#
#             Total      Operations            Average            Minimum
# -----------------  --------------  -----------------  -----------------
#         1530313.0             296             5170.0              0.055
#         *********
#           16500.0              84              196.4             62.500
# ...
#            Maximum  Name
#  -----------------  -------------------
#            81250.0  [CUDA memcpy HtoD]
#              250.0  [CUDA memcpy DtoH]

reframechecks.common.sphexa.sanity_nvidia.nsys_report_HtoD_pct(self)

Reports [CUDA memcpy HtoD] Time(%) measured by the tool and averaged over compute nodes

> job.stdout
# CUDA Memory Operation Statistics (nanoseconds)
#
# Time(%)      Total Time  Operations         Average  ...
# -------  --------------  ----------  --------------  ...
#    99.1       154400354         296        521622.8  ...
#    ****
#
#             Minimum         Maximum  Name
#      --------------  --------------  -------------------
#                 896         8496291  [CUDA memcpy HtoD]

reframechecks.common.sphexa.sanity_nvidia.nsys_report_computeIAD_pct(self)

Reports CUDA Kernel Time (%) for computeIAD measured by the tool and averaged over compute nodes

> job.stdout
# CUDA Kernel Statistics (nanoseconds)
#
# Time(%)      Total Time   Instances         Average         Minimum
# -------  --------------  ----------  --------------  --------------
#    49.7        69968829           6      11661471.5        11507063
#    26.4        37101887           6       6183647.8         6047175
#    ****
#    24.0        33719758          24       1404989.9         1371531
# ...
#         Maximum  Name
#  --------------  ------------------
#        11827539  computeMomentumAndEnergyIAD
#         6678078  computeIAD
#         1459594  density
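
Because computeIAD is also a suffix of computeMomentumAndEnergyIAD, any pattern used here has to match the kernel name as a whole. The sketch below shows one way to do that, assuming each row is on a single line with the columns Time(%), Total Time, Instances, Average, Minimum, Maximum, Name. The function name kernel_time_pct is hypothetical.

```python
import re


def kernel_time_pct(stdout, kernel):
    """Average Time(%) over every row whose Name is exactly kernel."""
    # Whitespace before the name and end-of-line after it force a
    # whole-name match, so 'computeIAD' does not also select
    # 'computeMomentumAndEnergyIAD'.
    pattern = re.compile(
        r'^\s*(?P<pct>\d+\.\d+)(?:\s+\S+){5}\s+'
        + re.escape(kernel) + r'\s*$',
        re.MULTILINE)
    values = [float(m.group('pct')) for m in pattern.finditer(stdout)]
    return sum(values) / len(values) if values else None


# Rows taken from the excerpt above, joined onto single lines.
sample = """
 49.7  69968829   6  11661471.5  11507063  11827539  computeMomentumAndEnergyIAD
 26.4  37101887   6   6183647.8   6047175   6678078  computeIAD
 24.0  33719758  24   1404989.9   1371531   1459594  density
"""
iad_pct = kernel_time_pct(sample, 'computeIAD')
```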
reframechecks.common.sphexa.sanity_nvidia.nsys_report_cudaMemcpy_pct(self)

Reports CUDA API Time (%) for cudaMemcpy measured by the tool and averaged over compute nodes

> job.stdout

# CUDA API Statistics (nanoseconds)
#
# Time(%)      Total Time       Calls         Average         Minimum
# -------  --------------  ----------  --------------  --------------
#    44.9       309427138         378        818590.3            9709
#    ****
#    40.6       279978449           2     139989224.5           24173
#     9.5        65562201         308        212864.3             738
#     4.9        33820196         306        110523.5            2812
#     0.1          704223          36         19561.8            9305
# ....
#         Maximum  Name
#  --------------  ------------------
#        11665852  cudaMemcpy
#       279954276  cudaMemcpyToSymbol
#         3382747  cudaFree
#          591094  cudaMalloc
#           34042  cudaLaunch

reframechecks.common.sphexa.sanity_nvidia.nsys_report_momentumEnergy_pct(self)

Reports CUDA Kernel Time (%) for MomentumAndEnergyIAD measured by the tool and averaged over compute nodes

> job.stdout
# CUDA Kernel Statistics (nanoseconds)
#
# Time(%)      Total Time   Instances         Average         Minimum
# -------  --------------  ----------  --------------  --------------
#    49.7        69968829           6      11661471.5        11507063
#    ****
#    26.4        37101887           6       6183647.8         6047175
#    24.0        33719758          24       1404989.9         1371531
# ...
#         Maximum  Name
#  --------------  ------------------
#        11827539  computeMomentumAndEnergyIAD
#         6678078  computeIAD
#         1459594  density

reframechecks.common.sphexa.sanity_nvidia.nsys_version(obj)

Checks tool’s version:

> nsys --version
NVIDIA Nsight Systems version 2020.1.1.65-085319d

Returns: True or False
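
A version check of this kind could look like the sketch below. It assumes the check compares the version string printed by nsys --version against an expected value; the function name nsys_version_matches and the exact comparison are assumptions, not the module's actual logic.

```python
import re


def nsys_version_matches(stdout, expected):
    """Return True if stdout reports exactly the expected nsys version."""
    match = re.search(r'^NVIDIA Nsight Systems version (\S+)',
                      stdout, re.MULTILINE)
    return match is not None and match.group(1) == expected


output = 'NVIDIA Nsight Systems version 2020.1.1.65-085319d\n'
ok = nsys_version_matches(output, '2020.1.1.65-085319d')
```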
reframechecks.common.sphexa.sanity_nvidia.nvprof_perf_patterns(obj)

Dictionary of default nvprof_perf_patterns for the tool

reframechecks.common.sphexa.sanity_nvidia.nvprof_report_DtoH_KiB(self)

Reports [CUDA memcpy DtoH] Memory Operation (KiB) measured by the tool and averaged over compute nodes (TODO)

reframechecks.common.sphexa.sanity_nvidia.nvprof_report_DtoH_pct(self)

Reports [CUDA memcpy DtoH] Time(%) measured by the tool and averaged over compute nodes

> job.stdout (Name: [CUDA memcpy DtoH])
# Time(%)    Time   Calls       Avg       Min      Max
# 2.80%  1.3194ms      44  29.986us  29.855us 30.528us [CUDA memcpy DtoH]
# 1.34%  1.7667ms      44  40.152us  39.519us 41.887us [CUDA memcpy DtoH]
# ^^^^
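
nvprof prints percentages with a trailing % sign and may prefix a row with "GPU activities:", so parsing differs slightly from the nsys case. The sketch below averages the Time(%) values of matching rows, assuming the name appears at the end of each row and that the output holds one row per compute node. The function name nvprof_memcpy_pct is hypothetical.

```python
import re


def nvprof_memcpy_pct(stdout, direction):
    """Average Time(%) over rows ending in '[CUDA memcpy <direction>]'."""
    name = re.escape(f'[CUDA memcpy {direction}]')
    pattern = re.compile(
        r'^\s*(?:GPU activities:\s+)?'      # optional row-type prefix
        r'(?P<pct>\d+(?:\.\d+)?)%'          # Time(%) with its % sign
        r'.*' + name + r'\s*$',
        re.MULTILINE)
    values = [float(m.group('pct')) for m in pattern.finditer(stdout)]
    return sum(values) / len(values) if values else None


# Rows taken from the excerpts in this section, one per line.
sample = """
 2.80%  1.3194ms  44  29.986us  29.855us  30.528us  [CUDA memcpy DtoH]
 GPU activities:  48.57%  22.849ms  162  141.04us  896ns  1.6108ms  [CUDA memcpy HtoD]
 1.34%  1.7667ms  44  40.152us  39.519us  41.887us  [CUDA memcpy DtoH]
"""
dtoh_pct = nvprof_memcpy_pct(sample, 'DtoH')   # mean of 2.80 and 1.34
```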
reframechecks.common.sphexa.sanity_nvidia.nvprof_report_HtoD_KiB(self)

Reports [CUDA memcpy HtoD] Memory Operation (KiB) measured by the tool and averaged over compute nodes (TODO)

reframechecks.common.sphexa.sanity_nvidia.nvprof_report_HtoD_pct(self)

Reports [CUDA memcpy HtoD] Time(%) measured by the tool and averaged over compute nodes

> job.stdout (Name: [CUDA memcpy HtoD])
#             Type  Time(%)      Time Calls       Avg   Min       Max
#  GPU activities:   48.57%  22.849ms   162  141.04us 896ns  1.6108ms
#  GPU activities:   56.12%  74.108ms   162  457.45us 928ns  5.8896ms
#                    ^^^^^

reframechecks.common.sphexa.sanity_nvidia.nvprof_report_computeIAD_pct(self)

Reports CUDA Kernel Time (%) for computeIAD measured by the tool and averaged over compute nodes

> job.stdout
# (where Name = sphexa::sph::cuda::kernels::computeIAD)
# Time(%)     Time  Calls       Avg       Min       Max  Name
# 12.62%  5.9380ms      4  1.4845ms  1.3352ms  1.6593ms  void ...
# 10.54%  13.915ms      4  3.4788ms  3.3458ms  3.7058ms  void ...
# ^^^^^

reframechecks.common.sphexa.sanity_nvidia.nvprof_report_cudaMemcpy_pct(self)

Reports CUDA API Time (%) for cudaMemcpy measured by the tool and averaged over compute nodes

> job.stdout (where Name = cudaMemcpy|cudaMemcpyToSymbol)
#           Time(%) Total Time Calls   Average   Minimum   Maximum  Name
# API calls: 74.37%   219.93ms     2  109.96ms  20.433us  219.90ms  ...
#            18.32%   54.169ms   204  265.53us  11.398us  3.5624ms  ...
# API calls: 54.65%   222.03ms     2  111.02ms  20.502us  222.01ms  ...
#            34.88%   141.73ms   204  694.76us  21.168us  7.5486ms  ...

reframechecks.common.sphexa.sanity_nvidia.nvprof_report_momentumEnergy_pct(self)

Reports CUDA Kernel Time (%) for MomentumAndEnergyIAD measured by the tool and averaged over compute nodes

> job.stdout
# (where Name = sphexa::sph::cuda::kernels::computeMomentumAndEnergyIAD)
# Time(%)     Time  Calls   Avg       Min       Max  Name
# 28.25%  13.288ms  4  3.3220ms  3.1001ms  3.4955ms  void ...
# 21.63%  28.565ms  4  7.1414ms  6.6616ms  7.4616ms  void ...
# ^^^^^

reframechecks.common.sphexa.sanity_nvidia.nvprof_version(obj)

Checks tool’s version:

> nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2019 NVIDIA Corporation
Release version 10.2.89 (21)
                ^^^^^^^

Returns: True or False

scorep_openacc.py

class reframechecks.openacc.scorep_openacc.SphExaNativeCheck(*args: Any, **kwargs: Any)

Bases: RegressionTest

This class runs the test code with Score-P (MPI+OpenACC).

Four parameters can be set for the simulation:

Parameters
  • mpitask – number of MPI tasks; the size of the cube in the 3D square patch test is looked up in a dictionary keyed by mpitask (cubesize could instead be exposed as a parameter).

  • steps – number of simulation steps.

  • cycles – set to 0, because OpenACC requires compiler-instrumented code (regions obtained via sampling/unwinding cannot be filtered).

  • rumetric – records Linux resource usage counters, which provide information about consumed resources and operating system events such as user/system time, maximum resident set size, and number of page faults (see man getrusage).

otf_profiler()

Sanity checks

reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_metric_flag(obj)

If SCOREP_METRIC_RUSAGE is defined, returns the otf-profiler flags that keep the tool from segfaulting.

reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_metric_name(obj)

If SCOREP_METRIC_RUSAGE is defined, returns the metric name.
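
The conditional behaviour can be sketched as below. This is an illustration only: it assumes the variable holds a comma-separated list of rusage counter names and that the first one is the metric of interest. The function name rusage_metric_name and the use of a plain mapping in place of the test's environment are assumptions.

```python
def rusage_metric_name(variables):
    """Return the first requested rusage metric, or None if none is set."""
    value = variables.get('SCOREP_METRIC_RUSAGE', '')
    # Assumption: several counters may be listed, separated by commas.
    names = [name.strip() for name in value.split(',') if name.strip()]
    return names[0] if names else None


metric = rusage_metric_name({'SCOREP_METRIC_RUSAGE': 'ru_maxrss'})
```

Returning None when the variable is unset lets the caller skip the metric instead of passing an empty name to the profiler.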

reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_perf_patterns(obj)

Dictionary of default perf_patterns for the tool

reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_read_json_rpt(obj)

Reads the JSON file written by otf_profiler, which is needed for perf_patterns.
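
Reading such a report needs only the standard json module. The sketch below is generic: the layout of the otf-profiler report is not described in this guide, so the helper names and the nested keys used in the usage line are hypothetical.

```python
import json


def read_json_report(path):
    """Load a JSON report from path and return it as a Python object."""
    with open(path, encoding='utf-8') as fp:
        return json.load(fp)


def lookup(report, *keys, default=None):
    """Walk nested dictionaries, returning default if a key is missing."""
    node = report
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical report structure, for illustration only.
value = lookup({'summary': {'runtime': 1.5}}, 'summary', 'runtime')
```

A tolerant lookup of this kind avoids a KeyError when a report lacks an optional section.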

reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_tool_reference(obj)

Dictionary of default reference for the tool
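
In ReFrame, a reference dictionary maps a system name to performance variables, each given as a tuple (value, lower threshold, upper threshold, unit). The sketch below builds such a dictionary with no thresholds, so values would be reported without being checked. The function name default_reference and the metric names are hypothetical.

```python
def default_reference(metrics, system='*'):
    """Build a reference dictionary with no thresholds for each metric."""
    return {
        system: {
            # (reference value, lower threshold, upper threshold, unit)
            name: (0, None, None, unit)
            for name, unit in metrics.items()
        }
    }


# Hypothetical metric names and units, for illustration only.
reference = default_reference({'max_rss': 'KiB', 'runtime': 's'})
```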