GPU Reference Guide
Regression tests
nsys_cuda.py
- class reframechecks.nvidia.nsys_cuda.SphExaNsysCudaCheck(*args: Any, **kwargs: Any)
Bases:
reframe.
This class runs the test code with NVIDIA Nsight Systems (nsys), using at least 2 MPI tasks: https://docs.nvidia.com/nsight-systems/index.html
Available analysis types are listed by:
nsys profile --help
Two parameters can be set for the simulation:
- Parameters
mpitask – number of MPI tasks. The size of the cube in the 3D square patch test is looked up in a dictionary keyed by mpitask; cubesize could also be added to the list of parameters.
steps – number of simulation steps.
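The lookup of the cube size from mpitask could be sketched roughly as follows. The dictionary name, its values, and the helper `cubesize_for` are hypothetical placeholders chosen for illustration; they are not the values used by the check itself.

```python
# Hypothetical mapping from number of MPI tasks to cube side length.
# The real check defines its own dictionary; these numbers are placeholders.
CUBESIZE_BY_MPITASK = {2: 40, 4: 50, 8: 64}


def cubesize_for(mpitask: int) -> int:
    """Return the cube side associated with a given number of MPI tasks."""
    if mpitask < 2:
        # The class description states a minimum of 2 MPI tasks.
        raise ValueError("at least 2 MPI tasks are required")
    try:
        return CUBESIZE_BY_MPITASK[mpitask]
    except KeyError:
        raise ValueError(f"no cube size defined for mpitask={mpitask}") from None
```

Keeping the sizes in one dictionary means a new task count only needs one new entry, and an unsupported count fails early with a clear message.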
Typical performance reporting:
- Versions:
cudatoolkit/10.2.89 has nsys/2019.5.2.16-b54ef97
nvhpc/2020_207-cuda-10.2 has nsys/2020.3.1.54-2bd2a65
nvidia-nsight-systems/2020.3.1.72 has nsys/2020.3.1.72-e5b8014 <--
Sanity checks
- reframechecks.common.sphexa.sanity_nvidia.nsys_perf_patterns(obj)
Dictionary of default nsys_perf_patterns for the tool
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_DtoH_KiB(self)
Reports [CUDA memcpy DtoH] Memory Operation (KiB) measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA Memory Operation Statistics (KiB)
#
#             Total      Operations            Average            Minimum
# -----------------  --------------  -----------------  -----------------
#         1530313.0             296             5170.0              0.055
#           16500.0              84              196.4             62.500
#           *******
# ...
#           Maximum  Name
# -----------------  -------------------
#           81250.0  [CUDA memcpy HtoD]
#             250.0  [CUDA memcpy DtoH]
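As a rough illustration of how the Total (KiB) value of one memcpy row can be pulled out of such a report, here is a minimal sketch using only the standard `re` module. The function name `memcpy_total_kib` is hypothetical, and the sketch assumes that each row sits on a single line in the real job.stdout (the docstring wraps the Maximum and Name columns to fit the page).

```python
import re

# Assumption: one row per line, with all six columns present.
SAMPLE_STDOUT = """\
CUDA Memory Operation Statistics (KiB)

            Total      Operations            Average            Minimum            Maximum  Name
-----------------  --------------  -----------------  -----------------  -----------------  -------------------
        1530313.0             296             5170.0              0.055            81250.0  [CUDA memcpy HtoD]
          16500.0              84              196.4             62.500              250.0  [CUDA memcpy DtoH]
"""


def memcpy_total_kib(stdout: str, name: str) -> float:
    """Return the Total (KiB) column of the row whose Name equals `name`."""
    pattern = (
        r"^\s*(?P<total>\d+(?:\.\d+)?)"  # Total
        r"\s+\d+"                        # Operations
        r"(?:\s+\d+(?:\.\d+)?){3}"       # Average, Minimum, Maximum
        r"\s+" + re.escape(name) + r"\s*$"
    )
    match = re.search(pattern, stdout, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no row named {name!r} in the report")
    return float(match.group("total"))


dtoh_kib = memcpy_total_kib(SAMPLE_STDOUT, "[CUDA memcpy DtoH]")
htod_kib = memcpy_total_kib(SAMPLE_STDOUT, "[CUDA memcpy HtoD]")
```

The same helper covers both transfer directions because the row name is passed in and escaped, so the square brackets are matched literally.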
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_DtoH_pct(self)
Reports [CUDA memcpy DtoH] Time(%) measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA Memory Operation Statistics (nanoseconds)
#
# Time(%)      Total Time  Operations         Average  ...
# -------  --------------  ----------  --------------  ...
#     0.9         1385579          84         16495.0  ...
#    ****
#
#        Minimum         Maximum  Name
# --------------  --------------  -------------------
#           6144           21312  [CUDA memcpy DtoH]
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_HtoD_KiB(self)
Reports [CUDA memcpy HtoD] Memory Operation (KiB) measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA Memory Operation Statistics (KiB)
#
#             Total      Operations            Average            Minimum
# -----------------  --------------  -----------------  -----------------
#         1530313.0             296             5170.0              0.055
#         *********
#           16500.0              84              196.4             62.500
# ...
#           Maximum  Name
# -----------------  -------------------
#           81250.0  [CUDA memcpy HtoD]
#             250.0  [CUDA memcpy DtoH]
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_HtoD_pct(self)
Reports [CUDA memcpy HtoD] Time(%) measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA Memory Operation Statistics (nanoseconds)
#
# Time(%)      Total Time  Operations         Average  ...
# -------  --------------  ----------  --------------  ...
#    99.1       154400354         296        521622.8  ...
#    ****
#
#        Minimum         Maximum  Name
# --------------  --------------  -------------------
#            896         8496291  [CUDA memcpy HtoD]
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_computeIAD_pct(self)
Reports CUDA Kernel Time (%) for computeIAD measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA Kernel Statistics (nanoseconds)
#
# Time(%)      Total Time   Instances         Average         Minimum
# -------  --------------  ----------  --------------  --------------
#    49.7        69968829           6      11661471.5        11507063
#    26.4        37101887           6       6183647.8         6047175
#    ****
#    24.0        33719758          24       1404989.9         1371531
# ...
#        Maximum  Name
# --------------  ------------------
#       11827539  computeMomentumAndEnergyIAD
#        6678078  computeIAD
#        1459594  density
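The per-kernel Time(%) values in a report like this can be collected into a dictionary keyed by kernel name. The following is only a sketch: `kernel_time_pct` is a hypothetical helper, and it assumes every data row has exactly seven whitespace-separated fields on one line, with the kernel name last.

```python
# Assumption: one row per line, seven columns, kernel name in the last column.
SAMPLE_STDOUT = """\
CUDA Kernel Statistics (nanoseconds)

Time(%)      Total Time   Instances         Average         Minimum         Maximum  Name
-------  --------------  ----------  --------------  --------------  --------------  ------------------
   49.7        69968829           6      11661471.5        11507063        11827539  computeMomentumAndEnergyIAD
   26.4        37101887           6       6183647.8         6047175         6678078  computeIAD
   24.0        33719758          24       1404989.9         1371531         1459594  density
"""


def kernel_time_pct(stdout: str) -> dict:
    """Map each kernel name to its Time(%) value."""
    result = {}
    for line in stdout.splitlines():
        fields = line.split()
        # Keep only data rows: seven fields, the first one a plain number.
        if len(fields) == 7 and fields[0].replace(".", "", 1).isdigit():
            result[fields[6]] = float(fields[0])
    return result


pct = kernel_time_pct(SAMPLE_STDOUT)
```

Looking up `pct["computeIAD"]` or `pct["computeMomentumAndEnergyIAD"]` then gives the value that the corresponding report function is described as returning for a single node.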
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_cudaMemcpy_pct(self)
Reports CUDA API Time (%) for cudaMemcpy measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA API Statistics (nanoseconds)
#
# Time(%)      Total Time       Calls         Average         Minimum
# -------  --------------  ----------  --------------  --------------
#    44.9       309427138         378        818590.3            9709
#    ****
#    40.6       279978449           2     139989224.5           24173
#     9.5        65562201         308        212864.3             738
#     4.9        33820196         306        110523.5            2812
#     0.1          704223          36         19561.8            9305
# ....
#        Maximum  Name
# --------------  ------------------
#       11665852  cudaMemcpy
#      279954276  cudaMemcpyToSymbol
#        3382747  cudaFree
#         591094  cudaMalloc
#          34042  cudaLaunch
- reframechecks.common.sphexa.sanity_nvidia.nsys_report_momentumEnergy_pct(self)
Reports CUDA Kernel Time (%) for MomentumAndEnergyIAD measured by the tool and averaged over compute nodes:
> job.stdout
# CUDA Kernel Statistics (nanoseconds)
#
# Time(%)      Total Time   Instances         Average         Minimum
# -------  --------------  ----------  --------------  --------------
#    49.7        69968829           6      11661471.5        11507063
#    ****
#    26.4        37101887           6       6183647.8         6047175
#    24.0        33719758          24       1404989.9         1371531
# ...
#        Maximum  Name
# --------------  ------------------
#       11827539  computeMomentumAndEnergyIAD
#        6678078  computeIAD
#        1459594  density
- reframechecks.common.sphexa.sanity_nvidia.nsys_version(obj)
Checks tool’s version:
> nsys --version
NVIDIA Nsight Systems version 2020.1.1.65-085319d
returns: True or False
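A version check of this kind can be sketched with a single regular expression. What the real function compares the version against is not stated here, so the `expected_prefix` argument and the name `nsys_version_ok` are assumptions made for illustration.

```python
import re


def nsys_version_ok(stdout: str, expected_prefix: str) -> bool:
    """Return True if the reported nsys version starts with expected_prefix."""
    match = re.search(r"NVIDIA Nsight Systems version (\S+)", stdout)
    # No match means the tool did not print a recognizable version line.
    return match is not None and match.group(1).startswith(expected_prefix)


ok = nsys_version_ok(
    "NVIDIA Nsight Systems version 2020.1.1.65-085319d", "2020.1"
)
```

Matching on a prefix rather than the full string keeps the check stable across build hashes such as the trailing `-085319d`.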
- reframechecks.common.sphexa.sanity_nvidia.nvprof_perf_patterns(obj)
Dictionary of default nvprof_perf_patterns for the tool
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_DtoH_KiB(self)
Reports [CUDA memcpy DtoH] Memory Operation (KiB) measured by the tool and averaged over compute nodes (TODO)
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_DtoH_pct(self)
Reports [CUDA memcpy DtoH] Time(%) measured by the tool and averaged over compute nodes:
> job.stdout (Name: [CUDA memcpy DtoH])
# Time(%)      Time  Calls       Avg       Min       Max
#   2.80%  1.3194ms     44  29.986us  29.855us  30.528us  [CUDA memcpy DtoH]
#   1.34%  1.7667ms     44  40.152us  39.519us  41.887us  [CUDA memcpy DtoH]
#   ^^^^
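The averaging over compute nodes mentioned in these reports might look like the sketch below. It assumes that each matching row in job.stdout comes from one node, which is an interpretation of the two-row sample rather than something the source states; `mean_time_pct` is a hypothetical name.

```python
import re
from statistics import mean

# Assumption: one matching row per compute node.
SAMPLE_STDOUT = """\
  2.80%  1.3194ms     44  29.986us  29.855us  30.528us  [CUDA memcpy DtoH]
  1.34%  1.7667ms     44  40.152us  39.519us  41.887us  [CUDA memcpy DtoH]
"""


def mean_time_pct(stdout: str, name: str) -> float:
    """Average the Time(%) of every row that ends with `name`."""
    # '.' does not cross newlines, so each match stays within one row.
    pattern = r"(\d+(?:\.\d+)?)%.*" + re.escape(name)
    values = [float(v) for v in re.findall(pattern, stdout)]
    if not values:
        raise ValueError(f"no row named {name!r} in the report")
    return mean(values)


dtoh_pct = mean_time_pct(SAMPLE_STDOUT, "[CUDA memcpy DtoH]")
```

With the two sample rows, the result is the mean of 2.80 and 1.34.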
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_HtoD_KiB(self)
Reports [CUDA memcpy HtoD] Memory Operation (KiB) measured by the tool and averaged over compute nodes (TODO)
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_HtoD_pct(self)
Reports [CUDA memcpy HtoD] Time(%) measured by the tool and averaged over compute nodes:
> job.stdout (Name: [CUDA memcpy HtoD])
#            Type  Time(%)      Time  Calls       Avg    Min       Max
# GPU activities:   48.57%  22.849ms    162  141.04us  896ns  1.6108ms
# GPU activities:   56.12%  74.108ms    162  457.45us  928ns  5.8896ms
#                   ^^^^^
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_computeIAD_pct(self)
Reports CUDA Kernel Time (%) for computeIAD measured by the tool and averaged over compute nodes:
> job.stdout
# (where Name = sphexa::sph::cuda::kernels::computeIAD)
# Time(%)      Time  Calls       Avg       Min       Max  Name
#  12.62%  5.9380ms      4  1.4845ms  1.3352ms  1.6593ms  void ...
#  10.54%  13.915ms      4  3.4788ms  3.3458ms  3.7058ms  void ...
#  ^^^^^
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_cudaMemcpy_pct(self)
Reports CUDA API Time (%) for cudaMemcpy measured by the tool and averaged over compute nodes:
> job.stdout (where Name = cudaMemcpy|cudaMemcpyToSymbol)
#            Time(%)  Total Time  Calls   Average   Minimum   Maximum  Name
# API calls:  74.37%    219.93ms      2  109.96ms  20.433us  219.90ms  ...
#             18.32%    54.169ms    204  265.53us  11.398us  3.5624ms  ...
# API calls:  54.65%    222.03ms      2  111.02ms  20.502us  222.01ms  ...
#             34.88%    141.73ms    204  694.76us  21.168us  7.5486ms  ...
- reframechecks.common.sphexa.sanity_nvidia.nvprof_report_momentumEnergy_pct(self)
Reports CUDA Kernel Time (%) for MomentumAndEnergyIAD measured by the tool and averaged over compute nodes:
> job.stdout
# (where Name = sphexa::sph::cuda::kernels::computeMomentumAndEnergyIAD)
# Time(%)      Time  Calls       Avg       Min       Max  Name
#  28.25%  13.288ms      4  3.3220ms  3.1001ms  3.4955ms  void ...
#  21.63%  28.565ms      4  7.1414ms  6.6616ms  7.4616ms  void ...
#  ^^^^^
- reframechecks.common.sphexa.sanity_nvidia.nvprof_version(obj)
Checks tool’s version:
> nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2019 NVIDIA Corporation
Release version 10.2.89 (21)
                ^^^^^^^
returns: True or False
scorep_openacc.py
- class reframechecks.openacc.scorep_openacc.SphExaNativeCheck(*args: Any, **kwargs: Any)
Bases:
reframe.
This class runs the test code with Score-P (MPI+OpenACC).
Four parameters can be set for the simulation:
- Parameters
mpitask – number of MPI tasks. The size of the cube in the 3D square patch test is looked up in a dictionary keyed by mpitask; cubesize could also be added to the list of parameters.
steps – number of simulation steps.
cycles – compiler-instrumented code is required for OpenACC (regions obtained via sampling/unwinding cannot be filtered), so cycles is set to 0.
rumetric – records Linux resource usage counters, which provide information about consumed resources and operating system events such as user/system time, maximum resident set size, and number of page faults (see man getrusage).
- otf_profiler()
Sanity checks
- reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_metric_flag(obj)
If SCOREP_METRIC_RUSAGE is defined, returns the otf-profiler flags so that it will not segfault.
- reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_metric_name(obj)
If SCOREP_METRIC_RUSAGE is defined, returns the metric name.
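The "only if SCOREP_METRIC_RUSAGE is defined" behaviour can be sketched as a simple environment lookup. This is an assumption about the mechanism, not the actual implementation: the helper name `metric_name` is hypothetical, as is the choice to return an empty string when the variable is unset, and `ru_maxrss` is used only as an example value.

```python
import os
from typing import Mapping, Optional


def metric_name(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the rusage metric name, or '' when the variable is unset."""
    # Accept an explicit mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if environ is None else environ
    return env.get("SCOREP_METRIC_RUSAGE", "")


name = metric_name({"SCOREP_METRIC_RUSAGE": "ru_maxrss"})
```

Passing the environment in as a parameter keeps the function easy to test and leaves the process environment untouched.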
- reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_perf_patterns(obj)
Dictionary of default perf_patterns for the tool.
- reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_read_json_rpt(obj)
Reads the JSON file reported by otf_profiler, needed for perf_patterns.
- reframechecks.common.sphexa.sanity_scorep_openacc.otf2cli_tool_reference(obj)
Dictionary of default reference for the tool.