Translations:OpenACC Tutorial - Optimizing loops/19/en

From Alliance Doc

As described in the third section of this tutorial, open the NVIDIA Visual Profiler and start a new session with the latest executable we have built. Then follow these steps (screenshots of each step are shown alongside):

  1. Go to the "Analysis" tab and click "Examine GPU Usage". Once the analysis has run, the profiler displays a series of warnings that point to what might be improved.
  2. Then click "Examine Individual Kernels". This shows a list of kernels.
  3. Select the top one and click "Perform Kernel Analysis". The profiler shows a more detailed analysis of this specific kernel and highlights the most likely bottleneck. In this case, performance is limited by memory latency.
  4. Click "Perform Latency Analysis".