Skip to content

Performance tuning for NVIDIA Grace-Hopper for the Gordon Bell runs #987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ntselepidis
Copy link
Contributor

@ntselepidis ntselepidis commented Aug 17, 2025

User description

This PR will add some changes to reduce registers and improve performance.
Tested with NVHPC nightly on single Santis node with 4 Grace-Hoppers on 3D_IGR_TaylorGreenVortex_nvidia case.


PR Type

Enhancement


Description

  • Add GPU register limit optimization for NVIDIA Grace-Hopper

  • Include GPU memory management directive for Jacobian arrays


Diagram Walkthrough

flowchart LR
  A["CMakeLists.txt"] -- "add maxregcount:165" --> B["GPU Register Limit"]
  C["m_igr.fpp"] -- "add GPU_DECLARE" --> D["GPU Memory Management"]
  B --> E["Performance Optimization"]
  D --> E
Loading

File Walkthrough

Relevant files
Configuration changes
CMakeLists.txt
Set GPU register count limit                                                         

CMakeLists.txt

  • Add -gpu=maxregcount:165 compiler flag to limit GPU register usage
+1/-1     
Enhancement
m_igr.fpp
Add GPU memory management directive                                           

src/simulation/m_igr.fpp

  • Add $:GPU_DECLARE directive for Jacobian arrays (jac, jac_rhs,
    jac_old)
+1/-0     

@ntselepidis ntselepidis changed the title Do some performance tuning for NVIDIA systems Performance tuning for NVIDIA Grace-Hopper for the Gordon Bell runs Aug 17, 2025
@ntselepidis ntselepidis marked this pull request as ready for review August 17, 2025 13:40
@ntselepidis ntselepidis requested review from a team as code owners August 17, 2025 13:40
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The $:GPU_DECLARE(create='[jac, jac_rhs, jac_old]') is introduced for pointer variables jac, jac_rhs, and jac_old. If these pointers are later associated/allocated differently than expected, the directive may not correctly manage their device residency, potentially causing mismatched host/device pointers or lifetimes. Validate that allocation/association and deallocation flows are compatible with this directive.

real(wp), pointer, contiguous, dimension(:, :, :) :: jac, jac_rhs, jac_old
$:GPU_DECLARE(create='[jac, jac_rhs, jac_old]')
real(wp), allocatable, dimension(:, :, :), pinned, target :: jac_host
Perf Risk

Hard-coding -gpu=maxregcount:165 may improve occupancy on GH200 but can cause spills and regressions on other NVIDIA GPUs or with different kernels/configurations. Consider guarding by architecture, making it configurable, or validating via ptxinfo to ensure no excessive spilling occurs.

target_compile_options(${a_target}
    PRIVATE -gpu=keep,ptxinfo,lineinfo -gpu=maxregcount:165
)

Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Merge GPU flags into one

Combine GPU flags into a single -gpu option to avoid later flags overriding
earlier ones in NVHPC. Using multiple -gpu options can lead to only the last one
being honored, silently dropping requested features. Merge them with commas to
ensure all settings apply.

CMakeLists.txt [501-503]

 target_compile_options(${a_target}
-    PRIVATE -gpu=keep,ptxinfo,lineinfo -gpu=maxregcount:165
+    PRIVATE -gpu=keep,ptxinfo,lineinfo,maxregcount:165
 )
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that using multiple -gpu flags with the NVHPC compiler can lead to earlier flags being overridden, and proposes the correct fix of merging them.

Medium
Fix GPU declare list syntax

Ensure the GPU declaration uses the correct list syntax expected by your
codegen/macros. If the directive expects a comma-separated list without
brackets, the current bracketed form may be ignored at compile-time, leaving
jac
arrays undeclared on GPU.
*

src/simulation/m_igr.fpp [29-30]

 real(wp), pointer, contiguous, dimension(:, :, :) :: jac, jac_rhs, jac_old
-$:GPU_DECLARE(create='[jac, jac_rhs, jac_old]')
+$:GPU_DECLARE(create='jac, jac_rhs, jac_old')
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies a potential syntax issue in the custom $:GPU_DECLARE directive, but it is speculative as it depends on the macro's specific implementation which is not visible.

Low
  • More

Copy link

codecov bot commented Aug 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.93%. Comparing base (88c0a11) to head (a1ed1a2).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #987   +/-   ##
=======================================
  Coverage   40.93%   40.93%           
=======================================
  Files          70       70           
  Lines       20288    20288           
  Branches     2517     2517           
=======================================
  Hits         8305     8305           
  Misses      10447    10447           
  Partials     1536     1536           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sbryngelson
Copy link
Member

it seems the maxregisters flag should be guarded by the device type (i'm sure some devices don't have that many registers)

@ntselepidis ntselepidis marked this pull request as draft August 18, 2025 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants