Performance tuning for NVIDIA Grace-Hopper for the Gordon Bell runs #987
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (master vs. #987)
            master    #987
Coverage    40.93%    40.93%
Files       70        70
Lines       20288     20288
Branches    2517      2517
Hits        8305      8305
Misses      10447     10447
Partials    1536      1536
It seems the maxregcount flag should be guarded by the device type (I'm sure some devices don't have that many registers).
User description
This PR adds changes that reduce register usage and improve performance.
Tested with an NVHPC nightly build on a single Santis node with 4 Grace-Hoppers, using the 3D_IGR_TaylorGreenVortex_nvidia case.
PR Type
Enhancement
Description
Add GPU register limit optimization for NVIDIA Grace-Hopper
Include GPU memory management directive for Jacobian arrays
File Walkthrough
CMakeLists.txt (CMakeLists.txt)
Set GPU register count limit
-gpu=maxregcount:165 compiler flag to limit GPU register usage
m_igr.fpp (src/simulation/m_igr.fpp)
Add GPU memory management directive
$:GPU_DECLARE directive for Jacobian arrays (jac, jac_rhs, jac_old)
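
One possible way to address the review comment about guarding the register limit by device type is sketched below. This is not the change made in this PR. The cache variable MFC_GPU_ARCH and the target name simulation are assumptions made for illustration, and the sketch assumes the target has already been defined earlier in CMakeLists.txt. Only the -gpu=maxregcount:165 flag itself comes from the PR.

```cmake
# Hypothetical sketch: apply the register cap only for NVHPC builds that
# target Hopper-class GPUs (compute capability 9.0, "cc90").
# MFC_GPU_ARCH is an assumed variable name, not one taken from this PR.
set(MFC_GPU_ARCH "cc90" CACHE STRING "Target GPU compute capability (assumed variable)")

if (CMAKE_Fortran_COMPILER_ID STREQUAL "NVHPC" AND MFC_GPU_ARCH STREQUAL "cc90")
    # `simulation` is a placeholder target name; it must already exist.
    # The flag value 165 is the one quoted in this PR.
    target_compile_options(simulation PRIVATE -gpu=maxregcount:165)
endif()
```

With a guard of this kind, builds for other devices or other compilers would keep the compiler's default register allocation, so the value 165 would only affect the hardware it was tested on.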