Fix Nexus GPU jobs for NERSC/Perlmutter #4699

Merged
merged 1 commit into QMCPACK:develop from perlmutter_gpu
Aug 15, 2023

Conversation

aannabe
Contributor

@aannabe aannabe commented Aug 14, 2023

Proposed changes

The OMP_PROC_BIND and OMP_PLACES environment variables should not be set for GPU jobs on Perlmutter. With this change, Nexus GPU jobs run for me on Perlmutter.
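For readers unfamiliar with the idea, here is a minimal sketch of how a job-script generator could skip the thread-binding variables for GPU jobs. It is not the code changed in this PR: the helper name `omp_env_lines`, its parameters, and the values `true` and `threads` are assumptions chosen for illustration only.

```python
def omp_env_lines(threads, gpu_job):
    """Return shell export lines for the OpenMP part of a job script.

    Hypothetical helper for illustration; not the actual Nexus implementation.
    """
    # The thread count is exported for both CPU and GPU jobs.
    lines = [f"export OMP_NUM_THREADS={threads}"]
    if not gpu_job:
        # Binding variables are emitted for CPU jobs only.
        # The values below are assumed, not taken from this PR.
        lines.append("export OMP_PROC_BIND=true")
        lines.append("export OMP_PLACES=threads")
    return lines


if __name__ == "__main__":
    # Print the lines a GPU job script would receive under this sketch.
    print("\n".join(omp_env_lines(threads=16, gpu_job=True)))
```

With `gpu_job=True` only the `OMP_NUM_THREADS` line is produced, so the binding variables stay unset in the generated script.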

What type(s) of changes does this code introduce?

  • Bugfix

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

NERSC/Perlmutter (Nexus tests are passing)

Checklist

  • Yes. This PR is up to date with the current state of 'develop'

@prckent prckent self-requested a review August 15, 2023 14:43
Contributor

@prckent prckent left a comment

Approving since it allows jobs to run!

However I suspect we might need to revisit this for effective multithreaded GPU offload. Did the environment variable change come as a recommendation from NERSC or through experimentation?

@aannabe
Contributor Author

aannabe commented Aug 15, 2023

The change is inferred from the job script examples given on the NERSC website and through testing. The jobs start but hang when the OMP variables are set.

https://docs.nersc.gov/systems/perlmutter/running-jobs/

@aannabe
Contributor Author

aannabe commented Aug 15, 2023

I agree this should be revisited once a more efficient job specification is achieved.

@prckent
Contributor

prckent commented Aug 15, 2023

Test this please

@ye-luo ye-luo enabled auto-merge August 15, 2023 15:46
@ye-luo ye-luo merged commit b24ba10 into QMCPACK:develop Aug 15, 2023
@aannabe aannabe deleted the perlmutter_gpu branch August 15, 2023 17:17
@prckent prckent mentioned this pull request Aug 18, 2023
3 participants