ensure localrc is created in the correct subdir for NVHPC v22.9+ #3240

Merged
merged 6 commits into from
Apr 3, 2024

Member

@jfgrimm jfgrimm commented Mar 1, 2024

To fix an issue reported in Slack. Currently, compiling the following minimal example fails:

$ cat minimal.cpp
#include <ranges>

int main(){ return 0; }
$ nvc++ -std=c++20 minimal.cpp -o minimal
"minimal.cpp", line 1: catastrophic error: cannot open source file "ranges"
  #include <ranges>
                   ^

1 catastrophic error detected in the compilation of "minimal.cpp".
Compilation terminated.

As noticed by @branfosj:

During the install we use makelocalrc to generate a localrc file. This has a bunch of pointers to the GCC install.
In my working 22.3 install, there is one of these files, in Linux_x86_64/22.3/compilers/bin/localrc
In my broken 24.1 install there are two of them:

Linux_x86_64/24.1/compilers/bin/localrc
Linux_x86_64/24.1/compilers/localrc

The second of these is the one generated by EB and is correct. The first one has a mix of OS GCC and EB GCC in it.
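The duplicate-localrc situation described above can be detected with a short script. The following is only a minimal sketch: `find_localrc_files` and `make_demo_tree` are hypothetical helper names, and the demo tree is a throwaway directory that merely mirrors the two paths quoted above, not a real NVHPC install.

```python
import os
import tempfile


def find_localrc_files(install_dir):
    """Return sorted paths (relative to install_dir) of every file named 'localrc'."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(install_dir):
        if 'localrc' in filenames:
            full_path = os.path.join(dirpath, 'localrc')
            found.append(os.path.relpath(full_path, install_dir))
    return sorted(found)


def make_demo_tree(layout):
    """Create a throwaway directory tree containing the given relative file paths."""
    root = tempfile.mkdtemp()
    for rel in layout:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w') as fh:
            fh.write('# placeholder, not a real localrc\n')
    return root


# Layout mirroring the broken 24.1 install (assumption: only these two files matter)
broken = make_demo_tree([
    'Linux_x86_64/24.1/compilers/bin/localrc',
    'Linux_x86_64/24.1/compilers/localrc',
])
localrc_files = find_localrc_files(broken)
if len(localrc_files) > 1:
    print("more than one localrc found: %s" % ', '.join(localrc_files))
```

Pointing `find_localrc_files` at a real installation prefix would show whether that install has one localrc or two; it does not say which of them the compiler actually reads.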

(created using eb --new-pr)

@jfgrimm jfgrimm added the bug fix label Mar 1, 2024
@jfgrimm jfgrimm added this to the 4.x milestone Mar 1, 2024
@jfgrimm jfgrimm changed the title from "ensure local.rc is created in the correct subdir for NVHPC v22.9+" to "ensure localrc is created in the correct subdir for NVHPC v22.9+" Mar 1, 2024
@jfgrimm
Member Author

jfgrimm commented Mar 1, 2024

Test report by @jfgrimm

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-22.7-CUDA-11.7.0.eb
  • SUCCESS NVHPC-22.9-CUDA-11.7.0.eb
  • SUCCESS NVHPC-22.11-CUDA-11.7.0.eb
  • SUCCESS NVHPC-23.1-CUDA-12.0.0.eb
  • SUCCESS NVHPC-23.7-CUDA-12.1.1.eb

Build succeeded for 5 out of 5 (5 easyconfigs in total)
himem01.viking2.yor.alces.network - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/827262c27e9c29fd815b783377b97e48 for a full test report.

edit: I can confirm that all the above modules can now compile the example without issues

@boegel
Member

boegel commented Mar 1, 2024

@jfgrimm Can we add that minimal test as a sanity check command in sanity_check_step, while we're at it?

@boegel boegel modified the milestones: 4.x, release after 4.9.0 Mar 1, 2024
@jfgrimm
Member Author

jfgrimm commented Mar 1, 2024

@boegel we could, but it will only show up when the system compiler is older than GCC 9, no?
Perhaps we should check there is only one localrc, with sensible settings

@boegel
Member

boegel commented Mar 1, 2024

@boegel we could, but it will only show up when the system compiler is older than GCC 9, no? Perhaps we should check there is only one localrc, with sensible settings

That doesn't seem unlikely to me:

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.8 (Ootpa)
$ which gcc
/usr/bin/gcc
$ gcc -v
...
gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)

It's just good to catch this type of problem early on via a lightweight sanity check; we shouldn't declare success on the installation if it's picking up the wrong GCC...
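Whether the minimal example can expose the problem on a given host depends on the system GCC version. As a rough sketch, the major version can be parsed from a `gcc version ...` line like the one above. `gcc_major_version` and `check_can_catch_problem` are hypothetical names, and the threshold of 9 is taken from the comment above rather than verified independently.

```python
import re


def gcc_major_version(version_output):
    """Extract the GCC major version from 'gcc -v' style output, or None if absent."""
    match = re.search(r'gcc version (\d+)\.', version_output)
    return int(match.group(1)) if match else None


def check_can_catch_problem(version_output, threshold=9):
    """True if the system GCC is older than the threshold (assumed to be 9)."""
    major = gcc_major_version(version_output)
    return major is not None and major < threshold


# Version line copied from the RHEL 8.8 output above
sample = "gcc version 8.5.0 20210514 (Red Hat 8.5.0-18) (GCC)"
print(gcc_major_version(sample), check_can_catch_problem(sample))
```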

@jfgrimm
Member Author

jfgrimm commented Mar 1, 2024

Test report by @jfgrimm

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-22.7-CUDA-11.7.0.eb
  • SUCCESS NVHPC-22.9-CUDA-11.7.0.eb
  • SUCCESS NVHPC-22.11-CUDA-11.7.0.eb
  • SUCCESS NVHPC-23.1-CUDA-12.0.0.eb
  • SUCCESS NVHPC-23.7-CUDA-12.1.1.eb

Build succeeded for 5 out of 5 (5 easyconfigs in total)
himem01.viking2.yor.alces.network - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/6f845306a003de633715871e698f4d1d for a full test report.

@jfgrimm
Member Author

jfgrimm commented Mar 1, 2024

@boegel done

@boegel
Member

boegel commented Mar 1, 2024

@boegelbot please test @ generoso
EB_ARGS="NVHPC-24.1-CUDA-12.3.0.eb --sanity-check-only"

@branfosj
Member

branfosj commented Mar 1, 2024

Test report by @branfosj

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
bear-pg0105u03b - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/8c593983c4914e341aed350fa254fe2a for a full test report.

@boegelbot

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=3240 EB_ARGS="NVHPC-24.1-CUDA-12.3.0.eb --sanity-check-only" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3240 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13006

Test results coming soon (I hope)...

- notification for comment with ID 1973357008 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/10d893a3cd0d7cba3970bdd6636a367e for a full test report.

@branfosj
Member

branfosj commented Mar 1, 2024

My failure is with EB5

nvc++ is checked:

== 2024-03-01 15:08:04,681 run.py:379 INFO Shell command completed successfully (see output above): nvc++ -v

but then we unload the module

== 2024-03-01 15:08:05,097 run.py:379 INFO Shell command completed successfully (see output above): /usr/share/lmod/lmod/libexec/lmod python unload NVHPC/20.11

so nvc++ is no longer found

== 2024-03-01 15:08:05,102 run.py:578 INFO running cmd: nvc++ -std=c++20 minimal.cpp -o minimal 
== 2024-03-01 15:08:07,590 build_log.py:171 ERROR EasyBuild encountered an error (at easybuild/src/easybuild-framework/easybuild/base/exceptions.py:126 in __init__): cmd "nvc++ -std=c++20 minimal.cpp -o minimal" exited with exit code 127 and output:
/bin/bash: nvc++: command not found

@boegel
Member

boegel commented Mar 1, 2024

@branfosj We should pass the nvc++ command via custom_commands, then that problem shouldn't occur.

I'll take a quick look at that...
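A minimal sketch of that idea, not the code that was actually merged: the compile step is expressed as a single shell command string that can be appended to a `custom_commands` list. `minimal_compile_command` is a hypothetical helper, and the pre-existing `nvc++ -v` entry is assumed from the log above.

```python
import os
import tempfile
from shlex import quote

# Same minimal example as in the PR description
MINIMAL_EXAMPLE = "#include <ranges>\n\nint main(){ return 0; }\n"


def minimal_compile_command(workdir, compiler='nvc++'):
    """Write minimal.cpp into workdir and return a shell command that compiles it there."""
    with open(os.path.join(workdir, 'minimal.cpp'), 'w') as fh:
        fh.write(MINIMAL_EXAMPLE)
    return "cd %s && %s -std=c++20 minimal.cpp -o minimal" % (quote(workdir), compiler)


custom_commands = ["nvc++ -v"]  # assumed existing check, as seen in the log
custom_commands.append(minimal_compile_command(tempfile.mkdtemp()))

# In an easyblock, the list would then be handed to the framework, e.g.:
#   super().sanity_check_step(custom_paths=custom_paths, custom_commands=custom_commands)
print(custom_commands[-1])
```

The point of going through `custom_commands` is that the framework decides when the command runs, instead of the easyblock running it on its own after the module may already have been unloaded.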

@jfgrimm
Member Author

jfgrimm commented Mar 1, 2024

ah right
I assume I didn't see that because we rpath

run minimal NVHPC compilation command via custom_commands passed to sanity_check_step
@boegel
Member

boegel commented Mar 1, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="NVHPC-24.1-CUDA-12.3.0.eb NVHPC-22.7-CUDA-11.7.0.eb"
EB_BRANCH=5.0.x

boegel
boegel previously approved these changes Mar 1, 2024
@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ "5.0.x" != 'develop' ]]; then EB_BRANCH="5.0.x" ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/"5.0.x" source init_env_easybuild_develop.sh; fi; EB_PR=3240 EB_ARGS="NVHPC-24.1-CUDA-12.3.0.eb NVHPC-22.7-CUDA-11.7.0.eb" EB_REPO=easybuild-easyblocks EB_BRANCH="5.0.x" /opt/software/slurm/bin/sbatch --job-name test_PR_3240 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3706

Test results coming soon (I hope)...

- notification for comment with ID 1973401312 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 1 out of 2 (2 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/f75bf019c003d8b01bd9132dd5f65ce6 for a full test report.

@jfgrimm
Member Author

jfgrimm commented Mar 1, 2024

@boegel are we using an unsupported GCC/CUDA combination in NVHPC 24.1? I'm not sure there is a CUDA release that supports GCC 13.x yet...

edit: hmm, compiling the example with NVHPC 24.1 doesn't fail for me

@boegel
Member

boegel commented Mar 2, 2024

@boegelbot please test @ generoso
EB_ARGS="NVIDIA-20.11.eb NVIDIA-20.7.eb NVIDIA-20.9.eb NVIDIA-21.11.eb NVIDIA-21.2.eb NVIDIA-21.3.eb NVIDIA-21.5.eb NVIDIA-21.9.eb NVIDIA-22.1-CUDA-11.4.1.eb NVIDIA-22.11-CUDA-11.7.0.eb NVIDIA-22.7-CUDA-11.7.0.eb NVIDIA-22.9-CUDA-11.7.0.eb NVIDIA-23.1-CUDA-12.0.0.eb NVIDIA-23.7-CUDA-12.1.1.eb NVIDIA-23.7-CUDA-12.2.0.eb NVIDIA-24.1-CUDA-12.3.0.eb"

@boegel
Member

boegel commented Mar 2, 2024

@boegel are we using an unsupported GCC/CUDA combination in NVHPC 24.1? I'm not sure there is a CUDA release that supports GCC 13.x yet...

edit: hmm, compiling the example with NVHPC 24.1 doesn't fail for me

I suspect it may be a problem with RHEL 9.x more than with the GCC being used...

That issue shouldn't block this PR.

@boegelbot

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=3240 EB_ARGS="NVIDIA-20.11.eb NVIDIA-20.7.eb NVIDIA-20.9.eb NVIDIA-21.11.eb NVIDIA-21.2.eb NVIDIA-21.3.eb NVIDIA-21.5.eb NVIDIA-21.9.eb NVIDIA-22.1-CUDA-11.4.1.eb NVIDIA-22.11-CUDA-11.7.0.eb NVIDIA-22.7-CUDA-11.7.0.eb NVIDIA-22.9-CUDA-11.7.0.eb NVIDIA-23.1-CUDA-12.0.0.eb NVIDIA-23.7-CUDA-12.1.1.eb NVIDIA-23.7-CUDA-12.2.0.eb NVIDIA-24.1-CUDA-12.3.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3240 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13014

Test results coming soon (I hope)...

- notification for comment with ID 1974785517 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@SebastianAchilles
Member

@boegelbot please test @ generoso
EB_ARGS="NVHPC-20.11.eb NVHPC-20.7.eb NVHPC-20.9.eb NVHPC-21.11.eb NVHPC-21.2.eb NVHPC-21.3.eb NVHPC-21.5.eb NVHPC-21.9.eb NVHPC-22.11-CUDA-11.7.0.eb NVHPC-22.7-CUDA-11.7.0.eb NVHPC-22.9-CUDA-11.7.0.eb NVHPC-23.1-CUDA-12.0.0.eb NVHPC-23.7-CUDA-12.1.1.eb NVHPC-23.7-CUDA-12.2.0.eb NVHPC-24.1-CUDA-12.3.0.eb"

@boegelbot

@SebastianAchilles: Request for testing this PR well received on login1

PR test command 'EB_PR=3240 EB_ARGS="NVHPC-20.11.eb NVHPC-20.7.eb NVHPC-20.9.eb NVHPC-21.11.eb NVHPC-21.2.eb NVHPC-21.3.eb NVHPC-21.5.eb NVHPC-21.9.eb NVHPC-22.11-CUDA-11.7.0.eb NVHPC-22.7-CUDA-11.7.0.eb NVHPC-22.9-CUDA-11.7.0.eb NVHPC-23.1-CUDA-12.0.0.eb NVHPC-23.7-CUDA-12.1.1.eb NVHPC-23.7-CUDA-12.2.0.eb NVHPC-24.1-CUDA-12.3.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3240 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13041

Test results coming soon (I hope)...

- notification for comment with ID 1981851883 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 12 out of 15 (15 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/aec80a45ae80021a23c15749020b2b21 for a full test report.

Co-authored-by: Mikael Öhman <micketeer@gmail.com>
Contributor

@Micket Micket left a comment


lgtm

@Micket
Contributor

Micket commented Apr 2, 2024

@boegelbot please test @ generoso
EB_ARGS="NVHPC-20.11.eb NVHPC-20.7.eb NVHPC-20.9.eb"

@boegelbot

@Micket: Request for testing this PR well received on login1

PR test command 'EB_PR=3240 EB_ARGS="NVHPC-20.11.eb NVHPC-20.7.eb NVHPC-20.9.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3240 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13246

Test results coming soon (I hope)...

- notification for comment with ID 2032452776 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Micket
Contributor

Micket commented Apr 2, 2024

Test report by @Micket

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-23.1-CUDA-12.0.0.eb
  • SUCCESS NVHPC-23.7-CUDA-12.1.1.eb

Build succeeded for 2 out of 2 (2 easyconfigs in total)
vera-icelake-build - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/Micket/a8a31954082d2d1af62948515649954e for a full test report.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-20.11.eb
  • SUCCESS NVHPC-20.7.eb
  • SUCCESS NVHPC-20.9.eb

Build succeeded for 3 out of 3 (3 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/7b477991b0fe982422a304a60f26a9f6 for a full test report.

@Micket
Contributor

Micket commented Apr 2, 2024

@boegelbot please test @ jsc-zen3
EB_ARGS="NVHPC-22.11-CUDA-11.7.0.eb NVHPC-22.7-CUDA-11.7.0.eb NVHPC-22.9-CUDA-11.7.0.eb NVHPC-23.1-CUDA-12.0.0.eb NVHPC-23.7-CUDA-12.1.1.eb NVHPC-23.7-CUDA-12.2.0.eb NVHPC-24.1-CUDA-12.3.0.eb"

@boegelbot

@Micket: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3240 EB_ARGS="NVHPC-22.11-CUDA-11.7.0.eb NVHPC-22.7-CUDA-11.7.0.eb NVHPC-22.9-CUDA-11.7.0.eb NVHPC-23.1-CUDA-12.0.0.eb NVHPC-23.7-CUDA-12.1.1.eb NVHPC-23.7-CUDA-12.2.0.eb NVHPC-24.1-CUDA-12.3.0.eb" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3240 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3911

Test results coming soon (I hope)...

- notification for comment with ID 2032514951 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-22.11-CUDA-11.7.0.eb
  • SUCCESS NVHPC-22.7-CUDA-11.7.0.eb
  • SUCCESS NVHPC-22.9-CUDA-11.7.0.eb
  • SUCCESS NVHPC-23.1-CUDA-12.0.0.eb
  • SUCCESS NVHPC-23.7-CUDA-12.1.1.eb
  • FAIL (build issue) NVHPC-24.1-CUDA-12.3.0.eb (partial log available at https://gist.github.com/boegelbot/42c6f7236775f1503832805e6885a53b)
  • SUCCESS CUDA-12.2.0.eb
  • SUCCESS NVHPC-23.7-CUDA-12.2.0.eb

Build succeeded for 7 out of 8 (7 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/9b64f2055cd19da230b993d78994cd65 for a full test report.

Member

@boegel boegel left a comment


lgtm

@Micket
Contributor

Micket commented Apr 2, 2024

Not sure what to make of the NVHPC-24.1-CUDA-12.3.0.eb failure on jsc-zen3.

@boegel
Member

boegel commented Apr 2, 2024

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS NVHPC-21.2.eb
  • SUCCESS NVHPC-22.9-CUDA-11.7.0.eb
  • SUCCESS NVHPC-24.1-CUDA-12.3.0.eb

Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3125.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/d4e98801b9360d640406797adac316c1 for a full test report.

@boegel
Member

boegel commented Apr 3, 2024

Not sure what to make of the NVHPC-24.1-CUDA-12.3.0.eb failure on jsc-zen3.

@Micket That's most likely due to GCC 13.2 and CUDA 12.3.0 not being compatible, see also #20158, so that shouldn't block this PR.

@boegel boegel merged commit b9d09f7 into easybuilders:develop Apr 3, 2024
47 checks passed
6 participants