add patch to improve CUDA 11 compatibility of GCCcore/12.2.0 and GCCcore/12.3.0 #18854

Merged

Flamefire
Contributor

(created using eb --new-pr)

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml3 - Linux RHEL 7.6, POWER, 8335-GTX, 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/10e05632966712c7387566bf1015c953 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusi8016 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor, 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/24520e3f258c065e8d17eed9e5c14d2c for a full test report.

@boegel changed the title from "improve CUDA 11 compatibility of GCC 12.2" to "improve CUDA 11 compatibility of GCCcore/12.2.0" on Sep 27, 2023
@boegel changed the title from "improve CUDA 11 compatibility of GCCcore/12.2.0" to "add patch to improve CUDA 11 compatibility of GCCcore/12.2.0" on Sep 27, 2023
@boegel
Member

boegel commented Sep 27, 2023

@Flamefire It seems like this same patch could/should be applied to GCCcore/12.3.0 as well?

$ sed -n '1466,1470p' gcc-12.3.0/libstdc++-v3/include/bits/locale_facets_nonio.tcc
      __err = ios_base::goodbit;
      bool __use_state = false;
#if __GNUC__ >= 5 && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpmf-conversions"
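
The check above can be repeated for any unpacked GCC source tree. The following is a minimal sketch, assuming the patch is relevant wherever this preprocessor guard and pragma appear together (the thread suggests this but does not state it). The function name `has_pmf_guard` is hypothetical and not part of EasyBuild or GCC.

```shell
# Hypothetical helper (illustration only): succeed if the given file contains
# both the __GNUC__ guard and the -Wpmf-conversions pragma shown in the sed
# output above. It only tests for presence, not for adjacency or line numbers.
has_pmf_guard() {
    file="$1"
    # -F: match the strings literally; -q: no output, only an exit status.
    grep -qF '#if __GNUC__ >= 5 && !defined(__clang__)' "$file" &&
        grep -qF '#pragma GCC diagnostic ignored "-Wpmf-conversions"' "$file"
}
```

For example, `has_pmf_guard gcc-12.3.0/libstdc++-v3/include/bits/locale_facets_nonio.tcc && echo "guard present"` would print a message only if both lines are found in that file.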

@Flamefire
Contributor Author

Yes, very likely. However, I wanted to get PyTorch working with GCC 12.2 and CUDA 11.7 first, as proof that this works. That is taking longer than I thought, due to issues with UCX on PPC and new failures in PyTorch on x86/A100.

Feel free to add this patch to GCC 12.3, though, as those issues don't seem to be caused by this patch.

@boegel changed the title from "add patch to improve CUDA 11 compatibility of GCCcore/12.2.0" to "add patch to improve CUDA 11 compatibility of GCCcore/12.2.0 and GCCcore/12.3.0" on Oct 11, 2023
@boegel
Member

boegel commented Oct 11, 2023

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=18854 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_18854 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 11919

Test results coming soon (I hope)...

- notification for comment with ID 1757036025 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Oct 11, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=18854 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_18854 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3530

Test results coming soon (I hope)...

- notification for comment with ID 1757058037 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/6acbd73d2ee0163ae70ca3945f3c2eba for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/cce9bade7986e5f12646cdffd2783f42 for a full test report.

@boegel
Member

boegel commented Oct 11, 2023

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3159.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/f1fb1b85ffbf8bbb0fe7b39729a242a9 for a full test report.

Member

@boegel left a comment

lgtm

@boegel
Member

boegel commented Oct 11, 2023

Going in, thanks @Flamefire!

@boegel merged commit 8ca0577 into easybuilders:develop on Oct 11, 2023
5 checks passed
@Flamefire deleted the 20230922152922_new_pr_GCCcore1220 branch on October 11, 2023 11:20