Update UCX to version 1.12.1 #7809
A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_4_X/master. @cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@smuzaffar do I need to add
34e7699 to bc787d8
please test
Pull request #7809 was updated.
This PR includes #7795 to ease testing. It can be rebased once that is merged.
-1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ee27b4/24064/summary.html
External Build
I found compilation error when building:
Requested to quit.
* The action "build-external+ucx+1.12.1-871f2c8f3832a729236a3a4b83fb7b49" was not completed successfully because Failed to build ucx. Log file in /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc10/external/ucx/1.12.1-871f2c8f3832a729236a3a4b83fb7b49/log.
Final lines of the log file:
   67 |     ret = gdr_copy_from_bar(buffer, (void *)remote_addr, length);
      |           ^~~~~~~~~~~~~~~~~
      |           gdr_copy_from_mapping
rocm_gdr_ep.c:67:15: error: nested extern declaration of 'gdr_copy_from_bar' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make[4]: *** [libuct_rocm_gdr_la-rocm_gdr_ep.lo] Error 1
make[4]: *** Waiting for unfinished jobs....
make[4]: Leaving directory `/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/slc7_amd64_gcc10/external/ucx/1.12.1-871f2c8f3832a729236a3a4b83fb7b49/ucx-1.12.1/src/uct/rocm/gdr'
make[3]: *** [all-recursive] Error 1
No, there is no need to explicitly add it in cmssw-tool-conf.
bc787d8 to fb64181
Pull request #7809 was updated.
fb64181 to ef20ef1
please test
Pull request #7809 was updated.
-1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ee27b4/24079/summary.html
External Build
I found compilation error when building:
+ sed '-e/SUBDIRS/s/ *\//' -i src/uct/rocm/Makefile.am
+ sed '-e/src\/uct\/rocm\/gdr\/configure\.m4/d' -i src/uct/rocm/configure.m4
+ rm -rf src/uct/rocm/gdr
+ ./autogen.sh
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.Saz1OQ: line 48: ./autogen.sh: No such file or directory
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.Saz1OQ (%prep)
RPM build errors:
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.Saz1OQ (%prep)
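The `%prep` steps in the log above can be tried out on a scratch tree. The following is only a sketch under stated assumptions: the contents of `Makefile.am` and `configure.m4` are made up for illustration (the real UCX files may differ), the `s/ *gdr//` expression is a guess at the intent and not the command from the spec file, and the guard around `./autogen.sh` is a hypothetical way to avoid the failure shown, not necessarily the fix adopted in this PR. It assumes GNU sed for `-i`.

```shell
# Scratch tree standing in for an unpacked UCX source directory.
# The file contents below are hypothetical placeholders.
work=$(mktemp -d)
mkdir -p "$work/src/uct/rocm/gdr"
printf 'SUBDIRS = . gdr\n' > "$work/src/uct/rocm/Makefile.am"
printf 'm4_include([src/uct/rocm/gdr/configure.m4])\nAC_CONFIG_FILES([src/uct/rocm/Makefile])\n' \
  > "$work/src/uct/rocm/configure.m4"
: > "$work/src/uct/rocm/gdr/configure.m4"

cd "$work"

# Drop the gdr entry, but only on the SUBDIRS line.
sed -e '/SUBDIRS/s/ *gdr//' -i src/uct/rocm/Makefile.am
# Delete any line that refers to the gdr configure fragment.
sed -e '/src\/uct\/rocm\/gdr\/configure\.m4/d' -i src/uct/rocm/configure.m4
# Remove the module sources themselves.
rm -rf src/uct/rocm/gdr

# Regenerate the build system only if the script is actually present.
if [ -x ./autogen.sh ]; then
  ./autogen.sh
else
  echo "autogen.sh not found; skipping regeneration"
fi
```

In the scratch tree the `else` branch is taken, which mirrors the situation reported in the log, where `./autogen.sh` was not present in the unpacked sources.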
ef20ef1 to 2c12a58
Pull request #7809 was updated.
Requires: rdma-core
Requires: rocm
@fwyzard, this will package rocm for all archs. I think we should include and configure rocm only for x86_64 archs.
you are right, of course.
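One way to picture the "configure rocm only for x86_64" half of this suggestion is a small shell helper that picks the configure flag from the architecture. This is a sketch, not the change made in the PR: `rocm_configure_flag` is a hypothetical helper, `ROCM_ROOT` and the `/opt/rocm` fallback are assumed names, and the `Requires: rocm` dependency itself would still need an architecture condition at the spec level, which is not shown here.

```shell
# Hypothetical helper: print the UCX configure flag for ROCm, given an
# architecture string and a ROCm install prefix.
rocm_configure_flag() {
  arch=$1
  rocm_root=$2
  case "$arch" in
    x86_64) printf '%s\n' "--with-rocm=$rocm_root" ;;  # ROCm enabled
    *)      printf '%s\n' "--without-rocm" ;;           # all other archs
  esac
}

# Example: choose the flag for the machine running the build.
# ROCM_ROOT is an assumed variable; /opt/rocm is only a fallback.
rocm_configure_flag "$(uname -m)" "${ROCM_ROOT:-/opt/rocm}"
```

The output could then be appended to the `./configure` invocation, so that ppc64le and aarch64 builds never look for ROCm.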
Add the xpmem library from the HEAD of the master branch as of 2022.03.08, corresponding to the commit 61c39efdea943ac863037d7e35b236145904e64d. Based on xpmem v2.6.3 with updates for Linux kernel up to 5.17.
Enable additional libraries in UCX:
- enable the use of xpmem for intra-node communication;
- enable the use of ROCm for AMD gpus (only for x86_64);
- remove the ROCm GDR module, which is not compatible with GDRCopy v2.x.

Update UCX to version 1.12.1:
- change the default for UCX_MEM_CUDA_HOOK_MODE from "reloc" to "bistro";
- various bug fixes for CUDA and ROCm;
- see https://github.com/openucx/ucx/releases/tag/v1.12.1 for the full change log.
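Since the default of UCX_MEM_CUDA_HOOK_MODE changes with this update, a job that depends on the old behaviour could set the variable explicitly. The snippet below is a minimal sketch: it assumes the variable is read from the environment like other UCX settings, and that `ucx_info -c` lists it when the CUDA module is built, which may not hold for every installation.

```shell
# Restore the pre-1.12.1 default for this shell session only.
export UCX_MEM_CUDA_HOOK_MODE=reloc

# If the UCX tools are installed, show what UCX reports for this setting.
if command -v ucx_info >/dev/null 2>&1; then
  ucx_info -c | grep MEM_CUDA_HOOK_MODE || true
else
  echo "ucx_info not found; UCX_MEM_CUDA_HOOK_MODE=$UCX_MEM_CUDA_HOOK_MODE"
fi
```

Leaving the variable unset keeps whatever default the installed UCX version ships with.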
539a478 to 431af70
please test
Pull request #7809 was updated.
@cmsbuild, please test for el8_amd64_gcc10
@cmsbuild, please test for el9_amd64_gcc11
@cmsbuild, please test for el8_ppc64le_gcc10
@cmsbuild, please test for el8_aarch64_gcc10
-1
Failed Tests: UnitTests
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Unit Tests
I found errors in the following unit tests:
---> test TestFWCoreServicesDriver had ERRORS
---> test testFWCoreUtilities had ERRORS
---> test DRNTest had ERRORS

please test with cms-sw/cms-bot#1751

+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ee27b4/24114/summary.html
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Comparison Summary
@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:

+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ee27b4/24128/summary.html
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Comparison Summary
@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:

+externals
+externals
This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_4_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)