-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to UCX v1.14.0 and configure/build for release #111
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
@conda-forge-admin, please rerender |
Hi! This is the friendly automated conda-forge-webservice. I tried to rerender for you, but it looks like there was nothing to do. This message was generated by GitHub actions workflow run https://github.com/conda-forge/ucx-split-feedstock/actions/runs/3267176097. |
Thanks, Peter! LGTM but I'd like to have @jakirkham or someone else to take another look 🙂 |
Does this result in ABI changes? In particular were there debug symbols that would have been linked against before that are now removed? |
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
As per our previous discussion, we wanted to wait for the change to build for release until the new UCX release was out, now v1.14.0 has been released and I've updated this PR to use v1.14.0. @jakirkham @leofang would you mind taking a look? |
It seems that the CentOS 6 builds fail with UCX 1.14 because ucx-split-feedstock/recipe/install_ucx.sh Line 11 in dd05d5a
--with-verbs entirely until we can guarantee rdma-core v28 or higher is available as part of the conda-forge packaging process.
|
To be clear, MOFED 5.x is now the only tested UCX version, I know MOFED 4.x was still "supported" by UCX until last year or so but not actively tested, thus I was constantly finding bugs. As a result of that we decided to stop supporting MOFED 4.x entirely for RAPIDS, both internally at NVIDIA and for customers. I would recommend the same be done here, as MOFED 4.x support (rdma-core < v28) is going to be hit or miss. |
recipe/meta.yaml
Outdated
|
||
build: | ||
skip: true # [not linux] | ||
number: 0 | ||
number: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're bumping the version we don't need to bump the build number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an oversight, this PR was only changing to release with no bump on the UCX version, so this was left by accident. We also have the number
variable above, is there a reason we can't use it here @jakirkham ? It is currently only used by outputs.name=ucx.build.number
, wondering whether we can use for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 50a5dc2 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should use the top-level number
and template it everywhere else
A separate question would be other we want to backport the CUDA 12 fixes ourselves ( #116 (comment) ) to ensure they are in 1.14.0 |
Now that there's a virtual package plugin system for conda we in theory could have a virtual package for MOFED version and constrain UCX packages based on that MOFED version if desired. Is there a path forward for an updated rdma-core CDT or are we limited to CentOS 7 for CDTs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Peter! 🙏
Looks good. One minor suggestion below.
Co-authored-by: jakirkham <jakirkham@gmail.com>
Thanks Peter! 🙏 LGTM Will let this sit for a bit to see if anyone else has feedback. Planning on merging EOD tomorrow if no comments |
Thanks Peter for the PR and Keith for help reviewing! 🙏 |
conda-forge/ucx-split-feedstock#111 was merged and UCX v1.14.0 is now available on conda-forge which we should support. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #935
UCX 1.14.0 was recently released and conda-forge package was updated in conda-forge/ucx-split-feedstock#111 with several packaging improvements. Relax the pin to allow installing UCX v1.14.x as well. Authors: - Peter Andreas Entschev (https://github.com/pentschev) - AJ Schmidt (https://github.com/ajschmidt8) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #1366
requirements: | ||
build: | ||
- {{ compiler("c") }} | ||
- {{ compiler("cxx") }} | ||
- {{ compiler("cuda") }} # [cuda_compiler_version != "None"] | ||
- {{ cdt("libnl") }} # [cdt_name == "cos6"] | ||
- {{ cdt("libnl3") }} # [cdt_name == "cos7"] | ||
- {{ cdt("libibcm-devel") }} # [cdt_name == "cos6"] | ||
- {{ cdt("libibverbs-devel") }} # [cdt_name == "cos6"] | ||
- {{ cdt("librdmacm-devel") }} # [cdt_name == "cos6"] | ||
- {{ cdt("numactl-devel") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed we still have this CDT. Though we added libnuma
below. Should we drop the CDT then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing in PR ( #118 )
Contributes to rapidsai/build-planning#142 Proposes removing the exclusion for `ucx-proc` in the CI job that checks that all projects have produced recent-enough nightlies. See that linked issue for much more detail, but in short: * there haven't been new `ucx-proc` packages for 2+ years, since conda-forge/ucx-split-feedstock#111 * RAPIDS can safely drop its `ucx-proc` dependency, and other PRs linked to rapidsai/build-planning#142 have done that for the 25.02 release * `ucx-proc` should not show up in RAPIDS 25.02 environment solves, and so we don't need to carry an exclusion for it in the testing config Authors: - James Lamb (https://github.com/jameslamb) Approvers: - https://github.com/jakirkham - Bradley Dice (https://github.com/bdice) URL: #742
Checklist
0
(if the version changed)conda-smithy
(Use the phrase@conda-forge-admin, please rerender
in a comment in this PR for automated rerendering)Changes:
libnuma
dependency from conda-forgeCloses #26
Closes #66
Closes #102
Closes #110
Closes #112
Closes #115