Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support post-release versions, publish v1.15.0.post1 #5

Merged
merged 2 commits into from
May 9, 2024

Conversation

jameslamb
Copy link
Member

Contributes to rapidsai/build-planning#57.

libucx.load_library() defined here tries to pre-load libcuda.so and libnvidia-ml.so, to raise an informative error (instead of a cryptic one from a linker) if someone attempts to use the libraries from this wheel on a system without a GPU.

Some of the projects using these wheels, like ucxx and ucx-py, are expected to be usable on systems without a GPU. See rapidsai/ucx-py#1041 (comment).

To avoid those libraries needing to try-catch these errors, this proposes the following:

  • removing those checks and deferring to downstream libraries to handle the non-GPU case
  • modifying the build logic so we can publish patched versions of these wheels like v1.15.0.post1

Notes for Reviewers

Proposing starting with 1.15.0.post1 right away, since that's the version that ucx-py will use. I'm proposing the following sequence of PRs here (assuming downstream testing goes well):

  1. this one
  2. another changing the version to 1.14.0.post1
  3. another changing the version to 1.16.0.post1

@jameslamb jameslamb requested a review from a team as a code owner May 9, 2024 17:01
python/libucx/setup.py Outdated Show resolved Hide resolved
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this merges, @pentschev or @jameslamb could you test installing a ucxx wheel along with this specific version of libucx (the ucxx wheels will allow using 1.15.0 at runtime right now I believe) to ensure that things work the way we want on both CPU-only and GPU-enabled machines?

@jameslamb
Copy link
Member Author

Once this merges, @pentschev or @jameslamb could you test installing a ucxx wheel along with this specific version of libucx (the ucxx wheels will allow using 1.15.0 at runtime right now I believe) to ensure that things work the way we want on both CPU-only and GPU-enabled machines?

Yes I can do this.

@jameslamb jameslamb requested a review from vyasr May 9, 2024 18:05
@jameslamb
Copy link
Member Author

CI is stuck waiting for arm64 runners. Once those run and (hopefully) pass, I'll merge this and test with ucxx + these new wheels.

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jameslamb !

@vyasr vyasr merged commit a0bf56f into rapidsai:main May 9, 2024
12 checks passed
@jameslamb jameslamb deleted the support-post-versions branch May 9, 2024 21:45
@jameslamb
Copy link
Member Author

@vyasr @pentschev the CI pipeline from this PR has been stuck waiting for a runner for 2+ hours (build link), so the new wheels aren't up on the nightly index yet (https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libucx-cu12/).

To get around that, I did a minimal test of ucxx by downloading the new wheels from S3 + running https://github.com/rapidsai/ucxx/blob/branch-0.38/python/examples/basic.py.

#!/bin/bash

set -e -u -o pipefail

# check if there's a GPU attached
nvidia-smi || true

# download ucx wheel with the fixes
rm -f ./ucx-wheels-cu12.tar.gz
wget \
    -O ucx-wheels-cu12.tar.gz \
    https://downloads.rapids.ai/ci/ucx-wheels/branch/main/a0bf56f/ucx-wheels_wheel_cpp_ucx_cu12_x86_64.tar.gz

mkdir -p /tmp/delete-me/ucx-wheels/
tar \
    -xvzf ./ucx-wheels-cu12.tar.gz \
    -C /tmp/delete-me/ucx-wheels/

# install it
pip install /tmp/delete-me/ucx-wheels/*.whl

# Install the latest `ucxx` wheel.
pip install 'ucxx-cu12==0.38.*,>=0.0.0a0'

# try importing ucxx and libucx
python -c "import ucxx._lib.libucxx as ucx_api"
python -c "import libucx; libucx.load_library()"

# try running the example
python ./python/examples/basic.py

Ran it with and without a GPU visible to the processes.

# with GPU
docker run \
    --rm \
    --gpus 1 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
    bash ./test.sh

# no GPU
docker run \
    --rm \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
    bash ./test.sh

Saw it succeed (after applying the modifications from rapidsai/ucxx#229) on both.

I think that's enough evidence to move forward with publishing the other versions (1.14.0.post1 and 1.16.0.post1). But to be sure, tomorrow I'll try with the 1.15.0.post1 wheels in CI for rapidsai/ucx-py#1041.

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change bug Something isn't working and removed improvement Improves an existing functionality labels May 10, 2024
@jameslamb jameslamb mentioned this pull request May 10, 2024
@jameslamb jameslamb changed the title support post-release versions, public v1.15.0.post1 support post-release versions, publish v1.15.0.post1 May 10, 2024
rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this pull request May 10, 2024
Running the example at `python/examples/basic.py` results in the following

```text
[1715298582.923398] [f059c9da14bb:1    :0]          parser.c:2033 UCX  WARN  unused environment variable: UCX_MEMTYPE_CACHE (maybe: UCX_MEMTYPE_CACHE?)
[1715298582.923398] [f059c9da14bb:1    :0]          parser.c:2033 UCX  WARN  (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
Traceback (most recent call last):
  File "/opt/work/./python/examples/basic.py", line 259, in <module>
    main()
  File "/opt/work/./python/examples/basic.py", line 227, in main
    listener_ep.tag_send(Array(send_bufs[0]), tag=ucx_api.UCXTag(0)),
AttributeError: module 'ucxx._lib.libucxx' has no attribute 'UCXTag'. Did you mean: 'UCXXTag'?
```

It looks to me like that suggestion is right, and that that class is callsed `UCXXTag`:

https://github.com/rapidsai/ucxx/blob/2195ceabf35b404b3ae6b09f784d1d312b4b6fce/python/ucxx/_lib/tag.pyx#L8

This PR proposes the following:

* fixing that reference
* adding a print statement at the end so that you know the example reached the end successfully without having to inspect the exit code of the process

## Notes for Reviewers

I found this because I was using this example to smoke test changes to the new ucx wheels: rapidsai/ucx-wheels#5

### How I tested this

```shell
docker run \
    --rm \
    --gpus 1 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:cuda12.2.2-ubuntu22.04-py3.10 \
    pip install 'ucxx-cu12==0.38.*,>=0.0.0a0' && python ./python/examples/basic.py
```

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #229
@jameslamb
Copy link
Member Author

these wheels are working with ucx-py builds in both GPU and non-GPU environments 🎉

rapidsai/ucx-py#1041

@vyasr
Copy link
Contributor

vyasr commented May 10, 2024

Awesome!

vyasr pushed a commit that referenced this pull request May 10, 2024
Follow-up to #5.

Proposes publishing a `1.14.1.post1` version, identical to version
`1.14.1` except that `load_library()` will no longer raise exceptions in
non-GPU environments.

## Notes for Reviewers

Just putting this up to get in the CI run. Should probably wait to merge
it until testing on rapidsai/ucx-py#1041 is
done.
@jameslamb jameslamb mentioned this pull request May 10, 2024
vyasr pushed a commit that referenced this pull request May 10, 2024
Follow-up to #5.
Similar to #6.

Proposes publishing a `1.16.0.post1` version, identical to version
`1.16.0` except that `load_library()` will no longer raise exceptions in
non-GPU environments.

## Notes for Reviewers

Checked
https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/libucx-cu12/ to
confirm that it's exactly `1.16.0` that we want to do a post-release
for.
@jameslamb jameslamb mentioned this pull request May 10, 2024
vyasr pushed a commit that referenced this pull request May 15, 2024
Proposes adding some minimal `pre-commit` checks and a CI job to run
them.

I think this is a cheap, low-risk way to get a bit more release
confidence in changes like #5.

## Notes for Reviewers

This includes a very pared-down version of
https://github.com/rapidsai/shared-workflows/blob/branch-24.06/.github/workflows/checks.yaml.
I'm proposing not depending on `shared-workflows` because this repo
doesn't follow the RAPIDS branching model, and because right now the
need is so simply (just running `pre-commit run --all-files`).

---------

Co-authored-by: Kyle Edwards <kyedwards@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants