-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: fix workflows #1035
base: main
Are you sure you want to change the base?
CI: fix workflows #1035
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance, this basically reverts a lot of Rickard's work - could you explain why the changes are needed? There also seems to be a bunch of changes that aren't really "fix regression"..?
.github/workflows/python-package.yml
Outdated
push: | ||
branches: [ "main" ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
push
needs to be done for specific branches normally. (reverted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having no branch filter here means CI gets run on pushes that don't have a PR too, so you can test things out in your own branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding branch filters etc, feel free to suggest changes. The decisions were deliberate, but mine, maybe there are other considerations. My idea was that all PRs should be built here, but when creating forks you also get them built and validated. I was planning to add the concurrency part, I deliberately left it out due to the flux in pipelines and it was annoying to have the builds cancelled all the time, but I think it makes sense to add them
as I already mentioned, this is a quick and manual revert to fix the current regression state. |
The missing Wouldn't it be better to just fix things for the better instead of a "quick revert" that someone will need to fix again later? Since there's a lot of activity in this repo just now, and we're working in a state where we don't really have the luxury to actually properly test anything, I think we should accept the fact that |
Could you maybe look into other options than |
8d8747f
to
ece67ae
Compare
.github/workflows/python-package.yml
Outdated
arch: aarch64 | ||
os: [windows-latest] | ||
arch: [x86_64] | ||
cuda-version: ['11.8.0', '12.1.1'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately skipped cuda 11 from pipelines not to make them slower before the dust had settled. The main branch is very active at the moment. I don't see a problem adding it, but we must realize that we burn CPU minutes quite fast now. I did not consider this a regression since Python packages were published outside of pipelines before (and still are, we only build them).
Some comment from me, possibly #949 was merged too quick without aligning on the scope. Sorry for not realizing this! May I suggest discussing improvements in the RFCs, maybe in #1031? I would argue the easiest way to manage CUDA dependencies is through Docker. What is currently lacking is dependabot-managed update for the CUDA versions. I was actually planning to fix this in a follow up PR, it involves creating Dockerfiles for the CUDA images as this is what Dependabot can work wiith. We need to continuously monitor for updates as NVIDIA deprecates CUDA versions fast, and Dependabot is the realistic way to handle this. The problem here is Windows. While there seems to be some community-based Windows containers, AFAIK hosted runners won't run Windows containers. The community GitHub action does a decent job, but it's slow, owing to the fact that it runs the horribly slow CUDA installer. Some kind of image-based approach (just like Docker for the Linux builds) would be the best solution |
.github/workflows/python-package.yml
Outdated
# Check out dependencies code | ||
- uses: actions/checkout@v4 | ||
name: Check out NVidia cub | ||
with: | ||
repository: nvidia/cub | ||
ref: 1.11.0 | ||
path: dependencies/cub | ||
# Compile C++ code | ||
- name: Build C++ | ||
- name: Setup Mambaforge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not convinced mamba is the way to go.
Also, there's a lot of duplication here w.r.t the Docker flow. What is the reason we need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, it seems overly complicated for not much obvious benefit to me. I like to be able to do something simple like pip install -e .[dev]
and use the cache on it.
8a5c804
to
95b1779
Compare
66fd141
to
42942b4
Compare
- fix custom command - fix *_OUTPUT_DIRECTORY
* fix to support cmake -B build option * add cuda 11.8, 12.1 * use python-version==3.10 for builds. * fix wheel names for aarch64 * drop artifact retention-days: it could be configured Settings->Actions->General. * make wheel with all available cuda versions. * use docker-run-actions * update docker image version
Hey @wkpark, @akx, @matthewdouglas, @rickardp, Yes agreed, #949 was merged a bit hasty, due to an misunderstanding and too quick trigger finger on my side. Sorry for that! And thanks everyone for the collective cleanup / hotfix actions that ensued and got everything back to working order. Based on your discussion above, I'm now not sure if this PR is still WIP or if you all agree that it's ready for review and merge? If not, what do we need to still implement or agree on in order to move forward? |
# fix wheel name | ||
if [ "${{ matrix.arch }}" = "aarch64" ]; then | ||
o=$(ls dist/*.whl) | ||
n=$(echo $o | sed 's@_x86_64@_aarch64@') | ||
[ "$n" != "$o" ] && mv $o $n | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a hack? Why is the wheel named x86_64... if it contains aarch64 material? IOW, are you sure it contains aarch64 material if it's named wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akx So to me it looks like the lib is built on aarch64 via Docker, but this step of packaging the artifacts for a wheel is being executed on an x86-64 host always.
I agree it seems hacky. Maybe using something like wheel tags $o --platform-tag=manylinux2014_aarch64
will feel a little better.
TBH I thought this PR was abandoned and most of the stuff had been broken out to other PRs. I'll have another look as I am not completely sure what issues remain that this PR solves |
These are already on master I think
IMHO this is better to handle like #1052 so we can have Dependabot notifying when there are upgrade
@wkpark I think you mentioned this a while back, but I don't see why this is necessary. To me it seems that it only complicates path handling in the cmake files. Source files and output are where they are anyway and intermediates are already git ignored
CUDA can't be cross compiled but we can use native aarch64 agents when they become publically available
IMHO we should test on all versions we support. But there was an idea floated a while back (possibly by @akx) about splitting building wheels from testing them. The building of wheels is really quick so it won't speed anything up, but it will reduce the storage use. |
I suggest we make these separate PRs. I think retention days we want to set separately per artifact type as some artifacts (wheels) are more useful than others (.so files). But agreed these need tuning
Same here. But I don't know if the speed of the build tool will matter, as most of the time is spent in nvcc. And it's an additional moving part. But maybe we can isolate the change and see its impact? |
several PRs have been already extracted and merged. |
9b72679
to
7800734
Compare
fix regressions introduced by PR #949PR HOTFIX: Fix regression (cpu fix) #1038 included.(merged)misc workflows fixes
restored
restore some accidentally and not carefully merged but abandoned stuff.
-B build
option correctly.use conda+mamba method. disable cuda-toolkits. (the cuda-toolkits is too slow and not stable in some cases.)cmake
options. (setCOMPUTE_CAPABILITY
for example)fixed
cuda-toolkit seems too slow and even breaks sometimes. (Please see https://github.com/wkpark/bitsandbytes/actions/runs/7793281860/job/21252763742)fix docker usinguse docker-run-actionscontainer: image:
method.misc
retention-days:
it could be configured Settings->Actions->General.concurrency:
fix restored.fail-fast: false
restored.ilammy/msvc-dev-cmd@v1.13.0
instead ofmicrosoft/setup-msbuild
microsoft/setup-msbuild
is used for msbuild.ilammy/msvc-dev-cmd@v1.13.0
cmake -G Ninja
: ninja is more faster and popular. (triton use Ninja for example)COMPUTE_CAPABILITY
pull_request
, use conciseCOMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90
refs/tags/*
, use full capabilities,COMPUTE_CAPABILITY=61;75;86;89
(only popular GPUs included)refs/tags/*
aarch64 build issue
Build time using docker + aarch64 arch seems too slow. current total build time is about 30~40min.
COMPUTE_CAPABILITY
forpull_requests
outdated or resolved
CPU builds with aarch64 gcc/g++ are fine, but for cuda builds there is no benefit as docker itself does not have arm64 libraries installed and it will fail to link correctly.resolved.Some of @rickardp's changes, whether intended or not, maybe better done inpython-package.yml
. However, it causes some serious problems, and it is important to quickly fix things that were fine before, I am submitting this PR.P.S.: PR #1018 is not included.