Reduce CI jobs. #236
Conversation
Notes: Look at the conda-python-tests.yaml file for a cleaner diff. This one has the same matrix changes but also adds DEPENDENCIES.
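For reference, a minimal sketch of what a matrix entry with the new field might look like, assuming the inline-map style these workflow files use; every field name and value here other than DEPENDENCIES is illustrative rather than copied from the diff:

```yaml
# Hypothetical conda-python-tests matrix entry. Only the DEPENDENCIES
# field is confirmed by this PR; the other fields and values are illustrative.
- { ARCH: 'amd64', PY_VER: '3.10', CUDA_VER: '11.8.0', LINUX_VER: 'ubuntu22.04', DEPENDENCIES: 'latest' }
```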
Notes:
- I dropped the ARM nightly job for CUDA 11.4. This should be adequately covered by the amd64 tests with CUDA 11.4.
- I added DEPENDENCIES to match the Python matrices.
- I dropped one amd64 nightly job for CUDA 11.4 since we had two. That felt unnecessary.
- I added a test for ARM with the latest CUDA version. I think we need some test of CUDA-latest on both ARM and amd64. However, this means there are no CUDA 12.2 jobs... would we rather change the CUDA 12.0 job to 12.2?
Yeah, this all seems reasonable. Would expect ARM users to more often be on newer hardware and thus have newer dependencies (especially CTK). So this seems like the right direction to shift resources.
Edit: Personally I would lean towards ARM on CUDA 12.2 and x86_64 on CUDA 12.0. Some things (like libcufile) weren't available on ARM until CUDA 12.2 and would like for us to test those.
> Personally I would lean towards ARM on CUDA 12.2 and x86_64 on CUDA 12.0. Some things (like libcufile) weren't available on ARM until CUDA 12.2 and would like for us to test those.

Seconded
I updated this to use more CUDA 12.2 in ARM nightlies. For some PR tests I left ARM on CUDA 12.0 where we only have three PR jobs (two x86 and one ARM). I want to cover (11.8, 12.0, 12.5), which follows our usual strategy of prioritizing CUDA versions: last major / latest minor, current major / earliest minor, current major / latest minor.
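As a sketch, that three-job PR matrix could look like the following, assuming the same inline-map style; the patch-level CUDA versions and remaining fields are illustrative:

```yaml
# Hypothetical three-job PR matrix: two x86 jobs and one ARM job
# covering (11.8, 12.0, 12.5) in the priority order described above.
- { ARCH: 'amd64', CUDA_VER: '11.8.0' }  # last major, latest minor
- { ARCH: 'amd64', CUDA_VER: '12.5.1' }  # current major, latest minor
- { ARCH: 'arm64', CUDA_VER: '12.0.1' }  # current major, earliest minor
```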
Notes:
- Previously we had no tests of wheels on Rocky Linux 8. That is important to cover because our wheels are specifically built for glibc 2.28 as a lower bound, which corresponds to Rocky Linux 8 (see the sketch after this list).
- I cut the nightly amd64 jobs quite a bit. Six jobs seemed like overkill.
- I deleted a nightly ARM job for CUDA 11.8 because there were two.
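A sketch of what the added Rocky Linux 8 wheel-test entry might look like; the Rocky Linux 8 target follows from the glibc 2.28 rationale above, while the other fields and values are assumptions for illustration:

```yaml
# Hypothetical wheel-test matrix entry exercising the glibc 2.28 floor.
# Only the Rocky Linux 8 target is grounded in the discussion above.
- { ARCH: 'amd64', PY_VER: '3.10', CUDA_VER: '12.0.1', LINUX_VER: 'rockylinux8' }
```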
Regarding wheels, that was likely an oversight when moving from CentOS 7 to Rocky Linux 8. Agree we should cover them on images where the GLIBC allows.
Generally agree with the cuts, though would shift things a bit amongst those jobs. Will comment below.
Thanks Bradley! 🙏
Would change some Python 3.10 jobs to 3.11. Soon we will be adding Python 3.12 and will need to add those somewhere, so would rather have Python 3.11 in a few more spots to make it easy to swap some of those for Python 3.12.
Use more CUDA 12.2 jobs on ARM.
I'm happy with the changes. Don't have super strong feelings about the exact variable combinations in each job, so if everyone else's thoughts have been addressed, this looks good to me.
I'm going to run a test PR on cudf with this branch, then merge if that looks good. Test PR here: rapidsai/cudf#16646
Things look fine downstream! Merging.
We want to reduce CI times, which means we need to reduce the job queue. Currently we are limited by the number of available runners for linux-amd64-gpu-v100-latest during working/daytime hours (Pacific) and limited by linux-arm64-gpu-a100-latest during nightlies (which are daytime working hours for EU/APAC teams).

I also added DEPENDENCIES: ['oldest', 'latest'] to the C++ test jobs. While we don't have a specific use case for that yet, it is good to keep our matrix defined the same way for all test jobs (conda C++, conda Python, and wheels).

Related to rapidsai/build-planning#95.
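As a sketch, the two DEPENDENCIES values applied to C++ test jobs might expand to entries like these, assuming the same inline-map style; everything except the DEPENDENCIES values is illustrative:

```yaml
# Hypothetical conda-cpp-tests entries: one job per DEPENDENCIES value,
# mirroring the ['oldest', 'latest'] axis used by the Python matrices.
- { ARCH: 'amd64', CUDA_VER: '11.8.0', DEPENDENCIES: 'oldest' }
- { ARCH: 'amd64', CUDA_VER: '12.5.1', DEPENDENCIES: 'latest' }
```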