[Meta] Reduce CI runtimes #95
I updated the table above to reflect that we may also be able to optimize our usage of pytest for better performance (see, e.g., rapidsai/cudf#16851).
Some other thoughts to explore:
CI runtimes are increasingly becoming a bottleneck for development in RAPIDS. There are numerous reasons for this, including (but not limited to):
In the past, our primary focus has been on reducing the load on our GPU runners because those are in the shortest supply. In turn, that has meant pruning the test matrix more carefully, since only test jobs require GPU runners. This has helped relieve pressure in the short term, but we clearly need broader steps to address the problem comprehensively. Some notes that should guide our thinking:
This meta-issue aims to catalog the different efforts we could undertake going forward. I have organized the solutions into a few classes.
Tooling
These improvements have an up-front implementation cost. Once implemented, they will have only positive impacts, since they involve no compromise in test coverage or frequency.
More judicious selection of jobs
These improvements have an implementation cost and will also carry a nonzero ongoing maintenance cost to ensure that test coverage remains correct. If implemented correctly, there will be no loss in coverage, but a correct implementation will require some care.
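One possible way to select jobs more judiciously is to run only the jobs affected by the files a PR changes. The following is a minimal Python sketch. It assumes a hypothetical table of path patterns and job names. RULES, ALL_JOBS, and select_jobs are illustrative and are not existing RAPIDS tooling. The sketch shows where the "care" comes in: any changed file that no rule recognizes falls back to running every job. An incomplete table therefore wastes time but never silently drops coverage.

```python
from fnmatch import fnmatch

# Hypothetical mapping from path glob patterns to the CI jobs they affect.
# These patterns and job names are assumptions for illustration only.
RULES: dict[str, set[str]] = {
    "docs/*": {"docs-build"},
    "python/*": {"python-build", "python-tests"},
    # Assumed: C++ changes can break the Python layer, so rerun it too.
    "cpp/*": {"cpp-build", "cpp-tests", "python-build", "python-tests"},
}

# Every job that any rule can trigger; used as the conservative fallback.
ALL_JOBS: set[str] = set().union(*RULES.values())


def select_jobs(changed_files: list[str]) -> set[str]:
    """Return the set of jobs needed to cover the given changed files."""
    selected: set[str] = set()
    for path in changed_files:
        # fnmatch's "*" also matches "/", so "docs/*" covers nested files.
        matches = [jobs for pattern, jobs in RULES.items() if fnmatch(path, pattern)]
        if not matches:
            # Unrecognized path: run everything rather than risk lost coverage.
            return set(ALL_JOBS)
        for jobs in matches:
            selected |= jobs
    return selected
```

For example, select_jobs(["docs/index.md"]) selects only docs-build. A change to an unlisted path such as ci/build.sh selects every job.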
Running more jobs only in nightlies
These are easy to implement, but without careful monitoring of nightly results, they could carry significant costs if issues are only uncovered later.
Other
Miscellaneous other improvements that will help without being directly focused on improving build times.
Using --tb=native shaves off substantial time (10-20%) in total test suite runs, since many of our repos have a large number of xfailed tests (see cudf#16851, "Switch to using native traceback"). Similarly, in the past we've observed significant improvements by switching the mode pytest-xdist uses to distribute tests across workers, which avoids idle workers. There may be other similar optimizations to consider in our pytest usage.
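Changing the distribution mode helps because it keeps every worker busy until the suite finishes. The toy model below is a Python sketch that contrasts two strategies:

- Splitting tests into fixed per-worker chunks up front.
- Letting each idle worker pull the next pending test.

The durations are made up, and neither function reproduces pytest-xdist's actual scheduler. In pytest-xdist the mode is chosen with the --dist option, e.g. load or, in recent versions, worksteal.

```python
import heapq


def static_makespan(durations: list[float], workers: int) -> float:
    """Wall time when each worker is handed a fixed contiguous chunk up front.

    Simplified model (assumption), not pytest-xdist's real scheduler.
    """
    if not durations:
        return 0.0
    chunk = -(-len(durations) // workers)  # ceiling division: tests per worker
    # The suite finishes when the slowest chunk finishes.
    return max(sum(durations[i:i + chunk]) for i in range(0, len(durations), chunk))


def dynamic_makespan(durations: list[float], workers: int) -> float:
    """Wall time when each idle worker pulls the next test from a shared queue.

    Greedy list-scheduling model of load balancing (assumption).
    """
    finish_times = [0.0] * workers  # min-heap: when each worker becomes free
    for d in durations:
        start = heapq.heappop(finish_times)  # earliest-available worker
        heapq.heappush(finish_times, start + d)
    return max(finish_times)
```

Consider durations [10, 10, 1, 1, 1, 1, 1, 1] with two workers. static_makespan returns 22, while dynamic_makespan returns 13. Under static splitting, one worker draws both slow tests while the other sits idle for most of the run.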