EPIC: Issues with Trilinos PR testing 2022-08 #10858
Comments
As explained in #10896 (comment), these are not causes of PR failures; they are victims of a build failure that was also reported in those same PR builds (or not reported, for the clang-10.0.0 builds using the older CMake 3.17.1, as per #10893 (comment) :-( ).
Even with all of the fixes in the linked issues above, we still have a large logjam in PR testing. For example, right now this issues query shows there are 8 PRs that have the
Update: Here is the helpdesk issue: TRILINOSHD-186
FYI: I posted TRILINOSHD-187 to see if they can turn off auto-retest of failing PR builds. That is often a waste of computing resources, as the developer has not even had a chance to look over the failed PR build results to see if rerunning the PR builds is likely to fix anything.
As noted above, only one of two Jenkins shows:
So the PR tester logic seems to think that the other instance ( showing:
I think as a result, all of the 8 PRs with As a result, no PR builds have started since Aug 19, 2022 - 18:08 MDT shown here. This system does not seem to be very robust to instances of these drivers going down. |
Looking over the PR builds being skipped with the message:
at: I am noticing that all of the So if you want to get your PR to be tested, just open a new PR and make sure the PR number is even :-) |
Looking at the history for the broken PR tester driver: it seems that the Trilinos_autotester_driver_inst_1 last ran successfully on 'Aug 16, 2022, 2:03:02 PM'. After that, starting on 'Aug 17, 2022, 6:37:29 PM' for: that instance was broken and not running any PR builds. So it seems that for the last 4 days the PR tester has been running only one set of PR builds at a time, and testing only even-numbered PRs :-)

And to back this up, this CDash query shows that the last time an odd-numbered PR was tested was Aug 17, 2022 - 14:18 MDT with the build: After that, only even-numbered PRs were tested. And if you look at the current list of PRs with

(NOTE: Those dates don't exactly match up, so I am not sure how an odd-numbered PR was able to be tested after 'Aug 16, 2022, 2:03:02 PM', but the general trend of only even-numbered PRs getting tested and only odd-numbered PRs that still have
As predicted above, the next job fired off at 7:16 AM MDT with: and it is running PR builds for the even-numbered PR #10912 showing:
(Jenkins must be showing times according to my browser so this must be 09:22:10 EDT, not MDT). |
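For anyone lining up Jenkins timestamps against the MDT times quoted above, here is a minimal sketch of the conversion. It uses fixed UTC offsets (EDT is UTC-4, MDT is UTC-6) rather than a time zone database, and the calendar date is a placeholder, since the comment only gives the time of day.

```python
from datetime import datetime, timedelta, timezone

# Fixed summer offsets; good enough for comparing August 2022 timestamps.
EDT = timezone(timedelta(hours=-4), "EDT")
MDT = timezone(timedelta(hours=-6), "MDT")

# Time as displayed by Jenkins in the browser (date is a placeholder).
shown = datetime(2022, 8, 22, 9, 22, 10, tzinfo=EDT)

# The same instant expressed in MDT.
local = shown.astimezone(MDT)
print(local.strftime("%H:%M:%S %Z"))  # 07:22:10 MDT
```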
Another problem with the PR builds: |
FYI: I think the major errors blocking PR builds from passing have been resolved. Now there are just a few random failures like #6861 that are still taking down PR testing iterations. |
FYI: As (not) shown in this GitHub Issue Query, there are no more PRs with the and And this GitHub Issue Query shows that there are just 8 open PRs with approved reviews, and of those 7 have a failing PR build status. Therefore, the logjam of PR builds is over. Yes, there are still a few sources of random failures that may trigger a failing PR testing iteration (i.e. #6861 and #10989), but everything else seems to have been addressed. I will now close this EPIC and remove the pin.
There are a number of issues impacting Trilinos PR testing currently that are blocking many active PRs from passing the PR tester:
rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release
builds starting 2022-07-20 #10847 ... Tests have been disabled for now
Also of interest: