[drci] Workflow file errors remain after they are retried #4969
Comments
AI: We need to handle the case where a failing workflow keeps showing up in the Dr. CI box even after it has been retried. Otherwise, people will need to force merge to land their changes.
AI: Some errors look flaky from the user's point of view but are too critical for CI to mark as flaky. If we see an infra failure, it's better to rerun the job than to ignore it.
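One way to read the first action item is that Dr. CI should only consider the most recent attempt of each job or workflow, so that a failure superseded by a retry stops being reported. The following is a minimal sketch of that idea, not the actual Dr. CI implementation; the `JobRun` record, its fields, and both function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    # All fields are hypothetical stand-ins for whatever Dr. CI stores.
    name: str         # job name, or the workflow file path for workflow file errors
    run_attempt: int  # 1 for the first run, incremented on each retry
    conclusion: str   # e.g. "failure" or "success"


def latest_attempts(runs):
    """Keep only the most recent attempt for each job name."""
    latest = {}
    for run in runs:
        seen = latest.get(run.name)
        if seen is None or run.run_attempt > seen.run_attempt:
            latest[run.name] = run
    return list(latest.values())


def remaining_failures(runs):
    """Return failures that still stand once retries are taken into account."""
    return [r for r in latest_attempts(runs) if r.conclusion == "failure"]
```

With this filter, a workflow file error from attempt 1 would no longer be listed once attempt 2 of the same workflow succeeds, while jobs that were never retried keep their original result.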
I think this is fixed by #5038. Leaving this open because the second AI item mentioned by Huy in the previous comment is not covered by it, and I'm not sure if we still want to do it.
This issue has been fixed. There are some cases, like pytorch/pytorch#123104, where the workflow path still appears in the job name used by mergebot, but that looks like a different issue: pytorch/pytorch#122422.
Dr CI does not handle workflow file errors very well after they are retried
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120358
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Merge Blocking SEVs
There is 1 active merge blocking SEVs. Please view them below:
If you must merge, use @pytorchbot merge -f.
✅ You can merge normally! (5 Unrelated Failures)
As of commit c83eb30280626ff84d7a2950d200cb3257e16fb2 with merge base 8fa634070189a5567b7bb0ddf1f389d6e43bebe5:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
test_nn.py::TestNNDeviceTypeCPU::test_clip_grad_norm_foreach_False_norm_type_0_5_cpu
This comment was automatically generated by Dr. CI and updates every 15 minutes.