[autorevert] implement autorevert and fix detection logic #6983

Open: wants to merge 1 commit into main

izaitsevfb (Contributor) commented Aug 8, 2025

Summary

  • Implemented revert detection/recording
  • Implemented failure-only rule matching in the autorevert detector to prevent “success” jobs with a classification label from contaminating pattern detection
  • Added a unit test
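This description does not show how reverts are detected. Here is a minimal sketch of one way to do it. It assumes that revert commits carry the standard git revert trailer "This reverts commit <sha>." and that commit timestamps are already loaded into a dict. RevertRecord and find_revert are hypothetical names, not the pytorch_auto_revert API:

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumption: revert commits contain the default `git revert` trailer
# "This reverts commit <40-char sha>." somewhere in their message.
REVERTS_RE = re.compile(r"This reverts commit ([0-9a-f]{40})")


@dataclass
class RevertRecord:
    reverted_sha: str   # commit that was backed out
    revert_sha: str     # commit that performed the revert
    hours_after: float  # time between the two commits, in hours


def find_revert(
    revert_sha: str,
    message: str,
    revert_time: datetime,
    commit_times: dict[str, datetime],
) -> Optional[RevertRecord]:
    """Return a RevertRecord if `message` reverts a commit present in `commit_times`."""
    match = REVERTS_RE.search(message)
    if match is None:
        return None  # not a revert commit
    reverted = match.group(1)
    if reverted not in commit_times:
        return None  # reverted commit lies outside the scanned window
    delta = revert_time - commit_times[reverted]
    return RevertRecord(reverted, revert_sha, delta.total_seconds() / 3600)
```

For example, a revert committed 18 hours 30 minutes after its target yields hours_after == 18.5. Only reverts whose target falls inside the scanned window are recorded.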

Bug Fixed

  • Cause: The detector previously matched on classification_rule regardless of
    job conclusion. Baseline commit 33ec6e3 had multiple “success” shards labeled
    with rule='pytest failure', which the detector misread as “older commit already
    has the same failure,” suppressing the pattern for bbc0df1/4fd5fab.
  • Fix: Require conclusion == 'failure' wherever the detector compares rules (both
    for newer-commit confirmation and older-baseline exclusion). This prevents noise
    from success+rule rows and correctly flags commit-caused failures like the ROCm
    case.
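The fix can be illustrated with a small sketch. It assumes job rows that carry sha, conclusion and classification_rule fields. It also assumes a pattern means the same rule failing on a suspect commit and on the commit after it, but not on an older baseline. JobRow, rules_for and detect_pattern are hypothetical names, not the detector's actual code. The failures_only flag switches between the old behavior, where any row with a rule counts, and the fixed one:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class JobRow:
    sha: str
    job_name: str
    conclusion: str                     # e.g. "success" or "failure"
    classification_rule: Optional[str]  # may be set even on successful jobs


def rules_for(rows: Iterable[JobRow], sha: str, failures_only: bool = True) -> set[str]:
    """Classification rules attached to one commit's jobs.

    With failures_only=True (the fix), success rows that still carry a
    rule label are ignored; with False, any labelled row counts (old behavior).
    """
    return {
        r.classification_rule
        for r in rows
        if r.sha == sha
        and r.classification_rule
        and (not failures_only or r.conclusion == "failure")
    }


def detect_pattern(
    rows: list[JobRow],
    suspect: str,
    next_commit: str,
    baseline: str,
    failures_only: bool = True,
) -> set[str]:
    """Rules failing on both newer commits but absent from the older baseline."""
    confirmed = rules_for(rows, suspect, failures_only) & rules_for(rows, next_commit, failures_only)
    return confirmed - rules_for(rows, baseline, failures_only)
```

With the fix, a baseline whose only rule-labelled rows are successful shards no longer hides the pattern, which mirrors the 33ec6e3 case described above.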

Testing

python -m pytorch_auto_revert autorevert-checker rocm --hours 82 --do-restart --dry-run
Fetching workflow data for 1 workflows since 2025-08-04T08:56:25.851470...
Found 161 commits with job data for workflow 'rocm'
✓ 3 AUTOREVERT PATTERNS DETECTED

Pattern #1:
Failure rule: 'pytest failure'
Recent commits with failure: bdb07a2b 8085edc8
Older commit without failure: 41081276
✗ NOT REVERTED: 8085edc8f9c98f670f585586b4286a942927537a was not reverted
  ⟳ DRY RUN: Would restart rocm for 8085edc8
  ⟳ DRY RUN: Would restart rocm for 41081276

Pattern #2:
Failure rule: 'pytest failure'
Recent commits with failure: 908c5cc4 b6c53383
Older commit without failure: 33ec6e3e
✗ NOT REVERTED: b6c53383fe2f29e6ed35430e90867dbeb8980d42 was not reverted
  ⟳ DRY RUN: Would restart rocm for b6c53383
  ⟳ DRY RUN: Would restart rocm for 33ec6e3e

Pattern #3:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours

==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): rocm
Timeframe: 82 hours
Commits checked: 161
Auto revert patterns detected: 3
Actual reverts inside auto revert patterns detected (precision): 1 (33.3%)
Total revert commits in period: 9

Revert categories:
  nosignal: 5 (55.6%)
  ignoredsignal: 2 (22.2%)
  ghfirst: 2 (22.2%)

Total reverts excluding ghfirst: 7
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (recall): 6 (85.7%)
Per workflow precision:
  rocm: 1 reverts out of 3 patterns (33.3%) [excluding ghfirst: 1 (33.3%)]

Reverted patterns:
  - pytest failure: bbc0df10 (nosignal)

Restarted workflows: 4
  - rocm for 8085edc8
  - rocm for 41081276
  - rocm for b6c53383
  - rocm for 33ec6e3e
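The percentages in the summary are plain ratios of the printed counts. Here is a small sketch that reproduces them from the numbers above. The function names are hypothetical and not part of pytorch_auto_revert:

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal, as in the summary."""
    return f"{100 * part / whole:.1f}%"


def summary_percentages(
    patterns: int,
    reverted_patterns: int,
    categories: dict[str, int],
    unmatched_non_ghfirst: int,
) -> dict[str, str]:
    """Recompute the summary ratios from raw counts."""
    total = sum(categories.values())                   # all reverts in the period
    non_ghfirst = total - categories.get("ghfirst", 0)  # reverts excluding ghfirst
    out = {"precision": pct(reverted_patterns, patterns)}
    for name, count in categories.items():
        out[name] = pct(count, total)                  # share of each revert category
    out["unmatched_non_ghfirst"] = pct(unmatched_non_ghfirst, non_ghfirst)
    return out


if __name__ == "__main__":
    print(summary_percentages(3, 1, {"nosignal": 5, "ignoredsignal": 2, "ghfirst": 2}, 6))
```

As printed, the figure labelled recall is the share of non-ghfirst reverts that did not match any pattern (6 of 7).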

The actual culprit was correctly identified:

Pattern #7:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours

Multiple patterns were detected because the failure was jumping across workflows (rocm and rocm-mi300).

pytorch-bot added the ci-no-td label Aug 8, 2025

vercel bot commented Aug 8, 2025

The latest updates on your projects.

1 Skipped Deployment
Name      Status    Updated (UTC)
torchci   Ignored   Aug 8, 2025 1:57am

meta-cla added the CLA Signed label Aug 8, 2025
izaitsevfb force-pushed the autorevert-do-autorevert branch from 2891ee8 to 1508f12 on August 8, 2025 01:57