Fix android-perf workflow OOM #6109

Closed

Conversation

guangy10
Contributor

ic4 and dl3 run out of memory (OOM) on the 2xlarge runner.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 10, 2024

pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6109

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3be677b with merge base df5b2ab (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@guangy10
Contributor Author

Let me make sure we have a way to reproduce the workflow issue before merging this fix.

The issue is that if one of the models fails to export:

  1. The "Upload to S3" step runs anyway and fails. The expected behavior is that it is cancelled or skipped.
  2. ALL benchmark-on-device jobs are cancelled. The expected behavior is to continue running benchmark jobs for the other models.
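
The expected behavior described above can be sketched as a small Python model. This is only an illustration of the intended per-model isolation, not the workflow's actual implementation: `JobPlan` and `plan_downstream_jobs` are hypothetical names, and the assumption that a failed export should skip both the upload and the benchmark for that one model is my reading of the two points above.

```python
from dataclasses import dataclass


@dataclass
class JobPlan:
    """Planned state of the downstream jobs for one model (hypothetical type)."""
    model: str
    upload: str      # "run" or "skipped"
    benchmark: str   # "run" or "skipped"


def plan_downstream_jobs(export_ok: dict[str, bool]) -> list[JobPlan]:
    """Decide, per model, whether the upload and benchmark jobs should run.

    Assumption: a failed export affects only that model's downstream jobs;
    every other model keeps running, which is the expected behavior above.
    """
    plans = []
    for model, ok in export_ok.items():
        status = "run" if ok else "skipped"
        plans.append(JobPlan(model=model, upload=status, benchmark=status))
    return plans


if __name__ == "__main__":
    # Illustrative input: one export fails, the other succeeds.
    for plan in plan_downstream_jobs({"ic4": False, "dl3": True}):
        print(plan)
```

The point of the sketch is that the decision is made per model rather than for the whole matrix, so one failed export never changes the plan for another model.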

cc: @huydhn

@guangy10 guangy10 temporarily deployed to upload-benchmark-results October 10, 2024 18:36 — with GitHub Actions Inactive
@guangy10
Contributor Author

Okay, verified that we can reproduce the issue by just running with an unsupported model/backend, like this: https://github.com/pytorch/executorch/actions/runs/11280108348

@guangy10
Contributor Author

Safe to merge now

@facebook-github-bot
Contributor

@guangy10 merged this pull request in 5d12e5b.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

3 participants