-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Actions job to concatenate tutorials data to one CSV and run analysis notebook #1703
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1703 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 170 170
Lines 14636 14636
=========================================
Hits 14636 14636 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: ## Motivation We are using nbconvert to run tutorials. nbconvert is not really made for this use case, but papermill is, so we have some handwritten code than can be handled by papermill. With papermill, we can go a bit further and use SMOKE_TEST as a [parameter](https://papermill.readthedocs.io/en/latest/usage-parameterize.html) rather than an environment variable. That would make it easy for people to work with the tutorials as notebooks. Pull Request resolved: pytorch#1706 Test Plan: Ran tutorials locally and made sure smoke-test flag was getting used appropriately. ## Related pull requests Enabling papermill will make pytorch#1703, which automates running a notebook, a bit easier. Reviewed By: saitcakmak Differential Revision: D43631568 Pulled By: esantorella fbshipit-source-id: 66fbcca511beb9f46cc281c0ba74a27e4c86e46d
Summary: ## Motivation We are using nbconvert to run tutorials. nbconvert is not really made for this use case, but papermill is, so we have some handwritten code than can be handled by papermill. With papermill, we can go a bit further and use SMOKE_TEST as a [parameter](https://papermill.readthedocs.io/en/latest/usage-parameterize.html) rather than an environment variable. That would make it easy for people to work with the tutorials as notebooks. Pull Request resolved: #1706 Test Plan: Ran tutorials locally and made sure smoke-test flag was getting used appropriately. ## Related pull requests Enabling papermill will make #1703, which automates running a notebook, a bit easier. Reviewed By: saitcakmak Differential Revision: D43631568 Pulled By: esantorella fbshipit-source-id: e7bfeb68e221fff4f1633af8deb9a11d1ff1c0e6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern I have is that by repeatedly committing updated plots to the repo, the repo size will increase substantially over time (due to commit history). I know that this is an issue with plotly plots in Ax (which is why we force-push the gh-pages branch with no history) but I don't know how much of an issue this is with seaborn or matplotlib plots. If the NB size with the output plots is small, then I think this is ok. If it is something we'd measure in MBs, then this might become an issue over time.
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Great point; these were pretty big (~6 MB). I cut it down to ~1 MB by switching to SVG graphics and rearranging the plots, which hopefully makes them easier to read too. |
So does this mean that this adds about ~1MB to the repo on every tutorials run? |
Yeah. I'm not thrilled with this setup, but I'm reluctant to abandon it since it seems to have already shown some interesting patterns, like a sudden ~13% increase in how much memory the average tutorial uses in smoke test mode: I want to look into alternatives ways of setting this up... |
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
suggestions from code review
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@esantorella merged this pull request in 7a04fec. |
Summary: ## Motivation #1695 and #1703 introduced logging of tutorials runtime and memory usage, and visualizing the results in [a notebook stored in the artifacts branch](https://github.com/pytorch/botorch/blob/artifacts/notebooks/tutorials_performance_tracking.ipynb). This information been occasionally helpful for checking whether a tutorials timeout stemmed from a pervasive slowdown, a method-specific issue, or random chance. However, it has not been used often, increases the size of the repository, and now has stopped updating and generated [a failure in the nightly cron](https://github.com/pytorch/botorch/actions/runs/8704134925/job/23882395249#step:12:132). ### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)? Yes Pull Request resolved: #2298 Test Plan: [x] Run tutorials locally [ ] Make sure tutorials action passes on PR [ ] Nightly cron ## Related PRs #1695 , #1703 Reviewed By: saitcakmak Differential Revision: D56192232 Pulled By: esantorella fbshipit-source-id: 02b0c1c3702929ebbea2e2cb90e5669ac7040c44
After tutorials performance data is uploaded to artifacts branch, a job runs to concatenate tutorials data to one CSV and run analysis notebook. The notebook, minus cell outputs, and the script that runs it will be in the 'main' branch, while the data and notebook outputs stay in the 'analytics' branch, which continues to not need to stay up-to-date with 'main' since it doesn't have its own Python code.
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
[x] Ran with manual actions run "on push" (smoke test): here, more recent actions run here.
[x] Auto-generated notebook should have nifty visualizations here
[x] Up-to-date dataset of all tutorials runs exists and looks OK here