Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actions job to concatenate tutorials data to one CSV and run analysis notebook #1703

Closed
wants to merge 22 commits into from

Conversation

esantorella
Copy link
Member

@esantorella esantorella commented Feb 25, 2023

After tutorials performance data is uploaded to artifacts branch, a job runs to concatenate tutorials data to one CSV and run analysis notebook. The notebook, minus cell outputs, and the script that runs it will be in the 'main' branch, while the data and notebook outputs stay in the 'analytics' branch, which continues to not need to stay up-to-date with 'main' since it doesn't have its own Python code.

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

[x] Ran with manual actions run "on push" (smoke test): here, more recent actions run here.
[x] Auto-generated notebook should have nifty visualizations here
[x] Up-to-date dataset of all tutorials runs exists and looks OK here

@esantorella esantorella self-assigned this Feb 25, 2023
@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Feb 25, 2023
@codecov
Copy link

codecov bot commented Feb 25, 2023

Codecov Report

Merging #1703 (d63ede7) into main (2d87f90) will not change coverage.
The diff coverage is n/a.

❗ Current head d63ede7 differs from pull request most recent head e42fd75. Consider uploading reports for the commit e42fd75 to get more accurate results

@@            Coverage Diff            @@
##              main     #1703   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          170       170           
  Lines        14636     14636           
=========================================
  Hits         14636     14636           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@esantorella esantorella marked this pull request as ready for review February 27, 2023 20:02
@esantorella esantorella changed the title [WIP] Actions job to concatenate tutorials data to one CSV and run analysis notebook Actions job to concatenate tutorials data to one CSV and run analysis notebook Feb 27, 2023
@facebook-github-bot
Copy link
Contributor

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

esantorella added a commit to esantorella/botorch that referenced this pull request Mar 9, 2023
Summary:
## Motivation

We are using nbconvert to run tutorials. nbconvert is not really made for this use case, but papermill is, so we have some handwritten code than can be handled by papermill. With papermill, we can go a bit further and use SMOKE_TEST as a [parameter](https://papermill.readthedocs.io/en/latest/usage-parameterize.html) rather than an environment variable. That would make it easy for people to work with the tutorials as notebooks.

Pull Request resolved: pytorch#1706

Test Plan:
Ran tutorials locally and made sure smoke-test flag was getting used appropriately.

## Related pull requests

Enabling papermill will make pytorch#1703, which automates running a notebook, a bit easier.

Reviewed By: saitcakmak

Differential Revision: D43631568

Pulled By: esantorella

fbshipit-source-id: 66fbcca511beb9f46cc281c0ba74a27e4c86e46d
facebook-github-bot pushed a commit that referenced this pull request Mar 9, 2023
Summary:
## Motivation

We are using nbconvert to run tutorials. nbconvert is not really made for this use case, but papermill is, so we have some handwritten code than can be handled by papermill. With papermill, we can go a bit further and use SMOKE_TEST as a [parameter](https://papermill.readthedocs.io/en/latest/usage-parameterize.html) rather than an environment variable. That would make it easy for people to work with the tutorials as notebooks.

Pull Request resolved: #1706

Test Plan:
Ran tutorials locally and made sure smoke-test flag was getting used appropriately.

## Related pull requests

Enabling papermill will make #1703, which automates running a notebook, a bit easier.

Reviewed By: saitcakmak

Differential Revision: D43631568

Pulled By: esantorella

fbshipit-source-id: e7bfeb68e221fff4f1633af8deb9a11d1ff1c0e6
Copy link
Contributor

@saitcakmak saitcakmak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have is that by repeatedly committing updated plots to the repo, the repo size will increase substantially over time (due to commit history). I know that this is an issue with plotly plots in Ax (which is why we force-push the gh-pages branch with no history) but I don't know how much of an issue this is with seaborn or matplotlib plots. If the NB size with the output plots is small, then I think this is ok. If it is something we'd measure in MBs, then this might become an issue over time.

esantorella and others added 2 commits March 10, 2023 17:24
@facebook-github-bot
Copy link
Contributor

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@esantorella
Copy link
Member Author

One concern I have is that by repeatedly committing updated plots to the repo, the repo size will increase substantially over time (due to commit history). I know that this is an issue with plotly plots in Ax (which is why we force-push the gh-pages branch with no history) but I don't know how much of an issue this is with seaborn or matplotlib plots. If the NB size with the output plots is small, then I think this is ok. If it is something we'd measure in MBs, then this might become an issue over time.

Great point; these were pretty big (~6 MB). I cut it down to ~1 MB by switching to SVG graphics and rearranging the plots, which hopefully makes them easier to read too.

@Balandat
Copy link
Contributor

So does this mean that this adds about ~1MB to the repo on every tutorials run?

@esantorella
Copy link
Member Author

esantorella commented Mar 13, 2023

So does this mean that this adds about ~1MB to the repo on every tutorials run?

Yeah. I'm not thrilled with this setup, but I'm reluctant to abandon it since it seems to have already shown some interesting patterns, like a sudden ~13% increase in how much memory the average tutorial uses in smoke test mode:

image

I want to look into alternatives ways of setting this up...

@facebook-github-bot
Copy link
Contributor

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@esantorella merged this pull request in 7a04fec.

@Balandat Balandat deleted the tutorials_analytics branch March 24, 2023 14:55
@Balandat Balandat restored the tutorials_analytics branch March 24, 2023 14:55
@Balandat Balandat deleted the tutorials_analytics branch April 15, 2023 19:45
esantorella added a commit to esantorella/botorch that referenced this pull request Apr 16, 2024
facebook-github-bot pushed a commit that referenced this pull request Apr 16, 2024
Summary:
## Motivation

#1695 and #1703 introduced logging of tutorials runtime and memory usage, and visualizing the results in [a notebook stored in the artifacts branch](https://github.com/pytorch/botorch/blob/artifacts/notebooks/tutorials_performance_tracking.ipynb). This information been occasionally helpful for checking whether a tutorials timeout stemmed from a pervasive slowdown, a method-specific issue, or random chance. However, it has not been used often, increases the size of the repository, and now has stopped updating and generated [a failure in the nightly cron](https://github.com/pytorch/botorch/actions/runs/8704134925/job/23882395249#step:12:132).

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes

Pull Request resolved: #2298

Test Plan:
[x] Run tutorials locally
[ ] Make sure tutorials action passes on PR
[ ] Nightly cron

## Related PRs
 #1695 , #1703

Reviewed By: saitcakmak

Differential Revision: D56192232

Pulled By: esantorella

fbshipit-source-id: 02b0c1c3702929ebbea2e2cb90e5669ac7040c44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants