
feat: Downsampling of observed data for single-fidelity Bayesian optimization #607

Merged 3 commits into main on Apr 4, 2023

Conversation

@mseeger (Collaborator) commented Mar 22, 2023

Currently, we only limit the dataset size for multi-fidelity BO; this PR closes that gap. The feature was requested by a student working with Ralf Herbrich at HPI.
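As a usage sketch: the two parameter names below (``max_size_data_for_model``, ``max_size_top_fraction``) are the ones introduced or documented by this PR; the concrete values and the surrounding dict layout are illustrative assumptions, not defaults taken from Syne Tune.

```python
# Illustrative search_options for a single-fidelity GP-based searcher.
# The two downsampling keys come from this PR; the values are assumptions.
search_options = {
    "max_size_data_for_model": 500,  # downsample observed data beyond this size
    "max_size_top_fraction": 0.25,   # fraction of the subsample reserved for best points
}
```

Such a dict would typically be passed via the scheduler's or searcher's ``search_options`` argument; consult the Syne Tune documentation for the exact entry point.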


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mseeger mseeger requested review from wesk and wistuba and removed request for wesk March 22, 2023 08:44
codecov bot commented Mar 22, 2023

Codecov Report

Patch coverage: 72.64% and project coverage change: +0.18% 🎉

Comparison is base (b4d4ed0) 64.98% compared to head (acb36d7) 65.17%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #607      +/-   ##
==========================================
+ Coverage   64.98%   65.17%   +0.18%     
==========================================
  Files         353      360       +7     
  Lines       26238    26424     +186     
==========================================
+ Hits        17051    17222     +171     
- Misses       9187     9202      +15     
Impacted Files Coverage Δ
benchmarking/commons/hpo_main_common.py 48.18% <ø> (ø)
benchmarking/nursery/test_bolimit/baselines.py 0.00% <0.00%> (ø)
...king/nursery/test_bolimit/benchmark_definitions.py 0.00% <0.00%> (ø)
benchmarking/nursery/test_bolimit/hpo_main.py 0.00% <0.00%> (ø)
benchmarking/nursery/test_bolimit/launch_remote.py 0.00% <0.00%> (ø)
.../bayesopt/models/subsample_state_multi_fidelity.py 95.86% <ø> (ø)
...optimizer/schedulers/searchers/gp_fifo_searcher.py 71.42% <ø> (ø)
...imizer/schedulers/searchers/gp_searcher_factory.py 82.84% <85.00%> (+0.22%) ⬆️
...bayesopt/models/subsample_state_single_fidelity.py 100.00% <100.00%> (ø)
tst/schedulers/bayesopt/test_subsample_state.py 98.32% <100.00%> (+0.19%) ⬆️

... and 33 files with indirect coverage changes


@mseeger mseeger changed the title Downsampling of observed data for single-fidelity Bayesian optimization feature: Downsampling of observed data for single-fidelity Bayesian optimization Mar 28, 2023
@wistuba (Collaborator) left a comment

mostly comments regarding documentation

@@ -197,6 +197,13 @@ full range of arguments. We list the most important ones:
each round. ``opt_maxiter`` is the maximum number of L-BFGS iterations. We
run ``opt_nstarts`` such optimizations from random starting points and pick
the best.
* ``max_size_data_for_model``, ``max_size_top_fraction``: GP computations scale
@wistuba (Collaborator):
Based on this description, it is not clear to me what ``max_size_top_fraction`` is for.

@mseeger (Collaborator, Author):
OK, I'll add something.

@@ -0,0 +1,2 @@
syne-tune[gpsearchers,blackbox-repository,aws]
tqdm
@wistuba (Collaborator):
Where is the dependency ``tqdm`` coming from?

@mseeger (Collaborator, Author):
``tqdm`` is not among the Syne Tune dependencies, but it is needed for benchmarking.

is fit to, via ``max_size_data_for_model`` in ``search_options``. If the data
is larger, it is downsampled to this size. Sampling is controlled by another
argument ``max_size_top_fraction``. Namely, this fraction of entries in the
downsampled set are filled by the best points in the full set, while the
@wistuba (Collaborator):
Does "best points" refer to configurations with the highest/lowest metric value?

@mseeger (Collaborator, Author):
I'll add a comment.
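To make the semantics of the two options concrete, here is a minimal sketch of the downsampling described in the quoted docs: a ``max_size_top_fraction`` share of the subsample is filled with the best observations (lowest metric here, assuming minimization), and the remainder is drawn at random. This is an illustration only, not the Syne Tune implementation; the function name and the seeding are assumptions.

```python
import random

def subsample_observations(observations, max_size, top_fraction, seed=0):
    """Downsample (config, metric) pairs to at most ``max_size`` entries.

    A ``top_fraction`` share of the result is filled with the best points
    (lowest metric, assuming minimization); the rest is sampled uniformly
    at random from the remaining observations.
    """
    if len(observations) <= max_size:
        return list(observations)
    num_top = int(max_size * top_fraction)
    ranked = sorted(observations, key=lambda obs: obs[1])
    top, rest = ranked[:num_top], ranked[num_top:]
    rng = random.Random(seed)
    return top + rng.sample(rest, max_size - num_top)

# With 100 observations, max_size=10 and top_fraction=0.2, the subsample
# keeps the 2 best points and fills the other 8 slots at random.
data = [(f"cfg{i}", float(i)) for i in range(100)]
subset = subsample_observations(data, max_size=10, top_fraction=0.2)
```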

Comment on lines 189 to 190
the dataset gets large. You can use ``opt_skip_init_length``,
``opt_skip_period`` to this end.
@wistuba (Collaborator):
Not sure whether this is discussed somewhere else, but at this point it is unclear how to set them (bool, None, 0?) or what exactly they refer to.

@mseeger (Collaborator, Author):
I added a link.
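The documented intent of ``opt_skip_init_length`` and ``opt_skip_period`` is to avoid refitting GP hyperparameters on every round once the dataset gets large. A minimal sketch of that behavior follows; the function name and the default values are assumptions for illustration, not Syne Tune's actual defaults.

```python
def should_refit_hyperparameters(
    num_observations, opt_skip_init_length=150, opt_skip_period=1
):
    """Decide whether GP hyperparameters are re-optimized this round.

    Below ``opt_skip_init_length`` observations, refit every round; beyond
    that, refit only on every ``opt_skip_period``-th round.
    """
    if num_observations < opt_skip_init_length:
        return True
    return num_observations % opt_skip_period == 0
```

With ``opt_skip_period=1`` (never skip) this refits every round; a larger period trades model freshness for cheaper rounds on big datasets.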

@wistuba (Collaborator):

Nit: ``multifid``/``singlefid`` -> ``multi_fidelity``. I prefer full names; abbreviations might not be easy to understand for everyone, and they can lead to inconsistent ways of abbreviating things. This is more of a suggestion.

@mseeger (Collaborator, Author):
OK

@wistuba wistuba self-requested a review April 4, 2023 09:44
@mseeger mseeger merged commit d334ae2 into main Apr 4, 2023
@mseeger mseeger deleted the bo_limit branch April 4, 2023 13:02
@wesk wesk added the feature label Apr 11, 2023
@wesk wesk changed the title feature: Downsampling of observed data for single-fidelity Bayesian optimization feat: Downsampling of observed data for single-fidelity Bayesian optimization Apr 11, 2023