Rework validation for measurements and pending_experiments #456

Open
Scienfitz wants to merge 13 commits into main from feature/exp_input_validation
Conversation

@Scienfitz (Collaborator) commented Jan 3, 2025

Fixes #453

Notes re measurements validation

  • Two input validation utilities (one for parameters and one for targets) have been extracted; see the sketch below. Additional validations for binary targets have been added. The utilities contain parts of add_measurements and fuzzy_row_match.
  • As a result, fuzzy_row_match no longer performs any validation.
  • add_measurements now simply calls the utilities.
  • Tests for invalid parameter input have been extended.
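For illustration, a rough sketch of what the two extracted utilities could look like (the actual names, signatures, and parameter/target interfaces in baybe/utils/validation.py may differ; this is only meant to convey the shape):

import pandas as pd

def validate_parameter_input(
    data: pd.DataFrame,
    parameters,
    numerical_measurements_must_be_within_tolerance: bool = False,
) -> None:
    """Check that each parameter column exists and contains only allowed values."""
    for param in parameters:
        if param.name not in data.columns:
            raise ValueError(f"Missing column for parameter '{param.name}'.")
        # per-parameter value checks (and optional tolerance checks) would go here

def validate_target_input(data: pd.DataFrame, targets) -> None:
    """Check that each target column exists (plus, e.g., binary-value checks)."""
    for target in targets:
        if target.name not in data.columns:
            raise ValueError(f"Missing column for target '{target.name}'.")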

Notes re pending_experiments validation

  • The parameter input validation utility is now called as part of the Bayesian recommenders, producing proper error messages instead of cryptic and seemingly unrelated ones (see the call-site sketch below).
  • A recommender-based test for invalid values in pending_experiments has been added.
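The call site could look roughly like this (illustrative only; the exact location inside the Bayesian recommenders and the attribute names are assumptions):

# inside a Bayesian recommender's recommend method (hypothetical location)
if pending_experiments is not None:
    validate_parameter_input(pending_experiments, searchspace.parameters)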

Remaining problem:
The places where measurements and pending_experiments are validated are inconsistent: the former is validated in the campaign and not in any recommender, while the latter is not validated in the campaign but in the recommender. One of the issues is that numerical_measurements_must_be_within_tolerance is not available to recommenders, but it is required for the parameter input validation.

Suggestion:
IMO we don't want to add more keywords to .recommend, hence the pending_experiments validation currently assumes numerical_measurements_must_be_within_tolerance=False and cannot be configured otherwise. To me, the simplest solution would be to completely get rid of any numerical_measurements_must_be_within_tolerance keywords and make it an environment variable: we wouldn't have to worry about its availability anymore, and it wouldn't require adding it to more and more signatures (see the sketch below).
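A minimal sketch of the environment-variable idea (the variable name and its default below are assumptions, not an existing BayBE setting):

import os

def numerical_measurements_must_be_within_tolerance() -> bool:
    """Read the hypothetical environment variable; defaults to False."""
    value = os.environ.get(
        "BAYBE_NUMERICAL_MEASUREMENTS_MUST_BE_WITHIN_TOLERANCE", "false"
    )
    return value.strip().lower() in ("1", "true", "yes")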

@Scienfitz Scienfitz added the enhancement Expand / change existing functionality label Jan 3, 2025
@Scienfitz Scienfitz self-assigned this Jan 3, 2025
@AdrianSosic (Collaborator) left a comment

Hi @Scienfitz, thanks for the refactor 🏗️ Below are my comments.

Review comments were left on baybe/utils/validation.py, tests/test_input_output.py, and tests/test_pending_experiments.py (threads resolved, several marked outdated).
@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch 3 times, most recently from 0a42492 to a412305 on January 6, 2025 at 12:36
@AVHopp (Collaborator) left a comment

LGTM

Review comments were left on baybe/utils/dataframe.py and baybe/telemetry.py (threads resolved).
@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch from 7056db6 to 969dea4 on January 14, 2025 at 13:20
@Scienfitz Scienfitz requested a review from AdrianSosic January 14, 2025 13:21
@Scienfitz (Collaborator, Author) commented Feb 6, 2025

After a pause, I reanalyzed the outstanding issues here:

  1. Validation should be brought to both the campaign and the recommenders. This ensures proper validation of measurements and pending_experiments for users who use campaigns as well as for those who don't. There needs to be a mechanism for a campaign to tell the recommender not to perform any validation, otherwise validation would be repeated (it could also be an option to simply accept that).

  2. The flag numerical_measurements_must_be_within_tolerance is only available in .add_measurements, but for validating pending_experiments in a Campaign it also needs to be available in .recommend.

Solution Proposal:

  • numerical_measurements_must_be_within_tolerance must become an attribute of the Campaign again. That way it is available for both recommend and add_measurements. I would prefer that over including it in the recommend signature.
  • RecommenderProtocol.recommend should get an additional kwarg called numerical_measurements_must_be_within_tolerance of type bool | None. If it is True/False, the recommender validates measurements / pending_experiments with the respective flag. If it is None, validation is skipped, conveniently solving 1. A rough sketch of the signature follows below.
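The sketch (the other parameters are only placeholders for the existing RecommenderProtocol signature and may not match it exactly):

from typing import Protocol

import pandas as pd

class RecommenderProtocol(Protocol):
    def recommend(
        self,
        batch_size: int,
        searchspace,  # SearchSpace
        objective=None,  # Objective | None
        measurements: pd.DataFrame | None = None,
        pending_experiments: pd.DataFrame | None = None,
        numerical_measurements_must_be_within_tolerance: bool | None = None,
    ) -> pd.DataFrame:
        """True/False: validate inputs with that tolerance flag; None: skip validation."""
        ...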

Collaborator

Hi @Scienfitz, turning this into a threaded discussion for easier communication:

Regarding the two points you mentioned:

  1. I'm afraid the suggested solution with numerical_measurements_must_be_within_tolerance fixes the problem from the wrong angle, tbh. IMO, a clean solution requires fixing #428 (Refactoring out-of-range and tolerance mechanism) first. If you think about it, it really doesn't make much sense to impose any user restrictions on measurements and pending_experiments in terms of tolerances (yes: for our internal tracking of metadata, we can do whatever we think is reasonable), because the measurements are already there / currently in progress. So what's even the point of validating them? That would be like the user saying "I've already collected some data" and BayBE replying "Nope!" 😄 Validating that the data is compatible with the parameters in the first place is a different story, of course, but that's part 2 then. What do you think?

  2. Agree, we need a mechanism for the campaign to "inform" the recommender that data has already been validated. There are a lot of different ways we could do this; here are just some of them, which can potentially also be combined:

    • Adding a (private) flag to Recommender.recommend
    • Wrapping the call inside campaign into a context manager
    • Use a separate validation manager (i.e. an object that keeps track of which objects have already been validated) to decouple it entirely from campaign/recommender
    • Use some clever form of dataframe wrapper

The advantage of the latter three would be that no API change is required. I don't have a clear opinion yet, but right now I'd go with the wrapper or the context manager. And perhaps one comment, because I can anticipate objections against the wrapper: note that it could be as slim as just subclassing pd.DataFrame into ValidatedDataframe, nothing else. So no API change involved – the recommender would only need to do a simple isinstance check. What do you think? Whatever we do, it'd be good if this stays completely behind the scenes in case we notice problems later.

Collaborator Author

  1. We certainly don't need to entangle issues more than they already are. Essentially you are opening an entirely new question: does requiring measurements to be inside allowed tolerances even make sense? This PR does not aim at answering this; instead, for consistency, I was assuming that recommenders must also allow this choice because the campaign already allows it. If the answer is no, then the whole problem is greatly simplified and the calls inside recommenders always happen with numerical_measurements_must_be_within_tolerance=False. If there's an update to that, it can be changed again. Again, this PR aims at fixing #453 (Pending experiments with CustomDiscreteParameter value not in search space gives cryptic error, coming from disallowed values not related to tolerances) and not #428 (Refactoring out-of-range and tolerance mechanism, dealing with disallowed values due to tolerances).

  2. I don't get the private flag. Do you mean a private attribute of the class or the proposed argument? I prefer the context manager over the other variants but haven't implemented one yet (any hint or link on how to do it?). I'd reject the last two points as overkill.

Collaborator

  1. I don't know your opinion yet, but I think the answer should be no. Therefore, I think we don't even need to handle it in this PR, and it should be fixed/removed in the Campaign in another PR to solve #428 (Refactoring out-of-range and tolerance mechanism).

  2. The "private flag" would be something like adding _validate_measurements to Recommender.recommend, which I don't like so much. Of course, one could also add it as a "regular" flag but this is really something the user should not need to bother and adding the flag in any form definitely adds some noise to the API. Below a rough sketch of how I could envision the other alternatives.

Wrapper

Even if you don't like the wrapper, it could be a quick temporary (internal) fix until we have the perhaps cleaner approach with the context manager set up. It would probably be as simple as:

import pandas as pd

class ValidatedDataframe(pd.DataFrame):
    """Marker subclass: the contained data has already been validated."""

Then, at the point in the campaign where the validation happens, either the campaign or the validating function would need to set df.__class__ = ValidatedDataframe and then the recommender would only trigger validation in an if not isinstance(df, ValidatedDataframe) condition.
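In code, the two sides of that handshake could look roughly like this (a sketch, not the actual campaign/recommender code):

# campaign side: mark the dataframe once validation has succeeded
measurements.__class__ = ValidatedDataframe

# recommender side: only validate data that has not been marked yet
if not isinstance(measurements, ValidatedDataframe):
    validate_measurements(measurements)  # placeholder for the actual validation call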

Context Manager

Potentially cleaner, but requires some external settings mechanism to keep track of the state. This is definitely something that we should set up anyway for all sorts of other flags as well, like polars, single_precision, etc. I have some ideas for how to do this nicely in the long run, but for now a quick'n'dirty internal(!) solution would also do the job. It would look like:

with settings.validate_user_data(False):
    recommender.recommend(...)

And then inside the recommender you have the corresponding check reading the global flag maintained by the settings object.
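A quick'n'dirty version of such a settings object could be as small as the following (illustrative only; neither settings nor validate_user_data exists yet):

from contextlib import contextmanager

class _Settings:
    """Minimal global settings holder with a single validation flag."""

    def __init__(self) -> None:
        self.user_data_validation_enabled = True

    @contextmanager
    def validate_user_data(self, enabled: bool):
        """Temporarily switch user data validation on/off within a with-block."""
        previous = self.user_data_validation_enabled
        self.user_data_validation_enabled = enabled
        try:
            yield
        finally:
            self.user_data_validation_enabled = previous

settings = _Settings()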

Collaborator

PS: In the long run, the settings approach of course has additional benefits, e.g. we can easily integrate it with environment variables so that you can flexibly control the settings from within and outside Python, regardless of the specific context you're in.

Collaborator Author

OK, I see. The wrapper is probably much better given we have no settings mechanism at the moment. Will try, ty.

Besides, I think we are talking about a situation where I want to know whether a df has already been validated. This does not really conform to the meaning of a setting, or at least we would then have to distinguish checking inputs for validity from checking them for a numerical tolerance match.

Collaborator Author

@AdrianSosic OK, I've added the wrapper suggestion in the last two commits and rebased the branch. It seems to work out of the box.

@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch from f2b383f to 7c2b551 on February 7, 2025 at 15:51