Rework validation for measurements and pending_experiments #456

Open
Scienfitz wants to merge 13 commits into main from feature/exp_input_validation
Conversation

@Scienfitz (Collaborator) commented Jan 3, 2025

Fixes #453

Notes re measurements validation

  • Two input validation utilities (one for parameters and one for targets) have been extracted; see the sketch below. Additional validations for binary targets have been added. The utilities contain parts of add_measurements and fuzzy_row_match.
  • As a result, fuzzy_row_match no longer performs any validation.
  • add_measurements now simply calls the utilities.
  • Tests for invalid parameter input have been extended.
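For illustration, a rough sketch of what the two extracted utilities could look like (the actual names, signatures, and parameter/target interfaces in baybe/utils/validation.py may differ; this is only meant to convey the shape):

import pandas as pd

def validate_parameter_input(
    data: pd.DataFrame,
    parameters,
    numerical_measurements_must_be_within_tolerance: bool = False,
) -> None:
    """Check that each parameter column exists and contains only allowed values."""
    for param in parameters:
        if param.name not in data.columns:
            raise ValueError(f"Missing column for parameter '{param.name}'.")
        # per-parameter value checks (and optional tolerance checks) would go here

def validate_target_input(data: pd.DataFrame, targets) -> None:
    """Check that each target column exists (plus, e.g., binary-value checks)."""
    for target in targets:
        if target.name not in data.columns:
            raise ValueError(f"Missing column for target '{target.name}'.")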

Notes re pending_experiments validation

  • The parameter input validation utility is now called as part of the Bayesian recommenders, producing proper error messages instead of cryptic and seemingly unrelated ones (see the call-site sketch below).
  • A recommender-based test for invalid values in pending_experiments has been added.
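The call site could look roughly like this (illustrative only; the exact location inside the Bayesian recommenders and the attribute names are assumptions):

# inside a Bayesian recommender's recommend method (hypothetical location)
if pending_experiments is not None:
    validate_parameter_input(pending_experiments, searchspace.parameters)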

Remaining problem:
The places where measurements and pending_experiments are validated are inconsistent: the former is validated in the campaign and not in any recommender, while the latter is not validated in the campaign but in the recommender. One of the issues is that numerical_measurements_must_be_within_tolerance is not available to recommenders, but it is required for the parameter input validation.

Suggestion:
IMO we don't want to add more keywords to .recommend, hence the pending_experiments validation currently assumes numerical_measurements_must_be_within_tolerance=False and cannot be configured otherwise. To me, the simplest solution would be to completely get rid of any numerical_measurements_must_be_within_tolerance keywords and make it an environment variable: we wouldn't have to worry about its availability anymore, and it wouldn't require adding it to more and more signatures (see the sketch below).
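A minimal sketch of the environment-variable idea (the variable name and its default below are assumptions, not an existing BayBE setting):

import os

def numerical_measurements_must_be_within_tolerance() -> bool:
    """Read the hypothetical environment variable; defaults to False."""
    value = os.environ.get(
        "BAYBE_NUMERICAL_MEASUREMENTS_MUST_BE_WITHIN_TOLERANCE", "false"
    )
    return value.strip().lower() in ("1", "true", "yes")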

@Scienfitz Scienfitz added the enhancement Expand / change existing functionality label Jan 3, 2025
@Scienfitz Scienfitz self-assigned this Jan 3, 2025
@AdrianSosic (Collaborator) left a comment

Hi @Scienfitz, thanks for the refactor 🏗️ Below are my comments.

Review comments were left on baybe/utils/validation.py, tests/test_input_output.py, and tests/test_pending_experiments.py (threads resolved, several marked outdated).
@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch 3 times, most recently from 0a42492 to a412305 on January 6, 2025 at 12:36
@AVHopp (Collaborator) left a comment

LGTM

Review comments were left on baybe/utils/dataframe.py and baybe/telemetry.py (threads resolved).
@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch from 7056db6 to 969dea4 on January 14, 2025 at 13:20
@Scienfitz Scienfitz requested a review from AdrianSosic January 14, 2025 13:21
@Scienfitz (Collaborator, Author) commented Feb 6, 2025

After a pause, I reanalyzed the outstanding issues here:

  1. Validation should be brought to both the campaign and the recommenders. This ensures proper validation of measurements and pending_experiments for users who use campaigns as well as for those who don't. There needs to be a mechanism for a campaign to tell the recommender not to perform any validation, otherwise validation would be repeated (it could also be an option to simply accept that).

  2. The flag numerical_measurements_must_be_within_tolerance is only available in .add_measurements, but for validating pending_experiments in a Campaign it also needs to be available in .recommend.

Solution Proposal:

  • numerical_measurements_must_be_within_tolerance must become an attribute of the Campaign again. That way it is available for both recommend and add_measurements. I would prefer that over including it in the recommend signature.
  • RecommenderProtocol.recommend should get an additional kwarg called numerical_measurements_must_be_within_tolerance of type bool | None. If it is True/False, the recommender validates measurements / pending_experiments with the respective flag. If it is None, validation is skipped, conveniently solving 1. A rough sketch of the signature follows below.
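The sketch (the other parameters are only placeholders for the existing RecommenderProtocol signature and may not match it exactly):

from typing import Protocol

import pandas as pd

class RecommenderProtocol(Protocol):
    def recommend(
        self,
        batch_size: int,
        searchspace,  # SearchSpace
        objective=None,  # Objective | None
        measurements: pd.DataFrame | None = None,
        pending_experiments: pd.DataFrame | None = None,
        numerical_measurements_must_be_within_tolerance: bool | None = None,
    ) -> pd.DataFrame:
        """True/False: validate inputs with that tolerance flag; None: skip validation."""
        ...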

Collaborator

Hi @Scienfitz, turning this into a threaded discussion for easier communication:

Regarding the two points you mentioned:

  1. I'm afraid the suggested solution with numerical_measurements_must_be_within_tolerance fixes the problem from the wrong angle, tbh. IMO, a clean solution requires fixing #428 (Refactoring out-of-range and tolerance mechanism) first. If you think about it, it really doesn't make much sense to impose any user restrictions on measurements and pending_experiments in terms of tolerances (yes: for our internal tracking of metadata, we can do whatever we think is reasonable), because the measurements are already there / currently in progress. So what's even the point of validating them? That would be like the user saying "I've already collected some data" and BayBE replying "Nope!" 😄 Validating that the data is compatible with the parameters in the first place is a different story, of course, but that's part 2 then. What do you think?

  2. Agree, we need a mechanism for the campaign to "inform" the recommender that data has already been validated. There are a lot of different ways we could do this; here are just some of them, which can potentially also be combined:

    • Adding a (private) flag to Recommender.recommend
    • Wrapping the call inside campaign into a context manager
    • Use a separate validation manager (i.e. an object that keeps track of which objects have already been validated) to decouple it entirely from campaign/recommender
    • Use some clever form of dataframe wrapper

The advantage of the latter three would be that no API change is required. I don't have a clear opinion yet, but right now I'd go with the wrapper or the context manager. And perhaps one comment, because I can anticipate objections against the wrapper: note that it could be as slim as just subclassing pd.DataFrame into ValidatedDataframe, nothing else. So no API change involved – the recommender would only need to do a simple isinstance check. What do you think? Whatever we do, it'd be good if this stays completely behind the scenes in case we notice problems later.

Collaborator Author

  1. We certainly don't need to entangle issues more than they already are. Essentially you are opening an entirely new question: does requiring measurements to be inside allowed tolerances even make sense? This PR does not aim at answering this; instead, for consistency, I was assuming that recommenders must also allow this choice because the campaign already allows it. If the answer is no, then the whole problem is greatly simplified and the calls inside recommenders always happen with numerical_measurements_must_be_within_tolerance=False. If there's an update to that, it can be changed again. Again, this PR aims at fixing #453 (Pending experiments with CustomDiscreteParameter value not in search space gives cryptic error, coming from disallowed values not related to tolerances) and not #428 (Refactoring out-of-range and tolerance mechanism, dealing with disallowed values due to tolerances).

  2. I don't get the private flag. Do you mean a private attribute of the class or the proposed argument? I prefer the context manager over the other variants but haven't implemented one yet (any hint or link on how to do it?). I'd reject the last two points as overkill.

Collaborator

  1. I don't know your opinion yet, but I think the answer should be no. Therefore, I think we don't even need to handle it in this PR, and it should be fixed/removed in the Campaign in another PR to solve #428 (Refactoring out-of-range and tolerance mechanism).

  2. The "private flag" would be something like adding _validate_measurements to Recommender.recommend, which I don't like so much. Of course, one could also add it as a "regular" flag but this is really something the user should not need to bother and adding the flag in any form definitely adds some noise to the API. Below a rough sketch of how I could envision the other alternatives.

Wrapper

Even if you don't like the wrapper, it could be a quick temporary (internal) fix until we have the perhaps cleaner approach with the context manager set up. It would probably be as simple as:

import pandas as pd

class ValidatedDataframe(pd.DataFrame):
    """Marker subclass: the contained data has already been validated."""

Then, at the point in the campaign where the validation happens, either the campaign or the validating function would need to set df.__class__ = ValidatedDataframe and then the recommender would only trigger validation in an if not isinstance(df, ValidatedDataframe) condition.
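In code, the two sides of that handshake could look roughly like this (a sketch, not the actual campaign/recommender code):

# campaign side: mark the dataframe once validation has succeeded
measurements.__class__ = ValidatedDataframe

# recommender side: only validate data that has not been marked yet
if not isinstance(measurements, ValidatedDataframe):
    validate_measurements(measurements)  # placeholder for the actual validation call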

Context Manager

Potentially cleaner, but requires some external settings mechanism to keep track of the state. This is definitely something that we should set up anyway for all sorts of other flags as well, like polars, single_precision, etc. I have some ideas for how to do this nicely in the long run, but for now a quick'n'dirty internal(!) solution would also do the job. It would look like:

with settings.validate_user_data(False):
    recommender.recommend(...)

And then inside the recommender you have the corresponding check reading the global flag maintained by the settings object.
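A quick'n'dirty version of such a settings object could be as small as the following (illustrative only; neither settings nor validate_user_data exists yet):

from contextlib import contextmanager

class _Settings:
    """Minimal global settings holder with a single validation flag."""

    def __init__(self) -> None:
        self.user_data_validation_enabled = True

    @contextmanager
    def validate_user_data(self, enabled: bool):
        """Temporarily switch user data validation on/off within a with-block."""
        previous = self.user_data_validation_enabled
        self.user_data_validation_enabled = enabled
        try:
            yield
        finally:
            self.user_data_validation_enabled = previous

settings = _Settings()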

Collaborator

PS: In the long run, the settings approach of course has additional benefits, e.g. we can easily integrate it with environment variables so that you can flexibly control the settings from within and outside Python, regardless of the specific context you're in.

Collaborator Author

OK, I see. The wrapper is probably much better given we have no settings mechanism at the moment. Will try, ty.

Besides, I think we are talking about a situation where I want to know whether a df has already been validated. This does not really conform to the meaning of a setting, or at least we would then have to distinguish checking inputs for validity from checking them for a numerical tolerance match.

Collaborator Author

@AdrianSosic OK, I've added the wrapper suggestion in the last two commits and rebased the branch. It seems to work out of the box.

@Scienfitz Scienfitz force-pushed the feature/exp_input_validation branch from f2b383f to 7c2b551 on February 7, 2025 at 15:51