
kernels working on a given set of features #476

Open · wants to merge 39 commits into main

Conversation

e-dorigatti
Collaborator

No description provided.

e-dorigatti linked an issue Dec 4, 2024 that may be closed by this pull request
@e-dorigatti
Collaborator Author

e-dorigatti commented Dec 4, 2024

SingleTaskGPSurrogate(
    # inputs= ...,
    # outputs= ...,
    kernel=AdditiveKernel(
        kernels=[
            RBFKernel(
                ard=True,
                lengthscale_prior=HVARFNER_LENGTHSCALE_PRIOR(),
                features=["x1", "x2"],
            ),
            RBFKernel(
                ard=True,
                lengthscale_prior=HVARFNER_LENGTHSCALE_PRIOR(),
                features=["x3", "x4", "x5"],
            ),
        ]
    ),
)

TODOs:

  • tests
  • validator (make sure the features actually are in the inputs)
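
For the validator item, roughly the kind of check that is meant, as a hypothetical sketch (not the PR's code):

def validate_kernel_features(kernel_feature_keys: list[str], input_keys: list[str]) -> None:
    """Raise if a kernel references features that are not among the surrogate's inputs."""
    unknown = set(kernel_feature_keys) - set(input_keys)
    if unknown:
        raise ValueError(f"kernel references unknown features: {sorted(unknown)}")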

@e-dorigatti
Collaborator Author

e-dorigatti commented Dec 4, 2024

Example with mol features:

import pandas as pd

import bofire.strategies.api as strategies
from bofire.data_models.domain import api as domain_api
from bofire.data_models.features import api as features_api
from bofire.data_models.kernels import api as kernels_api
from bofire.data_models.molfeatures import api as molfeatures_api
from bofire.data_models.priors.api import HVARFNER_LENGTHSCALE_PRIOR
from bofire.data_models.strategies import api as strategies_api
from bofire.data_models.surrogates import api as surrogates_api


domain = domain_api.Domain(
    inputs=domain_api.Inputs(
        features=[
            features_api.ContinuousInput(key="x1", bounds=(-1, 1)),
            features_api.ContinuousInput(key="x2", bounds=(-1, 1)),
            features_api.CategoricalMolecularInput(
                key="mol", categories=["CO", "CCO", "CCCO"]
            ),
        ]
    ),
    outputs=domain_api.Outputs(features=[features_api.ContinuousOutput(key="f")]),
)


strategy = strategies.map(
    strategies_api.SoboStrategy(
        domain=domain,
        surrogate_specs=surrogates_api.BotorchSurrogates(
            surrogates=[
                surrogates_api.SingleTaskGPSurrogate(
                    inputs=domain.inputs,
                    outputs=domain.outputs,
                    input_preprocessing_specs={
                        "mol": molfeatures_api.Fingerprints(),
                    },
                    kernel=kernels_api.AdditiveKernel(
                        kernels=[
                            kernels_api.RBFKernel(
                                ard=True,
                                lengthscale_prior=HVARFNER_LENGTHSCALE_PRIOR(),
                                features=["x1", "x2"],
                            ),
                            kernels_api.TanimotoKernel(
                                features=["mol"],
                            ),
                        ]
                    ),
                )
            ]
        ),
    )
)


strategy.tell(
    experiments=pd.DataFrame(
        [
            {"x1": 0.2, "x2": 0.4, "mol": "CO", "f": 1.0},
            {"x1": 0.4, "x2": 0.2, "mol": "CCO", "f": 2.0},
            {"x1": 0.6, "x2": 0.6, "mol": "CCCO", "f": 3.0},
        ]
    )
)
candidates = strategy.ask(candidate_count=1)
print(candidates)

#     x1   x2   mol    f_pred      f_sd     f_des
# 0  1.0  1.0  CCCO  2.932646  0.945029  2.932646

@jduerholt
Contributor

@e-dorigatti: This looks really great and will make the GP section so much easier; we can then get rid of a lot of GP classes! Thank you very much!

@jduerholt
Contributor

jduerholt commented Dec 6, 2024

I think a specific challenge, but also a good opportunity to test, will be the MixedSingleTaskGPSurrogate. It is based on botorch's MixedSingleTaskGP, which assumes that every categorical feature is a single column in which the categories are encoded as different numbers; it then uses this column to compute a Hamming distance (https://github.com/experimental-design/bofire/blob/main/bofire/surrogates/mixed_single_task_gp.py). Since BoFire always assumes that categorical features enter as one-hot encoded, we used a OneHotToNumeric input transform there.

The question is how we deal with this. One option would be to code a new Hamming distance kernel that also works on one-hot encoded categoricals; another would be to start modelling the input transforms on a feature-specific level in the data model as well. The latter would also solve the PiecewiseLinearGP, but would add more complexity as well as more versatility.

The mapping of the features for the input transforms could be done in the same way as you are currently doing it for the kernels.

What do you think? I have the feeling that also exposing the input transforms would be the cleanest solution. What is difficult here is that the OneHotToNumeric transform changes the indices of the features, as it collapses every categorical feature from n_categories columns to 1. So we have some problems here ;) Maybe we should then rewrite the kernel and expose the transforms :D
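
For illustration, a minimal sketch of the "Hamming kernel acting directly on one-hots" idea, assuming gpytorch's Kernel base class (the class name is made up here, and this is not the code later added in this PR):

import torch
from gpytorch.kernels import Kernel


class OneHotHammingKernel(Kernel):
    """Sketch: exponential Hamming kernel acting directly on one-hot columns.

    Assumes all active columns belong to a single one-hot encoded categorical,
    so x1 @ x2.T is 1 when the categories match and 0 otherwise.
    """

    has_lengthscale = True

    def forward(self, x1, x2, diag=False, **params):
        sim = x1 @ x2.transpose(-2, -1)          # 1 if same category, else 0
        dist = 1.0 - sim                          # Hamming distance in {0, 1}
        k = torch.exp(-dist / self.lengthscale)   # exponential Hamming kernel
        if diag:
            return k.diagonal(dim1=-2, dim2=-1)
        return k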

@e-dorigatti
Collaborator Author

e-dorigatti commented Dec 6, 2024

Hi @jduerholt, thanks for the kind feedback and pointers :) This sounds like a very important use case to implement, and I will look into it in the next weeks. My first idea would be to go via the input_preprocessing_specs and mimic what Fingerprints is doing, i.e., the one-hot encoding in this case, probably by recycling some functions from CategoricalInput, and indeed to create a new kernel for the Hamming distance.

Going through an input transform sounds (a lot?) more involved; why do you think that would be preferable to the preprocessing specs? Maybe some kernel/feature combinations are handled more efficiently in botorch? For example, it would make a lot of sense if a Hamming kernel did not actually need to expand the one-hot encoding, but that I could also implement myself. Maybe I need to learn more about botorch first :)

@jduerholt
Contributor

Hi @e-dorigatti,

a few more notes/explanations from my side:

  • There are two types of transforms, in principle:
    1. Everything that we have in the input_preprocessing_specs is executed before the actual optimization/model fitting. For example, categorical features are transformed/expanded into one-hots, or molecular ones into fingerprints; the model is then fitted on this representation, and the optimizer on top of the acquisition function also works on this representation. Currently, we can provide there either one of the CategoricalEncodingEnums or one of the molfeatures classes. This is a bit inconsistent, and we have to tidy it up at some point. The molfeatures classes were added later, and we found there that simple enums were not enough, as the transforms need to be parameterized.
    2. Then we have the botorch input transforms, which are executed on the fly during GP fitting and optimization. The transform(s) are added to the GP, and the data is transformed before it actually enters the kernel. We use them for their flexibility: for example, we could have two models in an optimization, one equipped with a Normalize transform and the other with an InputStandardize transform (a minimal sketch follows after this list). One GP would then be fitted to the normalized data (between 0 and 1) and the other to the standardized data (zero mean, unit variance), and both could still be used in combination during optimization. Another example is the filtering of features: it could be that we have built two surrogates, but one feature is not present in one of them (perhaps because we know that it has no predictive power for the output of that model), and we still want to use both surrogates in the optimization. In this case, BoFire automatically combines both models in botorch's ModelList and adds a filter-features transform in front of the first one that filters out the features that should not enter that model (which could in principle also be done with the approach in this PR). Another example is the PiecewiseLinearGP: for the optimizer, the inputs to the model are the nodes of the piecewise linear function, but during optimization the nodes are expanded on the fly (in a differentiable manner) to the discretized piecewise linear function, on which the Wasserstein kernel is then applied. Currently, apart from choosing which scaler to use, botorch input transforms are not explicitly exposed in BoFire's API.
  • We decided at some point that categoricals always have to be encoded as one-hot for use in surrogates/optimization, as this allows both treating them like normal continuous features without special kernels and using them in a mixed GP via botorch's OneHotToNumeric transform. Of course, other design choices would also be possible here, each with its own drawbacks.
  • Regarding this PR, an 80% solution that would make the separate implementations of MixedSingleTaskGPSurrogate and MixedTanimotoGPSurrogate obsolete would be to have the HammingKernel able to act directly on the one-hot encoded values. This would mean moving the OneHotToNumeric transform into the kernel. Furthermore, the MultiTaskGP could then also benefit from this, as we would have the same flexibility there. Of course, we could also lift the restriction of always having categoricals as one-hots.
  • In a next step, one should then make the already exposed scaling transforms (Normalize, Standardize) configurable by giving the user the explicit option of which features these transforms act on.
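
A minimal sketch of the second kind of transform (assuming botorch's Normalize, InputStandardize, and ModelListGP; toy data, not BoFire code):

import torch
from botorch.models import ModelListGP, SingleTaskGP
from botorch.models.transforms.input import InputStandardize, Normalize

# Toy data: 8 points, 3 features, 2 outputs modelled by separate GPs.
train_X = torch.rand(8, 3, dtype=torch.double)
train_Y = torch.rand(8, 2, dtype=torch.double)

# One GP scales its inputs to [0, 1], the other standardizes them to zero mean
# and unit variance; both transforms are applied on the fly inside the model,
# so the optimizer keeps working on the raw representation.
gp_a = SingleTaskGP(train_X, train_Y[:, :1], input_transform=Normalize(d=3))
gp_b = SingleTaskGP(train_X, train_Y[:, 1:], input_transform=InputStandardize(d=3))

# Both models can still be combined for use in one optimization.
model = ModelListGP(gp_a, gp_b)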

In general, botorch input transforms are a great way to build up complex differentiable optimization pipelines; on the other hand, it is very challenging to come up with a general framework, parameterized by feature names, that can compose different input transforms on different features, including the bookkeeping of indices.

I hope this post helps ;)

@e-dorigatti
Collaborator Author

Thank you for the thorough explanation @jduerholt, this clears up a lot of doubts! So the key difference is that the optimizer of the acquisition function is working with the features that result after applying the input preprocessing specs, but before the input transform.

What are the reasons that led you to decide that categoricals have to be one-hot encoded? Since I don't know your motivations, it currently looks a bit silly: you use the preprocessing to do the one-hot encoding and then the OneHotToNumeric transform to undo it, as the Hamming kernel does not need one-hots in the first place. If we now add more code to deal with the fact that this specific transform (and only this one?) changes the input dimensionality, it would seem like over-complicating things even further (though it would not be that complicated in the end). Maybe, given that with this PR you can have kernels focusing on specific (subsets of) features, it is now unnecessary to force categoricals to be one-hot? Moreover, because molecules are in the end a special type of categorical, it would make sense imho to convert the molecular featurizers to transforms rather than preprocessing.

I am now wondering whether it is necessary to expose botorch's input transforms, or whether it would be okay to put these transformations within the kernels themselves, so that users would specify both the features a kernel acts on and how they should be transformed. I think this would be simpler to implement, and not less powerful than botorch's input transforms, unless I am missing something. It also makes sense because, in some way, kernels are the only reason we need input transforms in the first place, and different kernels could work on different representations of the same feature.
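
To make that concrete, a minimal sketch of what "transform inside the kernel" could mean, assuming gpytorch's Kernel base class (hypothetical names, not a proposal for the final API):

from gpytorch.kernels import Kernel


class TransformedKernel(Kernel):
    """Sketch: wraps a base kernel and applies a fixed transform to the inputs first."""

    def __init__(self, base_kernel: Kernel, transform, **kwargs):
        super().__init__(**kwargs)
        self.base_kernel = base_kernel
        self.transform = transform  # e.g. a one-hot -> ordinal collapse

    def forward(self, x1, x2, diag=False, **params):
        # transform both inputs, then delegate to the wrapped kernel
        return self.base_kernel.forward(self.transform(x1), self.transform(x2), diag=diag, **params)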

What do you think? In the end, I think that both options can be implemented relatively easily, but perhaps one is more disruptive than the other. I think you have a better perspective and ability to judge :)

@jduerholt
Contributor

Will provide you a thorough answer tomorrow ;)

@jduerholt
Contributor

jduerholt commented Dec 12, 2024

@e-dorigatti :

What are the reasons that led you to decide that categoricals have to be one-hot encoded? Since I don't know your motivations, it currently looks a bit silly: you use the preprocessing to do the one-hot encoding and then the OneHotToNumeric transform to undo it, as the Hamming kernel does not need one-hots in the first place.

The reason was that I wanted to have things too flexible:

  • I wanted to be able to combine a SingleTaskGPSurrogate, which needs to one-hot encode the categoricals, with a MixedSingleTaskGPSurrogate in the same optimization. For this, the same features have to enter both GPs. So I settled on this complicated solution using the OneHotToNumeric transform, as I saw that it was already available in botorch, and then I simply required that CategoricalInput features always have to be one-hot encoded.
  • But this kind of constraint is already relaxed for the following features, which all inherit from CategoricalInput:
    • CategoricalDescriptorInput: here, CategoricalEncodingEnum.DESCRIPTOR is also allowed.
    • CategoricalMolecularInput: here, the various molfeature transforms are also allowed.
    • TaskInput: if you want to use it within a MultiTaskGPSurrogate, it has to be ordinal encoded.

Moreover, because molecules are in the end a special type of categorical, it would make sense imho to convert the molecular featurizers to transforms rather than preprocessing.

You are right, they are special forms of categorical features. But I would not convert them to botorch input transforms, for the following reasons:

  • Botorch input transforms are super nice if you have to change something between the input representation for the optimizer and the representation for the GP, especially if you want to do it in a differentiable manner, as we do in the PiecewiseLinearGPSurrogate. This is not really the case for the molecular featurizers: the featurization is not differentiable, and we can easily do it upfront.
  • The current setup for the transforms is working, and I do not see a need to change it.

I am now wondering whether it is necessary to expose botorch's input transforms, or it would be okay to put these transformations within the kernels themselves, so that users would specify the features that the kernel is acting on as well as how they should be transformed. I think this would be simpler to implement, and not less powerful than botorch's input transforms unless I am missing something. It also makes sense because, in some way, kernels are the only reason why we need input transforms in the first place, and different kernels could work on different representations of the same feature.

Two reasons for keeping it:

  1. Practical reason: the transforms are deeply baked into botorch and also well tested. If we want to do it in the kernels, we have to implement it ourselves.
  2. Computational reason: the botorch/gpytorch GPs store the training data x*, which is then used at inference time to calculate the covariance matrix/kernel with respect to the test data x: k(x, x*). The trick is that they store the transformed training data, so you do not need to transform the training data at every inference step. All these mechanisms are already implemented and well tested, especially on the botorch side. For this reason I generally want to keep the transforms, although of course there could be exceptions for edge cases.

What I would do now:

  • Release the restriction that CategoricalInputs always have to be one-hot encoded and also allow ordinal encodings.
  • Fix get_categorical_combinations in BotorchStrategy so that it also works for categoricals encoded as ordinals.
  • Also implement the generation of the molecular combinations in get_categorical_combinations. Currently you can only run optimizations over molecular spaces if you do not have only continuous ones.
  • Drop the options in BotorchStrategy for categorical_method, descriptor_method, and discrete_method. Depending on the surrogate, these so far gave us the possibility to use continuous relaxations during the optimization, which of course can lead to wrong optima. So I would drop this support (it makes the code much easier), especially as we are getting the GA by @LukasHebing and @sleweke-bayer for mixed spaces.
  • Change the ScalerEnums into actual classes in which we can define which features/feature types the corresponding transforms should apply to.
  • Try to write the MixedSingleTaskGPSurrogate in plain BoFire as a SingleTaskGPSurrogate with the explicit kernels and check that we get the same results (a rough sketch follows below).
  • Remove the GP implementations that are then no longer necessary and write helper methods that help you generate the complex specs.
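
As a rough sketch of the second-to-last item (hypothetical; the kernel names and the features argument follow this PR's proposal, and the exact sum-plus-product kernel structure of botorch's MixedSingleTaskGP is omitted):

from bofire.data_models.kernels import api as kernels_api
from bofire.data_models.surrogates import api as surrogates_api

# "domain" is assumed to provide continuous inputs x1, x2 and a categorical
# input "cat" that is one-hot encoded via the input_preprocessing_specs.
surrogate = surrogates_api.SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=domain.outputs,
    kernel=kernels_api.AdditiveKernel(
        kernels=[
            # continuous part, as in the examples above
            kernels_api.RBFKernel(ard=True, features=["x1", "x2"]),
            # Hamming-style kernel acting directly on the one-hot columns of "cat"
            kernels_api.HammingDistanceKernel(features=["cat"]),
        ]
    ),
)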

Overall, this should simplify our codebase a lot and give us much more flexibility in general. Of course, there will also be edge cases. What do you think?

Of course, this should not all be done by you in this PR. So I would suggest that we try to get this PR (which does not break anything) in as fast as possible, agree on the changes from above, and then distribute the tasks. Fine for you?

@e-dorigatti
Collaborator Author

e-dorigatti commented Dec 19, 2024

Thank you for the clarifications @jduerholt. Following your suggestions I have implemented the Hamming kernel working on one-hot features, and verified that one can get the same results as MixedSingleTaskGP (or even better results with a different kernel). The kernel mapper decides, transparently to the user, whether to use this kernel or the one from botorch, depending on whether feature names are specified (SingleTaskGP) or not (MixedSingleTaskGP).
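
Roughly, that dispatch could look like the following (a hypothetical sketch, not the actual mapper code; OneHotHammingKernel refers to the sketch in an earlier comment, and features_to_idx is an assumed mapping from feature keys to their one-hot column indices):

from botorch.models.kernels.categorical import CategoricalKernel


def map_hamming_kernel(data_model, active_dims, features_to_idx):
    if data_model.features:
        # feature names given: act directly on the one-hot columns of those features
        dims = [i for key in data_model.features for i in features_to_idx[key]]
        return OneHotHammingKernel(active_dims=dims)
    # no feature names: botorch's kernel on ordinally encoded columns (MixedSingleTaskGP case)
    return CategoricalKernel(ard_num_dims=len(active_dims), active_dims=active_dims)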

@e-dorigatti
Collaborator Author

Apologies, I got lost debugging those tests that were failing on GitHub but not locally; then I noticed that other PRs are suffering from the same issue. According to the print statements I added, it seems like an issue with differentiation in the formulaic package?

>       assert np.allclose(B, J)
E       assert False
E        +  where False = <function allclose at 0x7f392a5de930>(array([[0., 1., 0., 0., 0., 2.],\n       [0., 0., 1., 0., 0., 1.],\n       [0., 0., 0., 1., 6., 0.]]), array([[[0, 1, 0, 0, 0, 2],\n        [0, 0, 1, 0, 0, 1],\n        [0, 0, 0, 1, 0, 6]]]))
E        +    where <function allclose at 0x7f392a5de930> = np.allclose

tests/bofire/strategies/doe/test_objective.py:52: AssertionError
----------------------------- Captured stdout call -----------------------------
model 1 + x1 + x2 + x3 + x3 ** 2 + x1:x2
diff x1 0 + 1 + 0 + 0 + 0 + x2
diff x2 0 + 0 + 1 + 0 + 0 + x1
diff x3 0 + 0 + 0 + 1 + 0 + (2*x3)

Where diff x3 should be 0 + 0 + 0 + 1 + (2*x3) + 0 (full output here)

Successfully merging this pull request may close these issues: kernels on feature subsets