kernels working on a given set of features #476
TODOs:
Example with mol features:

```python
import pandas as pd
import bofire.strategies.api as strategies
from bofire.data_models.domain import api as domain_api
from bofire.data_models.features import api as features_api
from bofire.data_models.kernels import api as kernels_api
from bofire.data_models.molfeatures import api as molfeatures_api
from bofire.data_models.priors.api import HVARFNER_LENGTHSCALE_PRIOR
from bofire.data_models.strategies import api as strategies_api
from bofire.data_models.surrogates import api as surrogates_api
domain = domain_api.Domain(
inputs=domain_api.Inputs(
features=[
features_api.ContinuousInput(key="x1", bounds=(-1, 1)),
features_api.ContinuousInput(key="x2", bounds=(-1, 1)),
features_api.CategoricalMolecularInput(
key="mol", categories=["CO", "CCO", "CCCO"]
),
]
),
outputs=domain_api.Outputs(features=[features_api.ContinuousOutput(key="f")]),
)
strategy = strategies.map(
strategies_api.SoboStrategy(
domain=domain,
surrogate_specs=surrogates_api.BotorchSurrogates(
surrogates=[
surrogates_api.SingleTaskGPSurrogate(
inputs=domain.inputs,
outputs=domain.outputs,
input_preprocessing_specs={
"mol": molfeatures_api.Fingerprints(),
},
kernel=kernels_api.AdditiveKernel(
kernels=[
kernels_api.RBFKernel(
ard=True,
lengthscale_prior=HVARFNER_LENGTHSCALE_PRIOR(),
features=["x1", "x2"],
),
kernels_api.TanimotoKernel(
features=["mol"],
),
]
),
)
]
),
)
)
strategy.tell(
experiments=pd.DataFrame(
[
{"x1": 0.2, "x2": 0.4, "mol": "CO", "f": 1.0},
{"x1": 0.4, "x2": 0.2, "mol": "CCO", "f": 2.0},
{"x1": 0.6, "x2": 0.6, "mol": "CCCO", "f": 3.0},
]
)
)
candidates = strategy.ask(candidate_count=1)
print(candidates)
#     x1   x2   mol    f_pred      f_sd     f_des
# 0  1.0  1.0  CCCO  2.932646  0.945029  2.932646
```
@e-dorigatti: This looks really great and will make the GP section so much easier; we can then get rid of a lot of GP classes! Thank you very much!
I think a specific challenge, but also a good opportunity to test, will be the handling of categorical features. The question is how we deal with this. One option would be to code a new Hamming distance kernel which also works on one-hot encoded categoricals, or to start modelling the input transforms on a feature-specific level in the data model. The latter would also solve the PiecewiseLinearGP case; it would add more complexity, but also more versatility. The mapping of the features for the input transforms could be done in the same way as you are currently doing it for the kernels. What do you think? I have the feeling that also exposing the input transforms would be the cleanest solution. What is difficult here is that the […]
Hi @jduerholt, thanks for the kind feedback and pointers :) This sounds like a very important use-case to implement, and in the next weeks I will look into it. My first idea would be to go via the preprocessing specs. Going through an input transform sounds (a lot?) more involved; why do you think that would be preferable to the preprocessing specs? Maybe some kernel/feature combinations are handled more efficiently in botorch? For example, it would make a lot of sense if a Hamming kernel did not need to actually expand the one-hot encoding, but that I could also implement myself. Maybe I need to learn more about botorch first :)
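To make that last point concrete, here is a minimal sketch (purely illustrative; the class and parameter names are made up and not part of bofire, botorch, or this PR) of a gpytorch kernel that evaluates a Hamming-style similarity directly on the one-hot columns of a single categorical, without ever collapsing them back to integer labels:

```python
import torch
from gpytorch.kernels import Kernel


class OneHotHammingKernel(Kernel):
    """Toy kernel: exp(-Hamming(x, x') / lengthscale) for a single categorical
    feature that arrives as one-hot columns."""

    has_lengthscale = True  # let gpytorch register the lengthscale parameter

    def forward(self, x1, x2, diag=False, **params):
        # For valid one-hot rows, <x1, x2> is 1 iff the categories match,
        # so (1 - <x1, x2>) is exactly the 0/1 Hamming distance between them.
        hamming = 1.0 - x1 @ x2.transpose(-2, -1)
        res = torch.exp(-hamming / self.lengthscale)
        return res.diagonal(dim1=-2, dim2=-1) if diag else res


if __name__ == "__main__":
    k = OneHotHammingKernel()
    x = torch.eye(3)  # three points, each in a different category
    print(k.forward(x, x))  # 1.0 on the diagonal, exp(-1 / lengthscale) elsewhere
```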
Hi @e-dorigatti, a few more notes/explanations from my side:
In general, botorch input transforms are a great way to build up complex differentiable optimization pipelines; on the other hand, it is very challenging to come up with a general framework, parameterized by feature names, that can compose different input transforms on different features, including the bookkeeping of indices. I hope this post helps ;)
Thank you for the thorough explanation @jduerholt, this clears up a lot of doubts! So the key difference is that the optimizer of the acquisition function works with the features that result after applying the input preprocessing specs, but before the input transform.

What are the reasons that led you to decide that categoricals have to be one-hotted? Since I don't know your motivations, it currently looks a bit silly: you use the input transform to do the one-hot and then OneHotToNumeric to undo it, even though the Hamming kernel does not need one-hots in the first place. If we now add more code to deal with the fact that this specific transform (and only this one?) changes the input dimensionality, it would seem like over-complicating things even further (though it would not be that complicated in the end). Maybe, given that with this PR you can have kernels focusing on specific (subsets of) features, it is now unnecessary to force categoricals to be one-hot? Moreover, because molecules are in the end a special type of categorical, it would make sense imho to convert the molecular featurizers to transforms rather than preprocessing.

I am now wondering whether it is necessary to expose botorch's input transforms at all, or whether it would be okay to put these transformations within the kernels themselves, so that users would specify the features that the kernel acts on as well as how they should be transformed. I think this would be simpler to implement, and not less powerful than botorch's input transforms unless I am missing something. It also makes sense because, in some way, kernels are the only reason why we need input transforms in the first place, and different kernels could work on different representations of the same feature. What do you think?

In the end, I think that both options can be implemented relatively easily, but perhaps one is more disruptive than the other. I think you have a better perspective and ability to judge :)
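To make the "transformations inside the kernel" alternative concrete, here is a purely hypothetical sketch of what such a data model could look like; none of these class or field names exist in bofire, they only illustrate the shape of the proposal:

```python
from typing import List, Literal

from pydantic import BaseModel


class FeatureScopedKernel(BaseModel):
    """Hypothetical data model: a kernel that also owns the representation of
    the features it consumes, instead of relying on a global input transform."""

    type: Literal["FeatureScopedKernel"] = "FeatureScopedKernel"
    features: List[str]
    # How the listed features should be presented to the kernel; the values
    # are illustrative, not existing bofire encodings.
    representation: Literal["as_is", "one_hot", "fingerprints"] = "as_is"


# One kernel per feature group, each choosing its own representation,
# combined additively as in the example at the top of this thread.
specs = [
    FeatureScopedKernel(features=["x1", "x2"]),
    FeatureScopedKernel(features=["cat"], representation="one_hot"),
    FeatureScopedKernel(features=["mol"], representation="fingerprints"),
]
```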
Will provide you a thorough answer tomorrow ;)
The reason was that I wanted to make things too flexible:
You are right, they are special forms of the categorical features. But I would not convert them to botorch input transforms, for the following reasons:
Two reasons for keeping it:
What I would do now:
This should, overall, simplify our codebase a lot and give us much more flexibility in general. Of course, there will also be edge cases. What do you think? Ofc, this should not all be done in this PR by you. So I would suggest that we try to get this PR (which does not break anything) in as fast as possible, agree on the changes from above, and then distribute the tasks. Fine for you?
Thank you for the clarifications @jduerholt. Following your suggestions, I have implemented the Hamming kernel working on one-hot features and verified that one can get the same results as MixedSingleTaskGP (or even better with a different kernel). The kernel mapper decides, transparently to the user, whether to use this kernel or the one from botorch, depending on whether feature names are specified (SingleTaskGP) or not (MixedSingleTaskGP).
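For reference, here is a sketch of how this could look from the user side, analogous to the molecular example in the PR description. Whether `HammingDistanceKernel` accepts the new `features` argument exactly like this, and the exact preprocessing enum, should be checked against the final API of this branch, so treat the snippet as an assumption rather than settled usage:

```python
import bofire.strategies.api as strategies
from bofire.data_models.domain import api as domain_api
from bofire.data_models.enum import CategoricalEncodingEnum
from bofire.data_models.features import api as features_api
from bofire.data_models.kernels import api as kernels_api
from bofire.data_models.strategies import api as strategies_api
from bofire.data_models.surrogates import api as surrogates_api

domain = domain_api.Domain(
    inputs=domain_api.Inputs(
        features=[
            features_api.ContinuousInput(key="x1", bounds=(-1, 1)),
            features_api.ContinuousInput(key="x2", bounds=(-1, 1)),
            features_api.CategoricalInput(key="cat", categories=["a", "b", "c"]),
        ]
    ),
    outputs=domain_api.Outputs(features=[features_api.ContinuousOutput(key="f")]),
)

surrogate = surrogates_api.SingleTaskGPSurrogate(
    inputs=domain.inputs,
    outputs=domain.outputs,
    # One-hot encode the categorical; because the Hamming kernel names its
    # features explicitly, it acts only on those one-hot columns.
    input_preprocessing_specs={"cat": CategoricalEncodingEnum.ONE_HOT},
    kernel=kernels_api.AdditiveKernel(
        kernels=[
            kernels_api.RBFKernel(ard=True, features=["x1", "x2"]),
            kernels_api.HammingDistanceKernel(features=["cat"]),
        ]
    ),
)

strategy = strategies.map(
    strategies_api.SoboStrategy(
        domain=domain,
        surrogate_specs=surrogates_api.BotorchSurrogates(surrogates=[surrogate]),
    )
)
# tell/ask then work exactly as in the molecular example above.
```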
Apologies, I got lost debugging those tests that were failing on GH but not locally; then I noticed other PRs are suffering from the same issue. According to the print statements I added, it seems like an issue with differentiation in the formulaic package?