DoE with NChooseK #341
Ok, I went deeper into the rabbit hole and I think I found the problem. Something is wrong with `nchoosek_constraints_as_bounds`:

```python
bounds = nchoosek_constraints_as_bounds(domain, n_experiments=10)
for j in range(10):
    for i, k in enumerate(domain.inputs.get_keys()):
        b = bounds[j * 6 + i]
        if k in ["i_1", "i_2", "p_1", "p_2"]:
            if b != (0.0, 0.0):
                print(k, b)
    print("------")
```

You will either have combinations …
Hi @jduerholt, I had a look into it and I think you are right: initially, the combinations of variables to be set to zero are generated in a fixed order. If the number of experiments exceeds the number of possible combinations, the same order of combinations is simply repeated without reshuffling. Since both NChooseK constraints have the same number of possible combinations, the same set of bounds will be repeated over and over again. Reshuffling the combinations after every `n_combinations` experiments fixes this. I will open a PR in a few mins. Thanks a lot for pointing out this bug! Cheers,
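A minimal sketch of the reshuffling idea, assuming a simplified stand-in for the bound assignment (the helper name, the zero-bound encoding, and the use of `itertools.combinations` are illustrative assumptions, not the actual BoFire code):

```python
import random
from itertools import combinations


def nchoosek_bounds_sketch(keys, max_active, n_experiments):
    """Assign (0.0, 0.0) bounds to the variables fixed to zero, reshuffling
    the order of combinations after each full pass so that repeated passes
    do not reproduce the exact same pattern of fixed zeros."""
    # every way to choose which variables stay inactive (fixed to zero)
    combos = [set(keys) - set(active) for active in combinations(keys, max_active)]
    random.shuffle(combos)
    bounds = []
    for i in range(n_experiments):
        if i > 0 and i % len(combos) == 0:
            random.shuffle(combos)  # reshuffle after each pass through all combinations
        inactive = combos[i % len(combos)]
        bounds.append({k: ((0.0, 0.0) if k in inactive else None) for k in keys})
    return bounds


# example: a 1-of-2 constraint on the "i" group, 5 experiments
for row in nchoosek_bounds_sketch(["i_1", "i_2"], max_active=1, n_experiments=5):
    print(row)
```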
Thank you very much @Osburg, I just merged it in. But we should rethink it in general: what happens currently? We use the … What do you think?
The other option is to ignore the NChooseK constraints in the initial sampling; I do not know how that affects the performance of IPOPT. For SLSQP, it is good to have all the constraints fulfilled for the starting points ... But what we do currently is heavily inconsistent in my opinion.
In principle I agree that the sampling and the NChooseK constraint handling in doe are not quite harmonizing. But I am not sure if fixing the nonzero variables based on the initial guess actually is an improvement. I guess we cannot make sure that the nonzero variables of the initial guess follow a uniform (or some other desirable) distribution. Couldn't it happen that - in the most extreme case - all points in the initial set of experiments have the same nonzero variables? We would then be stuck with these for the entire optimization process. The advantage of the current approach is that we make sure that all possible combinations of nonzero variables will be populated somewhat uniformly (tbh I never even tried to do the math to check if a uniform distribution of the nonzero variables is optimal, but intuitively this seems to be a good idea). What do you think about this point? Do you see a way to avoid this problem?
Hi @Osburg, hmm, I have the impression that the way I generate the samples that obey the NChooseK constraints follows the same distribution that you also use for assigning the fixed zeros:
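A minimal sketch of what I mean, assuming the active subset is drawn uniformly at random per sample (the function and parameter names are illustrative, not the actual sampler code):

```python
import random


def sample_nchoosek_sketch(keys, max_active, n_samples, lower=0.1, upper=1.0):
    """Draw samples obeying an NChooseK constraint by first picking the active
    variables uniformly at random and then sampling their values."""
    samples = []
    for _ in range(n_samples):
        active = random.sample(keys, max_active)  # uniform choice of the non-zero variables
        samples.append(
            {k: (random.uniform(lower, upper) if k in active else 0.0) for k in keys}
        )
    return samples


# example: exactly one of i_1/i_2 is non-zero in every sample
for s in sample_nchoosek_sketch(["i_1", "i_2"], max_active=1, n_samples=3):
    print(s)
```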
Maybe you can just test it to see how it actually works; the edge case that you described above should not be possible. In general, I prefer to initialize an optimizer with samples that already fulfill the constraints (but this is just a gut feeling). After you have checked it out (only if you have time), we could then proceed with discussing the path forward. Would this be okay for you? Best, Johannes
Hi @Osburg,
I discovered a strange behavior when setting up the following DoE including NChooseKs:
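A minimal sketch of a domain along these lines, assuming BoFire's data-model API; the concrete bounds, the inequality coefficients, and the two extra mixture components `a_1`/`a_2` are illustrative assumptions, not the original setup:

```python
from bofire.data_models.constraints.api import (
    LinearEqualityConstraint,
    LinearInequalityConstraint,
    NChooseKConstraint,
)
from bofire.data_models.domain.api import Domain
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput

# two groups of two ingredients plus two further mixture components (assumed)
keys = ["i_1", "i_2", "p_1", "p_2", "a_1", "a_2"]
inputs = [ContinuousInput(key=k, bounds=(0.0, 1.0)) for k in keys]
outputs = [ContinuousOutput(key="y")]

constraints = [
    # mixture: all components sum up to 1
    LinearEqualityConstraint(features=keys, coefficients=[1.0] * 6, rhs=1.0),
    # per group: enforce the group lower bound, i.e. i_1 + i_2 >= 0.1 (value assumed)
    LinearInequalityConstraint(features=["i_1", "i_2"], coefficients=[-1.0, -1.0], rhs=-0.1),
    LinearInequalityConstraint(features=["p_1", "p_2"], coefficients=[-1.0, -1.0], rhs=-0.1),
    # per group: at most one of the two ingredients may be non-zero;
    # the group lower bound above ensures one is actually used
    NChooseKConstraint(features=["i_1", "i_2"], min_count=0, max_count=1, none_also_valid=True),
    NChooseKConstraint(features=["p_1", "p_2"], min_count=0, max_count=1, none_also_valid=True),
]

domain = Domain.from_lists(inputs=inputs, outputs=outputs, constraints=constraints)
```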
What we are having here is a mixture design, in which we have two groups of two (an `i` group with `i_1` and `i_2`, and a `p` group with `p_1` and `p_2`). Out of each group always one ingredient has to be used. For this reason, I set the lower bound on the features to zero, and introduced per group a linear inequality constraint to have the proper lower bound and an NChooseK constraint. If I now plot the results, one gets for example this:
One can see that, for example, the combination of high `i_1` and high `p_1` is never tried, which is really strange because then one introduces an artificial correlation between these degrees of freedom. In the following, I set up the same design but without the subgroups:
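Again only a sketch, assuming the same API as above; here `i` and `p` are single components with a non-zero lower bound and no NChooseK constraints (the bounds are again assumed):

```python
from bofire.data_models.constraints.api import LinearEqualityConstraint
from bofire.data_models.domain.api import Domain
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput

# same mixture, but one feature per group and no NChooseK constraints
inputs = [
    ContinuousInput(key="i", bounds=(0.1, 1.0)),
    ContinuousInput(key="p", bounds=(0.1, 1.0)),
    ContinuousInput(key="a_1", bounds=(0.0, 1.0)),
    ContinuousInput(key="a_2", bounds=(0.0, 1.0)),
]
constraints = [
    LinearEqualityConstraint(
        features=["i", "p", "a_1", "a_2"], coefficients=[1.0] * 4, rhs=1.0
    ),
]
domain_plain = Domain.from_lists(
    inputs=inputs, outputs=[ContinuousOutput(key="y")], constraints=constraints
)
```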
Here I get the following:
One sees that, when setting it up without NChooseKs, it combines high values of both with each other.
Do you have any idea for this strange behavior?
One idea that I had was that it is affected by how the starting points for IPOPT are generated: currently we sample the initial points using the `PolytopeSampler`, which already returns candidates that obey the NChooseK constraints. Afterwards, the DoE module puts its bounds randomly on top, without taking into account at which positions the constraints are already fulfilled in the starting points. I think we should change it. What do you think? But providing initial samples by hand that do not obey the NChooseKs does not help either. Would be really happy to get your input on this!
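For reference, a sketch of how such starting points can be drawn, assuming BoFire's strategy-mapping pattern; the exact import paths and the `ask` signature may differ between versions:

```python
import bofire.strategies.api as strategies
from bofire.data_models.strategies.api import PolytopeSampler

# 'domain' is the NChooseK domain sketched above
sampler = strategies.map(PolytopeSampler(domain=domain))
initial_points = sampler.ask(10)  # candidates that already fulfill the constraints

# inspect which of the grouped features the sampler set to (near) zero
print((initial_points[["i_1", "i_2", "p_1", "p_2"]] < 1e-6).sum())
```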
cc @dlinzner-bcs