DoE with NChooseK #341
Ok, I went deeper into the rabbit hole and I think I found the problem. Something is wrong with `nchoosek_constraints_as_bounds`:

```python
bounds = nchoosek_constraints_as_bounds(domain, n_experiments=10)
for j in range(10):
    for i, k in enumerate(domain.inputs.get_keys()):
        b = bounds[j * 6 + i]
        if k in ["i_1", "i_2", "p_1", "p_2"]:
            if b != (0.0, 0.0):
                print(k, b)
    print("------")
```

You will either have combinations …
Hi @jduerholt, I had a look into it and I think you are right: initially, the combinations of variables to be set to zero are generated in a fixed order. If the number of experiments exceeds the number of possible combinations, the same order of combinations is simply repeated without reshuffling. Since both NChooseK constraints have the same number of possible combinations, the same set of bounds will be repeated over and over again. Reshuffling the combinations after every `n_combinations` experiments fixes this. I will open a PR in a few mins. Thanks a lot for pointing out this bug! Cheers,
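A minimal sketch of the reshuffling idea, assuming a simplified stand-in for the bound assignment (the helper name, the zero-bound encoding, and the use of `itertools.combinations` are illustrative assumptions, not the actual BoFire code):

```python
import random
from itertools import combinations


def nchoosek_bounds_sketch(keys, max_active, n_experiments):
    """Assign (0.0, 0.0) bounds to the variables fixed to zero, reshuffling
    the order of combinations after each full pass so that repeated passes
    do not reproduce the exact same pattern of fixed zeros."""
    # every way to choose which variables stay inactive (fixed to zero)
    combos = [set(keys) - set(active) for active in combinations(keys, max_active)]
    random.shuffle(combos)
    bounds = []
    for i in range(n_experiments):
        if i > 0 and i % len(combos) == 0:
            random.shuffle(combos)  # reshuffle after each pass through all combinations
        inactive = combos[i % len(combos)]
        bounds.append({k: ((0.0, 0.0) if k in inactive else None) for k in keys})
    return bounds


# example: a 1-of-2 constraint on the "i" group, 5 experiments
for row in nchoosek_bounds_sketch(["i_1", "i_2"], max_active=1, n_experiments=5):
    print(row)
```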
Thank you very much @Osburg, I just merged it in. But we should rethink it in general: what happens currently? We use the … What do you think?
The other option is to ignore the NChooseK constraints in the initial sampling; I do not know how that affects the performance of IPOPT. For SLSQP, it is good to have all the constraints fulfilled for the starting points ... But what we do currently is heavily inconsistent in my opinion.
In principle I agree that the sampling and the NChooseK constraint handling in doe are not quite harmonizing. But I am not sure if fixing the nonzero variables based on the initial guess actually is an improvement. I guess we cannot make sure that the nonzero variables of the initial guess follow a uniform (or some other desirable) distribution. Couldn't it happen that - in the most extreme case - all points in the initial set of experiments have the same nonzero variables? We would then be stuck with these for the entire optimization process. The advantage of the current approach is that we make sure that all possible combinations of nonzero variables will be populated somewhat uniformly (tbh I never even tried to do the math to check if a uniform distribution of the nonzero variables is optimal, but intuitively this seems to be a good idea). What do you think about this point? Do you see a way to avoid this problem?
Hi @Osburg, hmm, I have the impression that the way I generate the samples that obey the NChooseK constraints follows the same distribution that you also use for assigning the fixed zeros:
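A minimal sketch of what I mean, assuming the active subset is drawn uniformly at random per sample (the function and parameter names are illustrative, not the actual sampler code):

```python
import random


def sample_nchoosek_sketch(keys, max_active, n_samples, lower=0.1, upper=1.0):
    """Draw samples obeying an NChooseK constraint by first picking the active
    variables uniformly at random and then sampling their values."""
    samples = []
    for _ in range(n_samples):
        active = random.sample(keys, max_active)  # uniform choice of the non-zero variables
        samples.append(
            {k: (random.uniform(lower, upper) if k in active else 0.0) for k in keys}
        )
    return samples


# example: exactly one of i_1/i_2 is non-zero in every sample
for s in sample_nchoosek_sketch(["i_1", "i_2"], max_active=1, n_samples=3):
    print(s)
```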
Maybe you can just test it to see how it actually works; the edge case that you described above should not be possible. In general, I prefer to initialize an optimizer with samples that already fulfill the constraints (but this is just a gut feeling). After you have checked it out (only if you have time), we could then proceed with discussing the path forward. Would this be okay for you? Best, Johannes
Hi @Osburg,
I discovered a strange behavior when setting up the following DoE including NChooseKs:
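A minimal sketch of a domain along these lines, assuming BoFire's data-model API; the concrete bounds, the inequality coefficients, and the two extra mixture components `a_1`/`a_2` are illustrative assumptions, not the original setup:

```python
from bofire.data_models.constraints.api import (
    LinearEqualityConstraint,
    LinearInequalityConstraint,
    NChooseKConstraint,
)
from bofire.data_models.domain.api import Domain
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput

# two groups of two ingredients plus two further mixture components (assumed)
keys = ["i_1", "i_2", "p_1", "p_2", "a_1", "a_2"]
inputs = [ContinuousInput(key=k, bounds=(0.0, 1.0)) for k in keys]
outputs = [ContinuousOutput(key="y")]

constraints = [
    # mixture: all components sum up to 1
    LinearEqualityConstraint(features=keys, coefficients=[1.0] * 6, rhs=1.0),
    # per group: enforce the group lower bound, i.e. i_1 + i_2 >= 0.1 (value assumed)
    LinearInequalityConstraint(features=["i_1", "i_2"], coefficients=[-1.0, -1.0], rhs=-0.1),
    LinearInequalityConstraint(features=["p_1", "p_2"], coefficients=[-1.0, -1.0], rhs=-0.1),
    # per group: at most one of the two ingredients may be non-zero;
    # the group lower bound above ensures one is actually used
    NChooseKConstraint(features=["i_1", "i_2"], min_count=0, max_count=1, none_also_valid=True),
    NChooseKConstraint(features=["p_1", "p_2"], min_count=0, max_count=1, none_also_valid=True),
]

domain = Domain.from_lists(inputs=inputs, outputs=outputs, constraints=constraints)
```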
What we are having here is a mixture design, in which we have two groups of two (an `i` group with `i_1` and `i_2`, and a `p` group with `p_1` and `p_2`). Out of each group always one ingredient has to be used. For this reason, I set the lower bound on the features to zero, and introduced per group a linear inequality constraint to have the proper lower bound and an NChooseK constraint. If I now plot the results, one gets for example this:
One can see that, for example, the combination of high `i_1` and high `p_1` is never tried, which is really strange because then one introduces an artificial correlation between these degrees of freedom. In the following, I set up the same design but without the subgroups:
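Again only a sketch, assuming the same API as above; here `i` and `p` are single components with a non-zero lower bound and no NChooseK constraints (the bounds are again assumed):

```python
from bofire.data_models.constraints.api import LinearEqualityConstraint
from bofire.data_models.domain.api import Domain
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput

# same mixture, but one feature per group and no NChooseK constraints
inputs = [
    ContinuousInput(key="i", bounds=(0.1, 1.0)),
    ContinuousInput(key="p", bounds=(0.1, 1.0)),
    ContinuousInput(key="a_1", bounds=(0.0, 1.0)),
    ContinuousInput(key="a_2", bounds=(0.0, 1.0)),
]
constraints = [
    LinearEqualityConstraint(
        features=["i", "p", "a_1", "a_2"], coefficients=[1.0] * 4, rhs=1.0
    ),
]
domain_plain = Domain.from_lists(
    inputs=inputs, outputs=[ContinuousOutput(key="y")], constraints=constraints
)
```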
Here I get the following:
One sees that, when setting it up without NChooseKs, it combines high values of both with each other.
Do you have any idea for this strange behavior?
One idea that I had was that it is affected by how the starting points for IPOPT are generated: currently we sample the initial points using the `PolytopeSampler`, which already returns candidates that obey the NChooseK constraints. Afterwards, the DoE module puts its bounds randomly on top, without taking into account at which positions the constraints are already fulfilled in the starting points. I think we should change it. What do you think? But providing initial samples by hand that do not obey the NChooseKs does not help either. Would be really happy to get your input on this!
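For reference, a sketch of how such starting points can be drawn, assuming BoFire's strategy-mapping pattern; the exact import paths and the `ask` signature may differ between versions:

```python
import bofire.strategies.api as strategies
from bofire.data_models.strategies.api import PolytopeSampler

# 'domain' is the NChooseK domain sketched above
sampler = strategies.map(PolytopeSampler(domain=domain))
initial_points = sampler.ask(10)  # candidates that already fulfill the constraints

# inspect which of the grouped features the sampler set to (near) zero
print((initial_points[["i_1", "i_2", "p_1", "p_2"]] < 1e-6).sum())
```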
cc @dlinzner-bcs