Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I optimality criterion #428

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

I optimality criterion #428

wants to merge 22 commits into from

Conversation

Osburg
Copy link
Collaborator

@Osburg Osburg commented Aug 19, 2024

This PR should close #337. It should implement the i-optimality criterion as described by @KaiExner as the class IOptimality.

When constructing a new IOptimality object, the space of feasible points is filled using the space filling objective (the number of points n_space can be set by the user and by default the number of points is set to be equal to the requested number of experiments in the design to be found n_experiments).
From these points, the corresponding design matrix Y is generated according to the user-provided model which is then used to define the evaluation function of the criterion as tr((Y.T @ Y) / n_space @ inv(X.T @ X)) where X is the model matrix corresponding to some input x.

@dlinzner-bcs @KaiExner do you think this is roughly what it supposed to look like?

This is still a draft since tests etc. are still missing.

@Osburg Osburg marked this pull request as draft August 19, 2024 22:54
@Osburg Osburg requested a review from dlinzner-bcs August 19, 2024 22:54
@dlinzner-bcs dlinzner-bcs requested a review from KaiExner August 28, 2024 11:41
@KaiExner
Copy link
Collaborator

KaiExner commented Aug 28, 2024 via email

@Osburg
Copy link
Collaborator Author

Osburg commented Aug 28, 2024

Hi @KaiExner,

Danke für das Beispiel. Der folgende Code generiert einen Versuchsplan dazu und erzeugt die Abbildung unten

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter

from bofire.data_models.constraints.api import (
    NonlinearEqualityConstraint,
    NonlinearInequalityConstraint,
    LinearEqualityConstraint,
    LinearInequalityConstraint,
    InterpointEqualityConstraint,
)
from bofire.data_models.domain.api import Domain
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput
from bofire.strategies.doe.design import find_local_max_ipopt
from bofire.strategies.enum import OptimalityCriterionEnum
domain = Domain(
   inputs = [
    ContinuousInput(key="X1", bounds = (-1,1)),
    ContinuousInput(key="X2", bounds = (-1,1)),
    ContinuousInput(key="X3", bounds = (-1,1))
    ],
   outputs = [ContinuousOutput(key="y")],
   constraints = [
       LinearInequalityConstraint(features=["X1","X2"], coefficients=[1,1], rhs=0.8),
       LinearInequalityConstraint(features=["X1","X2"], coefficients=[-1,-1], rhs=-0.2)
   ]
)

i_optimal_design = find_local_max_ipopt(
    domain, 
    "Y ~ X1 + X2 + X3 + X1*X2 + X1*X3 + X2*X3 + X1^2 + X2^2 + X3^2", 
    n_experiments=16, 
    ipopt_options={"disp":5},
    objective=OptimalityCriterionEnum.I_OPTIMALITY
    ).to_numpy().T
jmp = np.array([[0.8454641457,-0.340784303,0],
[1,-0.8,-1],
[1,-0.261259178,1],
[-0.49600524,1,1],
[-0.8,1,0],
[1,-0.8,1],
[0.3972024633,0.0427975367,0],
[-0.2,1,0],
[1,-0.2,-1],
[-0.08,0.88,-1],
[-0.512263201,1,-1],
[-0.08, 0.28, 1],
[0.16, 0.64, 1],
[-0.08, 0.28,-1],
[0.2386392856,0.2013607144,0],
[0.3420421924,0.1377730655,0]])

fig = plt.figure(figsize=((10,10)))
ax = fig.add_subplot(111, projection='3d')
ax.view_init(45, 45)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("$x_3$")
plt.rcParams["figure.figsize"] = (10,8)


#plot D-optimal solutions
ax.scatter(
    xs=i_optimal_design[0],
    ys=i_optimal_design[1],
    zs=i_optimal_design[2],
    marker="o",
    s=40,
    color="orange",
    label="our solution, 16 points"
)

ax.scatter(
    xs=jmp[:,0],
    ys=jmp[:,1],
    zs=jmp[:,2],
    marker="o",
    s=40,
    color="red",
    label="jmp solution, 16 points"
)

plt.legend()

image

Das in bofire generierte design sieht so aus:

[[-0.2  ,  1.   ,  1.   ],
[ 0.092,  0.108, -1.   ],
[ 1.   , -0.2  , -1.   ],
[-0.8  ,  1.   , -1.   ],
[-0.2  ,  1.   , -1.   ],
[ 1.   , -0.8  , -1.   ],
[-0.8  ,  1.   , -1.   ],
[ 1.   , -0.2  ,  1.   ],
[ 0.117,  0.083,  1.   ],
[ 0.092,  0.108, -1.   ],
[ 0.419,  0.381, -1.   ],
[ 0.117,  0.083,  1.   ],
[-0.8  ,  1.   ,  1.   ],
[ 1.   , -0.8  , -1.   ],
[ 0.299,  0.501,  1.   ],
[ 1.   , -0.8  ,  1.   ]]

Leider nicht ganz das gleiche. Kann man sich sicher sein, dass jmp tatsächlich ein globales Minimum findet? Weil die Lösung aus jmp irgendwie etwas unregelmäßig aussieht (ist z.B. nicht symmetrisch bzgl. Vertauschung von X1 und X2) und die bofire-Lösung im zweiten Bsp. (siehe unten) sehr nah an der exakten Lösung zu liegen scheint...

Ich habe gemerkt, dass im Paper, das ich für die Beispiele in tutorials/doe/basic_examples.ipynb verwendet habe (siehe hier), auch Beispiele für exakte I-optimale designs vorgegeben sind. Der folgende Code erzeugt die in der Abb. gezeigten Designs (grün: exaktes I-optimales Design, gelb: unser Design)

domain = Domain(
   inputs = [
    ContinuousInput(key="x1", bounds = (0,1)),
    ContinuousInput(key="x2", bounds = (0.1, 1)),
    ContinuousInput(key="x3", bounds = (0, 0.6))
    ],
   outputs = [ContinuousOutput(key="y")],
   constraints = [
       LinearEqualityConstraint(features=["x1","x2","x3"], coefficients=[1,1,1], rhs=1),
       LinearInequalityConstraint(features=["x1","x2"], coefficients=[5,4], rhs=3.9),
       LinearInequalityConstraint(features=["x1","x2"], coefficients=[-20,5], rhs=-3)
   ]
)

i_optimal_design = find_local_max_ipopt(
    domain, 
    "x1 + x2 + x3 + {x1**2} + {x2**2} + {x3**2} + {x1**3} + {x2**3} + {x3**3} + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3", 
    n_experiments=12, 
    objective=OptimalityCriterionEnum.I_OPTIMALITY).to_numpy().T
i_opt = np.array([[0.7000, 0.1000, 0.2000],
[0.3000, 0.6000, 0.1000],
[0.2031, 0.1969, 0.6000],
[0.5899, 0.2376, 0.1725],
[0.4105, 0.4619, 0.1276],
[0.2720, 0.4882, 0.2398],
[0.2281, 0.3124, 0.4595],
[0.3492, 0.1000, 0.5508],
[0.5614, 0.1000, 0.3386],
[0.4621, 0.2342, 0.3037],
[0.3353, 0.2228, 0.4419],
[0.3782, 0.3618, 0.2600]]).T # values taken from paper


fig = plt.figure(figsize=((10,10)))
ax = fig.add_subplot(111, projection='3d')
ax.set_title("cubic model")
ax.view_init(45, 45)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("$x_3$")
plt.rcParams["figure.figsize"] = (10,8)

#plot feasible polytope
ax.plot(
    xs=[7/10, 3/10, 1/5, 3/10, 7/10],
    ys=[1/10, 3/5, 1/5, 1/10, 1/10],
    zs=[1/5, 1/10, 3/5, 3/5, 1/5],
    linewidth=2
)

#plot I-optimal solution
ax.scatter(
    xs=i_opt[0],
    ys=i_opt[1],
    zs=i_opt[2],
    marker="o",
    s=40,
    color="darkgreen",
    label="I-optimal design, 12 points"
)

ax.scatter(
    xs=i_optimal_design[0],
    ys=i_optimal_design[1],
    zs=i_optimal_design[2],
    marker="o",
    s=40,
    color="orange",
    label="optimal_design solution, 12 points"
)

plt.legend()

image

@KaiExner
Copy link
Collaborator

KaiExner commented Aug 29, 2024 via email

@Osburg
Copy link
Collaborator Author

Osburg commented Aug 29, 2024

Hallo Kai,

Das stimmt natürlich, ich habe auch gefunden, woran es lag. Ich habe den Ausdruck

"Y ~ X1 + X2 + X3 + X1*X2 + X1*X3 + X2*X3 + X1^2 + X2^2 + X3^2"

übergeben, die Terme zweiter Ordnung müssen aber als {Xi**2} statt Xi^2 geschrieben werden, sonst werden sie von formulaic ignoriert. Mit dem richtigen Ausdruck

"Y ~ X1 + X2 + X3 + X1*X2 + X1*X3 + X2*X3 + {X1**2} + {X2**2} + {X3**2}"

kommt folgendes Design heraus, sieht das besser aus? :)

image

array([[ 1.  , -0.8 ,  1.  ],
       [ 1.  , -0.2 , -1.  ],
       [ 0.1 ,  0.1 , -1.  ],
       [ 1.  , -0.2 ,  1.  ],
       [-0.2 ,  1.  , -1.  ],
       [-0.2 ,  1.  ,  1.  ],
       [ 0.09,  0.11,  1.  ],
       [-0.8 ,  1.  ,  0.02],
       [ 1.  , -0.2 , -0.  ],
       [ 1.  , -0.8 , -1.  ],
       [-0.8 ,  1.  , -1.  ],
       [-0.2 ,  1.  ,  0.01],
       [ 0.2 ,  0.2 , -0.  ],
       [ 1.  , -0.8 , -0.02],
       [ 0.1 ,  0.1 , -0.  ],
       [-0.8 ,  1.  ,  1.  ]])

Grüße
Aaron

@KaiExner
Copy link
Collaborator

KaiExner commented Aug 30, 2024 via email

@Osburg
Copy link
Collaborator Author

Osburg commented Aug 31, 2024

Hi @KaiExner und @jduerholt ,

danke für die Auswertung, dann passt etwas mit der Erstellung des Designs noch nich...
Ich habe mir die Sache noch einmal angesehen und versucht, ein ähnliches Kriterium wie das in jmp zu bauen. Dazu habe ich auch äquidistante gridpoints erstellt (anstelle der Bestimmung mittels SpaceFilling Objective) und diejenigen Punkte, die die Constraints nicht erfüllen, weggeschmissen (siehe Code am Ende der Nachricht). Das spacing ist so gewählt, dass auch 1.8e6 gridpoints übrig bleiben.
Ich komme dann auch zu dem Ergebnis, dass das bisherige Design aus bofire deutlich schlechter als das von jmp abschneidet (wobei sich die genauen Werte von denen aus deiner letzten Nachricht unterscheiden, äquivalent sind die Kriterien also immer noch nicht :( )

# delta = 1e-2, objective using 80 gridpoints, criterion using 1.8e6 gridpoints
print("ours:", criterion(i_optimal_design.T.flatten()), "jmp:", criterion(jmp.flatten()))
ours: 1.464475006770114 jmp: 0.3522515468462639

Für das BoFire-Design wurden nur 80 gridpoints und ein sehr großer Regularisierungsparameter von 1e-2 verwendet, ich konnte mir vorstellen, dass das zum schlechten Ergebnis führt. Das Design unten verwendet für die Optimierung das oben genannte Kriterium mit mehr gridpoints und den Regularisierungsparameter habe ich auf 1e-7 gesetzt. Das folgende Design kommt dabei heraus (, was im BoFire-Kriterium vergleichbar gut wie das jmp Design abschneidet).

# delta = 1e-7, objective using 1.8e6 gridpoints, criterion using 1.8e6 points
print("ours:", criterion(i_optimal_design.T.flatten()), "jmp:", criterion(jmp.flatten()))
print(np.round(i_optimal_design,2).T)
ours: 0.34273537557403694 jmp: 0.3522515468462639
[[-0.49  1.    0.02]
 [ 1.   -0.46 -0.  ]
 [-0.2   1.   -0.58]
 [-0.2   1.    1.  ]
 [ 0.1   0.1  -0.01]
 [ 0.13  0.36 -1.  ]
 [ 1.   -0.2   1.  ]
 [ 0.21  0.26  1.  ]
 [-0.8   1.   -1.  ]
 [ 1.   -0.8  -1.  ]
 [ 0.93 -0.13 -1.  ]
 [ 0.1   0.1  -0.01]
 [ 0.19  0.29  0.05]
 [ 1.   -0.46 -0.  ]
 [ 1.   -0.8   1.  ]
 [-0.8   1.    1.  ]]

image

@KaiExner falls du Zeit und Lust hast: Schneidet dieses Design auch in jmp deutlich besser ab als das vorherige? (sorry, habe selbst keinen Zugriff)
@jduerholt Es sieht also so aus, als bräuchte man sehr viele gridpoints, um ein gescheites Kriterium zu generieren. Dafür ist eine Optimierung mit SpaceFilling unpraktisch, weil es ewig rechnen würde. Andererseits ist die Alternative (d.h. grid erstellen und constraintverletzende Punkte rausschmeißen) glaub ich nicht so toll für Gleichheitsconstraints... Ich habe jetzt mal erlaubt, dass der Benutzer auch ein selbst erstelltes objective übergeben kann (und by default wird SpaceFilling verwendet), das find ich aber auch iwie unbefriedigend. Wie siehst du das?

Code für "I-Kriterium" mit 1.8e6 gridpoints

from bofire.strategies.doe.objective import IOptimality
import pandas as pd
from itertools import product
import torch
formula = Formula("X1 + X2 + X3 + X1*X2 + X1*X3 + X2*X3 + {X1**2} + {X2**2} + {X3**2}") 

domain = Domain(
   inputs = [
    ContinuousInput(key="X1", bounds = (-1,1)),
    ContinuousInput(key="X2", bounds = (-1,1)),
    ContinuousInput(key="X3", bounds = (-1,1))
    ],
   outputs = [ContinuousOutput(key="y")],
   constraints = [
       LinearInequalityConstraint(features=["X1","X2"], coefficients=[1,1], rhs=0.8),
       LinearInequalityConstraint(features=["X1","X2"], coefficients=[-1,-1], rhs=-0.2)
   ]
)

criterion = IOptimality(
        domain=domain,
        model=formula,
        n_experiments=16,
        delta=1e-7,
        transform_range=None,
        n_space=80,
        ipopt_options={"disp":0, "linear_solver":"ma57", "hsllib":"libcoinhsl.dylib", "tol":1e-15, "acceptable_tol":1e-15},
)


gridpoints = np.linspace(-1,1,200)
points = list(product(gridpoints, repeat=3))
points = np.array(points)
points = pd.DataFrame(points, columns=["X1", "X2", "X3"])

fulfilled = domain.constraints(experiments=points)
fulfilled = np.array(np.logical_and(fulfilled[0] <= 0, fulfilled[1] <= 0), dtype=bool)
points = points[fulfilled]

X = formula.get_model_matrix(points).to_numpy()

criterion.YtY = torch.from_numpy(X.T @ X) / X.shape[0]
criterion.YtY.requires_grad = False

print("Number of gridpoints fulfilling the constraints:", X.shape[0])

@KaiExner
Copy link
Collaborator

KaiExner commented Sep 1, 2024 via email

@Osburg Osburg marked this pull request as ready for review September 1, 2024 21:34
@Osburg
Copy link
Collaborator Author

Osburg commented Sep 1, 2024

Hallo Kai,

danke für das Feedback :) Dann scheint die Implementierung jetzt ja fürs erste in Ordnung zu sein (außer du willst, dass sie erst noch an mehr Problemen getestet wird). @jduerholt: wäre dann jetzt kein draft mehr. Wenn die Dinge für dich in Ordnung ausschauen gerne jederzeit mergen.

Liebe Grüße
Aaron

@KaiExner
Copy link
Collaborator

KaiExner commented Sep 2, 2024 via email

Copy link
Contributor

@dlinzner-bcs dlinzner-bcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Osburg and @KaiExner for this! I understand the code works now but is computationally intensive due to the large number of grid points for integration. That is ok for now I think. Just an idea: Since our models are polynomials, we could also analytically integrate all the resulting expressions for the I criterion, right?

image

where F is the model matrix and f is our polynomial. So we would need a symbolic integration to get YtY, right? I do not have time at the moment to implement this at the moment - but I can have a look once that changes :)

@Osburg
Copy link
Collaborator Author

Osburg commented Sep 2, 2024

Hi @dlinzner-bcs,

the computationally intensive part is only where the IOptimality object is generated. A space-filling design Y with many (typically $n &gt;&gt; n_{model terms}$) points must be generated. After that, the moment matrix M = Y.T @ Y is calculated (see here) which has the dimensions $M \in \mathbb{R}^{n_{model terms} x n_{model terms}}$.
If Y is generated using SpaceFilling, this can be very time consuming and in higher dimensions also memory demanding. If a equidistant grid is used, this is less of the problem.
The actual optimization after creating the IOptimality object is not much more demanding than w/ other objectives.

Memory can become an issue in higher dimensions as well even when the equidistant grid is used, bc i did not implement it smartly. For these cases i think we would have to try to subdivide the calculation of M into subtasks where not the full matrix Y is needed at any single time point.

Cheers
Aaron :)

P.S.: I don't know if anyone uses it but in general also nonpolynomial terms are supported. If it is imaginable that someone wants to make use of this at some point we should try to keep the criterion compatible with nonpolynomial terms, or what do you think? If not we can surely use a different implement as well!

@jduerholt
Copy link
Contributor

Hi @Osburg,

thank you very much. I will have a look in the next days.

Best,

Johannes

@jduerholt
Copy link
Contributor

Honestly, I do not know if I am the correct one to review :D @Osburg Do you check and fix the failing tests? I can then have a rough look again :D

@Osburg
Copy link
Collaborator Author

Osburg commented Sep 6, 2024

Hey @jduerholt! Yes, will take care of the tests. sry was very busy with work in the last couple of days :D

Copy link
Contributor

@jduerholt jduerholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Osburg,

thank you very much. I let some comments especially regarding the code structure. I am also a bit under the impression, that the DoE module needs some kind of restructuring in total as we are just adding more features on top which makes it sometimes more messy.

Just look at my comments! It is not meant as criticism but more as a general observation.

Best,

Johannes

bofire/data_models/constraints/constraint.py Show resolved Hide resolved
bofire/strategies/doe/objective.py Show resolved Hide resolved
self.YtY = torch.from_numpy(X.T @ X) / n_space
self.YtY.requires_grad = False

print("Done!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need all this prints?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No :D

model_type=model, rhs_only=True, domain=domain
)
X = model_formula.get_model_matrix(design).to_numpy()
self.space_filling_design = design
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you use this attribute somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again no :D will remove it


model_formula = get_formula_from_string(
model_type=model, rhs_only=True, domain=domain
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood it correctly, until here, a lot of functionality is just recoding other stuff that to come up with a starting set. And this recoding is partially done to prevent, circular imports. What do you think about adding an additional keyword argument to init, which just takes the inital design as input. Then the user also has much more flexibility in controlling this critical part of the design. Furtheremore, it would make the code much cleaner. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the DoEStrategy, we could then add this part of coming up with an inital setup in case somebody wants to perform an I-optimal design. But here, the user would still have the option todo custom stuff.

What do you think about a tidy-up of the DoE functionality in general (after this PR ;))?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could start to revamp find_local_max_iptopt, by changing the signature in a way that it just accepts the instantiated objective class, which is holding the domain instead also taking the 'domainas argument. The model type would then also be only part of the objective and not also of thefind_local_max_ipopt` signature. This way, we can make it all much cleaner and modular. What do you think?

@Osburg
Copy link
Collaborator Author

Osburg commented Sep 9, 2024

HI @jduerholt,

totally agree with your opinion that stuff is getting more and more messy and one (or me :D) should spend time to rewrite the doe part in a nicer way. Since you know more about overall bofire etc + probably you have the better ideas how a better version would look like: Would you like to discuss this in a zoom call or so including @dlinzner-bcs if he likes (or alternatively somewhere here on github, but this would take more time after all i guess :D)? Starting from this I could then take care of the reimplementation.
Apart from hearing your ideas about restructuring the code there is some more stuff that I would need clarification about:

  • Are all the branch-and-bound spin-offs functional or in use by someone? I think we do not have any example notebooks etc. so i am not sure if anyone is using them --> If they are essentially dead code, should we remove them?
  • Is anyone still directly using the find_local_max_ipopt_... functions (in the past some people opened issues where they used it i think)? If yes, should we keep something similar in the future? What do you think?
  • In principle I would like to use torch.autograd.functional.hessian() to provide second order information. But there is one part of the function whose gradient is still not computed using pytorch (namely, the derivative of the model matrix w.r.t. the design), which would probably make it very tedious to write down the hessian. Do you have an idea how one could, starting from a pandas DataFrame, evaluate the model formula in a way that we can use autograd? (Maybe convert the formulaic Formula to a sympy expression and try to use sth like sympytorch or so, idk...? )
  • Is the multi-start idea from Add restart to DoE strategies (multi-start) #351 still sth we would like to have?

<-- These points are not to be discussed or answered now, it's just a bunch of open questions i would want to discuss before I start to rewrite stuff.

cheers
Aaron :)

@Osburg
Copy link
Collaborator Author

Osburg commented Dec 18, 2024

Moved to #485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

i-optimal designs
4 participants