
[QUESTION] Get the best predicted parameter (not observed) #1029

Closed · jultou-raa opened this issue Jul 21, 2022 · 9 comments
Labels: question (Further information is requested)

Comments

@jultou-raa

jultou-raa commented Jul 21, 2022

Hi guys!

I've been playing with Ax for a while, and I have never found a "native" way to get a predicted optimal set of parameters.

For example: if I take a dummy function like $x^2+1$ and want to minimize it, I expect the optimal parameter to be $x=0$.

[Figure: Optim_Search]

Using the AxClient API, I'm trying to recover the best parameter with ax_client.get_best_parameters(). But this returns the best observed data from the completed trials, so here I get the black point just to the left of 0...

Is it possible to get something that predicts the global optimum using the underlying model? I mean, a prediction of $x=0$ in my case?

If you want to play with this dataset, I've shared a snapshot here.

Thanks for your help!

@bernardbeckerman
Contributor

Hi @jultou-raa, thanks for posting here! Let me follow up with the team and get back to you.

@bernardbeckerman bernardbeckerman self-assigned this Jul 21, 2022
@bernardbeckerman bernardbeckerman added the question Further information is requested label Jul 21, 2022
@bernardbeckerman
Contributor

Hi @jultou-raa, would you be able to share the code you're using to run the AxClient API? There is a way to do this, but the best approach depends on your modeling setup.

@jultou-raa
Author

jultou-raa commented Jul 22, 2022

Hi @bernardbeckerman :)

First of all, thank you for the quick reply!

Sure, I can explain the process I followed for this example :)

  1. Initialize the AxClient using a custom generation strategy (2 SOBOL + GPEI):

    from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
    from ax.modelbridge.registry import Models
    from ax.service.ax_client import AxClient

    gs = GenerationStrategy(
        steps=[
            GenerationStep(
                model=Models.SOBOL,
                num_trials=2,
                min_trials_observed=2,
            ),
            GenerationStep(
                model=Models.GPEI,
                num_trials=-1,
                max_parallelism=nb_trials_batch,  # number of trials to run in parallel per batch
            ),
        ]
    )
    ax_client = AxClient(generation_strategy=gs)
  2. Create the experiment with this search space:

    objectives:
      - obj
    parameters:
      x:
        name: "x" 
        type: "range"
        bounds: [-1.0, 1.0]
        value_type: "float"
  3. Then I simulate a Human-in-the-Loop (a sketch of the full loop follows this list) by:

    1. Asking for the next trial using ax_client.get_next_trial()
    2. Computing the objective with an external spreadsheet
    3. Giving the result back to the ax_client using:

      ax_client.complete_trial(
          trial_index=trial,
          raw_data=data,
      )
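
Putting the three steps together, a minimal sketch of the loop looks roughly like this (gs is the GenerationStrategy from step 1; the experiment name and the run_spreadsheet helper are placeholders for my external computation, and I'm assuming the objectives-dict form of create_experiment):

    from ax.service.ax_client import AxClient
    from ax.service.utils.instantiation import ObjectiveProperties

    ax_client = AxClient(generation_strategy=gs)
    ax_client.create_experiment(
        name="dummy_quadratic",  # placeholder name
        parameters=[
            {
                "name": "x",
                "type": "range",
                "bounds": [-1.0, 1.0],
                "value_type": "float",
            },
        ],
        objectives={"obj": ObjectiveProperties(minimize=True)},
    )

    for _ in range(4):
        parameters, trial_index = ax_client.get_next_trial()
        # run_spreadsheet stands in for the external spreadsheet computation
        obj_value = run_spreadsheet(parameters["x"])
        ax_client.complete_trial(trial_index=trial_index, raw_data={"obj": obj_value})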

For my example, looping four times gave me:

Trial Index  obj          Arm Name  x                     Trial Status  Generation Method
0            1.007089604  0_0        0.08419978618621826  COMPLETED     Sobol
1            1.206375758  1_0       -0.45428598672151566  COMPLETED     Sobol
2            1.490542428  2_0        0.7003873413780071   COMPLETED     GPEI
3            1.006544824  3_0       -0.08090008723609299  COMPLETED     GPEI

Asking for ax_client.get_best_parameters() returns Trial #3 (obj=1.006544824, which is indeed the best observed), but I would expect something computed by the underlying Gaussian Process model.

Is this the kind of information you needed?

Thanks again for your help, and to the whole Facebook/Ax team! 👍

@jultou-raa
Author

Hi guys!
@bernardbeckerman, @lena-kashtelyan: do you have any news on this topic?

Maybe something is not clear in my previous comment? If so, please feel free to ask me again :)

Thanks for your help!

@sgbaird
Contributor

sgbaird commented Aug 2, 2022

@jultou-raa one option that comes to mind is swapping out the acquisition function with PosteriorMean (assuming I'm understanding that acquisition function correctly) and then calling get_next_trial(). I haven't tried it out, but it may be worth a shot. You might also consider the upper confidence bound (UCB) acquisition function (#955).

See also ModularBoTorchModel, #278, and #615. Maybe one of the Ax devs knows of a better way.

I'm also curious to know your use case for this. I've thought of doing something similar for fixed evaluation budgets, something to the effect of "every 10 iterations, try evaluating at the best-predicted location" to periodically demonstrate shorter-term gains to a stakeholder, especially one less familiar with the efficiency of Bayesian optimization; a rough sketch of that idea is below. Maybe a bad idea, but it's something that has come to mind.
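
Untested, but that "every Nth trial, exploit the model" idea might look roughly like this, reusing the ModularBoTorchModel route mentioned above (total_trials and evaluate are placeholders for your budget and your evaluation function):

    from ax.modelbridge.registry import Models
    from botorch.acquisition import PosteriorMean

    N = 10  # exploit the model on every N-th trial
    for i in range(total_trials):
        if i > 0 and i % N == 0:
            # Fit a model on all data so far and ask only for the posterior-mean optimizer.
            model = Models.BOTORCH_MODULAR(
                experiment=ax_client.experiment,
                data=ax_client.experiment.fetch_data(),
                botorch_acqf_class=PosteriorMean,
            )
            best_guess = model.gen(1).arms[0].parameters
            parameters, trial_index = ax_client.attach_trial(parameters=best_guess)
        else:
            parameters, trial_index = ax_client.get_next_trial()
        ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameters))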

@bernardbeckerman
Contributor

@sgbaird thanks for this response, and @jultou-raa sorry for the late reply! I agree with everything @sgbaird said, and also want to ask a bit more about your use case, particularly why you're looking for the modeled optimum rather than the optimum found so far. In the case that your goal is to do one final sample of the modeled optimum so as to get the best final result, I think this might not be the best strategy, since expected improvement is by definition the one-step optimal strategy for this purpose. Does that make sense? Also let me know if @sgbaird's solution works for you.

@jultou-raa
Author

jultou-raa commented Aug 10, 2022

Hi guys!

Thank you @sgbaird for this answer.

I tried both solutions (PosteriorMean and UCB) that you mentioned.

This is how I use them:

from ax.service.ax_client import AxClient
[...]
from ax.modelbridge.registry import Models
from botorch.acquisition import qUpperConfidenceBound, PosteriorMean
from ax.models.torch.botorch_modular.surrogate import Surrogate
from botorch.models.gp_regression import SingleTaskGP


ax_client = AxClient(generation_strategy=gs)  # See post https://github.com/facebook/Ax/issues/1029#issuecomment-1192290881 for gs variable details

[...]  # run the first four trials as described above

gp_posterior_mean = Models.BOTORCH_MODULAR(
    experiment=ax_client.experiment,
    data=ax_client.experiment.fetch_data(),
    surrogate=Surrogate(SingleTaskGP),
    botorch_acqf_class=PosteriorMean,  # here I tried either PosteriorMean or qUpperConfidenceBound
)

trial = gp_posterior_mean.gen(1)

trial.arms[0].parameters  # the candidate proposed by the model

Doing this, I get two different results (using one or the other) for my example above:

  • PosteriorMean gives me: 1.16e-03
  • qUpperConfidenceBound gives me: 2.20e-03

Now I have three questions 😄 (a small inspection sketch follows this list):

  • Am I doing the computation properly?
  • If I understand the BoTorch documentation correctly, can I only use PosteriorMean with a single outcome (q=1)?
  • Is it possible to get something more like a centered interval for the optimal parameter (using $3\sigma$ for example)? Something like $x \in [-2.2 \times 10^{-3}, 2.2 \times 10^{-3}]$?
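
Related to the first and third questions, here is a sketch of how I can at least pull the model's own prediction at the generated candidate (this gives a mean and a $3\sigma$ band on obj at that point, not an interval on $x$ itself; I'm assuming ModelBridge.predict takes a list of ObservationFeatures):

    import math

    from ax.core.observation import ObservationFeatures

    # Predicted mean and (co)variance of "obj" at the candidate generated above.
    candidate = trial.arms[0].parameters
    means, covariances = gp_posterior_mean.predict(
        [ObservationFeatures(parameters=candidate)]
    )
    mu = means["obj"][0]
    sigma = math.sqrt(covariances["obj"]["obj"][0])
    print(f"predicted obj at candidate: {mu:.4f} ± {3 * sigma:.4f} (3 sigma)")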

@sgbaird and @bernardbeckerman, to answer your question about why I need this: we run experiments that cost a lot, so the budget to reach the target is small. The idea is to finish an exploration/exploitation optimization (what GPEI or GPKG does) with a pure exploitation step. For example, if I get six shots to optimize my solution, it could be interesting for us to build the metamodel using 5 "smart" points and then spend the last one on pure exploitation of the previous knowledge.

In the example I gave in this thread, the last point Expected Improvement asked for was x = 0.0020908159615589117. So it is a little closer to zero than the qUpperConfidenceBound candidate, but not as close as the PosteriorMean one...

@dme65
Contributor

dme65 commented Sep 15, 2022

Hi @jultou-raa,

Regarding your three questions:

  1. Yes, what you are doing looks correct to me.
  2. PosteriorMean assumes one outcome, but q=1 here corresponds to how many candidates you want to evaluate. That is, you won't be able to use PosteriorMean if you want to generate more than 1 candidate and evaluate those in parallel. That doesn't seem to be something you are interested in though given your description above.
  3. I'm not sure if I understand what you mean here. Can you add some more details?

Taking a step back, every acquisition function you consider here generates a candidate close to zero and I wouldn't read too much into the fact that 1.16e−3 is slightly closer to zero than 2.20e−3. Given the situation you describe, EI is probably a natural choice here as it aims to maximize the expected improvement given one more function evaluation.

While it may feel like PosteriorMean is a natural choice when you have one evaluation left and want to focus on exploitation, here is a scenario where it will probably do the wrong thing: Assume your current best function value is f* and that the best posterior mean according to the model is also f*. Assume in addition that the uncertainty according to the model is 0 (the model is very confident in its prediction). Now, assume there is a second point with posterior mean f* + epsilon with epsilon>0 very close to zero, but that this point has very high uncertainty according to the model (the model is very unsure about its prediction). If you use PosteriorMean, it will ignore the model uncertainty and pick the point with posterior mean f*, which isn't a great choice since this point has no upside whatsoever. On the other hand, EI will end up picking the point with posterior mean f* + epsilon since this point has higher upside and may actually give you a sizable improvement compared to your current best point.
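
To put rough numbers on that scenario, here is a toy sketch (numbers made up, not from your model) using the analytic EI formula for minimization, $\mathrm{EI}(x) = \sigma(x)\,[z\,\Phi(z) + \varphi(z)]$ with $z = (f^* - \mu(x))/\sigma(x)$:

    from scipy.stats import norm

    def expected_improvement(mu, sigma, best):
        """Analytic EI for minimization: E[max(best - f, 0)] with f ~ N(mu, sigma^2)."""
        if sigma == 0.0:
            return max(best - mu, 0.0)
        z = (best - mu) / sigma
        return sigma * (z * norm.cdf(z) + norm.pdf(z))

    f_star = 1.0  # current best observed value

    # Point with posterior mean f* and zero uncertainty: PosteriorMean likes it, but EI = 0.
    print(expected_improvement(mu=f_star, sigma=0.0, best=f_star))         # 0.0
    # Point with slightly worse mean but high uncertainty: EI sees real upside.
    print(expected_improvement(mu=f_star + 0.01, sigma=0.5, best=f_star))  # ~0.19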

@lena-kashtelyan
Contributor

This seems answered and inactive, closing. Please feel free to reopen!
