
[question] Service API low recommended parallelism #199

Closed · casassg opened this issue Nov 7, 2019 · 8 comments
Labels: enhancement, wishlist

@casassg (Contributor) commented Nov 7, 2019

The Service API allows running parallel trials. However, when the acceptable number of parallel trials is reached, get_next_trial raises a ValueError. I was wondering if this could be changed to a more specific exception, or to returning None?

I want to avoid using enforce_sequential_optimization=False, and get_recommended_max_parallelism returns a small parallelism setting by default ([(5, 5), (-1, 3)]).

Any tips on increasing this number for parallel evaluations (while keeping batch evaluations sequential)?
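
For reference, here's a minimal sketch of what I'm hitting (illustrative only; the search space is a placeholder):

from ax.service.ax_client import AxClient

ax_client = AxClient()  # default generation strategy
ax_client.create_experiment(
    name="example",
    parameters=[{"name": "x", "type": "range", "bounds": [0.0, 1.0]}],
)

running = []
try:
    # Ask for more trials than the allowed parallelism without
    # completing any of them:
    for _ in range(10):
        parameters, trial_index = ax_client.get_next_trial()
        running.append(trial_index)
except ValueError as err:
    # Currently the only thing to catch is this generic ValueError.
    print(f"Hit the parallelism limit: {err}")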

@lena-kashtelyan (Contributor) commented Nov 8, 2019

Hi, @casassg! Changing it to a more specific exception is actually in the works and should land within a couple of weeks.

The easiest way to set your own parallelism is to specify your own generation strategy, I think, like so:

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

gs = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,  # Or however many Sobol trials you want
            min_trials_observed=3,  # How many trials must complete before moving to the next model
            max_parallelism=5,  # Your recommended max parallelism for this step
            model_kwargs={},  # Any kwargs you want passed into the model, e.g. {"seed": 999} for Sobol
        ),
        GenerationStep(
            model=Models.GPEI,
            num_trials=-1,  # -1 means no limit on the number of trials in this step
            max_parallelism=3,  # Same settings as in the step above are available here
        ),
    ]
)

from ax.service.ax_client import AxClient

ax_client = AxClient(generation_strategy=gs)
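
And for completeness, a sketch of how you'd then run trials against this client (evaluate is a hypothetical stand-in for your actual training job, and the search space is a placeholder):

ax_client.create_experiment(
    name="my_experiment",
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-5, 1e-1], "log_scale": True},
    ],
    objective_name="accuracy",
    minimize=False,
)

for _ in range(25):
    parameters, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(
        trial_index=trial_index,
        raw_data=evaluate(parameters),  # your training/evaluation code
    )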


Let me know if this helps; I'm happy to elaborate on any of this, also!


P.S. The parallelism setting is rather low for a reason: Bayesian optimization works much better when it's more sequential (for a typical hyperparameter optimization problem, with a parallelism of 3, it will converge to the optimal set of parameters within 25 trials). However, if you are running a lot of trials, such a low parallelism setting doesn't matter as much. We can help pick the optimal parallelism settings if need be.

@lena-kashtelyan self-assigned this Nov 8, 2019
@lena-kashtelyan added the question label Nov 8, 2019
@newtonle commented Nov 8, 2019

To add to this question: how is the recommended max parallelism calculated? What exactly are you optimizing for?

@casassg (Contributor, Author) commented Nov 8, 2019

So my use case is that I need to train models that sometimes take hours to train (I'm working to reduce this), and I'm wondering whether running more parallel batches would help here. Especially, I'm worried about large hyperparameter search spaces where the model won't converge within 25 trials.

Also, is there an easy way to check how close the Bayesian optimization model is to converging? Having access to that would be useful for deciding when to stop launching more jobs.

I guess I could extract the model and check it somehow, but I'm wondering if you use any specific way to gauge convergence.

@lena-kashtelyan (Contributor) commented Nov 11, 2019

@newtonle, currently it's just fixed: as many trials as needed for Sobol and 3 for BayesOpt, since we have found empirically that those values work well.


@casassg, that makes sense. If you have the capacity to increase parallelism and run more trials, then you can certainly do that!

Re: checking convergence: you could use the optimization trace plot (ax_client.get_optimization_trace; see the Service API tutorial for how to render it) to see whether the objective value has stopped improving; @bletham, @Balandat, or @eytan might know of another way! We should certainly consider adding an easy way to perform that check to the Service API; thank you for bringing it up!
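
For example, in a notebook (a minimal sketch using Ax's notebook plotting helpers):

from ax.utils.notebook.plotting import init_notebook_plotting, render

init_notebook_plotting()
# Plots the best objective value seen so far vs. trial number; a long
# plateau suggests the optimization has stopped improving.
render(ax_client.get_optimization_trace())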

Also, just as an FYI: we are planning some changes to how parallelism is handled by the Service API in the coming couple of weeks (among them, making the 'need more data' exception more specific than a ValueError); this shouldn't change the settings I gave in my example, however. Did they work as a solution for your case?

@casassg (Contributor, Author) commented Nov 18, 2019

I wonder if it would be possible to make this more automatic, as you suggest. It would be nice to have some kind of early-stopping check.

I'd be interested in following progress on how Service API parallelism can improve. Is there any way to follow the conversation?

@lena-kashtelyan (Contributor) commented

@casassg, we will definitely consider adding an early-stopping check and see if we can develop a good heuristic for it; thank you for the suggestion!

Re: following the conversation, I will definitely describe the changes in the changelog of the repo, but maybe @kkashin has more ideas on how we could keep you in the loop!

@lena-kashtelyan (Contributor) commented Dec 4, 2019

Making this an "enhancement" and a "wishlist" item for now, since the early-stopping heuristic will be part of our longer-term plans and the question part of the issue has been answered.

@lena-kashtelyan (Contributor) commented Aug 28, 2020

It is now possible to increase the parallelism level for any generation step; this is covered in the Service API tutorial. To reiterate, there are two ways of setting max parallelism for a given generation step:

  • by specifying your own custom generation strategy, as described in my first comment above,
  • by passing "max_parallelism_override" as part of the choose_generation_strategy_kwargs argument to create_experiment (see the sketch below).
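
Here's a sketch of the second option (argument names per the current Service API; do verify against your installed Ax version):

from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name="parallel_experiment",
    parameters=[{"name": "x", "type": "range", "bounds": [0.0, 1.0]}],
    # Raise the max parallelism for every step of the auto-chosen strategy:
    choose_generation_strategy_kwargs={"max_parallelism_override": 10},
)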

There is also now a special exception, MaxParallelismReached, indicating exactly what it says : ) So it's hopefully easier to catch and handle.
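
For example (a sketch; note that in current Ax the exception class is MaxParallelismReachedException, importable from ax.exceptions.generation_strategy, but check your installed version):

from ax.exceptions.generation_strategy import MaxParallelismReachedException

try:
    parameters, trial_index = ax_client.get_next_trial()
except MaxParallelismReachedException:
    # Wait for some currently-running trials to complete before
    # requesting more.
    pass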

A tutorial on generation strategy settings is in the works and coming soon, so stay tuned. We are also working on early-stopping functionality, but since that isn't the main part of this issue, I'm closing it; feel free to open another issue to request status updates on convergence indication / early stopping of optimization.
