
[question] Service API low recommended parallelism #199

Closed · casassg opened this issue Nov 7, 2019 · 8 comments
Labels: enhancement, wishlist

@casassg (Contributor) commented Nov 7, 2019

The Service API allows running parallel trials. However, when the acceptable number of parallel trials is reached, get_next_trial raises a ValueError. I was wondering if this could be changed to a more specific exception, or to returning None?

I want to avoid using enforce_sequential_optimization=False, and get_recommended_max_parallelism returns a small parallelism setting by default ([(5, 5), (-1, 3)]).

Any tips on increasing this number for parallel evaluations (while keeping batch evaluations sequential)?
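
For reference, here's a minimal sketch of what I'm hitting (illustrative only; the search space is a placeholder):

from ax.service.ax_client import AxClient

ax_client = AxClient()  # default generation strategy
ax_client.create_experiment(
    name="example",
    parameters=[{"name": "x", "type": "range", "bounds": [0.0, 1.0]}],
)

running = []
try:
    # Ask for more trials than the allowed parallelism without
    # completing any of them:
    for _ in range(10):
        parameters, trial_index = ax_client.get_next_trial()
        running.append(trial_index)
except ValueError as err:
    # Currently the only thing to catch is this generic ValueError.
    print(f"Hit the parallelism limit: {err}")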

@lena-kashtelyan (Contributor) commented Nov 8, 2019

Hi, @casassg! Changing it to a more specific exception is actually in the works and should land within a couple of weeks.

The easiest way to set your own parallelism is to specify your own generation strategy, I think, like so:

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

gs = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,  # Or however many Sobol trials you want
            min_trials_observed=3,  # How many trials must complete before moving to the next model
            max_parallelism=5,  # Your recommended max parallelism for this step
            model_kwargs={},  # Any kwargs you want passed into the model, e.g. {"seed": 999} for Sobol
        ),
        GenerationStep(
            model=Models.GPEI,
            num_trials=-1,  # -1 means no limit on the number of trials in this step
            max_parallelism=3,  # Same settings as in the step above are available here
        ),
    ]
)

from ax.service.ax_client import AxClient

ax_client = AxClient(generation_strategy=gs)
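
And for completeness, a sketch of how you'd then run trials against this client (evaluate is a hypothetical stand-in for your actual training job, and the search space is a placeholder):

ax_client.create_experiment(
    name="my_experiment",
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-5, 1e-1], "log_scale": True},
    ],
    objective_name="accuracy",
    minimize=False,
)

for _ in range(25):
    parameters, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(
        trial_index=trial_index,
        raw_data=evaluate(parameters),  # your training/evaluation code
    )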


Let me know if this helps; I'm happy to elaborate on any of this, also!


P.S. The parallelism setting is rather low for a reason: Bayesian optimization works much better when it's more sequential (for a typical hyperparameter optimization problem, with a parallelism of 3, it will converge to the optimal set of parameters within 25 trials). However, if you are running a lot of trials, such a low parallelism setting doesn't matter as much. We can help pick the optimal parallelism settings if need be.

@lena-kashtelyan self-assigned this Nov 8, 2019
@lena-kashtelyan added the question label Nov 8, 2019
@newtonle commented Nov 8, 2019

To add to this question: how is the recommended max parallelism calculated? What exactly are you optimizing for?

@casassg (Contributor, Author) commented Nov 8, 2019

So my use case is that I need to train models that sometimes take hours to train (I'm working to reduce this), and I'm wondering whether running more parallel batches would help here. Especially, I'm worried about large hyperparameter search spaces where the model won't converge within 25 trials.

Also, is there an easy way to check how close the Bayesian optimization model is to converging? Having access to that would be useful for deciding when to stop launching more jobs.

I guess I could extract the model and check it somehow, but I'm wondering if you use any specific way to gauge convergence.

@lena-kashtelyan (Contributor) commented Nov 11, 2019

@newtonle, currently it's just fixed: as many trials as needed for Sobol and 3 for BayesOpt, since we have found empirically that those values work well.


@casassg, that makes sense. If you have the capacity to increase parallelism and run more trials, then you can certainly do that!

Re: checking convergence: you could use the optimization trace plot (ax_client.get_optimization_trace; see the Service API tutorial for how to render it) to see whether the objective value has stopped improving; @bletham, @Balandat, or @eytan might know of another way! We should certainly consider adding an easy way to perform that check to the Service API; thank you for bringing it up!
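
For example, in a notebook (a minimal sketch using Ax's notebook plotting helpers):

from ax.utils.notebook.plotting import init_notebook_plotting, render

init_notebook_plotting()
# Plots the best objective value seen so far vs. trial number; a long
# plateau suggests the optimization has stopped improving.
render(ax_client.get_optimization_trace())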

Also, just as an FYI: we are planning some changes to how parallelism is handled by the Service API in the coming couple of weeks (among them, making the 'need more data' exception more specific than a ValueError); this shouldn't change the settings I gave in my example, however. Did they work as a solution for your case?

@casassg (Contributor, Author) commented Nov 18, 2019

I wonder if it would be possible to make this more automatic, as you suggest. It would be nice to have some kind of early-stopping check.

I'd be interested in following progress on how Service API parallelism can improve. Is there any way to follow the conversation?

@lena-kashtelyan (Contributor) commented

@casassg, we will definitely consider adding an early-stopping check and see if we can develop a good heuristic for it; thank you for the suggestion!

Re: following the conversation, I will definitely describe the changes in the changelog of the repo, but maybe @kkashin has more ideas on how we could keep you in the loop!

@lena-kashtelyan (Contributor) commented Dec 4, 2019

Making this an "enhancement" and a "wishlist" item for now, since the early-stopping heuristic will be part of our longer-term plans and the question part of the issue has been answered.

@lena-kashtelyan (Contributor) commented Aug 28, 2020

It is now possible to increase the parallelism level for any generation step; this is covered in the Service API tutorial. To reiterate, there are two ways of setting max parallelism for a given generation step:

  • by specifying your own custom generation strategy, as described in my first comment above,
  • by passing "max_parallelism_override" as part of the choose_generation_strategy_kwargs argument to create_experiment (see the sketch below).
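
Here's a sketch of the second option (argument names per the current Service API; do verify against your installed Ax version):

from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name="parallel_experiment",
    parameters=[{"name": "x", "type": "range", "bounds": [0.0, 1.0]}],
    # Raise the max parallelism for every step of the auto-chosen strategy:
    choose_generation_strategy_kwargs={"max_parallelism_override": 10},
)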

There is also now a special exception, MaxParallelismReached, indicating exactly what it says : ) So it's hopefully easier to catch and handle.
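
For example (a sketch; note that in current Ax the exception class is MaxParallelismReachedException, importable from ax.exceptions.generation_strategy, but check your installed version):

from ax.exceptions.generation_strategy import MaxParallelismReachedException

try:
    parameters, trial_index = ax_client.get_next_trial()
except MaxParallelismReachedException:
    # Wait for some currently-running trials to complete before
    # requesting more.
    pass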

A tutorial on generation strategy settings is in the works and coming soon, so stay tuned. We are also working on early-stopping functionality, but since that isn't the main part of this issue, I'm closing it; feel free to open another issue to request status updates on convergence indication / early stopping of optimization.
