[question] Service API low recommended parallelism #199
Hi, @casassg! Changing this to a more specific exception is actually in the works and should land within a couple of weeks. The easiest way to set your own parallelism is via your own generation strategy, I think, like so:
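The original code sample was not preserved in this copy of the thread. As a rough, self-contained sketch of the idea (this is not the Ax API; `GenerationStep` and its fields here are illustrative stand-ins), a two-step strategy with custom per-step parallelism might look like:

```python
from dataclasses import dataclass

@dataclass
class GenerationStep:
    """Illustrative stand-in for one step of a generation strategy."""
    model: str           # e.g. a quasi-random or Bayesian-optimization model
    num_trials: int      # -1 means "unlimited trials in this step"
    max_parallelism: int

# Illustrative strategy: 5 quasi-random Sobol trials (fully parallel),
# then Bayesian optimization with a cap raised above the default of 3.
strategy = [
    GenerationStep("Sobol", num_trials=5, max_parallelism=5),
    GenerationStep("GPEI", num_trials=-1, max_parallelism=10),
]

def max_parallelism_for(trial_index, steps):
    """Return the parallelism cap that applies to a given trial index."""
    seen = 0
    for step in steps:
        if step.num_trials == -1 or trial_index < seen + step.num_trials:
            return step.max_parallelism
        seen += step.num_trials
    raise ValueError("trial index beyond all generation steps")

print(max_parallelism_for(0, strategy))  # falls in the Sobol step
print(max_parallelism_for(7, strategy))  # falls in the BayesOpt step
```

The `(num_trials, max_parallelism)` pairing mirrors the `[(5, 5), (-1, 3)]` default mentioned in this issue.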
Some additional resources:
Let me know if this helps; I'm happy to elaborate on any of this, also! P.S. The parallelism setting is rather low for a reason: Bayesian optimization works much better when it is more sequential (for a typical hyperparameter optimization problem, with a parallelism of 3, it will converge to the optimal set of parameters within 25 trials). However, if you are running a lot of trials, then such a low parallelism setting doesn't matter as much. We can help pick the optimal parallelism settings if need be.
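The intuition behind keeping parallelism low can be made concrete with a toy count: with parallelism p, trials are effectively generated in batches of p, so each candidate in a batch is produced with up to p-1 fewer completed observations than in the fully sequential case. A minimal sketch (illustrative only, not how Ax schedules trials):

```python
def observations_available(n_trials, parallelism):
    """For each trial, the number of completed observations available when it
    is generated, assuming trials launch in batches of `parallelism` and each
    batch finishes before the next one is generated."""
    return [(i // parallelism) * parallelism for i in range(n_trials)]

seq = observations_available(25, 1)  # fully sequential
par = observations_available(25, 3)  # recommended BayesOpt parallelism

# Sequential generation conditions each candidate on strictly more data.
print(sum(seq), sum(par))
```

The gap in total observations used is what sequential Bayesian optimization buys; with many trials, the per-trial difference shrinks, which is why low parallelism matters less in long runs.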
To add to this question: how is this recommended max parallelism calculated? What exactly are you optimizing for?
So my use case is that I need to train models that sometimes require hours to train (I'm working to reduce this), but I'm wondering whether running more parallel batches would help in this case. I'm especially worried about a large hyperparameter search space where the model won't converge within 25 trials. Also, is there an easy way to check how close the Bayesian optimization model is to converging? Having access to that would potentially be useful for deciding when to stop launching more jobs. I guess I could extract the model and check it somehow, but I'm wondering if you use any specific way to gauge that convergence.
@newtonle, currently it's just fixed at as many as needed for Sobol and at 3 for BayesOpt, since we have found empirically that those values work well. @casassg, that makes sense. If you have the capacity to increase parallelism and run more trials, then you can certainly do that! Re: checking convergence, you could use the optimization trace plot. Also, just as an FYI: we are planning some changes to the way parallelism is handled by the Service API in the coming couple of weeks (among them, making the 'need more data' exception more specific than a ValueError).
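The optimization trace mentioned above is just the running best objective value after each completed trial. Below is a hypothetical sketch (not Ax's implementation; `has_plateaued` and its defaults are made up for illustration) of such a trace plus a simple plateau-based stopping check of the kind discussed in this thread:

```python
def best_seen_trace(objective_values):
    """Running minimum of the objective (assuming minimization) after each trial."""
    trace, best = [], float("inf")
    for v in objective_values:
        best = min(best, v)
        trace.append(best)
    return trace

def has_plateaued(trace, window=5, tol=5e-3):
    """Heuristic (illustrative): stop if the best value improved by less
    than `tol` over the last `window` completed trials."""
    if len(trace) <= window:
        return False
    return trace[-window - 1] - trace[-1] < tol

# Toy objective values from a sequence of completed trials.
values = [0.9, 0.7, 0.55, 0.52, 0.51, 0.509, 0.5085, 0.5082, 0.508, 0.508]
trace = best_seen_trace(values)
print(has_plateaued(trace))  # True: little improvement over the last 5 trials
```

In practice, when a check like this fires, one could simply stop calling `get_next_trial` rather than waiting for a trial budget to run out.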
I wonder if it would be possible to make this more automatic, as you suggest. It would be nice to get some kind of early-stopping check. I would be interested in following progress on how Service API parallelism can improve. Curious if there's any way to follow the conversation.
@casassg, we will definitely consider adding an early-stopping check and see if we can develop a good heuristic for it; thank you for the suggestion! Re: following the conversation, I will definitely describe the changes in the changelog of the repo, but maybe @kkashin has more ideas on how we could keep you in the loop!
Making this an "enhancement" and a "wishlist" item for now, since an early-stopping heuristic will be part of our more long-term plan and the question part of the issue was answered.
It is now possible to increase the parallelism level for any generation step, and there is information on it in this section and this section of the Service API tutorial. To reiterate, there are two ways of setting max parallelism for a given generation step:
There is also now a special exception for this case. A tutorial on generation strategy settings is in the works and coming soon, so stay tuned. We are also working on early-stopping functionality, but since that isn't the main part of this issue, I'm closing it; feel free to open another issue to request status updates on convergence indication / early stopping of optimization.
The Service API allows running parallel trials. However, when the threshold on the acceptable number of parallel trials is reached, `get_next_trial` throws a `ValueError`. I was wondering if this could be changed to a more specific exception, or to returning `None`?

I want to avoid using `enforce_sequential_optimization=False`, and `get_recommended_max_parallelism` is returning a small parallelism setting by default (`[(5, 5), (-1, 3)]`). Any tips on increasing this number of parallel evaluations (while still keeping batch evaluations sequential)?
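A sketch of the pattern the question is driving at: treating the "too many running trials" condition as a signal to free capacity (or wait for results) rather than as a fatal error. Everything here is a toy; `MaxParallelismReached` and `ToyClient` are hypothetical names for illustration, not Ax classes:

```python
class MaxParallelismReached(Exception):
    """Hypothetical stand-in for a parallelism exception more specific
    than the bare ValueError described in this issue."""

class ToyClient:
    """Toy client that caps how many trials may run at once."""
    def __init__(self, max_parallelism):
        self.max_parallelism = max_parallelism
        self.running = set()
        self.next_index = 0

    def get_next_trial(self):
        if len(self.running) >= self.max_parallelism:
            raise MaxParallelismReached("wait for running trials to complete")
        i = self.next_index
        self.next_index += 1
        self.running.add(i)
        return i

    def complete_trial(self, i):
        self.running.discard(i)

client = ToyClient(max_parallelism=3)
launched = []
for _ in range(5):
    try:
        launched.append(client.get_next_trial())
    except MaxParallelismReached:
        # Instead of crashing, free capacity by completing the oldest
        # running trial, then retry. A real loop would await results.
        client.complete_trial(min(client.running))
        launched.append(client.get_next_trial())

print(launched)  # all 5 trials launched despite the cap of 3
```

A dedicated exception type makes this `try`/`except` precise; catching a generic `ValueError` risks swallowing unrelated errors, which is presumably why the maintainers planned the change.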