Add cuml + optuna HPO example #141
Should this notebook even be here? Maybe, because it uses Related to that question, which tags should we use?
@skirui-source was also looking at a notebook that only used Perhaps you two could sync up and decide which of the two should go here. I'd be happy with at least one and I'm fine with both. But we probably don't need more than that. But if there is a ton of overlap between the two, then they could be merged.
Sounds like a plan. I left a comment on the other (obvious) optuna notebook issues regarding -1/0/+1 for migrating. I'll wait and see which one @skirui-source has tackled.
I figured out how to address all the problems I mentioned in the top comment 🥳. I think a lot of it was related to switching from
hcho3 commented on 2023-02-17T01:36:52Z: Line #1. import pandas as pd Not needed
hcho3 commented on 2023-02-17T01:36:53Z: "The objective function will be the one we optimize in Optuna Study." Should be revised to "We will optimize the objective function using Optuna Study"
hcho3 commented on 2023-02-17T01:36:54Z: "Optuna uses study and trials" Revise to "Optuna uses studies and trials"
hcho3 commented on 2023-02-17T01:36:55Z: "# Submit 4 optimization tasks, where each task runs about 40 optimization trials" Revise to "Submit n_workers optimization tasks..."
hcho3 commented on 2023-02-17T01:36:56Z: futures = [ c.submit( study.optimize, lambda trial: objective(trial, X, y), n_trials=N_TRIALS // 4, pure=False, ) for _ in range(4) ] Use
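For readers following the last two review comments: one way to avoid the hard-coded 4 is to derive the per-task trial counts from the number of workers. The sketch below is plain Python and is not taken from the notebook; the helper name split_trials is hypothetical, and it assumes the intent is to spread N_TRIALS as evenly as possible so that no trials are dropped when N_TRIALS is not a multiple of the worker count (which N_TRIALS // 4 would do).

```python
from typing import List


def split_trials(n_trials: int, n_workers: int) -> List[int]:
    """Return how many trials each of the n_workers tasks should run.

    Hypothetical helper for illustration only; it is not part of optuna or dask.
    The counts always sum to n_trials, and the first (n_trials % n_workers)
    tasks receive one extra trial.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    base, extra = divmod(n_trials, n_workers)
    return [base + 1 if i < extra else base for i in range(n_workers)]


# Assumed usage, mirroring the snippet quoted in the review comment
# (c, study, objective, X, y, N_TRIALS and n_workers come from the notebook):
#
# futures = [
#     c.submit(study.optimize, lambda trial: objective(trial, X, y),
#              n_trials=n, pure=False)
#     for n in split_trials(N_TRIALS, n_workers)
# ]
```

With an even split this gives the same result as integer division; the difference only shows up when the trial budget does not divide evenly across workers.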
Overall, the notebook looks good to me. I left a few minor comments. I was able to run the notebook end-to-end using the latest stable Rapids Docker container (
Thanks for the comments. I implemented them all as is, except for the
I'm a bit puzzled about the build error. The problem is that there is a Does someone have an idea where to go poking around?
Hmm I wondered if throwing notebooks through jinja would throw up some problems like this. I guess the options are:
Could we skip notebooks in the version templating extension? At least temporarily and then work on improving the extension. Switching away from plotly would be a bit tricky, one of the plots uses parallel coordinates, which is not something that you can (easily) do in matplotlib. And it is a useful plot type for the thing we are trying to show. But then again, maybe we should just remove the visualisation part and instead refer the reader to some optuna docs about how to do that? It would fit with "one idea per example", which I guess in this case is "look, optuna + dask + rapids = it just works!".
…otebook-cuml-optuna-hpo
The JS injected by plotly causes parsing errors in the templating extension we use to render the docs. For now the plots are removed, could be re-added if the extension is reworked.
The latest commit implements the idea of removing the plotting. If we go ahead with this I'll create an issue to remind us to work on the extension (and add the plotting back).
With #160 merged, let's put back the visualisation stuff.
This is ready for review/merge. In the end I removed the plotting because it doesn't render/appear in the rendered docs. Maybe because we are not including all the right JS libs that the notebook UI includes/loads? Maybe something for a future PR dedicated to making plotlyJS work.
This generally looks great thanks!
Could you add a few more tags? The notebook uses libraries including cudf, cuml, and numpy which we need tags for. We probably also need a new dataset tag for BNP Paribas Cardif Claims Management, although I'm not sure what the best short slug is for that.
Added more tags. What do you think of
…tebook-cuml-optuna-hpo
Time to merge this?
Sorry for the delay in merging here. Let's get this in.
Closes rapidsai/cloud-ml-examples#227
🚧 This isn't ready to merge yet.
I've updated things to the latest version of optuna (3.1.0). This means that we no longer need the dask_optuna extension as the dask integration is now part of optuna.
I also removed the section on the dashboard. This is because you now need to install a separate package to get the dashboard. It seems like this would add a lot of explaining to an already long notebook for not so much benefit. If there was something particular to point out or do with the dashboard it would be worth it, currently the notebook just pointed out that the dashboard exists. There are visualisation examples in the notebook already.
Right now there are two (and a half) problems:
top doesn't report any CPU usage and nvidia-smi also doesn't report anything happening. Waiting for half an hour or so also doesn't solve the problem. The weird thing is that this morning it "just worked" but a few days ago I had the same problem, and now again. If someone has an idea where to start poking, let me know.
DaskStorage()). Does anyone see a downside to doing this?
nvidia-smi output in the collapsible below. First question: is using nvidia-smi a good (approximate) tool to see if more than one GPU is being used? Second question: has it always been like this?
Before merging this we need to run the notebook at least once from start to finish. Other steps depend on the answer to the questions.
Copying task list from the issue: