Add cuml + optuna HPO example #141

betatim · 2023-02-08T15:05:12Z

~~🚧 This isn't ready to merge yet.~~

I've updated things to the latest version of optuna (3.1.0). This means that we no longer need the dask_optuna extension as the dask integration is now part of optuna.

I also removed the section on the dashboard. This is because you now need to install a separate package to get the dashboard. It seems like this would add a lot of explaining to an already long notebook for not so much benefit. If there was something particular to point out or do with the dashboard it would be worth it, currently the notebook just pointed out that the dashboard exists. There are visualisation examples in the notebook already.

Right now there are two (and a half) problems:

sometimes the notebook just "hangs" when you run the HPO loop. As far as I can tell nothing is happening when this state is reached. top doesn't report any CPU usage and nvidia-smi also doesn't report anything happening. Waiting for half an hour or so, also doesn't solve the problem. The weird thing is that this morning it "just worked" but a few days ago I had the same problem, and now again. If someone has an idea where to start poking, let me know.
you need to remove the SQLite DB used as storage each time you run the notebook. I assume this is a bug in the dask storage extension(?) that somehow leaves the DB in a state where it can't be used again?? Or maybe we'd have to use a new study name? We can solve this by using an in memory storage (don't provide a path to DaskStorage()). Does anyone see a downside to doing this?
when the notebook does run (and doesn't get stuck), it doesn't seem like it is using more than one GPU. nvidia-smi output in the collapsible below. First question: is using nvidia-smi a good (approximate tool) to see if more than one GPU is being used? Second question: has it always been like this?

Details

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:3B:00.0 Off |                    0 |
| N/A   55C    P0    44W /  70W |   3459MiB / 15109MiB |     80%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:5E:00.0 Off |                    0 |
| N/A   53C    P0    29W /  70W |    100MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   46C    P0    27W /  70W |    100MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   44C    P0    28W /  70W |    100MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    925320      C   ...-deploy-optuna/bin/python     3359MiB |
|    0   N/A  N/A    925478      C   ...-deploy-optuna/bin/python       97MiB |
|    1   N/A  N/A    925486      C   ...-deploy-optuna/bin/python       97MiB |
|    2   N/A  N/A    925481      C   ...-deploy-optuna/bin/python       97MiB |
|    3   N/A  N/A    925488      C   ...-deploy-optuna/bin/python       97MiB |
+-----------------------------------------------------------------------------+

Before merging this we need to run the notebook at least once from start to finish. Other steps depend on the answer to the questions.

Copying task list from the issue:

Test if notebook works
Fix up anything that needs changing
Ensure notebook has good title, description and metadata tags in the first cell
Replace deployment instructions with links to docs pages
Copy notebook into a new folder into deployment docs examples
Copy any supporting files to the folder
Add notebook to examples toctree
Make PR to deployment docs repo

review-notebook-app · 2023-02-08T15:05:22Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

betatim · 2023-02-08T16:40:44Z

Should this notebook even be here? Maybe, because it uses LocalCUDACluster, but maybe the RAPIDS examples are a better home?

Related to that question, which tags should we use?

jacobtomlinson · 2023-02-09T10:38:23Z

@skirui-source was also looking at a notebook that only used LocalCUDACluster. I think it would be good to have an example or two that use it, but I don't want tons and tons.

Perhaps you two could sync up and decide which of the two should go here. I'd be happy with at least one and I'm fine with both. But we probably don't need more than that. But if there is a ton of overlap between then two then they could be merged.

betatim · 2023-02-09T12:47:34Z

Sounds like a plan. I left a comment on the other (obvious) optuna notebook issues regarding -1/0/+1 for migrating. I'll wait and see which one @skirui-source has tackled.

betatim · 2023-02-16T11:33:24Z

I figured out how to address all the problems I mentioned in the top comment 🥳 . I think a lot of it was related to switching from dask_optuna to the dask extension built in to optuna in the newer version.

review-notebook-app · 2023-02-17T01:36:53Z

View / edit / reply to this conversation on ReviewNB

hcho3 commented on 2023-02-17T01:36:52Z
----------------------------------------------------------------

Line #1.    import pandas as pd

Not needed

review-notebook-app · 2023-02-17T01:36:54Z

View / edit / reply to this conversation on ReviewNB

hcho3 commented on 2023-02-17T01:36:53Z
----------------------------------------------------------------

The objective function will be the one we optimize in Optuna Study.

Should be revised to "We will optimize the objective function using Optuna Study"

review-notebook-app · 2023-02-17T01:36:54Z

View / edit / reply to this conversation on ReviewNB

hcho3 commented on 2023-02-17T01:36:54Z
----------------------------------------------------------------

Optuna uses study and trials

Revise to "Optuna uses studies and trials"

review-notebook-app · 2023-02-17T01:36:55Z

View / edit / reply to this conversation on ReviewNB

hcho3 commented on 2023-02-17T01:36:55Z
----------------------------------------------------------------

# Submit 4 optimization tasks, where each task runs about 40 optimization trials

"Submit n_workers optimization tasks..."

review-notebook-app · 2023-02-17T01:36:56Z

View / edit / reply to this conversation on ReviewNB

hcho3 commented on 2023-02-17T01:36:56Z
----------------------------------------------------------------

futures = [
    c.submit(
        study.optimize,
        lambda trial: objective(trial, X, y),
        n_trials=N_TRIALS // 4,
        pure=False,
    )
    for _ in range(4)
]

Use n_workers instead of hard-coding 4.

hcho3 · 2023-02-17T01:38:15Z

Overall, the notebook looks good to me. I left a few minor comments.

I was able to run the notebook end-to-end using the latest stable Rapids Docker container (nvcr.io/nvidia/rapidsai/rapidsai-core:23.02-cuda11.8-runtime-ubuntu22.04-py3.10).

betatim · 2023-02-17T10:42:36Z

Thanks for the comments. I implemented them all as is, except for the import pandas one: there was an additional unused import I also removed :D

betatim · 2023-02-17T11:05:24Z

I'm a bit puzzled about the build error. The problem is that there is a \\ in the notebook that makes the version templating extension unhappy. The problem is around byte 3755442. If you print that part of the file there is indeed a \\ there, but there is also a \\ about 70 characters before. The offending bit is "var x = new MutationObserver(function (mutations, observer) {{\\n",\n which is part of the plotly(?) JS that gets injected. But it looks totally legit to have a \\n there. So yeah, I'm a bit stumped.

Does someone have an idea where to go poking around?

jacobtomlinson · 2023-02-22T14:40:41Z

Hmm I wondered if throwing notebooks through jinja would throw up some problems like this.

I guess the options are:

Remove the offending plotly section
Rework the extension to handle this case

betatim · 2023-02-22T14:55:09Z

Could we skip notebooks in the version templating extension? At least temporarily and then work on improving the extension.

Switching away from plotly would be a bit tricky, one of the plots uses parallel coordinates, which is not something that you can (easily) do in matplotlib. And it is a useful plot type for the thing we are trying to show. But then again, maybe we should just remove the visualisation part and instead refer the reader to some optuna docs about how to do that? It would fit with "one idea per example", which I guess in this case is "look, optuna + dask + rapids = it just works!".

…otebook-cuml-optuna-hpo

The JS injected by plotly causes parsing errors in the templating extension we use to render the docs. For now the plots are removed, could be re-added if the extension is reworked.

betatim · 2023-02-22T15:28:34Z

The latest commit implements the idea of removing the plotting. If we go ahead with this I'll create an issue to remind us to work on the extension (and add the plotting back).

betatim · 2023-02-22T19:09:00Z

With #160 merged, let's put back the visualisation stuff.

betatim · 2023-02-28T10:28:47Z

This is ready for review/merge. In the end I removed the plotting because it doesn't render/appear in the rendered docs. Maybe because we are not including all the right JS libs that the notebook UI includes/loads? Maybe something for a future PR dedicated to making plotlyJS work.

jacobtomlinson

This generally looks great thanks!

Could you add a few more tags. The notebook uses libraries including cudf, cuml, and numpy which we need tags for. We probably also need a new dataset tag for BNP Paribas Cardif Claims Management although not sure what the best short slug is for that.

betatim · 2023-03-01T16:01:40Z

Added more tags. What do you think of dataset/bnp-claims? (I've added that for now)

…tebook-cuml-optuna-hpo

betatim · 2023-03-09T09:20:31Z

Time to merge this?

jacobtomlinson · 2023-03-30T16:01:53Z

Sorry for the delay in merging here. Let's get this in.

Add cuml + optuna HPO example

a36f21a

jacobtomlinson marked this pull request as draft February 8, 2023 16:38

hcho3 self-assigned this Feb 8, 2023

Update job submission

f8cbb5e

betatim marked this pull request as ready for review February 16, 2023 11:32

Improve wording and reduce hard-coding

3a61140

betatim added 3 commits February 17, 2023 10:49

Add cuml + optuna HPO example

b3d05e2

Update job submission

b3014f4

Improve wording and reduce hard-coding

53ff402

betatim added 2 commits February 22, 2023 15:22

Merge remote-tracking branch 'origin/notebook-cuml-optuna-hpo' into n…

a776a44

…otebook-cuml-optuna-hpo

Remove plotting

965b197

The JS injected by plotly causes parsing errors in the templating extension we use to render the docs. For now the plots are removed, could be re-added if the extension is reworked.

jacobtomlinson mentioned this pull request Feb 22, 2023

Fix template rendering on pages with bad jinja2 characters #160

Merged

betatim force-pushed the notebook-cuml-optuna-hpo branch from 965b197 to a776a44 Compare February 28, 2023 09:54

Merge branch 'main' into notebook-cuml-optuna-hpo

b485938

betatim force-pushed the notebook-cuml-optuna-hpo branch from e0baa97 to b485938 Compare February 28, 2023 10:10

jacobtomlinson reviewed Mar 1, 2023

View reviewed changes

Add additional tags for libraries and dataset used

c777e42

Merge branch 'main' of https://github.com/rapidsai/deployment into no…

d1e50a2

…tebook-cuml-optuna-hpo

jacobtomlinson merged commit eed6b0d into rapidsai:main Mar 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cuml + optuna HPO example #141

Add cuml + optuna HPO example #141

betatim commented Feb 8, 2023 •

edited

Loading

review-notebook-app bot commented Feb 8, 2023

betatim commented Feb 8, 2023 •

edited

Loading

jacobtomlinson commented Feb 9, 2023

betatim commented Feb 9, 2023

betatim commented Feb 16, 2023

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

hcho3 commented Feb 17, 2023 •

edited

Loading

betatim commented Feb 17, 2023

betatim commented Feb 17, 2023

jacobtomlinson commented Feb 22, 2023

betatim commented Feb 22, 2023

betatim commented Feb 22, 2023

betatim commented Feb 22, 2023

betatim commented Feb 28, 2023

jacobtomlinson left a comment

betatim commented Mar 1, 2023

betatim commented Mar 9, 2023

jacobtomlinson commented Mar 30, 2023

Add cuml + optuna HPO example #141

Add cuml + optuna HPO example #141

Conversation

betatim commented Feb 8, 2023 • edited Loading

review-notebook-app bot commented Feb 8, 2023

betatim commented Feb 8, 2023 • edited Loading

jacobtomlinson commented Feb 9, 2023

betatim commented Feb 9, 2023

betatim commented Feb 16, 2023

review-notebook-app bot commented Feb 17, 2023 • edited Loading

review-notebook-app bot commented Feb 17, 2023 • edited Loading

review-notebook-app bot commented Feb 17, 2023 • edited Loading

review-notebook-app bot commented Feb 17, 2023 • edited Loading

review-notebook-app bot commented Feb 17, 2023 • edited Loading

hcho3 commented Feb 17, 2023 • edited Loading

betatim commented Feb 17, 2023

betatim commented Feb 17, 2023

jacobtomlinson commented Feb 22, 2023

betatim commented Feb 22, 2023

betatim commented Feb 22, 2023

betatim commented Feb 22, 2023

betatim commented Feb 28, 2023

jacobtomlinson left a comment

Choose a reason for hiding this comment

betatim commented Mar 1, 2023

betatim commented Mar 9, 2023

jacobtomlinson commented Mar 30, 2023

betatim commented Feb 8, 2023 •

edited

Loading

betatim commented Feb 8, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

review-notebook-app bot commented Feb 17, 2023 •

edited

Loading

hcho3 commented Feb 17, 2023 •

edited

Loading