Our current docs for multi-node Databricks cover the following process:

1. Create a startup script that installs RAPIDS and `dask-databricks`, then runs `dask-databricks`.
2. Create a multi-node multi-GPU (MNMG) cluster that uses the `14.2 (Scala 2.12, Spark 3.5.0)` runtime.
3. Select **Use your own Docker container** and enter the image `databricksruntime/gpu-tensorflow:cuda11.8` or `databricksruntime/gpu-pytorch:cuda11.8`.

The container images use CUDA 11.8 and there are no CUDA 12 images available from Databricks.
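For reference, step 1 could look roughly like the sketch below: a small Python helper that renders the init script as text. It assumes the RAPIDS `-cu11` pip wheels from `https://pypi.nvidia.com` (to match the CUDA 11.8 images), the Databricks Python environment's pip at `/databricks/python/bin/pip`, and the `dask databricks run --cuda` entry point from the dask-databricks project. The package list and `render_init_script` are hypothetical, so check names and pins against the current RAPIDS install guide before using them.

```python
import shlex

# Hypothetical package list: "-cu11" wheels to match the CUDA 11.8 container
# images. Pin versions to the RAPIDS release you actually want.
DEFAULT_PACKAGES = ("cudf-cu11", "dask-cudf-cu11", "dask-cuda", "dask-databricks")


def render_init_script(
    packages=DEFAULT_PACKAGES,
    pip="/databricks/python/bin/pip",  # assumed path of the cluster's Python env pip
    index_url="https://pypi.nvidia.com",
) -> str:
    """Return the text of a bash init script that installs RAPIDS and starts Dask."""
    # shlex.quote leaves plain names untouched but protects extras like "dask[complete]".
    pkgs = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/bash",
        "set -e",  # abort on the first failing command so the error shows in the init logs
        f"{pip} install --extra-index-url={index_url} {pkgs}",
        # Assumed dask-databricks CLI: starts the scheduler on the driver and
        # CUDA workers on the GPU nodes.
        "dask databricks run --cuda",
    ]
    return "\n".join(lines) + "\n"
```

Printing `render_init_script()` gives a script that can be saved and attached to the cluster as an init script. Generating it from Python is only a convenience for keeping the package list in one place; a hand-written bash file with the same lines is equivalent.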
The single-node instructions don't use a custom container at all, so in theory we should be able to do the same with the multi-node instructions.
In practice, if you omit the custom container, the init script fails. The logs show that NVML can't be found during Dask startup. This suggests that either the NVIDIA driver or the CUDA toolkit is not yet installed when the init script runs and is only installed later.
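One way to test the timing theory above would be to probe for NVML directly from the init script. The sketch below is hypothetical (`nvml_available` and `wait_for_nvml` are not part of `dask-databricks`): it uses `ctypes` to load `libnvidia-ml.so.1` and call `nvmlInit_v2`, polling until that succeeds or a timeout expires. It assumes the driver library eventually appears on the default loader path.

```python
import ctypes
import time


def nvml_available(lib_name: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the NVML library loads and nvmlInit_v2 succeeds."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        # Library not found yet, e.g. the NVIDIA driver is not installed.
        return False
    init = getattr(lib, "nvmlInit_v2", None)
    if init is None:
        return False
    init.argtypes = []
    init.restype = ctypes.c_int
    if init() != 0:  # 0 == NVML_SUCCESS
        return False
    lib.nvmlShutdown()  # release the NVML handle; we only wanted to probe
    return True


def wait_for_nvml(
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    lib_name: str = "libnvidia-ml.so.1",
) -> bool:
    """Poll until NVML initializes or timeout_s elapses; return whether it succeeded."""
    deadline = time.monotonic() + timeout_s
    while True:
        if nvml_available(lib_name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Logging the result of `wait_for_nvml()` before calling `dask databricks run` would show whether the driver turns up while the init script is still running. If it only appears after all init scripts have finished, a blocking wait would simply time out, and the probe plus the Dask startup would likely need to run in a background process instead.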
We should find a way to start up `dask-databricks` without using a custom container and update the documentation.