These are a few examples to get started on Azure. We'll look at how to set up the environment locally and on Azure to run the notebooks provided.
Sections in README
- Create an Azure Machine Learning Service Workspace
- RAPIDS MNMG example using dask-cloudprovider
- RAPIDS Hyperparameter Optimization on AzureML
- Model Interpretability using GPU SHAP on Azure
- RAPIDS MNMG with Azure Kubernetes Service (AKS) using Dask Kubernetes
An Azure Machine Learning service workspace will manage experiments and coordinate storage, databases and computing resources for machine learning applications.
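The steps below create the workspace through the Azure portal. If you prefer to script it, a rough equivalent with the Azure ML Python SDK (azureml.core) looks like the following sketch; the workspace name and all bracketed values are placeholders:

from azureml.core import Workspace

# Placeholders: substitute your subscription, resource group and a region with GPU quota
ws = Workspace.create(name="rapids-aml-ws",
                      subscription_id="<subscription id>",
                      resource_group="<resource group name>",
                      location="<region>",
                      create_resource_group=True)
ws.write_config()  # writes config.json for use in the notebooks later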
- First create an Azure subscription or access existing subscription information from the Azure portal.
- Next you will need to access a Resource group or create a new one in the Azure portal:
- Sign in to the Azure portal and navigate to Resource groups page by clicking on Resource groups in the portal:
- Select one of the available Resource groups or create a new one by clicking on the Add button:
- You can also select + Create a resource in the upper-left corner of the Azure portal and search for Resource group.
Select a Subscription with GPU resources, enter a name for the Resource group, and select a Region with GPU resources. Check these pages for the List of supported regions and information on GPU optimized VM sizes. Pick a region that is closest to your location or contains your data.
- Next we will create a Machine Learning service workspace: navigate to your Resource groups page and click on the Add button; this will take you to the Azure Marketplace. Use the search bar to find Machine Learning or select the AI + Machine Learning category on the left:
- Click on Machine Learning and this will direct you to the page below:
- Enter a unique Workspace Name that identifies your workspace, select your Azure Subscription, use an existing Resource group in your subscription and select a Location with adequate GPU quota.
After entering the information, select Review + Create. The deployment success message will appear and you can view the new workspace by clicking on Go to resource.
- After creating the workspace, download the config.json file that includes information about workspace configuration.
This file will be used with the Azure Machine Learning SDK for Python in the notebook example to load the workspace, and it contains key-value pairs for the following (a short loading sketch follows the list below):
- Workspace name
- Azure region
- Subscription id
- Resource group
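For reference, the Initialize workspace step in the notebooks loads this file roughly as follows (a minimal sketch using azureml.core; it assumes config.json sits next to the notebook):

from azureml.core import Workspace

# Reads workspace_name, subscription_id and resource_group from config.json
ws = Workspace.from_config(path="config.json")
print(ws.name, ws.location, ws.resource_group)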
The Azure MNMG notebooks will use Dask Cloud Provider to run multi-node multi-GPU examples on Azure. For each example, we will make use of AzureVMCluster to set up a cluster and run an example. We have two example notebooks:
- Random Forest using Dask CloudProvider
- XGBoost using Dask CloudProvider. This notebook additionally demonstrates how to speed up deployment using custom VM images via packer.
We recommend using the RAPIDS docker image on your local system and using the same image in the notebook so that library versions match exactly. You can also achieve this with conda environments for RAPIDS.
For example, in the Random Forest notebook we are using the rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04-py3.8 docker image. To pull and run it, use the following command. The -v flag sets the volume you'd like to mount on the docker container; this way, the changes you make within the docker container are also present on your local system. Make sure to change local/path to the path which contains this repository.
docker run --runtime nvidia --rm -it -p 8888:8888 -p 8787:8787 -v /local/path:/docker/path rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04-py3.8
For the XGBoost notebook, we are using the image rapidsai/rapidsai:cuda11.2-runtime-ubuntu18.04-py3.8.
We need to set up a Virtual Network and a Security Group to run this example. You can use either the command line or the Azure Portal to set these up.
Below, we'll look at how you can use the command line to set them up. These commands need to be executed within the docker container.
Note: Be sure to set up all the resources in the same region.
- To set up Azure authentication, run
az login
- You can make use of the resource group you've set up earlier.
- To create a virtual network, run
az network vnet create -g <resource group name> --location <location> -n <vnet name> --address-prefix 10.0.0.0/16 --subnet-name <subnet name> --subnet-prefix 10.0.0.0/24
- We can now set up the Security group and add a rule for the Dask Cloud Provider run:
az network nsg create -g <resource group name> --name <security group name> --location <region>
az network nsg rule create -g <resource group name> --nsg-name <security group name> -n MyNsgRuleWithAsg \
--priority 500 --source-address-prefixes Internet --destination-port-ranges 8786 8787 \
--destination-address-prefixes '*' --access Allow --protocol Tcp --description "Allow Internet to Dask on ports 8786,8787."
For more details, visit Microsoft Azure - dask cloud provider
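Once the resource group, virtual network and security group exist, the notebooks pass their names to AzureVMCluster. A minimal sketch of that step is below; the resource names, region, VM size and worker count are placeholders, and the notebooks contain the exact arguments used:

from dask.distributed import Client
from dask_cloudprovider.azure import AzureVMCluster

# All bracketed names are placeholders; use the resources created above and a
# GPU VM size available in your region.
cluster = AzureVMCluster(
    resource_group="<resource group name>",
    vnet="<vnet name>",
    security_group="<security group name>",
    location="<region>",
    vm_size="Standard_NC12s_v3",
    n_workers=2,
    docker_image="rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04-py3.8",
    worker_class="dask_cuda.CUDAWorker",  # one Dask worker per GPU
)
client = Client(cluster)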
- Once you have set up the resources, start a Jupyter notebook in the docker container using the following command
jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root --NotebookApp.token=''
- Navigate to Azure-MNMG-RF.ipynb or Azure-MNMG-XGBoost.ipynb under azure/notebooks.
- Update the notebook with the names of the resources appropriately and run it.
This example will walk you through how to launch RAPIDS-accelerated hyperparameter optimization jobs on Microsoft Azure ML. Azure ML will train and evaluate models with many different variations of key parameters in order to find the combination that yields the highest accuracy. You'll start by launching a Jupyter notebook locally, which will launch all of the jobs and walk you through the process in more detail.
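Under the hood, the notebook defines the search with Azure ML's HyperDrive APIs. As a rough illustration of the shape of such a job (the training script name, compute target, parameter names and ranges here are placeholders, not the ones the notebook uses):

from azureml.core import Experiment, ScriptRunConfig, Workspace
from azureml.train.hyperdrive import (HyperDriveConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, choice, uniform)

ws = Workspace.from_config()

# Placeholder training script and compute target; the notebook defines its own.
src = ScriptRunConfig(source_directory=".", script="train_rapids.py",
                      compute_target="gpu-cluster")

# Randomly sample hyperparameter combinations (names and ranges are illustrative).
sampling = RandomParameterSampling({
    "--n_estimators": choice(100, 200, 500),
    "--max_depth": choice(8, 12, 16),
    "--max_features": uniform(0.2, 1.0),
})

hdc = HyperDriveConfig(run_config=src,
                       hyperparameter_sampling=sampling,
                       primary_metric_name="accuracy",
                       primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                       max_total_runs=10,
                       max_concurrent_runs=2)

run = Experiment(ws, "rapids-hpo").submit(hdc)
run.wait_for_completion(show_output=True)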
Install the Azure Machine Learning Python SDK if you are running in your own environment (the SDK is already installed in Azure Notebooks and other Microsoft-managed environments); this link includes additional instructions to set up the environment on your local computer.
After setting up a conda environment, clone the cloud-ml-examples repository by running the following command in a local_directory:
git clone https://github.com/rapidsai/cloud-ml-examples.git
Navigate to the azure/notebooks subdirectory. This includes the hyperparameter optimization notebooks: HPO-RAPIDS.ipynb and HPO-SKLearn.ipynb. Copy the config.json file (that you downloaded after creating a ML workspace) into the directory that contains these notebooks (azure/notebooks). You will load the information from this file in the Initialize workspace step of the notebook.
Activate the conda environment, where the Azure ML SDK was installed and launch the Jupyter Notebook server with the following command:
jupyter notebook
Open your web browser, navigate to http://localhost:8888/ and access HPO-RAPIDS.ipynb
from your local machine. Follow the steps in the notebook for hyperparameter tuning with RAPIDS on GPUs.
- Follow steps in 1 to set up a Resource Group and Machine Learning workspace.
- Follow steps in 2(a) to set up the local environment using docker.
- In the docker container, clone the cloud-ml-examples repository:
git clone https://github.com/rapidsai/cloud-ml-examples.git
- Navigate to the azure/notebooks/remote-explanation directory and open up azure-gpu-shap.ipynb.
- The commands to install the necessary packages are present in the notebook; uncomment and run the appropriate cell. A rough illustration of the GPU SHAP call the notebook builds on is sketched below.
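The notebook runs the explanation against a remote Azure ML GPU compute target; the underlying GPU SHAP computation comes from cuML's explainer module. A minimal local sketch, with a synthetic model and data that are purely illustrative:

import cupy as cp
from cuml.svm import SVR
from cuml.explainer import KernelExplainer

# Synthetic data purely for illustration
X = cp.random.rand(500, 10, dtype=cp.float32)
y = cp.random.rand(500, dtype=cp.float32)

model = SVR().fit(X, y)

# KernelExplainer runs the SHAP sampling on the GPU
explainer = KernelExplainer(model=model.predict, data=X[:100])
shap_values = explainer.shap_values(X[:10])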
For detailed instructions on setup and example notebooks to run RAPIDS with Azure Kubernetes Service using Dask Kubernetes, navigate to the kubernetes subdirectory. A minimal sketch of connecting to such a cluster is shown after the list below.
- Detailed instructions to set up RAPIDS with AKS using Dask Kubernetes are in the markdown file Detailed_setup_guide.md. Go through this before you try to run any of the other notebooks.
- Shorter example notebook using Dask + RAPIDS + XGBoost in MNMG_XGBoost.ipynb
- Full example with performance sweeps over multiple algorithms and larger dataset in Dask_cuML_Exploration_Full.ipynb
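For orientation, connecting a Dask client to a RAPIDS-enabled AKS cluster with Dask Kubernetes typically looks like the sketch below; the pod spec file name is a placeholder, and Detailed_setup_guide.md describes the actual setup the notebooks expect.

from dask.distributed import Client
from dask_kubernetes import KubeCluster

# "worker-spec.yml" is a placeholder pod spec that uses a RAPIDS image and
# requests GPUs; see Detailed_setup_guide.md for creating it on AKS.
cluster = KubeCluster.from_yaml("worker-spec.yml")
cluster.scale(2)          # two GPU worker pods
client = Client(cluster)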