use nc6_v2 instead of nc6 #2469

Merged · 1 commit · Jul 20, 2023
@@ -27,7 +27,7 @@
"id": "06008690",
"metadata": {},
"source": [
"We also need to give the name of the compute cluster we want to use in AzureML. Later in this notebook, we will create it if it does not already exist:"
"We can optionally provide the name of the compute cluster we want to use in AzureML. Later in this notebook, we will create it as an example if it does not already exist. AzureML can also run on serverless compute if a compute target is not explicitly set."
]
},
{
@@ -37,7 +37,7 @@
"metadata": {},
"outputs": [],
"source": [
"train_compute_name = \"gpu-cluster-nc6\"\n",
"train_compute_name = \"gpu-cluster-nc6-v3\"\n",
"\n",
"rai_compute_name = \"cpucluster\""
]
@@ -140,7 +140,21 @@
"source": [
"#### Compute target setup\n",
"\n",
"You will need to provide a [Compute Target](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#computes) that will be used for your AutoML model training. AutoML models for image tasks require [GPU SKUs](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) such as the ones from the NC, NCv2, NCv3, ND, NDv2 and NCasT4 series. We recommend using the NCsv3-series (with v100 GPUs) for faster training. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes will allow for faster model training by leveraging parallelism, when tuning hyperparameters for your model.\n"
"There are two ways to submit a job: on a compute cluster or as a serverless job.\n",
"\n",
"##### Serverless Job:\n",
"\n",
"In a serverless job, there is no need to create a compute target explicitly.\n",
"Simply pass the desired instance type value to the `instance_type` parameter while creating a pipeline job.\n",
"This allows for quick and convenient job submission without the need to manage a compute cluster.\n",
"\n",
"##### Compute Job:\n",
"\n",
"The code below demonstrates how to create a GPU compute cluster.\n",
"After creating the compute cluster, pass the name of the compute cluster to the `compute_name` parameter while submitting the pipeline job. This ensures that the job runs on the specified compute cluster, allowing for more control and customization.\n",
"\n",
"For the compute job option, you will need to provide a [Compute Target](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#computes) that will be used for your AutoML model training. AutoML models for image tasks require [GPU SKUs](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) such as the ones from the NCv3, ND, NDv2 and NCasT4 series. We recommend using the NCsv3-series (with V100 GPUs) for faster training. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes will allow for faster model training by leveraging parallelism when tuning hyperparameters for your model.\n",
"\n"
]
},
{
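As a rough illustration of the two options above, here is a minimal sketch assuming the azure-ai-ml v2 SDK and an authenticated MLClient named `ml_client`; `image_classification_job` stands in for the AutoML job configured later in this notebook, and whether an AutoML image job accepts `resources` in exactly this form should be checked against the SDK version in use:

from azure.ai.ml.entities import ResourceConfiguration

# Option 1 - serverless job: leave `compute` unset and request an instance type
# directly on the job; AzureML provisions compute for the run on demand.
image_classification_job.resources = ResourceConfiguration(
    instance_type="Standard_NC6s_v3",
)

# Option 2 - compute job: create a GPU cluster once (as shown in the next cell
# of the notebook) and reference it by name when configuring the job.
image_classification_job.compute = "gpu-cluster-nc6-v3"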
Expand All @@ -160,7 +174,7 @@
" train_compute_config = AmlCompute(\n",
" name=train_compute_name,\n",
" type=\"amlcompute\",\n",
" size=\"Standard_NC6\",\n",
" size=\"Standard_NC6s_v3\",\n",
" min_instances=0,\n",
" max_instances=4,\n",
" idle_time_before_scale_down=120,\n",
@@ -532,6 +546,14 @@
"exp_name = \"dpv2-image-classification-experiment\""
]
},
{
"cell_type": "markdown",
"id": "005a1098",
"metadata": {},
"source": [
"This pipeline uses serverless compute. To use the compute you created above, uncomment the compute parameter line."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -541,9 +563,16 @@
"source": [
"# Create the AutoML job with the related factory-function.\n",
"\n",
"import random\n",
"import string\n",
"\n",
"allowed_chars = string.ascii_lowercase + string.digits\n",
"suffix = \"\".join(random.choice(allowed_chars) for x in range(5))\n",
"job_name = \"dpv2-image-classification-job-02\" + suffix\n",
"\n",
"image_classification_job = automl.image_classification(\n",
" compute=train_compute_name,\n",
" # name=\"dpv2-image-classification-job-02\",\n",
" # compute=train_compute_name,\n",
" name=job_name,\n",
" experiment_name=exp_name,\n",
" training_data=my_training_data_input,\n",
" validation_data=my_validation_data_input,\n",
@@ -798,29 +827,14 @@
"### 4.4 Register model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69c6aee4",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"import string\n",
"\n",
"# Creating a unique model name by including a random suffix\n",
"allowed_chars = string.ascii_lowercase + string.digits\n",
"model_suffix = \"\".join(random.choice(allowed_chars) for x in range(5))"
]
},
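The removed cell above generated a separate `model_suffix`; after this change the single `suffix` created alongside `job_name` earlier in the notebook is reused for the model name, so re-running the notebook still produces unique job and model names. A minimal, standalone illustration of that naming pattern (not tied to this notebook's variables):

import random
import string

def unique_name(prefix: str, length: int = 5) -> str:
    # Append a short random lowercase/digit suffix so repeated runs do not
    # collide on job or model names.
    allowed_chars = string.ascii_lowercase + string.digits
    return prefix + "".join(random.choice(allowed_chars) for _ in range(length))

print(unique_name("ic-mc-rai-fridge-items-model"))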
{
"cell_type": "code",
"execution_count": null,
"id": "bf1c5a6b",
"metadata": {},
"outputs": [],
"source": [
"model_name = \"ic-mc-rai-fridge-items-model\" + model_suffix\n",
"model_name = \"ic-mc-rai-fridge-items-model\" + suffix\n",
"model = Model(\n",
" path=f\"azureml://jobs/{best_run.info.run_id}/outputs/artifacts/outputs/mlflow-model/\",\n",
" name=model_name,\n",
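The cell above (truncated in this diff) builds an MLflow `Model` pointing at the best run's artifacts; registration then typically completes with a call like the following, assuming an authenticated MLClient named `ml_client` and the `model` object defined in that cell:

# Register the MLflow model from the best AutoML child run so it can be
# referenced by name and version later in the notebook.
registered_model = ml_client.models.create_or_update(model)
print(f"Registered {registered_model.name}, version {registered_model.version}")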
@@ -649,7 +649,7 @@
" classes=classes,\n",
" model_type=\"pytorch\",\n",
" enable_error_analysis=False,\n",
" num_masks=100,\n",
" num_masks=200,\n",
" mask_res=4,\n",
" )\n",
" rai_image_job.set_limits(timeout=120)\n",