
Pmanoj/read model specific defaults (#2442)
* reading the model specific defaults from model card

* updating the metric defaults for the tasks

* updating the defaults from bool -> string

* fixing formatting issues
jpmann authored Jul 11, 2023
1 parent 738af5c commit c9eefae
Showing 5 changed files with 255 additions and 45 deletions.
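The change shared across these notebooks: per-model optimization defaults are now read from the model card's `model_specific_defaults` tag (a stringified dict whose values are strings such as "true" rather than booleans), with a hardcoded fallback when the tag is absent. Below is a minimal, self-contained sketch of that pattern; the tag value shown is illustrative, not copied from any particular model card.

```python
import ast

# Illustrative stand-in for `foundation_model.tags` on a model fetched from the
# "azureml" registry; the tag value is a stringified Python dict whose values
# are the strings "true"/"false" rather than booleans.
tags = {
    "model_specific_defaults": "{'apply_lora': 'true', 'apply_deepspeed': 'true', 'apply_ort': 'true'}"
}

if "model_specific_defaults" in tags:
    # ast.literal_eval safely converts the stringified dict back into a Python dict
    optimization_parameters = ast.literal_eval(tags["model_specific_defaults"])
else:
    # fallback when the model card carries no per-model defaults
    optimization_parameters = dict(
        apply_lora="true", apply_deepspeed="true", apply_ort="true"
    )

print(optimization_parameters)
# -> {'apply_lora': 'true', 'apply_deepspeed': 'true', 'apply_ort': 'true'}
```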
@@ -92,7 +92,7 @@
" workspace_name=\"<WORKSPACE_NAME>\",\n",
" )\n",
"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml-preview\"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
"registry_ml_client = MLClient(credential, registry_name=\"azureml\")\n",
"\n",
"experiment_name = \"question-answering-extractive-qna\"\n",
@@ -344,6 +344,54 @@
"Create the job that uses the `question-answering` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/training/finetune_acft_hf_nlp/components/pipeline_components/question_answering/README.md) about all the parameters supported for fine tuning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define finetune parameters\n",
"\n",
"Finetune parameters can be grouped into 2 categories - training parameters, optimization parameters\n",
"\n",
"Training parameters define the training aspects such as - \n",
"1. the optimizer, scheduler to use\n",
"2. the metric to optimize the finetune\n",
"3. number of training steps and the batch size\n",
"and so on\n",
"\n",
"Optimization parameters help in optimizing the GPU memory and effectively using the compute resources. Below are few of the parameters that belong to this category. _The optimization parameters differs for each model and are packaged with the model to handle these variations._\n",
"1. enable the deepspeed, ORT and LoRA\n",
"2. enable mixed precision training\n",
"2. enable multi-node training "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Training parameters\n",
"training_parameters = dict(\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"exact\",\n",
")\n",
"print(f\"The following training parameters are enabled - {training_parameters}\")\n",
"\n",
"# Optimization parameters - As these parameters are packaged with the model itself, lets retrieve those parameters\n",
"if \"model_specific_defaults\" in foundation_model.tags:\n",
" optimization_parameters = ast.literal_eval(\n",
" foundation_model.tags[\"model_specific_defaults\"]\n",
" ) # convert string to python dict\n",
"else:\n",
" optimization_parameters = dict(\n",
" apply_lora=\"true\", apply_deepspeed=\"true\", apply_ort=\"true\"\n",
" )\n",
"print(f\"The following optimizations are enabled - {optimization_parameters}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -400,14 +448,8 @@
" answer_text_key=\"text\",\n",
" # training settings\n",
" number_of_gpu_to_use_finetuning=gpus_per_node, # set to the number of GPUs available in the compute\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"exact\",\n",
" apply_lora=\"true\",\n",
" apply_deepspeed=\"true\",\n",
" apply_ort=\"true\",\n",
" **training_parameters,\n",
" **optimization_parameters\n",
" )\n",
" return {\n",
" # map the output of the fine tuning job to the output of the pipeline job so that we can easily register the fine tuned model\n",
@@ -91,7 +91,7 @@
" workspace_name=\"<WORKSPACE_NAME>\",\n",
" )\n",
"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml-preview\"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
"registry_ml_client = MLClient(credential, registry_name=\"azureml\")\n",
"\n",
"experiment_name = \"summarization-news-summary\"\n",
@@ -349,6 +349,54 @@
"Create the job that uses the `summarization` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/training/finetune_acft_hf_nlp/components/pipeline_components/summarization/README.md) about all the parameters supported for fine tuning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define finetune parameters\n",
"\n",
"Finetune parameters can be grouped into 2 categories - training parameters, optimization parameters\n",
"\n",
"Training parameters define the training aspects such as - \n",
"1. the optimizer, scheduler to use\n",
"2. the metric to optimize the finetune\n",
"3. number of training steps and the batch size\n",
"and so on\n",
"\n",
"Optimization parameters help in optimizing the GPU memory and effectively using the compute resources. Below are few of the parameters that belong to this category. _The optimization parameters differs for each model and are packaged with the model to handle these variations._\n",
"1. enable the deepspeed, ORT and LoRA\n",
"2. enable mixed precision training\n",
"2. enable multi-node training "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Training parameters\n",
"training_parameters = dict(\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"rouge1\",\n",
")\n",
"print(f\"The following training parameters are enabled - {training_parameters}\")\n",
"\n",
"# Optimization parameters - As these parameters are packaged with the model itself, lets retrieve those parameters\n",
"if \"model_specific_defaults\" in foundation_model.tags:\n",
" optimization_parameters = ast.literal_eval(\n",
" foundation_model.tags[\"model_specific_defaults\"]\n",
" ) # convert string to python dict\n",
"else:\n",
" optimization_parameters = dict(\n",
" apply_lora=\"true\", apply_deepspeed=\"true\", apply_ort=\"true\"\n",
" )\n",
"print(f\"The following optimizations are enabled - {optimization_parameters}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -394,14 +442,8 @@
" summary_key=\"highlights\",\n",
" # training settings\n",
" number_of_gpu_to_use_finetuning=gpus_per_node, # set to the number of GPUs available in the compute\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"rouge1\",\n",
" apply_deepspeed=\"true\",\n",
" apply_ort=\"true\",\n",
" apply_lora=\"true\",\n",
" **training_parameters,\n",
" **optimization_parameters\n",
" )\n",
" return {\n",
" # map the output of the fine tuning job to the output of the pipeline job so that we can easily register the fine tuned model\n",
@@ -92,7 +92,7 @@
" workspace_name=\"<WORKSPACE_NAME>\",\n",
" )\n",
"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml-preview\"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
"registry_ml_client = MLClient(credential, registry_name=\"azureml\")\n",
"\n",
"experiment_name = \"text-classification-emotion-detection\"\n",
@@ -386,6 +386,54 @@
"Create the job that uses the `text-classification` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/training/finetune_acft_hf_nlp/components/pipeline_components/text_classification/README.md) about all the parameters supported for fine tuning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define finetune parameters\n",
"\n",
"Finetune parameters can be grouped into 2 categories - training parameters, optimization parameters\n",
"\n",
"Training parameters define the training aspects such as - \n",
"1. the optimizer, scheduler to use\n",
"2. the metric to optimize the finetune\n",
"3. number of training steps and the batch size\n",
"and so on\n",
"\n",
"Optimization parameters help in optimizing the GPU memory and effectively using the compute resources. Below are few of the parameters that belong to this category. _The optimization parameters differs for each model and are packaged with the model to handle these variations._\n",
"1. enable the deepspeed, ORT and LoRA\n",
"2. enable mixed precision training\n",
"2. enable multi-node training "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Training parameters\n",
"training_parameters = dict(\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"f1_macro\",\n",
")\n",
"print(f\"The following training parameters are enabled - {training_parameters}\")\n",
"\n",
"# Optimization parameters - As these parameters are packaged with the model itself, lets retrieve those parameters\n",
"if \"model_specific_defaults\" in foundation_model.tags:\n",
" optimization_parameters = ast.literal_eval(\n",
" foundation_model.tags[\"model_specific_defaults\"]\n",
" ) # convert string to python dict\n",
"else:\n",
" optimization_parameters = dict(\n",
" apply_lora=\"true\", apply_deepspeed=\"true\", apply_ort=\"true\"\n",
" )\n",
"print(f\"The following optimizations are enabled - {optimization_parameters}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -431,14 +479,8 @@
" label_key=\"label_string\",\n",
" # Training settings\n",
" number_of_gpu_to_use_finetuning=gpus_per_node, # set to the number of GPUs available in the compute\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"f1_macro\",\n",
" apply_deepspeed=\"true\",\n",
" apply_lora=\"true\",\n",
" apply_ort=\"true\",\n",
" **training_parameters,\n",
" **optimization_parameters\n",
" )\n",
" return {\n",
" # map the output of the fine tuning job to the output of pipeline job so that we can easily register the fine tuned model\n",
@@ -92,7 +92,7 @@
" workspace_name=\"<WORKSPACE_NAME>\",\n",
" )\n",
"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml-preview\"\n",
"# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n",
"registry_ml_client = MLClient(credential, registry_name=\"azureml\")\n",
"\n",
"experiment_name = \"token-classification-ner\"\n",
@@ -352,6 +352,54 @@
"Create the job that uses the `token-classification` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/training/finetune_acft_hf_nlp/components/pipeline_components/token_classification/README.md) about all the parameters supported for fine tuning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define finetune parameters\n",
"\n",
"Finetune parameters can be grouped into 2 categories - training parameters, optimization parameters\n",
"\n",
"Training parameters define the training aspects such as - \n",
"1. the optimizer, scheduler to use\n",
"2. the metric to optimize the finetune\n",
"3. number of training steps and the batch size\n",
"and so on\n",
"\n",
"Optimization parameters help in optimizing the GPU memory and effectively using the compute resources. Below are few of the parameters that belong to this category. _The optimization parameters differs for each model and are packaged with the model to handle these variations._\n",
"1. enable the deepspeed, ORT and LoRA\n",
"2. enable mixed precision training\n",
"2. enable multi-node training "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Training parameters\n",
"training_parameters = dict(\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"f1\",\n",
")\n",
"print(f\"The following training parameters are enabled - {training_parameters}\")\n",
"\n",
"# Optimization parameters - As these parameters are packaged with the model itself, lets retrieve those parameters\n",
"if \"model_specific_defaults\" in foundation_model.tags:\n",
" optimization_parameters = ast.literal_eval(\n",
" foundation_model.tags[\"model_specific_defaults\"]\n",
" ) # convert string to python dict\n",
"else:\n",
" optimization_parameters = dict(\n",
" apply_lora=\"true\", apply_deepspeed=\"true\", apply_ort=\"true\"\n",
" )\n",
"print(f\"The following optimizations are enabled - {optimization_parameters}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -397,14 +445,8 @@
" tag_key=\"ner_tags_str\",\n",
" # Training settings\n",
" number_of_gpu_to_use_finetuning=gpus_per_node, # set to the number of GPUs available in the compute\n",
" num_train_epochs=3,\n",
" per_device_train_batch_size=1,\n",
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"f1\",\n",
" apply_lora=\"true\",\n",
" apply_ort=\"true\",\n",
" apply_deepspeed=\"true\",\n",
" **training_parameters,\n",
" **optimization_parameters\n",
" )\n",
" return {\n",
" # map the output of the fine tuning job to the output of pipeline job so that we can easily register the fine tuned model\n",
