Add preview label to HTS and MM notebooks and update data sources #2490

Merged · 4 commits · Jul 26, 2023

Changes from all commits
@@ -14,7 +14,14 @@
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"**Demand Forecasting Using HTS**\n",
"\n",
"\n",
"## Demand Forecasting Using HTS (preview)\n",
"\n",
"> [!IMPORTANT]\n",
"> Items marked (preview) in this article are currently in public preview.\n",
"> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.\n",
"> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
@@ -33,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 1. Introduction <a id=\"Introduction\"></pre>\n",
"## 1. Introduction <a id=\"Introduction\">\n",
"\n",
"The objective of this notebook is to illustrate how to use the component-based AutoML hierarchical time series solution for demand forecasting tasks. It walks you through all stages of model evaluation and production process starting with data ingestion and concluding with batch endpoint deployment for production. Please see the following [link](placeholder) for a detailed description of the hierarchical time series modeling.\n",
"\n",
@@ -48,7 +55,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 2. Setup <a id=\"Setup\"></pre>"
"## 2. Setup <a id=\"Setup\">"
]
},
{
@@ -159,7 +166,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 3. Compute <a id=\"Compute\"></pre>\n",
"## 3. Compute <a id=\"Compute\">\n",
"\n",
"#### Create or Attach existing AmlCompute\n",
"\n",
@@ -205,9 +212,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 4. Data <a id=\"Data\"></pre>\n",
"## 4. Data <a id=\"Data\">\n",
"\n",
"For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15 minute intervals. In the data set for this demonstrations, we have aggregated to an hourly frequency and convereted to the kilowatt hours (kWh) for 10 customers. Each customer is assigned to one of the two groups as denoted by the entries in the `group_id` column. \n",
"\n",
"For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15 minute intervals. In the data set for this demonstrations, we have aggregated to an hourly frequency and convereted to the kilowatt hours (kWh) for 10 customers. Each customer is assigned to one of the two groups as denoted by the entries in the `group_id` column. The following cells read and print the first few rows of the training data as well as print the number of unique time series in the dataset."
"The data for this notebook is located in the `automl-sample-notebook-data` container in the datastore and is publicly available. In the next few cells, we will download the train, test and inference datasets from the public datastore and store them locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/train/uci_electro_small_hts_train.parquet\"\n",
"test_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/test/uci_electro_small_hts_test.parquet\"\n",
"inference_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/inference/uci_electro_small_hts_inference.parquet\""
]
},
{
@@ -221,16 +241,41 @@
"hierarchy_column_names = [\"group_id\", \"customer_id\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def create_folder_and_save_as_parquet(file_uri, output_folder):\n",
" os.makedirs(output_folder, exist_ok=True)\n",
" data_frame = pd.read_parquet(file_uri)\n",
" file_name = os.path.split(file_uri)[-1]\n",
" data_path = os.path.join(output_folder, file_name)\n",
" data_frame.to_parquet(data_path, index=False)\n",
" return None\n",
"\n",
"\n",
"create_folder_and_save_as_parquet(train_data_path, \"./data/train\")\n",
"create_folder_and_save_as_parquet(test_data_path, \"./data/test\")\n",
"create_folder_and_save_as_parquet(inference_data_path, \"./data/inference\")"
]
},
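As a quick sanity check after the downloads complete, a sketch like the following (plain `os` + `pandas`, assuming only the `./data` layout created above) confirms each split landed locally:

```python
import os

import pandas as pd

# Verify that each split was downloaded and report its row count.
for split in ("train", "test", "inference"):
    folder = f"./data/{split}"
    for file_name in os.listdir(folder):
        n_rows = len(pd.read_parquet(os.path.join(folder, file_name)))
        print(f"{split}: {file_name} ({n_rows:,} rows)")
```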
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cells read and print the first few rows of the training data as well as the number of unique time series in the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_type = \"train\"\n",
"df = pd.read_parquet(\n",
" f\"./data/{dataset_type}/uci_electro_small_mm_{dataset_type}.parquet\"\n",
")\n",
"df = pd.read_parquet(f\"./data/{dataset_type}\")\n",
"df.head(3)"
]
},
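The cell that counts the series is elided from this diff; a minimal equivalent, using the `hierarchy_column_names` defined above, would be:

```python
# Each unique (group_id, customer_id) pair is one leaf-level time series.
n_series = df[hierarchy_column_names].drop_duplicates().shape[0]
print(f"Number of unique time series: {n_series}")
```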
@@ -257,7 +302,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 5. Import Components From Registry <a id=\"ImportComponents\"></pre>\n",
"## 5. Import Components From Registry <a id=\"ImportComponents\">\n",
"\n",
"An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:\n",
"- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.\n",
@@ -413,7 +458,7 @@
"tags": []
},
"source": [
"## <pre> 6. Create a Pipeline <a id=\"CreatePipeline\"></pre>\n",
"## 6. Create a Pipeline <a id=\"CreatePipeline\">\n",
"\n",
"Now that we imported the components we will build an evaluation pipeline. This pipeline will allow us to partition the data, train best models for each partition, genererate rolling forecasts on the test set, and, finally, calculate metrics on the test set output."
]
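A sketch of the shape this pipeline takes, assuming the component handles from the import sketch above (the input and output names are illustrative, not the components' actual interface):

```python
from azure.ai.ml import Input
from azure.ai.ml.dsl import pipeline


@pipeline(description="HTS evaluation: train, rolling forecast, compute metrics")
def hts_evaluation_pipeline(train_data: Input, test_data: Input):
    # Partition the data along the hierarchy and train the best AutoML model per node.
    train_step = train_component(raw_data=train_data)
    # Generate rolling forecasts over the test set with the trained models.
    inference_step = inference_component(
        raw_data=test_data, trained_models=train_step.outputs.output
    )
    # Score the forecasts against the actuals.
    metrics_step = compute_metrics_component(
        ground_truth=test_data, predictions=inference_step.outputs.output
    )
    return {"evaluation_metrics": metrics_step.outputs.output}
```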
@@ -635,7 +680,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\"></pre>\n",
"## 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\">\n",
"\n",
"Now that the pipeline is defined, we will use it to kick off several runs. First, we will kick off an experiment which will train, inference and evaluate the performance for the best AutoML model for each `hierarchy_training_level`. Next, we will kick off the same pipeline which will only use the naive model for the same training level of the hierarchy. This will allow us to establish a baseline and compare performance results."
]
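Submitting and monitoring such a run typically looks like this sketch (`hts_evaluation_pipeline` is the factory outlined above; the experiment name is arbitrary):

```python
# Build a concrete pipeline job from the factory and submit it.
pipeline_job = hts_evaluation_pipeline(
    train_data=Input(type="uri_folder", path="./data/train"),
    test_data=Input(type="uri_folder", path="./data/test"),
)
submitted_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="hts-demand-forecasting"
)

# Block until the run finishes, streaming its logs into the notebook.
ml_client.jobs.stream(submitted_job.name)
```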
@@ -810,7 +855,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 8. Download Pipeline Output <a id=\"DownloadOutput\"></pre>\n",
"## 8. Download Pipeline Output <a id=\"DownloadOutput\">\n",
"Next, we will download the output files generated by the compute metrics components for each executed pipeline and save them in the corresponfing subfolder of the `output` folder. First, we create corresponding output directories. Then, we execute the `ml_client.jobs.download` command which downloads experiments' outputs."
]
},
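A minimal sketch of that download step (the output name here is a placeholder; the real output name of the compute metrics component is set in the elided cells):

```python
import os

output_dir = "./output/automl_hts"
os.makedirs(output_dir, exist_ok=True)

# Download one named output of the finished pipeline job into the local folder.
ml_client.jobs.download(
    name=submitted_job.name,
    download_path=output_dir,
    output_name="evaluation_metrics",  # placeholder output name
)
```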
@@ -878,7 +923,7 @@
}
},
"source": [
"## <pre> 9. Compare Evaluation Results <a id=\"CompareResults\"></pre>\n",
"## 9. Compare Evaluation Results <a id=\"CompareResults\">\n",
"\n",
"### 9.1. Examine Metrics\n",
"\n",
@@ -1090,7 +1135,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 10. Deployment <a id=\"Deployment\"></pre>\n",
"## 10. Deployment <a id=\"Deployment\">\n",
"\n",
"In this section, we will illustrate how to deploy and inference models using batch endpoint. Batch endpoints are endpoints that are used to do batch inferencing on large volumes of data in asynchronous way. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters and store outputs to a datastore for further analysis. For more information on batch endpoints see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-batch?view=azureml-api-2).\n",
"\n",
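The deployment cells themselves are elided from this diff; with the v2 SDK, creating and invoking a batch endpoint generally follows this pattern (the endpoint name and input name are illustrative, and attaching the deployment is omitted):

```python
from azure.ai.ml import Input
from azure.ai.ml.entities import BatchEndpoint

# Create (or update) the endpoint that will host the forecasting deployment.
endpoint = BatchEndpoint(
    name="hts-demand-forecast",  # illustrative; endpoint names must be unique per region
    description="Batch scoring endpoint for the HTS demand forecasting pipeline",
)
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

# Once a deployment is attached to the endpoint, invoke it asynchronously on new data.
batch_job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    inputs={"forecast_data": Input(type="uri_folder", path="./data/inference")},
)
```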
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -14,7 +14,13 @@
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"**Demand Forecasting Using Many Models**\n",
"\n",
"## Demand Forecasting Using Many Models (preview)\n",
"\n",
"> [!IMPORTANT]\n",
"> Items marked (preview) in this article are currently in public preview.\n",
"> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.\n",
"> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
@@ -33,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre>1. Introduction <a id=\"Introduction\"></pre>\n",
"## 1. Introduction <a id=\"Introduction\">\n",
"\n",
"The objective of this notebook is to illustrate how to use the component-based AutoML many models solution for demand forecasting tasks. It walks you through all stages of model evaluation and production process starting with data ingestion and concluding with batch endpoint deployment for production.\n",
"\n",
@@ -48,7 +54,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 2. Setup <a id=\"Setup\"></pre>"
"## 2. Setup <a id=\"Setup\">"
]
},
{
@@ -159,7 +165,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 3. Compute <a id=\"Compute\">\n",
"## 3. Compute <a id=\"Compute\">\n",
"\n",
"#### Create or Attach existing AmlCompute\n",
"\n",
@@ -205,7 +211,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 4. Data <a id=\"Data\"></pre>\n",
"## 4. Data <a id=\"Data\">\n",
"\n",
"For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15 minute intervals. In the data set for this demonstrations, we have aggregated to an hourly frequency and converted to the kilowatt hours (kWh) for 10 customers.\n",
"\n",
@@ -218,9 +224,9 @@
"metadata": {},
"outputs": [],
"source": [
"train_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci_electro_small_public_mm_train/uci_electro_small_mm_train.csv\"\n",
"test_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci_electro_small_public_mm_test/uci_electro_small_mm_test.csv\"\n",
"inference_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci_electro_small_public_mm_infer/uci_electro_small_mm_inference.csv\""
"train_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/train/uci_electro_small_mm_train.parquet\"\n",
"test_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/test/uci_electro_small_mm_test.parquet\"\n",
"inference_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/inference/uci_electro_small_mm_inference.parquet\""
]
},
{
@@ -242,11 +248,11 @@
"source": [
"def create_folder_and_save_as_parquet(file_uri, output_folder):\n",
" os.makedirs(output_folder, exist_ok=True)\n",
" data_frame = pd.read_csv(file_uri, parse_dates=[time_column_name])\n",
" file_name = os.path.splitext(os.path.split(file_uri)[-1])[0] + \".parquet\"\n",
" data_frame = pd.read_parquet(file_uri)\n",
" file_name = os.path.split(file_uri)[-1]\n",
" data_path = os.path.join(output_folder, file_name)\n",
" data_frame.to_parquet(data_path, index=False)\n",
" return data_frame\n",
" return None\n",
"\n",
"\n",
"create_folder_and_save_as_parquet(train_data_path, \"./data/train\")\n",
@@ -258,7 +264,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cells read and print the first few rows of the training data as well as print the number of unique time series in the data."
"The following cells read and print the first few rows of the training data as well as the number of unique time series in the dataset."
]
},
{
@@ -285,7 +291,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 5. Import Components From Registry <a id=\"ImportComponents\"></pre>\n",
"## 5. Import Components From Registry <a id=\"ImportComponents\">\n",
"\n",
"An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:\n",
"- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.\n",
@@ -463,7 +469,7 @@
"tags": []
},
"source": [
"## <pre> 6. Create a Pipeline <a id=\"CreatePipeline\"></pre>\n",
"## 6. Create a Pipeline <a id=\"CreatePipeline\">\n",
"\n",
"Now that we imported the components we will build an evaluation pipeline. This pipeline will allow us to partition the data, train best models for each partition, genererate rolling forecasts on the test set, and, finally, calculate metrics on the test set output."
]
@@ -778,7 +784,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\"></pre>\n",
"## 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\">\n",
"\n",
"Now that the pipeline is defined, we will use it to kick off several runs. First, we will kick off an experiment which will train, inference and evaluate the performance for the best AutoML model for each partition. Next, we will kick off the same pipeline which will only use the naive model for the same partitions. This will allow us to establish a baseline and compare performance results."
]
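A hedged sketch of that two-run pattern, assuming a pipeline factory like the one sketched for the HTS notebook and a hypothetical `automl_config` input that selects the model search space:

```python
from azure.ai.ml import Input

data_inputs = dict(
    train_data=Input(type="uri_folder", path="./data/train"),
    test_data=Input(type="uri_folder", path="./data/test"),
)

# Run 1: full AutoML model search for each partition.
automl_job = ml_client.jobs.create_or_update(
    mm_evaluation_pipeline(
        automl_config=Input(type="uri_file", path="./automl_settings.yml"),  # hypothetical input
        **data_inputs,
    ),
    experiment_name="mm-demand-automl",
)

# Run 2: the identical pipeline restricted to the naive forecaster as a baseline.
naive_job = ml_client.jobs.create_or_update(
    mm_evaluation_pipeline(
        automl_config=Input(type="uri_file", path="./naive_settings.yml"),  # hypothetical input
        **data_inputs,
    ),
    experiment_name="mm-demand-naive",
)
```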
@@ -957,7 +963,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 8. Download Pipeline Output <a id=\"DownloadOutput\"></pre>\n",
"## 8. Download Pipeline Output <a id=\"DownloadOutput\">\n",
"Next, we will download the output files generated by the compute metrics components for each executed pipeline and save them in the corresponfing subfolder of the `output` folder. First, we create corresponding output directories. Then, we execute the `ml_client.jobs.download` command which downloads experiments' outputs."
]
},
@@ -1023,7 +1029,7 @@
}
},
"source": [
"## <pre> 9. Compare Evaluation Results <a id=\"CompareResults\"></pre>\n",
"## 9. Compare Evaluation Results <a id=\"CompareResults\">\n",
"\n",
"### 9.1. Examine Metrics\n",
"\n",
@@ -1231,7 +1237,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 10. Deployment <a id=\"Deployment\"></pre>\n",
"## 10. Deployment <a id=\"Deployment\">\n",
"\n",
"In this section, we will illustrate how to deploy and inference models using batch endpoint. Batch endpoints are endpoints that are used to do batch inferencing on large volumes of data in asynchronous way. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters and store outputs to a datastore for further analysis. For more information on batch endpoints see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-batch?view=azureml-api-2).\n",
"\n",