How Azure Machine Learning works: resources and assets

[!INCLUDE dev v2]

This article applies to the second version of the Azure Machine Learning CLI and Python SDK (v2). For version one (v1), see How Azure Machine Learning works: Architecture and concepts (v1).

Azure Machine Learning includes several resources and assets to enable you to perform your machine learning tasks. These resources and assets are needed to run any job.

  • Resources: setup or infrastructural resources needed to run a machine learning workflow. Resources include:
    • Workspace
    • Compute
    • Datastore
  • Assets: created using Azure ML commands or as part of a training/scoring run. Assets are versioned and can be registered in the Azure ML workspace. They include:
    • Model
    • Environment
    • Data
    • Component

This document provides a quick overview of these resources and assets.

Workspace

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all jobs, including logs, metrics, output, and a snapshot of your scripts. The workspace stores references to resources like datastores and compute. It also holds all assets such as models, environments, components, and data assets.

Create a workspace

To create a workspace using CLI v2, use the following command:

[!INCLUDE cli v2]

az ml workspace create --file my_workspace.yml

For more information, see workspace YAML schema.
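As a rough sketch, the my_workspace.yml file referenced by that command might look like the following. The field names assume the v2 workspace YAML schema, and the values are placeholders to replace with your own:

$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: my-workspace
location: eastus
display_name: Basic workspace-example
description: This example shows how to create a basic workspace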

To create a workspace using Python SDK v2, you can use the following code:

[!INCLUDE sdk v2]

from azure.ai.ml.entities import Workspace

ws_basic = Workspace(
    name="my-workspace",
    location="eastus", # Azure region (location) of workspace
    display_name="Basic workspace-example",
    description="This example shows how to create a basic workspace"
)
ml_client.workspaces.begin_create(ws_basic) # use MLClient to connect to the subscription and resource group and create workspace

This Jupyter notebook shows more ways to create an Azure ML workspace using SDK v2.

Compute

A compute is a designated compute resource where you run your job or host your endpoint. Azure Machine Learning supports the following types of compute:

  • Compute cluster - a managed-compute infrastructure that allows you to easily create a cluster of CPU or GPU compute nodes in the cloud.
  • Compute instance - a fully configured and managed development environment in the cloud. You can use the instance as a training or inference compute for development and testing. It's similar to a virtual machine in the cloud.
  • Inference cluster - used to deploy trained machine learning models to Azure Kubernetes Service. You can create an Azure Kubernetes Service (AKS) cluster from your Azure ML workspace, or attach an existing AKS cluster.
  • Attached compute - You can attach your own compute resources to your workspace and use them for training and inference.

To create a compute using CLI v2, use the following command:

[!INCLUDE cli v2]

az ml compute create --file my_compute.yml

For more information, see compute YAML schema.
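For illustration, a my_compute.yml that defines a compute cluster might look like this. The field names assume the amlCompute YAML schema; the VM size and scale settings are placeholders:

$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: basic-example
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 2
idle_time_before_scale_down: 120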

To create a compute using Python SDK v2, you can use the following code:

[!INCLUDE sdk v2]

from azure.ai.ml.entities import AmlCompute

cluster_basic = AmlCompute(
    name="basic-example",
    type="amlcompute",
    size="STANDARD_DS3_v2",
    location="westus",
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=120,
)
ml_client.begin_create_or_update(cluster_basic) # use the MLClient to connect to the workspace and create the compute cluster

This Jupyter notebook shows more ways to create compute using SDK v2.

Datastore

Azure Machine Learning datastores securely keep the connection information to your data storage on Azure, so you don't have to code it in your scripts. You can register and create a datastore to easily connect to your storage account, and access the data in your underlying storage service. The CLI v2 and SDK v2 support the following types of cloud-based storage services:

  • Azure Blob Container
  • Azure File Share
  • Azure Data Lake
  • Azure Data Lake Gen2

To create a datastore using CLI v2, use the following command:

[!INCLUDE cli v2]

az ml datastore create --file my_datastore.yml

For more information, see datastore YAML schema.
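As an illustrative sketch, a my_datastore.yml for a blob container might look like the following. It assumes the azureBlob datastore YAML schema; the account, container, and key values are placeholders:

$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to a blob container.
account_name: mytestblobstore
container_name: data-container
credentials:
  account_key: <account-key>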

To create a datastore using Python SDK v2, you can use the following code:

[!INCLUDE sdk v2]

from azure.ai.ml.entities import AzureBlobDatastore

blob_datastore1 = AzureBlobDatastore(
    name="blob-example",
    description="Datastore pointing to a blob container.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials={
        "account_key": "XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    },
)
ml_client.create_or_update(blob_datastore1) # use the MLClient to connect to the workspace and register the datastore

This Jupyter notebook shows more ways to create datastores using SDK v2.

Model

Azure Machine Learning models consist of the binary file or files that represent a machine learning model, plus any corresponding metadata. Models can be created from a local or remote file or directory. For remote locations, https, wasbs, and azureml URIs are supported. The created model is tracked in the workspace under the specified name and version. Azure ML supports three storage formats for models:

  • custom_model
  • mlflow_model
  • triton_model

Creating a model

To create a model using CLI v2, use the following command:

[!INCLUDE cli v2]

az ml model create --file my_model.yml

For more information, see model YAML schema.
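For example, a my_model.yml that registers a local pickle file might look roughly like this. The field names assume the v2 model YAML schema; the path and version are placeholders:

$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: my-model
version: 1
path: model.pkl
type: custom_model
description: Model created from local file.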

To create a model using Python SDK v2, you can use the following code:

[!INCLUDE sdk v2]

from azure.ai.ml.entities import Model

my_model = Model(
    path="model.pkl", # the path to where my model file is located
    type="custom_model", # can be custom_model, mlflow_model or triton_model
    name="my-model",
    description="Model created from local file.",
)

ml_client.models.create_or_update(my_model) # use the MLClient to connect to workspace and create/register the model

Environment

Azure Machine Learning environments are an encapsulation of the environment where your machine learning task happens. They specify the software packages, environment variables, and software settings around your training and scoring scripts. The environments are managed and versioned entities within your Machine Learning workspace. Environments enable reproducible, auditable, and portable machine learning workflows across a variety of computes.

Types of environment

Azure ML supports two types of environments: curated and custom.

Curated environments are provided by Azure Machine Learning and are available in your workspace by default. Intended to be used as is, they contain collections of Python packages and settings to help you get started with various machine learning frameworks. These pre-created environments also allow for faster deployment time. For a full list, see the curated environments article.

In custom environments, you're responsible for setting up your environment and installing packages or any other dependencies that your training or scoring script needs on the compute. Azure ML lets you create your own environment from:

  • A Docker image
  • A base Docker image with a conda YAML to customize further
  • A Docker build context

Create an Azure ML custom environment

To create an environment using CLI v2, use the following command:

[!INCLUDE cli v2]

az ml environment create --file my_environment.yml

For more information, see environment YAML schema.
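As a minimal sketch, a my_environment.yml built from a Docker image might look like this. It assumes the v2 environment YAML schema; the image name is a placeholder:

$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-example
image: pytorch/pytorch:latest
description: Environment created from a Docker image.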

To create an environment using Python SDK v2, you can use the following code:

[!INCLUDE sdk v2]

from azure.ai.ml.entities import Environment

my_env = Environment(
    image="pytorch/pytorch:latest", # base image to use
    name="docker-image-example", # name of the model
    description="Environment created from a Docker image.",
)

ml_client.environments.create_or_update(my_env) # use the MLClient to connect to workspace and create/register the environment

This Jupyter notebook shows more ways to create custom environments using SDK v2.

Data

Azure Machine Learning allows you to work with different types of data:

  • URIs (a location in local/cloud storage)
    • uri_folder
    • uri_file
  • Tables (a tabular data abstraction)
    • mltable
  • Primitives
    • string
    • boolean
    • number

For most scenarios, you'll use URIs (uri_folder and uri_file) - a location in storage that can be easily mapped to the filesystem of a compute node in a job by either mounting or downloading the storage to the node.

mltable is an abstraction for tabular data that is to be used for AutoML Jobs, Parallel Jobs, and some advanced scenarios. If you're just starting to use Azure Machine Learning and aren't using AutoML, we strongly encourage you to begin with URIs.
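To make this concrete, a data asset that wraps a uri_folder can be described in YAML and registered with az ml data create --file my_data.yml. The following is a minimal sketch assuming the v2 data YAML schema; the datastore name and folder path are hypothetical placeholders:

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: my-folder-data
version: 1
type: uri_folder
description: Example data asset pointing to a folder in a registered datastore.
path: azureml://datastores/blob_example/paths/example-data/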

Component

An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. Components are the building blocks of advanced machine learning pipelines. Components can do tasks such as data processing, model training, and model scoring. A component is analogous to a function: it has a name and parameters, expects inputs, and returns outputs.
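To illustrate, a command component can be defined in YAML and created with az ml component create --file my_component.yml. The sketch below assumes the commandComponent YAML schema; the script name, input/output names, and environment reference are hypothetical placeholders:

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: Prep data
version: 1
type: command
inputs:
  raw_data:
    type: uri_folder
outputs:
  prepped_data:
    type: uri_folder
code: ./src
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
command: >-
  python prep.py
  --raw_data ${{inputs.raw_data}}
  --prepped_data ${{outputs.prepped_data}}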

Next steps