diff --git a/content/docs/components/misc/metadata.md b/content/docs/components/misc/metadata.md index 6bf4037436..773d846702 100644 --- a/content/docs/components/misc/metadata.md +++ b/content/docs/components/misc/metadata.md @@ -4,43 +4,183 @@ description = "Tracking and managing metadata of machine learning workflows in K weight = 5 +++ -The goal of the [Metadata](https://github.com/kubeflow/metadata) project is to help Kubeflow users understand and manage their machine learning workflows by tracking and managing the metadata of workflows. +The goal of the [Metadata](https://github.com/kubeflow/metadata) project is to +help Kubeflow users understand and manage their machine learning (ML) workflows +by tracking and managing the metadata that the workflows produce. -## Installation +In this context, _metadata_ means information about executions (runs), models, +datasets, and other artifacts. _Artifacts_ are the files and objects that form +the inputs and outputs of the components in your ML workflow. -The Metadata component is installed by default for Kubeflow versions >= 0.6.1. +{{% alert title="Alpha version" color="warning" %}} +This is an alpha release of the Metadata API. The next version of Kubeflow +will introduce breaking changes. The development team is interested in any +feedback you have while using the Metadata component, and in particular your +feedback on any gaps in the functionality that the component offers. +{{% /alert %}} -If you want to install the latest version of the Metadata component or install it as an application in your Kubernetes cluster, you can follow these steps: +## Installing the Metadata component -1. Download the Kubeflow manifests repository. -``` -git clone https://github.com/kubeflow/manifests -``` +Kubeflow v0.6.1 and later versions install the Metadata component by default. +You can skip this section if you are running Kubeflow v0.6.1 or later. -2. 
Run the following commands in the manifest repository to deploy services of the Metadata component. -``` -cd manifests/metadata/base -kustomize build . | kubectl apply -n kubeflow -f - -``` +If you want to install the latest version of the Metadata component or to +install the component as an application in your Kubernetes cluster, follow these +steps: + +1. Download the Kubeflow manifests repository: -## Python Library + ``` + git clone https://github.com/kubeflow/manifests + ``` -The Metadata project publishes a [Python library](https://github.com/kubeflow/metadata/tree/master/sdk/python#python-client) for logging metadata. +2. Run the following commands to deploy the services of the Metadata component: + + ``` + cd manifests/metadata/base + kustomize build . | kubectl apply -n kubeflow -f - + ``` + +## Using the Metadata SDK to record metadata + +The Metadata project publishes a +[Python library (SDK)](https://github.com/kubeflow/metadata/tree/master/sdk/python#python-client) +that you can use to log (record) your metadata. + +Run the following command to install the Metadata SDK: -You can install it via the following command: ``` pip install kfmd ``` -To help you describe your ML workflows, the Python library has [predefined types](https://github.com/kubeflow/metadata/tree/master/schema) to capture models, datasets, evaluation metrics, and executions. + +### Try the Metadata SDK in a sample Jupyter notebook + +You can find an example of how to use the Metadata SDK in this +[`demo` notebook](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb). + +To run the notebook in your Kubeflow cluster: + +1. Follow the guide to + [setting up your Jupyter notebooks in Kubeflow](/docs/notebooks/setup/). +1. Go to the [`demo` notebook on + GitHub](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb). +1. 
Download the notebook code by opening the **Raw** view of the file, then + right-clicking on the content and saving the file locally as `demo.ipynb`. +1. Go back to your Jupyter notebook server in the Kubeflow UI. (If you've + moved away from the notebooks section in Kubeflow, click + **Notebook Servers** in the left-hand navigation panel to get back there.) +1. In the Jupyter notebook UI, click **Upload** and follow the prompts to upload + the `demo.ipynb` notebook. +1. Click the notebook name (`demo.ipynb`) to open the notebook in your Kubeflow + cluster. +1. Run the steps in the notebook to install and use the Metadata SDK. + +When you have finished running through the steps in the `demo.ipynb` notebook, +you can view the resulting metadata on the Kubeflow UI: + +1. Click **Artifact Store** in the left-hand navigation panel on the Kubeflow + UI. +1. On the **Artifacts** screen you should see the following items: + + * A **model** metadata item with the name **MNIST**. + * A **metrics** metadata item with the name **MNIST-evaluation**. + * A **dataset** metadata item with the name **mytable-dump**. + + You can click the name of each item to view the details. See the section + below about the [Metadata UI](#metadata-ui) for more details. + +### Learn more about the Metadata SDK + +The Metadata SDK includes the following +[predefined types](https://github.com/kubeflow/metadata/tree/master/schema) +that you can use to describe your ML workflows: + +* [`data_set.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/data_set.json) + to capture metadata for a dataset that forms the input into or the output of + a component in your workflow. +* [`execution.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/execution.json) + to capture metadata for an execution (run) of your ML workflow. 
+* [`metrics.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/metrics.json) + to capture metadata for the metrics used to evaluate an ML model. +* [`model.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/model.json) + to capture metadata for an ML model that your workflow produces. -You can find an example of how to use the logging API in this [notebook](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb). + +## Tracking artifacts on the Metadata UI -## Backend +You can view a list of logged artifacts and the details of each individual +artifact in the **Artifact Store** on the Kubeflow UI. -The backend uses [ML-Metadata](https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md) to manage all the metadata and relations. It exposes a [REST API](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/). +1. Go to Kubeflow in your browser. (If you haven't yet opened the + Kubeflow UI, find out how to [access the + Kubeflow UIs](https://www.kubeflow.org/docs/other-guides/accessing-uis/).) +1. Click **Artifact Store** in the left-hand navigation panel: + Metadata UI -## UI +1. The **Artifacts** screen opens and displays a list of items for all the + metadata events that your workflows have logged. You can click the name of + each item to view the details. 
+ + The following examples show the items that appear when you run the + `demo.ipynb` notebook described [above](#demo-notebook): + + A list of metadata items + + * Example of **model** metadata with the name "MNIST": + + Model metadata for an example MNIST model + + * Example of **metrics** metadata with the name "MNIST-evaluation": + + Metrics metadata for an evaluation of an MNIST model + + * Example of **dataset** metadata with the name "mytable-dump": + + Dataset metadata + + + +## Backend and REST API + +The Kubeflow metadata backend uses [ML Metadata +(MLMD)](https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md) +to manage the metadata and relationships. + +The backend exposes a +[REST API](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/). + +You can add your own metadata types so that you can log metadata for custom +artifacts. To add a custom type, send a REST API request to the +[`artifact_types` endpoint](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/#operation--api-v1alpha1-artifact_types-post). + +For example, the following request registers an artifact type with +_name_ `myorg/mytype/v1` and three _properties_: + +* `f1` (string) +* `f2` (integer) +* `f3` (double) + +``` +curl -X POST http://localhost:8080/api/v1alpha1/artifact_types \ + --header "Content-Type: application/json" -d \ + '{"name":"myorg/mytype/v1","properties":{"f1":"STRING", "f2":"INT", "f3": "DOUBLE"}}' +``` -You can view a list of logged artifacts and the details of each individual artifact via the _Artifact Store_ on [Kubeflow UIs](https://www.kubeflow.org/docs/other-guides/accessing-uis/). +## Next steps +Run the +[xgboost-synthetic notebook](https://github.com/kubeflow/examples/tree/master/xgboost_synthetic) +to build, train, and deploy an XGBoost model using Kubeflow Fairing and Kubeflow +Pipelines with synthetic data. Examine the metadata output after running +through the steps in the notebook. 
\ No newline at end of file diff --git a/content/docs/images/metadata-artifacts-list.png b/content/docs/images/metadata-artifacts-list.png new file mode 100644 index 0000000000..bdfad926b7 Binary files /dev/null and b/content/docs/images/metadata-artifacts-list.png differ diff --git a/content/docs/images/metadata-dataset.png b/content/docs/images/metadata-dataset.png new file mode 100644 index 0000000000..ca61e40345 Binary files /dev/null and b/content/docs/images/metadata-dataset.png differ diff --git a/content/docs/images/metadata-metrics.png b/content/docs/images/metadata-metrics.png new file mode 100644 index 0000000000..f360bf6543 Binary files /dev/null and b/content/docs/images/metadata-metrics.png differ diff --git a/content/docs/images/metadata-model.png b/content/docs/images/metadata-model.png new file mode 100644 index 0000000000..a5db4d7491 Binary files /dev/null and b/content/docs/images/metadata-model.png differ diff --git a/content/docs/images/metadata-ui-option.png b/content/docs/images/metadata-ui-option.png new file mode 100644 index 0000000000..01045812a2 Binary files /dev/null and b/content/docs/images/metadata-ui-option.png differ
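
The `curl` example in the "Backend and REST API" section of `metadata.md` registers a custom artifact type through the `artifact_types` endpoint. A minimal Python sketch of the same request, using only the standard library, might look like the following. The helper name `build_artifact_type_request` is hypothetical, and the `localhost:8080` address is an assumption carried over from the `curl` example. The payload fields mirror that example and are not checked against the API spec here.

```python
import json
import urllib.request

# Assumption: the Metadata REST server is reachable at this address, as in
# the curl example (for instance through a local port-forward).
BASE_URL = "http://localhost:8080/api/v1alpha1"


def build_artifact_type_request(name, properties, base_url=BASE_URL):
    """Build (but do not send) a POST request that registers an artifact type.

    Hypothetical helper for illustration; it is not part of the Metadata SDK.
    """
    payload = {"name": name, "properties": properties}
    return urllib.request.Request(
        url=f"{base_url}/artifact_types",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same type name and properties as the curl example.
request = build_artifact_type_request(
    "myorg/mytype/v1",
    {"f1": "STRING", "f2": "INT", "f3": "DOUBLE"},
)

# To send the request against a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL and JSON body before anything reaches the cluster.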