Expanded docs for the Metadata component (kubeflow#1061)
* WIP Expanded docs for Metadata component.

* Added more about metadata.

* Addressed review comments.
sarahmaddox authored and k8s-ci-robot committed Aug 13, 2019
1 parent 5b6caa1 commit 8177767
Showing 6 changed files with 162 additions and 22 deletions.
184 changes: 162 additions & 22 deletions content/docs/components/misc/metadata.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,43 +4,183 @@ description = "Tracking and managing metadata of machine learning workflows in K
weight = 5
+++

The goal of the [Metadata](https://github.com/kubeflow/metadata) project is to
help Kubeflow users understand and manage their machine learning (ML) workflows
by tracking and managing the metadata that the workflows produce.

In this context, _metadata_ means information about executions (runs), models,
datasets, and other artifacts. _Artifacts_ are the files and objects that form
the inputs and outputs of the components in your ML workflow.
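
To make these terms concrete, here is a minimal, hypothetical sketch of the relationship described above, in which an execution links its input artifacts to its output artifacts. It is plain Python, not the Metadata SDK's API. The class names, the `producers_of` helper, and the URIs are illustrative assumptions. The artifact names match the items that the demo notebook logs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """A file or object that a workflow component consumes or produces (hypothetical)."""
    kind: str  # for example "dataset", "model", or "metrics"
    name: str
    uri: str


@dataclass
class Execution:
    """One run of a component, linked to its input and output artifacts (hypothetical)."""
    name: str
    inputs: List[Artifact] = field(default_factory=list)
    outputs: List[Artifact] = field(default_factory=list)


def producers_of(artifact: Artifact, executions: List[Execution]) -> List[str]:
    """Return the names of the executions that list `artifact` as an output."""
    return [e.name for e in executions if artifact in e.outputs]


# Hypothetical lineage: a training run reads a dataset and writes a model.
data = Artifact("dataset", "mytable-dump", "file:///data/mytable.csv")
model = Artifact("model", "MNIST", "file:///models/mnist")
train = Execution("train", inputs=[data], outputs=[model])

print(producers_of(model, [train]))  # ['train']
```

Recording these input and output links is what lets you trace a model back to the data and runs that produced it.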

{{% alert title="Alpha version" color="warning" %}}
This is an <b>alpha</b> release of the Metadata API. The next version of Kubeflow
will introduce breaking changes. The development team is interested in any
feedback you have while using the Metadata component, and in particular your
feedback on any gaps in the functionality that the component offers.
{{% /alert %}}

## Installing the Metadata component

Kubeflow v0.6.1 and later versions install the Metadata component by default.
You can skip this section if you are running Kubeflow v0.6.1 or later.
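
If you script your cluster setup, you can express the rule above (v0.6.1 and later include the component) as a small version check. This is a hypothetical helper, not part of Kubeflow. It assumes plain numeric version strings such as `v0.6.0` or `0.7`.

```python
def needs_manual_install(kubeflow_version: str) -> bool:
    """Return True if the given Kubeflow version is older than v0.6.1.

    Hypothetical helper: v0.6.1 is the first version that installs the
    Metadata component by default. Accepts plain numeric versions such as
    "v0.6.0" or "0.7"; pre-release suffixes such as "-rc.0" are not supported.
    """
    parts = [int(p) for p in kubeflow_version.lstrip("v").split(".")]
    # Pad to three components so that "0.6" compares as "0.6.0".
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3]) < (0, 6, 1)


print(needs_manual_install("v0.6.0"))  # True
print(needs_manual_install("v0.7.0"))  # False
```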

If you want to install the latest version of the Metadata component or to
install the component as an application in your Kubernetes cluster, follow these
steps:

1. Download the Kubeflow manifests repository:

```
git clone https://github.com/kubeflow/manifests
```

2. Run the following commands to deploy the services of the Metadata component:

```
cd manifests/metadata/base
kustomize build . | kubectl apply -n kubeflow -f -
```
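
If you prefer to drive the deployment from a Python script instead of a shell, you can reproduce the pipeline above with the standard `subprocess` module. This sketch assumes that `kustomize` and `kubectl` are on your `PATH` and that `kubectl` already points at your cluster. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def build_commands(manifest_dir: str, namespace: str = "kubeflow") -> Tuple[List[str], List[str]]:
    """Return the two halves of the shell pipeline above as argument lists."""
    # "kustomize build <dir>" renders the manifests to standard output.
    render_cmd = ["kustomize", "build", manifest_dir]
    # "-f -" tells kubectl to read the manifests from standard input.
    apply_cmd = ["kubectl", "apply", "-n", namespace, "-f", "-"]
    return render_cmd, apply_cmd


def deploy(manifest_dir: str, namespace: str = "kubeflow") -> None:
    """Render the manifests with kustomize and pass the YAML to kubectl apply."""
    render_cmd, apply_cmd = build_commands(manifest_dir, namespace)
    # check=True raises CalledProcessError if a command exits with a non-zero status.
    rendered = subprocess.run(render_cmd, check=True, capture_output=True, text=True)
    subprocess.run(apply_cmd, input=rendered.stdout, check=True, text=True)


# Run from the directory that contains your clone of the manifests repository:
# deploy("manifests/metadata/base")
```

Unlike the shell pipe, this sketch waits for `kustomize` to finish before it starts `kubectl`. As a result, a rendering error stops the deployment before anything is applied.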

## Using the Metadata SDK to record metadata

The Metadata project publishes a
[Python library (SDK)](https://github.com/kubeflow/metadata/tree/master/sdk/python#python-client)
that you can use to log (record) your metadata.

Run the following command to install the Metadata SDK:

```
pip install kfmd
```
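
To confirm that the SDK is installed in your current Python environment, you can query the installed distribution with the standard library. This sketch assumes Python 3.8 or later, where `importlib.metadata` is available. The helper name is hypothetical.

```python
import importlib.metadata as importlib_metadata
from typing import Optional


def installed_version(distribution: str = "kfmd") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return importlib_metadata.version(distribution)
    except importlib_metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"kfmd {version}" if version else "kfmd is not installed")
```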

<a id="demo-notebook"></a>
### Try the Metadata SDK in a sample Jupyter notebook

You can find an example of how to use the Metadata SDK in this
[`demo` notebook](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb).

To run the notebook in your Kubeflow cluster:

1. Follow the guide to
[setting up your Jupyter notebooks in Kubeflow](/docs/notebooks/setup/).
1. Go to the [`demo` notebook on
GitHub](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb).
1. Download the notebook code by opening the **Raw** view of the file, then
right-clicking on the content and saving the file locally as `demo.ipynb`.
1. Go back to your Jupyter notebook server in the Kubeflow UI. (If you've
moved away from the notebooks section in Kubeflow, click
**Notebook Servers** in the left-hand navigation panel to get back there.)
1. In the Jupyter notebook UI, click **Upload** and follow the prompts to upload
the `demo.ipynb` notebook.
1. Click the notebook name (`demo.ipynb`) to open the notebook in your Kubeflow
cluster.
1. Run the steps in the notebook to install and use the Metadata SDK.
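
Instead of saving the file from the **Raw** view and uploading it, you can download the notebook with a short script. For example, you could run it in a cell of a notebook on your Kubeflow notebook server, if that server can reach GitHub. This sketch assumes the notebook is still at this path on the `master` branch and that GitHub serves raw files from `raw.githubusercontent.com`. The helper names are hypothetical.

```python
import urllib.request
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com/<owner>/<repo>/blob/<branch>/<path> URL to a raw-content URL.

    Assumes the branch name contains no slashes.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file URL: {url}")
    owner, repo, _, branch, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"


def download(url: str, dest: str) -> None:
    """Save the raw file behind a GitHub file URL to a local path (needs network access)."""
    urllib.request.urlretrieve(github_blob_to_raw(url), dest)


# download("https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb", "demo.ipynb")
```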

When you have finished running through the steps in the `demo.ipynb` notebook,
you can view the resulting metadata on the Kubeflow UI:

1. Click **Artifact Store** in the left-hand navigation panel on the Kubeflow
UI.
1. On the **Artifacts** screen, you should see the following items:

* A **model** metadata item with the name **MNIST**.
* A **metrics** metadata item with the name **MNIST-evaluation**.
* A **dataset** metadata item with the name **mytable-dump**.

You can click the name of each item to view its details. For more information,
see the section below about the [Metadata UI](#metadata-ui).

### Learn more about the Metadata SDK

The Metadata SDK includes the following
[predefined types](https://github.com/kubeflow/metadata/tree/master/schema)
that you can use to describe your ML workflows:

* [`data_set.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/data_set.json)
to capture metadata for a dataset that forms the input into or the output of
a component in your workflow.
* [`execution.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/execution.json)
to capture metadata for an execution (run) of your ML workflow.
* [`metrics.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/metrics.json)
to capture metadata for the metrics used to evaluate an ML model.
* [`model.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/model.json)
to capture metadata for an ML model that your workflow produces.
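
To see which fields each predefined type declares, you can read the schema files directly. This sketch assumes the files are standard JSON Schema documents on the repository's `master` branch, fetched from GitHub's raw-content host. The helper names are hypothetical, and the sketch does not follow `$ref` links, so it omits properties that a schema inherits through a reference.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Raw URLs for the predefined types listed above (assumed to be on the master branch).
SCHEMA_BASE = "https://raw.githubusercontent.com/kubeflow/metadata/master/schema/alpha"
SCHEMAS = {
    "data_set": f"{SCHEMA_BASE}/artifacts/data_set.json",
    "execution": f"{SCHEMA_BASE}/execution.json",
    "metrics": f"{SCHEMA_BASE}/artifacts/metrics.json",
    "model": f"{SCHEMA_BASE}/artifacts/model.json",
}


def property_names(schema: Dict[str, Any]) -> List[str]:
    """Collect property names declared in a JSON Schema or in its allOf parts.

    $ref targets are not fetched, so inherited properties are not included.
    """
    names = list(schema.get("properties", {}))
    for part in schema.get("allOf", []):
        names.extend(property_names(part))
    return sorted(set(names))


def load_schema(type_name: str) -> Dict[str, Any]:
    """Download and parse one of the predefined type schemas (needs network access)."""
    with urllib.request.urlopen(SCHEMAS[type_name]) as response:
        return json.load(response)


# print(property_names(load_schema("model")))
```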

<a id="metadata-ui"></a>
## Tracking artifacts on the Metadata UI

You can view a list of logged artifacts and the details of each individual
artifact in the **Artifact Store** on the Kubeflow UI.

1. Go to Kubeflow in your browser. (If you haven't yet opened the
Kubeflow UI, find out how to [access the
Kubeflow UIs](https://www.kubeflow.org/docs/other-guides/accessing-uis/).)
1. Click **Artifact Store** in the left-hand navigation panel:
<img src="/docs/images/metadata-ui-option.png"
alt="Metadata UI"
class="mt-3 mb-3 border border-info rounded">

1. The **Artifacts** screen opens and displays a list of items for all the
metadata events that your workflows have logged. You can click the name of
each item to view the details.

The following examples show the items that appear when you run the
`demo.ipynb` notebook described [above](#demo-notebook):

<img src="/docs/images/metadata-artifacts-list.png"
alt="A list of metadata items"
class="mt-3 mb-3 border border-info rounded">

* Example of **model** metadata with the name "MNIST":

<img src="/docs/images/metadata-model.png"
alt="Model metadata for an example MNIST model"
class="mt-3 mb-3 border border-info rounded">

* Example of **metrics** metadata with the name "MNIST-evaluation":

<img src="/docs/images/metadata-metrics.png"
alt="Metrics metadata for an evaluation of an MNIST model"
class="mt-3 mb-3 border border-info rounded">

* Example of **dataset** metadata with the name "mytable-dump":

<img src="/docs/images/metadata-dataset.png"
alt="Dataset metadata"
class="mt-3 mb-3 border border-info rounded">



## Backend and REST API

The Kubeflow metadata backend uses [ML Metadata
(MLMD)](https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md)
to manage the metadata and relationships.

The backend exposes a
[REST API](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/).

You can add your own metadata types so that you can log metadata for custom
artifacts. To add a custom type, send a REST API request to the
[`artifact_types` endpoint](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/#operation--api-v1alpha1-artifact_types-post).

For example, the following request registers an artifact type with
_name_ `myorg/mytype/v1` and three _properties_:

* `f1` (string)
* `f2` (integer)
* `f3` (double)

```
curl -X POST http://localhost:8080/api/v1alpha1/artifact_types \
--header "Content-Type: application/json" -d \
'{"name":"myorg/mytype/v1","properties":{"f1":"STRING", "f2":"INT", "f3": "DOUBLE"}}'
```
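
You can send the same registration request from Python using only the standard library. Like the curl example, this sketch assumes the metadata service is reachable at `localhost:8080`. The `PROPERTY_TYPES` mapping and the helper name are illustrative assumptions, and the mapping covers only the three property types shown above.

```python
import json
import urllib.request

# Assumes the metadata service is reachable at localhost:8080, as in the curl example.
ENDPOINT = "http://localhost:8080/api/v1alpha1/artifact_types"

# Hypothetical mapping from Python types to the property type names used above.
PROPERTY_TYPES = {str: "STRING", int: "INT", float: "DOUBLE"}


def artifact_type_request(name: str, properties: dict) -> urllib.request.Request:
    """Build a POST request that registers an artifact type with the given properties."""
    body = {
        "name": name,
        "properties": {key: PROPERTY_TYPES[py_type] for key, py_type in properties.items()},
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = artifact_type_request("myorg/mytype/v1", {"f1": str, "f2": int, "f3": float})
# Send the request when the service is reachable:
# with urllib.request.urlopen(req) as response:
#     print(response.status, response.read().decode())
```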

## Next steps

Run the
[xgboost-synthetic notebook](https://github.com/kubeflow/examples/tree/master/xgboost_synthetic)
to build, train, and deploy an XGBoost model using Kubeflow Fairing and Kubeflow
Pipelines with synthetic data. Examine the metadata output after running
through the steps in the notebook.

Binary file added content/docs/images/metadata-artifacts-list.png
Binary file added content/docs/images/metadata-dataset.png
Binary file added content/docs/images/metadata-metrics.png
Binary file added content/docs/images/metadata-model.png
Binary file added content/docs/images/metadata-ui-option.png
