
Improvements to metadata pages in UI #2086

Closed
neuromage opened this issue Sep 11, 2019 · 26 comments

@neuromage
Contributor

Now that metadata pages are in KFP's UI thanks to Riley's work, there are still a few more items to take care of in terms of polish:

  • We should add a column showing the creation time of each artifact. In order to get this, we'll need to query the metadata API server for events, and find the associated event that produced any given artifact. The event will have a timestamp.

  • The list of executions page seems a little buggy. When I click on an execution, the first one in the group (they are grouped by pipeline) works, but the following items don't seem to be working.

  • For each execution, in addition to its properties, we should also show the inputs and outputs that went into it. It would also be nice to link to those input and output artifacts.
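The creation-time lookup in the first bullet can be sketched as follows. This is a minimal illustration of the logic, not the UI code: the dicts stand in for MLMD Event protos, and the enum value is my assumption from ml-metadata's `metadata_store.proto` (worth double-checking against the proto version you build against):

```python
# Stand-in for ml_metadata's Event.Type.OUTPUT enum value (assumed to be 4,
# per the MLMD metadata_store.proto).
OUTPUT = 4

def artifact_creation_time_ms(events):
    """Given the events associated with one artifact (e.g. the result of
    querying the metadata API server for events by artifact id), return the
    timestamp of the event that produced it, in milliseconds since epoch."""
    output_events = [e for e in events if e["type"] == OUTPUT]
    if not output_events:
        return None
    # The earliest OUTPUT event is the one that created the artifact.
    return min(e["milliseconds_since_epoch"] for e in output_events)
```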

Assigning to Yuan to start work on this. I'll update this issue with any other outstanding items I find. Yuan, we can also chat in person to clarify these items as required. Thanks!

/assign @Bobgy

/cc @jessiezcc
/cc @paveldournov
/cc @dushyanthsc
/cc @gaoning777

@neuromage
Contributor Author

For reference, here's a simple pipeline you can run (it's using TFX DSL) which will output some basic metadata in your cluster:

import os

from typing import Text

from tfx.components.evaluator.component import Evaluator
from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen
from tfx.components.example_validator.component import ExampleValidator
from tfx.components.model_validator.component import ModelValidator
from tfx.components.pusher.component import Pusher
from tfx.components.schema_gen.component import SchemaGen
from tfx.components.statistics_gen.component import StatisticsGen
from tfx.components.trainer.component import Trainer
from tfx.components.transform.component import Transform
from tfx.orchestration import pipeline
from tfx.orchestration.kubeflow import kubeflow_dag_runner
from tfx.proto import evaluator_pb2
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.utils.dsl_utils import csv_input

def _create_test_pipeline(pipeline_name: Text, pipeline_root: Text,
                          csv_input_location: Text, taxi_module_file: Text):
  """Creates a simple Kubeflow-based Chicago Taxi TFX pipeline for testing.

  Args:
    pipeline_name: The name of the pipeline.
    pipeline_root: The root of the pipeline output.
    csv_input_location: The location of the input data directory.
    taxi_module_file: The location of the module file for Transform/Trainer.

  Returns:
    A logical TFX pipeline.Pipeline object.
  """
  examples = csv_input(csv_input_location)

  example_gen = CsvExampleGen(input_base=examples)
  statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
  infer_schema = SchemaGen(
      stats=statistics_gen.outputs.output, infer_feature_shape=False)
  validate_stats = ExampleValidator(
      stats=statistics_gen.outputs.output, schema=infer_schema.outputs.output)
  transform = Transform(
      input_data=example_gen.outputs.examples,
      schema=infer_schema.outputs.output,
      module_file=taxi_module_file)
  trainer = Trainer(
      module_file=taxi_module_file,
      transformed_examples=transform.outputs.transformed_examples,
      schema=infer_schema.outputs.output,
      transform_output=transform.outputs.transform_output,
      train_args=trainer_pb2.TrainArgs(num_steps=10000),
      eval_args=trainer_pb2.EvalArgs(num_steps=5000))
  model_analyzer = Evaluator(
      examples=example_gen.outputs.examples,
      model_exports=trainer.outputs.output,
      feature_slicing_spec=evaluator_pb2.FeatureSlicingSpec(specs=[
          evaluator_pb2.SingleSlicingSpec(
              column_for_slicing=['trip_start_hour'])
      ]))
  model_validator = ModelValidator(
      examples=example_gen.outputs.examples, model=trainer.outputs.output)
  pusher = Pusher(
      model_export=trainer.outputs.output,
      model_blessing=model_validator.outputs.blessing,
      push_destination=pusher_pb2.PushDestination(
          filesystem=pusher_pb2.PushDestination.Filesystem(
              base_directory=os.path.join(pipeline_root, 'model_serving'))))

  return pipeline.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      components=[
          example_gen, statistics_gen, infer_schema, validate_stats, transform,
          trainer, model_analyzer, model_validator, pusher
      ],
      enable_cache=False,  # Or True to use cache
  )


if __name__ == '__main__':
  # Copy sample CSV file from chicago taxi pipeline example to this location
  data_root = 'gs://your-bucket/data' 
  taxi_module_file = 'gs://your-bucket/taxi_utils.py'

  pipeline_name = 'kubeflow-simple-taxi-metadata'
  pipeline_root = 'gs://your-bucket/test'
  pipeline = _create_test_pipeline(pipeline_name, pipeline_root, data_root,
                                   taxi_module_file)
  config = kubeflow_dag_runner.KubeflowDagRunnerConfig()

  kubeflow_dag_runner.KubeflowDagRunner(config=config).run(pipeline)

@Bobgy
Contributor

Bobgy commented Sep 11, 2019

Thanks @neuromage! I'm taking a day off today and will start on these tomorrow.

A few questions on context:

@Bobgy
Contributor

Bobgy commented Sep 11, 2019

/priority p0

@k8s-ci-robot
Contributor

@Bobgy: The label(s) area/frontend cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/area frontend

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment

@Bobgy
Contributor

Bobgy commented Sep 11, 2019

/area front-end

@rmgogogo
Contributor

/cc @rmgogogo

@Bobgy
Contributor

Bobgy commented Sep 11, 2019

@neuromage How do you deploy pipelines with metadata?
I tried KFP lite, but it reported errors about a missing 'mysql-credential' when starting up the metadata server.
Should I use helm to deploy the marketplace one instead?

@dushyanthsc
Contributor

@Bobgy The MySQL credentials are picked up from a Kubernetes Secret object. Create a Secret named "mysql-credential" with the keys "username" and "password"; the rest should be taken care of automatically.
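For anyone else hitting this, creating that secret could look like the following (a sketch; the namespace and credential values here are placeholders for your own deployment):

```shell
# Create the Secret the metadata server expects; adjust namespace and values.
kubectl create secret generic mysql-credential \
  --namespace kubeflow \
  --from-literal=username=root \
  --from-literal=password=''
```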

@neuromage
Contributor Author

neuromage commented Sep 11, 2019

Thanks @Bobgy!

  • Can you send me a reference to MLMD api?

Yes, here it is: https://github.com/google/ml-metadata/blob/master/ml_metadata/proto/metadata_store_service.proto

It's in the KFP repo (kubeflow/pipelines) under /frontend

@Bobgy
Contributor

Bobgy commented Sep 12, 2019

@dushyanthsc Thanks, I got the servers up.

@Bobgy
Contributor

Bobgy commented Sep 12, 2019

@neuromage I'm trying to run the tfx sample you provided, but I'm stuck with how to get it running.

env:

  • tensorflow: 1.14.0
  • tfx: 0.14.0
  • kfp: 0.1.29

Here's what I tried:

  1. Copy the code sample and name it metadata_sample.py
  2. Follow https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/ to install kfp sdk
  3. Also install tensorflow, tfx by pip in that conda environment
  4. Copy taxi data and utils from https://github.com/tensorflow/tfx/tree/master/tfx/examples/chicago_taxi_pipeline to my own bucket
  5. Change config values in metadata_sample.py to my own bucket
  6. python metadata_sample.py
    • I got some errors first, so I changed a little:
      • Add a ":" after def _create_test_pipeline(...):
      • Changed KubeflowRunnerConfig to KubeflowDagRunnerConfig because it seems to be renamed recently.

Here's what I got after fixing obvious problems. It has a lot of warnings, but I didn't see any errors.
Can you give me some reference of how to run it?

/Users/gongyuan/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
(... the same FutureWarning repeats for quint8, qint16, quint16, qint32 and np_resource, in both tensorflow/python/framework/dtypes.py and tensorboard/compat/tensorflow_stub/dtypes.py ...)
/Users/gongyuan/miniconda3/lib/python3.7/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/components/transform/executor.py:57: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/components/transform/executor.py:57: from_feature_spec (from tensorflow_transform.tf_metadata.dataset_schema) is deprecated and will be removed in a future version.
Instructions for updating:
from_feature_spec is a deprecated, use schema_utils.schema_from_feature_spec
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/orchestration/pipeline.py:131: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

WARNING:tensorflow:metadata_db_root is deprecated, metadata_connection_config will be required in next release
WARNING:tensorflow:From /Users/gongyuan/miniconda3/lib/python3.7/site-packages/tfx/orchestration/kubeflow/base_component.py:125: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

@neuromage
Contributor Author

You can ignore the warnings. You should get a compiled pipeline file, just like when using KFP SDK. Then you'll need to upload that and run it as before.

@Bobgy
Contributor

Bobgy commented Sep 13, 2019

Thanks, I got the pipeline file successfully.

@Bobgy
Contributor

Bobgy commented Sep 13, 2019

@neuromage which tfx version do you use?

I first tried 0.14.0, and met this issue: tensorflow/tfx#603
Then I tried 0.13.0, and it seems needed features are not there yet.
Then I tried 0.14.0rc1 and I got the following error when running the pipeline

/opt/venv/lib/python3.6/site-packages/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
Traceback (most recent call last):
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 200, in <module>
    main()
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 171, in main
    connection_config = _get_metadata_connection_config(kubeflow_metadata_config)
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 68, in _get_metadata_connection_config
    kubeflow_metadata_config.mysql_db_service_host)
TypeError: None has type NoneType, but expected one of: bytes, unicode

I am using a KFP lite deployment, how should I config kubeflow_metadata_config?

@Bobgy
Contributor

Bobgy commented Sep 13, 2019

Never mind, I used the following config and it seems to work.

from tfx.orchestration.kubeflow.proto import kubeflow_pb2

def _get_metadata_config():
    config = kubeflow_pb2.KubeflowMetadataConfig()
    config.mysql_db_service_host.environment_variable = 'MYSQL_SERVICE_HOST'
    config.mysql_db_service_port.environment_variable = 'MYSQL_SERVICE_PORT'
    config.mysql_db_name.value = 'metadb'
    config.mysql_db_user.value = 'root'
    config.mysql_db_password.value = ''

    return config

@Bobgy
Contributor

Bobgy commented Sep 16, 2019

The list of executions page seems a little buggy. When I click on an execution, the first one in the group (they are grouped by pipeline) works, but the following items don't seem to be working.

@neuromage Can you explain the expected behavior of the execution list page?
This is what I can see now: https://drive.google.com/file/d/1LJbth1bK-_ZCTe5d60M8nDRzrFjgux-n/view

  • What should happen when we click on expanded rows?
  • When we click on the first row, it should open execution detail page. Is that right? I will try to fix this. UPDATE: I fixed this in: Fix execution detail page fetch params. #2127
  • Should execution names be nonempty? Where does that data come from?
    • I tried to debug this: the UI expects a NAME property on each execution, but it's not present in the response from the backend. @neuromage, is this a backend issue? (I'm using image: gcr.io/kubeflowtryouts/ml_metadata_store_server:latest)
      An example response for a single execution I get is
{
  "id": 2,
  "typeId": 5,
  "propertiesMap": [
    ["component_id", {
      "stringValue": "StatisticsGen"
    }],
    ["pipeline_name", {
      "stringValue": "kubeflow-simple-taxi-metadata"
    }],
    ["pipeline_root", {
        "stringValue": "gs://gongyuan-test/kfp-test"
    }],
    ["run_id", {
        "stringValue": "kubeflow-simple-taxi-metadata-t5t59"
    }],
    ["state", {
        "stringValue": "complete"
    }]
  ],
  "customPropertiesMap": []
}

For each execution, other than properties, we should also show the inputs and outputs that went into it. It would also be nice to be able to link to the said input and output.

Do we need this on the execution list page, the detail page, or both? Do we have a UX mock I can refer to?

@neuromage
Contributor Author

Thanks @Bobgy !

  • For name, yeah, that looks wrong. Can we use component_id for the name instead?
  • For execution inputs/outputs, I think it's ok to put it in the execution details page. Unfortunately, there is no mock for this, so I'll leave it up to you on how best to implement. I imagine a simple section for inputs, with a listing of Artifact name and id would be great. If the id is deep-linked to the artifact detail page, that would be nice too. Similarly for output.
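The inputs/outputs listing could reuse the same event-based lookup as the creation-time column. A minimal sketch of the partitioning logic (the dicts stand in for MLMD Event protos; the enum values are my assumption from the MLMD proto):

```python
# Assumed Event.Type enum values from ml-metadata's metadata_store.proto.
INPUT, OUTPUT = 3, 4

def execution_inputs_outputs(events):
    """Split the artifact ids linked to one execution's events into the
    artifacts it consumed (inputs) and the artifacts it produced (outputs)."""
    inputs = [e["artifact_id"] for e in events if e["type"] == INPUT]
    outputs = [e["artifact_id"] for e in events if e["type"] == OUTPUT]
    return inputs, outputs
```

Each id in the result could then be rendered as a deep link to the corresponding artifact detail page.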

@Bobgy
Contributor

Bobgy commented Sep 17, 2019

@neuromage thanks a lot!

  • Sure, I will use component_id instead
  • Then I will use my own judgement about that.

@neuromage
Contributor Author

@Bobgy I have a few more requests :-)

  • Can we show URIs in the artifact detail page?
  • Can we make GCS URIs clickable, in both artifact detail page and artifact listings page?
  • If a field has serialized json, can we attempt to parse and pretty print this?
  • The execution list still does not show the name of the execution, and I'm unable to click on any execution except the first one still (ignore if you already fixed this)

Stretch goal, which I think we can discuss and track in a separate issue if needed: show a preview for each artifact type. How we preview would be based on the type of the artifact. For example, if it's a SchemaPath, we can show the schema text proto as JSON or something. If it's ExamplesPath, we can show the first 10 rows maybe. This could use ajchili's visualization server. This may need some in depth discussion, so feel free to schedule something on my calendar.
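The JSON pretty-printing request above can be handled defensively: attempt a parse and fall back to the raw value on failure. A sketch of the intended behavior (the UI itself would do this in its own code; the function name is illustrative):

```python
import json

def maybe_pretty_json(value):
    """If value is a string containing serialized JSON, return it
    pretty-printed; otherwise return it unchanged."""
    if not isinstance(value, str):
        return value
    try:
        # json.JSONDecodeError is a subclass of ValueError.
        return json.dumps(json.loads(value), indent=2, sort_keys=True)
    except ValueError:
        return value
```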

@neuromage
Contributor Author

/cc @paveldournov

@Bobgy
Contributor

Bobgy commented Sep 18, 2019

@neuromage

Can we show URIs in the artifact detail page?

SG, will do so

Can we make GCS URIs clickable, in both artifact detail page and artifact listings page?

I need to investigate, which page should it link to? A page on google cloud console?

If a field has serialized json, can we attempt to parse and pretty print this?

SG, will do so.

The execution list still does not show the name of the execution, and I'm unable to click on any execution except the first one still (ignore if you already fixed this)

Already fixed in #2135, I think it didn't make it to the version you tested.

@Bobgy
Contributor

Bobgy commented Sep 18, 2019

@neuromage regarding the stretch goal, can you create a separate issue for this? What would be the priority? I have other p0 issues at hand, so I will only be able to take a look after other things.

@neuromage
Contributor Author

I need to investigate, which page should it link to? A page on google cloud console?

Yes, a page showing the bucket on Pantheon would be great. Thanks!

@Bobgy
Contributor

Bobgy commented Sep 23, 2019

@neuromage Do you think there are further gaps in the UI that should be p0? Shall we close this and open a dedicated issue to track the stretch goal?

@neuromage
Contributor Author

Yes, this looks great now, thanks @Bobgy!
