# Integration with MLflow

MLflow tracking is activated when the `mlflow` package is installed (`pip install mlflow` or `conda install mlflow`) and the tracking URI is set.

To set up MLflow tracking, set the `MLFLOW_TRACKING_URI` environment variable to a tracking server's URI or call `mlflow.set_tracking_uri()`.
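For instance, a minimal sketch of both options (the server URL below is a placeholder, not a real endpoint):

```python
import os

import mlflow

# Option 1: set the environment variable before launching the training job
os.environ["MLFLOW_TRACKING_URI"] = "https://my-mlflow-server:5000"  # placeholder URI

# Option 2: set the tracking URI programmatically
mlflow.set_tracking_uri("https://my-mlflow-server:5000")  # placeholder URI
```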

The MLflow example can be found here.

Currently tf-yarn logs the following metrics by default:

* Learning speed of the chief only (steps/sec)
* Statistics about the evaluator (awake/idle ratio, mean evaluation step duration in seconds)

For MLflow artifact logging to HDFS, you also need PyArrow (`pip install pyarrow` or `conda install pyarrow`).

tf-yarn adds the following artifacts by default:

* Container duration times
* Container log URLs & final statuses

Distributed metrics are implemented via TensorFlow hooks. You can add your own metrics by adding new hooks and choose whether to log from all nodes or only from specific nodes (chief, worker, ...) via the `tf_yarn.cluster` module.

For example, to log a metric from the evaluator only, you can call:

```python
import tensorflow as tf

from tf_yarn import cluster, mlflow


class MyHook(tf.train.SessionRunHook):
    ...

    def after_run(self, run_context, run_values):
        # Log only from the evaluator node
        if cluster.is_evaluator():
            mlflow.log_tag(...)
```

An example hook logging steps/sec can be found in StepPerSecondHook.
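As a rough illustration of the same idea (not the library's StepPerSecondHook itself), the sketch below counts steps between calls and logs steps/sec as an MLflow metric. `cluster.is_chief()` is an assumed helper, used here analogously to `cluster.is_evaluator()` above.

```python
import time

import tensorflow as tf
import mlflow

from tf_yarn import cluster


class StepsPerSecondHook(tf.train.SessionRunHook):
    """Sketch: periodically logs steps/sec from the chief to MLflow."""

    def __init__(self, every_n_steps=100):
        self._every_n_steps = every_n_steps
        self._step = 0
        self._last_time = None

    def after_create_session(self, session, coord):
        self._last_time = time.time()

    def after_run(self, run_context, run_values):
        self._step += 1
        # cluster.is_chief() is assumed to exist, by analogy with cluster.is_evaluator()
        if self._step % self._every_n_steps == 0 and cluster.is_chief():
            now = time.time()
            steps_per_sec = self._every_n_steps / (now - self._last_time)
            self._last_time = now
            mlflow.log_metric("steps_per_sec", steps_per_sec, step=self._step)
```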