Releases: tensorflow/tfx

TFX 0.22.0 Release

11 Jun 23:13
83c4806

Major Features and Improvements

  • Introduced experimental Python function component decorator (@component
    decorator under tfx.dsl.component.experimental.decorators) allowing
    Python function-based component definition.
  • Added the experimental TemplatedExecutorContainerSpec executor class that
    supports structural placeholders (not Jinja placeholders).
  • Added the experimental function "create_container_component" that
    simplifies creating container-based components.
  • Implemented a TFJS rewriter.
  • Added the scripts/run_component.py script which makes it easy to run the
    component code and executor code. (Similar to scripts/run_executor.py)
  • Added support for container component execution to BeamDagRunner.
  • Introduced experimental generic Artifact types for ML workflows.
  • Added support for float execution properties.

Bug fixes and other changes

  • Migrated BigQueryExampleGen to the new (experimental) ReadFromBigQuery
    PTransform when not using Dataflow runner.
  • Enhanced add_downstream_node / add_upstream_node to apply symmetric changes
    when being called. This method enables task-based dependencies by enforcing
    execution order for synchronous pipelines on supported platforms. Currently,
    the supported platforms are Airflow, Beam, and Kubeflow Pipelines. Note that
    this API call should be considered experimental, and may not work with
    asynchronous pipelines, sub-pipelines and pipelines with conditional nodes.
  • Added the container-based sample pipeline (download, filter, print)
  • Removed the incomplete cifar10 example.
  • Removed python-snappy from [all] extra dependency list.
  • Tests depend on apache-airflow>=1.10.10,<2.
  • Removed test dependency on tzlocal.
  • Fixed unintentional overriding of user-specified setup.py file for Dataflow
    jobs when running on KFP container.
  • Made ComponentSpec().inputs and .outputs behave more like real dictionaries.
  • Depends on kerastuner>=1,<2.
  • Depends on pyyaml>=3.12,<6.
  • Depends on apache-beam[gcp]>=2.21,<3.
  • Depends on grpcio>=2.18.1,<3.
  • Depends on kubernetes>=10.0.1,<12.
  • Depends on tensorflow>=1.15,!=2.0.*,<3.
  • Depends on tensorflow-data-validation>=0.22.0,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.1,<0.23.0.
  • Depends on tensorflow-transform>=0.22.0,<0.23.0.
  • Depends on tfx-bsl>=0.22.0,<0.23.0.
  • Depends on ml-metadata>=0.22.0,<0.23.0.
  • Fixed a bug in io_utils.copy_dir which prevented it from working correctly
    for nested sub-directories.
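
The symmetric behavior of add_downstream_node / add_upstream_node described above can be pictured with a minimal sketch. The Node class below is a hypothetical stand-in, not the TFX BaseNode class; it only illustrates the idea that linking two nodes in one direction also records the reverse link, so that execution order can be enforced without passing an artifact.

```python
class Node:
    """Hypothetical stand-in for a pipeline node; not the TFX BaseNode class."""

    def __init__(self, node_id):
        self.id = node_id
        self.upstream_nodes = set()
        self.downstream_nodes = set()

    def add_upstream_node(self, other):
        # Record the dependency on both sides so the graph stays consistent.
        self.upstream_nodes.add(other)
        other.downstream_nodes.add(self)

    def add_downstream_node(self, other):
        # Mirror image of add_upstream_node.
        self.downstream_nodes.add(other)
        other.upstream_nodes.add(self)


# Node names borrowed from the sample pipeline (download, filter, print).
download = Node("download")
filter_step = Node("filter")
print_step = Node("print")

# Task-based dependencies: only the execution order is declared.
filter_step.add_upstream_node(download)
filter_step.add_downstream_node(print_step)
```

With this shape, a runner can derive the order download, filter, print from either side of each link.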

Breaking changes

For pipeline authors

  • Changed custom config for the Do function of Trainer and Pusher to accept
    a JSON-serialized dict instead of a dict object. This also impacts all the
    Do functions under tfx.extensions.google_cloud_ai_platform and
    tfx.extensions.google_cloud_big_query_ml. Note that this breaking
    change occurs at the signature of the executor's Do function. Therefore, if
    the user did not customize the Do function, and the compile time SDK version
    is aligned with the run time SDK version, previous pipelines should still
    work as intended. If the user is using a custom component with a customized
    Do function, custom_config should be assumed to be a JSON-serialized
    string starting with the next release.
  • For users of BigQueryExampleGen, --temp_location is now a required Beam
    argument, even for DirectRunner. Previously this argument was only required
    for DataflowRunner. Note that the specified value of --temp_location
    should point to a Google Cloud Storage bucket.
  • Reverted the current per-component cache API (with enable_cache, which was
    only available in tfx>=0.21.3,<0.22), in preparation for a future redesign.
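
For authors of customized Do functions, the custom_config change amounts to decoding a JSON string before use. The following is a minimal sketch using only the standard library; the function name do_with_custom_config and the keys inside the config are hypothetical and chosen purely for illustration, and the real Do signature takes more arguments than shown here.

```python
import json


def do_with_custom_config(exec_properties):
    """Hypothetical executor body that decodes custom_config before use."""
    raw = exec_properties.get("custom_config")
    # The value now arrives as a JSON-serialized string rather than a dict.
    custom_config = json.loads(raw) if raw else {}
    return custom_config.get("training_args", {})


# The caller serializes the dict once, when building the execution properties.
exec_properties = {
    "custom_config": json.dumps(
        {"training_args": {"learning_rate": 0.01, "batch_size": 32}}
    )
}
training_args = do_with_custom_config(exec_properties)
```

Treating a missing custom_config as an empty dict keeps the executor usable when no custom configuration is supplied.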

For component authors

  • Converted the BaseNode class attributes to the constructor parameters. This
    won't affect any components derived from BaseComponent.
  • Changed the encoding of the Integer and Float artifacts to be more portable.

Documentation updates

  • Added concept guides for understanding TFX pipelines and components.
  • Added guides to building Python function-based components and
    container-based components.
  • Added BulkInferrer component and TFX CLI documentation to the table of
    contents.

Deprecations

  • Deprecating Py2 support

TFX 0.22.0-rc0

03 Jun 20:07
592d245
Pre-release

Version 0.22.0

Major Features and Improvements

  • Implemented a TFJS rewriter.
  • Introduced experimental Python function component decorator (@component
    decorator under tfx.dsl.component.experimental.decorators) allowing
    Python function-based component definition.
  • Added the experimental TemplatedExecutorContainerSpec executor class that
    supports structural placeholders (not Jinja placeholders).
  • Migrated BigQueryExampleGen to the new (experimental) ReadFromBigQuery
    PTransform when not using Dataflow runner.
  • Added the experimental function "create_container_component" that
    simplifies creating container-based components.
  • Removed the incomplete cifar10 example.
  • Enhanced add_downstream_node / add_upstream_node to apply symmetric changes
    when being called. This method enables task-based dependencies by enforcing
    execution order for synchronous pipelines on supported platforms. Currently,
    the supported platforms are Airflow, Beam, and Kubeflow Pipelines. Note that
    this API call should be considered experimental, and may not work with
    asynchronous pipelines, sub-pipelines and pipelines with conditional nodes.
  • Added Tuner component.
  • Added the container-based sample pipeline (download, filter, print)
  • Added the scripts/run_component.py script which makes it easy to run the
    component code and executor code. (Similar to scripts/run_executor.py)
  • Added support for container component execution to BeamDagRunner.
  • Introduced experimental generic Artifact types for ML workflows.

Bug fixes and other changes

  • Removed python-snappy from [all] extra dependency list.
  • Tests depend on apache-airflow>=1.10.10,<2.
  • Removed test dependency on tzlocal.
  • Fixed unintentional overriding of user-specified setup.py file for Dataflow
    jobs when running on KFP container.
  • Made ComponentSpec().inputs and .outputs behave more like real dictionaries.
  • Depends on kerastuner>=1,<2.
  • Depends on pyyaml>=3.12,<6.
  • Depends on apache-beam[gcp]>=2.21,<3.
  • Depends on grpcio>=2.18.1,<3.
  • Depends on kubernetes>=10.0.1,<12.
  • Depends on tensorflow>=1.15,!=2.0.*,<3.
  • Depends on tensorflow-data-validation>=0.22.0,<0.23.0.
  • Depends on tensorflow-model-analysis>=0.22.1,<0.23.0.
  • Depends on tensorflow-transform>=0.22.0,<0.23.0.
  • Depends on tfx-bsl>=0.22.0,<0.23.0.
  • Depends on ml-metadata>=0.22.0,<0.23.0.

Breaking changes

For pipeline authors

  • Changed custom config for the Do function of Trainer and Pusher to accept
    a JSON-serialized dict instead of a dict object. This also impacts all the
    Do functions under tfx.extensions.google_cloud_ai_platform and
    tfx.extensions.google_cloud_big_query_ml. Note that this breaking
    change occurs at the signature of the executor's Do function. Therefore, if
    the user did not customize the Do function, and the compile time SDK version
    is aligned with the run time SDK version, previous pipelines should still
    work as intended. If the user is using a custom component with a customized
    Do function, custom_config should be assumed to be a JSON-serialized
    string starting with the next release.
  • For users of BigQueryExampleGen, --temp_location is now a required Beam
    argument, even for DirectRunner. Previously this argument was only required
    for DataflowRunner. Note that the specified value of --temp_location
    should point to a Google Cloud Storage bucket.
  • Reverted the current per-component cache API (with enable_cache, which was
    only available in tfx>=0.21.3,<0.22), in preparation for a future redesign.
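
Because --temp_location is now required for BigQueryExampleGen even with DirectRunner, it can help to validate the Beam arguments before a pipeline is launched. The helper below is a hypothetical pre-flight check, not part of TFX or Beam; the project and bucket names are placeholders, and the sketch only handles the --flag=value form.

```python
def check_bigquery_beam_args(beam_pipeline_args):
    """Hypothetical pre-flight check; not part of TFX or Apache Beam."""
    for arg in beam_pipeline_args:
        if arg.startswith("--temp_location="):
            location = arg.split("=", 1)[1]
            # The release notes say the value should point to a GCS bucket.
            if not location.startswith("gs://"):
                raise ValueError(
                    "--temp_location should point to a GCS bucket: %r" % location
                )
            return location
    raise ValueError("--temp_location is required for BigQueryExampleGen")


beam_pipeline_args = [
    "--runner=DirectRunner",
    "--project=my-gcp-project",             # placeholder project id
    "--temp_location=gs://my-bucket/tmp",   # placeholder bucket
]
temp_location = check_bigquery_beam_args(beam_pipeline_args)
```

Failing early with a clear message is usually cheaper than discovering the missing argument after the Beam job has started.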

For component authors

  • Converted the BaseNode class attributes to the constructor parameters. This
    won't affect any components derived from BaseComponent.

Documentation updates

  • N/A

Deprecations

  • Deprecating Py2 support

Version 0.21.4

15 Apr 21:49
7c4f240

Major Features and Improvements

Bug fixes and other changes

  • Fixed InfraValidator signal handling bug on BeamDagRunner.
  • Dropped "Type" suffix from primitive type artifact names (Integer, Float,
    String, Bytes).

Deprecations

Breaking changes

For pipeline authors

For component authors

Documentation updates

Version 0.21.3

10 Apr 22:05
f0e403f

Version 0.21.3

Major Features and Improvements

  • Added run/pipeline link when creating runs/pipelines on KFP through TFX CLI.
  • Added support for ValueArtifact, whose attribute value allows users to
    access the content of the underlying file directly in the executor. The
    Bytes, Integer, String, and Float types are supported. Note: interactive
    resolution does not support this for now.
  • Added InfraValidator component that is used as an early warning layer
    before pushing a model into production.
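
The idea behind ValueArtifact, an artifact whose value attribute exposes the content of its underlying file, can be sketched as follows. The class below is a hypothetical illustration and not the TFX implementation; in particular, storing the integer as decimal text is an assumption made for this example.

```python
import os
import tempfile


class IntegerValueArtifact:
    """Hypothetical value-backed artifact; not the TFX ValueArtifact class."""

    def __init__(self, uri):
        self.uri = uri
        self._value = None
        self._loaded = False

    @property
    def value(self):
        # Read and decode the underlying file on first access, then cache it.
        if not self._loaded:
            with open(self.uri, "rb") as f:
                self._value = int(f.read().decode("utf-8"))
            self._loaded = True
        return self._value


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "value")
    with open(path, "wb") as f:
        f.write(b"42")  # assumed encoding: decimal text
    artifact = IntegerValueArtifact(path)
    loaded_value = artifact.value
```

An executor using such an artifact reads artifact.value instead of opening the file at artifact.uri itself.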

Bug fixes and other changes

  • Starting with this version, TFX will only release Python 3 packages.
  • Replaced relative import with absolute import in generated templates.
  • Added a native keras model in the taxi template and the template now uses
    generic Trainer.
  • Added support of TF 2.1 runtime configuration for AI Platform Prediction
    Pusher.
  • Added support for using ML Metadata ArtifactType messages as Artifact
    classes.
  • Changed CLI behavior to create new versions of pipelines instead of
    deleting and recreating them when pipelines are updated for KFP. (Requires
    kfp >= 0.3.0)
  • Added ability to enable quantization in tflite rewriter.
  • Added k8s pod labels when the pipeline is executed via KubeflowDagRunner for
    better usage telemetry.
  • Parameterized the GCP taxi pipeline sample for easily ramping up to full
    taxi dataset.
  • Added support for hyphens (dashes) in addition to underscores in CLI flags.
    Underscores remain supported.
  • Fixed ill-formed underscore in the markdown visualization when running on
    KFP.
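
Accepting both hyphens and underscores in flag names can be done by normalizing the flag names before parsing. The sketch below uses argparse purely for illustration; it is not how the TFX CLI is implemented, and the flag names are only examples.

```python
import argparse


def normalize_flags(argv):
    """Rewrite --some-flag to --some_flag, leaving flag values untouched."""
    normalized = []
    for token in argv:
        if token.startswith("--"):
            name, sep, value = token.partition("=")
            # Only the flag name after the leading "--" is rewritten.
            name = "--" + name[2:].replace("-", "_")
            token = name + sep + value
        normalized.append(token)
    return normalized


parser = argparse.ArgumentParser()
parser.add_argument("--pipeline_name")
parser.add_argument("--engine")

args_dash = parser.parse_args(
    normalize_flags(["--pipeline-name=taxi", "--engine=kubeflow"])
)
args_underscore = parser.parse_args(
    normalize_flags(["--pipeline_name=taxi", "--engine=kubeflow"])
)
```

Both spellings end up in the same parsed namespace, so downstream code does not need to know which one the user typed.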

Deprecations

Breaking changes

For pipeline authors

For component authors

Documentation updates

Release 0.21.2

25 Mar 16:44

Version 0.21.2

Major Features and Improvements

  • Updated StatisticsGen to optionally consume a schema Artifact.
  • Added support for configuring the StatisticsGen component via serializable
    parts of StatsOptions.
  • Added Keras guide doc.
  • Changed Iris model_to_estimator e2e example to use generic Trainer.
  • Demonstrated how TFLite is supported in TFX by extending MNIST example
    pipeline to also train a TFLite model.

Bug fixes and other changes

  • Fixed the behavior of Trainer Tensorboard visualization when caching is used.
  • Added component documentation and guide on using TFLite in TFX.
  • Relaxed the PyYaml dependency.

Deprecations

  • Model Validator (its functionality is now provided by the Evaluator).

Breaking changes

For pipeline authors

For component authors

Documentation updates

Release 0.21.1

05 Mar 05:07
2131e4a

Version 0.21.1

Major Features and Improvements

  • Pipelines compiled using KubeflowDagRunner now default to using the
    gRPC-based MLMD server deployed in Kubeflow Pipelines clusters when
    performing operations on pipeline metadata.
  • Added tfx model rewriting and tflite rewriter.
  • Added LatestBlessedModelResolver as an experimental feature which gets the
    latest model that was blessed by model validator.
  • The specific Artifact subclass that was serialized (if defined in the
    deserializing environment) will be used when deserializing Artifacts and
    when reading Artifacts from ML Metadata (previously, objects of the
    generic tfx.types.artifact.Artifact class were created in some cases).
  • Updated Evaluator's executor to support model validation.
  • Introduced awareness of the chief worker to Trainer's executor, for cases
    where it runs in a distributed training cluster.
  • Added a Chicago Taxi example with native Keras.
  • Updated TFLite converter to work with TF2.
  • Enabled filtering by artifact producer and output key in ResolverNode.
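
The selection performed by a resolver such as LatestBlessedModelResolver can be pictured as "filter to blessed models, then take the newest". The sketch below is a plain-Python illustration and not the TFX resolver; the record layout and the assumption that a higher id means a newer model are made up for this example.

```python
def latest_blessed_model(models, blessings):
    """Return the newest blessed model record, or None if there is none.

    models: list of dicts with hypothetical 'id' and 'uri' keys.
    blessings: dict mapping model id to True/False.
    """
    blessed = [m for m in models if blessings.get(m["id"], False)]
    # Assumption for this sketch: a higher id means a more recent model.
    return max(blessed, key=lambda m: m["id"]) if blessed else None


models = [
    {"id": 1, "uri": "/models/1"},
    {"id": 2, "uri": "/models/2"},
    {"id": 3, "uri": "/models/3"},
]
blessings = {1: True, 2: True, 3: False}
selected = latest_blessed_model(models, blessings)
```

Here model 3 is the most recent but was not blessed, so model 2 is selected.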

Bug fixes and other changes

  • Added --skaffold_cmd flag when updating a pipeline for kubeflow in CLI.
  • Changed python_version to 3.7 when using TF 1.15 and later for Cloud AI Platform Prediction.
  • Added 'tfx_runner' label for CAIP, BQML and Dataflow jobs submitted from
    TFX components.
  • Fixed the Taxi Colab notebook.
  • Adopted the generic trainer executor when using CAIP Training.
  • Depends on 'tensorflow-data-validation>=0.21.4,<0.22'.
  • Depends on 'tensorflow-model-analysis>=0.21.4,<0.22'.
  • Depends on 'tensorflow-transform>=0.21.2,<0.22'.

Deprecations

Breaking changes

  • Removed "NOT_BLESSED" artifact.
  • Changed constants ARTIFACT_PROPERTY_BLESSED_MODEL_* to ARTIFACT_PROPERTY_BASELINE_MODEL_*.

For pipeline authors

For component authors

Documentation updates

Release 0.21.0

14 Feb 17:07
8f5a13f

Version 0.21.0

Major Features and Improvements

  • Pipelines compiled using KubeflowDagRunner now default to using the
    gRPC-based MLMD server deployed in Kubeflow Pipelines clusters when
    performing operations on pipeline metadata.
  • Added tfx model rewriting and tflite rewriter.
  • Added LatestBlessedModelResolver as an experimental feature which gets the
    latest model that was blessed by model validator.
  • TFX version 0.21.0 will be the last version of TFX supporting Python 2.
  • Added support for RuntimeParameters to allow users to specify templated
    values at runtime. This is currently only supported in Kubeflow Pipelines.
    Currently, only attributes in ComponentSpec.PARAMETERS and the URI of
    external artifacts can be parameterized (component inputs / outputs cannot
    yet be parameterized). See
    tfx/examples/chicago_taxi_pipeline/taxi_pipeline_runtime_parameter.py
    for example usage.
  • Users can access the parameterized pipeline root when defining the
    pipeline by using the pipeline.ROOT_PARAMETER placeholder in
    KubeflowDagRunner.
  • Users can pass appropriately encoded Python dict objects to specify
    protobuf parameters in ComponentSpec.PARAMETERS; these will be decoded
    into the proper protobuf type. Users can avoid manually constructing complex
    nested protobuf messages in the component interface.
  • Added support in Trainer for using other model artifacts. This enables
    scenarios such as warm-starting.
  • Updated trainer executor to pass through custom config to the user module.
  • Artifact type-specific properties can be defined through overriding the
    PROPERTIES dictionary of a types.artifact.Artifact subclass.
  • Added new example of chicago_taxi_pipeline on Google Cloud Bigquery ML.
  • Added support for multi-core processing in the Flink and Spark Chicago Taxi
    PortableRunner example.
  • Added a metadata adapter in Kubeflow to support logging the Argo pod ID as
    an execution property.
  • Added a prototype Tuner component and an end-to-end iris example.
  • Created a new generic trainer executor for non-estimator-based models,
    e.g., native Keras.
  • Updated to support passing tfma.EvalConfig in evaluator when calling TFMA.
  • Users can create a pipeline using a new experimental CLI command,
    template.
  • Added an iris example with native Keras.

Bug fixes and other changes

  • Added --skaffold_cmd flag when updating a pipeline for kubeflow in CLI.
  • Changed python_version to 3.7 when using TF 1.15 and later for Cloud AI Platform Prediction.
  • Switched the default behavior of KubeflowDagRunner to not mount the GCP
    secret.
  • Fixed "invalid spec: spec.arguments.parameters[6].name 'pipeline-root' is
    not unique" error when the user includes pipeline.ROOT_PARAMETER and runs
    the pipeline on KFP.
  • Added support for an hparams artifact as an input to Trainer in
    preparation for tuner support.
  • Refactored common dependencies in the TFX dockerfile to a base image to
    improve the reliability of image building process.
  • Fixed missing Tensorboard link in KubeflowDagRunner.
  • Depends on apache-beam[gcp]>=2.17,<2.18
  • Depends on ml-metadata>=0.21,<0.22.
  • Depends on tensorflow-data-validation>=0.21,<0.22.
  • Depends on tensorflow-model-analysis>=0.21,<0.22.
  • Depends on tensorflow-transform>=0.21,<0.22.
  • Depends on tfx-bsl>=0.21,<0.22.
  • Depends on pyarrow>=0.14,<0.15.
  • Removed tf.compat.v1 usage for iris and cifar10 examples.
  • CSVExampleGen: started using the CSV decoding utilities in tfx-bsl
    (tfx-bsl>=0.15.2)
  • Fixed problems with Airflow tutorial notebooks.
  • Added performance improvements for the Transform Component (for statistics
    generation).
  • Raised exceptions when container building fails.
  • Enhanced custom slack component by adding a kubeflow example.
  • Allowed windows style paths in Transform component cache.
  • Fixed bug in CLI (--engine=kubeflow) which uses hard coded obsolete image
    (TFX 0.14.0) as the base image.
  • Fixed bug in CLI (--engine=kubeflow) which could not handle skaffold
    response when an already built image is reused.
  • Allowed users to specify the region to use when serving with AI Platform.
  • Allowed users to give deterministic job id to AI Platform Training job.
  • System-managed artifact properties ("name", "state", "pipeline_name" and
    "producer_component") are now stored as ML Metadata artifact custom
    properties.
  • Fixed loading trainer and transformation functions from python module files
    without the .py extension.
  • Fixed some ill-formed visualization when running on KFP.
  • Removed system info from artifact properties and use channels to hold info
    for generating MLMD queries.
  • Rely on MLMD context for inter-component artifact resolution and execution
    publishing.
  • Added pipeline level context and component run level context.
  • Included test data for examples/chicago_taxi_pipeline in package.
  • Changed BaseComponentLauncher to require the user to pass in an ML
    Metadata connection object instead of a ML Metadata connection config.
  • Capped version of Tensorflow runtime used in Google Cloud integration to
    1.15.
  • Updated Chicago Taxi example dependencies to Beam 2.17.0, Flink 1.9.1, Spark
    2.4.4.
  • Fixed an issue where build_ephemeral_package() used an incorrect path to
    locate the tfx directory.
  • The ImporterNode now allows specification of general artifact properties.
  • Added 'tfx_executor', 'tfx_version' and 'tfx_py_version' labels for CAIP,
    BQML and Dataflow jobs submitted from TFX components.

Deprecations

Breaking changes

For pipeline authors

  • Standard artifact TYPE_NAME strings were reconciled to match their class
    names in types.standard_artifacts.
  • The execution caching mechanism was changed to rely on ML Metadata
    pipeline context. Existing cached executions will not be reused when running
    on this version of TFX for the first time.
  • The "split" property on multiple artifacts has been replaced with the
    JSON-encoded "split_names" property on a single grouped artifact.
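
The move from per-split artifacts to a single grouped artifact with a JSON-encoded "split_names" property can be illustrated with plain dictionaries. The dict layout and the paths below are hypothetical stand-ins for artifacts; only the JSON encoding of the split list reflects the change described above.

```python
import json

# Before: one artifact per split, each tagged with a "split" property.
old_style = [
    {"uri": "/data/examples/train", "split": "train"},
    {"uri": "/data/examples/eval", "split": "eval"},
]

# After: a single grouped artifact whose "split_names" property is a
# JSON-encoded list of the splits it contains.
new_style = {
    "uri": "/data/examples",
    "split_names": json.dumps([artifact["split"] for artifact in old_style]),
}

# Consumers decode the property to recover the list of splits.
split_names = json.loads(new_style["split_names"])
```

Code that previously looked up an artifact by its "split" value now decodes split_names and locates the split within the grouped artifact.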

For component authors

  • Passing artifact type name strings to the types.artifact.Artifact and
    types.channel.Channel classes is no longer supported; such usage should
    be replaced with references to the artifact subclasses defined in
    types.standard_artifacts.* or to custom subclasses of
    types.artifact.Artifact.

Documentation updates

Release 0.21.0rc0

30 Jan 20:40
220dff9
Pre-release

Version 0.21.0rc0

Major Features and Improvements

  • TFX version 0.21.0 will be the last version of TFX supporting Python 2.
  • Added support for RuntimeParameters to allow users to specify templated
    values at runtime. This is currently only supported in Kubeflow Pipelines.
    Currently, only attributes in ComponentSpec.PARAMETERS and the URI of
    external artifacts can be parameterized (component inputs / outputs cannot
    yet be parameterized). See
    tfx/examples/chicago_taxi_pipeline/taxi_pipeline_runtime_parameter.py
    for example usage.
  • Users can access the parameterized pipeline root when defining the
    pipeline by using the pipeline.ROOT_PARAMETER placeholder in
    KubeflowDagRunner.
  • Users can pass appropriately encoded Python dict objects to specify
    protobuf parameters in ComponentSpec.PARAMETERS; these will be decoded
    into the proper protobuf type. Users can avoid manually constructing complex
    nested protobuf messages in the component interface.
  • Added support in Trainer for using other model artifacts. This enables
    scenarios such as warm-starting.
  • Updated trainer executor to pass through custom config to the user module.
  • Artifact type-specific properties can be defined through overriding the
    PROPERTIES dictionary of a types.artifact.Artifact subclass.
  • Added new example of chicago_taxi_pipeline on Google Cloud Bigquery ML.
  • Added support for multi-core processing in the Flink and Spark Chicago Taxi
    PortableRunner example.
  • Added a metadata adapter in Kubeflow to support logging the Argo pod ID as
    an execution property.
  • Added a prototype Tuner component and an end-to-end iris example.
  • Created a new generic trainer executor for non-estimator-based models,
    e.g., native Keras.
  • Updated to support passing tfma.EvalConfig in evaluator when calling TFMA.
  • Users can create a pipeline using a new experimental CLI command,
    template.

Bug fixes and other changes

  • Added support for an hparams artifact as an input to Trainer in
    preparation for tuner support.
  • Refactored common dependencies in the TFX dockerfile to a base image to
    improve the reliability of image building process.
  • Fixed missing Tensorboard link in KubeflowDagRunner.
  • Depends on apache-beam[gcp]>=2.17,<3.
  • Depends on ml-metadata>=0.21,<0.22.
  • Depends on tensorflow-data-validation>=0.21,<0.22.
  • Depends on tensorflow-model-analysis>=0.21,<0.22.
  • Depends on tensorflow-transform>=0.21,<0.22.
  • Depends on tfx-bsl>=0.21,<0.22.
  • Depends on pyarrow>=0.14,<0.15.
  • Removed tf.compat.v1 usage for iris and cifar10 examples.
  • CSVExampleGen: started using the CSV decoding utilities in tfx-bsl
    (tfx-bsl>=0.15.2)
  • Fixed problems with Airflow tutorial notebooks.
  • Added performance improvements for the Transform Component (for statistics
    generation).
  • Raised exceptions when container building fails.
  • Enhanced custom slack component by adding a kubeflow example.
  • Allowed windows style paths in Transform component cache.
  • Fixed bug in CLI (--engine=kubeflow) which uses hard coded obsolete image
    (TFX 0.14.0) as the base image.
  • Fixed bug in CLI (--engine=kubeflow) which could not handle skaffold
    response when an already built image is reused.
  • Allowed users to specify the region to use when serving with AI Platform.
  • Allowed users to give deterministic job id to AI Platform Training job.
  • System-managed artifact properties ("name", "state", "pipeline_name" and
    "producer_component") are now stored as ML Metadata artifact custom
    properties.
  • Fixed loading trainer and transformation functions from python module files
    without the .py extension.
  • Fixed some ill-formed visualization when running on KFP.
  • Removed system info from artifact properties and use channels to hold info
    for generating MLMD queries.
  • Rely on MLMD context for inter-component artifact resolution and execution
    publishing.
  • Added pipeline level context and component run level context.
  • Included test data for examples/chicago_taxi_pipeline in package.
  • Changed BaseComponentLauncher to require the user to pass in an ML
    Metadata connection object instead of a ML Metadata connection config.
  • Capped version of Tensorflow runtime used in Google Cloud integration to
    1.15.
  • Updated Chicago Taxi example dependencies to Beam 2.17.0, Flink 1.9.1, Spark
    2.4.4.
  • Fixed an issue where build_ephemeral_package() used an incorrect path to
    locate the tfx directory.
  • The ImporterNode now allows specification of general artifact properties.
  • Added 'tfx_executor', 'tfx_version' and 'tfx_py_version' labels for CAIP,
    BQML and Dataflow jobs submitted from TFX components.

Deprecations

Breaking changes

For pipeline authors

  • Standard artifact TYPE_NAME strings were reconciled to match their class
    names in types.standard_artifacts.
  • The execution caching mechanism was changed to rely on ML Metadata
    pipeline context. Existing cached executions will not be reused when running
    on this version of TFX for the first time.
  • The "split" property on multiple artifacts has been replaced with the
    JSON-encoded "split_names" property on a single grouped artifact.

For component authors

  • Passing artifact type name strings to the types.artifact.Artifact and
    types.channel.Channel classes is no longer supported; such usage should
    be replaced with references to the artifact subclasses defined in
    types.standard_artifacts.* or to custom subclasses of
    types.artifact.Artifact.

Documentation updates

Release 0.15.0

11 Nov 23:50

Version 0.15.0

Major Features and Improvements

  • Offered unified CLI for tfx pipeline actions on various orchestrators
    including Apache Airflow, Apache Beam and Kubeflow.
  • Polished experimental interactive notebook execution and visualizations so
    they are ready for use.
  • Added BulkInferrer component to TFX pipeline, and corresponding offline
    inference taxi pipeline.
  • Introduced ImporterNode as a special TFX node to register external
    resources in MLMD so that downstream nodes can use them as input artifacts.
    An example taxi_pipeline_importer.py enabled by ImporterNode was added to
    showcase the user journey of user-provided schema (issue #571).
  • Added experimental support for TFMA fairness indicator thresholds.
  • Demonstrated DirectRunner multi-core processing in Chicago Taxi example,
    including Airflow and Beam.
  • Introduced PipelineConfig and BaseComponentConfig to control the
    platform specific settings for pipelines and components.
  • Added a custom Executor of Pusher to push model to BigQuery ML for serving.
  • Added KubernetesComponentLauncher to support launching
    ExecutorContainerSpec in a Kubernetes cluster.
  • Made model validator executor forward compatible with TFMA change.
  • Added Iris flowers classification example.
  • Added support for serialization and deserialization of components.
  • Made component launcher extensible to support launching components on
    multiple platforms.
  • Simplified component package names.
  • Introduced BaseNode as the base class of any node in a TFX pipeline DAG.
  • Added docker component launcher to launch container component.
  • Added support for specifying pipeline root in runtime when run on
    KubeflowDagRunner. A default value can be provided when constructing the TFX
    pipeline.
  • Added basic span support in ExampleGen to ingest file based data sources
    that can be updated regularly by upstream.
  • Branched serving examples under chicago_taxi_pipeline/ from chicago_taxi/
    example.
  • Supported beam arg 'direct_num_workers' for multi-processing on local.
  • Improved naming of standard component inputs and outputs.
  • Improved visualization functionality in the experimental TFX notebook
    interface.
  • Allowed users to specify output file format when compiling TFX pipelines
    using KubeflowDagRunner.
  • Introduced ResolverNode as a special TFX node to resolve input artifacts for
    downstream nodes. ResolverNode is a convenient way to wrap TFX Resolver, a
    logical unit for resolving input artifacts.
  • Added cifar-10 example to demonstrate image classification.
  • Added container builder feature in the CLI tool for container-based custom
    python components. This is specifically for the Kubeflow orchestration
    engine, which requires containers built with the custom python code.
  • Added Kubeflow artifact visualization of inputs, outputs and
    execution properties for components using a Markdown file. Added Tensorboard
    to Trainer components as well.

Bug fixes and other changes

  • Bumped test dependency to kfp (Kubeflow Pipelines SDK) to be at version
    0.1.31.2.
  • Fixed trainer executor to correctly make transform_output optional.
  • Updated Chicago Taxi example dependency tensorflow to version >=1.14.0.
  • Updated Chicago Taxi example dependencies tensorflow-data-validation,
    tensorflow-metadata, tensorflow-model-analysis, tensorflow-serving-api, and
    tensorflow-transform to version >=0.14.
  • Updated Chicago Taxi example dependencies to Beam 2.14.0, Flink 1.8.1, Spark
    2.4.3.
  • Adopted new recommended way to access component inputs/outputs as
    component.outputs['output_name'] (previously, the syntax was
    component.outputs.output_name).
  • Updated Iris example to skip transform and use Keras model.
  • Fixed the check for input artifact existence in base driver.
  • Fixed a bug in AI Platform Pusher that prevented pushes after the first
    model and kept pushed models from being marked as default.
  • Replaced all usage of deprecated tensorflow.logging with absl.logging.
  • Used special user agent for all HTTP requests through googleapiclient and
    apitools.
  • Transform component updated to use tf.compat.v1 according to the TF 2.0
    upgrading procedure.
  • TFX updated to use tf.compat.v1 according to the TF 2.0 upgrading
    procedure.
  • Added Kubeflow local example pipeline that executes components in-cluster.
  • Fixed a bug that prevents updating execution type.
  • Fixed a bug in model validator driver that reads across pipeline boundaries
    when resolving latest blessed model.
  • Depended on apache-beam[gcp]>=2.16,<3
  • Depended on ml-metadata>=0.15,<0.16
  • Depended on tensorflow>=1.15,<3
  • Depended on tensorflow-data-validation>=0.15,<0.16
  • Depended on tensorflow-model-analysis>=0.15.2,<0.16
  • Depended on tensorflow-transform>=0.15,<0.16
  • Depended on 'tfx_bsl>=0.15.1,<0.16'
  • Made launcher return execution information, containing populated inputs,
    outputs, and execution id.
  • Updated the default configuration for accessing MLMD from pipelines running
    in Kubeflow.
  • Updated Airflow developer tutorial
  • CSVExampleGen: started using the CSV decoding utilities in tfx-bsl
    (tfx-bsl>=0.15.2)
  • Added documentation for Fairness Indicators.

Deprecations

  • Deprecated component_type in favor of type.
  • Deprecated component_id in favor of id.
  • Moved beam_pipeline_args out of additional_pipeline_args to be a top-level
    pipeline parameter.
  • Deprecated the chicago_taxi folder; the Beam setup scripts and serving
    examples were moved to the chicago_taxi_pipeline folder.

Breaking changes

  • Moved beam setup scripts from examples/chicago_taxi/ to
    examples/chicago_taxi_pipeline/
  • Moved interactive notebook classes into tfx.orchestration.experimental
    namespace.
  • Starting from 1.15, package tensorflow comes with GPU support. Users
    won't need to choose between tensorflow and tensorflow-gpu. If any GPU
    devices are available, processes spawned by all TFX components will try to
    utilize them; note that in rare cases, this may exhaust the memory of the
    device(s).
  • Caveat: tensorflow 2.0.0 is an exception and does not have GPU
    support. If tensorflow-gpu 2.0.0 is installed before installing
    tfx, it will be replaced with tensorflow 2.0.0.
    Re-install tensorflow-gpu 2.0.0 if needed.
  • Caveat: MLMD schema auto-upgrade is now disabled by default. Users who
    upgrade from 0.13 and do not want to lose the data in MLMD should refer to
    the MLMD documentation for a guide to upgrading or downgrading the MLMD
    database. Users who upgrade from TFX 0.14 should not be affected, since
    there is no schema change between these two versions.

For pipeline authors

  • Deprecated the usage of tf.contrib.training.HParams in Trainer, as it is
    deprecated in TF 2.0. User modules relying on member methods of that class
    will not be supported. Dot-style property access will be the only supported
    style from now on.
  • Any SavedModel produced by tf.Transform <=0.14 using any tf.contrib ops
    (or tf.Transform ops that used tf.contrib ops such as tft.quantiles,
    tft.bucketize, etc.) cannot be loaded with TF 2.0 since the contrib library
    has been removed in 2.0. Please refer to issue #838.
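
As a rough illustration of the HParams note above, the sketch below reads hyperparameters with dot-style property access only. It assumes nothing about the real Trainer internals: types.SimpleNamespace is a stand-in for whatever object TFX passes to the user module, and train_steps, eval_steps and describe_training are illustrative names.

```python
from types import SimpleNamespace

# Stand-in for the hyperparameter object handed to user module code.
# (Assumption: the real object exposes its fields as plain attributes.)
hparams = SimpleNamespace(train_steps=10000, eval_steps=5000)


def describe_training(hparams) -> str:
    # Dot-style property access, the style the note says stays supported.
    return f"train for {hparams.train_steps} steps, evaluate for {hparams.eval_steps}"


summary = describe_training(hparams)

# Member methods of tf.contrib.training.HParams, such as values(), are not
# present on the stand-in, so code that calls them needs rewriting.
has_values_method = hasattr(hparams, "values")
```

User modules that previously called HParams member methods can usually be adapted by reading the same fields as attributes, as describe_training does here.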

For component authors

Documentation updates

  • Added conceptual info on Artifacts to guide/index.md

Release 0.15.0rc0

24 Oct 03:58
Release 0.15.0rc0 Pre-release

Version 0.15.0rc0

Major Features and Improvements

  • Offered unified CLI for tfx pipeline actions on various orchestrators
    including Apache Airflow, Apache Beam and Kubeflow.
  • Polished experimental interactive notebook execution and visualizations
    so they are ready for use.
  • Added BulkInferrer component to TFX pipeline, and corresponding offline
    inference taxi pipeline.
  • Introduced ImporterNode as a special TFX node to register external
    resources in MLMD so that downstream nodes can use them as input artifacts.
    An example, taxi_pipeline_importer.py, enabled by ImporterNode was added to
    showcase the user journey of a user-provided schema (issue #571).
  • Added experimental support for TFMA fairness indicator thresholds.
  • Demonstrated DirectRunner multi-core processing in Chicago Taxi example,
    including Airflow and Beam.
  • Made model validator executor forward compatible with TFMA change.
  • Added Iris flowers classification example.
  • Added support for serialization and deserialization of components.
  • Made component launcher extensible to support launching components on
    multiple platforms.
  • Added option to use fixed Schema artifact for ExampleValidator, Transform
    and Trainer.
  • Simplified component package names.
  • Introduced BaseNode as the base class of any node in a TFX pipeline DAG.
  • Added docker component launcher to launch container component.
  • Added support for specifying the pipeline root at runtime when running on
    KubeflowDagRunner. A default value can be provided when constructing the
    TFX pipeline.
  • Added basic span support in ExampleGen to ingest file based data sources
    that can be updated regularly by upstream.
  • Branched serving examples under chicago_taxi_pipeline/ from
    chicago_taxi/ example.
  • Supported the Beam argument 'direct_num_workers' for local multi-processing.
  • Improved naming of standard component inputs and outputs.
  • Improved visualization functionality in the experimental TFX notebook
    interface.
  • Allowed users to specify output file format when compiling TFX pipelines
    using KubeflowDagRunner.
  • Introduced ResolverNode as a special TFX node to resolve input artifacts for
    downstream nodes. ResolverNode is a convenient way to wrap TFX Resolver, a
    logical unit for resolving input artifacts.
  • Added cifar-10 example to demonstrate image classification.
  • Added container builder feature in the CLI tool for container-based custom
    python components. This is specifically for the Kubeflow orchestration
    engine, which requires containers built with the custom python code.

Bug fixes and other changes

  • Bumped the kfp (Kubeflow Pipelines SDK) test dependency to version
    0.1.31.2.
  • Fixed trainer executor to correctly make transform_output optional.
  • Updated Chicago Taxi example dependency tensorflow to version >=1.14.0.
  • Updated Chicago Taxi example dependencies tensorflow-data-validation,
    tensorflow-metadata, tensorflow-model-analysis, tensorflow-serving-api, and
    tensorflow-transform to version >=0.14.
  • Updated Chicago Taxi example dependencies to Beam 2.14.0, Flink 1.8.1, Spark
    2.4.3.
  • Adopted new recommended way to access component inputs/outputs as
    component.outputs['output_name'] (previously, the syntax was
    component.outputs.output_name).
  • Updated Iris example to skip transform and use Keras model.
  • Fixed the check for input artifact existence in base driver.
  • Fixed a bug in AI Platform Pusher that prevented pushes after the first
    model and left the pushed model not marked as default.
  • Replaced all usage of deprecated tensorflow.logging with absl.logging.
  • Used special user agent for all HTTP requests through
    googleapiclient and apitools.
  • Transform component updated to use tf.compat.v1 according to the TF 2.0
    upgrading procedure.
  • TFX updated to use tf.compat.v1 according to the TF 2.0 upgrading
    procedure.
  • Added Kubeflow simple example that executes all components in-cluster.
  • Fixed a bug that prevents updating execution type.
  • Depended on apache-beam[gcp]>=2.16,<3
  • Depended on ml-metadata>=0.15,<0.16
  • Depended on tensorflow>=1.15,<3
  • Depended on tensorflow-data-validation>=0.15,<0.16
  • Depended on tensorflow-model-analysis>=0.15.2,<0.16
  • Depended on tensorflow-transform>=0.15,<0.16
  • Depended on tfx_bsl>=0.15.1,<0.16
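
The switch to dictionary-style access for component inputs and outputs mentioned in the list above can be pictured with a small stand-in. StandInComponent is hypothetical and is not the TFX component class; the "examples" key and the channel string are illustrative values only.

```python
class StandInComponent:
    """Hypothetical stand-in for a TFX component; not the real class."""

    def __init__(self, outputs):
        # Outputs are kept in a plain dict keyed by output name.
        self.outputs = dict(outputs)


example_gen = StandInComponent({"examples": "channel:examples"})

# Recommended style from the note: look the output up by name.
examples = example_gen.outputs["examples"]

# A downstream component would receive the looked-up value as an input.
downstream_inputs = {"examples": examples}
```

With real components the same lookup takes the form component.outputs['output_name'], as the note states.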

Deprecations

  • Deprecated component_type in favor of type.
  • Deprecated component_id in favor of id.
  • Moved beam_pipeline_args out of additional_pipeline_args to become a
    top-level pipeline parameter.
  • Deprecated the chicago_taxi folder; beam setup scripts and serving examples
    have moved to the chicago_taxi_pipeline folder.

Breaking changes

  • Moved beam setup scripts from examples/chicago_taxi/ to
    examples/chicago_taxi_pipeline/
  • Moved interactive notebook classes into tfx.orchestration.experimental
    namespace.

For pipeline authors

  • Deprecated the usage of tf.contrib.training.HParams in Trainer, as it is
    deprecated in TF 2.0. User modules relying on member methods of that class
    will not be supported. Dot-style property access will be the only supported
    style from now on.

For component authors

Documentation updates

  • Added conceptual info on Artifacts to guide/index.md