diff --git a/content/en/docs/pipelines/sdk/gcp.md b/content/en/docs/pipelines/sdk/gcp.md index c0e5ce39fc..18e9307afa 100644 --- a/content/en/docs/pipelines/sdk/gcp.md +++ b/content/en/docs/pipelines/sdk/gcp.md @@ -4,10 +4,6 @@ description = "SDK features that are available on Google Cloud Platform (GCP) on weight = 130 +++ -{{% alert title="Out of date" color="warning" %}} -This guide contains outdated information pertaining to Kubeflow 1.0. This guide -needs to be updated for Kubeflow 1.1. -{{% /alert %}} For pipeline features that are specific to GCP, including SDK features, see the [GCP section of the docs](/docs/gke/pipelines/). diff --git a/content/en/docs/pipelines/sdk/install-sdk.md b/content/en/docs/pipelines/sdk/install-sdk.md index 4e1018f7d4..5c45660ada 100644 --- a/content/en/docs/pipelines/sdk/install-sdk.md +++ b/content/en/docs/pipelines/sdk/install-sdk.md @@ -4,10 +4,6 @@ description = "Setting up your Kubeflow Pipelines development environment" weight = 20 +++ -{{% alert title="Out of date" color="warning" %}} -This guide contains outdated information pertaining to Kubeflow 1.0. This guide -needs to be updated for Kubeflow 1.1. -{{% /alert %}} This guide tells you how to install the [Kubeflow Pipelines SDK](https://github.com/kubeflow/pipelines/tree/master/sdk) @@ -80,16 +76,20 @@ Run the following command to install the Kubeflow Pipelines SDK: ```bash pip3 install kfp --upgrade ``` + **Note:** If you are not using a virtual environment, such as `conda`, when installing the Kubeflow Pipelines SDK, you may receive the following error: + ```bash -ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.5/dist-packages/kfp-0.2.0.dist-info' +ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.5/dist-packages/kfp-.dist-info' Consider using the `--user` option or check the permissions. ``` If you get this error, install `kfp` with the `--user` option: + ```bash pip3 install kfp --upgrade --user ``` + This command installs the `dsl-compile` and `kfp` binaries under `~/.local/bin`, which is not part of the PATH in some Linux distributions, such as Ubuntu. You can add `~/.local/bin` to your PATH by appending the following to a new line at the end of your `.bashrc` file: ```bash diff --git a/content/en/docs/pipelines/sdk/parameters.md b/content/en/docs/pipelines/sdk/parameters.md index f83d2caf3d..dbeb677aa3 100644 --- a/content/en/docs/pipelines/sdk/parameters.md +++ b/content/en/docs/pipelines/sdk/parameters.md @@ -4,10 +4,6 @@ description = "Passing data between pipeline components" weight = 70 +++ -{{% alert title="Out of date" color="warning" %}} -This guide contains outdated information pertaining to Kubeflow 1.0. This guide -needs to be updated for Kubeflow 1.1. -{{% /alert %}} The [`kfp.dsl.PipelineParam` class](https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.PipelineParam) @@ -23,7 +19,7 @@ The task output references can again be passed to other components as arguments. In most cases you do not need to construct `PipelineParam` objects manually. -The following code sample shows how to define pipeline with parameters: +The following code sample shows how to define a pipeline with parameters: ```python @kfp.dsl.pipeline( @@ -36,7 +32,9 @@ def my_pipeline( my_url: str = 'http://example.com' ): ... + # In the pipeline function body you can use the `my_num`, `my_name`, + # `my_url` arguments as PipelineParam objects. 
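+  #
+  # For example, a PipelineParam can be passed to a component, and the task's
+  # output reference can then be passed on to another component. A sketch,
+  # using hypothetical component factories that are not defined in this guide:
+  #
+  #   download_task = download_op(url=my_url)
+  #   train_op(data=download_task.outputs['data'], num_steps=my_num)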
 ```
-See more in the guide to [building a
-component](/docs/pipelines/sdk/build-component/#create-a-python-class-for-your-component).
+For more information, see the guide to
+[building components and pipelines](/docs/pipelines/sdk/build-component/#create-a-python-class-for-your-component).
\ No newline at end of file
diff --git a/content/en/docs/pipelines/sdk/static-type-checking.md b/content/en/docs/pipelines/sdk/static-type-checking.md
index 92d7a9eed0..0fdcb96f41 100644
--- a/content/en/docs/pipelines/sdk/static-type-checking.md
+++ b/content/en/docs/pipelines/sdk/static-type-checking.md
@@ -4,10 +4,6 @@ description = "Statically check the component I/O types"
 weight = 100
 +++
-{{% alert title="Out of date" color="warning" %}}
-This guide contains outdated information pertaining to Kubeflow 1.0. This guide
-needs to be updated for Kubeflow 1.1.
-{{% /alert %}}
 This page describes how to integrate the type information in the pipeline and use static type checking for fast development iterations.
@@ -37,6 +33,7 @@ In the component YAML, types are specified as a string or a dictionary with the
 "*component a*" expects an input of type Integer and emits three outputs with the types GCSPath, customized_type, and GCRPath. Among these types, Integer, GCSPath, and GCRPath are core types that are predefined in the SDK while customized_type is a user-defined type.
+
 ```yaml
 name: component a
 description: component desc
@@ -58,55 +55,80 @@ implementation:
       field_n: /feature.txt
       field_o: /output.txt
 ```
+
 Similarly, when you write a component with the decorator, you can annotate I/O with types in the function signature, as shown below.
 ```python
 from kfp.dsl import component
 from kfp.dsl.types import Integer, GCRPath
+
+
 @component
-def task_factory_a(field_l: Integer()) -> {'field_m': {'GCSPath': {'openapi_schema_validator': '{"type": "string", "pattern": "^gs://.*$"}'}},
-                                           'field_n': 'customized_type',
-                                           'field_o': GCRPath()
-                                          }:
-    return ContainerOp(
-        name = 'operator a',
-        image = 'gcr.io/ml-pipeline/component-a',
-        command = [
-            'python3', '/pipelines/component/src/train.py'
-        ],
-        arguments = [
-            '--field-l', field_l,
-        ],
-        file_outputs = {
-            'field_m': '/schema.txt',
-            'field_n': '/feature.txt',
-            'field_o': '/output.txt'
+def task_factory_a(field_l: Integer()) -> {
+    'field_m': {
+        'GCSPath': {
+            'openapi_schema_validator':
+                '{"type": "string", "pattern": "^gs://.*$"}'
         }
-    )
+    },
+    'field_n': 'customized_type',
+    'field_o': GCRPath()
+}:
+  return ContainerOp(
+      name='operator a',
+      image='gcr.io/ml-pipeline/component-a',
+      command=['python3', '/pipelines/component/src/train.py'],
+      arguments=[
+          '--field-l',
+          field_l,
+      ],
+      file_outputs={
+          'field_m': '/schema.txt',
+          'field_n': '/feature.txt',
+          'field_o': '/output.txt'
+      })
 ```
+
 You can also annotate pipeline inputs with types, and the inputs are checked against the component I/O types as well.
 For example,
+
 ```python
 @component
-def task_factory_a(field_m: {'GCSPath': {'openapi_schema_validator': '{"type": "string", "pattern": "^gs://.*$"}'}}, field_o: 'Integer'):
-    return ContainerOp(
-        name = 'operator a',
-        image = 'gcr.io/ml-pipeline/component-a',
-        arguments = [
-            '--field-l', field_m,
-            '--field-o', field_o,
-        ],
-    )
+def task_factory_a(
+    field_m: {
+        'GCSPath': {
+            'openapi_schema_validator':
+                '{"type": "string", "pattern": "^gs://.*$"}'
+        }
+    }, field_o: 'Integer'):
+  return ContainerOp(
+      name='operator a',
+      image='gcr.io/ml-pipeline/component-a',
+      arguments=[
+          '--field-l',
+          field_m,
+          '--field-o',
+          field_o,
+      ],
+  )
+
 # Pipeline input types are also checked against the component I/O types.
-@dsl.pipeline(name='type_check',
-              description='')
-def pipeline(a: {'GCSPath': {'openapi_schema_validator': '{"type": "string", "pattern": "^gs://.*$"}'}}='good', b: Integer()=12):
-    task_factory_a(field_m=a, field_o=b)
+@dsl.pipeline(name='type_check', description='')
+def pipeline(
+    a: {
+        'GCSPath': {
+            'openapi_schema_validator':
+                '{"type": "string", "pattern": "^gs://.*$"}'
+        }
+    } = 'good',
+    b: Integer() = 12):
+  task_factory_a(field_m=a, field_o=b)
+
 try:
-  compiler.Compiler().compile(pipeline, 'pipeline.tar.gz', type_check=True)
+    compiler.Compiler().compile(pipeline, 'pipeline.tar.gz', type_check=True)
 except InconsistentTypeException as e:
-  print(e)
+    print(e)
 ```
 ## How does the type checking work?
@@ -127,26 +149,35 @@ If inconsistent types are detected, it throws an [InconsistentTypeException](htt
 Type checking is enabled by default and it can be disabled in two ways:
 If you compile the pipeline programmatically:
+
 ```python
 compiler.Compiler().compile(pipeline_a, 'pipeline_a.tar.gz', type_check=False)
 ```
+
 If you compile the pipeline using the dsl-compiler tool:
+
 ```bash
 dsl-compiler --py pipeline.py --output pipeline.zip --disable-type-check
 ```
+
 ### Fine-grained configuration
 Sometimes, you might want to enable type checking but disable it for certain arguments. For example, when the upstream component generates an output with type "*Float*" and the downstream can ingest either "*Float*" or "*Integer*", it might fail if you define the type as "*Float_or_Integer*". Disabling the type checking per argument is also supported, as shown below.
+
 ```python
-@dsl.pipeline(name='type_check_a',
-              description='')
+@dsl.pipeline(name='type_check_a', description='')
 def pipeline():
-    a = task_factory_a(field_l=12)
-    # For each of the arguments, you can also ignore the types by calling ignore_type function.
-    b = task_factory_b(field_x=a.outputs['field_n'], field_y=a.outputs['field_o'], field_z=a.outputs['field_m'].ignore_type())
+  a = task_factory_a(field_l=12)
+  # For each of the arguments, you can also ignore the types by calling
+  # ignore_type function.
+  b = task_factory_b(
+      field_x=a.outputs['field_n'],
+      field_y=a.outputs['field_o'],
+      field_z=a.outputs['field_m'].ignore_type())
+
 compiler.Compiler().compile(pipeline, 'pipeline.tar.gz', type_check=True)
 ```
@@ -158,4 +189,6 @@ type checking would still fail if some I/Os lack the type information and some I/O
 ## Next steps
-* See [type checking sample](https://github.com/kubeflow/pipelines/blob/master/samples/core/dsl_static_type_checking/dsl_static_type_checking.ipynb).
+Learn how to define a Kubeflow pipeline with the Python DSL and compile it
+with type checking in this
+[Jupyter notebook demo](https://github.com/kubeflow/pipelines/blob/master/samples/core/dsl_static_type_checking/dsl_static_type_checking.ipynb).
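+
+If you want a quick feel for the end-to-end flow before opening the notebook,
+the following is a minimal sketch. It assumes the pre-2.0 `kfp` SDK used
+elsewhere on this page (including the `kfp.dsl.types` import location for
+`InconsistentTypeException`), and the component images and output paths are
+placeholders.
+
+```python
+import kfp.dsl as dsl
+import kfp.compiler as compiler
+from kfp.dsl import ContainerOp, component
+from kfp.dsl.types import Integer, InconsistentTypeException
+
+
+@component
+def producer_op() -> {'field_a': Integer()}:
+  # Hypothetical component that emits one Integer-typed output.
+  return ContainerOp(
+      name='producer',
+      image='gcr.io/my-project/producer',  # placeholder image
+      file_outputs={'field_a': '/out.txt'})
+
+
+@component
+def consumer_op(field_a: 'customized_type'):
+  # The input type deliberately differs from the producer's Integer output.
+  return ContainerOp(
+      name='consumer',
+      image='gcr.io/my-project/consumer',  # placeholder image
+      arguments=['--field-a', field_a])
+
+
+@dsl.pipeline(name='type_check_demo', description='')
+def pipeline():
+  a = producer_op()
+  consumer_op(field_a=a.outputs['field_a'])
+
+
+try:
+  compiler.Compiler().compile(pipeline, 'pipeline.tar.gz', type_check=True)
+except InconsistentTypeException as e:
+  # The Integer vs. customized_type mismatch is reported here at compile time.
+  print(e)
+```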