
[v2compat] configurable default pipeline root #5704

Closed
Bobgy opened this issue May 20, 2021 · 2 comments · Fixed by #5750

@Bobgy
Contributor

Bobgy commented May 20, 2021

Quoted from #4649 (comment)
Blocks #5680

FYI, KFP v2 compatible mode has been released, see documentation: https://www.kubeflow.org/docs/components/pipelines/sdk/v2/.

It doesn't support artifact repository configuration; this is one of the things we want to support as well. So I'm posting early thoughts on this related issue.

Let me first try to summarize the requirements for configuring artifact repositories in both KFP v2 compatible mode and KFP v2.

Object-store-specific credential requirements

For GCS and AWS S3, we suggest setting up credentials so that they represent the identity of the pipeline step; that way, not only artifact upload/download but also calls to other Cloud services use the same credentials. For this reason, we don't recommend putting credentials in the artifact repository config. The suggestion is to configure the identity transparently where possible, using GCP Workload Identity or AWS IRSA. If credentials are really necessary, they can be configured in the pipeline DSL via kfp.gcp.use_gcp_secret, kfp.aws.use_aws_secret, etc. These principles should apply to any other cloud provider whose credentials can be used with all of its services.

For an on-prem object store like MinIO, the credentials do not represent an identity; they are only used to access a specific object store instance. Therefore, it's reasonable to include them in the artifact repository config.

In summary:

  • For GCS, only the pipeline root needs to be configurable.
  • For AWS S3, besides the pipeline root, we also need region, endpoint, etc. to be configurable.
  • For MinIO or similar on-prem object stores, besides the pipeline root, we also need endpoint and credentials to be configurable.
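To make the summary concrete, here is a minimal Go sketch that checks which settings each kind of object store needs besides the pipeline root. The `ArtifactRepoConfig` type, its fields, and `missingFields` are hypothetical names for illustration, not part of KFP; the required-field table simply mirrors the list above.

```go
package main

import "fmt"

// ArtifactRepoConfig is a hypothetical config shape used only for this
// illustration; it is not an existing KFP type.
type ArtifactRepoConfig struct {
	PipelineRoot string // e.g. "gs://my-bucket/pipeline-root"
	Region       string // needed for AWS S3
	Endpoint     string // needed for AWS S3 and MinIO
	Credentials  string // hypothetical Kubernetes secret name; MinIO only
}

// requiredFields mirrors the summary: settings each store needs
// in addition to the pipeline root.
var requiredFields = map[string][]string{
	"gcs":   {},
	"s3":    {"Region", "Endpoint"},
	"minio": {"Endpoint", "Credentials"},
}

// missingFields reports which required settings are empty in cfg.
func missingFields(store string, cfg ArtifactRepoConfig) ([]string, error) {
	fields, ok := requiredFields[store]
	if !ok {
		return nil, fmt.Errorf("unknown object store %q", store)
	}
	values := map[string]string{
		"Region":      cfg.Region,
		"Endpoint":    cfg.Endpoint,
		"Credentials": cfg.Credentials,
	}
	var missing []string
	// Every store needs a pipeline root.
	if cfg.PipelineRoot == "" {
		missing = append(missing, "PipelineRoot")
	}
	for _, f := range fields {
		if values[f] == "" {
			missing = append(missing, f)
		}
	}
	return missing, nil
}

func main() {
	missing, err := missingFields("minio", ArtifactRepoConfig{
		PipelineRoot: "minio://my-bucket/pipeline-root",
		Endpoint:     "minio-service.kubeflow:9000",
	})
	fmt.Println(missing, err) // [Credentials] <nil>
}
```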

We cannot implement a spec for every possible object store, so we should likely use the same spec that Go CDK supports or depend on cloud provider contributions.

Go CDK supports provider-specific query params for configuring settings other than the object key. We might consider adopting these query params so that the pipeline root is more expressive; then we might not need other configuration.
For example, for S3 the region can be configured via a query param: https://gocloud.dev/howto/blob/#s3

s3://my-bucket?region=us-west-1
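As a rough illustration of how much a single pipeline root URL can carry, the following Go sketch splits one into scheme, bucket, key prefix, and provider-specific query params using only the standard library. `PipelineRoot` and `parsePipelineRoot` are hypothetical; in practice the full URL would more likely be handed to Go CDK (e.g. `blob.OpenBucket` in `gocloud.dev/blob`), which interprets the query params per provider.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// PipelineRoot is a hypothetical parsed form of a pipeline root URL.
type PipelineRoot struct {
	Scheme string            // "gs", "s3", "minio", ...
	Bucket string            // URL host, e.g. "my-bucket"
	Prefix string            // object key prefix under the bucket
	Params map[string]string // provider-specific settings, e.g. region
}

// parsePipelineRoot splits a root like "s3://bucket/prefix?region=..."
// into its parts.
func parsePipelineRoot(raw string) (PipelineRoot, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return PipelineRoot{}, err
	}
	if u.Scheme == "" || u.Host == "" {
		return PipelineRoot{}, fmt.Errorf("pipeline root %q must look like scheme://bucket[/prefix]", raw)
	}
	params := map[string]string{}
	for k, v := range u.Query() {
		params[k] = v[0] // keep the first value of each query param
	}
	return PipelineRoot{
		Scheme: u.Scheme,
		Bucket: u.Host,
		Prefix: strings.TrimPrefix(u.Path, "/"),
		Params: params,
	}, nil
}

func main() {
	root, err := parsePipelineRoot("s3://my-bucket?region=us-west-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(root.Scheme, root.Bucket, root.Params["region"]) // s3 my-bucket us-west-1
}
```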

How do we configure the pipeline root, other configs, and credentials?

Ideally, pipeline root can include all other configs, so that we can uniquely identify an artifact.
Ideally, credentials should be configurable transparently.

When both ideal requirements are met, we only need to support a namespace-level default pipeline root. All other configurations can be done by specifying different pipeline roots.

However, MinIO currently violates the requirement that credentials be configured transparently. Therefore, we need a mechanism to either:

  • configure which credentials are used with which pipeline root (probably by writing rules such as which pipeline_root prefix/query param uses which credentials)
  • or configure credentials together with the pipeline root as an artifact repository (but then we should specify artifact repos, not pipeline roots)
  • or ask users to configure credentials separately from pipeline_root
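The first option could look roughly like the following Go sketch, which picks credentials by the longest matching pipeline_root prefix. `CredentialRule`, `credentialsFor`, and the secret names are hypothetical; matching on query params, which the first option also mentions, is left out to keep the sketch short. An empty result stands for "no explicit credentials", i.e. relying on transparently configured identity such as Workload Identity or IRSA.

```go
package main

import (
	"fmt"
	"strings"
)

// CredentialRule is a hypothetical rule: pipeline roots starting with
// Prefix use the Kubernetes secret named SecretName.
type CredentialRule struct {
	Prefix     string
	SecretName string
}

// credentialsFor returns the secret of the most specific (longest)
// matching prefix, or "" when no rule matches.
func credentialsFor(pipelineRoot string, rules []CredentialRule) string {
	best, bestLen := "", -1
	for _, r := range rules {
		if strings.HasPrefix(pipelineRoot, r.Prefix) && len(r.Prefix) > bestLen {
			best, bestLen = r.SecretName, len(r.Prefix)
		}
	}
	return best
}

func main() {
	rules := []CredentialRule{
		{Prefix: "minio://", SecretName: "default-minio-creds"},
		{Prefix: "minio://team-a-bucket/", SecretName: "team-a-minio-creds"},
	}
	fmt.Println(credentialsFor("minio://team-a-bucket/root", rules)) // team-a-minio-creds
	fmt.Println(credentialsFor("gs://my-bucket/root", rules) == "")  // true
}
```

Longest-prefix matching lets a broad rule (all `minio://` roots) coexist with a more specific per-bucket override without depending on rule order.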

We probably need more thought on the exact config format; this seems like a complex problem.

@Bobgy
Contributor Author

Bobgy commented May 20, 2021

/assign @capri-xiyue

@Bobgy Bobgy changed the title Configurable default pipeline root [v2compat] configurable default pipeline root May 20, 2021
@Bobgy Bobgy added the size/M label May 24, 2021
@Bobgy Bobgy assigned Bobgy and unassigned capri-xiyue May 25, 2021
@Bobgy
Contributor Author

Bobgy commented May 28, 2021

Discussed with @capri-xiyue, we agree on:

For this issue, we only target P0 items.

google-oss-robot pushed a commit that referenced this issue Jun 2, 2021
…ot. Part of #5680. Fixes #5704 (#5750)

* feat(deployment): configurable v2 compatible mode default pipeline root

* clarify documentation