
v2 compatible on full Kubeflow #5680

Closed
Bobgy opened this issue May 19, 2021 · 10 comments · Fixed by #5697


Bobgy commented May 19, 2021

UPDATE:
PRs have been written:

===

This issue tracks efforts to make v2 compatible pipelines run on full Kubeflow.

Depends on #4649

The major differences from KFP standalone are:

  • MLMD grpc service is in kubeflow namespace by default.
  • MinIO service is in kubeflow namespace by default.
  • In Kubeflow on GCP, the default bucket is on GCS, so it's not mlpipeline.

For the MLMD gRPC service, we are already reading a configmap called metadata-grpc-configmap, and it is already configured in full Kubeflow for each user namespace:

{
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "metadata-grpc-configmap",
        "namespace": namespace,
    },
    "data": {
        "METADATA_GRPC_SERVICE_HOST": "metadata-grpc-service.kubeflow",
        "METADATA_GRPC_SERVICE_PORT": "8080",
    },
}

Therefore, it already works out of the box.

Because the default bucket in Kubeflow on GCP is not mlpipeline, we need configuration that can set the default pipeline root and credentials for each namespace. This is essentially #4649 for v2 compatible pipelines.

The following comments are investigation logs towards the conclusion I summarized here.


Bobgy commented May 19, 2021

/assign @Bobgy


Bobgy commented May 19, 2021

I'm trying to run a v2 compatible pipeline on full Kubeflow. As expected, it fails when uploading to MinIO storage:

F0519 07:16:43.219511       1 main.go:56] Failed to execute component: failed to upload output artifact "output_dataset_one" to remote storage URI "minio://mlpipeline/v2/artifacts/two_step_pipeline/two-step-pipeline-z9pxp/preprocess/output_dataset_one": uploadFile(): unable to complete copying "/minio/mlpipeline/v2/artifacts/two_step_pipeline/two-step-pipeline-z9pxp/preprocess/output_dataset_one" to remote storage "two_step_pipeline/two-step-pipeline-z9pxp/preprocess/output_dataset_one": failed to close Writer for bucket: blob (key "two_step_pipeline/two-step-pipeline-z9pxp/preprocess/output_dataset_one") (code=Unknown): RequestError: send request failed
caused by: Put "http://minio-service:9000/mlpipeline/two_step_pipeline/two-step-pipeline-z9pxp/preprocess/output_dataset_one": dial tcp: lookup minio-service on 10.11.240.10:53: no such host


Bobgy commented May 19, 2021

Interesting: when I switch pipeline_root to a GCS path like gs://gongyuan-dev/v2/, the pipeline runs successfully, so finding the MLMD gRPC service host was not a problem.

I checked how that's configured -- the MLMD gRPC service host and port come from k8s env vars:

command:
...
- '--mlmd_server_address'
- $(METADATA_GRPC_SERVICE_HOST)
- '--mlmd_server_port'
- $(METADATA_GRPC_SERVICE_PORT)

EDIT: I was wrong; the env vars come from a configmap, see:

        envFrom:
        - configMapRef:
            name: metadata-grpc-configmap
            optional: true


Bobgy commented May 19, 2021

Because configmaps need to be deployed, I propose a simpler default mechanism:

  1. Check whether env vars like MINIO_SERVICE_SERVICE_HOST and MINIO_SERVICE_SERVICE_PORT exist. If they do, we are running in KFP single-namespace mode and can use them to connect to the MinIO service. (Because the service is called minio-service, its injected env vars contain the word SERVICE twice, as in MINIO_SERVICE_SERVICE_HOST; we should avoid the -service suffix in service names in the future.)
  2. If the env vars do not exist, assume we are running in KFP multi-user mode, so the default MinIO service should be minio-service.kubeflow:9000.
  3. If that also fails, report the failure to the user.

Going forward, a proper configuration for artifact storage will need to be introduced -- that's when we can make this configurable.
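
For reference, once this becomes configurable, pinning the endpoint explicitly on the launcher container could look something like this (purely a hypothetical sketch reusing the env var names discussed above, which Kubernetes would otherwise inject automatically):

        env:
        - name: MINIO_SERVICE_SERVICE_HOST
          value: minio-service.kubeflow   # assumed default for multi-user mode
        - name: MINIO_SERVICE_SERVICE_PORT
          value: "9000"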


Bobgy commented May 19, 2021

UPDATE: after trying this out, I found that this doesn't work on Kubeflow on GCP, because we do not use the bucket mlpipeline by default -- instead, we use a GCS bucket.

Therefore, I think the current approach is probably fine for full Kubeflow on other platforms with MinIO, but not a good fit for Kubeflow on GCP.
We still need an easy-to-use configuration for the default bucket + artifact store + credentials (if MinIO).


Bobgy commented May 24, 2021

So actually, configurability of credentials is not a must-have for making v2 compatible mode run on full Kubeflow.

It's enough to make the default pipeline root configurable. For KF on GCP, we can default to a GCS path. For others, using minio://mlpipeline is enough.
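
For example, the per-namespace default could be expressed as a configmap along these lines (just a sketch of the idea; the kfp-launcher name and defaultPipelineRoot key are placeholders, not necessarily the final design):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kfp-launcher              # placeholder name
      namespace: <user-namespace>
    data:
      defaultPipelineRoot: gs://my-project-bucket/v2/artifacts   # or minio://mlpipeline/v2/artifacts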

Bobgy added the size/L label May 24, 2021
google-oss-robot pushed a commit that referenced this issue Jun 2, 2021
…Fixes #5680 (#5697)

* feat(v2): enable v2 compatible mode in full Kubeflow

* fix

* fix

* move objectstore reelated code to separate package

* fix style
google-oss-robot pushed a commit that referenced this issue Jun 2, 2021
…ot. Part of #5680. Fixes #5704 (#5750)

* feat(deployment): configurable v2 compatible mode default pipeline root

* clarify documentation

Bobgy commented Sep 7, 2021

/reopen
we need to keep track of integration with Kubeflow 1.4

@google-oss-robot

@Bobgy: Reopened this issue.

In response to this:

/reopen
we need to keep track of integration with Kubeflow 1.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Bobgy commented Sep 7, 2021

I think one issue I previously missed is that we need to make the new task API reachable from pipeline pods. That requires some Istio authorization policy changes.
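
Roughly, something like the following Istio AuthorizationPolicy in the kubeflow namespace (only a sketch to illustrate the idea; the policy name, selector labels, and task API path here are assumptions, not the final policy):

    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: ml-pipeline-allow-task-api    # placeholder name
      namespace: kubeflow
    spec:
      selector:
        matchLabels:
          app: ml-pipeline                # assumes the KFP API server pods carry this label
      rules:
      - to:
        - operation:
            paths: ["/apis/v1beta1/tasks*"]   # assumed path prefix of the new task API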


Bobgy commented Sep 7, 2021

TODOs:
