[Testing] postsubmit mkp test failure 2021.3.4 #5236

Bobgy · 2021-03-04T02:50:29Z

First failing mkp test: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-mkp-e2e-test/1365452157549023232
Root cause seems to be: #5158

Bobgy · 2021-03-04T02:56:41Z

The failure: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-mkp-e2e-test/1365452157549023232#1:build-log.txt%3A7816

Step #1 - "verify": 37s Warning FailedScheduling pod/ml-pipeline-55bbc45946-m7s66 0/3 nodes are available: 2 Insufficient memory, 3 Insufficient cpu.

Bobgy · 2021-03-04T02:57:39Z

It seems that the added resource requests make it impossible to schedule the ml-pipeline pod in the cluster.

Bobgy · 2021-03-04T03:15:26Z

The default cluster for marketplace has 3 nodes, each with 2 CPUs and 3GB memory allocatable.
So the new requirements set in #5158 seem too large: they request 4GB of memory for the ml-pipeline server. I assume they were sized for a large-scale production env.

We need to reduce the request to fit into the default cluster.
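To make the arithmetic concrete, here is a minimal sketch of the per-node fit check, assuming an otherwise empty cluster (the real nodes also run other pods, which is why the log reports insufficient CPU as well). The `Node` type, the `schedulable_nodes` helper, and the CPU and reduced-memory figures are hypothetical illustrations, not values taken from the manifests.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Allocatable resources still free on the node (hypothetical model).
    free_cpu_millicores: int
    free_memory_mib: int


def schedulable_nodes(nodes, cpu_request_m, memory_request_mib):
    """Return the nodes on which a pod with the given requests would fit.

    A pod fits only if a single node satisfies every request at once;
    requests cannot be split across nodes.
    """
    return [
        n for n in nodes
        if n.free_cpu_millicores >= cpu_request_m
        and n.free_memory_mib >= memory_request_mib
    ]


# Default marketplace cluster from this thread: 3 nodes, 2 CPUs and 3GB each.
nodes = [Node(free_cpu_millicores=2000, free_memory_mib=3 * 1024) for _ in range(3)]

# A 4GB memory request exceeds every node's 3GB, whatever the CPU request is.
print(len(schedulable_nodes(nodes, cpu_request_m=500, memory_request_mib=4 * 1024)))

# A smaller (hypothetical) request fits on all three nodes.
print(len(schedulable_nodes(nodes, cpu_request_m=250, memory_request_mib=500)))
```

The point is that total cluster memory (9GB here) does not help: a single 4GB request has to fit on one 3GB node.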

Bobgy · 2021-03-04T03:15:40Z

/cc @NikeNano

Bobgy · 2021-03-04T03:15:47Z

/assign

I'll fix this

Bobgy · 2021-03-04T03:17:16Z

Another issue to address: the mkp test should be triggered on presubmit if a PR touches the MKP manifest. Let me add the auto trigger.
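For context, Prow's `run_if_changed` takes a regular expression and runs the job when any file changed by the PR matches it. The sketch below approximates that decision with Python's `re` module (Prow itself uses Go regular expressions); the `should_trigger` helper and the pattern are hypothetical and are not the configuration that was actually added.

```python
import re


def should_trigger(run_if_changed, changed_files):
    """Approximate Prow's rule: run if any changed path matches the regex."""
    pattern = re.compile(run_if_changed)
    # search() is unanchored, so the pattern must anchor itself with ^ if needed.
    return any(pattern.search(path) for path in changed_files)


# Hypothetical pattern covering a marketplace manifest directory.
MKP_PATTERN = r"^manifests/gcp_marketplace/"

print(should_trigger(MKP_PATTERN, ["manifests/gcp_marketplace/chart/values.yaml"]))
print(should_trigger(MKP_PATTERN, ["frontend/src/App.tsx", "README.md"]))
```

With a trigger like this, a PR that changes resource requests in the marketplace manifest would run the mkp test before merge instead of failing on postsubmit.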

Bobgy · 2021-03-04T04:26:12Z

Some reading on requests & limits: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits

Bobgy · 2021-03-04T04:43:38Z

I looked into how other OSS projects handle this. Argo doesn't provide default requests/limits for the most part: https://github.com/argoproj/argo-workflows/tree/master/manifests.
It has a doc page on cost optimization instead: https://argoproj.github.io/argo-workflows/cost-optimisation/.

Bobgy · 2021-03-04T05:22:57Z

I think we'll need operator documentation that tells people how to adjust their resource requests, but maybe as a next step.
Adding some default requests can only make things better than before: when a request isn't set, it's assumed to be 0. Also, when usage reaches the requested amount, the Pod is not killed, as it would be at a limit.
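The request/limit distinction above can be sketched roughly as follows. This is a simplified model for memory only, under my own assumptions: `effective_request` and `memory_outcome` are hypothetical helpers, and real behavior also depends on node pressure and eviction policy.

```python
def effective_request(resources, name):
    """Return the request the scheduler would account for, in MiB.

    If only a limit is set, Kubernetes defaults the request to the limit;
    if neither is set, the container is treated as requesting 0.
    """
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if name in requests:
        return requests[name]
    if name in limits:
        return limits[name]
    return 0


def memory_outcome(usage_mib, request_mib=None, limit_mib=None):
    """Simplified view of what happens to a container at a given memory usage."""
    if limit_mib is not None and usage_mib > limit_mib:
        return "oom-killed"  # exceeding a memory limit kills the container
    if request_mib is not None and usage_mib > request_mib:
        return "running above request"  # allowed; a request is not a cap
    return "running"


print(effective_request({}, "memory"))
print(effective_request({"requests": {"memory": 500}}, "memory"))
print(memory_outcome(800, request_mib=500))
print(memory_outcome(800, request_mib=500, limit_mib=600))
```

A container running above its request keeps running, but it may be among the first evicted if the node runs short of memory.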

NikeNano · 2021-03-04T18:42:57Z

> I think we'll need an operator manual documentation to tell people how to adjust their resource requests, but maybe as a next step.

I think this sounds good, especially if we could also provide some ballpark figures. However, I guess if we set them too high, we will request more resources than most people actually use.

Bobgy · 2021-04-01T02:07:40Z

Caused by #5148

Bobgy changed the title ~~[Testing] postsubmit integration and mkp test failure 2021.3.4~~ [Testing] postsubmit mkp test failure 2021.3.4 Mar 4, 2021

google-oss-robot assigned Bobgy Mar 4, 2021

Bobgy mentioned this issue Mar 4, 2021

test(kfp): add run_if_changed config for mkp presubmit GoogleCloudPlatform/oss-test-infra#724

Merged

Bobgy mentioned this issue Mar 4, 2021

deployment: adjust default resource requests. Fixes #5236 #5237

Merged

2 tasks

Bobgy closed this as completed in #5237 Mar 4, 2021

Bobgy mentioned this issue Apr 1, 2021

[FR] Default resource requirement/limits for the KFP UI and system services #5148

Closed
