[Testing] postsubmit mkp test failure 2021.3.4 #5236

Closed
Bobgy opened this issue Mar 4, 2021 · 11 comments · Fixed by #5237

Bobgy (Contributor) commented Mar 4, 2021

First failing mkp test: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-mkp-e2e-test/1365452157549023232
Root cause seems to be: #5158

Bobgy changed the title from "[Testing] postsubmit integration and mkp test failure 2021.3.4" to "[Testing] postsubmit mkp test failure 2021.3.4" on Mar 4, 2021

Bobgy (Contributor, Author) commented Mar 4, 2021

The failure: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-mkp-e2e-test/1365452157549023232#1:build-log.txt%3A7816

Step #1 - "verify": 37s Warning FailedScheduling pod/ml-pipeline-55bbc45946-m7s66 0/3 nodes are available: 2 Insufficient memory, 3 Insufficient cpu.

Bobgy (Contributor, Author) commented Mar 4, 2021

It seems the added resource requests make it impossible to schedule the ml-pipeline pod in the cluster.

Bobgy (Contributor, Author) commented Mar 4, 2021

The default cluster for marketplace has 3 nodes, each with 2 CPUs and 3GB of allocatable memory.
So the new requests set in #5158 seem too large: they request 4GB of memory for the ml-pipeline server, which is more than any single node can allocate. I assume they were sized for a large-scale production environment.

We need to reduce the request to fit into the default cluster.
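To make the arithmetic concrete, here is a minimal Python sketch of the per-node resource-fit check that the Kubernetes scheduler applies to requests. It is a simplification under stated assumptions: `Node`, `fits`, and `schedulable_nodes` are hypothetical names, GB is treated as GiB, and each node is assumed to start with no other pods. The CPU request of ml-pipeline is not stated in this thread, so the sketch sets it to 0 and only exercises memory; the `Insufficient cpu` part of the real event also depends on requests of pods already running on the nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    alloc_cpu_m: int       # allocatable CPU in millicores
    alloc_mem_mi: int      # allocatable memory in MiB
    used_cpu_m: int = 0    # sum of CPU requests of pods already on the node
    used_mem_mi: int = 0   # sum of memory requests of pods already on the node


def fits(node: Node, req_cpu_m: int, req_mem_mi: int) -> List[str]:
    """Return the reasons a pod with these requests does not fit (empty if it fits)."""
    reasons = []
    if node.used_cpu_m + req_cpu_m > node.alloc_cpu_m:
        reasons.append("Insufficient cpu")
    if node.used_mem_mi + req_mem_mi > node.alloc_mem_mi:
        reasons.append("Insufficient memory")
    return reasons


def schedulable_nodes(nodes: List[Node], req_cpu_m: int, req_mem_mi: int) -> List[str]:
    """Names of nodes on which the pod's requests fit."""
    return [n.name for n in nodes if not fits(n, req_cpu_m, req_mem_mi)]


# Assumed default marketplace cluster: 3 nodes, 2 CPUs and 3GB allocatable each.
nodes = [Node(f"node-{i}", alloc_cpu_m=2000, alloc_mem_mi=3 * 1024) for i in range(3)]

# A 4GB memory request exceeds every node's allocatable memory, even on empty nodes.
print(schedulable_nodes(nodes, 0, 4 * 1024))  # []
print(schedulable_nodes(nodes, 0, 2 * 1024))  # ['node-0', 'node-1', 'node-2']
```

The point the sketch illustrates is that a request larger than a single node's allocatable capacity can never be scheduled, no matter how empty the cluster is, so the request has to shrink below 3GB rather than waiting for other pods to free up room.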

Bobgy (Contributor, Author) commented Mar 4, 2021

/cc @NikeNano

Bobgy (Contributor, Author) commented Mar 4, 2021

/assign

I'll fix this

Bobgy (Contributor, Author) commented Mar 4, 2021

Another issue to address: the mkp test should be triggered on presubmit whenever a PR touches the MKP manifest. Let me add the auto trigger.
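Prow presubmit jobs can be gated with `run_if_changed`, a regular expression matched against the file paths a PR changes. The thread does not show the actual trigger config, so the following is only a minimal Python sketch of that matching logic; the path pattern `^manifests/gcp_marketplace/` and the function name `should_run_mkp_test` are assumptions for illustration.

```python
import re
from typing import Iterable

# Hypothetical location of the MKP manifest; the real path and Prow config may differ.
MKP_PATTERN = re.compile(r"^manifests/gcp_marketplace/")


def should_run_mkp_test(changed_files: Iterable[str]) -> bool:
    """Trigger the mkp test if any changed path matches the MKP manifest pattern."""
    return any(MKP_PATTERN.search(path) for path in changed_files)


print(should_run_mkp_test(["manifests/gcp_marketplace/chart/values.yaml"]))  # True
print(should_run_mkp_test(["backend/src/apiserver/main.go"]))  # False
```

Gating on changed paths keeps the comparatively expensive marketplace deployment out of unrelated PRs while still catching manifest changes like #5158 before they merge.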

Bobgy (Contributor, Author) commented Mar 4, 2021

I looked into other OSS projects: Argo, for the most part, doesn't provide default requests/limits: https://github.com/argoproj/argo-workflows/tree/master/manifests.
Instead, there's a doc page explaining how to optimize cost: https://argoproj.github.io/argo-workflows/cost-optimisation/.

Bobgy (Contributor, Author) commented Mar 4, 2021

I think we'll need an operator manual documentation to tell people how to adjust their resource requests, but maybe as a next step.
Adding some default requests can only make things better than before: without them, the request is assumed to be 0. Also, when the requested amount is reached, the Pod is not killed, as it would be when hitting a limit.
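The difference between requests and limits can be summarized with a simplified Python model of a container's memory. It is a sketch, not Kubernetes code: the function names are hypothetical, LimitRange defaults are ignored, and only memory is modeled (exceeding a CPU limit throttles the container rather than killing it). When a container sets only a limit, Kubernetes uses the limit as its request; when neither is set, the scheduler accounts for it as 0.

```python
from typing import Optional


def effective_request_mi(request_mi: Optional[int], limit_mi: Optional[int]) -> int:
    """Memory request the scheduler accounts for (ignoring LimitRange defaults)."""
    if request_mi is not None:
        return request_mi
    if limit_mi is not None:
        return limit_mi  # request defaults to the limit when only a limit is set
    return 0  # nothing set: scheduled as if the container needs no memory


def memory_outcome(usage_mi: int, request_mi: Optional[int], limit_mi: Optional[int]) -> str:
    """Simplified outcome for a container currently using `usage_mi` MiB of memory."""
    if limit_mi is not None and usage_mi > limit_mi:
        return "OOMKilled"  # a memory limit is enforced by killing the container
    if usage_mi > effective_request_mi(request_mi, limit_mi):
        # Allowed to keep running; only under node memory pressure does usage
        # above the request make the Pod a more likely eviction candidate.
        return "running, above request"
    return "running"


print(memory_outcome(5000, 4096, None))  # running, above request
print(memory_outcome(5000, 4096, 4500))  # OOMKilled
print(effective_request_mi(None, None))  # 0
```

This is why a default request is a low-risk improvement over no request: it only affects scheduling and reservation, while a limit is what actually caps usage.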

NikeNano (Member) commented Mar 4, 2021

> I think we'll need an operator manual documentation to tell people how to adjust their resource requests, but maybe as a next step.

This sounds good, especially if we could provide some ballpark figures. However, I guess if we set them too high, we will request more resources than most people actually use.

Bobgy (Contributor, Author) commented Apr 1, 2021

Caused by #5148
