
pipelines: Increase default memory limit for MySQL instance #980

Merged
1 commit merged into kubeflow:master on Jan 31, 2022

Conversation

zijianjoy
Contributor

Which issue is resolved by this Pull Request:
Resolves #

Description of your changes:

This is because 1.8.0-rc.0 introduced a memory request for the MySQL instance but no memory limit, so the limit falls back to the default defined in the LimitRange file, which this PR increases.
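
For illustration, the LimitRange mechanism at play looks roughly like the sketch below; the name and values are placeholders, not the actual file in this repo:

```yaml
# Illustrative sketch of a namespace LimitRange that supplies default
# memory settings to containers that do not declare their own.
# Name and values are placeholders, not this repo's actual file.
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults
  namespace: kubeflow
spec:
  limits:
    - type: Container
      default:            # used as the limit when a container sets none
        memory: 1Gi
      defaultRequest:     # used as the request when a container sets none
        memory: 100Mi
```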

Checklist:

If the PR is related to Optional-Test-Infra,

  • Changes need to be generated in the aws/GitOps folder:
    1. cd aws
    2. make optional-generate
    3. make optional-test

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zijianjoy

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chensun
Member

chensun commented Jan 31, 2022

/lgtm

Thanks!

google-oss-prow bot added the lgtm label on Jan 31, 2022
google-oss-prow bot merged commit 38e41b8 into kubeflow:master on Jan 31, 2022
@Bobgy
Contributor

Bobgy commented Feb 1, 2022

Hi @zijianjoy, did you consider adding a memory limit to the MySQL manifest instead?
The default limit affects how test pipeline pods can be scheduled, because those pods usually have very low memory usage. If I have a machine with 4GB of memory, a 500M limit lets us schedule 8 pods on it, while a 1GB limit lets us schedule only 4, which basically doubles the resource usage.
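
For reference, putting the limit directly on the MySQL container would look roughly like the excerpt below; the values and image tag are illustrative, not the actual manifest (only the 800Mi request is taken from the existing deployment):

```yaml
# Illustrative excerpt of the Deployment's container spec: with an explicit
# limit here, the LimitRange default no longer applies to this container.
# Values and image tag are placeholders, not the repo's actual manifest.
containers:
  - name: mysql
    image: mysql:5.7
    resources:
      requests:
        memory: 800Mi
      limits:
        memory: 1Gi
```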

@Bobgy
Contributor

Bobgy commented Feb 1, 2022

See kubeflow/pipelines#5148 for the original context: ideally we also wanted to add a limit for every server; it was just safer, as a first step, to only add requests.
Now we can consider using test KFP usage to find a better default value.

Just my thoughts, take it with a grain of salt.

@chensun
Member

chensun commented Feb 1, 2022

> Hi @zijianjoy, did you consider adding a memory limit to the MySQL manifest instead? The default limit affects how test pipeline pods can be scheduled, because those pods usually have very low memory usage. If I have a machine with 4GB of memory, a 500M limit lets us schedule 8 pods on it, while a 1GB limit lets us schedule only 4, which basically doubles the resource usage.

Note that we have had this 800Mi request for MySQL for about 10 months, per the change history: https://github.com/kubeflow/pipelines/blob/28ac092e927bd76294e9f29ad9f0ba929f6a3060/manifests/kustomize/third-party/mysql/base/mysql-deployment.yaml#L40

Looks like MySQL should be able to perform under 512M of memory: https://dev.mysql.com/doc/refman/5.7/en/memory-use.html
So it sounds good to me to test out changing the MySQL manifest here.

That being said, I'm curious why we didn't set a lower default request (like 512M) with a higher default limit (like 1G)?

@zijianjoy
Contributor Author

@Bobgy Setting the resource limit in the MySQL deployment instead of the LimitRange sounds good to me. The question is: does setting a resource limit in the KFP manifest cause potential OOMs for heavy users? Or maybe we can set such a limit for kfp-ci only?

Another issue is that we don't know what is causing the cluster in the kfp-ci project to fail. Do we know which cluster we should check for the GitOps update on this cluster?

chensun mentioned this pull request on Feb 1, 2022
@chensun
Member

chensun commented Feb 1, 2022

> So it sounds good to me to test out changing the MySQL manifest here.

I realize running make kfp-update will pull the manifest from the kfp repo, so if we're going to change the MySQL manifest we need to do it in the kfp repo and pick up the change in the next release. So I'm going to stick with the LimitRange change for now to try to unblock CI.

Also, I realized that this PR didn't propagate the change to the acm-repos/kfp-standalone-1/kfp-all.yaml file.
So I ran make hydrate-kfp-manifests, committed the change, and sent it via #981.

@chensun
Member

chensun commented Feb 1, 2022

> Another issue is that we don't know what is causing the cluster in the kfp-ci project to fail. Do we know which cluster we should check for the GitOps update on this cluster?

Following up on this, @Bobgy, we were trying to understand what the trigger is for updating the cluster in kfp-ci. It looks like it is managed by some management cluster which is not in the same project. And the changes we made here are not auto-deployed on merge?

@zijianjoy
Contributor Author

Thank you Chen! I am able to apply the memory limit only to the MySQL deployment in our CI cluster: #982.

Also thank you for running make hydrate-kfp-manifests for this PR!

@Bobgy
Contributor

Bobgy commented Feb 2, 2022

> That being said, I'm curious why we didn't set a lower default request (like 512M) with a higher default limit (like 1G)?

This is a trade-off.

If we set a low default request and a high default limit, probably more pods can run without an explicit resource config, because their memory usage falls between the low request and the high limit.

However, remember that only requests affect pod scheduling. We may end up with a 4GB node running 8 pods, each with a 0.5GB memory request and a 1GB limit. They cannot all get 1GB of memory, because the node only has 4GB in total, so some of them may randomly OOM.

For test infra, we have hit random OOMs before, so I think a better trade-off is to configure more Pods with an explicit memory request=limit and reduce the likelihood of random OOMs.

More thorough doc: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
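
To make that concrete, request=limit looks roughly like the excerpt below; the value is a placeholder, not a recommendation for this repo:

```yaml
# Illustrative excerpt: with request equal to limit, the scheduler reserves
# exactly what the container may use, so its memory cannot be overcommitted
# on the node. The value is a placeholder.
resources:
  requests:
    memory: 800Mi
  limits:
    memory: 800Mi
```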
