
pipelines: Increase default memory limit for MySQL instance #980

Merged
1 commit merged into kubeflow:master on Jan 31, 2022

Conversation

zijianjoy
Contributor

Which issue is resolved by this Pull Request:
Resolves #

Description of your changes:

This is because 1.8.0-rc.0 introduced a memory request for the MySQL instance but no memory limit, so the limit falls back to the default defined in the LimitRange file, which this PR increases.
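
For illustration, the LimitRange mechanism at play looks roughly like the sketch below; the name and values are placeholders, not the actual file in this repo:

```yaml
# Illustrative sketch of a namespace LimitRange that supplies default
# memory settings to containers that do not declare their own.
# Name and values are placeholders, not this repo's actual file.
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults
  namespace: kubeflow
spec:
  limits:
    - type: Container
      default:            # used as the limit when a container sets none
        memory: 1Gi
      defaultRequest:     # used as the request when a container sets none
        memory: 100Mi
```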

Checklist:

If the PR is related to Optional-Test-Infra,

  • Changes need to be generated in the aws/GitOps folder:
    1. cd aws
    2. make optional-generate
    3. make optional-test

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zijianjoy

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chensun
Member

chensun commented Jan 31, 2022

/lgtm

Thanks!

google-oss-prow bot added the lgtm label on Jan 31, 2022
google-oss-prow bot merged commit 38e41b8 into kubeflow:master on Jan 31, 2022
@Bobgy
Contributor

Bobgy commented Feb 1, 2022

Hi @zijianjoy, did you consider adding a memory limit to the MySQL manifest instead?
The default limit affects how test pipeline pods can be scheduled, because those pods usually have very low memory usage. If I have a machine with 4GB of memory, a 500M limit lets us schedule 8 pods on it, while a 1GB limit lets us schedule only 4, which basically doubles the resource usage.
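
For reference, putting the limit directly on the MySQL container would look roughly like the excerpt below; the values and image tag are illustrative, not the actual manifest (only the 800Mi request is taken from the existing deployment):

```yaml
# Illustrative excerpt of the Deployment's container spec: with an explicit
# limit here, the LimitRange default no longer applies to this container.
# Values and image tag are placeholders, not the repo's actual manifest.
containers:
  - name: mysql
    image: mysql:5.7
    resources:
      requests:
        memory: 800Mi
      limits:
        memory: 1Gi
```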

@Bobgy
Contributor

Bobgy commented Feb 1, 2022

See kubeflow/pipelines#5148 for the original context: ideally we also wanted to add a limit for every server; it was just safer, as a first step, to only add requests.
Now we can consider using test KFP usage to find a better default value.

Just my thoughts, take it with a grain of salt.

@chensun
Member

chensun commented Feb 1, 2022

> Hi @zijianjoy, did you consider adding a memory limit to the MySQL manifest instead? The default limit affects how test pipeline pods can be scheduled, because those pods usually have very low memory usage. If I have a machine with 4GB of memory, a 500M limit lets us schedule 8 pods on it, while a 1GB limit lets us schedule only 4, which basically doubles the resource usage.

Note that we have had this 800Mi request for MySQL for about 10 months, per the change history: https://github.com/kubeflow/pipelines/blob/28ac092e927bd76294e9f29ad9f0ba929f6a3060/manifests/kustomize/third-party/mysql/base/mysql-deployment.yaml#L40

Looks like MySQL should be able to perform under 512M of memory: https://dev.mysql.com/doc/refman/5.7/en/memory-use.html
So it sounds good to me to test out changing the MySQL manifest here.

That being said, I'm curious why we didn't set a lower default request (like 512M) with a higher default limit (like 1G)?

@zijianjoy
Contributor Author

@Bobgy Setting the resource limit in the MySQL deployment instead of the LimitRange sounds good to me. The question is: does setting a resource limit in the KFP manifest cause potential OOMs for heavy users? Or maybe we can set such a limit for kfp-ci only?

Another issue is that we don't know what is causing the cluster in the kfp-ci project to fail. Do we know which cluster we should check for the GitOps update on this cluster?

chensun mentioned this pull request on Feb 1, 2022
@chensun
Member

chensun commented Feb 1, 2022

> So it sounds good to me to test out changing the MySQL manifest here.

I realize running make kfp-update will pull the manifest from the kfp repo, so if we're going to change the MySQL manifest we need to do it in the kfp repo and pick up the change in the next release. So I'm going to stick with the LimitRange change for now to try to unblock CI.

Also, I realized that this PR didn't propagate the change to the acm-repos/kfp-standalone-1/kfp-all.yaml file.
So I ran make hydrate-kfp-manifests, committed the change, and sent it via #981.

@chensun
Member

chensun commented Feb 1, 2022

> Another issue is that we don't know what is causing the cluster in the kfp-ci project to fail. Do we know which cluster we should check for the GitOps update on this cluster?

Following up on this, @Bobgy, we were trying to understand what the trigger is for updating the cluster in kfp-ci. It looks like it is managed by some management cluster which is not in the same project. And the changes we made here are not auto-deployed on merge?

@zijianjoy
Contributor Author

Thank you Chen! I am able to apply the memory limit only to the MySQL deployment in our CI cluster: #982.

Also thank you for running make hydrate-kfp-manifests for this PR!

@Bobgy
Contributor

Bobgy commented Feb 2, 2022

> That being said, I'm curious why we didn't set a lower default request (like 512M) with a higher default limit (like 1G)?

This is a trade-off.

If we set a low default request and a high default limit, probably more pods can run without an explicit resource config, because their memory usage falls between the low request and the high limit.

However, remember that only requests affect pod scheduling. We may end up with a 4GB node running 8 pods, each with a 0.5GB memory request and a 1GB limit. They cannot all get 1GB of memory, because the node only has 4GB in total, so some of them may randomly OOM.

For test infra, we have hit random OOMs before, so I think a better trade-off is to configure more Pods with an explicit memory request=limit and reduce the likelihood of random OOMs.

More thorough doc: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
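
To make that concrete, request=limit looks roughly like the excerpt below; the value is a placeholder, not a recommendation for this repo:

```yaml
# Illustrative excerpt: with request equal to limit, the scheduler reserves
# exactly what the container may use, so its memory cannot be overcommitted
# on the node. The value is a placeholder.
resources:
  requests:
    memory: 800Mi
  limits:
    memory: 800Mi
```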
