
[FR] Default resource requirement/limits for the KFP UI and system services #5148

Closed
Bobgy opened this issue Feb 18, 2021 · 5 comments · Fixed by #5409
Labels: help wanted (The community is welcome to contribute.)

Comments

Bobgy (Contributor) commented Feb 18, 2021

UPDATE: In the end, we decided to add only resource requests (not limits); see the discussion in #5236 (comment).

It's desirable to provide a set of default resource requests & limits for the KFP UI & system services, to make sure their QoS class is Guaranteed by default.

https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
I'm not exactly sure what values are reasonable, because if they are set too low, the services may stop operating once a workload pushes them up against a limit.
But setting them so that the QoS class is Guaranteed is also important: otherwise, when many other workloads are running, the KFP UI & API server Pods may be evicted, because their default QoS class is BestEffort, and BestEffort Pods are the first ones Kubernetes evicts when a node runs out of resources.
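
For reference, a minimal sketch of what the Kubernetes doc above describes: a Pod only gets the Guaranteed QoS class when every container sets both requests and limits and the two are equal. The container name and values here are purely illustrative, not a proposal for a specific KFP component:

    apiVersion: v1
    kind: Pod
    metadata:
      name: qos-demo            # illustrative name only
    spec:
      containers:
      - name: example
        image: nginx
        resources:
          requests:             # requests == limits => QoS class "Guaranteed"
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 256Mi

If only requests are set (the option chosen in the end, per the UPDATE above), the QoS class becomes Burstable instead of Guaranteed, which still ranks above BestEffort in eviction priority.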

Bobgy added the help wanted (The community is welcome to contribute.) label on Feb 18, 2021
Bobgy (Contributor, Author) commented Feb 19, 2021

Got some help from Sid Palas:

A couple of example request settings:

ml-pipeline (API server)
    requests:
      cpu: '2'
      memory: 4Gi

ml-pipeline-ui
    requests:
      cpu: 10m
      memory: 70Mi

workflow-controller (Argo)
    requests:
      cpu: 200m
      memory: 3Gi

minio
    requests:
      cpu: 20m
      memory: 25Mi

persistent-agent
    requests:
      cpu: 120m
      memory: 2Gi

see thread https://kubeflow.slack.com/archives/CE10KS9M4/p1613655024114300
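
As a hedged illustration of how values like these could be wired into the manifests, a kustomize strategic-merge patch along these lines could set the API-server request; the deployment and container names here are assumptions about the KFP manifests, and the values simply mirror the suggestion above:

    # ml-pipeline-resources-patch.yaml (sketch; reference it from
    # patchesStrategicMerge in the kustomization that deploys KFP)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ml-pipeline                    # assumed API-server deployment name
    spec:
      template:
        spec:
          containers:
          - name: ml-pipeline-api-server   # assumed container name
            resources:
              requests:
                cpu: '2'
                memory: 4Gi

The other components (ml-pipeline-ui, workflow-controller, minio, persistent-agent) could get analogous patches with their respective values.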

NikeNano (Member) commented

According to the Argo documentation, Argo's memory and CPU usage scale linearly with the number of workflows, so users will probably have to adjust these values if they are running heavier workloads or want to reduce costs (see the patch sketch after this comment).

I would be happy to update this!

/assign
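
A sketch of the kind of per-cluster override described above: a user running heavier Argo workloads could raise the workflow-controller request in their own kustomization. The deployment/container names are assumptions and the values are arbitrary examples, not recommendations:

    # workflow-controller-resources-patch.yaml (illustrative values only)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: workflow-controller        # Argo controller shipped with KFP
    spec:
      template:
        spec:
          containers:
          - name: workflow-controller  # assumed container name
            resources:
              requests:
                cpu: 500m              # raised from the 200m suggested above
                memory: 6Gi            # raised from the 3Gi suggested above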

Bobgy (Contributor, Author) commented Feb 26, 2021

thank you @NikeNano

Bobgy (Contributor, Author) commented Apr 1, 2021

/reopen
after #5273, we need to apply the default resource requests to the Argo pods again
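
If "Argo pods" here also covers the executor containers that Argo injects into every workflow pod, their default requests can be set through the workflow-controller ConfigMap; a sketch under that assumption (the executor key follows the Argo workflow-controller-configmap docs, namespace and values are illustrative):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: workflow-controller-configmap
      namespace: kubeflow              # assumed KFP namespace
    data:
      executor: |
        resources:
          requests:                    # defaults for the injected wait/init containers
            cpu: 10m
            memory: 64Mi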

google-oss-robot commented

@Bobgy: Reopened this issue.

In response to this:

/reopen
after #5273, we need to apply the default resource requests to the Argo pods again

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Bobgy added a commit that referenced this issue Apr 1, 2021
* fix(deployment): fix default resource requests

* fix mkp presubmit for rc version