
kubectl argo rollouts dashboard OOM kill #2095

Closed
kzcPo opened this issue Jun 14, 2022 · 18 comments · Fixed by #3966

Labels: bug, dashboard

Comments

kzcPo commented Jun 14, 2022

Summary

What happened/what you expected to happen?
The `kubectl argo rollouts dashboard` process shows high memory usage.

After running for a long time, the pod is OOM-killed.

Diagnostics

kubectl-argo-rollouts: v1.2.1+51c874c
  BuildDate: 2022-05-13T20:40:45Z
  GitCommit: 51c874cb18e6adccf677766ac561c3dbf69a8ec1
  GitTreeState: clean
  GoVersion: go1.17.6
  Compiler: gc
  Platform: linux/amd64

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

kzcPo added the bug label on Jun 14, 2022
perenesenko (Member) commented:

@kzcPo Could you provide steps to reproduce?
- How many rollouts?
- The manifest for the rollout?

kzcPo commented Jun 15, 2022

> @kzcPo Could you provide steps to reproduce? How many rollouts? Manifest for rollout?

1. I run the command `kubectl argo rollouts dashboard`.
2. My program calls the API `/api/v1/rollouts/online/info` every 10s.
3. Through the dashboard, you can see that the response time of this endpoint is about 3s.
4. Over time, the pod's memory keeps growing until it is OOM-killed.

Fetching the same information with the CLI also takes about 3s:

[root@argo-rollouts-dashboard-5cf947cdc9-ltkmn /]# time kubectl argo rollouts get rollout eff-noahbe -n online
Name:            eff-noahbe
Namespace:       online
Status:          ✔ Healthy
Strategy:        BlueGreen
Images:          xxxx.com/taqu/eff-noahbe:online_202206141748_release_v2 (stable, active)
Replicas:
  Desired:       1
  Current:       1
  Updated:       1
  Ready:         1
  Available:     1

NAME                                    KIND        STATUS        AGE    INFO
⟳ eff-noahbe                            Rollout     ✔ Healthy     14d    
├──# revision:7                                                          
│  └──⧉ eff-noahbe-79dc855c76           ReplicaSet  ✔ Healthy     15h    stable,active
│     └──□ eff-noahbe-79dc855c76-rbqs5  Pod         ✔ Running     15h    ready:1/1
├──# revision:6                                                          
│  └──⧉ eff-noahbe-7b7fbc6cf6           ReplicaSet  • ScaledDown  4d18h  
├──# revision:5                                                          
│  └──⧉ eff-noahbe-74675b9cff           ReplicaSet  • ScaledDown  4d22h  
├──# revision:4                                                          
│  └──⧉ eff-noahbe-5ccfdd6d88           ReplicaSet  • ScaledDown  5d18h  
├──# revision:3                                                          
│  └──⧉ eff-noahbe-7fb8d7ffbd           ReplicaSet  • ScaledDown  7d22h  
├──# revision:2                                                          
│  └──⧉ eff-noahbe-bbdd56bf             ReplicaSet  • ScaledDown  12d    
└──# revision:1                                                          
   └──⧉ eff-noahbe-59b98696c7           ReplicaSet  • ScaledDown  14d    

real    0m2.956s
user    0m3.185s
sys     0m0.743s

Is there an API server that my program could call directly instead?
Can this query be optimized to reduce how long it takes?
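
To make the steps above easier to reproduce, here is a minimal polling sketch, not an official script. It assumes the dashboard serves on its default local port 3100 and uses the `online` namespace and endpoint quoted in this comment; adjust both for your environment.

```bash
# Rough reproduction sketch for the behavior described above (assumptions noted below).
# Assumes: `kubectl argo rollouts dashboard` listens on localhost:3100 (the default),
# and the namespace of interest is "online", as in this issue.

kubectl argo rollouts dashboard &   # start the local dashboard/API server

# Poll the rollout-info endpoint every 10 seconds and print the response time,
# mirroring the 10s polling loop the reporter's program performs.
while true; do
  curl -s -o /dev/null -w "%{time_total}s\n" \
    "http://localhost:3100/api/v1/rollouts/online/info"
  sleep 10
done
```

While this loop runs, the dashboard process's memory can be watched to see whether it keeps climbing.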

github-actions bot commented:

This issue is stale because it has been open 60 days with no activity.

harikrongali added this to the v1.4 milestone on Oct 20, 2022
zachaller removed this from the v1.4 milestone on Dec 1, 2022

m-shalenko commented:

Hello! Are there any updates?


michellebeard commented:

Hello! Are there any updates? I have also observed similar behavior, with the Argo Rollouts dashboard being OOM-killed due to rising memory usage. We suspect there is a memory leak.

y-elip commented Aug 27, 2024

It appears that even in the latest version 1.7.2, there is still a memory leak in the Argo Rollouts dashboard. With just 9 rollouts across 2 applications, after only an hour of simple browsing, it starts consuming over 200MB of RAM.

Just a quick update: I investigated the cause of the RAM usage increase, and it seems that each time a Rollout is in progress, something is being added to memory and not cleared afterward. This might be related to specific strategy parameters, as we don't observe this behavior on the staging cluster.

Update 2: it looks like the issue occurs while a Rollout with an unlimited pause step (`- pause: {}`) is being processed.
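
For context, an "unlimited pause" is a canary step with an empty `pause`, which waits until the rollout is promoted manually. Below is a hypothetical minimal manifest illustrating that step; the names and image are placeholders and are not taken from this issue.

```bash
# Hypothetical Rollout with an indefinite pause step; all names and the image are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: nginx:1.25
  strategy:
    canary:
      steps:
      - setWeight: 50
      - pause: {}   # no duration given, so the rollout waits until manually promoted
EOF
```

While a rollout sits in that paused step, the dashboard reportedly keeps accumulating memory, which matches the observation above.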

ppaez-life360 commented Oct 2, 2024

We observed similar behavior. Memory usage increases over time and only drops when the pod is terminated (the screenshot below is from a cluster with ~20 rollouts):

[screenshot: dashboard pod memory usage climbing over time]
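
One simple way to confirm this pattern is to sample the dashboard pod's memory over time. A rough sketch follows; `<dashboard-pod>` and `<namespace>` are placeholders for your own deployment, and `kubectl top` requires metrics-server.

```bash
# Sample the dashboard pod's memory roughly once a minute.
# <dashboard-pod> and <namespace> are placeholders; requires metrics-server for `kubectl top`.
while true; do
  date
  kubectl top pod <dashboard-pod> -n <namespace>
  sleep 60
done
```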

unrolled added a commit to unrolled/argo-rollouts that referenced this issue Nov 29, 2024
unrolled added a commit to unrolled/argo-rollouts that referenced this issue Nov 29, 2024
Rizwana777 pushed a commit to Rizwana777/argo-rollouts that referenced this issue Dec 12, 2024

fix(dashboard): cleanup viewcontroller after each request. Fixes argoproj#2095 (argoproj#3966)

Signed-off-by: Cory Jacobsen <cory@7shifts.com>
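
To check whether a locally installed plugin already includes this change, the plugin version can be compared against the release notes for the release containing argoproj#3966 (the exact release number is not stated in this thread).

```bash
# Print the installed kubectl-argo-rollouts plugin version; this is the same command
# whose output appears in the Diagnostics block at the top of this issue.
kubectl argo rollouts version
```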