
[Master/202411] [Chassis] very high CPU on zebra after performing port toggle on all interfaces simultaneously #21008

Open
mannytaheri opened this issue Dec 3, 2024 · 5 comments
Labels
Triaged this issue has been triaged

Comments

@mannytaheri

Description

After toggling all ports simultaneously, zebra's CPU usage climbs very high (250% to 300%) as soon as interfaces start coming up, and it stays high for another 6 to 8 minutes after all interfaces are up.
top - 21:14:36 up  2:46,  2 users,  load average: 5.67, 3.78, 2.99
Tasks: 483 total,   6 running, 467 sleeping,   0 stopped,  10 zombie
%Cpu(s): 39.5 us, 29.6 sy,  0.0 ni, 30.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st 
MiB Mem :  31961.9 total,   8002.3 free,  21398.8 used,   3562.5 buff/cache     
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  10563.2 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                             
   9134 300       20   0 7385448   6.4g  10660 R 315.2  20.4  29:24.39 zebra                                                                                                               
   9133 300       20   0 5563880   4.6g  10516 R 263.6  14.9  19:09.57 zebra                                                                                                               
   9271 300       20   0 1439408   1.0g  14672 R  97.7   3.3   5:28.94 bgpd                                                                                                                
   2704 uuidd     20   0  415012  82816  10128 R  89.4   0.3  27:15.86 redis-server                                                                                                        
   2696 uuidd     20   0  308516  84204  10124 R  88.7   0.3  24:23.72 redis-server    

Steps to reproduce the issue:

  1. Bring down all interfaces simultaneously. Ensure all interfaces are down.
  2. Bring up all interfaces simultaneously. Check CPU usage.
  3. Continue monitoring after all interfaces are up: zebra's CPU stays high for another 6 to 8 minutes (a hedged repro sketch follows this list).
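
For anyone reproducing this, here is a minimal shell sketch of the toggle-and-monitor sequence. It assumes a standard SONiC CLI (config interface shutdown/startup, show interfaces status); the port-name filter and the 10-minute sampling window are illustrative, not from the original report.

#!/bin/bash
# Hedged repro sketch: toggle every front-panel Ethernet port at once,
# then sample zebra's CPU until it settles.

# Collect port names from the status table (skip the header rows).
ports=$(show interfaces status | awk 'NR>2 {print $1}' | grep '^Ethernet')

# 1. Bring all ports down in parallel and wait for completion.
for p in $ports; do sudo config interface shutdown "$p" & done; wait

# 2. Bring all ports back up in parallel.
for p in $ports; do sudo config interface startup "$p" & done; wait

# 3. Sample zebra's CPU every 10 seconds for ~10 minutes; per the report
#    it stays at 250% to 300% for 6 to 8 minutes after the ports are up.
for i in $(seq 1 60); do
    date; top -b -n 1 | grep -w zebra
    sleep 10
done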

Describe the results you received:

Very high CPU usage on zebra.

Describe the results you expected:

CPU usage should drop back to normal once all interfaces are up.

Output of show version:

Latest master image

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@saksarav-nokia
Contributor

saksarav-nokia commented Dec 6, 2024

@arlakshm @abdosi @rlhui @cscarpitta for your viz
With the latest github master, we are seeing high memory usage/memory leak and IMM reboots when the ports are toggled with 34K routes. We had the same issue with 202405 before the FRR memory optimization commit was made.
We believe that the following commits re-introduced the high memory issue and we don't see if we revert these 2 commits.
#18715
#20585

We even tried an image with the above two commits plus #20269, but we still see the memory issue.
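
For anyone trying to confirm the suspected leak locally, a minimal sketch: snapshot FRR's allocator statistics before and after the toggle and compare them. This assumes the default SONiC layout where FRR runs inside the bgp container; show memory is a standard vtysh command, but the file paths here are only examples.

# Before the port toggle (with the ~34K routes installed):
docker exec bgp vtysh -c "show memory" > /tmp/zebra_mem_before.txt

# ... toggle all ports as described above ...

# After zebra settles, take a second snapshot and compare:
docker exec bgp vtysh -c "show memory" > /tmp/zebra_mem_after.txt
diff /tmp/zebra_mem_before.txt /tmp/zebra_mem_after.txt

# Kernel-side view of zebra's resident set while the toggle runs:
watch -n 10 'ps -o pid,rss,vsz,comm -C zebra'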

@rlhui
Contributor

rlhui commented Dec 11, 2024

@kperumalbfn, @abdosi - please note this

@rlhui rlhui added the Triaged this issue has been triaged label Dec 11, 2024
@rlhui rlhui changed the title [Chassis] very high CPU on zebra after performing port toggle on all interfaces simultaneously [Master/202411] [Chassis] very high CPU on zebra after performing port toggle on all interfaces simultaneously Dec 11, 2024
@kperumalbfn
Contributor

@cscarpitta @ahsalam Could you please check the high CPU and memory usage

@ahsalam

ahsalam commented Jan 8, 2025

@mannytaheri @kperumalbfn @rlhui @saksarav-nokia This issue has been resolved by PR #21146, which has been merged.

@kperumalbfn
Contributor

Thanks @ahsalam. We will add the fix to the 202411 branch as well.
