kindling-agent causes CPU usage and load on k8s nodes to spike, affecting node stability #591

Closed
yanhongchang opened this issue Nov 9, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@yanhongchang
Contributor

Describe the bug
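About 5 hours after deploying the kindling-agent service, the load on some cluster nodes suddenly spiked, affecting the stability of the cluster.
The problem appeared on three nodes at the same time. The load on the three nodes is shown below: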
image

image

image

At the same point in time, the system log started printing large numbers of entries like the following:

image

The entries keep coming. A rough count shows more than 8K log lines per minute.
image
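For anyone who wants to reproduce the rough per-minute count, here is a minimal sketch. It assumes the traditional syslog line prefix `Mmm dd HH:MM:SS` that CentOS 7 writes to `/var/log/messages`. The function name `count_per_minute` and the default filter string `kernel:` are illustrative assumptions, since the actual log message is only visible in the screenshot.

```python
from collections import Counter
from typing import Iterable


def count_per_minute(lines: Iterable[str], needle: str = "kernel:") -> Counter:
    """Count log lines containing `needle`, grouped by minute.

    Assumption: each line starts with a syslog timestamp such as
    "Nov  9 14:23:01". The first 12 characters ("Nov  9 14:23")
    identify the minute once the seconds are dropped.
    """
    counts: Counter = Counter()
    for line in lines:
        # Skip lines that do not match or are too short to hold a timestamp.
        if needle not in line or len(line) < 15:
            continue
        counts[line[:12]] += 1
    return counts
```

To use it on a node, pass an open file object, for example `count_per_minute(open("/var/log/messages", errors="replace"))`, and replace `needle` with a substring of the message that is actually being printed.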

How to reproduce?
The kindling-agent DaemonSet and ConfigMap files are as follows:

deploy+kindlingcfg.zip

What did you expect to see?
Under normal conditions this problem should not occur.

What did you see instead?

Screenshots

Logs

Environment (please complete the following information)

  • Kindling agent version v0.8.1
  • Kindling-falcon-lib version
  • Node OS version CentOS Linux release 7.7.1908 (Core)
  • Node Kernel version 3.10.0-1062.18.1.el7.x86_64
  • Kubernetes version v1.16.9
@yanhongchang yanhongchang added the bug Something isn't working label Nov 9, 2023
@dxsup
Member

dxsup commented Nov 9, 2023

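This problem can occur when the kernel module is used. The system triggers epoll events, the kernel module's handling path for that event is long, and it keeps printing kernel logs, which drives up the CPU load. This has been fixed in #590.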

@yanhongchang
Contributor Author

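This problem can occur when the kernel module is used. The system triggers epoll events, the kernel module's handling path for that event is long, and it keeps printing kernel logs, which drives up the CPU load. This has been fixed in #590.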

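Thanks. As far as I can tell, the fix in #590 removes the epoll event output from the kernel module. Is there any functional difference between removing the epoll event output and keeping it? For example, will some events that are needed go missing?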

@dxsup
Member

dxsup commented Nov 9, 2023

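This event is mainly needed by the Trace Profiling feature. Network metrics are not affected.

In Trace Profiling, if the kernel module is used, the detailed information of epoll events will not be visible on threads.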

@yanhongchang
Contributor Author

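This event is mainly needed by the Trace Profiling feature. Network metrics are not affected.

In Trace Profiling, if the kernel module is used, the detailed information of epoll events will not be visible on threads.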

Got it, thanks!

@dxsup dxsup closed this as completed Nov 13, 2023