This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Metrics scraper pod overloads and crashes #38

Open
Joseph-Goergen opened this issue Jan 28, 2021 · 14 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@Joseph-Goergen

We have a cluster here with 34 nodes and 2497 pods. The metrics scraper appeared to reach 5000m of CPU and 6.7G of memory before eventually crashing.

  • Dashboard version v2.0.5
  • Metric scraper version v1.0.6

The metrics scraper produces roughly 500,000 log lines per hour (about 140 requests per second), and they look like this:

Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes/<node>/metrics/cpu/usage_rate HTTP/1.1" 200 874 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes/<node>/metrics/cpu/usage_rate HTTP/1.1" 200 875 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes/<node>/metrics/cpu/usage_rate HTTP/1.1" 200 878 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes/<node>/metrics/cpu/usage_rate HTTP/1.1" 200 888 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes/<node>/metrics/cpu/usage_rate HTTP/1.1" 200 892 "" "dashboard/v2.0.5"
Jan 27 20:07:24 dashboard-metrics-scraper-5cccbddcc-fpr6k dashboard-metrics-scraper 172.30.160.74 - - [27/Jan/2021:18:07:24 +0000] "GET /api/v1/dashboard/nodes/<node>/metrics/cpu/usage_rate HTTP/1.1" 200 891 "" "dashboard/v2.0.5" 

It seems to be handling the requests as it should; it is simply getting overloaded and cannot cope with that volume. I don't think adding a CPU and memory limit would help much, because I believe the pod would just keep crashing once it hits the limit.
That is about as much information as I have for this cluster. The user did delete the pod, but it came back, overloaded, and crashed again.
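
For readers following the limits discussion: below is a minimal sketch of the kind of resources stanza that would be involved, assuming the scraper runs as the usual dashboard-metrics-scraper Deployment in the kubernetes-dashboard namespace (as in the recommended manifests). The request and limit values are placeholders for illustration, not tuned recommendations for a cluster of this size.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard-metrics-scraper
  namespace: kubernetes-dashboard
spec:
  template:
    spec:
      containers:
        - name: dashboard-metrics-scraper
          resources:
            requests:
              cpu: 100m       # placeholder baseline request
              memory: 200Mi
            limits:
              cpu: "2"        # placeholder ceiling; an overloaded scraper gets throttled here
              memory: 2Gi     # placeholder ceiling; exceeding it OOM-kills the pod

As noted above, a limit like this does not fix the overload itself: the CPU limit only throttles scraping, and the memory limit turns the crash into an explicit OOM kill rather than preventing it.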

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 28, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Joseph-Goergen
Author

/reopen

@k8s-ci-robot
Contributor

@Joseph-Goergen: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jun 28, 2021
@Joseph-Goergen
Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 28, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 26, 2021
@Joseph-Goergen
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2021
@Joseph-Goergen
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 4, 2022
@maciaszczykm
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels May 10, 2022