
Add kubernetes.container.status.last.reason metric #30306

Conversation

tetianakravchenko
Contributor

Signed-off-by: Tetiana Kravchenko tetiana.kravchenko@elastic.co

What does this PR do?

Introduce a new metric, kubernetes.container.status.last.reason

Why is it important?

We already have kubernetes.container.status.reason (which relies on kube_pod_container_status_terminated_reason and kube_pod_container_status_waiting_reason, see https://github.com/elastic/beats/blob/main/metricbeat/module/kubernetes/state_container/state_container.go#L74-L75), but that metric is only reported while the container/pod is actually down. That can be a very short window of time (this was the original motivation for adding kube_pod_container_status_last_terminated_reason, kubernetes/kube-state-metrics#535).

I did not map kube_pod_container_status_last_terminated_reason to kubernetes.container.status.reason, because the presence of kube_pod_container_status_last_terminated_reason does not exclude kube_pod_container_status_waiting_reason or kube_pod_container_status_terminated_reason. For kube_pod_container_status_waiting_reason and kube_pod_container_status_terminated_reason it is fine to be mapped to the same kubernetes.container.status.reason, because those states exclude each other (https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-states).
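The exclusivity argument above can be sketched as a small toy program. This is not the actual Beats prometheus-helper mapping code, just an illustration under the assumption of a flat event map: waiting and terminated states never coexist in the Kubernetes container lifecycle, so both can safely share one field, while the "last terminated" reason can coexist with either and would overwrite a live value if mapped to the same field.

```go
package main

import "fmt"

// mapReasons is a hypothetical helper (not the Beats API) showing why
// kube_pod_container_status_last_terminated_reason needs its own event field.
func mapReasons(waiting, terminated, lastTerminated string) map[string]string {
	event := map[string]string{}
	// waiting and terminated are mutually exclusive container states,
	// so writing both into the same field can never lose information.
	if waiting != "" {
		event["kubernetes.container.status.reason"] = waiting
	}
	if terminated != "" {
		event["kubernetes.container.status.reason"] = terminated
	}
	// lastTerminated may be set alongside either state above,
	// so it gets a separate field instead of overwriting status.reason.
	if lastTerminated != "" {
		event["kubernetes.container.status.last_terminated_reason"] = lastTerminated
	}
	return event
}

func main() {
	// A container currently waiting (CrashLoopBackOff) that was
	// previously OOM-killed: both reasons survive in the event.
	e := mapReasons("CrashLoopBackOff", "", "OOMKilled")
	fmt.Println(e["kubernetes.container.status.reason"])
	fmt.Println(e["kubernetes.container.status.last_terminated_reason"])
}
```

Had the last-terminated reason been mapped to kubernetes.container.status.reason instead, the CrashLoopBackOff value in this example would be clobbered.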

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Related issues

Use cases

described in #27840:

 In incidents the customer experiences, the pod is restarting so fast, there is only a little chance to capture this event.

Screenshots

Logs

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 9, 2022
@tetianakravchenko tetianakravchenko added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Feb 9, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 9, 2022
@mergify
Contributor

mergify bot commented Feb 9, 2022

This pull request does not have a backport label. Could you fix it @tetianakravchenko? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 7.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Feb 9, 2022
@tetianakravchenko tetianakravchenko added the backport-v8.0.0 Automated backport with mergify label Feb 9, 2022
@mergify mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Feb 9, 2022
@elasticmachine
Collaborator

elasticmachine commented Feb 9, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-02-11T08:55:51.937+0000

  • Duration: 41 min 40 sec

Test stats 🧪

Test Results
Failed 0
Passed 708
Skipped 248
Total 956

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@@ -180,6 +180,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...main[Check the HEAD dif
- Add `xpack.enabled` support for Enterprise Search module. {pull}29871[29871]
- Add gcp firestore metricset. {pull}29918[29918]
- Remove strict parsing on RabbitMQ module {pull}30090[30090]
- Add `kubernetes.container.status.last.reason` metric {issue}27840[27840]
Contributor


Suggested change
- Add `kubernetes.container.status.last.reason` metric {issue}27840[27840]
- Add `kubernetes.container.status.last.reason` metric {issue}30306[30306]

Contributor Author


The issue number is 27840, but anyway, changed it to the pull number: 28f2f7e

Contributor

@MichaelKatsoulis MichaelKatsoulis left a comment


In general it looks good to me. The only thing I am not 100% sure about is the name of the field. kubernetes.container.status.last.reason will refer only to termination reasons. I mean that while kubernetes.container.status.reason can hold either a waiting or a terminated reason, the new field is only for terminated ones. We could name it kubernetes.container.status.last.terminated_reason to be more precise. I won't insist on that, just raising a concern. I would also like @ChrsMark's opinion.

Member

@ChrsMark ChrsMark left a comment


I would go with something like kubernetes.container.status.terminated_reason.

@tetianakravchenko
Contributor Author

@ChrsMark kubernetes.container.status.terminated_reason might be confusing and could be mixed up with the metric kube_pod_container_status_terminated_reason, which describes the reason the container is currently in the terminated state.

The most precise name would be kubernetes.container.status.last.terminated_reason, as @MichaelKatsoulis suggested, or maybe kubernetes.container.status.last_terminated_reason if we want to avoid an additional layer of object nesting. Let me know what you think.

@ChrsMark
Member

+1 for kubernetes.container.status.last_terminated_reason

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
Member

@ChrsMark ChrsMark left a comment


🚀

@tetianakravchenko
Contributor Author

/test

@tetianakravchenko tetianakravchenko merged commit aaa36aa into elastic:main Feb 11, 2022
mergify bot pushed a commit that referenced this pull request Feb 11, 2022
* add kubernetes.container.status.last.reason metric

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

* rename status.last.reason -> status.last_terminated_reason

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>

* fix changelog

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
(cherry picked from commit aaa36aa)

# Conflicts:
#	metricbeat/module/kubernetes/fields.go
Labels
backport-v8.0.0 Automated backport with mergify Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team
Development

Successfully merging this pull request may close these issues.

Metric kube_pod_container_status_last_terminated_reason representation in Metricbeat
4 participants