Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

record a metric with open census to ensure backwards compat #9065

Conversation

codeboten
Copy link
Contributor

@codeboten codeboten commented Dec 11, 2023

The healthcheck extension currently relies on an OpenCensus view to determine the health of the collector. This behaviour is not great as it only looks at span exports, but it exists today. To avoid breaking it, I suggest this PR that adds the same OC view even on the otel path, and records the same metric.

Note to reviewers: This is only a hack that's needed until the new healthcheck extension that uses component status is ready to go. cc @mwear

Alex Boten added 2 commits December 11, 2023 10:14
The healthcheck extension currently relies on an OpenCensus view to determine the health of the collector. This behaviour is not great as it only looks at span exports, but it exists today. To avoid breaking it, I suggest this PR that adds the same OC view even on the otel path, and records the same metric.

Signed-off-by: Alex Boten <aboten@lightstep.com>
Signed-off-by: Alex Boten <aboten@lightstep.com>
@codeboten codeboten marked this pull request as ready for review December 11, 2023 18:22
@codeboten codeboten requested review from a team and djaglowski December 11, 2023 18:22
Copy link

codecov bot commented Dec 11, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (9b6a18b) 91.49% compared to head (a2d74a0) 91.48%.
Report is 1 commits behind head on main.

Files Patch % Lines
service/telemetry.go 66.66% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9065      +/-   ##
==========================================
- Coverage   91.49%   91.48%   -0.01%     
==========================================
  Files         316      316              
  Lines       17181    17201      +20     
==========================================
+ Hits        15720    15737      +17     
- Misses       1165     1167       +2     
- Partials      296      297       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to fix this problem in the healthcheck extension instead? Like could it be made to check the featuregate and use otel metrics instead of opencensus if it is enabled?

@TylerHelmuth
Copy link
Member

Oh I see now that we already have a way forward for the healthcheckextension, and this is a hack until then

@TylerHelmuth
Copy link
Member

@mwear @codeboten you got a link to the healthcheckextension issue that, when closed, means we can remove this hack? Can we also open an issue to track the removal of this hack?

@codeboten
Copy link
Contributor Author

@TylerHelmuth This issue captures most of the problems w/ the healthcheck extension with this issue open-telemetry/opentelemetry-collector-contrib#26661 capturing the refactoring of it. I can create an issue to track removing this hack once open-telemetry/opentelemetry-collector-contrib#26661 is done

@codeboten
Copy link
Contributor Author

@TylerHelmuth added #9069

@codeboten
Copy link
Contributor Author

Closing this PR. After further investigation, the behaviour in the current healthcheck extension doesn't work as expected. It registers itself as an OpenCensus exporter to be called when census wants to emit a metric. For the exporter/send_failed_requests, this export is triggered every 10s, rather than every time a failure is recorded.

In addition, the extension does not read the value of the metric, which means the threshold it calculates is solely based on the number of times the view is exported, which has no correlation to failures to export.

Since the view is configured only to record measurements for span exports, the healthcheck extension only ever triggered for a single signal, which doesn't make it overly useful.

After discussing this with other collector approvers and maintainers, we've decided to flag the feature as not working as expected in the healthcheck extension. Since this extension is being revamped to use component status, it isn't worth fixing in the short term. I'll open a PR to update the README to warn users of this issue, this is only to prevent a breaking change multiple times in short successions for users. cc @TylerHelmuth @mx-psi

@codeboten codeboten closed this Dec 12, 2023
@codeboten codeboten deleted the codeboten/use-otel-with-healthcheck-view branch December 12, 2023 16:35
codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this pull request Dec 12, 2023
codeboten pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Dec 12, 2023
This follows
open-telemetry/opentelemetry-collector#9065 (comment)

---------

Signed-off-by: Alex Boten <aboten@lightstep.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants