Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prometheus counters should have _total suffix #952

Merged
merged 4 commits into from
Jan 29, 2023

Conversation

zengxilong
Copy link
Contributor

PR for #926

@zengxilong zengxilong requested a review from a team January 19, 2023 07:56
@codecov
Copy link

codecov bot commented Jan 19, 2023

Codecov Report

Base: 67.7% // Head: 67.7% // Increases project coverage by +0.0% 🎉

Coverage data is based on head (cd013fa) compared to base (b67a873).
Patch has no changes to coverable lines.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #952   +/-   ##
=====================================
  Coverage   67.7%   67.7%           
=====================================
  Files        116     116           
  Lines       9103    9103           
=====================================
+ Hits        6164    6170    +6     
+ Misses      2939    2933    -6     
Impacted Files Coverage Δ
opentelemetry-http/src/lib.rs 8.7% <0.0%> (+5.2%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jtescher
Copy link
Member

Thanks

@jtescher jtescher merged commit a82d970 into open-telemetry:main Jan 29, 2023
garypen added a commit to apollographql/router that referenced this pull request Jul 12, 2023
The update requires a change to the implementation and test update as
follows:

- In otel 0.18.0, processor factories had a `with_memory(bool)` method
which we were using when building our prometheus exporter. AFAICT, this
used to be a mechanism for controlling how metrics handled stale gauges.
In 0.19.0, [this method was
removed](open-telemetry/opentelemetry-rust#946)
and now gauges are all assumed to be as though they were created with
`false`. We had been providing `true` on our call. I'm not 100% certain
of the impact of this change, but it appears that we can ignore it. We
may need to consider it more carefully if problems arise.
- There are now two standard OTEL attributes:
```otel_scope_name="apollo/router",otel_scope_version=""``` added to
output and a number of tests had to be updated to accommodate that
change.
- One of our tests appeared to be searching for
`apollo_router_cache_hit_count` (and this was working) when it should
have been searching for `apollo_router_cache_hit_count_total` (likewise
for miss). I've updated the test and think this is the correct thing to
do. It looks like a bug was fixed in otel and this change matches the
fix.
 
Regarding that last point. The prometheus spec mandates naming format
and the change was part of the compliance with that spec. This PR made
the change:
open-telemetry/opentelemetry-rust#952

The two affected counters in the router were:

apollo_router_cache_hit_count -> apollo_router_cache_hit_count_total
apollo_router_cache_miss_count -> apollo_router_cache_miss_count_total

It's good that our prometheus metrics are now spec compliant, but we
should note this in the release notes and (if possible) somewhere in our
documentation. I'll add it to the changeset at least.

The upgrade fixes many of the outstanding issues related to
opentelemetry and various APM vendors:

Fixes: #2878
Fixes: #2066 
Fixes: #2959 
Fixes: #2225 
Fixes: #1520 

<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [x] Changes are compatible[^1]
- [x] Documentation[^2] completed
- [x] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [x] Unit Tests
    - [x] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]. It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants