Metrics: counter name changed in 1.24.0 #3443

Closed
yanns opened this issue Jul 14, 2023 · 5 comments · Fixed by #3471

@yanns
Contributor

yanns commented Jul 14, 2023

Describe the bug
The name of counters exposed in metrics has changed in version 1.24.0

The metric apollo_router_http_requests_total is now apollo_router_http_requests_total_total

To Reproduce
Fetch the metrics endpoint scraped by Prometheus at http://127.0.0.1:9090/metrics:

# HELP apollo_router_http_requests_total_total apollo_router_http_requests_total
# TYPE apollo_router_http_requests_total_total counter
apollo_router_http_requests_total_total{error="There was no GraphQL operation to execute. Use the `query` parameter to send an operation, using either GET or POST.",service_name="apollo-router",status="400",otel_scope_name="apollo/router",otel_scope_version=""} 1
# HELP apollo_router_processing_time apollo_router_processing_time
# TYPE apollo_router_processing_time histogram
apollo_router_processing_time_bucket{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version="",le="+Inf"} 1
apollo_router_processing_time_sum{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 0.000226153
apollo_router_processing_time_count{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
# HELP apollo_router_session_count_active apollo_router_session_count_active
# TYPE apollo_router_session_count_active gauge
apollo_router_session_count_active{service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 0
# HELP apollo_router_session_count_total apollo_router_session_count_total
# TYPE apollo_router_session_count_total gauge
apollo_router_session_count_total{listener="http://0.0.0.0:4000",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 0
apollo_router_session_count_total{listener="http://0.0.0.0:8088",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 0
apollo_router_session_count_total{listener="http://0.0.0.0:9090",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
# HELP apollo_router_span apollo_router_span
# TYPE apollo_router_span histogram
apollo_router_span_bucket{kind="busy",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version="",le="+Inf"} 1
apollo_router_span_sum{kind="busy",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version=""} 0.00002987
apollo_router_span_count{kind="busy",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_span_bucket{kind="duration",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version="",le="+Inf"} 1
apollo_router_span_sum{kind="duration",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version=""} 0.000083233
apollo_router_span_count{kind="duration",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_span_bucket{kind="idle",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version="",le="+Inf"} 1
apollo_router_span_sum{kind="idle",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version=""} 0.000042837
apollo_router_span_count{kind="idle",service_name="apollo-router",span="request",otel_scope_name="apollo/router",otel_scope_version=""} 1
# HELP otel_scope_info Instrumentation Scope metadata
# TYPE otel_scope_info gauge
otel_scope_info{otel_scope_name="apollo/router",otel_scope_version=""} 1
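
When comparing scrapes across router versions, the `# TYPE` lines alone are enough to spot a renamed counter. The following is a minimal sketch, assuming the exposition text has one entry per line, as Prometheus endpoints normally serve it. `metric_types` and `doubled_total_counters` are hypothetical helper names, not part of any library.

```python
def metric_types(exposition: str) -> dict[str, str]:
    """Map each metric name to its declared type using the `# TYPE` lines."""
    types = {}
    for line in exposition.splitlines():
        parts = line.split()
        # A type declaration looks like: "# TYPE <name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def doubled_total_counters(exposition: str) -> list[str]:
    """Return counters whose exported name ends in a doubled `_total` suffix."""
    return sorted(
        name
        for name, kind in metric_types(exposition).items()
        if kind == "counter" and name.endswith("_total_total")
    )


# Trimmed sample taken from the output in this report.
sample = """\
# HELP apollo_router_http_requests_total_total apollo_router_http_requests_total
# TYPE apollo_router_http_requests_total_total counter
# HELP apollo_router_session_count_total apollo_router_session_count_total
# TYPE apollo_router_session_count_total gauge
"""
print(doubled_total_counters(sample))
```

Only metrics declared as `counter` are checked, so the `apollo_router_session_count_total` gauge is not flagged.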

Desktop (please complete the following information):

  • Version 1.24.0

Additional context
My quick guess, as I had a similar issue with other libraries in other languages:
With #2878, maybe the library now automatically adds the _total suffix to all counters?

@garypen
Contributor

garypen commented Jul 15, 2023

This change was documented in the release notes for 1.24.0: https://github.com/apollographql/router/releases

Additionally, it does look like a bug in the opentelemetry crate, since it blindly adds _total without checking if it is already present.
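
The effect of appending the suffix without a check can be shown with a small sketch. `naive_counter_name` is a hypothetical stand-in for an exporter that appends unconditionally. It is not the opentelemetry crate's actual code, and `example_requests` is a made-up metric name.

```python
def naive_counter_name(name: str) -> str:
    """Append the counter suffix unconditionally (hypothetical exporter behaviour)."""
    return name + "_total"


# A counter that already ends in `_total` comes out with a doubled suffix.
print(naive_counter_name("apollo_router_http_requests_total"))
# -> apollo_router_http_requests_total_total

# A counter without the suffix ends up with the conventional name.
print(naive_counter_name("example_requests"))
# -> example_requests_total
```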

@garypen garypen closed this as completed Jul 15, 2023
@garypen garypen reopened this Jul 15, 2023
garypen added a commit that referenced this issue Jul 19, 2023
When producing Prometheus statistics, the otel crate (0.19.0) now
automatically appends "_total", which is unhelpful.

This fix removes the duplicated "_total_total" from our statistics.

fixes: #3443
BrynCooke pushed a commit that referenced this issue Jul 19, 2023
When producing Prometheus statistics, the otel crate (0.19.0) now
automatically appends "_total", which is unhelpful.

This fix removes the duplicated "_total_total" from our statistics.

fixes: #3443

<!-- start metadata -->

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [x] Changes are compatible[^1]
- [x] Documentation[^2] completed
- [x] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [ ] Unit Tests
    - [x] Integration Tests
    - [ ] Manual Tests

**Exceptions**

*Note any exceptions here*

**Notes**

[^1]. It may be appropriate to bring upcoming changes to the attention
of other (impacted) groups. Please endeavour to do this before seeking
PR approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where
applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding
Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`
BrynCooke added a commit that referenced this issue Jul 20, 2023
> **Note**
>
> When approved, this PR will merge into **the `1.25.0` branch** which
will — upon being approved itself — merge into `main`.
>
> **Things to review in this PR**:
> - Changelog correctness (There is a preview below, but it is not
necessarily the most up to date. See the _Files Changed_ for the true
reality.)
>  - Version bumps
>  - That it targets the right release branch (`1.25.0` in this case!).
>
---
## 🚀 Features

### Persisted Queries w/opt-in safelisting (preview) ([PR
#3347](#3347))

> ⚠️ **Persisted queries is an [Enterprise
feature](https://www.apollographql.com/blog/platform/evaluating-apollo-router-understanding-free-and-open-vs-commercial-features/)
of the Apollo Router.** It requires an organization with a [GraphOS
Enterprise plan](https://www.apollographql.com/pricing/) and the feature
to be enabled for your account.
>
> If your organization _doesn't_ currently have an Enterprise plan, you
can test out this functionality by signing up for a free [Enterprise
trial](https://www.apollographql.com/docs/graphos/org/plans/#enterprise-trials)
and reaching out to enable the feature for your account.

Persisted Queries gives you the tools to prevent unwanted traffic from
reaching your graph.

It has two modes of operation:
* **Unregistered operation monitoring**
  * Your router can allow all GraphQL operations, while emitting
    structured traces containing unregistered operation bodies.
* **Operation safelisting**
  * Reject unregistered operations
  * Require all operations to be sent as an ID

Unlike automatic persisted queries (APQ), the ability to create a
safelist of operations allows you to prevent a malicious actor from
constructing a free-format query that could overload your subgraph
services.
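
To make the difference between the two modes concrete, here is a rough sketch of the decision a safelist implies. It is an illustration only and not the router's implementation. The `Request` type, the `registered` manifest (operation ID to operation body), the flag names, and the exact-string comparison of operation bodies are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    operation_id: Optional[str] = None  # ID of a registered operation, if sent
    query: Optional[str] = None         # free-format operation body, if sent


def resolve(request: Request, registered: dict[str, str],
            safelist: bool, require_id: bool) -> str:
    """Return the operation body to execute, or raise PermissionError."""
    if request.operation_id is not None:
        # Assumption: an ID that is not in the manifest is always rejected.
        if request.operation_id not in registered:
            raise PermissionError("unknown operation ID")
        return registered[request.operation_id]
    if request.query is None:
        raise ValueError("no operation supplied")
    if safelist and require_id:
        raise PermissionError("operations must be sent as an ID")
    is_registered = request.query in registered.values()
    if safelist and not is_registered:
        raise PermissionError("unregistered operation")
    if not is_registered:
        # Monitoring mode: allow the operation but report its body.
        print("unregistered operation:", request.query)
    return request.query


registered = {"abc123": "query { me { id } }"}
print(resolve(Request(operation_id="abc123"), registered, safelist=True, require_id=True))
```

With `safelist=False` the same function allows a free-format query and only reports it, which corresponds to the monitoring mode described above.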

For more information on how to register queries and configure your
router, see the [Persisted Query
documentation](https://www.apollographql.com/docs/graphos/routing/persisted-queries).

By [@EverlastingBugstopper](https://github.com/EverlastingBugstopper) in
#3347

## 🐛 Fixes

### Fix Prometheus statistics issues with `_total_total` names ([Issue
#3443](#3443))

When producing Prometheus statistics, the otel crate (0.19.0) now
automatically appends `_total`, which is unhelpful.

This fix removes the duplicated `_total_total` suffix from our statistics.
However, counter metrics will still have `_total` appended to them if they
do not already end with it.

By [@garypen](https://github.com/garypen) in
#3471
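
The naming outcome described in this entry can be sketched as follows. This only illustrates the resulting names, on the assumption that the suffix is added exactly when it is missing. `exported_counter_name` is a hypothetical helper, not the code from #3471, and `example_requests` is a made-up metric name.

```python
def exported_counter_name(name: str) -> str:
    """Return the counter name with exactly one `_total` suffix."""
    suffix = "_total"
    return name if name.endswith(suffix) else name + suffix


# Already suffixed: the name is left alone, so no `_total_total`.
print(exported_counter_name("apollo_router_http_requests_total"))
# Not suffixed: `_total` is still appended.
print(exported_counter_name("example_requests"))
```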

### Enforce default buckets for metrics ([PR
#3432](#3432))

When `telemetry.metrics.common` was not configured, no default metrics
buckets were configured.
With this fix, the following buckets are set by default: `[0.001, 0.005, 0.015,
0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 5.0, 10.0]`

By [@bnjjj](https://github.com/bnjjj) in
#3432
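
For readers unfamiliar with how bucket bounds shape a Prometheus histogram, the sketch below counts observations against the default bounds listed in this entry. It assumes standard Prometheus semantics, where each bucket is cumulative and its `le` bound is inclusive. `cumulative_bucket_counts` is a hypothetical helper and the sample observations are invented.

```python
from bisect import bisect_left

DEFAULT_BUCKETS = [0.001, 0.005, 0.015, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 5.0, 10.0]


def cumulative_bucket_counts(observations, bounds=DEFAULT_BUCKETS):
    """Count observations per bucket the way a Prometheus histogram exposes them.

    Each bucket counts every observation less than or equal to its upper
    bound (`le`), so counts are cumulative; the final +Inf bucket holds all.
    """
    per_bucket = [0] * (len(bounds) + 1)
    for value in observations:
        # bisect_left returns the index of the first bound >= value,
        # i.e. the smallest bucket that contains the observation.
        per_bucket[bisect_left(bounds, value)] += 1
    counts, running = {}, 0
    for bound, n in zip([*bounds, float("inf")], per_bucket):
        running += n
        counts[bound] = running
    return counts


counts = cumulative_bucket_counts([0.0002, 0.004, 0.25, 12.0])
for bound, count in counts.items():
    print(f'le="{bound}" {count}')
```

Without any finite bounds, every observation falls into the single `+Inf` bucket, which carries no information about the latency distribution.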

## 📃 Configuration

### Add `subscription.enabled` field to enable subscription support
([Issue #3428](#3428))

`enabled` is now required in `subscription` configuration. Example:

```yaml
subscription:
  enabled: true
  mode:
    passthrough:
      all:
        path: /ws
```

By [@bnjjj](https://github.com/bnjjj) in
#3450

### Add option to disable reuse of query fragments ([Issue
#3452](#3452))

A new option has been added to the Router to allow disabling of the
reuse of query fragments. This is useful for debugging purposes.
```yaml
supergraph:
  experimental_reuse_query_fragments: false
```

The default value depends on the version of federation.

By [@BrynCooke](https://github.com/BrynCooke) in
#3453

## 🛠 Maintenance

### Coprocessor: Set a default pool idle timeout duration. ([PR
#3434](#3434))

The default idle pool timeout duration in Hyper can sometimes trigger
situations in which an HTTP request cannot complete (see the comment on
hyperium/hyper#2136 for more information).

This changeset sets a default timeout duration of 5 seconds.

By [@o0Ignition0o](https://github.com/o0Ignition0o) in
#3434

---------

Co-authored-by: bryn <bryn@apollographql.com>
Co-authored-by: Chandrika Srinivasan <chandrikas@users.noreply.github.com>
@yanns
Contributor Author

yanns commented Jul 26, 2023

I'm still seeing this issue in v1.25.0

test failed. metrics should contain '# TYPE apollo_router_http_requests_total counter'.
Instead the value of metrics is:
checking connectivity to http://127.0.0.1:9090/metrics
# HELP apollo_router_http_requests_total_total apollo_router_http_requests_total
# TYPE apollo_router_http_requests_total_total counter

@BrynCooke
Contributor

Reopening pending investigation.

@BrynCooke BrynCooke reopened this Jul 26, 2023
@garypen
Contributor

garypen commented Jul 26, 2023

@yanns I think your comment was intended for #3491 which is fixed and will be released in 1.26.0. There is an alpha release you can test if you are in a hurry: https://github.com/apollographql/router/releases

@BrynCooke
Contributor

Closing. The metric name is correct, but the description was not fixed until #3491.
