Add coprocessor metrics #3483

Merged
garypen merged 5 commits into dev from garypen/177-coprocessor-metrics
Jul 24, 2023

@garypen garypen commented Jul 21, 2023

Introduces a new metric for the router:

```
apollo.router.operations.coprocessor
```

It has two attributes:

```
coprocessor.stage: string (RouterRequest, RouterResponse, SubgraphRequest, SubgraphResponse)
coprocessor.succeeded: bool
```
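
To illustrate what the two attributes imply, the metric can be thought of as a counter keyed by the pair (stage, succeeded). The following is a minimal in-memory model in Python, not the router's Rust implementation; `CoprocessorCounter`, `record`, and `value` are hypothetical names introduced only for this sketch.

```python
from collections import Counter

# Stage values as listed in the PR description.
STAGES = ("RouterRequest", "RouterResponse", "SubgraphRequest", "SubgraphResponse")


class CoprocessorCounter:
    """Hypothetical stand-in for apollo.router.operations.coprocessor."""

    def __init__(self) -> None:
        # One count per (coprocessor.stage, coprocessor.succeeded) pair.
        self._counts = Counter()

    def record(self, stage: str, succeeded: bool) -> None:
        """Increment the counter for one coprocessor call."""
        if stage not in STAGES:
            raise ValueError(f"unknown coprocessor stage: {stage}")
        self._counts[(stage, succeeded)] += 1

    def value(self, stage: str, succeeded: bool) -> int:
        """Return the current count for an attribute pair (0 if never recorded)."""
        return self._counts[(stage, succeeded)]


counter = CoprocessorCounter()
counter.record("RouterRequest", True)   # coprocessor answered
counter.record("RouterRequest", False)  # coprocessor unreachable
```

Each distinct attribute pair becomes its own time series, so with four stages and a boolean there are at most eight series for this metric.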

Checklist

Complete the checklist (and note appropriate exceptions) before a final PR is raised.

  • Changes are compatible[^1]
  • Documentation[^2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing[^3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Manual test: ran a router with all coprocessor stages enabled and Prometheus metrics enabled. Sent a request to generate a success metric, then stopped the coprocessor and sent another request to generate a failure metric. Examined the Prometheus output to confirm the metrics were as expected, e.g.:

```
# HELP apollo_router_operations_coprocessor_total apollo.router.operations.coprocessor
# TYPE apollo_router_operations_coprocessor_total counter
apollo_router_operations_coprocessor_total{coprocessor_stage="RouterRequest",coprocessor_succeeded="false",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="RouterRequest",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="RouterResponse",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="SubgraphRequest",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="SubgraphResponse",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
```
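
The manual check above could be partly automated by parsing the scraped Prometheus text and asserting on the counts. The sketch below is one possible approach using only the Python standard library; `coprocessor_counts` is a hypothetical helper, and the parser is deliberately simplified (it assumes label values contain no escaped quotes or braces, which holds for the output shown).

```python
import re

METRIC = "apollo_router_operations_coprocessor_total"

# Sample copied from the Prometheus output in this PR.
SAMPLE = '''\
# HELP apollo_router_operations_coprocessor_total apollo.router.operations.coprocessor
# TYPE apollo_router_operations_coprocessor_total counter
apollo_router_operations_coprocessor_total{coprocessor_stage="RouterRequest",coprocessor_succeeded="false",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="RouterRequest",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="RouterResponse",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="SubgraphRequest",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
apollo_router_operations_coprocessor_total{coprocessor_stage="SubgraphResponse",coprocessor_succeeded="true",service_name="apollo-router",otel_scope_name="apollo/router",otel_scope_version=""} 1
'''

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')   # name{labels} value
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')          # key="value"


def coprocessor_counts(text: str) -> dict:
    """Map (stage, succeeded) to the counter value found in the scrape text."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None or match.group(1) != METRIC:
            continue  # not a sample of the coprocessor metric
        labels = dict(LABEL_RE.findall(match.group(2)))
        key = (labels["coprocessor_stage"], labels["coprocessor_succeeded"] == "true")
        counts[key] = counts.get(key, 0.0) + float(match.group(3))
    return counts


counts = coprocessor_counts(SAMPLE)
```

With the sample above, `counts[("RouterRequest", False)]` holds the single failure recorded after the coprocessor was stopped, and the other four entries hold the successes from the first request.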

Notes

[^1]. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
[^2]. Configuration is an important part of many changes. Where applicable please try to document configuration examples.
[^3]. Tick whichever testing boxes are applicable. If you are adding Manual Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or ask for it to be labeled) as manual test

fixes: apollographql/router-private#177
@garypen garypen self-assigned this Jul 21, 2023

router-perf bot commented Jul 21, 2023

CI performance tests

  • xxlarge-request - Stress test with 100 MB request payload
  • step - Basic stress test that steps up the number of users over time
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • large-request - Stress test with a 1 MB request payload
  • const - Basic stress test that runs with a constant number of users
  • no-graphos - Basic stress test, no GraphOS.
  • reload - Reload test over a long period of time at a constant rate of users
  • xlarge-request - Stress test with 10 MB request payload
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity

@garypen garypen marked this pull request as ready for review July 21, 2023 10:01
@garypen garypen requested a review from a team as a code owner July 21, 2023 10:01

@BrynCooke BrynCooke left a comment

Approved with suggestion

It has two attributes:

```
coprocessor.stage: string (RouterRequest, RouterResponse, SubgraphRequest, SubgraphResponse)
```

@BrynCooke commented:

Minor nit: it would be nice if these values matched the naming used elsewhere in the metrics, e.g. router.request, router.response, etc., rather than CamelCase.

@garypen (author) replied:

I think we should monitor this and make the change in a follow-up PR if it seems sensible. Right now, the values match the exact values transmitted in the coprocessor protocol. It would be nice to have some user feedback before changing them.
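
For reference, the renaming discussed in this thread amounts to a small mapping from the protocol's CamelCase stage values to dotted lowercase names. The sketch below is hypothetical and not part of this PR: `to_dotted` is an invented helper, and the `subgraph.*` names are an extrapolation from the `router.request` and `router.response` examples given in the review comment.

```python
import re

# Stage values as transmitted in the coprocessor protocol (from the PR description).
PROTOCOL_STAGES = ("RouterRequest", "RouterResponse", "SubgraphRequest", "SubgraphResponse")


def to_dotted(stage: str) -> str:
    """Convert a CamelCase stage value into a dotted lowercase name."""
    # Split on capital letters: "RouterRequest" -> ["Router", "Request"].
    words = re.findall(r"[A-Z][a-z]*", stage)
    return ".".join(word.lower() for word in words)


# Assumed dotted form of each protocol value.
DOTTED = {stage: to_dotted(stage) for stage in PROTOCOL_STAGES}
# DOTTED["RouterRequest"] == "router.request"
```

Keeping the protocol values, as the PR does, means the metric attribute can be correlated directly with the stage name in coprocessor payloads; a mapping like this would only be needed if the follow-up change is made.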

@garypen garypen merged commit a8121e6 into dev Jul 24, 2023
2 checks passed
@garypen garypen deleted the garypen/177-coprocessor-metrics branch July 24, 2023 14:37