Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve business metrics sent from compass #141

Closed
mabdh opened this issue May 31, 2022 · 6 comments · Fixed by #149
Closed

Improve business metrics sent from compass #141

mabdh opened this issue May 31, 2022 · 6 comments · Fixed by #149
Labels
enhancement New feature or request

Comments

@mabdh
Copy link
Member

mabdh commented May 31, 2022

Is your feature request related to a problem? Please describe.
Compass is currently sending metrics but only for some metrics (system/api). We can improve the way compass sending metrics by sending more business metrics and by having an abstraction of predefined metrics that could be use with whatever metric collector that user used (statsd / opentelemetry).

Describe the solution you'd like

  • Analyze system metrics & business metrics needed.
  • Create predefined custom metrics abstraction of compass
  • Implement the changes so compass can send more business metrics that are needed
@mabdh mabdh added the enhancement New feature or request label May 31, 2022
@mabdh
Copy link
Member Author

mabdh commented Jun 7, 2022

We are currently using StatsD for sending calculated metrics. We have our own custom grpc_interceptor that calculates response time and status code here. We are also using telegraf in our helm app chart to collect metrics and relay it to the downstream services.

I am thinking maybe we can improve the metric collection by replacing our StatsD with OpenTelemetry. With OpenTelemetry, we can implement tracing and metrics at the same time. OpenTelemetry provides several built-in library instrumentation. One example is grpc interceptor here. Since OpenTelemetry would be a cloud-native standard, we could also leverage several functionalities related with observability in the future.

If we are replacing StatsD with OpenTelemetry, the changes in compass only metrics should be sent to some open telemetry collector port :4317 instead of port :8125 (statsd). The rest of implementation depends on the user.

We can move slowly to OpenTelemetry by doing metrics first and tracing could be done later.
Here are several strategies to migrate to OpenTelemetry in our helm chart.

  1. Keep Using Telegraf
    • Replace StatsD with OpenTelemetry Instrumentation
    • Use telegraf OpenTelemetry input plugin in our config instead of statsd input plugin
  2. Using otel collector instead of Telegraf
    • Replace StatsD with OpenTelemetry Instrumentation
    • Use opentelemetry-collector helm chart

The easiest approach would be the 1. Keep Using Telegraf

@StewartJingga
Copy link
Contributor

We can also make it configurable, so user can decide which one to use. wdyt? @mabdh

@mabdh
Copy link
Member Author

mabdh commented Jun 7, 2022

I don't think persisting 2 methods would be give much benefit here. But it will be better if there is a functionality to convert otel metrics to statsd metrics so user could still use otel instrumentation to measure the grpc without adding custom code.

Might need to do more analysis for this.

@ravisuhag
Copy link
Member

@mabdh We should move to opentelemetry for this. This is something we are planning for all ODPF products actually.

@mabdh
Copy link
Member Author

mabdh commented Jun 8, 2022

After looking at OpenTelemetry library, the grpc interceptor is currently only support tracing and not metrics. The metrics one is still in progress. Issue: open-telemetry/opentelemetry-go-contrib#194

I think we can move to OpenTelemetry when we are ready to implement tracing or when the otelgrpc has already had metrics implemented.

@mabdh
Copy link
Member Author

mabdh commented Jun 9, 2022

Additional metrics needed:

  • discovery error, this is to narrow down the error in UpsertPatch and Delete API
  • asset service & type count on upsert
  • user stats (incoming new user, existing, unauthorized)

@mabdh mabdh linked a pull request Jun 13, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants