Add telemetry to hermes - bootstrap #868

Closed
5 tasks
ancazamfir opened this issue Apr 28, 2021 · 2 comments · Fixed by #985

@ancazamfir
Collaborator

Crate

relayer

Summary

From the DEX requirements list: "metrics to monitor the health of the relayer, exposed through business-standard frameworks like Prometheus."

Problem Definition

Proposal

Investigate existing Rust crates (e.g. https://crates.io/crates/tracing-opentelemetry, opentelemetry-prometheus). Add telemetry metrics for:

  • packet relaying (success or errors)
  • retries
  • RPC errors

WIP: more detail to be added.
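
To make the three metric groups above concrete, here is a minimal sketch of shared counters for relay outcomes, retries, and RPC errors. It uses only the standard library rather than the crates mentioned above, and every name in it (`RelayMetrics`, `record_relay`, `run_workers`, and so on) is hypothetical, not taken from the Hermes code base.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical telemetry state: one monotonic counter per event kind.
#[derive(Debug, Default)]
pub struct RelayMetrics {
    packets_relayed_ok: AtomicU64,
    packets_relayed_err: AtomicU64,
    retries: AtomicU64,
    rpc_errors: AtomicU64,
}

impl RelayMetrics {
    /// Record the outcome of one packet relay attempt.
    pub fn record_relay(&self, success: bool) {
        let counter = if success {
            &self.packets_relayed_ok
        } else {
            &self.packets_relayed_err
        };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Record that an operation was retried.
    pub fn record_retry(&self) {
        self.retries.fetch_add(1, Ordering::Relaxed);
    }

    /// Record a failed RPC call.
    pub fn record_rpc_error(&self) {
        self.rpc_errors.fetch_add(1, Ordering::Relaxed);
    }

    /// Read all counters as (ok, err, retries, rpc_errors).
    pub fn snapshot(&self) -> (u64, u64, u64, u64) {
        (
            self.packets_relayed_ok.load(Ordering::Relaxed),
            self.packets_relayed_err.load(Ordering::Relaxed),
            self.retries.load(Ordering::Relaxed),
            self.rpc_errors.load(Ordering::Relaxed),
        )
    }
}

/// Simulate two workers that share one metrics state through an `Arc`.
pub fn run_workers() -> Arc<RelayMetrics> {
    let metrics = Arc::new(RelayMetrics::default());
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let m = Arc::clone(&metrics);
            thread::spawn(move || {
                m.record_relay(true);
                m.record_relay(false);
                m.record_retry();
                m.record_rpc_error();
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    metrics
}
```

Calling `run_workers().snapshot()` reads the totals after both simulated workers have finished. Atomics keep the hot path lock-free; a real integration would more likely register instruments with a metrics crate than hand-roll counters like this.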

Acceptance Criteria


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
ancazamfir added this to the 05.2021 milestone Apr 28, 2021
andynog added the E: gravity External: related to Gravity DEX label Apr 29, 2021
@andynog
Contributor

andynog commented May 10, 2021

Some metrics that need to be tracked (as per the meeting with the DEX team):

| METRIC | DESCRIPTION | UNIT |
|---|---|---|
| tx_count | Total number of txs processed via Relay tx | counter |
| tx_successful | Total number of successful txs processed via Relay tx | counter |
| tx_failed | Total number of failed txs processed via Relay tx | counter |
| UP | Program Exit Flag | gauge |
| ibc_transfer_send | Total number of IBC transfers sent from a chain (source or sink) | counter |
| ibc_transfer_receive | Total number of IBC transfers received to a chain (source or sink) | counter |
| tx_msg_ibc_recv_packet | Total number of IBC packets received | counter |
| tx_msg_ibc_acknowledge_packet | Total number of IBC packets acknowledged | counter |
| ibc_timeout_packet | Total number of IBC timeout packets | counter |
| ibc_client_misbehaviour | Total number of client misbehaviours | counter |
| relay_chains_num | Number of chains the relay is connecting to | counter |
| relay_clients_num | Number of clients the relay is connecting to | counter |
| relay_connects_num | Number of connects the relay is connecting to | counter |
| relay_channels_num | Number of channels the relay is connecting to | counter |
| relay_rpcconnect_num | Broken RPC Connection | counter |
| relay_rpcdelay_time | Delay rate for RPC connections specified in config | ms |
| relay_txcompleted_time | The rate at which tx is completed from chain A to B | ms |
| relay_trustingperiods_time | How many trusting_periods are left (Metrics on whether an update is required) | ms |
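
For reference, this is roughly how two of the metrics above would look when scraped by Prometheus. The sketch below renders the text exposition format (`# HELP`, `# TYPE`, then the sample line) by hand with the standard library only. The `Metric` struct, the `render` and `sample` functions, and the sample values are assumptions made for illustration, not part of any existing crate or of Hermes.

```rust
use std::fmt::Write;

/// The two metric kinds that appear in the UNIT column of the table.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    Counter,
    Gauge,
}

/// Hypothetical in-memory description of one metric and its current value.
pub struct Metric {
    pub name: &'static str,
    pub help: &'static str,
    pub kind: Kind,
    pub value: u64,
}

/// Render metrics in the Prometheus text exposition format:
/// a HELP line, a TYPE line, then `name value`.
pub fn render(metrics: &[Metric]) -> String {
    let mut out = String::new();
    for m in metrics {
        let kind = match m.kind {
            Kind::Counter => "counter",
            Kind::Gauge => "gauge",
        };
        // Writing to a String cannot fail, so unwrap is safe here.
        writeln!(out, "# HELP {} {}", m.name, m.help).unwrap();
        writeln!(out, "# TYPE {} {}", m.name, kind).unwrap();
        writeln!(out, "{} {}", m.name, m.value).unwrap();
    }
    out
}

/// Two rows from the table with made-up sample values.
pub fn sample() -> Vec<Metric> {
    vec![
        Metric {
            name: "tx_count",
            help: "Total number of txs processed via Relay tx",
            kind: Kind::Counter,
            value: 3, // illustrative value only
        },
        Metric {
            name: "UP",
            help: "Program Exit Flag",
            kind: Kind::Gauge,
            value: 1, // illustrative value only
        },
    ]
}
```

`render(&sample())` returns the text an HTTP `/metrics` handler would serve. In practice a Prometheus exporter crate would produce this output, so the hand-written renderer is only meant to show the target format.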

@andynog
Contributor

andynog commented May 13, 2021

Started to implement this. I believe adding telemetry support via the Supervisor might be the best option, afaiu. But I will check with people who are more familiar with the Supervisor whether this is the right direction.

andynog added a commit that referenced this issue May 18, 2021
andynog added a commit that referenced this issue May 22, 2021
romac closed this as completed in #985 Jun 1, 2021
romac added a commit that referenced this issue Jun 1, 2021
* Initial telemetry support implementation (#868)

* Refactored code for state and service. Replaced hyper with rouille (#868)

* Initial logic to include the telemetry in the Supervisor (#868)

* Refactored logic into server and service. Server working (#868)

* Added new methods for state and server (#868)

* Telemetry service logic working, recording a metric (#868)

* Added more metrics (#868)

* Added logic to disable/enable telemetry service and server (#868)

* Added more metrics to service. Hookup the packet timeout metric (#868)

* Move telemetry service into `ibc-telemetry` crate

* Move `metric!` macro into its own module

* Move telemetry config under `[telemetry]` section

* Disable telemetry by default, fix port to 3001

* Try to fix libm.so error

* Wrap telemetry state in Arc and simplify server a little

* Simplify server a bit more

* Fix glibc version mismatch between CI and Docker image

* Push telemetry handle down into workers

* Implement `workers`, `ibc_client_misbehaviours` and `receive_packets` metrics

* Add `ibc_client_update` metric

* Remove need for telemetry service by passing around the telemetry state

* Add ack and timeout metrics

* Fix compilation when telemetry feature is not included

* FMT

* Rename metric! macro to telemetry!

* Add `clippy --no-default-features` to CI

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Romain Ruetschi <romain@informal.systems>
Co-authored-by: Anca Zamfir <zamfiranca@gmail.com>
Co-authored-by: Adi Seredinschi <adi@informal.systems>
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this issue Sep 13, 2022