Hermes telemetry requirements #1373

Closed
5 tasks done
thanethomson opened this issue Sep 21, 2021 · 1 comment
@thanethomson
Contributor

thanethomson commented Sep 21, 2021

Crate

ibc and ibc-telemetry

Summary

Following from #868 and recent discussions, we need to expand on the metrics we expose via our telemetry mechanism.

Problem Definition

We have been talking about parsing log files to obtain information regarding metrics that could easily be tracked by our telemetry system (i.e. Prometheus).

Proposal

Let's:

  1. Use this issue to discuss and gather all of the metrics that would be useful for us to expose.
  2. List them in the Acceptance Criteria section of this issue.
  3. Prioritize the metrics.
  4. Implement as many of them as we can according to priority. Ideally only one metric per PR to ensure quick review.

Acceptance Criteria

When the following metrics are exposed:

| Done | Metric name | Description | Meter type |
| --- | --- | --- | --- |
| | `error_websocket_reconnect` | Number of times Hermes had to reconnect to the WebSocket endpoint | Counter |
| | `time_to_relay` | Interval between when Hermes receives an event until the messages in the event are sent to the destination chain, per message | ValueRecorder |
| | `tx_failed` | Total number of failed txs processed | Counter |
| | `tx_retry_failed` | Total number of failed retries when submitting txs | Counter |
| | `trusting_periods_left` | How many trusting_periods are left (metrics on whether an update is required) | UpDownCounter |
| | `relay_chains_num` | Number of chains the relay is connecting to | UpDownCounter |
| | `rpc_query_count` | Number of RPC queries issued to chain | Counter |
| | `rpc_query_latency` | Latency of RPC queries issued to chain | ValueRecorder |
| | `grpc_query_count` | Number of gRPC queries issued to chain | Counter |
| | `grpc_query_latency` | Latency of gRPC queries issued to chain | ValueRecorder |
| | `tx_msg_ibc_recv_packet` | Total number of IBC packets received | Counter |
| | `tx_msg_ibc_acknowledge_packet` | Total number of IBC packets acknowledged | Counter |
| | `ibc_timeout_packet` | Total number of IBC timeout packets | Counter |
| | `ibc_client_misbehaviour` | Total number of client misbehaviours | Counter |
| | `tx_successful` | Total number of successful txs processed | Counter |
| | `tx_count` | Total number of txs processed | Counter |
| | `error_rpc` | Number of RPC errors encountered | Counter |
| | `ibc_transfer_send` | Total number of IBC transfers sent from a chain (source or sink) | Counter |
| | `ibc_transfer_receive` | Total number of IBC transfers received to a chain (source or sink) | Counter |
| | `relay_clients_num` | Number of clients the relay is connecting to | UpDownCounter |
| | `relay_connects_num` | Number of connections the relay is connecting to | Counter |
| | `relay_channels_num` | Number of channels the relay is connecting to | Counter |
| | `UP` | Program exit flag | ValueRecorder |
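
To make the "Meter type" column concrete while we prioritize, here is a minimal std-only sketch of how the three meter types behave. It is not the `ibc-telemetry` implementation and does not use the OpenTelemetry API: the `Counter`, `UpDownCounter` and `ValueRecorder` structs, the `render` helper, the `chain` label and the `ibc-0` chain ID are all hypothetical names picked for illustration.

```rust
use std::fmt::Display;

/// Monotonic counter: the value only ever increases (e.g. `tx_failed`).
#[derive(Debug, Default)]
pub struct Counter(u64);

impl Counter {
    pub fn add(&mut self, n: u64) {
        self.0 += n;
    }
    pub fn value(&self) -> u64 {
        self.0
    }
}

/// Up/down counter: the value can rise and fall (e.g. `relay_chains_num`).
#[derive(Debug, Default)]
pub struct UpDownCounter(i64);

impl UpDownCounter {
    pub fn add(&mut self, delta: i64) {
        self.0 += delta;
    }
    pub fn value(&self) -> i64 {
        self.0
    }
}

/// Value recorder: takes individual measurements (e.g. `rpc_query_latency`).
/// This sketch keeps only a count and a sum, in seconds.
#[derive(Debug, Default)]
pub struct ValueRecorder {
    count: u64,
    sum: f64,
}

impl ValueRecorder {
    pub fn record(&mut self, value: f64) {
        self.count += 1;
        self.sum += value;
    }
    pub fn count(&self) -> u64 {
        self.count
    }
    /// Mean of all recorded values, or `None` if nothing was recorded.
    pub fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Formats one sample as a Prometheus-style text line with a single
/// (hypothetical) `chain` label.
pub fn render(name: &str, chain: &str, value: impl Display) -> String {
    format!("{}{{chain=\"{}\"}} {}", name, chain, value)
}

/// Exercises one metric of each counter type from the table above.
pub fn demo() -> Vec<String> {
    let mut tx_failed = Counter::default();
    tx_failed.add(1);
    tx_failed.add(2);

    let mut relay_chains_num = UpDownCounter::default();
    relay_chains_num.add(2); // two chains configured
    relay_chains_num.add(-1); // one chain removed

    vec![
        render("tx_failed", "ibc-0", tx_failed.value()),
        render("relay_chains_num", "ibc-0", relay_chains_num.value()),
    ]
}
```

The point of the split is that a Counter is enough for "how many times did X happen", an UpDownCounter is needed when the quantity can shrink, and a ValueRecorder is the one to pick when the distribution of individual observations matters, as with the latency metrics.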

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@thanethomson thanethomson self-assigned this Sep 21, 2021
@thanethomson thanethomson added I: logic Internal: related to the relaying logic I: telemetry Internal: related to Telemetry & metrics labels Sep 21, 2021
@adizere
Member

adizere commented Apr 21, 2022

Closing in favor of #2112 which will take a more fine-grained, incremental approach to adding metrics.

@adizere adizere closed this as completed Apr 21, 2022
@adizere adizere added this to the v0.14.0 milestone Apr 21, 2022