Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Report new received crds signatures and their respective origins to metrics #32504

Merged
merged 28 commits into from
Jul 20, 2023

Conversation

gregcusack
Copy link
Contributor

Problem

There is currently no way to track how gossip messages propagate throughout the network.
We have no way to answer the question: for a specific message and origin, how many validators receive the message?

Summary of Changes

Report any new crds values' signature and origin to metrics
Reduce event rate by only reporting signatures with n trailing zeros, where n is 18

18 trailing zeros gives us a difficulty level that results in approximately 1-2 new signatures reported every minute per validator
Approximate number of new push messages received per validator:

  • testnet: ~500k,
  • mainnet: ~280k
    log2(250k) = ~18

data reported as:
series: cluster_info_crds_stats_message_signatures_received
data: crds_origin and crds_signature

Fixes #

@codecov
Copy link

codecov bot commented Jul 15, 2023

Codecov Report

Merging #32504 (da2db53) into master (83ac15d) will increase coverage by 0.0%.
The diff coverage is 88.8%.

@@           Coverage Diff           @@
##           master   #32504   +/-   ##
=======================================
  Coverage    82.0%    82.0%           
=======================================
  Files         780      780           
  Lines      210789   210798    +9     
=======================================
+ Hits       172931   172950   +19     
+ Misses      37858    37848   -10     

gossip/src/cluster_info_metrics.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
sdk/src/signature.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
@gregcusack gregcusack requested a review from behzadnouri July 17, 2023 23:26
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/cluster_info_metrics.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
@gregcusack gregcusack requested a review from behzadnouri July 18, 2023 19:23
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
gossip/src/crds.rs Outdated Show resolved Hide resolved
behzadnouri
behzadnouri previously approved these changes Jul 19, 2023
Copy link
Contributor

@behzadnouri behzadnouri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging, can you please run a node with this code on testnet and verify that it does not push too many data to metrics server.

gossip/src/crds.rs Outdated Show resolved Hide resolved
@gregcusack
Copy link
Contributor Author

testnet node confirmed the ~1 message report per second with SIGNATURE_SAMPLE_LEADING_ZEROS = 19. ready to merge

@gregcusack gregcusack added the automerge Merge this Pull Request automatically once CI passes label Jul 20, 2023
@behzadnouri
Copy link
Contributor

testnet node confirmed the ~1 message report per second with SIGNATURE_SAMPLE_LEADING_ZEROS = 19. ready to merge

was this 1 message per minute or per second?

@gregcusack
Copy link
Contributor Author

testnet node confirmed the ~1 message report per second with SIGNATURE_SAMPLE_LEADING_ZEROS = 19. ready to merge

was this 1 message per minute or per second?

ahhh sorry! this is 1 message PER MINUTE.

@mergify mergify bot removed the automerge Merge this Pull Request automatically once CI passes label Jul 20, 2023
@mergify
Copy link
Contributor

mergify bot commented Jul 20, 2023

automerge label removed due to a CI failure

@gregcusack gregcusack merged commit 80f7082 into solana-labs:master Jul 20, 2023
@gregcusack gregcusack added the v1.16 PRs that should be backported to v1.16 label Aug 1, 2023
mergify bot pushed a commit that referenced this pull request Aug 1, 2023
…etrics (#32504)

* screwed up old branch and syncing with upstream branch

* add fixed size ring buff instead of variable sized vecdeque for reporting signatures

* modify difficulty to take in n 0 bits and check if signature ending ends in n 0 bits

* update to only push every 18 trailing zero bits. and clean up

* report origin with signature. and set trailing 0s to 8 for local testing

* change back to 18 trailing zeros and rm unused imports

* run cargo rmt

* run ./scripts/cargo-for-all-lock-files.sh tree

* allow integer arithmetic for bit comparison

* rm unused lifetime

* rm duplicate entry?

* re implement ring buf

* put ringbuf in sorted order

* ringbuf in cargo.toml now in sorted order

* rm ring buf, refactor, fix trailing zero bug

* fix bug in trailing zeros. was comparing wrong ones

* fix needless range loop bug

* fix bug

* change trailing zero checking to build in methods and only report first 8 bytes of signature and origin pubkey

* report full origin string and first 8 bytes of signature

* set SIGNATURE_SAMPLE_TRAILING_ZEROS back to 18

* forgot to run cargo tree

* avoid panic and change working

* rm bs58

* pass in Option<String> into datapoint_info

* shorten metric names

(cherry picked from commit 80f7082)
gregcusack pushed a commit that referenced this pull request Aug 1, 2023
…etrics (#32504)

* screwed up old branch and syncing with upstream branch

* add fixed size ring buff instead of variable sized vecdeque for reporting signatures

* modify difficulty to take in n 0 bits and check if signature ending ends in n 0 bits

* update to only push every 18 trailing zero bits. and clean up

* report origin with signature. and set trailing 0s to 8 for local testing

* change back to 18 trailing zeros and rm unused imports

* run cargo rmt

* run ./scripts/cargo-for-all-lock-files.sh tree

* allow integer arithmetic for bit comparison

* rm unused lifetime

* rm duplicate entry?

* re implement ring buf

* put ringbuf in sorted order

* ringbuf in cargo.toml now in sorted order

* rm ring buf, refactor, fix trailing zero bug

* fix bug in trailing zeros. was comparing wrong ones

* fix needless range loop bug

* fix bug

* change trailing zero checking to build in methods and only report first 8 bytes of signature and origin pubkey

* report full origin string and first 8 bytes of signature

* set SIGNATURE_SAMPLE_TRAILING_ZEROS back to 18

* forgot to run cargo tree

* avoid panic and change working

* rm bs58

* pass in Option<String> into datapoint_info

* shorten metric names

(cherry picked from commit 80f7082)
gregcusack pushed a commit that referenced this pull request Aug 1, 2023
…ns to metrics (backport of #32504) (#32674)

Report new received crds signatures and their respective origins to metrics (#32504)

* screwed up old branch and syncing with upstream branch

* add fixed size ring buff instead of variable sized vecdeque for reporting signatures

* modify difficulty to take in n 0 bits and check if signature ending ends in n 0 bits

* update to only push every 18 trailing zero bits. and clean up

* report origin with signature. and set trailing 0s to 8 for local testing

* change back to 18 trailing zeros and rm unused imports

* run cargo rmt

* run ./scripts/cargo-for-all-lock-files.sh tree

* allow integer arithmetic for bit comparison

* rm unused lifetime

* rm duplicate entry?

* re implement ring buf

* put ringbuf in sorted order

* ringbuf in cargo.toml now in sorted order

* rm ring buf, refactor, fix trailing zero bug

* fix bug in trailing zeros. was comparing wrong ones

* fix needless range loop bug

* fix bug

* change trailing zero checking to build in methods and only report first 8 bytes of signature and origin pubkey

* report full origin string and first 8 bytes of signature

* set SIGNATURE_SAMPLE_TRAILING_ZEROS back to 18

* forgot to run cargo tree

* avoid panic and change working

* rm bs58

* pass in Option<String> into datapoint_info

* shorten metric names

(cherry picked from commit 80f7082)

Co-authored-by: Greg Cusack <greg.cusack@solana.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants