Prometheus endpoint "polkadot_node_is_authority" does not switch back when getting out of the active set #5664
Comments
I confirm that both metrics have changed behavior since version 0.9.20. |
I can confirm that "polkadot_node_is_parachain_validator" also stays at "1" if the active validation session ends while the node is paravalidating. My node came out of the active set during a parachain validating session, and both metrics ("polkadot_node_is_authority" and "polkadot_node_is_parachain_validator") stayed at "1" while the node was already out of the active set. After a restart they reset to "0" and stay there (until the next session). |
@stakeworld @Bruno-Lussan This was added in 0.9.16 and I can also confirm the Unfortunately I don't have metrics spanning that long ago, but 0.9.20 includes some changes that might have affected the metric update (8bb84d2). Looking into it ...
Can you be more specific about the |
@sandreim, polkadot_node_is_parachain_validator does not seem to go back to 0 at all if the validation session ends during a parachain validation session. It seems to turn on and off normally within an active validation session. My validation session ended this morning during a parachain session and both metrics stayed at 1. This evening both were still 1 (so about 10 hours later); after I restarted the node everything was normal. In a previous session only polkadot_node_is_authority stayed at 1. On a side note, my resource usage also looked as if the node was still (para)validating: the pvf-host processes were still running and network use was the same as in an active session, while in the js app and on the 1000 validator site I was not active anymore. After the restart all resources and metrics went back to a non-active state. Hope you can find where the problem lies; the metric is very handy for monitoring and alerting on active states. |
I figured out my initial implementation was wrong because it did not handle errors properly, relying on side effects from the gossip topology implementation. Going to test the fix some more before PR review. |
Just a note on the flag name A node is acting as |
Isn't there a difference between "validator" and "authority"? I would think a validator that gets into the active set becomes an authority. If not, then strictly speaking a term like "is_active_authority" might be more fitting, but for me as a validator the most important thing is that the metric works, so thanks @sandreim! |
Just saying that the initial purpose of that flag might have been to indicate the role status of the node as described when you start your node:
|
Good point, that's what shows in the logs and on the command line... You are right, the terms are confusing and ambiguous. Something to improve for the future, I think. |
It seems that the flag's name did not actually reflect its behaviour, which corresponded to being in the active set. It would indeed be more logical to make it reflect the role displayed in the log, as you proposed. It would also be very useful for us to have a flag that tells us we are in the active set. This flag could be named polkadot_node_is_active or something similar. Sorry for my English and thank you for the time spent on this topic. And many thanks also to stakeworld. |
You are right about the confusion. Just to clarify, these metrics are updated at each session boundary to reflect the new session info and not the configured role. That being said, I'll follow up with a PR to remove this confusion. |
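To illustrate the intended behaviour described above (gauges refreshed from the new session info at every session boundary), here is a minimal sketch in Python. It is not the Polkadot implementation, which is written in Rust; `NodeMetrics`, `on_session_change`, and the key strings are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical stand-in for the two gauges discussed in this issue."""
    is_authority: int = 0
    is_parachain_validator: int = 0


def on_session_change(metrics, our_key, authorities, para_validators):
    """Refresh both gauges from the new session's info (assumed inputs).

    Every call writes a value for both gauges, so a node that drops out
    of the set is reset to 0 instead of keeping a stale 1.
    """
    metrics.is_authority = 1 if our_key in authorities else 0
    metrics.is_parachain_validator = 1 if our_key in para_validators else 0
    return metrics


# Usage: the node is in the set for one session, then drops out.
metrics = NodeMetrics()
on_session_change(metrics, "our-key", {"our-key", "other"}, {"our-key"})
in_set = (metrics.is_authority, metrics.is_parachain_validator)
on_session_change(metrics, "our-key", {"other"}, set())
out_of_set = (metrics.is_authority, metrics.is_parachain_validator)
```

The point of the sketch is only that the update is unconditional: the value is derived from the session info alone, not from the configured role or from whatever the gauge held before.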
Binaries:
polkadot 0.9.23-a7e188cd966 from default apt repo https://releases.parity.io/deb
prometheus, version 2.34.0
grafana from apt repo version 8.5.5 (commit: d32ae18909, branch: HEAD)
Server:
Ubuntu 20.04 LTS 5.4.0-113-generic #127-Ubuntu SMP x86_64 on a dedicated AMD server, 64 MB ram.
Description:
When getting in the active validator set the prometheus metric endpoint "polkadot_node_is_authority" switches to 1, but when you get out of the active set it does not switch back to 0. If you restart the node it goes to 0 and stays at 0. Witnessed this on 0.9.23 and also before on 0.9.22. The "polkadot_node_is_parachain_validator" endpoint does switch on and off depending on the paravalidating state. Heard rumors that if you end the authority session in a parachain validator state it also stays on 1 but can't confirm because all my parachain sessions ended before the authority ended.
Expected behaviour: when you get out of the active set and are no longer an authority, I would expect "polkadot_node_is_authority" to switch back to 0.
I've heard in the 1000 validator matrix group that it is a known problem, but I could not find a previous bug report; if one does exist, my apologies for the duplicate. A fellow validator uses the "overseer rate" instead, but this seems to change between versions so it is not perfect.
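For anyone who wants to check the value outside Grafana, a rough sketch of reading the gauge from scraped metrics text follows. It assumes the standard Prometheus text exposition format (`name{labels} value [timestamp]`); `read_gauge` is a hypothetical helper, and the sample text, including the `chain` label, is illustrative rather than copied from a real node.

```python
def read_gauge(exposition_text, name):
    """Return the first sample value for `name` as a float, or None.

    Parses Prometheus text exposition lines of the form
    `name{labels} value [timestamp]`; comment lines start with '#'.
    """
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at '{' (label block) or at whitespace.
        head = line.split("{", 1)[0].split()
        if not head or head[0] != name:
            continue
        # The value is the first token after the optional label block.
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        tokens = rest.split()
        return float(tokens[0]) if tokens else None
    return None


# Illustrative sample only; real output from a node will contain more lines.
SAMPLE = """\
# TYPE polkadot_node_is_authority gauge
polkadot_node_is_authority{chain="polkadot"} 1
polkadot_node_is_parachain_validator{chain="polkadot"} 0
"""

authority = read_gauge(SAMPLE, "polkadot_node_is_authority")
para = read_gauge(SAMPLE, "polkadot_node_is_parachain_validator")
```

With the behaviour reported here, the first value would stay at 1 after leaving the active set until the node is restarted, so comparing it against what the js app shows is a quick way to reproduce the mismatch.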
It seems to be included here #4699 by @sandreim