This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

.maintain/monitoring: Add alert when continuous task ends #7250

Merged: 2 commits, Oct 5, 2020

Contributor

@mxinden mxinden commented Oct 2, 2020

The `polkadot_tasks_ended_total` Prometheus metric shows when a task has
ended. Use this metric to alert when specific known-to-be-continuous tasks
end on a node.
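
For illustration, an alert over a fixed list of task names could look roughly like the rule below. This is a sketch, not the rule from this PR: the group name, alert name, severity, message, and both `task_name` values are hypothetical placeholders. Only the metric name and the `task_name` and `instance` labels come from this thread.

```yaml
groups:
  - name: continuous-tasks            # hypothetical group name
    rules:
      - alert: ContinuousTaskEnded    # hypothetical alert name
        # Returns a series for any listed task whose ended counter is non-zero.
        # The two task names are placeholders, not real task names.
        expr: 'polkadot_tasks_ended_total{task_name=~"example-task-a|example-task-b"} > 0'
        labels:
          severity: warning           # assumed severity
        annotations:
          message: 'Task {{ $labels.task_name }} ended on {{ $labels.instance }}.'
```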

This would catch bugs like #7000.

@mxinden mxinden added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Oct 2, 2020
@mxinden mxinden requested a review from tomaka October 2, 2020 08:54
@tomaka
Contributor

tomaka commented Oct 2, 2020

I'm not such a fan of hard-coding the task names here. Someone will unavoidably add a new task and forget to update this list. Not to mention the Polkadot-specific tasks.

What do you think of `(polkadot_tasks_spawned_total == 1) - (polkadot_tasks_ended_total == 1)`?
This should detect tasks that have been started exactly once and have ended exactly once.
The 3 minutes should ensure that the alert isn't accidentally triggered during node initialization.

@tomaka
Contributor

tomaka commented Oct 2, 2020

> `(polkadot_tasks_spawned_total == 1) - (polkadot_tasks_ended_total == 1)`?

I'm not actually sure that this works.
I've checked that `(avg(polkadot_tasks_spawned_total) by (task_name) == 1) - (avg(polkadot_tasks_ended_total) by (task_name) == 1)` does work for a group of instances, but if I understand correctly, these alerts work on individual instances.

@tomaka
Contributor

tomaka commented Oct 2, 2020

`(polkadot_tasks_spawned_total == 1) - on(instance, task_name) (polkadot_tasks_ended_total == 1)` should work
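
Wrapped in an alerting rule, the expression above might look like the following sketch. The group name, alert name, severity, and message are assumptions, and `for: 3m` assumes that the 3 minutes mentioned earlier in this thread refer to the rule's `for` duration. The rule actually merged in this PR may differ.

```yaml
groups:
  - name: continuous-tasks            # hypothetical group name
    rules:
      - alert: ContinuousTaskEnded    # hypothetical alert name
        # Keep only series whose counter equals 1 on each side, then pair
        # them one-to-one on the instance and task_name labels.
        expr: >-
          (polkadot_tasks_spawned_total == 1)
          - on(instance, task_name)
          (polkadot_tasks_ended_total == 1)
        # Assumed: the expression must hold for 3 minutes before firing.
        for: 3m
        labels:
          severity: warning           # assumed severity
        annotations:
          message: 'Task {{ $labels.task_name }} on {{ $labels.instance }} was spawned once and has ended.'
```

Each matched pair yields a sample with value 0 (1 minus 1). That is enough to alert, because an alerting rule becomes active for every series its expression returns, whatever the sample value. With `on(instance, task_name)`, the result carries only those two labels, which is why the message template can reference both.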

@mxinden
Contributor Author

mxinden commented Oct 2, 2020

> `(polkadot_tasks_spawned_total == 1) - on(instance, task_name) (polkadot_tasks_ended_total == 1)` should work

Good idea. I just hope no one spawns a singleton task at startup for some setup work. I guess we will find out.

Adjusted. Mind taking another look @tomaka?

@tomaka
Contributor

tomaka commented Oct 5, 2020

bot merge

@ghost

ghost commented Oct 5, 2020

Trying merge.

@ghost ghost merged commit 3e5ac2a into paritytech:master Oct 5, 2020
ordian added a commit that referenced this pull request Oct 9, 2020
…up-updates

* master:
  Async keystore + Authority-Discovery async/await (#7000)
  Fixes logging of target names with dashes (#7281)
  seal: Add automated weights for contract API calls (#7017)
  add ss58 id for nodle (#7279)
  Refactor CurrencyToVote (#6896)
  bump-allocator: document & poison (#7277)
  Reset flaming fir network (#7274)
  reschedule (#6860)
  Drop system cache for trie benchmarks (#7242)
  client: improve log formatting (#7272)
  Rework `InspectState` (#7271)
  SystemOrigin trait (#7226)
  Update ss58 registry for Dock network (#7263)
  .maintain/monitoring: Add alert when continuous task ends (#7250)
  Rename `TRANSACTION_VERSION` to `EXTRINSIC_VERSION` (#7258)
  Split block announce processing into two parts (#6958)
  Fix offchain election to respect the weight (#7215)
This pull request was closed.