feat: Embedded (back-pressure) metrics for dashboard #13830

Closed
fuyufjh opened this issue Dec 6, 2023 · 1 comment

Member

fuyufjh commented Dec 6, 2023

Background

We are actively improving observability this quarter.

But one missing piece is the diagnosis of performance issues. In this proposal, I'd like to take back-pressure rate as an example, because this is the most helpful metric to identify a performance bottleneck.

We already display this on the meta dashboard. However, on-prem deployments often do not use our provided Prometheus YAML, so the meta dashboard cannot actually work there.

Proposal

Many systems have embedded or self-contained performance monitoring components. For example,

  • Flink's dashboard can show the back-pressure rate without relying on Prometheus or a similar external system
  • Spark Web UI can show the progress of each stage in a batch job
  • MySQL has a performance_schema which exposes lots of internal info

Example of Spark web UI:

I'd like to introduce a new self-contained monitoring component on the meta node and the compute nodes. When requested via RPC, it collects data from each compute node (CN) and shows the current back-pressure (more accurately, the back-pressure over a recent time period, e.g. the last 15 seconds).
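To make the on-demand collection concrete, here is a minimal Python sketch of the meta-side merge. It is only an illustration under assumptions: `NodeSnapshot`, `collect_back_pressure`, the `(upstream, downstream)` edge key, and the idea that each compute node reports blocked seconds per edge are all hypothetical names and shapes, not the actual RPC or metric definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical key: (upstream fragment id, downstream fragment id).
Edge = Tuple[int, int]


@dataclass(frozen=True)
class NodeSnapshot:
    """Assumed shape of what one compute node reports for the recent window."""
    window_secs: float                # length of the reporting window, e.g. 15 s
    blocked_secs: Dict[Edge, float]   # total time actors on this edge were blocked
    actor_count: Dict[Edge, int]      # actors on this node that belong to the edge


def collect_back_pressure(
    fetchers: Iterable[Callable[[], NodeSnapshot]],
) -> Dict[Edge, float]:
    """Query every compute node once and merge into a per-edge rate in [0, 1].

    Each fetcher stands in for one RPC call to a compute node. Nothing is
    stored: the result reflects only the window the nodes just reported.
    """
    blocked: Dict[Edge, float] = {}
    capacity: Dict[Edge, float] = {}
    for fetch in fetchers:
        snap = fetch()
        for edge, secs in snap.blocked_secs.items():
            blocked[edge] = blocked.get(edge, 0.0) + secs
            # Maximum possible blocked time: every actor blocked for the whole window.
            capacity[edge] = (
                capacity.get(edge, 0.0) + snap.window_secs * snap.actor_count[edge]
            )
    return {
        edge: min(1.0, blocked[edge] / capacity[edge])
        for edge in blocked
        if capacity[edge] > 0
    }


# Usage: two compute nodes report on the same edge over a 15-second window.
node_a = lambda: NodeSnapshot(15.0, {(1, 2): 7.5}, {(1, 2): 1})
node_b = lambda: NodeSnapshot(15.0, {(1, 2): 15.0}, {(1, 2): 2})
rates = collect_back_pressure([node_a, node_b])
print(rates)  # {(1, 2): 0.5}
```

Weighting by actor count is one possible design choice: it keeps a node with many actors from being averaged equally with a node that hosts only one.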

Embedded vs. Prometheus

The embedded metrics are not intended to replace Prometheus; they are just a lightweight complement.

  • The embedded metrics are designed for end users. The Prometheus metrics are for us, and they are full and complete.
  • The embedded metrics never store historical data. They can only display the current situation.
  • The embedded metrics don't need to be persisted. Keep them small and in-memory.
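The "small, in-memory, no history" constraint could be met on each compute node with a bounded sliding window. The following Python sketch is an assumption-laden illustration, not the actual implementation: `BackPressureWindow`, its methods, and the choice of sampling a cumulative blocked-time counter are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class BackPressureWindow:
    """Keeps only recent (timestamp, cumulative_blocked_secs) samples in memory."""

    def __init__(self, window_secs: float = 15.0) -> None:
        self.window_secs = window_secs
        self._samples: Deque[Tuple[float, float]] = deque()

    def record(self, now: float, cumulative_blocked_secs: float) -> None:
        """Add a sample and evict everything older than the window."""
        self._samples.append((now, cumulative_blocked_secs))
        cutoff = now - self.window_secs
        # Keep one sample at or before the cutoff so the rate spans the full window.
        while len(self._samples) >= 2 and self._samples[1][0] <= cutoff:
            self._samples.popleft()

    def rate(self) -> float:
        """Fraction of the retained window spent blocked, clamped to [0, 1]."""
        if len(self._samples) < 2:
            return 0.0
        (t0, b0), (t1, b1) = self._samples[0], self._samples[-1]
        elapsed = t1 - t0
        if elapsed <= 0:
            return 0.0
        return max(0.0, min(1.0, (b1 - b0) / elapsed))

    def __len__(self) -> int:
        return len(self._samples)


# Usage: sample every 5 seconds for a minute; the actor is blocked half the time.
window = BackPressureWindow(window_secs=15.0)
for t in range(0, 61, 5):
    window.record(float(t), t * 0.5)
print(window.rate(), len(window))  # 0.5 4
```

Memory stays bounded by the sampling interval and the window length, and nothing is written to disk, so a restart simply starts from an empty window.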

For now, I recommend starting with back-pressure metrics only; I have no further plans beyond that at the moment. As mentioned before, this is the most helpful metric for identifying a performance bottleneck.

@github-actions github-actions bot added this to the release-1.6 milestone Dec 6, 2023
@fuyufjh fuyufjh added needs-discussion needs-design Don't start your coding work before a detailed design proposed labels Dec 6, 2023
@fuyufjh fuyufjh changed the title Discussion: Embedded (back-pressure) metrics for dashboard feat: Embedded (back-pressure) metrics for dashboard Jan 25, 2024
@yufansong
Contributor

Close, already finished.
