NETOBSERV-1343: generate dashboards from metrics API #609

Merged: @jotak merged 13 commits into netobserv:main from custom-metrics-dashboards on May 7, 2024

@jotak (Member) commented Apr 3, 2024

Description

  • Add dashboard config to metrics API
  • Use this API internally for predefined dashboards
  • Allow using SingleStats
  • New dedicated buckets for latency histograms
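The last bullet, dedicated latency buckets, can be illustrated with a small sketch. It assumes latencies (such as DNS latency or RTT) are observed in seconds and that resolution below 10 ms matters, which the generic Prometheus default buckets (`prometheus.DefBuckets` in client_golang, 5 ms to 10 s) barely provide. The `exponentialBuckets` helper mirrors the semantics of client_golang's `prometheus.ExponentialBuckets` without importing it, and the 100 µs starting bound is a hypothetical choice, not the bucket layout this PR actually adopted.

```go
package main

import "fmt"

// defaultBuckets matches prometheus.DefBuckets from client_golang (seconds).
var defaultBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// exponentialBuckets returns count upper bounds, starting at start and
// multiplying by factor each time (same semantics as prometheus.ExponentialBuckets).
func exponentialBuckets(start, factor float64, count int) []float64 {
	if count < 1 || start <= 0 || factor <= 1 {
		panic("exponentialBuckets: need count >= 1, start > 0, factor > 1")
	}
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

// countBelow reports how many bucket bounds are strictly below limit,
// i.e. how finely a bucket layout resolves values under that limit.
func countBelow(buckets []float64, limit float64) int {
	n := 0
	for _, b := range buckets {
		if b < limit {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical latency buckets: 100µs, doubling up to ~1.6s.
	latency := exponentialBuckets(0.0001, 2, 15)
	fmt.Println("default bounds under 10ms:", countBelow(defaultBuckets, 0.01))
	fmt.Println("latency bounds under 10ms:", countBelow(latency, 0.01))
}
```

With these values the default layout has a single bound under 10 ms, while the hypothetical latency layout has seven, which is the kind of resolution gap dedicated buckets address.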

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? For example: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot (Collaborator) commented Apr 3, 2024

@jotak: This pull request references NETOBSERV-1343 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

openshift-ci bot commented Apr 3, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jotak (Member Author) commented Apr 3, 2024

To do:

  • Clean up API (kubebuilder tags, etc.): done
  • Handle new dashboards: done
  • Fix and add tests: done
  • Handle chart ordering when charts come from different FlowMetrics (e.g. a new Priority field): done
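The last item, ordering charts that come from different FlowMetrics, could work roughly as follows. This is a sketch only: the `chart` struct, its fields, and the convention that a lower `Priority` sorts first are assumptions for illustration, not the API added in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// chart is a hypothetical, simplified view of one chart declared by a FlowMetric.
type chart struct {
	Source   string // name of the FlowMetric that declared the chart
	Title    string
	Priority int // assumption: lower value is shown first
}

// orderCharts sorts charts by Priority, using the title as a tie-breaker so the
// output is deterministic regardless of the order in which FlowMetrics were listed.
func orderCharts(charts []chart) []chart {
	out := append([]chart(nil), charts...)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Priority != out[j].Priority {
			return out[i].Priority < out[j].Priority
		}
		return out[i].Title < out[j].Title
	})
	return out
}

func main() {
	charts := []chart{
		{Source: "metric-b", Title: "Ingress bytes", Priority: 2},
		{Source: "metric-a", Title: "Flow rate", Priority: 1},
		{Source: "metric-c", Title: "Drops", Priority: 2},
	}
	for _, c := range orderCharts(charts) {
		fmt.Println(c.Priority, c.Title, "from", c.Source)
	}
}
```

A deterministic tie-breaker matters here because FlowMetrics are typically listed from the API server without a guaranteed order, so equal priorities alone would not give stable dashboards.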

@jotak force-pushed the custom-metrics-dashboards branch from 7d96d0f to a40f093 (April 4, 2024 12:11)
@codecov-commenter commented Apr 8, 2024

Codecov Report

Attention: Patch coverage is 93.42561%, with 38 lines in your changes missing coverage. Please review.

Project coverage is 67.03%. Comparing base (10a6846) to head (515c7e5).
Report is 10 commits behind head on main.

❗ Current head 515c7e5 differs from pull request most recent head fe3a9e6. Consider uploading reports for the commit fe3a9e6 to get more accurate results

File                                                 Patch %   Lines
apis/flowmetrics/v1alpha1/zz_generated.deepcopy.go   44.00%    14 missing ⚠️
controllers/monitoring/monitoring_controller.go      74.41%    5 missing, 6 partials ⚠️
controllers/reconcilers/reconcilers.go               71.42%    3 missing, 1 partial ⚠️
pkg/dashboards/model.go                              92.15%    4 missing ⚠️
controllers/flp/flp_pipeline_builder.go              85.71%    1 missing, 1 partial ⚠️
controllers/reconcilers/common.go                    0.00%     2 missing ⚠️
pkg/metrics/predefined_charts.go                     99.54%    1 missing ⚠️
@@            Coverage Diff             @@
##             main     #609      +/-   ##
==========================================
+ Coverage   66.35%   67.03%   +0.68%     
==========================================
  Files          65       66       +1     
  Lines        7321     7557     +236     
==========================================
+ Hits         4858     5066     +208     
- Misses       2104     2133      +29     
+ Partials      359      358       -1     
Flag Coverage Δ
unittests 67.03% <93.42%> (+0.68%) ⬆️

Flags with carried forward coverage won't be shown.

@jotak force-pushed the custom-metrics-dashboards branch from bcec79e to 044d976 (April 8, 2024 11:20)
@jotak marked this pull request as ready for review (April 8, 2024 12:23)
@jotak (Member Author) commented Apr 8, 2024

/test ci-index-noo-bundle

Review thread on this code excerpt:

var targets []Target
for _, q := range c.Queries {
	query := strings.ReplaceAll(q.PromQL, "$METRIC", "netobserv_"+c.mptr.Spec.MetricName)
	query = fmt.Sprintf("topk(7, %s)", query)
Contributor:

Why top 7 here? Can't it be configurable, to fine-tune Prometheus performance?

@jotak (Member Author), Apr 15, 2024:

It's 100% subjective, yes, but from personal testing it was the best compromise for showing relevant information without overloading the charts. Top 10 easily leads to overcrowded charts (especially since it's a top per datapoint, not a "global" top), while too small a top leads to showing more partial metrics/chunks. Of course, this depends heavily on what runs on the cluster.
So yes, I agree it can be made configurable.
We just need to find a good balance between too many and too few settings.
I guess it doesn't hurt to add that one.
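The point about topk being a top per datapoint rather than a global top is worth illustrating. In a Prometheus range query, `topk(k, ...)` is evaluated independently at each step, so the set of series drawn on a chart is the union of every step's top k and can exceed k. The Go sketch below simulates this with made-up series values; it does not query Prometheus, and `topKAt` and `seriesOnChart` are hypothetical helpers written for this illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// topKAt returns the names of the k series with the highest value at step t,
// mimicking an instant evaluation of PromQL topk at one timestamp.
func topKAt(series map[string][]float64, t, k int) []string {
	names := make([]string, 0, len(series))
	for name := range series {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		vi, vj := series[names[i]][t], series[names[j]][t]
		if vi != vj {
			return vi > vj
		}
		return names[i] < names[j] // deterministic tie-break
	})
	if k > len(names) {
		k = len(names)
	}
	return names[:k]
}

// seriesOnChart returns how many distinct series a chart shows when topk is
// evaluated at every step of a range query.
func seriesOnChart(series map[string][]float64, steps, k int) int {
	seen := map[string]bool{}
	for t := 0; t < steps; t++ {
		for _, name := range topKAt(series, t, k) {
			seen[name] = true
		}
	}
	return len(seen)
}

func main() {
	// Made-up per-step values for four workloads over three steps.
	series := map[string][]float64{
		"a": {9, 1, 1},
		"b": {8, 8, 1},
		"c": {1, 9, 9},
		"d": {1, 1, 8},
	}
	fmt.Println("k=2, distinct series drawn:", seriesOnChart(series, 3, 2))
}
```

With k=2, the chart ends up drawing all four series, each only over part of the time range, which is the "partial metrics/chunks" effect described above.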

Contributor:

Cool! Thanks @jotak

@jotak (Member Author):

Done in 5b02dd1.
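A configurable limit could replace the hard-coded `topk(7, ...)` from the reviewed excerpt. The sketch below keeps the `$METRIC` substitution and the `netobserv_` prefix shown in that excerpt; the `buildQuery` function, the `defaultTopK` constant, the fallback when the limit is unset, and the example metric name are hypothetical, and the actual change in 5b02dd1 may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultTopK mirrors the previously hard-coded value (assumption: used when unset).
const defaultTopK = 7

// buildQuery substitutes the metric name into a chart's PromQL template and
// wraps it in topk, using topK when it is positive and defaultTopK otherwise.
func buildQuery(promQL, metricName string, topK int) string {
	if topK <= 0 {
		topK = defaultTopK
	}
	query := strings.ReplaceAll(promQL, "$METRIC", "netobserv_"+metricName)
	return fmt.Sprintf("topk(%d, %s)", topK, query)
}

func main() {
	// Hypothetical chart template and metric name.
	tpl := "sum(rate($METRIC[2m])) by (SrcK8S_Namespace)"
	fmt.Println(buildQuery(tpl, "namespace_flows_total", 0))
	fmt.Println(buildQuery(tpl, "namespace_flows_total", 10))
}
```

Treating zero as "unset" keeps existing configurations on the previous behaviour, which fits the concern about balancing defaults against the number of settings.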

@jotak added the ok-to-test label (To set manually when a PR is safe to test. Triggers image build on PR.) Apr 15, 2024
New images:

  • quay.io/netobserv/network-observability-operator:e1458c2
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-e1458c2
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-e1458c2

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:e1458c2 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-e1458c2

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-e1458c2
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions bot removed the ok-to-test label Apr 15, 2024
@jotak requested a review from @jpinsonneau (April 15, 2024 13:06)
@jpinsonneau (Contributor) left a comment

LGTM, thanks @jotak!

Do we need to provide more examples? 🤔
I'm thinking about optional features (Drops / DNS / RTT). That can be done in follow-ups.

@jotak (Member Author) commented Apr 16, 2024

> Do we need to provide more examples? 🤔 I'm thinking about optional features (Drops / DNS / RTT). That can be done in follow-ups.

There are already drops/DNS/RTT metrics, with their charts, among the predefined metrics; they can be enabled via the includeList. But we could have more "combined" examples, such as external traffic latency. I can add that.

jotak added 6 commits April 16, 2024 17:01
- Add dashboard config to metrics API
- Use this API internally for predefined dashboards
- Allow using SingleStats
- New dedicated buckets for latency histograms
- Allow users to reference/create new dashboards
  (this requires the monitoring controller to fetch all dashboards from
  the dashboards namespace)
  All dashboard names are prefixed with "NetObserv / ", so rename the main
  dashboard to "NetObserv / Main"
- Add/update tests
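The naming rule from this commit message, where every dashboard name carries the "NetObserv / " prefix, can be sketched as a small normalization step applied while grouping chart declarations by dashboard. The `dashboardTitle` and `groupByDashboard` helpers, the rule of not doubling an existing prefix, and the example declarations are assumptions for illustration, not the operator's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const dashboardPrefix = "NetObserv / "

// dashboardTitle prefixes a user-provided dashboard name, without doubling
// the prefix if the user already included it.
func dashboardTitle(name string) string {
	name = strings.TrimSpace(name)
	if strings.HasPrefix(name, dashboardPrefix) {
		return name
	}
	return dashboardPrefix + name
}

// groupByDashboard maps each dashboard title to the chart titles it contains.
// Each input pair is a hypothetical (dashboard name, chart title) declaration.
func groupByDashboard(decls [][2]string) map[string][]string {
	out := map[string][]string{}
	for _, d := range decls {
		title := dashboardTitle(d[0])
		out[title] = append(out[title], d[1])
	}
	return out
}

func main() {
	groups := groupByDashboard([][2]string{
		{"Main", "Flow rate"},
		{"NetObserv / Main", "Drops"},
		{"External traffic", "Latency"},
	})
	titles := make([]string, 0, len(groups))
	for t := range groups {
		titles = append(titles, t)
	}
	sort.Strings(titles)
	for _, t := range titles {
		fmt.Println(t, groups[t])
	}
}
```

Both "Main" and "NetObserv / Main" land in the same dashboard here, so charts from different FlowMetrics can target one dashboard regardless of how users spell its name.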
@memodi (Contributor) commented Apr 17, 2024

/ok-to-test

@openshift-ci bot added the ok-to-test label Apr 17, 2024

New images:

  • quay.io/netobserv/network-observability-operator:8487560
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-8487560
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-8487560

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:8487560 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-8487560

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-8487560
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@memodi (Contributor) commented Apr 18, 2024

@jotak, feedback on units: there's no way to leave the unit empty as the instructions suggest (perhaps it's possible in YAML). Is it possible to set custom units, like fps (flows per second)?

Co-authored-by: Mehul Modi <memodi@redhat.com>
@github-actions bot removed the ok-to-test label Apr 19, 2024
@jotak (Member Author) commented Apr 19, 2024

> @jotak, feedback on units: there's no way to leave the unit empty as the instructions suggest (perhaps it's possible in YAML). Is it possible to set custom units, like fps (flows per second)?

No, we can't set custom units (that wouldn't work with the console's dashboard engine), but you're right that we need a fallback unit, perhaps "numeric".

@memodi (Contributor) commented Apr 26, 2024

/ok-to-test

@openshift-ci bot added the ok-to-test label Apr 26, 2024

New images:

  • quay.io/netobserv/network-observability-operator:e973630
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-e973630
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-e973630

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:e973630 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-e973630

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-e973630
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

openshift-ci bot commented Apr 26, 2024

@jotak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name               Commit    Required   Rerun command
ci/prow/e2e-operator    fe3a9e6   false      /test e2e-operator

@memodi (Contributor) commented Apr 26, 2024

/label qe-approved

@openshift-ci bot added the qe-approved label (QE has approved this pull request) Apr 26, 2024
@openshift-ci-robot (Collaborator) commented Apr 26, 2024

@jotak: This pull request references NETOBSERV-1343 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

@openshift-ci bot added the lgtm label Apr 30, 2024
@jotak (Member Author) commented May 7, 2024

/approve

openshift-ci bot commented May 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

Needs approval from an approver in each of these files:

@openshift-ci bot added the approved label May 7, 2024
@jotak merged commit 33b5849 into netobserv:main May 7, 2024 (9 of 12 checks passed)
Labels: approved, jira/valid-reference, lgtm, ok-to-test, qe-approved

6 participants