Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-881: Added field to disable alerts creation #295

Merged
merged 3 commits into from
Mar 16, 2023

Conversation

OlivierCazade
Copy link
Contributor

Added disable list to not create alerts. Each alert listed in this field would not be created.

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Mar 14, 2023

@OlivierCazade: This pull request references NETOBSERV-881 which is a valid jira issue.

In response to this:

Added disable list to not create alerts. Each alert listed in this field would not be created.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -53,6 +53,10 @@ func (r *FlowCollector) ConvertTo(dstRaw conversion.Hub) error {
dst.Spec.Processor.ConnectionEndTimeout = restored.Spec.Processor.ConnectionEndTimeout
}

if restored.Spec.Processor.Metrics.DisableAlerts != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, did you have "something" telling you that you had something to do here, or was it something you had to keep in mind yourself? (trying to understand how error prone it can be if we forget about the conversion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After adding the new field in v1beta1 running make generate was triggering some warning because of the new fields.

But then I mostly looked at previous PR to understand how to do it.

@@ -280,6 +280,10 @@ type FLPMetrics struct {
// ignoreTags is a list of tags to specify which metrics to ignore
//+kubebuilder:default:={"egress","packets"}
IgnoreTags []string `json:"ignoreTags,omitempty"`

// disableAlerts is a list of alerts related to FLP that should not be created
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what is the best is terms of usage here, a deny-list or an allow-list. Thoughts?
In any case we should document here what are the acceptable values, ie NetObservNoFlows and NetObservLokiError at the moment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's good to use a deny-list as we want alerts, in general, being set by default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as a default behavior we want to enable alerts. Using an allow list would make upgrade with new alert a bit more trickier since we would not know if it is still the default list and if we should add the new alert to the upgrade.

I will improve the documentation to mention the possibilities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I would remove the FLP mention, as it might not always be clear to the users when it's related to FLP or not (they don't necessarily know well our architecture)

Suggested change
// disableAlerts is a list of alerts related to FLP that should not be created
// disableAlerts is a list of alerts that should not be created. Possible values are: `NetObservNoFlows`, which is triggered when no flows are being observed for a certain period, and `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @skrthomas could you please also review the CRD doc added here?
Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added your change but also moved to an enum so we can control the potential values.

@codecov
Copy link

codecov bot commented Mar 14, 2023

Codecov Report

Merging #295 (70b7d52) into main (8a4db3c) will decrease coverage by 0.04%.
The diff coverage is 63.63%.

@@            Coverage Diff             @@
##             main     #295      +/-   ##
==========================================
- Coverage   47.61%   47.57%   -0.04%     
==========================================
  Files          43       43              
  Lines        5015     5036      +21     
==========================================
+ Hits         2388     2396       +8     
- Misses       2416     2427      +11     
- Partials      211      213       +2     
Flag Coverage Δ
unittests 47.57% <63.63%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
api/v1alpha1/flowcollector_webhook.go 0.00% <0.00%> (ø)
api/v1alpha1/zz_generated.conversion.go 0.27% <0.00%> (+<0.01%) ⬆️
api/v1beta1/flowcollector_types.go 100.00% <ø> (ø)
api/v1beta1/zz_generated.deepcopy.go 42.33% <0.00%> (-0.60%) ⬇️
controllers/flowlogspipeline/flp_common_objects.go 84.95% <89.74%> (-0.37%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

)

// Name of a processor alert
// Possible values are: `NetObservNoFlows`, which is triggered when no flows are being observed for a certain period, and `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors.disableAlerts is a list of alerts related to FLP that should not be created
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end of the comment seems wrong here: disableAlerts is a list of alerts related to FLP that should not be created

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The bit "related to FLP" could be unpacked a little. Could it be a list of alerts related to the health of flows in the FLP? 🤔 Also , a space is needed after the period and before disableAlerts.

Copy link
Contributor

@skrthomas skrthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments. Mostly a couple of punctuation suggestions and also wondering about the same phrase Joel was asking about.

)

// Name of a processor alert
// Possible values are: `NetObservNoFlows`, which is triggered when no flows are being observed for a certain period, and `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors.disableAlerts is a list of alerts related to FLP that should not be created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Possible values are: `NetObservNoFlows`, which is triggered when no flows are being observed for a certain period, and `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors.disableAlerts is a list of alerts related to FLP that should not be created
// Possible values are: `NetObservNoFlows`, which is triggered when no flows are being observed for a certain period, and `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors. disableAlerts is a list of alerts related to FLP that should not be created

)

// Name of a processor alert
// Possible values are: `NetObservNoFlows`, which is triggered when no flows are being observed for a certain period, and `NetObservLokiError`, which is triggered when flows are being dropped due to Loki errors.disableAlerts is a list of alerts related to FLP that should not be created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The bit "related to FLP" could be unpacked a little. Could it be a list of alerts related to the health of flows in the FLP? 🤔 Also , a space is needed after the period and before disableAlerts.

description: disableAlerts is a list of alerts that should
not be created.
items:
description: 'Name of a processor alert Possible values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: 'Name of a processor alert Possible values
description: 'Name of a processor alert. Possible values

AlertLokiError = "NetObservLokiError"
)

// Name of a processor alert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Name of a processor alert
// Name of a processor alert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my comment below. A period is needed here.

@OlivierCazade
Copy link
Contributor Author

@Amoghrd this PR is ready for testing, using the new field you should be able to see that the alert rule is not set.

To see alert rule go to CustomResourceDefinition -> prometheusRules, there should be an instance called flowlogs-pipeline-alert and in the yaml of this instance you should have a list of the alerts under spec -> groups

@Amoghrd
Copy link
Contributor

Amoghrd commented Mar 14, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 14, 2023
@github-actions
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:27d0843
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-27d0843
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-27d0843

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-27d0843
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

Copy link
Member

@jotak jotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jotak
Copy link
Member

jotak commented Mar 16, 2023

@skrthomas since there is both this PR and also Julien's one that require regenerating the doc, what I can do is once they are merged I'll open a new one with just the doc update to have all at once

Copy link
Contributor

@skrthomas skrthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, @jotak. lgtm!

@Amoghrd
Copy link
Contributor

Amoghrd commented Mar 16, 2023

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Mar 16, 2023
@jotak
Copy link
Member

jotak commented Mar 16, 2023

@OlivierCazade I'm merging this, I have a pending PR that conflicts with this one
/approve

@openshift-ci
Copy link

openshift-ci bot commented Mar 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jotak jotak merged commit 88c67a6 into netobserv:main Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants