Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Notification Policy Routes #1800

Merged
merged 49 commits into from
Jan 29, 2025

Conversation

msvechla
Copy link
Contributor

See #1789 for the related feature proposal.

This implementation is still work in progress for now and should support the feature proposal.

Updates on GrafanaNotificationPolicyRoutes are tracked via watches. I first tried to accomplish this via ownerReferences (see commits), but this is not supported cross-namespace.


Example of how the sample policies in hack/kind/resources/default/grafana-notification-policy.yaml are rendered:

CleanShot 2024-12-18 at 11 58 53

Status updates on the GrafanaNotificationPolicy

❯ kubectl get grafananotificationpolicy grafananotificationpolicy-sample -o jsonpath="{.status}" |jq
{
  "conditions": [
    {
      "lastTransitionTime": "2024-12-17T15:52:31Z",
      "message": "Notification Policy was successfully applied to 1 instances",
      "observedGeneration": 9,
      "reason": "ApplySuccessful",
      "status": "True",
      "type": "NotificationPolicySynchronized"
    }
  ],
  "discoveredRoutes": [
    "grafana-crds/dynamic-c (priority: 1)",
    "default/dynamic-d (priority: 2)",
    "default/dynamic-e (priority: nil)"
  ]
}

Events emitted for the merged routes:

❯ kubectl get events

LAST SEEN   TYPE     REASON   OBJECT                                     MESSAGE
20m         Normal   Merged   grafananotificationpolicyroute/dynamic-c   Route merged into NotificationPolicy default/grafananotificationpolicy-sample
3s          Normal   Merged   grafananotificationpolicyroute/dynamic-d   Route merged into NotificationPolicy default/grafananotificationpolicy-sample
3s          Normal   Merged   grafananotificationpolicyroute/dynamic-e   Route merged into NotificationPolicy default/grafananotificationpolicy-sample

Adapted PROJECT to new go module version path and ran:

```
./bin/operator-sdk create api --group grafana --version v1beta1 --kind GrafanaNotificationPolicyRoute --controller false
```
- during the reconcile loop in notificationpolicy_controller.go, we have
  to fetch all matching GrafanaNotificationPolicyRoutes for the currently reconciled GrafanaNotificationPolicy
- this can be very easily achieved with a routeSelector, which will be a Kubernetes LabelSelector
- if we would go with instanceSelector, we would have to fetch all available
  GrafanaNotificationPolicyRoutes and then do some filtering afterwards,
  to see if the instanceSelector matches, which would be both more inefficient and more complex
The GrafanaNotificationPolicy Controller now watches
GrafanaNoticationPolicyRoutes instead of using ownerReferences, as
ownerReferences do not support cross-namespace references.

We now also emit a event on the GrafanaNotificationPolicyRoute to
indicate that it has been merged into a specific policy.
@msvechla msvechla changed the title Notification policy routes impl Dynamic Notification Policy Routes Dec 18, 2024
Copy link

This PR hasn't been updated for a while, marking as stale

@github-actions github-actions bot added the stale label Jan 20, 2025
@github-actions github-actions bot added the documentation Issues relating to documentation, missing, non-clear etc. label Jan 21, 2025
@msvechla
Copy link
Contributor Author

I have updated the draft PR with the latest discussions from #1789:

  • RouteSelector has been moved to the Route object, allowing to dynamically inject routes on all levels now
  • Logic to detect reference loops has been implemented
  • Tests have been added accordingly
  • Priority logic has been removed

There are no checks for RouteSelector and Routes being mutually exclusive so far.
Unfortunately I think we can not implement this validation on the OpenAPI level, as the Routes attribute uses // +kubebuilder:validation:Schemaless, which disables all validation.

Another idea would be to solve this by implementing a ValidationWebhook. @theSuess do you have any thoughts on this?

Results of updates Samples

Assembled hack/kind/resources/default/grafana-notification-policy.yaml:

CleanShot 2025-01-21 at 11 06 27

❮ kubectl get grafananotificationpolicy grafananotificationpolicy-sample -o jsonpath="{.status}" |jq
{
  "conditions": [
    {
      "lastTransitionTime": "2025-01-21T09:48:19Z",
      "message": "Notification Policy was successfully applied to 1 instances",
      "observedGeneration": 1,
      "reason": "ApplySuccessful",
      "status": "True",
      "type": "NotificationPolicySynchronized"
    }
  ],
  "discoveredRoutes": [
    "default/dynamic-e",
    "grafana-crds/dynamic-c",
    "default/dynamic-d"
  ]
}
❯ kubectl get events

LAST SEEN   TYPE     REASON   OBJECT                                     MESSAGE
20m         Normal   Merged   grafananotificationpolicyroute/dynamic-d   Route merged into NotificationPolicy default/grafananotificationpolicy-sample
20m         Normal   Merged   grafananotificationpolicyroute/dynamic-d   Route merged into NotificationPolicy default/grafananotificationpolicy-sample
20m         Normal   Merged   grafananotificationpolicyroute/dynamic-e   Route merged into NotificationPolicy default/grafananotificationpolicy-sample
20m         Normal   Merged   grafananotificationpolicyroute/dynamic-e   Route merged into NotificationPolicy default/grafananotificationpolicy-sample

@msvechla msvechla force-pushed the notification_policy_routes_impl branch from 3da8a88 to 95a1fd8 Compare January 21, 2025 10:11
@msvechla
Copy link
Contributor Author

I rebased on the latest master branch.

Additionally I looked into the ValidationWebhook for ensuring that routeSelector and routes are mutually exlusive.

While the implementation of the logic is quite straight-forward, setting up webhooks is a bit more complex, especially bringing this to the Helm chart.

I can certainly do this if you think this is the way forward, we can find another way to do the validation, or maybe skip this validation for now.

Any thoughts?

@github-actions github-actions bot removed the stale label Jan 22, 2025
@theSuess
Copy link
Member

Sorry for the deleted comment, missed the part about api level validation rules in your previous note.

We had some discussions around Validation Webhooks before and came to the conclusion that these are too finicky to get right.

If the validation rules don't work, I'd just go for a status condition in the respective object when both values are set which tells the user that the dynamic matching takes priority and the routes field is ignored

@msvechla
Copy link
Contributor Author

Unfortunately, as far as I can see, the validation expressions will not work due to the +kubebuilder:validation:Schemaless on the Routes object. I can see if I can find a workaround.

Alternatively, I will go with the hint in the status condition as you suggested 👍

@msvechla
Copy link
Contributor Author

I hope I caught everything now, if not let me know!

Thanks a lot for your support! I learned a lot and this was a really nice experience!

controllers/notificationpolicy_controller.go Outdated Show resolved Hide resolved
}
assembledNotificationPolicy, mergedRoutes, err = assembleNotificationPolicyRoutes(ctx, r.Client, namespace, assembledNotificationPolicy)
r.Log.Info("assembled notification policy routes", "mergedRoutes", mergedRoutes)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchronize condition is specifically used when talking about API calls to a Grafana instance.
For other errors, the plan is to re-use or create new shared conditions to use.
Or potentially look into publishing events as you have done here 😉.
Hence, putting it right after the apply/synchronize loop makes the most sense.
a simpler defer is generally better also.

This also allows you to remove the applyError[globalApplyError] which could confuse users that have a Grafana instance named global(Guilty)

As a side note, dissimilar to other projects' use of conditions, Grafana-Operator removes old conditions immediately rather than setting the status False.
This avoids multiple conditions where you have to look for the ones that are True

@msvechla
Copy link
Contributor Author

msvechla commented Jan 28, 2025

@Baarsgaard I tried to adapt this in the latest commit 00ebd92

  • reverted changes that moved building synchronized condition to defer
  • added a specific error type for loop detected errors
  • set condition for specific assembly errors and emit events otherwise

Thanks for your explanation. If the synchronized condition was built specifically for Grafana API errors, then of course it makes sense to adapt this.

@msvechla msvechla force-pushed the notification_policy_routes_impl branch from 29f70cd to 34c81f7 Compare January 28, 2025 09:23
- reverted changes that moved building synchronized condition to defer
- added a specific error type for loop detected errors
- set condition for specific errors and emit events otherwise for
  assembly errors
@msvechla msvechla force-pushed the notification_policy_routes_impl branch from 34c81f7 to 00ebd92 Compare January 28, 2025 09:25
@msvechla msvechla requested a review from Baarsgaard January 28, 2025 09:53
Copy link
Contributor

@Baarsgaard Baarsgaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge fan of the changes to the Reconcile function, it's a lot simpler now :D

controllers/notificationpolicy_controller.go Outdated Show resolved Hide resolved
controllers/notificationpolicy_controller.go Outdated Show resolved Hide resolved
@msvechla msvechla requested a review from Baarsgaard January 29, 2025 11:06
@theSuess theSuess added the feature this PR introduces a new feature label Jan 29, 2025
Copy link
Member

@theSuess theSuess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for all the work on this! I've pushed a small cleanup commit and queued this for merging. Looks like the next release will be full of cool new features!

@theSuess theSuess enabled auto-merge January 29, 2025 13:13
@theSuess theSuess disabled auto-merge January 29, 2025 13:47
@theSuess theSuess force-pushed the notification_policy_routes_impl branch from 0ebdb21 to 2167f71 Compare January 29, 2025 13:51
@theSuess theSuess force-pushed the notification_policy_routes_impl branch from 2167f71 to 56d4f5e Compare January 29, 2025 14:02
@theSuess theSuess added this pull request to the merge queue Jan 29, 2025
Merged via the queue into grafana:master with commit ff3fb07 Jan 29, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Issues relating to documentation, missing, non-clear etc. feature this PR introduces a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants