Non-blocking event recorder #425

Open
darkowlzz opened this issue Dec 6, 2022 · 0 comments
Labels
area/runtime (Controller runtime related issues and pull requests), enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@darkowlzz
Contributor

The event recorder currently runs in a blocking fashion, so it can stall a reconciler when the reconciler emits an event and the event webhook server is slow to respond. The HTTP client waits and retries on failure, which forces the reconciler to wait until the event is posted. Sometimes reconcilers time out waiting for the event to be emitted, resulting in failed reconciliation. A non-blocking event recorder would keep failures in event recording from affecting the reconcilers.

The following are some ideas to address this issue:

non-blocking request to the event webhook server

A quick and simple change to solve the immediate issue would be to make the HTTP request to the event webhook server non-blocking by running it in a goroutine. This would unblock the reconciler but would not guarantee the ordering of the requests. If the webhook server is offline and reconciliation is failing, each reconciler retry may create another goroutine, and each of those goroutines may keep retrying to post its event with back-off. Because the back-off durations vary, the events would be posted out of order once the webhook server becomes available. Since there is no de-duplication at the event source, the webhook server would have to serve all the accumulated event requests and de-duplicate them on its own. This may also result in too many goroutines trying to do the same thing. However, the goroutines can be configured to give up after a certain number of attempts so that they get cleaned up.
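As a rough illustration of the goroutine approach, the sketch below posts each event in its own goroutine and gives up after a bounded number of attempts. All names here (`Event`, `PostFunc`, `AsyncRecorder`, `ErrUnavailable`) are hypothetical and are not part of the existing events package; the doubling back-off is also just an assumption for the example.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Event is a hypothetical, simplified event payload.
type Event struct {
	Object  string
	Reason  string
	Message string
}

// ErrUnavailable is a hypothetical error standing in for a webhook failure.
var ErrUnavailable = errors.New("event webhook server unavailable")

// PostFunc posts one event to the webhook server (hypothetical signature).
type PostFunc func(Event) error

// AsyncRecorder posts events in background goroutines so that the
// caller (the reconciler) never waits on the webhook server.
type AsyncRecorder struct {
	Post        PostFunc
	MaxAttempts int           // attempts before the event is dropped
	BaseBackoff time.Duration // first back-off; doubled after each failure

	wg sync.WaitGroup
}

// Emit returns immediately. The event is posted in a new goroutine
// that retries with back-off and exits after MaxAttempts failures.
// Ordering across events is NOT preserved.
func (r *AsyncRecorder) Emit(ev Event) {
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		backoff := r.BaseBackoff
		for attempt := 1; attempt <= r.MaxAttempts; attempt++ {
			if err := r.Post(ev); err == nil {
				return
			}
			if attempt < r.MaxAttempts {
				time.Sleep(backoff)
				backoff *= 2
			}
		}
		// All attempts failed: the event is dropped and the goroutine exits.
	}()
}

// Wait blocks until all in-flight posts have finished, e.g. on shutdown.
func (r *AsyncRecorder) Wait() { r.wg.Wait() }
```

A reconciler would call `Emit` and continue. Nothing in this sketch limits how many goroutines exist at once or de-duplicates events, which are the drawbacks described above.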

per controller event processor

Another approach would be to introduce an event processor per controller. The events package can provide an API to run an event processor, typically in the main.go file before setting up the reconcilers. The event recorder in the reconcilers would send events to the event processor through a buffered channel. The event processor would collect the events to be posted to the event webhook server, categorize them, and process them based on certain strategies. Since all the events go through a central event processor, it can be used to add more functionality at the event source. The order of the events can be maintained. Event de-duplication can be done at the source, which avoids spamming the event server. If the event server isn't ready, the event processor can perform a single health check and hold all event processing in the meantime. If the event buffer fills up, the processor can drop certain events based on certain strategies. More interesting things can be done centrally at the source of the events, at the controller level.
This may be similar to the event notification broadcaster in Kubernetes apimachinery: https://github.com/kubernetes/apimachinery/blob/fd8a60496be5e4ce19d586738adb48ac6fa66ef9/pkg/watch/mux.go#L43 .
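A minimal sketch of such a processor, under stated assumptions: the names `Event`, `Processor`, `NewProcessor`, `Record`, `Dropped` and `Stop` are made up for illustration, de-duplication is reduced to skipping an event identical to the previous one, and the drop strategy is simply "drop the newest event when the buffer is full". A real design would likely need richer strategies and the health check mentioned above.

```go
package main

import "sync"

// Event is a hypothetical, simplified event payload.
type Event struct {
	Object  string
	Reason  string
	Message string
}

// Processor is a hypothetical per-controller event processor. Recorders
// hand events over through a buffered channel, and a single worker
// goroutine posts them, which preserves the order of accepted events.
type Processor struct {
	events chan Event
	post   func(Event) error
	done   chan struct{}

	mu      sync.Mutex
	dropped int
}

// NewProcessor starts the worker; it would typically be called from
// main.go before the reconcilers are set up.
func NewProcessor(buffer int, post func(Event) error) *Processor {
	p := &Processor{
		events: make(chan Event, buffer),
		post:   post,
		done:   make(chan struct{}),
	}
	go p.run()
	return p
}

// run posts events one at a time and skips an event that is
// identical to the one immediately before it.
func (p *Processor) run() {
	defer close(p.done)
	var last Event
	seen := false
	for ev := range p.events {
		if seen && ev == last {
			continue // duplicate of the previous event
		}
		last, seen = ev, true
		_ = p.post(ev) // error handling and retries are omitted in this sketch
	}
}

// Record never blocks. It reports false and counts a drop when the
// buffer is full. It must not be called after Stop.
func (p *Processor) Record(ev Event) bool {
	select {
	case p.events <- ev:
		return true
	default:
		p.mu.Lock()
		p.dropped++
		p.mu.Unlock()
		return false
	}
}

// Dropped returns how many events were discarded because the buffer was full.
func (p *Processor) Dropped() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.dropped
}

// Stop closes the queue and waits for the worker to drain it.
func (p *Processor) Stop() {
	close(p.events)
	<-p.done
}
```

Because there is one worker per processor, a slow webhook server delays later events but never the reconciler; the cost is that events are lost once the buffer fills.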

Another variation would be to run an event processor per reconciler, or even per object, which would create opportunities to handle the events in different ways. For example, a tenant could configure the events related to their objects to be sent to an event server that they manage themselves.
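One way to picture the per-tenant case is a small router that picks a destination per event. This is only a sketch: `Event`, `Sink` and `Router` are hypothetical names, and keying the lookup on the object's namespace is an assumption made for the example, not something the proposal specifies.

```go
package main

// Event is a hypothetical, simplified event payload.
type Event struct {
	Namespace string
	Reason    string
	Message   string
}

// Sink delivers an event to one event server (hypothetical signature).
type Sink func(Event) error

// Router sends an event to a tenant-specific sink when one is
// configured for the event's namespace, and to Default otherwise.
type Router struct {
	Default     Sink
	ByNamespace map[string]Sink
}

// Route looks up the sink for the event's namespace and delivers the event.
func (r Router) Route(ev Event) error {
	if sink, ok := r.ByNamespace[ev.Namespace]; ok {
		return sink(ev)
	}
	return r.Default(ev)
}
```

Each sink could itself be backed by its own event processor, so a slow tenant-managed server would not hold up events for other tenants.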

@darkowlzz added the enhancement, help wanted, and area/runtime labels on Dec 6, 2022