spike: create warning events for parsing errors #3088

Closed
wants to merge 4 commits into from

Contributor

@czeslavo czeslavo commented Oct 21, 2022

What this PR does / why we need it:

It adds a way to collect translation errors while the Parser runs and to use them to publish Kubernetes API Event objects, each with a detailed message describing the failure. Every error can be associated with one or more Kubernetes objects, and one event is created per such object.

Special notes for your reviewer:

This PR serves as a reference prototype and is meant to be thrown away. It's expected that some of the tests won't pass. Please focus on the general idea of introducing Events publishing to Parser.

A fully fledged implementation will follow once we create an issue and groom it.

Please take a look at my inline comments to grasp the main points of the changes more quickly.

@czeslavo czeslavo added the "do not merge" (let the author merge this, don't merge for them) and "area/spike" labels Oct 21, 2022
@czeslavo czeslavo temporarily deployed to Configure ci October 21, 2022 16:36 Inactive
Comment on lines 320 to 325
if errors := p.GetParsingErrors(); errors != nil {
	c.createParsingErrorsEvents(errors)
	c.prometheusMetrics.TranslationCount.With(prometheus.Labels{
		metrics.SuccessKey: metrics.SuccessFalse,
	}).Inc()
	c.logger.Debugf("%d translation errors occurred when building data-plane configuration", len(errors))
Contributor Author

@czeslavo czeslavo Oct 21, 2022

If any translation error is detected, we consider the translation failed and therefore increment the metric with the SuccessFalse value. IMO that makes sense, as it allows setting up alerting on it.

Comment on lines +453 to +460
func (c *KongClient) createParsingErrorsEvents(errors []parser.ParsingError) {
	const reason = "TranslationToKongConfigurationFailed"
	for _, err := range errors {
		for _, obj := range err.RelatedObjects() {
			c.eventsRecorder.Event(obj, corev1.EventTypeWarning, reason, err.Reason())
		}
	}
}
Contributor Author

eventsRecorder creates a Kubernetes Event object associated with obj.
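The fan-out described here (one warning event per related object of each error) can be sketched without any Kubernetes dependencies. This is a minimal sketch under stated assumptions: `eventRecorder`, `memoryRecorder`, `translationFailure`, and `publishFailures` are hypothetical names, objects are identified by plain strings, and the recorder only stores events in memory. The literal "Warning" matches the value of corev1.EventTypeWarning.

```go
package main

import "fmt"

// eventRecorder is a hypothetical, trimmed-down stand-in for an event
// recorder; objects are identified by plain strings in this sketch.
type eventRecorder interface {
	Event(object string, eventType, reason, message string)
}

// recordedEvent captures the arguments of a single Event call.
type recordedEvent struct {
	object, eventType, reason, message string
}

// memoryRecorder keeps events in memory instead of sending them to an API server.
type memoryRecorder struct {
	events []recordedEvent
}

func (r *memoryRecorder) Event(object, eventType, reason, message string) {
	r.events = append(r.events, recordedEvent{object, eventType, reason, message})
}

// translationFailure is an illustrative error shape: a message plus the
// objects the failure relates to.
type translationFailure struct {
	message string
	objects []string
}

// publishFailures emits one warning event per (failure, related object) pair.
func publishFailures(rec eventRecorder, failures []translationFailure) {
	const reason = "TranslationToKongConfigurationFailed"
	for _, f := range failures {
		for _, obj := range f.objects {
			rec.Event(obj, "Warning", reason, f.message)
		}
	}
}

func main() {
	rec := &memoryRecorder{}
	publishFailures(rec, []translationFailure{
		{message: "backend service not found", objects: []string{"default/web", "default/api"}},
		{message: "invalid plugin configuration", objects: []string{"default/rate-limit"}},
	})
	for _, e := range rec.events {
		fmt.Printf("%s %s on %s: %s\n", e.eventType, e.reason, e.object, e.message)
	}
}
```

Keeping the event reason constant and putting the per-error detail in the message mirrors the snippet under review; whether that split is the right one for the final implementation is left open here.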

Comment on lines +73 to +75
func (c *parsingErrorsCollector) ParsingError(reason string, relatedObjects ...client.Object) {
	c.errors = append(c.errors, NewParsingError(reason, relatedObjects...))
}
Contributor Author

ParsingError is used to gather parsing errors during translation. It accepts the objects related to, or affected by, the error.

Comment on lines +42 to +45
type ParsingError struct {
	relatedObjects []client.Object
	reason         string
}
Contributor Author

Every parsing error has a human-readable reason string and a set of related Kubernetes objects.

Member

Apart from this deserving its own file, I feel this is a good approach.
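To make the shape of the error type and its collector concrete, here is a minimal, dependency-free sketch. It assumes a hypothetical `Object` interface in place of controller-runtime's client.Object and a made-up `fakeIngress` type; the accessor names follow the `Reason()` and `RelatedObjects()` calls quoted earlier in this thread, but the bodies are an illustration rather than the PR's actual code.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for client.Object, reduced to the two
// accessors this sketch needs.
type Object interface {
	GetNamespace() string
	GetName() string
}

// fakeIngress is an illustrative object type, not a real Kubernetes kind.
type fakeIngress struct {
	namespace, name string
}

func (i fakeIngress) GetNamespace() string {
	return i.namespace
}

func (i fakeIngress) GetName() string {
	return i.name
}

// ParsingError pairs a human-readable reason with the objects it concerns.
type ParsingError struct {
	relatedObjects []Object
	reason         string
}

// NewParsingError builds a ParsingError from a reason and its related objects.
func NewParsingError(reason string, relatedObjects ...Object) ParsingError {
	return ParsingError{relatedObjects: relatedObjects, reason: reason}
}

func (e ParsingError) Reason() string {
	return e.reason
}

func (e ParsingError) RelatedObjects() []Object {
	return e.relatedObjects
}

// parsingErrorsCollector accumulates errors during one translation run.
type parsingErrorsCollector struct {
	errors []ParsingError
}

// ParsingError records a new error together with the objects it relates to.
func (c *parsingErrorsCollector) ParsingError(reason string, relatedObjects ...Object) {
	c.errors = append(c.errors, NewParsingError(reason, relatedObjects...))
}

func main() {
	c := &parsingErrorsCollector{}
	c.ParsingError("service backend not found",
		fakeIngress{"default", "web"}, fakeIngress{"default", "api"})
	for _, e := range c.errors {
		for _, obj := range e.RelatedObjects() {
			fmt.Printf("%s/%s: %s\n", obj.GetNamespace(), obj.GetName(), e.Reason())
		}
	}
}
```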

@@ -144,7 +144,18 @@ func Run(ctx context.Context, c *Config, diagnostic util.ConfigDumpDiagnostic) e
	if err != nil {
		return fmt.Errorf("%f is not a valid number of seconds to the timeout config for the kong client: %w", c.ProxyTimeoutSeconds, err)
	}
	dataplaneClient, err := dataplane.NewKongClient(deprecatedLogger, timeoutDuration, c.IngressClassName, c.EnableReverseSync, c.SkipCACertificates, diagnostic, kongConfig)

	dataplaneEventRecorder := mgr.GetEventRecorderFor("kubernetes-ingress-controller-data-plane")
Contributor Author

It's straightforward to create an event recorder from the controller manager.

@czeslavo czeslavo temporarily deployed to Configure ci October 21, 2022 17:00 Inactive
@czeslavo czeslavo marked this pull request as ready for review October 24, 2022 10:32
@czeslavo czeslavo requested a review from a team as a code owner October 24, 2022 10:32
@@ -154,6 +191,12 @@ func (p *Parser) GenerateKubernetesObjectReport() []client.Object {
	return report
}

func (p *Parser) GetParsingErrors() []ParsingError {
Member

I'd prefer to see a comment on this func, and I'd perhaps suggest renaming it: getters are usually expected not to mutate the state of the object they are called on, which is not the case here.

Contributor Author

Yeah, sure: e4c6a1b

Comment on lines 23 to 26
func (p *Parser) getCACerts(
	log logrus.FieldLogger,
	storer store.Storer,
) []kong.CACertificate {
Member

If this changes to have a *Parser receiver, then I believe we don't need the params anymore, do we? They are stored in Parser anyway (and that's how it's being called).

Contributor Author

That's 100% true, thanks: c5b45ca

Comment on lines +13 to +14
// todo: adapt to implementation
_ = []kongstate.Plugin{
Member

Given the PR is already marked as ready to review, I believe this needs addressing.

Contributor Author

Please note it's marked as ready for review (as Michał suggested), but it is still not meant to be merged to main in this form. I can adapt the tests, but IMO it's not worth it until we decide that we want to go with this approach. The key output I wanted from this review is whether the team thinks this is the way we want to go regarding events publishing (mainly, whether it's OK to do that directly from Parser).

@czeslavo czeslavo temporarily deployed to Configure ci October 24, 2022 14:46 Inactive
@czeslavo czeslavo temporarily deployed to Configure ci October 24, 2022 14:51 Inactive
Member

@pmalek pmalek left a comment

👍

One note, though, w.r.t. introducing this as part of the user-facing interface: we'd ideally document this in a relatively easily discoverable place (https://docs.konghq.com/kubernetes-ingress-controller/latest/ ?)

Comment on lines +196 to +197
	errors := p.errorsCollector.errors
	p.errorsCollector.errors = nil
Member

This would ideally be placed in parsingErrorsCollector.
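One way to read this suggestion, together with the earlier remark that a state-mutating method should not be named like a getter, is a "pop" method owned by the collector. The sketch below is an assumption about what that could look like, not the code that landed in the follow-up commits: `popErrors` and `add` are hypothetical names, and errors are reduced to plain strings for brevity.

```go
package main

import "fmt"

// parsingErrorsCollector holds errors gathered during one translation run.
// Errors are plain reason strings here; the PR stores richer values.
type parsingErrorsCollector struct {
	errors []string
}

// add records a single error reason.
func (c *parsingErrorsCollector) add(reason string) {
	c.errors = append(c.errors, reason)
}

// popErrors returns everything collected so far and resets the collector,
// so the next translation run starts from an empty state. The name makes
// the mutation explicit, unlike a Get-prefixed method.
func (c *parsingErrorsCollector) popErrors() []string {
	errs := c.errors
	c.errors = nil
	return errs
}

func main() {
	c := &parsingErrorsCollector{}
	c.add("backend service not found")
	c.add("invalid plugin configuration")

	first := c.popErrors()
	second := c.popErrors()
	fmt.Println(len(first), len(second))
}
```

With the reset living inside the collector, the Parser would only delegate to it and would not need to touch the collector's fields directly.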

Member

@mlavacca mlavacca left a comment

I like the approach of emitting events for specific parsing errors that can happen in the controller. Definitely 👍 for this approach. As far as I understand, the purpose of this PR is only to validate the concept, so I think we can discuss the implementation details in the actual PRs.

@czeslavo
Contributor Author

Closing this as work on the proper implementation has been kicked off under #3097

@czeslavo czeslavo closed this Oct 26, 2022
@czeslavo czeslavo deleted the spike/parsing-error-events-poc branch February 13, 2023 13:40
Labels: area/spike, do not merge (let the author merge this, don't merge for them), size/L

3 participants