-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle missing TCP upstreams as warnings #8461
Handle missing TCP upstreams as warnings #8461
Conversation
Visit the preview URL for this PR (updated for commit 11683c4): https://gloo-edge--pr8461-5163-missing-tcp-ups-yfjo5roc.web.app (expires Fri, 21 Jul 2023 15:06:35 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 Sign: 77c2b86e287749579b7ff9cadb81e099042ef677 |
…olo-io/gloo into 5163-missing-tcp-upstreams-as-warnings
Issues linked to changelog: |
Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io>
…olo-io/gloo into 5163-missing-tcp-upstreams-as-warnings
func (invHostErr *TcpHostError) Error() string { | ||
var errStr string | ||
if invHostErr.Err != nil { | ||
invHostErr.Err.Error() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invHostErr.Err.Error() | |
return invHostErr.Err.Error() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is made obsolete by the changes in this comment: https://github.com/solo-io/gloo/pull/8461/files/3a5d40209a80d62a16170bd2817ab85f2992439c#r1261386540
@@ -16,6 +16,21 @@ var ( | |||
RouteIdentifierTxt = "Route Name" | |||
) | |||
|
|||
// TcpHostError reports an error and the host which is associated with the | |||
// error | |||
type TcpHostError struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is always treated as a warning, should we rename it TcpHostWarning
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has been done here: c69486e#diff-d8eb0189e46d1e52cd986c9ca9442663af1ea2287d7757d6483fd4926ef6f16dR35
validationapi.TcpListenerReport_Error_ProcessingError, | ||
fmt.Sprintf("listener %s: %s", t.parentListener.GetName(), err.Error())) | ||
// unwrap the multierror object returned by CreateTcpFilterChains; | ||
// treat DestinationNotFoundError as a warning and all others as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like we're not actually checking for DestinationNotFoundError
below, we're just converting all TcpHostErrors into InvalidDestinationWarnings. if we add more warning/error types in the future, this logic may not work anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not 100% sure i understood what you're calling out here, but i did add some more rigorous checks in this commit: 42c951e
namely i made sure that the error type was a warning before using the AppendTcpHostWarning
function call, and otherwise used the AppendTcpListenerError
function. i think this resolves your concern, but if not, do let me know
} | ||
} | ||
} else { | ||
validation.AppendTCPListenerError(t.report, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's not a multierror, do we still need to check for type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added that as part of this commit: 42c951e
Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>
…olo-io/gloo into 5163-missing-tcp-upstreams-as-warnings
if tcpHostWarn, ok := err.(*validation.TcpHostWarning); ok && tcpHostWarn.ErrorLevel() == validation.ErrorLevels_WARNING { | ||
validation.AppendTcpHostWarning( | ||
t.report.GetTcpHostReports()[tcpHostWarn.HostNum], | ||
validationapi.TcpHostReport_Warning_InvalidDestinationWarning, | ||
fmt.Sprintf("listener %s: %s", t.parentListener.GetName(), tcpHostWarn.Error())) | ||
} else { | ||
validation.AppendTCPListenerError(t.report, | ||
validationapi.TcpListenerReport_Error_ProcessingError, | ||
fmt.Sprintf("listener %s: %s", t.parentListener.GetName(), err.Error())) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this whole section, which also appears above, could be extracted out into a reusable func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yessss thanks, that was really bothering me. i just couldn't figure out how to address it. here's the fix: 42c951e
Earlier commits simply added more fields to the ErrorWithKnownLevel interface to aid in providing the necessary information required to report the error. This commit is an attempt to break that out into a separate object so additional fields do not need to be added every time more contextual information is needed to correctly report the error
switch errType := errReport.(type) { | ||
case validation.ErrorWithKnownLevel: | ||
switch errType.ErrorLevel() { | ||
case validation.ErrorLevels_WARNING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel like passing around error levels and hostNum adds extra complexity.
instead of needing pass through hostNum, can we just make sure that when the error itself is created, it contains a reference to the TcpHost (i.e. we could include tcpHost.Name
as part of the error message)?
instead of checking type = (warning or error) and inferring whether it should be translated to a TcpHostWarning or TCPListenerError, can we just check the actual error type, i.e. if the error is of type TcpHostWarning
then here we call AppendTcpHostWarning, otherwise we call AppendTCPListenerError?
if we do this, i think we can remove the whole ErrorWithKnownLevel interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are likely going to be adding more warning type errors in the future right? shouldnt we have a standard way to denote that it should just be a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, regarding the complexity. i tried to do what you suggested initially, but then i found out that CreateFilterChain
is a part of an interface, and I don't think we can change it, so it is messy to pass that information into the function. hence i had to find a way to pass hostNum
back upward. that's a large part of what makes this tricky...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a middle ground would be to stop passing up the error level (warning/error) and instead just pass the error type InvalidDestinationError
up to the caller, and let the caller decide whether it should be an error or warning?
Expect(warnings).To(HaveLen(3)) | ||
Expect(errors).To(HaveOccurred()) | ||
Expect(warnings).To(HaveLen(4)) | ||
Expect(errors).NotTo(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error that's no longer occurring is this:
TcpListener Error: ProcessingError. Reason: listener tcp-listener: 1 error occurred:
* *v1.Upstream { gloo-system.test } not found
switch errType := errReport.(type) { | ||
case validation.ErrorWithKnownLevel: | ||
switch errType.ErrorLevel() { | ||
case validation.ErrorLevels_WARNING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, regarding the complexity. i tried to do what you suggested initially, but then i found out that CreateFilterChain
is a part of an interface, and I don't think we can change it, so it is messy to pass that information into the function. hence i had to find a way to pass hostNum
back upward. that's a large part of what makes this tricky...
switch errType := errReport.(type) { | ||
case validation.ErrorWithKnownLevel: | ||
switch errType.ErrorLevel() { | ||
case validation.ErrorLevels_WARNING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a middle ground would be to stop passing up the error level (warning/error) and instead just pass the error type InvalidDestinationError
up to the caller, and let the caller decide whether it should be an error or warning?
Previously, we were just using InvalidDestinationWarning for all warning types - we remove that now and use a switch statement to use the right kind
…olo-io/gloo into 5163-missing-tcp-upstreams-as-warnings
* Handle missing TCP upstreams as warnings * Move changelog to new location * Fix a broken test * Apply suggestions from code review Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io> * Fix error in changelog * Address nil safety issue on TcpHostError * Fix nil safety check * Address a few review comments * Update projects/gloo/api/grpc/validation/gloo_validation.proto Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com> * Populate ErrorLevel for TcpHostWarning * Mark InvalidDestinationError as deprecated * Add some stricter error checking * Move duplicated code into a reusable function * Codegen * Fix an error from refactor * Use the ErrorWithKnownLevel interface to switch on error report types * Handle the case where tcp host num is not provided * Provide a separate Context object to carry additional information Earlier commits simply added more fields to the ErrorWithKnownLevel interface to aid in providing the necessary information required to report the error. This commit is an attempt to break that out into a separate object so additional fields do not need to be added every time more contextual information is needed to correctly report the error * Add docstrings on exported types * Adding changelog file to new location * Deleting changelog file from old location * Adding changelog file to new location * Deleting changelog file from old location * Clean up hard-coded warning type Previously, we were just using InvalidDestinationWarning for all warning types - we remove that now and use a switch statement to use the right kind * Fix an inaccurate comment --------- Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com> Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io> Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com> Co-authored-by: changelog-bot <changelog-bot>
* Handle missing TCP upstreams as warnings (#8461) * Handle missing TCP upstreams as warnings * Move changelog to new location * Fix a broken test * Apply suggestions from code review Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io> * Fix error in changelog * Address nil safety issue on TcpHostError * Fix nil safety check * Address a few review comments * Update projects/gloo/api/grpc/validation/gloo_validation.proto Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com> * Populate ErrorLevel for TcpHostWarning * Mark InvalidDestinationError as deprecated * Add some stricter error checking * Move duplicated code into a reusable function * Codegen * Fix an error from refactor * Use the ErrorWithKnownLevel interface to switch on error report types * Handle the case where tcp host num is not provided * Provide a separate Context object to carry additional information Earlier commits simply added more fields to the ErrorWithKnownLevel interface to aid in providing the necessary information required to report the error. This commit is an attempt to break that out into a separate object so additional fields do not need to be added every time more contextual information is needed to correctly report the error * Add docstrings on exported types * Adding changelog file to new location * Deleting changelog file from old location * Adding changelog file to new location * Deleting changelog file from old location * Clean up hard-coded warning type Previously, we were just using InvalidDestinationWarning for all warning types - we remove that now and use a switch statement to use the right kind * Fix an inaccurate comment --------- Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com> Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io> Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com> Co-authored-by: changelog-bot <changelog-bot> * Fix changelog location * Mark as resolving issue --------- Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com> Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io> Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>
Description
Prior to the fix for #8405, errors for TCP listeners were not being reported. Upon fixing this, gloo pods began reporting validation errors when deleting upstreams if TCP gateways still depended on them (see for example: https://github.com/solo-io/solo-projects/issues/5156). In HTTP gateways, we typically handle these as warnings rather than errors (due to user patterns). This change treats TCP gateways the same as HTTP gateways, and changes reporting of missing upstreams from errors to warnings.
Context
This was causing test flakes, and also may have been impacting users who were reporting issues deleting TCP gateways.
Checklist:
make -B install-go-tools generated-code
to ensure there will be no code diffBOT NOTES:
resolves https://github.com/solo-io/solo-projects/issues/5163