Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle missing TCP upstreams as warnings #8461

Merged
merged 33 commits into from
Jul 14, 2023

Conversation

ashishb-solo
Copy link
Contributor

@ashishb-solo ashishb-solo commented Jul 12, 2023

Description

Prior to the fix for #8405, errors for TCP listeners were not being reported. Upon fixing this, gloo pods began reporting validation errors when deleting upstreams if TCP gateways still depended on them (see for example: https://github.com/solo-io/solo-projects/issues/5156). In HTTP gateways, we typically handle these as warnings rather than errors (due to user patterns). This change treats TCP gateways the same as HTTP gateways, and changes reporting of missing upstreams from errors to warnings.

Context

This was causing test flakes, and also may have been impacting users who were reporting issues deleting TCP gateways.

Checklist:

  • I included a concise, user-facing changelog (for details, see https://github.com/solo-io/go-utils/tree/main/changelogutils) which references the issue that is resolved.
  • If I updated APIs (our protos) or helm values, I ran make -B install-go-tools generated-code to ensure there will be no code diff
  • I followed guidelines laid out in the Gloo Edge contribution guide
  • I opened a draft PR or added the work in progress label if my PR is not ready for review
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

BOT NOTES:
resolves https://github.com/solo-io/solo-projects/issues/5163

@ashishb-solo ashishb-solo requested a review from a team as a code owner July 12, 2023 15:14
@github-actions github-actions bot added the keep pr updated signals bulldozer to keep pr up to date with base branch label Jul 12, 2023
@github-actions
Copy link

github-actions bot commented Jul 12, 2023

Visit the preview URL for this PR (updated for commit 11683c4):

https://gloo-edge--pr8461-5163-missing-tcp-ups-yfjo5roc.web.app

(expires Fri, 21 Jul 2023 15:06:35 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 77c2b86e287749579b7ff9cadb81e099042ef677

changelog/v1.15.0-beta18/handle-missing-tcp-upstreams.yaml Outdated Show resolved Hide resolved
projects/gloo/pkg/utils/validation/proxy_validation.go Outdated Show resolved Hide resolved
projects/gloo/pkg/utils/validation/proxy_validation.go Outdated Show resolved Hide resolved
projects/gloo/pkg/plugins/tcp/plugin.go Show resolved Hide resolved
projects/gloo/pkg/utils/vhosts.go Outdated Show resolved Hide resolved
projects/gloo/pkg/utils/vhosts.go Outdated Show resolved Hide resolved
@solo-changelog-bot
Copy link

Issues linked to changelog:
https://github.com/solo-io/solo-projects/issues/5163

func (invHostErr *TcpHostError) Error() string {
var errStr string
if invHostErr.Err != nil {
invHostErr.Err.Error()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
invHostErr.Err.Error()
return invHostErr.Err.Error()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -16,6 +16,21 @@ var (
RouteIdentifierTxt = "Route Name"
)

// TcpHostError reports an error and the host which is associated with the
// error
type TcpHostError struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is always treated as a warning, should we rename it TcpHostWarning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validationapi.TcpListenerReport_Error_ProcessingError,
fmt.Sprintf("listener %s: %s", t.parentListener.GetName(), err.Error()))
// unwrap the multierror object returned by CreateTcpFilterChains;
// treat DestinationNotFoundError as a warning and all others as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like we're not actually checking for DestinationNotFoundError below, we're just converting all TcpHostErrors into InvalidDestinationWarnings. if we add more warning/error types in the future, this logic may not work anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not 100% sure i understood what you're calling out here, but i did add some more rigorous checks in this commit: 42c951e

namely i made sure that the error type was a warning before using the AppendTcpHostWarning function call, and otherwise used the AppendTcpListenerError function. i think this resolves your concern, but if not, do let me know

}
}
} else {
validation.AppendTCPListenerError(t.report,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's not a multierror, do we still need to check for type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added that as part of this commit: 42c951e

Comment on lines 75 to 84
if tcpHostWarn, ok := err.(*validation.TcpHostWarning); ok && tcpHostWarn.ErrorLevel() == validation.ErrorLevels_WARNING {
validation.AppendTcpHostWarning(
t.report.GetTcpHostReports()[tcpHostWarn.HostNum],
validationapi.TcpHostReport_Warning_InvalidDestinationWarning,
fmt.Sprintf("listener %s: %s", t.parentListener.GetName(), tcpHostWarn.Error()))
} else {
validation.AppendTCPListenerError(t.report,
validationapi.TcpListenerReport_Error_ProcessingError,
fmt.Sprintf("listener %s: %s", t.parentListener.GetName(), err.Error()))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this whole section, which also appears above, could be extracted out into a reusable func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yessss thanks, that was really bothering me. i just couldn't figure out how to address it. here's the fix: 42c951e

Earlier commits simply added more fields to the ErrorWithKnownLevel interface
to aid in providing the necessary information required to report the error.
This commit is an attempt to break that out into a separate object so
additional fields do not need to be added every time more contextual
information is needed to correctly report the error
@nfuden nfuden requested a review from jenshu July 13, 2023 13:35
switch errType := errReport.(type) {
case validation.ErrorWithKnownLevel:
switch errType.ErrorLevel() {
case validation.ErrorLevels_WARNING:
Copy link
Contributor

@jenshu jenshu Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like passing around error levels and hostNum adds extra complexity.

instead of needing pass through hostNum, can we just make sure that when the error itself is created, it contains a reference to the TcpHost (i.e. we could include tcpHost.Name as part of the error message)?

instead of checking type = (warning or error) and inferring whether it should be translated to a TcpHostWarning or TCPListenerError, can we just check the actual error type, i.e. if the error is of type TcpHostWarning then here we call AppendTcpHostWarning, otherwise we call AppendTCPListenerError?

if we do this, i think we can remove the whole ErrorWithKnownLevel interface

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are likely going to be adding more warning type errors in the future right? shouldnt we have a standard way to denote that it should just be a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, regarding the complexity. i tried to do what you suggested initially, but then i found out that CreateFilterChain is a part of an interface, and I don't think we can change it, so it is messy to pass that information into the function. hence i had to find a way to pass hostNum back upward. that's a large part of what makes this tricky...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a middle ground would be to stop passing up the error level (warning/error) and instead just pass the error type InvalidDestinationError up to the caller, and let the caller decide whether it should be an error or warning?

projects/gloo/pkg/utils/validation/proxy_validation.go Outdated Show resolved Hide resolved
Expect(warnings).To(HaveLen(3))
Expect(errors).To(HaveOccurred())
Expect(warnings).To(HaveLen(4))
Expect(errors).NotTo(HaveOccurred())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error that's no longer occurring is this:

TcpListener Error: ProcessingError. Reason: listener tcp-listener: 1 error occurred:
	* *v1.Upstream { gloo-system.test } not found

changelog/v1.15.0-beta18/handle-missing-tcp-upstreams.yaml Outdated Show resolved Hide resolved
projects/gloo/api/grpc/validation/gloo_validation.proto Outdated Show resolved Hide resolved
projects/gloo/pkg/utils/validation/proxy_validation.go Outdated Show resolved Hide resolved
projects/gloo/pkg/plugins/tcp/plugin.go Show resolved Hide resolved
projects/gloo/pkg/utils/vhosts.go Show resolved Hide resolved
switch errType := errReport.(type) {
case validation.ErrorWithKnownLevel:
switch errType.ErrorLevel() {
case validation.ErrorLevels_WARNING:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, regarding the complexity. i tried to do what you suggested initially, but then i found out that CreateFilterChain is a part of an interface, and I don't think we can change it, so it is messy to pass that information into the function. hence i had to find a way to pass hostNum back upward. that's a large part of what makes this tricky...

switch errType := errReport.(type) {
case validation.ErrorWithKnownLevel:
switch errType.ErrorLevel() {
case validation.ErrorLevels_WARNING:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a middle ground would be to stop passing up the error level (warning/error) and instead just pass the error type InvalidDestinationError up to the caller, and let the caller decide whether it should be an error or warning?

@ashishb-solo ashishb-solo requested a review from nfuden July 14, 2023 15:09
@soloio-bulldozer soloio-bulldozer bot merged commit 181c57f into main Jul 14, 2023
12 checks passed
@soloio-bulldozer soloio-bulldozer bot deleted the 5163-missing-tcp-upstreams-as-warnings branch July 14, 2023 15:34
ashishb-solo added a commit that referenced this pull request Jul 14, 2023
* Handle missing TCP upstreams as warnings

* Move changelog to new location

* Fix a broken test

* Apply suggestions from code review

Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io>

* Fix error in changelog

* Address nil safety issue on TcpHostError

* Fix nil safety check

* Address a few review comments

* Update projects/gloo/api/grpc/validation/gloo_validation.proto

Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>

* Populate ErrorLevel for TcpHostWarning

* Mark InvalidDestinationError as deprecated

* Add some stricter error checking

* Move duplicated code into a reusable function

* Codegen

* Fix an error from refactor

* Use the ErrorWithKnownLevel interface to switch on error report types

* Handle the case where tcp host num is not provided

* Provide a separate Context object to carry additional information

Earlier commits simply added more fields to the ErrorWithKnownLevel interface
to aid in providing the necessary information required to report the error.
This commit is an attempt to break that out into a separate object so
additional fields do not need to be added every time more contextual
information is needed to correctly report the error

* Add docstrings on exported types

* Adding changelog file to new location

* Deleting changelog file from old location

* Adding changelog file to new location

* Deleting changelog file from old location

* Clean up hard-coded warning type

Previously, we were just using InvalidDestinationWarning for all warning types
- we remove that now and use a switch statement to use the right kind

* Fix an inaccurate comment

---------

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io>
Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>
Co-authored-by: changelog-bot <changelog-bot>
soloio-bulldozer bot added a commit that referenced this pull request Jul 14, 2023
* Handle missing TCP upstreams as warnings (#8461)

* Handle missing TCP upstreams as warnings

* Move changelog to new location

* Fix a broken test

* Apply suggestions from code review

Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io>

* Fix error in changelog

* Address nil safety issue on TcpHostError

* Fix nil safety check

* Address a few review comments

* Update projects/gloo/api/grpc/validation/gloo_validation.proto

Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>

* Populate ErrorLevel for TcpHostWarning

* Mark InvalidDestinationError as deprecated

* Add some stricter error checking

* Move duplicated code into a reusable function

* Codegen

* Fix an error from refactor

* Use the ErrorWithKnownLevel interface to switch on error report types

* Handle the case where tcp host num is not provided

* Provide a separate Context object to carry additional information

Earlier commits simply added more fields to the ErrorWithKnownLevel interface
to aid in providing the necessary information required to report the error.
This commit is an attempt to break that out into a separate object so
additional fields do not need to be added every time more contextual
information is needed to correctly report the error

* Add docstrings on exported types

* Adding changelog file to new location

* Deleting changelog file from old location

* Adding changelog file to new location

* Deleting changelog file from old location

* Clean up hard-coded warning type

Previously, we were just using InvalidDestinationWarning for all warning types
- we remove that now and use a switch statement to use the right kind

* Fix an inaccurate comment

---------

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io>
Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>
Co-authored-by: changelog-bot <changelog-bot>

* Fix changelog location

* Mark as resolving issue

---------

Co-authored-by: soloio-bulldozer[bot] <48420018+soloio-bulldozer[bot]@users.noreply.github.com>
Co-authored-by: Nathan Fudenberg <nathan.fudenberg@solo.io>
Co-authored-by: Jenny Shu <28537278+jenshu@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep pr updated signals bulldozer to keep pr up to date with base branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants