pull over Validate_Clusters to api package in 1.17.x #173

Open
3 of 4 tasks
dhiaayachi opened this issue Sep 25, 2024 · 0 comments


Backport

This PR was generated manually and relates to hashicorp#21587.

The text below is copied from the body of the original PR.


Description

The validate_clusters option in Envoy's route configuration says:

"An optional boolean that specifies whether the clusters that the route table refers to will be validated by the cluster manager. If set to true and a route refers to a non-existent cluster, the route table will not load. If set to false and a route refers to a non-existent cluster, the route table will load and the router filter will return a 404 if the route is selected at runtime. This setting defaults to true if the route table is statically defined via the route_config option. This setting default to false if the route table is loaded dynamically via the rds option. Users may wish to override the default behavior in certain cases (for example when using CDS with a static route table)."

We load the route table dynamically via RDS, but override the default and explicitly set validate_clusters to true. As a result, when a cluster that a route is supposed to point to doesn't exist, the route can fail to reach any of its backends. This can be triggered with a router -> resolver chain where the resolver has backends on different peers or WAN-federated backends, and you add a route to a backend that doesn't exist: the non-existent backend causes the existing backends to fail. I was not able to trigger this in a single-cluster setup, but it can be triggered with a peered backend.

With the default left in place, traffic to the missing cluster doesn't just blackhole but returns a 503. That seems to be the desired behavior, rather than making all other routing paths within that route fail due to a missing cluster. This is similar to the conclusion that was reached in the Jira ticket.

This PR removes the code that overrides the default value of this validate_clusters option.

Testing & Reproduction steps

Links

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

Overview of commits