Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tightening validation + adding basic validation tests #772

Merged
merged 1 commit into from
Aug 19, 2021

Conversation

robscott
Copy link
Member

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This adds regex validation and map keys throughout the v1alpha2 API, consolidates shared types, and adds some very simple validation tests. Much of this logic was pulled from upstream Kubernetes validation.

Which issue(s) this PR fixes:
Fixes #767

Does this PR introduce a user-facing change?:

Many string fields are now validated with regular expressions.

/cc @hbagdi @Miciah @bowei

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Aug 12, 2021
@k8s-ci-robot k8s-ci-robot requested a review from bowei August 12, 2021 16:35
@k8s-ci-robot
Copy link
Contributor

@robscott: GitHub didn't allow me to request PR reviews from the following users: Miciah.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This adds regex validation and map keys throughout the v1alpha2 API, consolidates shared types, and adds some very simple validation tests. Much of this logic was pulled from upstream Kubernetes validation.

Which issue(s) this PR fixes:
Fixes #767

Does this PR introduce a user-facing change?:

Many string fields are now validated with regular expressions.

/cc @hbagdi @Miciah @bowei

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from hbagdi August 12, 2021 16:35
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 12, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2021
// +kubebuilder:validation:MaxLength=63
type Namespace string

// SectionName is the name of a section in a Kubernetes resource. It must be a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be restrictive. Its an extension point, section names in custom types may not have such strict validation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular suggestions for loosening this? This was just pulled from the upstream validation for an RFC 1123 subdomain that's used pretty widely throughout k8s. That seemed like a fairly generic starting point which we can always loosen later if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start small and tight, loosen when needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to starting with RFC1123 subdomain first, and seeing if we need to expand out from there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which specific section of rfc1123 is this definition from? can you put a href? I'm searching the rfc for the word subdomain and I can't seem to find it easily.

Isn't this just a RFC1123 domain name (not sub)?

Also, we should give examples in each of the comments. People are not going to be able to stare at the regex and figure out what the intent is. This applies to all of the really complex validations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added examples for most of these + references to where they originated from. Due to how the regexes are combined together in upstream source it can be difficult to find the true source of the logic, but it's fairly easy to get the regex used by just applying a resource with an invalid value. For example, here's how I figured out the validation for Group:

The CustomResourceDefinition "gateways.gateway.networking.k8s.io" is invalid:
* spec.group: Invalid value: "gateway.networ<king.k8s.io": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

@@ -249,4 +238,41 @@ type RouteStatus struct {
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^(\*.)?[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches IPv4 but not IPv6. Spec declares IPs are not allowd at all though..

Copy link
Member Author

@robscott robscott Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a regex that could allow all valid hostnames but still invalidate IPv4 addresses? For reference, this is pulled from the upstream Kubernetes wildcard subdomain validation (with the exception that the leading *. is optional here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is, because IPv4 addresses are still technically valid domain names as well, sadly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think trying to encode really complicated things into a regex will be problematic. It becomes quite hard to verify that a given regex does exactly what you want by looking at it.

I would comment that this validation may not remove all cases and controller will need to check.

hack/invalid-examples/httproute/invalid-path.yaml Outdated Show resolved Hide resolved
spec:
gatewayClassName: acme-lb
listeners:
- name: same
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this validated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wonders of +listMapKey=name

@robscott robscott force-pushed the regex-validation branch 2 times, most recently from 4b65d8a to dea0bc8 Compare August 12, 2021 23:02
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 12, 2021
Copy link
Contributor

@hbagdi hbagdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really bad with regex so I didn't review them. I'm worse with regex than a person who knows nothing about them.
This PR looks good otherwise.
The deal with such PRs is that to review them you have to look at the part of the codebase that hasn't changed. Will improve iteratively if something is missed.

Thanks for pulling this in last minute.

//
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
type Group string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish this was defined in upstream and we could simply pull it from there. Some other day. No action required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be very helpful. It is in a place that's easily importable for go clients, but I don't think there's a great way to translate that to CRD validation yet, would be very nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a reference as to where this regex is using as the basis for tis definition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note for this in one of my favorite issues: #629

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
Kind string `json:"kind"`
Kind Kind `json:"kind"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rant: Why can't github highlight the specific text that has changed in this line.

// +kubebuilder:validation:MaxLength=63
type Namespace string

// SectionName is the name of a section in a Kubernetes resource. It must be a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start small and tight, loosen when needed.

apis/v1alpha2/httproute_types.go Outdated Show resolved Hide resolved
apis/v1alpha2/httproute_types.go Show resolved Hide resolved
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=256
// +kubebuilder:validation:Pattern=`^[-A-Za-z0-9]+$`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this forbids pseudo headers and a bunch of other valid things like periods, slashes, percent signs, etc. rfc7230 3.2.6 describes the full list

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, are you able to understand from the RFC which characters are valid for a header name? It's hard for me to distinguish, since that specific section seems to be focused on the field value? If the token represents the name, it seems like the allowed set of chars would be a-z0-9 + this:

"!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" /  "." / "^" / "_" / "`" / "|" / "~" 

Then to further complicate things, the spec seems to explicitly rule out : as a value, which seems like it would be an issue for pseudo headers...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, here's where the current regex is coming from: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L388-L400. You're completely right that it would prevent pseudo headers though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of those are valid.

The trick for pseudo header is that : is also the separator for the k/v - so in some cases it ends up being treated as "" as the key. Not sure if its standard though:

$ curl httpbin.org/get -H ':foo: bar'
{
  "args": {},
  "headers": {
    "": "foo: bar"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the RFC is hard to follow but if you look at https://datatracker.ietf.org/doc/html/rfc7230#section-3.2, a header key is a token, and a token is one of those you listed above. Looking at the golang source was easier than the RFC...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to clarify, you're saying that we wouldn't need : to be a valid header name character?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to support only real headers - no :.

If we want to support pseudo headers though, then we do want it. So that would allow header match on :path, :scheme, etc.

I think either one is valid, but we should be deliberate. If all pseudoheaders are representable based on other first class fields, then its plausble we do not allow them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's been at least some discussion around pseudo header matching being useful in some cases, but since I don't think we've reached any consensus on that, it seems safer to leave it out with the possibility of adding it later if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing pseudoheader pros:

  • allows people to quickly try out some more advanced routing things, before the API supports it.
    Cons:
  • less incentive to have the API support it, because you can use pseudoheaders.

I think it would be better long-term to have the functionality explicitly defined rather than do the pseudoheader thing, but it's not a strong preference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended regex to include characters listed above, avoided pseudo headers for now as support will be much easier to add than remove in the future.

Copy link
Contributor

@youngnick youngnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, I'm inclined to err on the side of not allowing functionality we might want to take away later (in favour of explicit config).

@@ -249,4 +238,41 @@ type RouteStatus struct {
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:Pattern=`^(\*.)?[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is, because IPv4 addresses are still technically valid domain names as well, sadly.

// +kubebuilder:validation:MaxLength=63
type Namespace string

// SectionName is the name of a section in a Kubernetes resource. It must be a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to starting with RFC1123 subdomain first, and seeing if we need to expand out from there.

//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=256
// +kubebuilder:validation:Pattern=`^[-A-Za-z0-9]+$`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing pseudoheader pros:

  • allows people to quickly try out some more advanced routing things, before the API supports it.
    Cons:
  • less incentive to have the API support it, because you can use pseudoheaders.

I think it would be better long-term to have the functionality explicitly defined rather than do the pseudoheader thing, but it's not a strong preference.

@bowei
Copy link
Contributor

bowei commented Aug 16, 2021

LGTM -- but please put examples in each of the docstrings. the regexes are pretty complicated to parse to understand what a generally valid string looks like.

@robscott robscott force-pushed the regex-validation branch 3 times, most recently from 9691e7f to 138c3b4 Compare August 19, 2021 00:34
@robscott
Copy link
Member Author

Thanks for the great feedback on this @bowei, @hbagdi, @howardjohn, and @youngnick! I think I've addressed everything, PTAL.

@bowei
Copy link
Contributor

bowei commented Aug 19, 2021

A few minor comments

LGTM, ping me after changes, I will apply real LGTM

@@ -262,6 +262,7 @@ type HTTPPathMatch struct {
//
// +optional
// +kubebuilder:default="/"
// +kubebuilder:validation:MaxLength=1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently 2000 is the accepted folk limit around the web

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opinion:
Can you imagine anyone using a 2000 character path to do matching though?
If anything, I think we are already generous here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's likely safer to start with the lower limit and expand if necessary. At least that's been our practice so far.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? There is data that 2000 is effectively the max which is well known. For the others, yes, we have less info so just have to pick a number. I'm not super tied to increase this -- just that we know that more about what the max is.

apis/v1alpha2/httproute_types.go Outdated Show resolved Hide resolved
@@ -282,6 +283,23 @@ const (
HeaderMatchImplementationSpecific HeaderMatchType = "ImplementationSpecific"
)

// HTTPHeaderName is the name of an HTTP header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to say it's case-insensitive?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We say in the place where this type is used. I'm not sure what is the right place for it. The place of use seems more correct because that gets added to the CRD. I can argue either way though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more appropriate to specify that it's case insensitive on the match field itself (current state). There's nothing on this type definition that can require that matching is case insensitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with anything as long as the generated documentation is clear.

apis/v1alpha2/httproute_types.go Outdated Show resolved Hide resolved
// +optional
// +kubebuilder:validation:MaxItems=16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm to increase this to 30-50? I can see hitting 16 limit, probably not 30

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't really thought of a use case with that many matches, but I can increase to 32.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we're using 16 as a limit in a lot of places, including header matching. Unless there's a clear use case I'd be tempted to leave that as is to minimize the changes here.

// StatusCode is the HTTP status code to be used in response.
//
// Support: Core
//
// +optional
// +kubebuilder:default=302
// +kubebuilder:validation=301;302
// +kubebuilder:validation:Enum=301;302
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should just include all the valid redirects from here:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#redirection_messages

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we should add this, but didn't want to expand the scope of this PR so added #793 to track this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a conformance concern for not adding them in the first place.

apis/v1alpha2/object_reference_types.go Outdated Show resolved Hide resolved
apis/v1alpha2/object_reference_types.go Show resolved Hide resolved
apis/v1alpha2/shared_types.go Outdated Show resolved Hide resolved
@@ -262,6 +262,7 @@ type HTTPPathMatch struct {
//
// +optional
// +kubebuilder:default="/"
// +kubebuilder:validation:MaxLength=1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opinion:
Can you imagine anyone using a 2000 character path to do matching though?
If anything, I think we are already generous here.

@@ -282,6 +283,23 @@ const (
HeaderMatchImplementationSpecific HeaderMatchType = "ImplementationSpecific"
)

// HTTPHeaderName is the name of an HTTP header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We say in the place where this type is used. I'm not sure what is the right place for it. The place of use seems more correct because that gets added to the CRD. I can argue either way though.

apis/v1alpha2/httproute_types.go Outdated Show resolved Hide resolved
// StatusCode is the HTTP status code to be used in response.
//
// Support: Core
//
// +optional
// +kubebuilder:default=302
// +kubebuilder:validation=301;302
// +kubebuilder:validation:Enum=301;302
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a conformance concern for not adding them in the first place.

@@ -1219,6 +1289,7 @@ spec:
detached from the Gateway. \n Support: Core"
maxLength: 253
minLength: 1
pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
type: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the largest CRD yet? We should register in that competition if one exists.

apis/v1alpha2/shared_types.go Outdated Show resolved Hide resolved
apis/v1alpha2/shared_types.go Outdated Show resolved Hide resolved
@robscott
Copy link
Member Author

Thanks for the last round of feedback @bowei and @hbagdi! I think I've responded to everything, PTAL.

@bowei
Copy link
Contributor

bowei commented Aug 19, 2021

/hold
/lgtm for me

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 19, 2021
@bowei
Copy link
Contributor

bowei commented Aug 19, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2021
@hbagdi
Copy link
Contributor

hbagdi commented Aug 19, 2021

Great job Rob!
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 19, 2021
@k8s-ci-robot k8s-ci-robot merged commit 09526c9 into kubernetes-sigs:master Aug 19, 2021
robscott added a commit to robscott/gateway-api that referenced this pull request Aug 20, 2021
@robscott robscott deleted the regex-validation branch January 8, 2022 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stricter regex-based validation
6 participants