-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tightening validation + adding basic validation tests #772
Tightening validation + adding basic validation tests #772
Conversation
@robscott: GitHub didn't allow me to request PR reviews from the following users: Miciah. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: robscott The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
61654fb
to
43c8791
Compare
// +kubebuilder:validation:MaxLength=63 | ||
type Namespace string | ||
|
||
// SectionName is the name of a section in a Kubernetes resource. It must be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be restrictive. Its an extension point, section names in custom types may not have such strict validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular suggestions for loosening this? This was just pulled from the upstream validation for an RFC 1123 subdomain that's used pretty widely throughout k8s. That seemed like a fairly generic starting point which we can always loosen later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start small and tight, loosen when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to starting with RFC1123 subdomain first, and seeing if we need to expand out from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which specific section of rfc1123 is this definition from? can you put a href? I'm searching the rfc for the word subdomain and I can't seem to find it easily.
Isn't this just a RFC1123 domain name (not sub)?
Also, we should give examples in each of the comments. People are not going to be able to stare at the regex and figure out what the intent is. This applies to all of the really complex validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added examples for most of these + references to where they originated from. Due to how the regexes are combined together in upstream source it can be difficult to find the true source of the logic, but it's fairly easy to get the regex used by just applying a resource with an invalid value. For example, here's how I figured out the validation for Group:
The CustomResourceDefinition "gateways.gateway.networking.k8s.io" is invalid:
* spec.group: Invalid value: "gateway.networ<king.k8s.io": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
apis/v1alpha2/shared_types.go
Outdated
@@ -249,4 +238,41 @@ type RouteStatus struct { | |||
// | |||
// +kubebuilder:validation:MinLength=1 | |||
// +kubebuilder:validation:MaxLength=253 | |||
// +kubebuilder:validation:Pattern=`^(\*.)?[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches IPv4 but not IPv6. Spec declares IPs are not allowd at all though..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a regex that could allow all valid hostnames but still invalidate IPv4 addresses? For reference, this is pulled from the upstream Kubernetes wildcard subdomain validation (with the exception that the leading *.
is optional here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is, because IPv4 addresses are still technically valid domain names as well, sadly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think trying to encode really complicated things into a regex will be problematic. It becomes quite hard to verify that a given regex does exactly what you want by looking at it.
I would comment that this validation may not remove all cases and controller will need to check.
spec: | ||
gatewayClassName: acme-lb | ||
listeners: | ||
- name: same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this validated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wonders of +listMapKey=name
4b65d8a
to
dea0bc8
Compare
dea0bc8
to
4e3e1dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really bad with regex so I didn't review them. I'm worse with regex than a person who knows nothing about them.
This PR looks good otherwise.
The deal with such PRs is that to review them you have to look at the part of the codebase that hasn't changed. Will improve iteratively if something is missed.
Thanks for pulling this in last minute.
// | ||
// +kubebuilder:validation:MaxLength=253 | ||
// +kubebuilder:validation:Pattern=`^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` | ||
type Group string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish this was defined in upstream and we could simply pull it from there. Some other day. No action required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be very helpful. It is in a place that's easily importable for go clients, but I don't think there's a great way to translate that to CRD validation yet, would be very nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a reference as to where this regex is using as the basis for tis definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note for this in one of my favorite issues: #629
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=253 | ||
Kind string `json:"kind"` | ||
Kind Kind `json:"kind"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rant: Why can't github highlight the specific text that has changed in this line.
// +kubebuilder:validation:MaxLength=63 | ||
type Namespace string | ||
|
||
// SectionName is the name of a section in a Kubernetes resource. It must be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start small and tight, loosen when needed.
4e3e1dd
to
bb1e32d
Compare
apis/v1alpha2/httproute_types.go
Outdated
// | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=256 | ||
// +kubebuilder:validation:Pattern=`^[-A-Za-z0-9]+$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this forbids pseudo headers and a bunch of other valid things like periods, slashes, percent signs, etc. rfc7230 3.2.6 describes the full list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, are you able to understand from the RFC which characters are valid for a header name? It's hard for me to distinguish, since that specific section seems to be focused on the field value? If the token
represents the name, it seems like the allowed set of chars would be a-z0-9 + this:
"!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
Then to further complicate things, the spec seems to explicitly rule out :
as a value, which seems like it would be an issue for pseudo headers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, here's where the current regex is coming from: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L388-L400. You're completely right that it would prevent pseudo headers though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of those are valid.
The trick for pseudo header is that :
is also the separator for the k/v - so in some cases it ends up being treated as ""
as the key. Not sure if its standard though:
$ curl httpbin.org/get -H ':foo: bar'
{
"args": {},
"headers": {
"": "foo: bar"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the RFC is hard to follow but if you look at https://datatracker.ietf.org/doc/html/rfc7230#section-3.2, a header key is a token
, and a token is one of those you listed above. Looking at the golang source was easier than the RFC...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to clarify, you're saying that we wouldn't need :
to be a valid header name character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to support only real headers - no :
.
If we want to support pseudo headers though, then we do want it. So that would allow header match on :path
, :scheme
, etc.
I think either one is valid, but we should be deliberate. If all pseudoheaders are representable based on other first class fields, then its plausble we do not allow them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's been at least some discussion around pseudo header matching being useful in some cases, but since I don't think we've reached any consensus on that, it seems safer to leave it out with the possibility of adding it later if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing pseudoheader pros:
- allows people to quickly try out some more advanced routing things, before the API supports it.
Cons: - less incentive to have the API support it, because you can use pseudoheaders.
I think it would be better long-term to have the functionality explicitly defined rather than do the pseudoheader thing, but it's not a strong preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended regex to include characters listed above, avoided pseudo headers for now as support will be much easier to add than remove in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, I'm inclined to err on the side of not allowing functionality we might want to take away later (in favour of explicit config).
apis/v1alpha2/shared_types.go
Outdated
@@ -249,4 +238,41 @@ type RouteStatus struct { | |||
// | |||
// +kubebuilder:validation:MinLength=1 | |||
// +kubebuilder:validation:MaxLength=253 | |||
// +kubebuilder:validation:Pattern=`^(\*.)?[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is, because IPv4 addresses are still technically valid domain names as well, sadly.
// +kubebuilder:validation:MaxLength=63 | ||
type Namespace string | ||
|
||
// SectionName is the name of a section in a Kubernetes resource. It must be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to starting with RFC1123 subdomain first, and seeing if we need to expand out from there.
apis/v1alpha2/httproute_types.go
Outdated
// | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=256 | ||
// +kubebuilder:validation:Pattern=`^[-A-Za-z0-9]+$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing pseudoheader pros:
- allows people to quickly try out some more advanced routing things, before the API supports it.
Cons: - less incentive to have the API support it, because you can use pseudoheaders.
I think it would be better long-term to have the functionality explicitly defined rather than do the pseudoheader thing, but it's not a strong preference.
bb1e32d
to
d5d946e
Compare
LGTM -- but please put examples in each of the docstrings. the regexes are pretty complicated to parse to understand what a generally valid string looks like. |
9691e7f
to
138c3b4
Compare
Thanks for the great feedback on this @bowei, @hbagdi, @howardjohn, and @youngnick! I think I've addressed everything, PTAL. |
138c3b4
to
b0634f6
Compare
A few minor comments LGTM, ping me after changes, I will apply real LGTM |
@@ -262,6 +262,7 @@ type HTTPPathMatch struct { | |||
// | |||
// +optional | |||
// +kubebuilder:default="/" | |||
// +kubebuilder:validation:MaxLength=1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently 2000 is the accepted folk limit around the web
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opinion:
Can you imagine anyone using a 2000 character path to do matching though?
If anything, I think we are already generous here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's likely safer to start with the lower limit and expand if necessary. At least that's been our practice so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? There is data that 2000 is effectively the max which is well known. For the others, yes, we have less info so just have to pick a number. I'm not super tied to increase this -- just that we know that more about what the max is.
@@ -282,6 +283,23 @@ const ( | |||
HeaderMatchImplementationSpecific HeaderMatchType = "ImplementationSpecific" | |||
) | |||
|
|||
// HTTPHeaderName is the name of an HTTP header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to say it's case-insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We say in the place where this type is used. I'm not sure what is the right place for it. The place of use seems more correct because that gets added to the CRD. I can argue either way though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more appropriate to specify that it's case insensitive on the match field itself (current state). There's nothing on this type definition that can require that matching is case insensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with anything as long as the generated documentation is clear.
// +optional | ||
// +kubebuilder:validation:MaxItems=16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any harm to increase this to 30-50? I can see hitting 16 limit, probably not 30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't really thought of a use case with that many matches, but I can increase to 32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we're using 16 as a limit in a lot of places, including header matching. Unless there's a clear use case I'd be tempted to leave that as is to minimize the changes here.
// StatusCode is the HTTP status code to be used in response. | ||
// | ||
// Support: Core | ||
// | ||
// +optional | ||
// +kubebuilder:default=302 | ||
// +kubebuilder:validation=301;302 | ||
// +kubebuilder:validation:Enum=301;302 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should just include all the valid redirects from here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#redirection_messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we should add this, but didn't want to expand the scope of this PR so added #793 to track this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a conformance concern for not adding them in the first place.
@@ -262,6 +262,7 @@ type HTTPPathMatch struct { | |||
// | |||
// +optional | |||
// +kubebuilder:default="/" | |||
// +kubebuilder:validation:MaxLength=1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opinion:
Can you imagine anyone using a 2000 character path to do matching though?
If anything, I think we are already generous here.
@@ -282,6 +283,23 @@ const ( | |||
HeaderMatchImplementationSpecific HeaderMatchType = "ImplementationSpecific" | |||
) | |||
|
|||
// HTTPHeaderName is the name of an HTTP header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We say in the place where this type is used. I'm not sure what is the right place for it. The place of use seems more correct because that gets added to the CRD. I can argue either way though.
// StatusCode is the HTTP status code to be used in response. | ||
// | ||
// Support: Core | ||
// | ||
// +optional | ||
// +kubebuilder:default=302 | ||
// +kubebuilder:validation=301;302 | ||
// +kubebuilder:validation:Enum=301;302 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a conformance concern for not adding them in the first place.
@@ -1219,6 +1289,7 @@ spec: | |||
detached from the Gateway. \n Support: Core" | |||
maxLength: 253 | |||
minLength: 1 | |||
pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ | |||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the largest CRD yet? We should register in that competition if one exists.
b0634f6
to
7cd616f
Compare
/hold |
/lgtm |
Great job Rob! |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This adds regex validation and map keys throughout the v1alpha2 API, consolidates shared types, and adds some very simple validation tests. Much of this logic was pulled from upstream Kubernetes validation.
Which issue(s) this PR fixes:
Fixes #767
Does this PR introduce a user-facing change?:
/cc @hbagdi @Miciah @bowei