Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retool how validators register and are called: everything's a tag! #89

Merged
merged 72 commits into from
Feb 7, 2025

Conversation

thockin
Copy link
Collaborator

@thockin thockin commented Jan 4, 2025

Looking at how to not fail silently when tags are used wrong. I started adding more lint rules for things like "you can't use +optional on a type". It struck me that this is something that should be handled more first-class, rather than ex post facto. We will inevitably forget to add the lint rules for some tags.

We have the tag docs struct which includes a "contexts" list, which indicates when a tag is usable. I tried automating that check as a lint rule and it's not ideal as-is. It also highlighted that most of our tag docs are just WRONG (cargo culted) and that the context definitions we have are not really great.

Examples:

  • optional/required should only be on types
  • eachVal/eachKey claim they can be used on list values (they can't today)
  • listMapKey claims it can be used on list values (I don't think that makes sense?)
  • subfield claims it can be used on list values (I don't think it can)
  • unionDiscriminator/unionMember claim they can be used on list values (I don't think that makes sense?)

Lastly, each ExtractValidations() call doesn't have enough information to DTRT even if we fixed the above.

This PR started with expanding the contexts to distinguish struct-fields from list/map keys/values. It evolved into a different way of approaching tags.

Rather than each plugin's ExtractValidations() being responsible for checking that it is used in an appropriate context (and getting new args to indicate the extra information), I tried to do something a little more "scaffolding" centric. It changes the init sequence to register validators which include the tag string and contexts (basically the docs struct, but more fine-grained), but also having generic code look up the tags, ensure correct context, and then call a method to produce the specific FunctionGen objects.

Basically, centralize the tag-processing. Now generation will fail with errors like:

failed to generate validations: k8s.io/code-generator/cmd/validation-gen/output_tests/optional.T1: tag "k8s:optional" cannot be specified on type definitions

or

failed to generate validations: field k8s.io/code-generator/cmd/validation-gen/output_tests/options.T1.S2: tag "k8s:ifOptionDisabled": takes exactly 1 argument

...but the optional plugin didn't have to do the work. Some plugins still need to do work, like if they only apply to strings, but that seems OK to me.

This PR includes all my commits, even when later commits undo or change part of it, so readers can see the thought process. In particular, this centralized tag handling is insufficient for things like union which needs to run on the TYPE, but the tags are on FIELDs.

Now there are "tag validators" and "type validators" (which run AFTER child fields). Union has tag handlers that are used to validate context (e.g. "+union only applies to struct fields") and accumulate info, and a type handler which uses the accumulated info.

I wanted to move registry to a different package, but to avoid a circular dep, it has to become 3 packages, and it didn't seem worthwhile.

@thockin thockin changed the title Retool how validators register and are called DNM: Retool how validators register and are called Jan 4, 2025
@thockin thockin force-pushed the validation-gen_tag-descriptors branch from 5165747 to ca520e4 Compare January 6, 2025 01:02
@thockin
Copy link
Collaborator Author

thockin commented Jan 6, 2025

For giggles I added some WIP commits to make +eachVal a "regular" tag. To do that, I had to do listMapKey and listType, too.

This probably could have been simpler, and list-map could be treated "specially" (with info passed to validators), but I wanted to see if I could do it functional-style.

It sorta works. There are some issues around pointerness, still, and it's not as understandable as I'd like, but I think you can get the idea. Basically there's a family of list-oriented validators which work together to collect info about keys and then extract old values when we can.

The big remaining problem is that tags are discovered in sort-order, which means that +eachVal is run before +listMapKey. I see two ways to solve it:

  1. internally -- collect a list of funcs to call from the new FieldValidator, which runs after all tags.

  2. let tags declare a sort order for discover (they already do for execution). E.g. "schematic" vs "content" (words TBD).

If we like the approach, we can see how subfield would work, too.

@thockin thockin force-pushed the validation-gen_tag-descriptors branch from ca520e4 to 4e424a7 Compare January 6, 2025 17:19
@thockin
Copy link
Collaborator Author

thockin commented Jan 6, 2025

For more giggles, I made +k8s:subfield(ls)=+k8s:eachVal2=+k8s:format=ip-sloppy work - that's running on each value of a subfield of list type.

I can't yet reverse that, until I convert subfield to a regular tag.

@thockin thockin force-pushed the validation-gen_tag-descriptors branch 2 times, most recently from 2d63642 to 0cc2ffa Compare January 8, 2025 08:01
@thockin thockin force-pushed the validation-gen_tag-descriptors branch 17 times, most recently from 188a158 to cb73101 Compare January 13, 2025 18:29
@jpbetz
Copy link
Owner

jpbetz commented Jan 13, 2025

I've caught up on the commits. I'll put some comments in the code, but would like to start with some high level thoughts:

The improvements to how validators are registered and how context scopes are checked all looks great.

Separating out TypeValidators looks fine. Unions are the only tag we have (that I'm aware of) that needs to inspect multiple tags.

Re: Making each* normal tags- The idea of using our tag discovery in other generators (like openapi, defaulting, maybe even conversion) that we talked about last week is something I'd like to circle back on here. Currently, we have a nice separation between the discovery pass and the emit pass. But the childNode data structure that the passes share is specialized to validation. I like how making each* normal tags moves us in the direction of having less validation specific information in childNode. So I'm happy with this PR in the sense that it moves us in the right general direction.

I do wonder if we're doing to end up needing a third pass to make this all clean? Currently we have:

(gengo universe) -> discovery pass -> (childNode) -> validation emission pass -> (validation code)

Unless we add more passes and intermediate data structures, any validation specific code in childNode that we want to remove only has one places to go: the emission pass. Should we consider having additional passes? Maybe something more like:

(gengo universe) -> discovery pass -> (childNode) -> validation prep pass -> (validationChildNode) -> validation emission pass -> (validation code)

This would allow discovery pass to be fully generalized to use with other generators, while still giving us a data structure that we can use to fully represent the validation code that we'd like to emit.

I don't think we should hold this PR up on this idea. But this seems like a good place to discuss the longer term goals.

@thockin
Copy link
Collaborator Author

thockin commented Jan 13, 2025

ACK the sentiment! I think the intermediate that you are looking for is ultimately going to be the "iptables" version of this. Since discussion on Friday, I have been thinking a lot about it, and I think I see how to get there, but I want to focus on delivering SOMETHING before retooling it all again. I think this "regularity" of tags helps.

I have spent some time hacking on weekend, cleaning up the output tests for tags (found some bugs!!) and when I am done witht htat I will rip out all the now unused generator code, but in discovery and emit passes. I think the result will make it plain how we can proceed.

Cleanup and tests will be in subsequent commits.
Cleanup and tests will be in subsequent commits.
Cleanup and tests will be in subsequent commits.
Upon trying to clean up the now obsolete builti-in code, I realized I
had totally missed the non-tag iteration of lists (e.g. calling the
type-validation func for each item).  Fixing that was deeper than I
expected.

After this:
- the eachVal/eachKey validators expose a helper function for use by
  main generation code (they are special).
- discover() looks for the case of aliases to maps/slices and adds
  type-validation calls to "for each" the key/val types' validation funcs.
- discoverStruct() does the same for map/slice fields
- discoverAlias() is gone, it had been reduced to ~5 LOC and a comment.
- all built-in iteration logic is gone, in favor of the eachVal and
  eachKey tag logic
- safe.NewListMap is gone
- hasValidations() is now a global rather than only available to
  generation-phase (with hazard documented)
- the last built-in tag is gone
- Some edge cases we just don't support (because it's hard and we don't
  need them):
  - alias to pointer
  - pointer to alias of slide/map/pointer

I went through all the test differences and made sure they made sense.
In hindsight, these issues should have been obvious in earlier commits.

There's still some room for cleanup after this.
@thockin thockin force-pushed the validation-gen_tag-descriptors branch from 34faa1e to 45c00a6 Compare February 7, 2025 20:17
@jpbetz
Copy link
Owner

jpbetz commented Feb 7, 2025

LGTM

I'm all caught up. This is excellent progress. Let's iterate on the dev branch from here

@jpbetz jpbetz merged commit b1f7440 into jpbetz:validation-gen Feb 7, 2025
1 check failed
@caozhuozi
Copy link

Hi, sry to borther! I came across this PR and I'm interested in this topic. Is there any KEP that provides background information on it?

@yongruilin
Copy link
Collaborator

Hi, sry to borther! I came across this PR and I'm interested in this topic. Is there any KEP that provides background information on it?

Hi @caozhuozi , you can check on kubernetes/enhancements#5074 . Thanks for your interests.

@caozhuozi
Copy link

@yongruilin Thank you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants