Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Experimental] KReference.Group to avoid being explicit on the APIVersion #5086

Open
4 of 11 tasks
slinkydeveloper opened this issue Mar 17, 2021 · 33 comments · Fixed by #5520
Open
4 of 11 tasks

[Experimental] KReference.Group to avoid being explicit on the APIVersion #5086

slinkydeveloper opened this issue Mar 17, 2021 · 33 comments · Fixed by #5520
Labels
kind/bug Categorizes issue or PR as related to a bug. roadmap Issues for linking from the roadmap triage/accepted Issues which should be fixed (post-triage)

Comments

@slinkydeveloper
Copy link
Contributor

slinkydeveloper commented Mar 17, 2021

Description
I propose to add a new field to KReference, called Group. For example, in order to refer to a KafkaChannel, more than using this one:

Spec:
  Channel:
    API Version:  messaging.knative.dev/v1alpha1
    Kind:         KafkaChannel
    Name:         testchannel

The user will be able to just use this:

Spec:
  Channel:
    Group:  messaging.knative.dev
    Kind:         KafkaChannel
    Name:         testchannel

This approach brings the following benefits:

When the api version is needed, an algorithm can just retrieve the latest for the provided api group and kind and then query the api server using such version.

Background
This came out while working on the KafkaChannel v1alpha1 removal: knative-extensions/eventing-kafka#426 (comment). This upgrade ended up being quite painful and we even got it wrong with brokers knative-extensions/eventing-kafka#624.

Not tieing to the api version seems to me logically more correct, because the api version is a detail of the resource. With the reference system we have today, every time we need to remove an old api, we'll have to ask users to accept some downtime to fix the api version manually, or we need to end up with hacks like https://github.com/knative/eventing/pull/5085/files#diff-74513e8fb938944cdb1b786c568c843a282eecd46ba5440a4374763245e2ffeaR30. Both are in my opinion unacceptable.

Exit Criteria
Users are now able to use references without explicitly define the api version, but just using the api group.

Experimental flag name: kreference-group

Experimental feature stages plan

Below the proposed plan for the feature stages (this list implicitly includes the requirements defined in the process)

Affected WG

  • Eventing WG

Prior discussions

@slinkydeveloper slinkydeveloper added the kind/bug Categorizes issue or PR as related to a bug. label Mar 17, 2021
@matzew
Copy link
Member

matzew commented Mar 18, 2021

Not tieing to the api version also seems more logically correct, because the api version is a detail of the resource.

it sounds like a limitation in the first place... but I am not sure if ignoring the version is a good idea... b/c they shapes could be completely different (e.g. v1alpha1 versus v2)

/cc @vaikas

@slinkydeveloper
Copy link
Contributor Author

b/c they shapes could be completely different (e.g. v1alpha1 versus v2)

But the shape ain't nothing to do with the fact that I'm referring to A or B or C, right?

@vaikas
Copy link
Contributor

vaikas commented Mar 18, 2021

Yeah, the shape is different so you need to know which version of the resource you're getting so you can patch the right resource. Also, knowing the version is important to set up the informers, etc. otherwise you're going to have racy time.

@slinkydeveloper
Copy link
Contributor Author

Yeah, the shape is different so you need to know which version of the resource you're getting so you can patch the right resource.

Can't we discover that while fetching the resource? And isn't the storage version one anyway for every CRD?

@vaikas
Copy link
Contributor

vaikas commented Mar 22, 2021

Yes, that's why the label on the resource indicated which duck it conforms to. I don't quite understand what the storage has to do with it, but yes, there can only be one storage version for a given CRD.

@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented Mar 22, 2021

I don't quite understand what the storage has to do with it, but yes, there can only be one storage version for a given CRD.

And that's exactly my point. When the user gives us:

Spec:
  Channel:
    API Version:  messaging.knative.dev
    Kind:         KafkaChannel
    Name:         testchannel

The duck controller can discover from kube what is the storage version of messaging.knative.dev KafkaChannel, and then the controller can just query the stored versions to perform the changes on the resource instance. It doesn't need to know the "original version" of the crd, it won't make any difference to it.

E.g. in this case, the controller finds out v1beta1 is the stored version, so it will just use messaging.knative.dev/v1beta1 KafkaChannel to modify the subscribers list

@grantr
Copy link
Contributor

grantr commented Mar 22, 2021

This is going to be a tough one to fix since the reference shape is foundational to a lot of our apis.

@grantr grantr added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Mar 22, 2021
@slinkydeveloper
Copy link
Contributor Author

Please check out the #5131

@aliok
Copy link
Member

aliok commented Mar 23, 2021

#5131 looks good to me.

The PR does something only if the version is missing the in the subscription ref.
That is backward compatible.

I don't think this would require us to change any other code to adapt the code here.

@slinkydeveloper
Copy link
Contributor Author

BTW, how hard would it be to use latest served version instead of storage version?
There can be multiple served versions and they're strings. Some semver magic?

I honestly don't wanna dig in the semver magic 😄 I also think is more "correct" to use the storage version because:

  • you don't to pay unnecessary roundtrips through version conversions
  • the storage version is the ultimate source of truth for every controller operating on that resource

@n3wscott
Copy link
Contributor

An apiVersion without version is not an apiVersion.

#5131 is breaking the contract we have with the resources. The subscription should just go ready false and an operator should fix it. No need for magic here.

@matzew
Copy link
Member

matzew commented Mar 24, 2021

Can you share details on what you mean with "an operator should fix it" ?

@slinkydeveloper
Copy link
Contributor Author

Looping in eventing wg leads @lionelvillard @grantr @vaikas @devguyio how should we move this one forward? Is this a good candidate as experimental feature (since it's a behavioural change of a field?)

@omerbensaadon
Copy link

FWIW, this is cleaner UX

@grantr
Copy link
Contributor

grantr commented Apr 21, 2021

My concerns with this feature are technical. The k8s API requires a version identifier to access a resource, so we have to deduce what the version is, and there may be situations in which the version cannot be deduced or the deduction is incorrect. If the technical difficulties can be worked out, then I agree this change would result in an improved UX.

The current behavior must always be supported.

IMO allowing the APIVersion field to omit the version is going to cause issues because it's harder to know what the user's intent is, and APIVersion everywhere else in the k8s API means "Group + Version". Instead, I'd prefer to add an APIGroup field to the reference type (though in Go, this should probably be implemented as a separate type rather than modifying the existing KRef type) because that's a clearer statement of what the field is expected to contain, and a clearer statement about the user's intent. I could be convinced I'm wrong based on actual prototypes and usage, so this is a recommendation, not a mandate.

Practically, I'm concerned that adding this fundamental experiment to eventing core now will disrupt the 1.0 spec and conformance work, which for now is a Knative-wide priority. I suggest working out the technical details and UX in an experimental feature of an alternate broker implementation rather than eventing core itself.

@vaikas
Copy link
Contributor

vaikas commented Apr 21, 2021

@slinkydeveloper Thanks for nudging us and making sure this gets eyeballs, and our apologies for taking so long to find the time to make sure we bring our concerns out in a researched manner.

TL;DR:

  • let's not overload the semantic meaning of APIVersion
  • let's add Group into KReference
  • Use the logic of peeking at the CRD for stored version to determine what version to use

@n3wscott and I just spent a bunch of time going through the deep bowels of the codes and the helpers and what various parsers that k8s ecosystem (including dynamic clients expect, etc. etc. etc.).

Just to make sure we're all on the same page and we have concrete examples, bear with this lengthy note and revisiting some basic concepts :)

APIVersion has a very specific and well understood meaning (just like @grantr points out above) but unfortunately also has legacy issues (for example) Service looks like this in ObjectReference:
{Kind:Service Namespace:test-embssksj Name:t0-fucnyczu UID: APIVersion:v1 ResourceVersion: FieldPath:}

However Broker looks like this:
{Kind:Broker Namespace:test-embssksj Name:routing-test-6 UID: APIVersion:eventing.knative.dev/v1 ResourceVersion: FieldPath:}

And Deployment looks like this:
{Kind:Deployment Namespace:test-embssksj Name:t0-fucnyczu UID: APIVersion:apps/v1 ResourceVersion: FieldPath:}

So, given these, and if for example you use a standard way to parse these (like this one below) this would then mean that we'd have to special case around the special case due to k8s 'core' special handling.
https://github.com/kubernetes/apimachinery/blob/57f2a0733447cfd41294477d833cce6580faaca3/pkg/runtime/schema/group_version.go#L209-L227

func ParseGroupVersion(gv string) (GroupVersion, error) {
	// this can be the internal version for the legacy kube types
	// TODO once we've cleared the last uses as strings, this special case should be removed.
	if (len(gv) == 0) || (gv == "/") {
		return GroupVersion{}, nil
	}
	switch strings.Count(gv, "/") {
	case 0:
		return GroupVersion{"", gv}, nil
	case 1:
		i := strings.Index(gv, "/")
		return GroupVersion{gv[:i], gv[i+1:]}, nil
	default:
		return GroupVersion{}, fmt.Errorf("unexpected GroupVersion string: %v", gv)
	}
}

(I'm sure the TODO will be removed any day now ;) )

Also as part of the learnings with duck types and their usage in anger has lead to.
https://github.com/knative-sandbox/discovery/blob/main/pkg/apis/discovery/v1alpha1/clusterducktype_types.go#L155-L180

We (and sounds like @grantr) feel the best way forward is probably to not overload the semantic meaning of APIVersion, but to add Group (which is again a k8s standard concept) field into the KReference. We might need to special case the validation of KReference for resources whose controllers do this lookup.

This still leaves the problem of how do we get the Version, and we can use the logic in this PR to do that for the time being and see how that goes. But in the future it would be better to utilize some of the Discovery work for better ducks support and without us redefining the well understood k8s fields / names / semantics. The whole goal of the Discovery work is to solve the mapping of duck version to versions of that resource implementing that duck in a more generic as well as more discoverable manner overall.

And as @aliok points out we do have the annotation to tell us the duck shape once it's fetched but the discovery would allow us to not rely on the annotations, but will work fine until discovery lands in all it's fullness.

@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented Apr 22, 2021

The k8s API requires a version identifier to access a resource, so we have to deduce what the version is, and there may be situations in which the version cannot be deduced or the deduction is incorrect.
This still leaves the problem of how do we get the Version, and we can use the logic in this PR to do that for the time being and see how that goes. But in the future it would be better to utilize some of the Discovery work for better ducks support and without us redefining the well understood k8s fields / names / semantics. The whole goal of the Discovery work is to solve the mapping of duck version to versions of that resource implementing that duck in a more generic as well as more discoverable manner overall.

I personally don't see how there could be any difficulties in that sense, because whatever version the user pushed, the version living in the server is only one. And as you pointed out, we have the annotation to get the duck type version, hence from my understanding this seems to me an orthogonal problem, not related to this particular issue, but due the fact that k8s allows storing a CR only with a specific version. At the end of the day, this issue is exploiting this particular aspect of CRDs design.

But I'm happy to be proven wrong and I'm open to analyze situations where this version resolution breaks.

The current behavior must always be supported.

+1

IMO allowing the APIVersion field to omit the version is going to cause issues because it's harder to know what the user's intent is, and APIVersion everywhere else in the k8s API means "Group + Version". Instead, I'd prefer to add an APIGroup field to the reference type (though in Go, this should probably be implemented as a separate type rather than modifying the existing KRef type) because that's a clearer statement of what the field is expected to contain, and a clearer statement about the user's intent.
We (and sounds like @grantr) feel the best way forward is probably to not overload the semantic meaning of APIVersion, but to add Group (which is again a k8s standard concept) field into the KReference. We might need to special case the validation of KReference for resources whose controllers do this lookup.

Agree, making sure the user expectation is correct is definitely a good point. I like the idea of having a specific field for it.

I suggest working out the technical details and UX in an experimental feature of an alternate broker implementation rather than eventing core itself.

This is not possible without modifying the core apis, because brokers/channel implementations relies on core apis anyway. In some cases, vendors relies even on core controllers, like the subscription controller. Even if we move it forward as an experimental feature, we need to modify the core anyway.

So, given these, and if for example you use a standard way to parse these (like this one below) this would then mean that we'd have to special case around the special case due to k8s 'core' special handling.

Yep in my PR I exactly copied that logic and adapted it 😄.

Proposal

Ok given this feedback, I propose one of the two following course of action:

  • We add a new field to KReference called APIGroup and we frame this field as experimental.
  • We follow the process of experimental features for inclusion, graduation, etc so we can find out if any problems arises down the road. I guess we can keep the feature in alpha (not enabled by default) for a while and give room to people to find out if there are any implementation problems.

OR:

  • We create a new struct exactly like KReference but with APIGroup too (feel free to bikeshed on the name, I don't mind 😄). The APIGroup field is flagged as experimental, the others are not. Wherever we want to experiment this feature (e.g. in the Destination), we replace the KReference with this new type.
  • We follow the process of experimental features for inclusion, graduation, etc so we can find out if any problems arises down the road. I guess we can keep the feature in alpha (not enabled by default) for a while and give room to people to find out if there are any implementation problems.

Personally I prefer the first approach, less cloned code and, even if the type is in pkg, it shouldn't create problems to the rest of knative, because without modifying the CRD schemas, the field is unusable anyway, so stable APIs won't be affected. Where we want to test it (we can start from subscriptions and triggers first), we just use the experimental feature field validation to enable it and we modify the CRD schema to preserve unknown fields

@n3wscott
Copy link
Contributor

Re: APIGroup please call it Group, this will match with the usage in CRDs.

@slinkydeveloper
Copy link
Contributor Author

Yet another issue with api version here: knative-extensions/eventing-kafka#624

@slinkydeveloper slinkydeveloper removed the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label May 18, 2021
@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented May 18, 2021

I refactored the issue as experimental feature, including an implementation plan. Unless there are any particular objections, I'm going to move forward with the initial implementation and the inclusion process (check the issue description for more details).

@slinkydeveloper slinkydeveloper changed the title Why ref should have any knowledge about api version? [Experimental] KReference.Group to avoid being explicit on the APIVersion May 18, 2021
@slinkydeveloper
Copy link
Contributor Author

The implementation PRs for the inclusion are ready to review:

I'm working on the doc PR right now.

@slinkydeveloper
Copy link
Contributor Author

And here is the doc PR: knative/docs#3713

@vaikas
Copy link
Contributor

vaikas commented Jun 4, 2021

The one thing that's not in the 'todo list' is that now every component that potentially does the resolving will need their RBAC rules updated to include listing/watching of the CRDs so as not to always hit the api server. At least I think that will be necessary, but maybe I'm wrong.

@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented Jun 4, 2021

The one thing that's not in the 'todo list' is that now every component that potentially does the resolving will need their RBAC rules updated to include listing/watching of the CRDs so as not to always hit the api server. At least I think that will be necessary, but maybe I'm wrong.

Yep, but it sounds like we already require it for other reasons https://github.com/knative/eventing/blame/main/config/core/roles/controller-clusterroles.yaml#L129, so no change there. Is that what you are referring to?

But yes in general, every controller executing KReference.ResolveGroup will require the get crd RBAC

@vaikas
Copy link
Contributor

vaikas commented Jun 4, 2021

Right, for Subscription not necessary, but once this rolls out, say for example something like Sources will need this if it's injected. Not sure if it's only necessary if we flip the flag or if it gets injected and will prevent the controller from starting up. Just jotted it down as I was thinking about it :) We'll see.

@slinkydeveloper
Copy link
Contributor Author

Ah I see what you mean... I'm not aware of any "dynamic" way for the controller to request RBAC at runtime (e.g. based on whether that flag is flipped or not), so the only viable solution I know is to request the RBAC for CRD anyway... Do you think this may be a problem?

@vaikas
Copy link
Contributor

vaikas commented Jun 4, 2021

I'm not sure it's necessarily a problem, but what I was wondering is that once there's something in pkg that uses the CRD lister, then any code that depends on it might need to be updated with those permissions. Again, I'm not sure if that's the case, but just jotted it down. Should be easy enough to test on something that doesn't have that permission, bring your branch in and see if anything breaks? :) 🤞 that it just works.

If that is required, then we just need to do that planning work ahead and explain why those permissions are necessary. I think for the medium term (I think probably the right approach longer term is the Discovery?) which then would require some different permissions anyways :)

@slinkydeveloper
Copy link
Contributor Author

but what I was wondering is that once there's something in pkg that uses the CRD lister, then any code that depends on it might need to be updated with those permissions.

I see what you mean, that shouldn't be a problem because, unless you use Kreference.ResolveGroup() directly, everything behaves as usual and validation won't need such permissions, so nothing should be touched by that.

Should be easy enough to test on something that doesn't have that permission, bring your branch in and see if anything breaks? :) crossed_fingers that it just works.

I can give it a spin with eventing-kafka-broker, which uses that kreference code (importing trigger spec) but doesn't require CRD RBAC

@slinkydeveloper
Copy link
Contributor Author

@vaikas
Copy link
Contributor

vaikas commented Jun 7, 2021

Ok cool! Thanks for checking :)

@slinkydeveloper
Copy link
Contributor Author

The feature it's been approved and merged for the inclusion!

@antoineco
Copy link
Contributor

/unassign slinkydeveloper

@devguyio devguyio added the roadmap Issues for linking from the roadmap label Dec 2, 2021
@github-actions
Copy link

github-actions bot commented Mar 3, 2022

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2022
@github-actions github-actions bot closed this as completed Apr 2, 2022
@pierDipi pierDipi reopened this Aug 10, 2022
@pierDipi pierDipi added the triage/accepted Issues which should be fixed (post-triage) label Aug 10, 2022
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. roadmap Issues for linking from the roadmap triage/accepted Issues which should be fixed (post-triage)
Projects
Status: Ready To Work
10 participants