
[WIP] Add API fields for AWS OIDC Provider #9271

Closed (wants to merge 1 commit)

Conversation

@rifelpet (Member) commented Jun 4, 2020:

This PR allows Kops to create the IAM OIDC Provider resource, and the discovery.json and keys.json files necessary to set up IAM Roles for Service Accounts. As discussed in the linked issue, the initial implementation will create the json files and require the user to copy them to their chosen url (the specifics don't matter: it could be a public s3 bucket, a static site, etc.)

The rough workflow will be:

  1. User chooses a url to host the json files and determines the CA thumbprint for that url (see the sketch after this list)
  2. Add thumbprint and url to cluster spec
  3. Run kops update cluster --yes
  4. Copy resulting json files from state store to their url
  5. Update any IAM roles' assume role policies to allow assumption via the new OIDC Provider
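
To make step 1 concrete, here is a minimal Go sketch of one way the CA thumbprint could be determined. The host is a placeholder, and fingerprinting the last certificate in the presented chain is an assumption; AWS expects the SHA-1 thumbprint of the certificate authority serving the issuer url, so this is illustrative rather than something the PR implements:

```go
package main

import (
	"crypto/sha1"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"log"
)

func main() {
	// Hypothetical issuer host; replace with whatever host serves discovery.json.
	host := "s3.amazonaws.com"

	conn, err := tls.Dial("tcp", host+":443", &tls.Config{ServerName: host})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Assumption: fingerprint the last (top-most) certificate in the presented chain.
	// AWS IAM wants the SHA-1 thumbprint of the CA that signs the issuer's certificate.
	chain := conn.ConnectionState().PeerCertificates
	ca := chain[len(chain)-1]
	sum := sha1.Sum(ca.Raw)
	fmt.Println(hex.EncodeToString(sum[:]))
}
```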

TODO for this PR:

  • Create keys.json
  • Pull the service-account-signer keypair down on masters and pass via flags to kube-controller-manager
  • Fix integration test (it runs both terraform and cloudformation, but this feature isn't supported by cloudformation)
  • Add docs
  • Figure out how we can have the Lister and Deleter work in pkg/resources/aws - these are used during cluster teardown. We're only given a cluster name (not the full cluster spec), and oidc providers don't support naming or tagging, so there's no way for us to attribute an oidc provider back to a cluster without knowing an issuer url or thumbprint.

Roadmap for future PRs:

  • Add the admission webhook to bootstrapchannelbuilder.go - Currently there is no public docker image of the webhook. EDIT: images are now available here. There may also be hurdles with the webhook's CA bundle - this would be the first occurrence of a webhook in bootstrapchannelbuilder.go, and we have to figure out how to generate the CA bundle while also making it available to bootstrapchannelbuilder.go, which I think means we can't use fitasks.Keypair. Alternatively, we could use a CSR and have kops-controller approve it.
  • Determine a reasonable way to publicly host the json files. This would simplify the workflow by not having the user copy them to a url themselves and by removing the need to determine the CA thumbprint. (A sketch of what that manual copy looks like today follows below.)
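
Until that lands, the manual copy in step 4 might look roughly like the following AWS SDK sketch. The bucket name, object keys, and public-read ACL are illustrative assumptions, and mapping discovery.json to /.well-known/openid-configuration is also an assumption, not something this PR implements:

```go
package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// Contents of the documents copied out of the kops state store.
var discoveryJSON, keysJSON []byte

// uploadPublicly puts one document into a publicly readable S3 object so that
// it is reachable at the issuer url.
func uploadPublicly(svc *s3.S3, bucket, key string, doc []byte) error {
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket:      aws.String(bucket),
		Key:         aws.String(key),
		Body:        bytes.NewReader(doc),
		ACL:         aws.String("public-read"),
		ContentType: aws.String("application/json"),
	})
	return err
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	// Hypothetical issuer url: https://s3.amazonaws.com/example-oidc-bucket/cluster-foobar
	bucket, prefix := "example-oidc-bucket", "cluster-foobar"
	if err := uploadPublicly(svc, bucket, prefix+"/.well-known/openid-configuration", discoveryJSON); err != nil {
		log.Fatal(err)
	}
	if err := uploadPublicly(svc, bucket, prefix+"/keys.json", keysJSON); err != nil {
		log.Fatal(err)
	}
}
```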

ref: #8264

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 4, 2020
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/api area/provider/aws Issues or PRs related to aws provider approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 4, 2020
@rifelpet rifelpet force-pushed the irsa branch 2 times, most recently from f865161 to e99d89b on June 4, 2020 at 13:35
}
c.AddTask(discoveryFile)

// TODO create keys.json from https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/hack/self-hosted/main.go
A Member commented:
The other thing we can do here is to get them from the apiserver. The big advantage of this is that I understand we're (eventually) planning on rotating the keys in apiserver, and it would be good to coordinate with that!
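
For context, keys.json is essentially the service-account signing public key published as a JWKS document. A minimal stdlib-only sketch of generating it, where the file path, RSA key type, and kid derivation are assumptions and the hack script linked in the TODO above is the reference implementation:

```go
package main

import (
	"crypto/rsa"
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/json"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
)

// jwk and jwks mirror the subset of RFC 7517 that keys.json needs.
type jwk struct {
	Kty string `json:"kty"`
	Alg string `json:"alg"`
	Use string `json:"use"`
	Kid string `json:"kid"`
	N   string `json:"n"`
	E   string `json:"e"`
}

type jwks struct {
	Keys []jwk `json:"keys"`
}

func main() {
	// Assumption: the service-account signing public key is available as a PEM file.
	pemBytes, err := os.ReadFile("service-account.pub")
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		panic("no PEM block found")
	}
	pubAny, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		panic(err)
	}
	pub := pubAny.(*rsa.PublicKey) // assumption: RSA signing key

	// kid derivation is an assumption here (SHA-256 of the DER-encoded key);
	// whatever kops ships must match what kube-apiserver puts in the token header.
	der, _ := x509.MarshalPKIXPublicKey(pub)
	kidSum := sha256.Sum256(der)

	doc := jwks{Keys: []jwk{{
		Kty: "RSA",
		Alg: "RS256",
		Use: "sig",
		Kid: base64.RawURLEncoding.EncodeToString(kidSum[:]),
		N:   base64.RawURLEncoding.EncodeToString(pub.N.Bytes()),
		E:   base64.RawURLEncoding.EncodeToString(big.NewInt(int64(pub.E)).Bytes()),
	}}}
	out, _ := json.MarshalIndent(doc, "", "  ")
	fmt.Println(string(out))
}
```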

if err != nil {
return nil, fmt.Errorf("error getting IAM OIDC Provider: %v", err)
}
// TODO: only delete oidc providers if they're owned by the cluster.
A Member commented:
If the issuer url was 1:1 with the cluster though, I think we could be reasonably sure?

The PR author (@rifelpet) replied:

The issuer url is 1:1 with the cluster, but we only have access to the cluster name here, and the initial plan was to require that the user host the json files themselves at the url, so the url may be totally unrelated to the cluster name even though they are 1:1.

I was imagining a common pattern would be to use an externally-managed public s3 bucket, making the issuer url something like https://s3.amazonaws.com/peters-oidc-provider-bucket/cluster-foobar/. It's unique to the cluster, but if we only have the cluster name I don't think we can guarantee that we've identified a provider to delete unless we document additional requirements like "the url's path must contain the full cluster name".

I proposed some alternative implementations we could explore down the road in #8264. I'm aware of kubernetes/enhancements#1393 and I think it would be good to integrate with; perhaps in this case kops could set up an oidc DNS record within the cluster's hosted zone that points to a dedicated load balancer targeting the new apiserver endpoints. If we can use that as the issuer url, it not only lets us identify the issuer purely from the cluster name but also lets us compute both the issuer url and the CA thumbprint ourselves, so the user doesn't need to provide either. This setup would require an internet-accessible load balancer and the cluster using a public hosted zone, but I think that's a reasonable requirement. Is this what you had in mind?

return nil, fmt.Errorf("error getting IAM OIDC Provider: %v", err)
}
// TODO: only delete oidc providers if they're owned by the cluster.
// We need to figure out how we can determine that given only a cluster name.
A Member commented:
If the issuer url was 1:1 with the cluster though, I think we could be reasonably sure?

A Member replied:
I think you have to match on the URL, possibly in combination with the ClientIDs.
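
For illustration, URL-based matching in pkg/resources/aws could look roughly like the sketch below with the AWS SDK. The suffix convention tying the issuer url to the cluster name is exactly the extra requirement discussed above, so this is an assumption rather than something the PR implements:

```go
package resources

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/iam"
)

// findClusterOIDCProviders returns the ARNs of OIDC providers whose issuer URL
// ends with the cluster name. This only works if we document a convention like
// "the url's path must contain the full cluster name".
func findClusterOIDCProviders(svc *iam.IAM, clusterName string) ([]string, error) {
	list, err := svc.ListOpenIDConnectProviders(&iam.ListOpenIDConnectProvidersInput{})
	if err != nil {
		return nil, fmt.Errorf("error listing IAM OIDC Providers: %v", err)
	}
	var arns []string
	for _, p := range list.OpenIDConnectProviderList {
		detail, err := svc.GetOpenIDConnectProvider(&iam.GetOpenIDConnectProviderInput{
			OpenIDConnectProviderArn: p.Arn,
		})
		if err != nil {
			return nil, fmt.Errorf("error getting IAM OIDC Provider: %v", err)
		}
		// detail.ClientIDList could additionally be checked here, as suggested above.
		url := strings.TrimSuffix(aws.StringValue(detail.Url), "/")
		if strings.HasSuffix(url, clusterName) {
			arns = append(arns, aws.StringValue(p.Arn))
		}
	}
	return arns, nil
}
```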

@johngmyers (Member) left a review comment:
A kops get subcommand to get the documents out of the state store would be helpful.

@@ -191,6 +191,8 @@ type ClusterSpec struct {
SysctlParameters []string `json:"sysctlParameters,omitempty"`
// RollingUpdate defines the default rolling-update settings for instance groups
RollingUpdate *RollingUpdate `json:"rollingUpdate,omitempty"`
// ServiceOIDCProvider defines the OIDC provider setup for the cluster (AWS only)
ServiceOIDCProvider *ServiceOIDCProviderSpec `json:"serviceOIDCProvider,omitempty"`
A Member commented:
Perhaps put this inside CloudConfiguration? I'm wary of adding new options that are specific to a cloud provider into the base Cluster struct.

I'm thinking for the v1beta1 API each cloud provider would have a different struct, like the CNI providers do. The flip side is that a second cloud provider could have the concept of a service OIDC provider, but one would hope it wouldn't have such an annoyingly difficult configuration procedure as AWS does.

A Member replied:
Yeah - naming for this is going to be tough.

  • It's not technically specific to AWS; if we publish the JWKS doc from e.g. an on-metal cluster we should be able to authenticate to AWS.
  • Similarly GCE has some support for this now also, in the form of GKE workload identity, so it's not just to AWS.
  • It's actually even broader than this; we can also authenticate service accounts across clusters etc.

The word that keeps sticking in my mind is "discovery", and this is specifically "identity discovery". I think it's also possible to do cross-cluster discovery using that same bucket (and those files wouldn't necessarily have to be public, either)

Maybe something like

discovery:
  store: s3://some-other-bucket/clusters
  identity: true

I think technically we could even reuse the same S3 bucket, I can't imagine it's worth it from an isolation point of view though :-)

A Member replied:
I don't know how things other than AWS consume OIDC, but I would hope the IssuerCAThumbprints field wouldn't be copied by other cloud providers. It would be a lot more sane if one could just configure AWS with the identity document directly, rather than faffing about with serving it on the public Internet. If AWS ever changes the CA they use for their S3 buckets, everyone's IRSA will likely break immediately.

A Member replied:
It's a good point on the rotations - I presume this means that S3 effectively can't change CAs quickly, though I don't know what happens in 2025 when the current cert expires, presumably some pre-publication. (I think the cert fingerprint includes the expiration date?)

@rifelpet and I chatted on Friday, and we had an idea about an approach that might be a nice way to get this merged: we define a feature flag (or two) that effectively does all this automagically in a (very) limited set of circumstances - currently you must use a public ELB with a real DNS name, and likely other constraints we haven't discovered yet. The feature flag enables the apiserver with anonymous access turned on (which normally we don't like to do in kops, particularly for public ELBs), and we also grant the anonymous-access user permission to read the JWKS documents directly.

We do need to find a way to avoid encouraging this as a real scenario (e.g. I don't want to preclude private ELBs). But it means we don't need an API surface in this PR, and in the PR I added we don't need kops-controller to publish to an S3 bucket. That PR is #9352.

We can then discuss, in a more leisurely way, e.g. whether to have a public bucket, how to name the clusters, and what the API should be for enabling AWS integration. But we don't block working on e.g. per-pod roles or figuring out different ways to publish the JWKS doc (e.g. publish from the kops CLI vs publish from kops-controller).

This also means that upstream (k/k) can start to see some real test coverage of the feature, and get it to GA.
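
To make the "grant the anonymous-access user permission to read the JWKS documents" part concrete, here is a rough sketch of the kind of RBAC involved. The object names, paths, and the system:anonymous subject follow the upstream ServiceAccountIssuerDiscovery defaults and are assumptions here, not necessarily what #9352 does:

```go
package model

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical RBAC allowing unauthenticated reads of the discovery endpoints.
var issuerDiscoveryRole = rbacv1.ClusterRole{
	ObjectMeta: metav1.ObjectMeta{Name: "kops:service-account-issuer-discovery"},
	Rules: []rbacv1.PolicyRule{{
		// Paths kube-apiserver serves for service account issuer discovery.
		NonResourceURLs: []string{"/.well-known/openid-configuration", "/openid/v1/jwks"},
		Verbs:           []string{"get"},
	}},
}

var issuerDiscoveryBinding = rbacv1.ClusterRoleBinding{
	ObjectMeta: metav1.ObjectMeta{Name: "kops:anonymous-service-account-issuer-discovery"},
	RoleRef: rbacv1.RoleRef{
		APIGroup: "rbac.authorization.k8s.io",
		Kind:     "ClusterRole",
		Name:     "kops:service-account-issuer-discovery",
	},
	Subjects: []rbacv1.Subject{{
		APIGroup: "rbac.authorization.k8s.io",
		Kind:     "User",
		Name:     "system:anonymous",
	}},
}
```

The two non-resource paths are the ones upstream kube-apiserver exposes once service account issuer discovery is enabled; the package and object names are arbitrary.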

@k8s-ci-robot k8s-ci-robot added area/documentation area/nodeup size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 7, 2020
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 7, 2020
@justinsb (Member) commented:

FYI I just sent #9341 which extends this PR, adding the idea of automatically publishing the JWKS document to a bucket. (I expect it to be a different bucket, because the JWKS documents themselves need to be publicly readable)

This has clarified things a bit in my mind; there really are two things going on here:

  1. Publishing the JWKS document publicly, which enables consumers to trust serviceaccounts
  2. Registering the cluster/jwks with consumers, like AWS, so that they recognize k8s service accounts.

I do think we want to do both, but it can make sense to do the second even when not running on AWS, or to register more sources (GCP, Azure, maybe other services in future like github?)

In #9341 I defaulted the AWS registration, BTW (when a feature flag UsePodIAM is passed). We might therefore be able to get away without any API at all in this PR (though we do want it in the end, we might be able to separate the discussions).

@johngmyers (Member) commented:

I think creating the JWKS document and publishing it publicly are separable. While publishing to an S3 bucket is convenient (and, incidentally, is what we do) there are other possible mechanisms.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 17, 2020
@rifelpet rifelpet changed the title [WIP] Add initial support for IAM Roles for Service Accounts [WIP] Add API fields for AWS OIDC Provider Jun 17, 2020
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2020
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 14, 2020
@k8s-ci-robot (Contributor) commented:

@rifelpet: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot (Contributor) commented:

@rifelpet: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Rerun command
pull-kops-bazel-test | c754e40 | /test pull-kops-bazel-test
pull-kops-verify-hashes | c754e40 | /test pull-kops-verify-hashes
pull-kops-e2e-k8s-containerd | c754e40 | /test pull-kops-e2e-k8s-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rifelpet (author) commented Feb 7, 2021:

to be replaced by #10756

@rifelpet rifelpet closed this Feb 7, 2021
@rifelpet rifelpet deleted the irsa branch May 5, 2021 13:44