
Support Azure AD Workload Identity as an identity pod #2487

Closed
JorTurFer opened this issue Jan 17, 2022 · 20 comments
Labels: azure (All issues concerning integration with Azure), feature (All issues for new features that have been committed to)

Comments

@JorTurFer
Member

JorTurFer commented Jan 17, 2022

Proposal

Azure AD Workload Identity is the next iteration of Azure AD Pod Identity that enables Kubernetes applications to access Azure cloud resources securely with Azure Active Directory based on annotated service accounts.
https://github.com/Azure/azure-workload-identity

Use-Case

This is the next iteration of AAD Pod Identity, so it applies in every case where AAD Pod Identity applies today.

Anything else?

It's possible that the current implementation already supports it, but if it does, we should update the docs.

@JorTurFer added the needs-discussion and feature-request labels Jan 17, 2022
@JorTurFer changed the title from "Support Azure AD Workload Identity as a identity pod" to "Support Azure AD Workload Identity as an identity pod" Jan 17, 2022
@tomkerkhove
Member

We currently rely on pod identity and not AD workload identity, but we should support the new approach next to the current one indeed (same auth provider though, but config option + Helm support)

@tomkerkhove added the azure and feature labels and removed the feature-request label Jan 17, 2022
@v-shenoy
Contributor

v-shenoy commented Apr 12, 2022

I am willing to take this up. Have been reading a bit about workload identity, can try to understand what changes are required (if any) and come up with a design.

@tomkerkhove
Member

That would be great, thanks for coming up with a design!

@v-shenoy
Contributor

v-shenoy commented Apr 12, 2022

In the scalers, we currently use NewMSIConfig() from the github.com/Azure/go-autorest/autorest/azure/auth package to build the config, which is then used to get the authorizer. The authorizer is then used when making API calls to Azure through the SDK / REST. Example.

Looking at the code here, used in the quick-start guide provided by Azure AD Workload Identity, we can use the MSAL package to fetch the AAD token and then NewBearerAuthorizer() to get the authorizer. These changes will have to be made for each of the scalers.
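
To make that concrete, here is a rough sketch of the per-scaler flow (hedged: this assumes the MSAL Go confidential client API as used in the workload identity quick start, whose signatures have changed across versions, plus the environment variables injected by the azure-workload-identity webhook; it is not the final KEDA implementation):

// Sketch only: acquire an AAD token via MSAL using the projected service account
// token, then wrap it in an autorest bearer authorizer for the existing SDK calls.
package azure

import (
	"context"
	"fmt"
	"os"

	"github.com/Azure/go-autorest/autorest"
	"github.com/AzureAD/microsoft-authentication-library-for-go/apps/confidential"
)

// adWorkloadIdentityToken satisfies adal.OAuthTokenProvider, which is what
// autorest.NewBearerAuthorizer expects.
type adWorkloadIdentityToken struct{ token string }

func (t adWorkloadIdentityToken) OAuthToken() string { return t.token }

func getWorkloadIdentityAuthorizer(ctx context.Context, resource string) (autorest.Authorizer, error) {
	// Injected by the azure-workload-identity mutating webhook.
	clientID := os.Getenv("AZURE_CLIENT_ID")
	tenantID := os.Getenv("AZURE_TENANT_ID")
	authorityHost := os.Getenv("AZURE_AUTHORITY_HOST")
	tokenFilePath := os.Getenv("AZURE_FEDERATED_TOKEN_FILE")

	// The projected service account token is used as the client assertion.
	signedAssertion, err := os.ReadFile(tokenFilePath)
	if err != nil {
		return nil, err
	}

	// NewCredFromAssertion is what the quick-start sample used at the time;
	// newer MSAL Go releases expose a callback-based constructor instead.
	cred, err := confidential.NewCredFromAssertion(string(signedAssertion))
	if err != nil {
		return nil, err
	}

	client, err := confidential.New(clientID, cred,
		confidential.WithAuthority(fmt.Sprintf("%s%s/oauth2/token", authorityHost, tenantID)))
	if err != nil {
		return nil, err
	}

	result, err := client.AcquireTokenByCredential(ctx, []string{resource + "/.default"})
	if err != nil {
		return nil, err
	}

	return autorest.NewBearerAuthorizer(adWorkloadIdentityToken{token: result.AccessToken}), nil
}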

Now, to allow the user to specify that they want to use Azure AD Workload Identity, I was thinking we add another type of pod identity provider:

podIdentity:
    provider: azure-workload

As for Helm support, the pods need a serviceAccountName in their spec, so we will need a CLI flag for that which gets added to the keda-operator spec (optional, with the default being no service account). Workload Identity also supports two annotations for the pods: azure.workload.identity/service-account-token-expiration and azure.workload.identity/skip-containers.

The first one, as the name suggests, is the duration after which the service account token expires. I feel that this should be up to the user, so we should have another CLI flag for it (optional, with a default value).

The second one allows certain containers within the pod to be skipped when the projected service account token volume is injected. As we only have one container within the keda-operator pod, and the customer is explicitly configuring it to use workload identity, it doesn't make sense to expose this. (Both annotations are illustrated in the snippet below.)
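
For reference, on a pod those two annotations would look roughly like this (a sketch; the values are placeholders rather than proposed defaults):

metadata:
  annotations:
    azure.workload.identity/service-account-token-expiration: "3600"   # token lifetime in seconds
    azure.workload.identity/skip-containers: some-sidecar              # containers that should not get the projected token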

To summarize the changes required per my understanding -

  1. Changes in the Azure scalers and key vault to use the new MSAL library to fetch the AAD token and get the authorizer.
  2. azure-workload as a new type of pod identity provider in TriggerAuthentication
  3. Support for CLI flags --podIdentity.azureWorkload.serviceAccountName and --podIdentity.azureWorkload.serviceAccountTokenExpiration when using Helm

PS - Key Vault currently does not support pod identity, only Active Directory credentials. I was thinking I could implement support for both together (once we finalize the design and I start the implementation).

So, what do you guys think? @tomkerkhove @JorTurFer @zroubalik

@v-shenoy
Contributor

v-shenoy commented Apr 13, 2022

Actually, I just realized the keda-operator already has its own service account, and we already allow setting annotations on it with Helm. We also need to add a label on the service account for it to be tracked by the webhook. So we need to discuss the best way to support this with Helm.

Refer to this.

@v-shenoy
Contributor

v-shenoy commented Apr 13, 2022

Once the code changes have been done for the Azure scalers, the user flow would be as follows:

  1. Configure the Kubernetes cluster to support Workload Identity.
  2. Create an AAD application and grant it permissions on the required resource (such as Service Bus, Key Vault, Data Explorer, etc.).
  3. Deploy KEDA using the new flags (see above). This would annotate the keda-operator service account with the required annotations and labels. The mutating webhook will then insert the appropriate environment variables into the pods using the service account, i.e. the keda-operator and keda-metrics-apiserver pods.
  4. Have the AAD application trust tokens issued by the service account via a federated credential (through the az CLI, the azwi CLI, or the Azure Portal; see the sketch after this list).
  5. Deploy a TriggerAuthentication with podIdentity.provider: azure-workload.
  6. Deploy a ScaledObject with an Azure scaler using this trigger authentication.
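
For step 4, whichever tool is used, the federated credential on the AAD application essentially ties the cluster's OIDC issuer to the operator's service account. A minimal sketch of the fields involved (shown as YAML for readability; the namespace and service account name are placeholders, not final chart defaults):

name: keda-federated-credential
issuer: <cluster-oidc-issuer-url>                      # the cluster's OIDC issuer URL
subject: system:serviceaccount:keda:keda-operator      # system:serviceaccount:<namespace>:<service account name>
audiences:
  - api://AzureADTokenExchange                         # default audience expected by workload identity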

@v-shenoy
Contributor

v-shenoy commented Apr 13, 2022

#2907 is a draft PR with workload identity enabled for Service Bus, to showcase the code changes. The exact code change will differ from scaler to scaler, but the general idea is to fetch the AAD token with the new MSAL library, using the service account token.

Tested this with the following definitions, and KEDA is able to fetch the metrics and scale.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-azwi-sb-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: keda-azwi-sb
  template:
    metadata:
      labels:
        app: keda-azwi-sb
    spec:
      containers:
      - name: nginx
        image: nginx:1.16.1
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-azwi-sb-ta
  namespace: default
spec:
  podIdentity:
    provider: azure-workload
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-azwi-sb-so
  namespace: default
  labels:
    deploymentName: keda-azwi-sb-deployment
spec:
  scaleTargetRef:
    name: keda-azwi-sb-deployment
  pollingInterval: 5
  cooldownPeriod: 10
  minReplicaCount: 0
  maxReplicaCount: 1
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: kedaazwisb
      queueName: sb-queue
    authenticationRef:
      name: keda-azwi-sb-ta
---

I had to manually annotate the service accounts and redeploy the operator and metrics server for the mutating webhook to pick up the changes and insert the env variables. But this is only until we add Helm support.

@tomkerkhove
Member

I will have a look next week, thanks!

@tomkerkhove
Member

@kedacore/keda-maintainers Do we introduce a new provider similar to the above suggestion:

podIdentity:
    provider: azure-workload

Or do we stick with provider: azure and have another field that tells KEDA which mechanism to use?

Although the first option follows our current AWS approach, which might be ideal:

provider: none | azure | gcp | aws-eks | aws-kiam

@JorTurFer
Member Author

I think that azure-workload is good enough. As you said, for AWS we already have two. We don't have so many pod identity providers that it would be a problem.
If in the future MSFT removes AAD Pod Identity, we can rename it.

@tomkerkhove
Member

Agreed. I don't think we have to rename it later on, though.

@v-shenoy
Contributor

Any comments on the Helm support? As I mentioned, the service account used by the KEDA operator would require some labels and annotations.

I was thinking of the following flags.

  1. --podIdentity.azureWorkload.enabled=true
    The default would obviously be false. When this is set to true, we would add the label azure.workload.identity/use=true to the operator service account.
  2. --podIdentity.azureWorkload.clientId=<their-client-id>
    The value would be used in the azure.workload.identity/client-id annotation on the service account. This would be required when --podIdentity.azureWorkload.enabled is set to true.
  3. --podIdentity.azureWorkload.tenantId=<their-tenant-id>
    The value would be used in the azure.workload.identity/tenant-id annotation on the service account. This would be optional, as workload identity falls back to a default value in the config map set up when installing the webhook.
  4. --podIdentity.azureWorkload.tokenExpiration=<their-duration>
    The value would be used for the azure.workload.identity/service-account-token-expiration annotation on the service account. This would also be optional, as workload identity uses a default of 3600 seconds.

When serviceAccount.create is set to true, the service account that gets created with serviceAccount.name would be annotated. If serviceAccount.create is set to false, we assume that the user has created the service account with the required labels and annotations, and only set the appropriate service account on the operator and metrics-api-server pods.
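
For illustration, with those flags set, the chart would end up rendering an operator service account along these lines (a sketch based on the flag-to-annotation mapping above; names and values are placeholders, not the final chart output):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-operator
  namespace: keda
  labels:
    azure.workload.identity/use: "true"                                # tells the webhook to track this service account
  annotations:
    azure.workload.identity/client-id: <their-client-id>
    azure.workload.identity/tenant-id: <their-tenant-id>               # optional
    azure.workload.identity/service-account-token-expiration: "3600"   # optional, seconds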

@tomkerkhove
Member

I like the above suggestion, thanks for the proposal!

> When serviceAccount.create is set to true, the service account that gets created with serviceAccount.name would be annotated. If serviceAccount.create is set to false, we assume that the user has created the service account with the required labels and annotations, and only set the appropriate service account on the operator and metrics-api-server pods.

This is the only concern I have, though: if they don't create it, then shouldn't we create it? If not, do you want to assume serviceAccount.name is the one they want to use?

@v-shenoy
Contributor

v-shenoy commented Apr 22, 2022

> I like the above suggestion, thanks for the proposal!
>
> > When serviceAccount.create is set to true, the service account that gets created with serviceAccount.name would be annotated. If serviceAccount.create is set to false, we assume that the user has created the service account with the required labels and annotations, and only set the appropriate service account on the operator and metrics-api-server pods.
>
> This is the only concern I have, though: if they don't create it, then shouldn't we create it? If not, do you want to assume serviceAccount.name is the one they want to use?

Yes, you are right. We assume that the serviceAccount.name is the one that they want to use, and that they have created it.

@raorugan

> I am willing to take this up. Have been reading a bit about workload identity, can try to understand what changes are required (if any) and come up with a design.

@v-shenoy reopened this Apr 25, 2022
@tomkerkhove
Member

@raorugan I'm not sure I get your comment, would you mind elaborating on it please?

@tomkerkhove
Member

Thanks a ton, this is a major addition - Thank you @v-shenoy!

@pinkfloydx33

Is there any ETA on when this will be available?

It's already exposed in the Helm chart, which misled me into believing it was generally available (shame on me for not checking the release notes before getting excited and trying to use it).

@tomkerkhove
Member

Soon - see https://github.com/kedacore/keda/blob/main/ROADMAP.md#upcoming-release-cycles

We merged the Helm PR too soon, unfortunately, but didn't revert it, so sorry for the confusion.
As the release is happening soon, there's no need to remove and re-add it, though.
