Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Flux Multi-Tenancy Mode #2093

Closed
wants to merge 2 commits into from
Closed

[RFC] Flux Multi-Tenancy Mode #2093

wants to merge 2 commits into from

Conversation

stefanprodan
Copy link
Member

@stefanprodan stefanprodan commented Nov 16, 2021

For multi-tenant environments, we want to offer an easy way of configuring Flux to enforce tenant isolation.

This RFC describes how that can be achieved with a single flag added to flux bootstrap.

Supersedes #582

@stefanprodan stefanprodan added the area/rfc Feature request proposals in the RFC format label Nov 16, 2021
@stefanprodan stefanprodan force-pushed the rfc-0003 branch 17 times, most recently from e3ec283 to 00dcae3 Compare November 18, 2021 16:12
@scottrigby
Copy link
Member

Note to document this in fluxcd/website#598

rfcs/0003-multi-tenancy-mode/README.md Show resolved Hide resolved
the service account specified in the Flux custom resources (`Kustomizations`, `HelmReleases`).
- When no service account name is specified in a Flux custom resource,
a default will be used e.g. `system:serviceaccount:<tenant-namespace>:flux`.
- When a Flux custom resource (`Kustomizations`, `HelmReleases`, `ImagePolicies`, `ImageUpdateAutomations`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify the exact permissions that we are giving to the Flux controllers in this mode? I'm assuming GET/LIST access to .fluxcd.io CRDS, secrets, etc. across all namespaces?

Copy link
Member Author

@stefanprodan stefanprodan Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The controller cluster role is defined here: https://github.com/fluxcd/flux2/blob/main/manifests/rbac/controller.yaml

Should I paste this in RFC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be good to add into the RFC for everyone's visibility

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that people read Flux docs not just this RFC. The RBAC and impersonation is covered here https://fluxcd.io/docs/security/#controller-permissions.

Copy link
Contributor

@jonathan-innis jonathan-innis Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the implementation of this security proposal, we can drop the cluster-reconciler in this case for the kustomize-controller and helm-controller, though. I'm thinking this would be good to call out vs. the status-quo where they are given cluster-admin as default

Copy link
Member Author

@stefanprodan stefanprodan Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can’t drop the cluster role binding to cluster-admin, admins can switch between single-tenant and multi-tenant at any time. The controllers use the cluster-admin to impersonate the tenant account.

From the docs I linked above:

However in a soft multi-tenancy setup, Flux does not reconcile a tenant’s repo under the cluster-admin role. Instead you specify a different service account in your manifest, and the Flux controllers will use the Kubernetes Impersonation API under cluster-admin to impersonate that service account.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this mode, cluster-reconciler could bind to a ClusterRole that only provides the impersonation powers it requires. (i.e. allowing a Platform administrator to further limit the ClusterRoleBinding to specific accounts it can run as)

containers:
- name: manager
args:
- --default-service-account=flux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this being configurable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise. This provides a nice way to adopt ServiceAccounts that are already set up.

interval: 10m0s
path: ./
prune: true
serviceAccountName: flux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this proposed method, the kustomize-controller or helm-controller can reconcile any service account within the namespace of the CR. With this in mind, a user needs to be hyper-aware that any service account in that namespace could be used for reconciliation and must not role-bind to give any service account any kind of elevated privilege that it did not intend for the flux reconcilers (such as giving access to create HelmReleases across the cluster, which essentially opens up Pandora's box). Will this be noted somewhere specifically/is there a way to ensure that flux can only reconcile with service accounts with .metadata.labels intended specifically for flux?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't enforce any limitations for the service accounts that are provisioned by platform admins because they may chose to allow a tenant to own multiple namespaces, so they can bind a SA to all those namespaces. A tenant can't create cluster role bindings so it's not possible for them to evade their boundary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this same concern apply to e.g. a Deployment which accepts any arbitrary service account as a configuration value? Which I assume, isn't a problem. Raising the question why it would be different for a Flux Custom Resource?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this same concern apply to e.g. a Deployment which accepts any arbitrary service account as a configuration value?

These are the same technically -- a Deployment can run a container that uses kubectl to cause mischief. But the opportunity for mistakes is higher when Flux is introduced. With only Deployments and other "normal" objects you would not usually consider creating a service account with wide-ranging permissions. When you're using Flux, you have much more (RBAC) to think about.

Copy link
Member Author

@stefanprodan stefanprodan Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With only Deployments and other "normal" objects you would not usually consider creating a service account with wide-ranging permissions.

Depend on what you consider to be "normal" here, every single Kubernetes addon/controller needs wide-ranging permissions, are these abnormal?

Copy link
Member

@hiddeco hiddeco Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my point is more that if you make extensive use of RBAC, you should be aware of what this entails (for your tenants).

The recommendation of not sharing a service account between multiple resources, or using any arbitrary account that is available to you, does not change for a Deployment or a Flux resource.

Copy link
Member

@hiddeco hiddeco Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying here that this means we should not highlight or be upfront about it, but rather that I am not really enthusiastic about adding e.g. label selectors for service accounts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really enthusiastic about adding e.g. label selectors for service accounts.

Right -- requiring labels is not a protection, since any process that can run with the service account can do mischief, not just the Flux controllers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main challenge on the model in which the tenant's service account is created in the tenant's namespace is that anyone with pod create permissions in the tenant's namespace would be able to privesc to whatever permissions the flux tenant service account is running as - current example is cluster-admin within that namespace.

In this model platform admins may have to lean on admission controllers to block that path to privilege escalation.

Copy link
Member

@pjbgf pjbgf Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some recommendations on how the tenant can be structured to protect their service accounts in the RFC004 (xref: #2086 (comment)) without requiring admission controllers.

Copy link
Contributor

@jonathan-innis jonathan-innis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from Microsoft 🚀

Comment on lines 1 to 15
# RFC-0003 Flux Multi-Tenancy Mode

## Summary

For multi-tenant environments, we want to offer an easy way of configuring Flux to enforce tenant isolation
(as defined by the Soft Multi-Tenancy model from RFC-0001).

When running in the multi-tenant mode, Flux will lock down access to sources (as defined by RFC-0002),
and will use the tenant service account instead of defaulting to `cluster-admin`.
Copy link
Member

@pjbgf pjbgf Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be beneficial beyond the multi-tenancy use-case, allowing Platform admins to run flux in a least-privileged mode, enforcing the permissions the controllers operate under - like in the user story 1. Instead of multi-tenants, a cluster could simply have different trust boundaries that require specific permission levels.

The change would impact most terminology, as the scope would be the same. Meaning, instead of:

flux bootstrap --security-profile=multi-tenant

We could have something more multi-tenancy-agnostic (or any other relevant naming):

flux bootstrap --security-profile=least-privileged

the service account specified in the Flux custom resources (`Kustomizations`, `HelmReleases`).
- When no service account name is specified in a Flux custom resource,
a default will be used e.g. `system:serviceaccount:<tenant-namespace>:flux`.
- When a Flux custom resource (`Kustomizations`, `HelmReleases`, `ImagePolicies`, `ImageUpdateAutomations`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this mode, cluster-reconciler could bind to a ClusterRole that only provides the impersonation powers it requires. (i.e. allowing a Platform administrator to further limit the ClusterRoleBinding to specific accounts it can run as)

interval: 10m0s
path: ./
prune: true
serviceAccountName: flux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main challenge on the model in which the tenant's service account is created in the tenant's namespace is that anyone with pod create permissions in the tenant's namespace would be able to privesc to whatever permissions the flux tenant service account is running as - current example is cluster-admin within that namespace.

In this model platform admins may have to lean on admission controllers to block that path to privilege escalation.

Copy link
Contributor

@schrej schrej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One policy to get rid of 😄
+1 from DTAG

Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
@stefanprodan
Copy link
Member Author

Closing this with the same resolution resolution as #2092

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/rfc Feature request proposals in the RFC format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants