
CephFS keyring requires nonsensically enormous and insecure privileges to work #4677

Closed
benapetr opened this issue Jun 12, 2024 · 6 comments
Labels
component/cephfs (Issues related to CephFS) · dependency/ceph (depends on core Ceph functionality) · wontfix (This will not be worked on)

Comments

@benapetr

benapetr commented Jun 12, 2024

Describe the bug

Right now (I tried many combinations), the smallest caps that work with the CephFS storage class are these:

      - mon: 'allow r'
      - osd: 'allow rw tag cephfs metadata=fs_k8s, allow rw tag cephfs data=fs_k8s'
      - mds: 'allow r fsname=fs_k8s path=/volumes, allow rws fsname=fs_k8s path=/volumes/k8s_pb'
      - mgr: 'allow rw'

That is with a dedicated FS namespace "fs_k8s", which is, however, intended to be shared by multiple separate clusters.
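
For reference, a keyring with exactly these caps can be created roughly like this (the client name is just an example):

ceph auth get-or-create client.k8s-csi \
  mon 'allow r' \
  osd 'allow rw tag cephfs metadata=fs_k8s, allow rw tag cephfs data=fs_k8s' \
  mds 'allow r fsname=fs_k8s path=/volumes, allow rws fsname=fs_k8s path=/volumes/k8s_pb' \
  mgr 'allow rw'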

Removing or reducing any part of this in any way results in errors like:

  Warning  ProvisioningFailed    2m2s (x13 over 16m)   cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-756d7bb54f-z5pr7_eda038b5-5ace-4e38-940b-68bfe6c76e31  failed to provision volume with StorageClass "ceph-cephfs-sc": rpc error: code = Internal desc = rados: ret=-1, Operation not permitted

At a low level, those permissions grant:

  • Full MDS read access to the entire /volumes
  • Full access to MGR, granting the ability to manipulate ANY FS namespace on the cluster and to access many other "admin-only" features
  • Full low-level (rados) access to all data pools of the given FS namespace (we have a dedicated OSD pool for each k8s cluster, but this keyring gives access to all of them). And, equally bad, full unrestricted access to the metadata pool (which seems completely unnecessary), so this keyring is technically capable of reading or modifying the data of any other cluster or FS namespace user.

This is an enormous security hole that makes isolation within the same FS namespace impossible. The only way to work around it is to install a dedicated Ceph cluster for each CephFS CSI consumer.

You can also create a dedicated FS namespace with its own MDS, but that still doesn't prevent the CSI keyring from abusing the MGR rw caps.

Why are such enormous privileges needed? It's perfectly possible to work with CephFS without any access to the metadata pool (not even read-only is needed), as only the MDS is supposed to access it. RW OSD access is only needed for the data pools used by the folders that the cluster's subvolume group is mapped to; there is no need to include all of them.
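
For comparison, a mount-only client (no CSI involved) can be created with something like the following; as far as I know, ceph fs authorize generates caps scoped to a single path and to the data pools only, with no mgr and no metadata-pool access (client name and path are just examples):

ceph fs authorize fs_k8s client.plain-mount /volumes/k8s_pb rw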

The MGR rw caps are probably needed to access the MGR API for subvolume management, but most of those operations can be handled in alternative ways, such as .snap folders for snapshot creation.
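
For example, once a subvolume is mounted, a snapshot can be created or removed purely through the filesystem, with no MGR access at all (mount path and snapshot name are just examples; this only needs the s flag in the mds caps, which the keyring above already has):

mkdir /mnt/pvc/.snap/my-snapshot    # create a snapshot
rmdir /mnt/pvc/.snap/my-snapshot    # remove it again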

In short, the unnecessary permissions are:

  • mgr: no need for rw at all
  • mds: no need for r on the entire /volumes
  • osd: no need for any access to the metadata pool or to unrelated data pools of the FS namespace

This is a big security obstacle if you want to create a secure environment.

Environment details

  • Image/version of Ceph CSI driver : 3.8.1
  • Helm chart version :
  • Kernel version : 5.15.0-206.153.7.1
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd): fuse
  • Kubernetes cluster version : v1.25.16
  • Ceph cluster version : 18.2.2

Steps to reproduce

Steps to reproduce the behavior:

Try to create a keyring that is restricted to a specific data pool only, with no access to the metadata pool or the mgr. CephFS is mountable and usable just fine with such a keyring, but the CephFS storage class is unusable (permission denied for everything).

Actual results

Permission denied errors unless the keyring has almost admin-like caps.

Expected behavior

The storage class should not require admin-like caps to work with CephFS. Regular restricted caps should be enough.

Logs

  Normal   Provisioning          2m2s (x13 over 16m)   cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-756d7bb54f-z5pr7_eda038b5-5ace-4e38-940b-68bfe6c76e31  External provisioner is provisioning volume for claim "monitoring-system/prometheus-prometheus-prometheus-db-prometheus-prometheus-prometheus-0"
  Warning  ProvisioningFailed    2m2s (x13 over 16m)   cephfs.csi.ceph.com_ceph-csi-cephfs-provisioner-756d7bb54f-z5pr7_eda038b5-5ace-4e38-940b-68bfe6c76e31  failed to provision volume with StorageClass "ceph-cephfs-sc": rpc error: code = Internal desc = rados: ret=-1, Operation not permitted

Additional context

This was already discussed in #1818 (comment)

@nixpanic nixpanic added component/cephfs Issues related to CephFS dependency/ceph depends on core Ceph functionality labels Jun 13, 2024
@nixpanic
Member

Hi @benapetr,

In addition to the permissions needed to work with CephFS, Ceph-CSI needs to store additional metadata for mapping (CSI) volume-handles to CephFS details. This metadata is stored directly in RADOS OMAPs, which should explain the need for the extra permissions.
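
For reference, this OMAP metadata can be inspected with the rados CLI; the pool, namespace and object names below are only examples and depend on the deployment:

rados -p fs_k8s-metadata --namespace csi ls
rados -p fs_k8s-metadata --namespace csi listomapvals csi.volumes.default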

If there is a reduced permission set that allows working with CephFS and RADOS, we would obviously appreciate guidance on dropping the unneeded capabilities.

Details about the required capabilities are documented in docs/capabilities.md.

@benapetr
Author

So does that mean that the only safe and truly isolated way to allow multiple k8s clusters to use CephFS is to build a dedicated Ceph cluster for each k8s cluster? That is indeed not very efficient.

@zerotens
Contributor

zerotens commented Jul 6, 2024

@benapetr

So does that mean that the only safe and truly isolated way to allow multiple k8s clusters to use CephFS is to build a dedicated Ceph cluster for each k8s cluster? That is indeed not very efficient.

Multitenancy for each k8s cluster on a single Ceph filesystem will be possible with PR #4652.
I'm still working on it, and as soon as it gets merged I can finish it.

@nixpanic

About the mgr capabilities of ceph-csi:

info, err := fsa.SubVolumeInfo(s.FsName, s.SubvolumeGroup, s.VolID)

This calls into go-ceph:
https://github.com/ceph/go-ceph/blob/1046b034a1f618f67acd3c6523482917e27c7113/cephfs/admin/subvolume.go#L270-L275
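
On the CLI side this should map to the fs subvolume info mgr command (the volume, subvolume and group names below are just examples):

ceph fs subvolume info k8s-fs csi-vol-0123 --group_name <subvolumegroup>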

And they can be limited per tenant via Ceph capabilities of the form allow command ... with group_name prefix '<tenant-subvolumegroup>'.

These are hopefully all the commands ceph-csi uses internally, based on the source code:

allow command 'fs subvolume resize' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume rm' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume create' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume snapshot create' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume snapshot rm' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume snapshot clone' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume snapshot metadata set' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume snapshot metadata rm' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume metadata set' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume metadata rm' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume getpath' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume ls' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>',
allow command 'fs subvolume info' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs subvolume snapshot info' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-',
allow command 'fs clone status' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>',
allow command 'fs volume ls',
allow command 'fs dump',
allow command 'fs ls'
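
A rough sketch of how this allow-list could be applied (the client name is an example, and only the first and last entries of the list are shown; the remaining allow command entries go into the same mgr string):

# caution: 'ceph auth caps' replaces the whole capability set of the client,
# so the existing mon/osd/mds caps must be repeated in the same command
ceph auth caps client.csi-tenant1 \
  mon 'allow r' \
  mgr "allow command 'fs subvolume resize' with vol_name prefix 'k8s-fs' group_name prefix '<subvolumegroup>' sub_name prefix 'csi-', allow command 'fs ls'"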

See also https://docs.ceph.com/en/latest/rados/operations/user-management/

"Manager capabilities can also be specified for specific commands, for all commands exported by a built-in manager service, or for all commands exported by a specific add-on module."

Is this something you are willing to maintain, i.e. keeping this list of mgr caps updated in future ceph-csi releases?
Maybe it could be generated from the source for each release?

@zerotens
Contributor

zerotens commented Jul 9, 2024

So I had some time.
About the other caps (OSD/MDS) and the last pieces:

export CLUSTER="my-k8s-cluster-1"
# create subvolumegroup for cluster
ceph fs subvolumegroup create k8s-fs $CLUSTER

# set specific radosNamespace for data written to the subvolumegroup
setfattr -n ceph.dir.layout.pool_namespace -v $CLUSTER /cephfs/volumes/$CLUSTER

Caps for OSD / MDS

osd "allow rw pool=k8s-fs-data namespace=$CLUSTER, allow rw pool=k8s-fs-metadata namespace=$CLUSTER"
mds "allow rw fsname=k8s-fs path=/volumes/$CLUSTER"

So what's happening: we ask CephFS via setfattr to place any data written to the subvolumegroup in a specific radosNamespace.
Metadata (MDS) access is limited to the path /volumes/$CLUSTER.
Until the PR is merged, the caps for osd must be:
osd "allow rw pool=k8s-fs-data namespace=$CLUSTER, allow rw pool=k8s-fs-metadata namespace=csi"

With these caps we should achieve multitenancy for CephFS.
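
Putting it together, a per-tenant keyring could then look roughly like this (the client name is an example, and the mgr placeholder stands for the command allow-list from my earlier comment):

ceph auth get-or-create client.csi-$CLUSTER \
  mon 'allow r' \
  mds "allow rw fsname=k8s-fs path=/volumes/$CLUSTER" \
  osd "allow rw pool=k8s-fs-data namespace=$CLUSTER, allow rw pool=k8s-fs-metadata namespace=csi" \
  mgr "<the 'allow command ...' list from the earlier comment>"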


github-actions bot commented Aug 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Aug 8, 2024

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned Aug 15, 2024