Multiple pool storage classes informed by topologyKeys #559

Closed
mmgaggle opened this issue Aug 16, 2019 · 10 comments
Labels
wontfix This will not be worked on

Comments

@mmgaggle
Member

mmgaggle commented Aug 16, 2019

Describe the feature you'd like to have

Ability to have multiple pools per storage class, and use topologyKeys from the CSINode to inform pool selection.

What is the value to the end user? (why is it a priority?)

This would allow for the construction of a storage class that behaves similarly to what the Amazon EBS, Google Persistent Disk, and Azure Disk drivers provide. Those provisioners do not expose a storage class per failure domain, and we should be able to provide a similar experience. As it currently stands, a pool and storage class would need to be created for each failure domain, and the end user would need to select from them. By extension, the pod would need to follow the PV (VOLUME_ACCESSIBILITY_CONSTRAINTS), instead of the PV being created where the pod is scheduled. This is particularly problematic if node placement rules conflict with the chosen storage class.

How will we know we have a good solution? (acceptance criteria)

Users will be able to request a PVC from a single named storage-class, and have the PV procured from a pool based on the CSINode topologyKeys.
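For reference, the topologyKeys in question are the ones the RBD CSI driver registers per node via the CSINode object. A minimal sketch of what that could look like for one node follows; the node name and the zone label key are illustrative assumptions, and older clusters expose this object as storage.k8s.io/v1beta1 with the failure-domain.beta.kubernetes.io/zone label instead.

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-1-23.us-west-2.compute.internal   # hypothetical node name
spec:
  drivers:
  - name: rook-ceph.rbd.csi.ceph.com
    nodeID: ip-10-0-1-23.us-west-2.compute.internal
    topologyKeys:
    - topology.kubernetes.io/zone                  # assumed zone label key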

Additional context

Proposed storageclass.yaml (for block)

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: standard-rbd-a
  namespace: rook-ceph
spec:
  crushRoot: us-west-2a
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: standard-rbd-b
  namespace: rook-ceph
spec:
  crushRoot: us-west-2b
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: standard-rbd-c
  namespace: rook-ceph
spec:
  crushRoot: us-west-2c
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rbd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    clusterID: rook-ceph
    imageFormat: "2"
    imageFeatures: layering
    csi.storage.k8s.io/provisioner-secret-name: rook-ceph-csi
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-ceph-csi
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    csi.storage.k8s.io/fstype: xfs
    zones:
      us-west-2a:
        pool: standard-rbd-a
      us-west-2b:
        pool: standard-rbd-b
      us-west-2c:
        pool: standard-rbd-c
reclaimPolicy: Delete
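With a class like the one above, the end-user experience would reduce to an ordinary PVC against the single storage class. A sketch, with the claim name and size being arbitrary choices:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-standard            # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-rbd # the single zonal class from the proposal
  resources:
    requests:
      storage: 10Gi

The provisioner, not the user, would then pick standard-rbd-a/b/c based on where the consuming pod is scheduled.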
@obnoxxx

obnoxxx commented Oct 30, 2019

@mmgaggle writes

These provisioners do not expose a storage-class per failure-domain ...

I am not sure I follow: We (ceph/rook/ceph-csi) do distribute our pools across failure domains and usually don't have failure-domain specific storage classes (unless the admin explicitly sets it up like this). So what is this feature request actually intending to achieve?

@dillaman

@obnoxxx I thought the goal w/ this was topology-aware provisioning. If you want to have a failure domain of zone A (and not spread your IOs across multiple failure-domain zones A+B+C and incur additional inter-zone networking costs), right now you would need multiple storage classes to represent each possible failure domain. Instead, you should be able to have a single storage class where, upon provisioning of a PV, the CSI driver maps the zone to a backing RBD pool whose failure domain matches that zone.
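For illustration, the visible result of such a mapping would be a dynamically provisioned PV carrying a node affinity for the zone whose pool backs it (this is how VOLUME_ACCESSIBILITY_CONSTRAINTS surface on the Kubernetes side). A sketch, with the PV name, volumeHandle, and zone label key all being assumptions rather than driver output:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123abcd                       # hypothetical generated name
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-rbd
  csi:
    driver: rook-ceph.rbd.csi.ceph.com
    volumeHandle: 0001-placeholder-handle  # placeholder, not a real handle
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone # assumed zone label key
          operator: In
          values:
          - us-west-2a                     # the zone whose pool backs this PV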

@mykaul
Contributor

mykaul commented Oct 31, 2019

@dillaman - who created those different RBD pools in the 1st place?

@dillaman

@mykaul Rook would create both the per-AZ pools and the inter-AZ pool, along with the two storage classes -- one for locally redundant storage and the other for zone-redundant storage (to borrow Azure terminology).
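A rough sketch of what the zone-redundant half of that pairing could look like, assuming the per-AZ pools and the zonal class follow the proposal above (names are illustrative and the CSI secret parameters are omitted for brevity):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: regional-rbd             # hypothetical "regional" pool name
  namespace: rook-ceph
spec:
  failureDomain: zone            # replicas spread across zones, default CRUSH root
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-rbd             # zone-redundant storage class
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: regional-rbd
reclaimPolicy: Delete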

@obnoxxx

obnoxxx commented Oct 31, 2019

@dillaman well, to my understanding, we're not usually creating one pool/SC per zone but spreading one pool across failure domains as much as possible for availability. Of course we can work on affinity, but the topology-aware provisioning feature of Kubernetes sounds orthogonal to what we're doing.

Of course you could do the multiple-pools thing (to save costs etc.), doing a 90-degree turn from what we're currently doing. In that case it does make sense. But what I'm never too sure about: what happens if the pod that's using this PVC is rescheduled to a different zone?...

Well, in the end, ceph-csi doesn't set up the pools, so it is fair to say that it should be able to properly deal with any situation.

@mmgaggle
Member Author

mmgaggle commented Nov 2, 2019

The idea is to have a ‘regional’ storage class that is backed by a pool spread across multiple AZs (default CRUSH root, zone failure domain). We currently do this downstream.

The proposed new ability is to also have per-AZ pools that map to a single ‘standard’/‘zonal’ storage class.

Just like with EBS, if a pod is defined with an affinity, the PVC will be satisfied by a dynamically provisioned PV in the affine zone (WaitForFirstConsumer). If the pod moves, that affinity will ensure the pod stays in the same zone as the PVC. If the pod does not have an affinity definition, then the pod will follow the PV (round-robin dynamic provisioning) due to the VOLUME_ACCESSIBILITY_CONSTRAINTS.
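A sketch of the binding side of that behaviour, assuming the single zonal class from the proposal; volumeBindingMode: WaitForFirstConsumer delays provisioning until the pod is scheduled, and the pod, affinity values, and image here are illustrative (storage class parameters omitted for brevity):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rbd
provisioner: rook-ceph.rbd.csi.ceph.com
volumeBindingMode: WaitForFirstConsumer   # PV is provisioned where the pod lands
reclaimPolicy: Delete
---
apiVersion: v1
kind: Pod
metadata:
  name: zonal-app                         # hypothetical pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone   # assumed zone label key
            operator: In
            values:
            - us-west-2a
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-standard            # PVC against the zonal class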

ShyamsundarR added a commit to ShyamsundarR/ceph-csi that referenced this issue Dec 19, 2019
Proposal document addressing the following issues,
Updates: ceph#440, ceph#559

Signed-off-by: ShyamsundarR <srangana@redhat.com>
@Madhu-1
Collaborator

Madhu-1 commented Mar 13, 2020

Adding here for reference: a comment by @ShyamsundarR on https://github.com/ceph/ceph-csi/pull/760/files#r368081823

The changes to the StorageClass as specified in #559 will not work, as the StorageClass parameters are of type map[string]string. Passing a more complex structure in here will thus not work (thanks to @JohnStrunk for pointing it out).

Instead, a single key:value pair is proposed as below, with a JSON structure in the value that can be parsed and used by the plugin.

The new StorageClass parameter detailing the pools and their topology is as follows:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
parameters:
  topologyConstrainedPools: |
    [{"poolName":"pool0",
      "domainSegments":[
        {"domainLabel":"region","value":"vagrant"},
        {"domainLabel":"zone","value":"zone0"}]},
     {"poolName":"pool1",
      "domainSegments":[
        {"domainLabel":"region","value":"vagrant"},
        {"domainLabel":"zone","value":"zone1"}]}
    ]

ceph-csi-bot pushed a commit to ShyamsundarR/ceph-csi that referenced this issue Aug 17, 2020
Proposal document addressing the following issues,
Updates: ceph#440, ceph#559

Signed-off-by: ShyamsundarR <srangana@redhat.com>
@humblec
Collaborator

humblec commented Sep 30, 2020

We have topology-aware support with the current code; however, we have to revisit/improve it and move it to beta support. @ShyamsundarR can give more details on this.

@stale

stale bot commented Jul 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label (This will not be worked on) on Jul 21, 2021
@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
