Multiple pool storage classes informed by topologyKeys #559
Comments
@mmgaggle writes
I am not sure I follow: we (ceph/rook/ceph-csi) do distribute our pools across failure domains and usually don't have failure-domain-specific storage classes (unless the admin explicitly sets it up like this). So what is this feature request actually intending to achieve?
@obnoxxx I thought the goal w/ this was topology-aware provisioning. If you want to have a failure domain of zone A (and not spread your IOs across multiple failure-domain zones A+B+C and incur additional inter-zone networking costs), right now you would need multiple storage classes to represent each possible failure domain. Instead, you should be able to have a single storage class where, upon provisioning of a PV, the CSI driver maps the zone to a backing RBD pool whose failure domain matches that zone.
@dillaman - who created those different RBD pools in the 1st place?
@mykaul Rook would create both the per-AZ pools (and an inter-AZ pool) and the two storage classes -- one for locally redundant storage and the other for zone-redundant storage (to borrow Azure terminology).
@dillaman well, to my understanding, we're not usually creating one pool/SC per zone but spreading one pool across failure domains as much as possible for availability. Of course we can work on affinity, but the topology-aware provisioning feature of Kubernetes sounds orthogonal to what we're doing. Of course you could do the multiple-pools thing (to save costs etc.), doing a 90-degree turn from what we're currently doing; in that case it does make sense. But what I'm never too sure about: what happens if the pod that's using this PVC is rescheduled to a different zone? Well, in the end, ceph-csi doesn't set up the pools, so it is fair to say that it should be able to properly deal with any situation.
The idea is to have both a ‘regional’ storage class that is backed by a pool spread across multiple AZs (default CRUSH root, zone failure domain) -- we currently do this downstream -- and, as a proposed new ability, per-AZ pools that map to a single ‘standard’/‘zonal’ storage class. Just like with EBS, if a pod is defined with an affinity, the PVC will be satisfied by a dynamically provisioned PV in the affine zone (WaitForFirstConsumer). If the pod moves, that affinity will ensure the pod stays in the same zone as the PVC. If the pod does not have an affinity definition, then the pod will follow the PV (round-robin dynamic provisioning) due to the VOLUME_ACCESSIBILITY_CONSTRAINTS.
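To make the ‘regional’/‘zonal’ split above concrete, here is a minimal sketch of what the pair of classes could look like. The names, pools, and clusterID are illustrative assumptions, and required ceph-csi parameters such as secrets and image features are omitted.

```yaml
# Sketch only: a 'regional' class backed by one pool spread across zones,
# and a 'zonal' class whose binding is delayed until the pod is scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-regional
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  pool: rbd-multi-zone          # single pool replicated across zones
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-zonal
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  # per-zone pool selection is the subject of this issue (see below)
volumeBindingMode: WaitForFirstConsumer   # delay binding until a pod is scheduled
```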
Adding here for reference: a comment by @ShyamsundarR on https://github.com/ceph/ceph-csi/pull/760/files#r368081823 -- The changes to the StorageClass as specified in #559 will not work, as the StorageClass parameters are of type map[string]string, so passing a more complex structure in there will not work (thanks to @JohnStrunk for pointing it out). Instead, a single key:value pair is proposed, whose value holds a JSON structure that can be parsed and used by the plugin. The new StorageClass parameter to detail pools and their topology is as follows:
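(The exact snippet from that comment is not preserved in this capture; the fragment below is a hedged sketch of the general shape, with the parameter name and field names taken as illustrative assumptions.)

```yaml
# Sketch of the single key:value approach: one string-valued StorageClass
# parameter whose value is a JSON document mapping pools to topology segments.
# The parameter and field names are illustrative assumptions.
parameters:
  topologyConstrainedPools: |
    [
      {"poolName": "zone-a-pool",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-a"}]},
      {"poolName": "zone-b-pool",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-b"}]}
    ]
```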
We have topology-aware support with the current code; however, we have to revisit/improve it and move to
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
Describe the feature you'd like to have
Ability to have multiple pools per storage class, and use topologyKeys from the CSINode to inform pool selection.
What is the value to the end user? (why is it a priority?)
This would allow for the construction of a storage class that behaves in a similar fashion to what the Amazon EBS, Google Persistent Disk, and Azure Disk drivers provide. Those provisioners do not expose a storage class per failure domain, and we should be able to give end users a similar experience. As it currently stands, a pool and a storage class would need to be created for each failure domain, and the end user would need to select from them. By extension, the pod would need to follow the PV (VOLUME_ACCESSIBILITY_CONSTRAINTS), instead of the PV being created where the pod is scheduled. This is particularly problematic if node placement rules conflict with the chosen storage class.
How will we know we have a good solution? (acceptance criteria)
Users will be able to request a PVC from a single named storage-class, and have the PV procured from a pool based on the CSINode topologyKeys.
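For reference, the topologyKeys the provisioner would consult are advertised per node in the CSINode object. A minimal example is sketched below; the driver name and label key are assumptions made for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: worker-1
spec:
  drivers:
  - name: rbd.csi.ceph.com           # driver name assumed for illustration
    nodeID: worker-1
    topologyKeys:
    - topology.kubernetes.io/zone    # label key assumed; advertised by the driver at registration
```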
Additional context
Proposed storageclass.yaml (for block)
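(The proposed YAML itself did not survive in this capture. The sketch below is a hypothetical stand-in showing one way a block/RBD StorageClass along these lines could be expressed; the names, label keys, and the topologyConstrainedPools parameter are assumptions, and required secrets and image features are omitted.)

```yaml
# Hypothetical sketch only -- not the original attachment from this issue.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-standard
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  # one parameter, JSON in a string, mapping per-zone pools to topology segments
  topologyConstrainedPools: |
    [
      {"poolName": "zone-a-pool",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-a"}]},
      {"poolName": "zone-b-pool",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-b"}]},
      {"poolName": "zone-c-pool",
       "domainSegments": [{"domainLabel": "zone", "value": "zone-c"}]}
    ]
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values: ["zone-a", "zone-b", "zone-c"]
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```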