Limits for RBD create PVC from snapshot #1098
@dillaman isn't this covered by flattening the image based on the hard and soft limits? Do we need anything else?
The flatten logic only protects against the "no more than 15 images in the chain" test. For the 511 snapshots per image, that's a new test that could be solved by flattening older k8s snapshots off the RBD image so that the snapshots can be removed. Same w/ the no more than X total clones.
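A minimal sketch of how those three limits plus a soft/hard split could be encoded; the constant names and the soft-limit values are assumptions, not ceph-csi's actual configuration:

```go
package rbdlimits

import "errors"

const (
	// Hard limits quoted in this thread.
	maxChainDepth    = 15  // "no more than 15 images in the chain"
	maxSnapsPerImage = 511 // per-image RBD snapshot limit (krbd)

	// Hypothetical soft limits that leave headroom so a background
	// flatten can run before the hard limit is hit.
	softChainDepth    = 10
	softSnapsPerImage = 450
)

var ErrResourceExhausted = errors.New("rbd: snapshot/clone limit reached")

// checkLimits is a hypothetical helper: it rejects requests at the hard
// limit and signals that a flatten should be scheduled at the soft limit.
func checkLimits(chainDepth, snapCount int) (flattenNeeded bool, err error) {
	if chainDepth >= maxChainDepth || snapCount >= maxSnapsPerImage {
		return false, ErrResourceExhausted
	}
	return chainDepth >= softChainDepth || snapCount >= softSnapsPerImage, nil
}
```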
@dillaman as CSI RBD snapshots will be RBD clones of the parent image, and the intermediate RBD snapshot from which these are cloned will be deleted once the clone is completed, does the 511 limitation pertain to the snapshot requests actively in flight against the image, or to all snapshots taken against it over time?
If the former, CSI should handle this as some form of allowed active snapshot requests in flight per PV/image, and report back a temporary resource exhaustion for a request that breaches the stated limit. If the latter, then CSI should flatten future snapshots (or maybe deny them?), as past CSI-Snapshots may be in use for other clone-from-snapshot CSI-Create requests. The X total clones limit can follow a similar logic to the latter case above. Is there a reason to prefer flattening older snaps or clones over newer ones?
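A sketch of the "allowed active snapshot requests in flight per image" idea as a per-image counting semaphore; `maxInFlightSnaps` and the helper names are hypothetical, not ceph-csi code:

```go
package rbdlimits

import (
	"fmt"
	"sync"
)

const maxInFlightSnaps = 8 // hypothetical per-image in-flight limit

var (
	mu               sync.Mutex
	inFlightPerImage = map[string]chan struct{}{}
)

// acquireSnapSlot reserves an in-flight slot for image, or reports
// temporary resource exhaustion so the caller can retry later.
func acquireSnapSlot(image string) (release func(), err error) {
	mu.Lock()
	sem, ok := inFlightPerImage[image]
	if !ok {
		sem = make(chan struct{}, maxInFlightSnaps)
		inFlightPerImage[image] = sem
	}
	mu.Unlock()

	select {
	case sem <- struct{}{}:
		return func() { <-sem }, nil
	default:
		return nil, fmt.Errorf("image %s: too many snapshot requests in flight", image)
	}
}
```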
Neither -- it's the total number of RBD snapshots on an RBD image (i.e. including any snapshots sitting in the trash).
If you prefer the older ones, you could have a soft vs hard limit to potentially kick off a background flatten task on the older ones to avoid the need to pause new k8s snapshot creation.
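A sketch of that background-flatten idea: once the soft limit is crossed, flatten the clones backing the oldest snapshots so those snapshots can be removed. `snapInfo`, `scheduleFlattens`, and `flattenImage` are hypothetical helpers, not ceph-csi APIs:

```go
package rbdlimits

import (
	"sort"
	"time"
)

type snapInfo struct {
	CloneImage string    // child image backed by this snapshot
	Created    time.Time // snapshot creation time
}

// scheduleFlattens kicks off background flattens of the oldest clones
// until the snapshot count would drop back to the soft limit.
func scheduleFlattens(snaps []snapInfo, count, soft int, flattenImage func(string) error) {
	sort.Slice(snaps, func(i, j int) bool {
		return snaps[i].Created.Before(snaps[j].Created)
	})
	for _, s := range snaps {
		if count <= soft {
			return
		}
		go func(img string) {
			// A real driver would log/retry errors and then remove the
			// now-unreferenced parent snapshot.
			_ = flattenImage(img)
		}(s.CloneImage)
		count-- // optimistic: this parent snapshot becomes removable post-flatten
	}
}
```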
Yikes! Understood. We (CSI) were just using the trash as a convenient parking lot, I guess, so those count as well.
We may need to start flattening sooner rather than later then, with soft/hard limits as you state. @Madhu-1 we may need to rethink this based on our earlier conversation.
Correct. CephFS is going to have its own set of limits, so we should probably also start documenting that somewhere as well (hopefully with a similar end result, i.e. that the CSI can hide it).
@dillaman Having such limits as mentioned in this issue is good to have. At the same time, we have to see how this is going to affect our scalability requirements. I believe this is going to be a bit of trial and error, adopting what we can best do on our end against it. As a side question: is there an option available in rbd where we can set or trigger deletion in the backend trash/purge cache?
Scalability means nothing if ceph-csi breaks Ceph |
@dillaman do snapshots in trash (or otherwise) of parents in a clone chain count towards the 511 total snapshots limit? IOW, assuming I built a chain of clones (image → snapshot → clone → snapshot → clone), I would end up with 1 snap for the first image in trash, and one snap for the next cloned image in trash. Now, for the image at the tail of this chain, do those parent snapshots count against its 511 snapshot limit?
FWIW, I created 513 snapshots of the test2 image and it worked, but I did not mount the image. I assume the limit is due to the kernel mounter?
The limit is in krbd (and kernel CephFS) since they only allocate a single 4KiB page to handle all the snapshot ids for an image / file. The snapshot limit only counts for the image where the snapshot actually exists -- it does not apply to the total number of snapshots in the entire grandparent-parent-child hierarchy.
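For intuition, the arithmetic behind the limit, assuming 64-bit snapshot ids packed into that single page (the exact header overhead that brings 512 down to ~510-511 is an assumption here):

```go
package rbdlimits

const (
	pageSize   = 4096                  // the single page krbd allocates for snapshot ids
	snapIDSize = 8                     // bytes per 64-bit snapshot id (assumption)
	idSlots    = pageSize / snapIDSize // = 512; header overhead leaves ~510-511 usable
)
```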
Laying out steps for implementation, as discussed with @Madhu-1 (and based on various comments and discussions in the snapshot PRs and in this issue from @dillaman). A sketch of both checks follows the list below.

Ensuring clone depth is in check:
- N: Configured hard limit for image depth

Ensuring total snapshot count is in check:
- K: Configured maximum number of all snapshots for an image (including ones in trash)
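A sketch of measuring both values using the rbd CLI's JSON output; whether `rbd snap ls --all` counts trashed snapshots depends on the Ceph release, so verify that before using it to measure K:

```go
package rbdlimits

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// snapCount returns the number of snapshots on pool/image, used to
// enforce K.
func snapCount(pool, image string) (int, error) {
	out, err := exec.Command("rbd", "snap", "ls", "--all",
		fmt.Sprintf("%s/%s", pool, image), "--format=json").Output()
	if err != nil {
		return 0, err
	}
	var snaps []json.RawMessage
	if err := json.Unmarshal(out, &snaps); err != nil {
		return 0, err
	}
	return len(snaps), nil
}

// cloneDepth walks parent links reported by `rbd info --format=json`
// to measure the depth of the clone chain above pool/image, used to
// enforce N.
func cloneDepth(pool, image string) (int, error) {
	depth := 0
	for {
		out, err := exec.Command("rbd", "info",
			fmt.Sprintf("%s/%s", pool, image), "--format=json").Output()
		if err != nil {
			return 0, err
		}
		var info struct {
			Parent *struct {
				Pool  string `json:"pool"`
				Image string `json:"image"`
			} `json:"parent"`
		}
		if err := json.Unmarshal(out, &info); err != nil {
			return 0, err
		}
		if info.Parent == nil {
			return depth, nil
		}
		depth++
		pool, image = info.Parent.Pool, info.Parent.Image
	}
}
```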
Should it return an error, or would it be better to just return a "PENDING" error code so that it's retried periodically while a background flatten is taking place?
Thinking along the lines that a snapshot should be as instantaneous as possible (with possible future application and fs quiesce in play in the overall workflow), an error seems better, as we would not have started any work to create the snapshot. The case where we return PENDING, for clones or while flattening a snapshot image, is safer, as the snapshot is already taken and we are post-processing it. In this corner case, we are yet to take one, hence erroring out is acceptable. In an "ideal" scenario, the error stating resources are exhausted should be handled gracefully by the callers.
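In gRPC status-code terms (which is what a CSI driver actually returns), this distinction could look like the sketch below: reject a new snapshot at the hard limit with `ResourceExhausted`, and use `Aborted` for the "PENDING, retry while flattening" case. The sentinel error names are hypothetical:

```go
package rbdlimits

import (
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

var (
	errLimitReached      = errors.New("snapshot limit reached")
	errFlattenInProgress = errors.New("flatten in progress")
)

// toGRPCError maps internal errors onto CSI-visible gRPC status codes.
func toGRPCError(err error) error {
	switch {
	case errors.Is(err, errLimitReached):
		// No work was started: the caller should back off and retry later.
		return status.Error(codes.ResourceExhausted, err.Error())
	case errors.Is(err, errFlattenInProgress):
		// Work is already underway: the caller retries while we post-process.
		return status.Error(codes.Aborted, err.Error())
	default:
		return status.Error(codes.Internal, err.Error())
	}
}
```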
Ack -- worst case the logic can be tweaked if it causes UX concerns down the road.
Nope -- you are correct @ 510. Of course, I'd imagine we would want the CSI hard limit well below that (i.e. a 5-10% reserve minimum). There is a large performance hit for small IOs when you have hundreds of snapshots, since each write carries along that full list of snapshots again (e.g. a 512-byte write might have 4KiB of additional overhead just listing the snapshots).
Describe the bug
Prior to the GA release of snapshot support, we need to ensure that the ceph-csi driver enforces some sane limits on snapshot creation and creating PVCs from snapshots.
RBD snapshot limits
RBD clone limits
The ceph-csi driver can attempt to hide these internal limits by flattening child images as necessary to provide more "space" for future snapshots / cloned images.
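For reference, flattening a child image detaches it from its parent so the parent snapshot can be removed. A minimal sketch of invoking it via the `rbd` CLI (a real driver would likely use librbd bindings and run this asynchronously with progress tracking):

```go
package rbdlimits

import (
	"fmt"
	"os/exec"
)

// flattenImage copies all parent data into the clone, detaching it from
// the clone chain, by shelling out to `rbd flatten pool/image`.
func flattenImage(pool, image string) error {
	out, err := exec.Command("rbd", "flatten",
		fmt.Sprintf("%s/%s", pool, image)).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd flatten %s/%s: %v: %s", pool, image, err, out)
	}
	return nil
}
```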