-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for standalone snapshot creation and cloning support #693
Conversation
6864ba6
to
6beeb2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start. One concern that I couldn't easily comment on inline since it wasn't a changed line was the structure of rbdSnapshot
and how RbdImageName
sometimes points to the original volume and sometimes to the clone. I think that it should always refer to the snapshot clone image name -- and therefore RbdSnapName
would always be the same as RbdImageName
in the case of this new v2 snapshot implementation. For the legacy CSI snapshots, they could be different.
pkg/rbd/snapshot.go
Outdated
}() | ||
RESTORE: | ||
// create clone image and delete snapshot | ||
err = restoreSnapshot(ctx, cloneRbdVol, snap, cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we give this method a better name? Perhaps something like cloneRbdImageFromSnapshot
?
vol := new(rbdVolume) | ||
vol.ClusterID = rbdSnap.ClusterID | ||
vol.Monitors = rbdSnap.Monitors | ||
vol.Pool = rbdSnap.Pool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we should be able to (optionally) store snapshots in a different pool from the source image. It doesn't need to be part of this PR, but just a heads-up.
pkg/rbd/rbd_util.go
Outdated
func createSnapshot(ctx context.Context, pOpts *rbdSnapshot, cr *util.Credentials) error { | ||
var output []byte | ||
|
||
image := pOpts.RbdImageName | ||
snapName := pOpts.RbdSnapName | ||
|
||
klog.V(4).Infof(util.Log(ctx, "rbd: snap create %s using mon %s, pool %s"), image, pOpts.Monitors, pOpts.Pool) | ||
klog.V(4).Infof(util.Log(ctx, "rbd: snap create %s/%s using mon %s, pool %s"), image, snapName, pOpts.Monitors, pOpts.Pool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep it consistent w/ the standardized RBD image-spec <pool-name>/<image-name>@<snap-name>
:
klog.V(4).Infof(util.Log(ctx, "rbd: snap create %s/%s using mon %s, pool %s"), image, snapName, pOpts.Monitors, pOpts.Pool) | |
klog.V(4).Infof(util.Log(ctx, "rbd: snap create %s/%s@%s using mon %s"), pOpts.Pool, image, snapName, pOpts.Monitors) |
@@ -552,32 +592,13 @@ func createSnapshot(ctx context.Context, pOpts *rbdSnapshot, cr *util.Credential | |||
return nil | |||
} | |||
|
|||
func unprotectSnapshot(ctx context.Context, pOpts *rbdSnapshot, cr *util.Credentials) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: won't you need to handle backwards compatibility? While you won't be creating protected snapshots anymore, you might need to unprotect legacy snapshots when deleting images.
pkg/rbd/rbd_util.go
Outdated
|
||
// attempt to use Ceph manager based deletion support if available | ||
rbdCephMgrSupported, err := rbdManagerTaskDeleteImage(ctx, pOpts, cr) | ||
id, err = getImageIDfromTrash(ctx, pOpts, cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seams potentially dangerous. It seems like it would be safer to record the image id in the omap tables at CO snapshot / RBD image creation time or prior to moving to the trash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do I need to record the image ID? or can I fetch the image ID from image prior to moving it to trash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends if volume/snapshot IDs are 100% guaranteed to be unique for the lifetime of the CO. If, for example, we had a volume/snapshot named "X" that the CO deleted and we moved to the trash but cannot delete, can the CO ever create a new volume named "X"? If there is no guarantee that these CO names cannot be reused, then for safety we would need to record the image id prior to removing the image.
... but thinking about that some more, I guess we would just immediately fail in the create step since we would see existing omap entries for the volume/snapshot the CO thought it had deleted and then attempted to re-use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are generating some unique ID for each PVC name and we create a mapping for unique ID and PVC name if the image deletion fails we don't remove the mapping from the Omap, if we remove it from Omap means the image is already deleted.
even if the unique ID and PVC name mapping is not present in omap we generate a new ID and use it as image name, I think it should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but we have an ask out to the external-provisioner sidecar to provide a deterministic name to the CSI based on the namespace and PVC name. If you then delete a PVC (and the CSI keeps it behind the scenes) and then re-create it, you are saying the CSI will generate a unique internal volume id? If so, I think the current implementation is fine. It could be more efficient to just look up the image id before moving it to the trash, but then you would have unique logic on the recovery path where it needs to scan the full trash which might have hundreds or thousands of images. The middle ground would be to record the id before moving it to the trash to avoid the full trash list and search.
pkg/rbd/rbd_util.go
Outdated
func deleteSnapshot(ctx context.Context, pOpts *rbdSnapshot, cr *util.Credentials) error { | ||
var output []byte | ||
|
||
image := pOpts.RbdImageName | ||
snapName := pOpts.RbdSnapName | ||
|
||
klog.V(4).Infof(util.Log(ctx, "rbd: snap rm %s using mon %s, pool %s"), image, pOpts.Monitors, pOpts.Pool) | ||
klog.V(4).Infof(util.Log(ctx, "rbd: snap rm %s@%s using mon %s, pool %s"), image, snapName, pOpts.Monitors, pOpts.Pool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: use image-spec style
pkg/rbd/rbd_util.go
Outdated
@@ -601,12 +616,11 @@ func restoreSnapshot(ctx context.Context, pVolOpts *rbdVolume, pSnapOpts *rbdSna | |||
image := pVolOpts.RbdImageName | |||
snapName := pSnapOpts.RbdSnapName | |||
|
|||
klog.V(4).Infof(util.Log(ctx, "rbd: clone %s using mon %s, pool %s"), image, pVolOpts.Monitors, pVolOpts.Pool) | |||
klog.V(4).Infof(util.Log(ctx, "rbd: clone %s@%s %s using mon %s, pool %s"), pSnapOpts.RbdImageName, snapName, image, pVolOpts.Monitors, pVolOpts.Pool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: use image-spec style
pkg/rbd/rbd_util.go
Outdated
|
||
output, err := execCommand("ceph", args) | ||
if err != nil { | ||
if strings.Contains(string(output), rbdTaskRemoveCmdInvalidString1) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we move these to a common helper?
rbdSnapshot refers to original image only during the first time when we need to reserve the snapshot in Omap once its done it will be point to the cloned image name will add support to remove older snapshot |
Except you are also passing the |
setting RBDImageName from source volume here because we are generating the snapshotID and reserving snapshot details in Omap here prior to this we don't have the snapshotID to set the RbdImageName in snapshot struct, once we do the reservation we are setting back to clone volume here to create a final snapshot from cloned volume if am missing anything or doing it wrong let me know |
If you need it for reserving the snapshot, pass the source volume struct instead. This is for clarity to avoid the back-and-forth. |
@dillaman addressed review comments PTAL |
bf49e29
to
88c94b5
Compare
@dillaman still am facing the mounting issue for cloned PVC below are the details
but in my local testing on kernel versions |
b61c4c5
to
90a77a1
Compare
c012134
to
ad44e14
Compare
Currently the kernel on the CentOS-7 has version 3.10.0-1062.4.1.el7 (see the updates here). I do not know what parts related to Ceph have been backported. You would need to check the changelog of the package or check the online version from RHEL. |
Thanks for the info @nixpanic, anyway we can create a vagrant based kube cluster, any idea when centos 8 libvirt vagrant box will be available? i tried to find some info https://bugs.centos.org/view.php?id=16455 |
Even we can upgrade the kernel version in Vm created using vagrant. @nixpanic any idea when can we have E2E setup on centos CI? |
@Madhu-1 Ouch -- I thought krbd passed the optional read-only flag [1] to the @idryomov It's too late to help us now, but can we get that flag added to krbd so that in the future read-write image features don't affect read-only images? [1] https://github.com/ceph/ceph/blob/master/src/cls/rbd/cls_rbd.cc#L862 |
What's the status of snapshot / clone support for the v2.1 release? |
This reverts commit 90fef91.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
``` * create temporary RBD snapshot on source volume's RBD image * RBD clone CO source volume's RBD image's snapshot to new RBD image * remove temporary RBD snapshot on CO source volume's RBD image * serialize work on RBD snapshot image clone * test RBD snapshot image clone's depth * if depth over hard limit flatten clone * create RBD snapshot on RBD snapshot image clone * create RBD volume image clone from RBD snapshot image clone * delete temporary RBD image * delete temporary RBD snapshot ``` Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@Madhu-1 can you revisit this PR ? |
@humblec am already working on this one, already updated the PR (need to do more work) |
[Status update] |
Details of other use-cases is present here. The decision is to continue with the current design and implementation as it does not impede the future uses as understood at present. |
This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏 |
@dillaman an image whose ancestor is undergoing a flatten is still mountable by the clients, right? e.g, if the image chain is, image1 -> image2 (CSI-snap) -> image3 (CSI-clone), when cloning from a CSI-Snapshot. image2 could be undergoing a flatten (either due to soft or hard flatten limits), whereas image3 could be mounted and used in the interim. An image that is undergoing flattening is not mountable by the clients, right? As a result our base rule is always flatten the images that serve as the intermediate CSI-snapshot If the above is true, when cloning from an CSI-Volume rather than from a CSI-Snapshot, where the current image chain would be image1 (source) -> image2 (RBD clone serving as hidden CSI-snapshot) -> image3 (RBD clone serving as the CSI-clone), could we flatten |
Yes, but only if at the "soft" limit. If the total chain length is >15, krbd cannot properly handle opening that image with 14 ancestors. If your entire chain is >=15, we need to pause the creation of the new PVC that violates the limit. Now, we could always just choose to restrict the chain depth upon k8s snapshot creation by just taking the k8s snapshot chain length + 1 (i.e. assume it will get a new child PVC eventually). That would allow you to use the "snapshot creation pending / uploading" status return while it's being flattened and therefore prevent its use via a new PVC from snapshot operation. |
@ShyamsundarR I went down the rabbit hole on this one this afternoon. In order to support thin-provisioned clones and PVCs from snapshots, we require kernel v5.1 or later (RHEL 8.2 or later). Any platform that attempts to use this functionality on older kernels (I won't name names because it's very silly that they are still using such an old OS), will need to fully flatten the clone/PVC from snapshot prior to giving access to the end-user. That means we will need to support flatten checks upon PVC creation (i.e. max image chain length is 1 -- no parents are supported). |
Here is what I understand what CSI must do,
Thoughts/Questions:
|
|
... or just have a node kernel capability flag in the RBD controller saying deep-flatten is supported or not. If it's not supported, always flatten the PVC image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what reviews were still pending, GitHub suggests I did not post the comments yet?
Will spend some more time on reviewing today.
@@ -1,12 +1,11 @@ | |||
--- | |||
# need for docker build | |||
sudo: true | |||
dist: xenial | |||
dist: bionic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is already done? Does the branch need a rebase on master?
e2e/rbd.go
Outdated
@@ -91,19 +91,31 @@ var _ = Describe("RBD", func() { | |||
deleteConfigMap(rbdDirPath) | |||
deleteResource(rbdExamplePath + "secret.yaml") | |||
deleteResource(rbdExamplePath + "storageclass.yaml") | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defintely not correct.
Please explain in the commit message why the revert is needed. No need to mention that it is a revert in the subject, just point to the commit in the message.
cloneRbd := generateVolFromSnap(rbdSnap) | ||
// add image feature for cloneRbd | ||
cloneRbd.ImageFormat = rbdImageFormat2 | ||
cloneRbd.ImageFeatures = "layering" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the constant rbd.FeatureNameLayering
for this when you move to go-ceph v0.3 or newer.
// generate cloned volume details from snapshot | ||
cloneRbd := generateVolFromSnap(rbdSnap) | ||
// add image feature for cloneRbd | ||
cloneRbd.ImageFormat = rbdImageFormat2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imageFormat2 is the default, I think
if err != nil { | ||
if _, ok := err.(ErrImageNotFound); !ok { | ||
klog.Errorf(util.Log(ctx, "failed to delete rbd image: %s/%s with error: %v"), rbdVol.Pool, rbdVol.VolName, err) | ||
return nil, status.Error(codes.Internal, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that deleting an image should succeed in case the image does not exist? Check the CSI spec for that.
// it asynchronously. If command is not found returns a bool set to false | ||
func rbdManagerTaskDeleteImage(ctx context.Context, pOpts *rbdVolume, cr *util.Credentials) (bool, error) { | ||
// rbdManagerTaskTrashRemove adds a ceph manager task to remove image from trash, thus removing it asynchronously. If command is not found returns a bool set to false | ||
func rbdManagerTaskTrashRemove(ctx context.Context, pOpts *rbdVolume, cr *util.Credentials) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes related to deleting images can be its own commit, reducing the noise in the main change
|
||
klog.V(4).Infof(util.Log(ctx, "rbd: trash mv %s using mon %s, pool %s"), image, pOpts.Monitors, pOpts.Pool) | ||
|
||
args := []string{"trash", "mv", image, "--pool", pOpts.Pool, "--id", cr.ID, "-m", pOpts.Monitors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use go-ceph for this, like
img := pOpts.open()
img.Trash()
img.Close()
Add support for standalone snapshot creation and creating new PVC from snapshot
with the current implementation in ceph-csi, it's not possible to delete the cloned volume if the snapshot is present due to the child linking, to remove this dependency we had a discussion (Thanks for @dillaman for suggesting this idea) and come up with an idea to separate out the clone and snapshot, so that we can delete the snapshot and cloned image in any order.
the steps followed to create an independent snapshot as follows
Create a snapshot
--rbd-default-clone-format 2 --image-feature layering,deep-flatten
Create PVC from a snapshot
clones can be (max 10 can be changed based on the configuration)
if the depth is reached flatten the newly cloned image
Delete a snapshot
example commands:-
Signed-off-by: Madhu Rajanna madhupr007@gmail.com
Partial Fixes: #675
Fixes: #403
#467 will be abandoned with this PR
Note: @dillaman I have addressed the review comment in #467 PR PTAL