-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PB-4991: Update ApplicationBackupController to execute the post-exec … #1602
Conversation
Can one of the admins verify this patch? |
Need to check the below scenarios.
|
Since it is a change in the Rule, I would suggest to have functional test case define to cover all the rule cases. |
@@ -262,6 +262,46 @@ func (a *ApplicationBackupController) handle(ctx context.Context, backup *stork_ | |||
if !canDelete { | |||
return nil | |||
} | |||
// run the post exec rules if the backup is in volume stage and execRulesCompleted is not completed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move this step before deleteBackup api? Since we decided to delete the backup, should wait till backup cancel and deletion to run the post exec rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, post exec rule execution doesn't required to be on hold till the backup deletion will update accordingly.
} | ||
|
||
// Delete post-exec rule CR if any | ||
if len(backup.Spec.PostExecRule) != 0 && backup.Annotations[utils.PxbackupAnnotationCreateByKey] == utils.PxbackupAnnotationCreateByValue && backup.Annotations["portworx.io/backup-by"] != "backup-schedule" && backup.Annotations["portworx.io/backup-by"] != "restore" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good check. Can we please some comment, why we are having schedule and restore check? So that it will be useful for others.
Also, We might need to avoid the running the post-Exec rule, if it created by restore case rt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Sure will update the comments.
- Also, We might need to avoid the running the post-Exec rule, if it created by restore case rt? - Yes, existing check will prevent this (backup.Annotations["portworx.io/backup-by"] != "restore") is there anything else am missing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I meant of avoiding the running post-exec rule if the applicationbackup CR was created by restore operation.
But I think the check in the line 254 might not satisfy for restore's applicationbakup CR. So it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern is that when the application backup CR is deleted, px-backup will also delete the Rule CR as well rt?
Will it not cause a issue that Rule CR might be available at line 264?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A PR is also raised for the same https://github.com/portworx/px-backup/pull/1432
Here we are not cleaning the post exec rule cr from px-backup
Only stork is responsible for the post exec rule CR cleanup independent of whether to run the post-exec rules
log.ApplicationBackupLog(backup).Infof("Error value: %v", err) | ||
|
||
if err != nil && !k8s_errors.IsNotFound(err) { | ||
log.ApplicationBackupLog(backup).Infof("Error while deleting post exec rule: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us have all the error msg and debug in the context to refer to "during cancel backup" or some other statement.
backup.Status.Status = stork_api.ApplicationBackupStatusFailed | ||
backup.Status.Reason = message | ||
err = a.client.Update(context.TODO(), backup) | ||
if !a.execRulesCompleted[string(backup.UID)] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lalat-das Was there any reason, why we did not had it check in the previous fix? Is it because, it is the last step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was the comment I had given earlier, guess it was missed incorporating.
logrus.Infof("Sending termination commands to kill pre-exec pod in kdmp or mixed driver path") | ||
channel <- true | ||
} | ||
//terminationChannels = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the commented code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commented code is an existing one not part of the this PR change.
I assume functional testcases from the px-backup.
|
I want to have test case covering all the rule scenario. Because we had some regression in the rule fixed perviously. |
Unit Testcases for the Fix are document in below sheet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…rule during the finalizer cleanup if they were not during the volume stage
|
||
// Delete post-exec rule CR after executing the postexec rules only for the manual backup as it is managed and created through the px-backup | ||
// And also here the deletion will be skipped if the backupCR is created by the schedule or for the restore from the px-backup. | ||
if len(backup.Spec.PostExecRule) != 0 && backup.Annotations[utils.PxbackupAnnotationCreateByKey] == utils.PxbackupAnnotationCreateByValue && backup.Annotations["portworx.io/backup-by"] != "backup-schedule" && backup.Annotations["portworx.io/backup-by"] != "restore" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this in cleanupResources() to handle all Cr cleanup.
Also for kubevirt support we have added support do delete the ruleCR created by stork. You may want to check if there is no conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already mentioned px-backup will delete the ruleCr so there will be conflict stork deletes the Cr too and pxBackup deletes too. There should be a way to know that delete is triggered by px-backup. In such case, we should avoid deleting the Cr. In my opinion we should defer deleting ruleCr at all unless its created by stork.
If the applicationBackupCr created by Px-backup but user has issued deleted outside of px-backup, then we should leave the user to delete the Cr as well.
@@ -309,6 +354,12 @@ func (a *ApplicationBackupController) handle(ctx context.Context, backup *stork_ | |||
if backup.Status.Stage == stork_api.ApplicationBackupStageFinal { | |||
return nil | |||
} | |||
|
|||
// Initialize execRulesCompleted for the backup UID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant code. You already checking ifExists..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is in finalizer here it is during initialization.
backup.Status.Status = stork_api.ApplicationBackupStatusFailed | ||
backup.Status.Reason = message | ||
err = a.client.Update(context.TODO(), backup) | ||
if !a.execRulesCompleted[string(backup.UID)] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was the comment I had given earlier, guess it was missed incorporating.
…ion backup controller
Update ApplicationBackupController to execute the post-exec rule if they were not executed due to the backupCR deleteion when it is in-progress.
RCA
If the user cancels the in-progress backup(From px-backup we were update AB CR delete Timestamp) which already in the pre-exec rule stage or the volume beginning stage for any reason like backup snapshot ID not found or Err volume provisioner similar cases we were re-reconciling the request.
This leads to deletion of AB CR because the delete Timestamp is not nil which causing the post exec rule execution missing.
Current Approached Solution in PR
If the delete timestamp is no nil, we check CR is in the volume stage stage if so we additionally check if the post exec rules were ran, if not we can initiate a same here.
Deletion of the backup from px-backup also removes the rule CR’s which were required for the post exec rule execution. So a PR is raised for PX-backup(https://github.com/portworx/px-backup/pull/1432) to not delete the rule CRs during the backup deletion.
This activity can be taken during the AB CR finialzer execution.
What type of PR is this?
What this PR does / why we need it:
Executing the post exec rule during the finalizer cleanup and delete the CR resource once it is completed.
Does this PR change a user-facing CRD or CLI?:
no
Is a release note needed?:
Does this change need to be cherry-picked to a release branch?:
Unit Test Results
Test Rail Link: https://portworx.testrail.net/index.php?/runs/view/8098&group_by=cases:section_id&group_id=27135&group_order=asc