Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PB-4991: Update ApplicationBackupController to execute the post-exec … #1602

Merged
merged 2 commits into from
Feb 6, 2024

Conversation

sgajawada-px
Copy link
Contributor

@sgajawada-px sgajawada-px commented Jan 4, 2024

Update ApplicationBackupController to execute the post-exec rule if they were not executed due to the backupCR deleteion when it is in-progress.

RCA
If the user cancels the in-progress backup(From px-backup we were update AB CR delete Timestamp) which already in the pre-exec rule stage or the volume beginning stage for any reason like backup snapshot ID not found or Err volume provisioner similar cases we were re-reconciling the request.
This leads to deletion of AB CR because the delete Timestamp is not nil which causing the post exec rule execution missing.

Current Approached Solution in PR
If the delete timestamp is no nil, we check CR is in the volume stage stage if so we additionally check if the post exec rules were ran, if not we can initiate a same here.
Deletion of the backup from px-backup also removes the rule CR’s which were required for the post exec rule execution. So a PR is raised for PX-backup(https://github.com/portworx/px-backup/pull/1432) to not delete the rule CRs during the backup deletion.
This activity can be taken during the AB CR finialzer execution.

What type of PR is this?

Uncomment only one and also add the corresponding label in the PR:
bug
feature
improvement
cleanup
api-change
design
documentation
failing-test
unit-test
integration-test

What this PR does / why we need it:
Executing the post exec rule during the finalizer cleanup and delete the CR resource once it is completed.

Does this PR change a user-facing CRD or CLI?:
no

Is a release note needed?:

Issue: Post-exec rule execution was halted due to the cancellation of an in-progress backup initiated by the px-backup.
User Impact: I/O halted on Application or Halt of post-exec rule execution.
Resolution: Executing and removing the post-execution rule Custom Resource (CR) as part of the cleanup process for application backup Custom Resource (CR).

Does this change need to be cherry-picked to a release branch?:

Unit Test Results
Test Rail Link: https://portworx.testrail.net/index.php?/runs/view/8098&group_by=cases:section_id&group_id=27135&group_order=asc
Screenshot from 2024-01-04 06-35-21

@cnbu-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@sgajawada-px
Copy link
Contributor Author

Need to check the below scenarios.

  1. What if the px-backup created the AB CR but if the CR is removed manually.
  2. What if the AB CR creation and deletion done manually only depending upon stork.
  3. schedule backup delete while it is in-progress

@siva-portworx
Copy link
Contributor

Since it is a change in the Rule, I would suggest to have functional test case define to cover all the rule cases.

@@ -262,6 +262,46 @@ func (a *ApplicationBackupController) handle(ctx context.Context, backup *stork_
if !canDelete {
return nil
}
// run the post exec rules if the backup is in volume stage and execRulesCompleted is not completed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this step before deleteBackup api? Since we decided to delete the backup, should wait till backup cancel and deletion to run the post exec rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, post exec rule execution doesn't required to be on hold till the backup deletion will update accordingly.

}

// Delete post-exec rule CR if any
if len(backup.Spec.PostExecRule) != 0 && backup.Annotations[utils.PxbackupAnnotationCreateByKey] == utils.PxbackupAnnotationCreateByValue && backup.Annotations["portworx.io/backup-by"] != "backup-schedule" && backup.Annotations["portworx.io/backup-by"] != "restore" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good check. Can we please some comment, why we are having schedule and restore check? So that it will be useful for others.

Also, We might need to avoid the running the post-Exec rule, if it created by restore case rt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Sure will update the comments.
  2. Also, We might need to avoid the running the post-Exec rule, if it created by restore case rt? - Yes, existing check will prevent this (backup.Annotations["portworx.io/backup-by"] != "restore") is there anything else am missing ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I meant of avoiding the running post-exec rule if the applicationbackup CR was created by restore operation.
But I think the check in the line 254 might not satisfy for restore's applicationbakup CR. So it should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern is that when the application backup CR is deleted, px-backup will also delete the Rule CR as well rt?
Will it not cause a issue that Rule CR might be available at line 264?

Copy link
Contributor Author

@sgajawada-px sgajawada-px Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A PR is also raised for the same https://github.com/portworx/px-backup/pull/1432
Here we are not cleaning the post exec rule cr from px-backup
Only stork is responsible for the post exec rule CR cleanup independent of whether to run the post-exec rules

log.ApplicationBackupLog(backup).Infof("Error value: %v", err)

if err != nil && !k8s_errors.IsNotFound(err) {
log.ApplicationBackupLog(backup).Infof("Error while deleting post exec rule: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us have all the error msg and debug in the context to refer to "during cancel backup" or some other statement.

backup.Status.Status = stork_api.ApplicationBackupStatusFailed
backup.Status.Reason = message
err = a.client.Update(context.TODO(), backup)
if !a.execRulesCompleted[string(backup.UID)] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lalat-das Was there any reason, why we did not had it check in the previous fix? Is it because, it is the last step?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the comment I had given earlier, guess it was missed incorporating.

logrus.Infof("Sending termination commands to kill pre-exec pod in kdmp or mixed driver path")
channel <- true
}
//terminationChannels = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the commented code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented code is an existing one not part of the this PR change.

@sgajawada-px sgajawada-px changed the base branch from master to px-backup-2.7.0 January 18, 2024 04:35
@sgajawada-px sgajawada-px added bug release-note Information about this change needs to be added to the release note labels Jan 18, 2024
@sgajawada-px
Copy link
Contributor Author

Since it is a change in the Rule, I would suggest to have functional test case define to cover all the rule cases.

I assume functional testcases from the px-backup.

  1. Need to check the existing test cases which can break due to this change.
  2. Add the new testcases to test the current fix.
    @siva-portworx is my understanding correct ??

@siva-portworx
Copy link
Contributor

Since it is a change in the Rule, I would suggest to have functional test case define to cover all the rule cases.

I assume functional testcases from the px-backup.

  1. Need to check the existing test cases which can break due to this change.
  2. Add the new testcases to test the current fix.
    @siva-portworx is my understanding correct ??

I want to have test case covering all the rule scenario. Because we had some regression in the rule fixed perviously.

@vikasit12 vikasit12 deleted the branch 24.3.0-px-backup-2.7.0 January 18, 2024 05:33
@vikasit12 vikasit12 closed this Jan 18, 2024
@sgajawada-px
Copy link
Contributor Author

Unit Testcases for the Fix are document in below sheet
https://docs.google.com/spreadsheets/d/10rtChd5sHgvzNsqvh8tUHHRAn86nZHV4UIbVlyTxAJM/edit?usp=sharing

@sgajawada-px sgajawada-px reopened this Jan 18, 2024
@sgajawada-px sgajawada-px changed the base branch from px-backup-2.7.0 to 24.3.0-px-backup-2.7.0 January 18, 2024 09:49
Copy link
Contributor

@siva-portworx siva-portworx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…rule during the finalizer cleanup if they were not during the volume stage

// Delete post-exec rule CR after executing the postexec rules only for the manual backup as it is managed and created through the px-backup
// And also here the deletion will be skipped if the backupCR is created by the schedule or for the restore from the px-backup.
if len(backup.Spec.PostExecRule) != 0 && backup.Annotations[utils.PxbackupAnnotationCreateByKey] == utils.PxbackupAnnotationCreateByValue && backup.Annotations["portworx.io/backup-by"] != "backup-schedule" && backup.Annotations["portworx.io/backup-by"] != "restore" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this in cleanupResources() to handle all Cr cleanup.
Also for kubevirt support we have added support do delete the ruleCR created by stork. You may want to check if there is no conflict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already mentioned px-backup will delete the ruleCr so there will be conflict stork deletes the Cr too and pxBackup deletes too. There should be a way to know that delete is triggered by px-backup. In such case, we should avoid deleting the Cr. In my opinion we should defer deleting ruleCr at all unless its created by stork.
If the applicationBackupCr created by Px-backup but user has issued deleted outside of px-backup, then we should leave the user to delete the Cr as well.

@@ -309,6 +354,12 @@ func (a *ApplicationBackupController) handle(ctx context.Context, backup *stork_
if backup.Status.Stage == stork_api.ApplicationBackupStageFinal {
return nil
}

// Initialize execRulesCompleted for the backup UID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant code. You already checking ifExists..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is in finalizer here it is during initialization.

backup.Status.Status = stork_api.ApplicationBackupStatusFailed
backup.Status.Reason = message
err = a.client.Update(context.TODO(), backup)
if !a.execRulesCompleted[string(backup.UID)] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the comment I had given earlier, guess it was missed incorporating.

@sgajawada-px sgajawada-px merged commit 1bad4e2 into 24.3.0-px-backup-2.7.0 Feb 6, 2024
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug release-note Information about this change needs to be added to the release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants