Disable DSPO Debug Logs by Default and modify the "DSPA resource was not found" message to Debug #678
Conversation
A new image has been built to help with testing out this PR. To use this image, run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/678/head
git checkout -b pullrequest bd8b98f0fcf548b0adb3ae855cc78f6fe14a90e1
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-678"

More instructions here on how to deploy and test a Data Science Pipelines Application.
controllers/config/defaults.go
Outdated
@@ -178,6 +178,8 @@ const DefaultMaxConcurrentReconciles = 10

const DefaultRequeueTime = time.Second * 20

var DSPAdeleted = make(map[string]bool)
I have a few questions:
- Are we going to store all DSPA names forever?
- What about concurrent deletion and creation using the same name? How does the code handle that?
Are we going to store all DSPA names forever?
We store them only so that we can later detect when a DSPA with the same name gets created again, which also addresses your second question.
What about concurrent deletion and creation using the same name? How does the code handle that?
Lines 185 and 186 in dspipeline_controller.go
were added specifically to handle this condition.
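As a rough illustration of the idea being discussed, here is a minimal, stdlib-only sketch of tracking deleted DSPA names in an in-memory map. This is not the actual operator code; the helper names (markDeleted, clearDeleted, wasDeleted) are hypothetical, and a mutex is added since concurrent reconciles were raised as a concern:

```go
package main

import (
	"fmt"
	"sync"
)

// dspaDeleted tracks DSPA names that have been observed as deleted, so a
// later reconcile for the same name can be distinguished from a stale requeue.
// Note: being in-memory, this state is lost on a pod restart.
var (
	mu          sync.Mutex // guards dspaDeleted against concurrent reconciles
	dspaDeleted = make(map[string]bool)
)

// markDeleted records that a DSPA with this name was deleted. (hypothetical helper)
func markDeleted(name string) {
	mu.Lock()
	defer mu.Unlock()
	dspaDeleted[name] = true
}

// clearDeleted forgets the deletion marker when the same name is re-created.
func clearDeleted(name string) {
	mu.Lock()
	defer mu.Unlock()
	delete(dspaDeleted, name)
}

// wasDeleted reports whether this name was previously marked as deleted.
func wasDeleted(name string) bool {
	mu.Lock()
	defer mu.Unlock()
	return dspaDeleted[name]
}

func main() {
	markDeleted("sample-dspa")
	fmt.Println(wasDeleted("sample-dspa")) // true: skip work for a stale requeue
	clearDeleted("sample-dspa")            // same name re-created
	fmt.Println(wasDeleted("sample-dspa")) // false: reconcile normally
}
```
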
May we have a unit test for this change?
I don't think this is the right approach. Storing this sort of information in memory is not scalable; even though we can store a lot of DSPA names, we would lose that information on a pod restart. Concurrency could also become an issue, as Diego pointed out, if we ever scale the operator.
The offending piece of code should probably be here:
Essentially, we requeue after the CR has been deleted because of pod changes. If we can find a way to not requeue after a CR is deleted, that would be the optimal approach. Have you considered this? If so, I'd be curious to know why we couldn't go this route.
@HumairAK Unfortunately, reading through the code, there is no such way: the SetupManager
still looks for the said resources, as they belong to the CR after deletion and are being watched by the controller as specified here (https://github.com/opendatahub-io/data-science-pipelines-operator/blob/main/controllers/dspipeline_controller.go#L507).
I tried using the predicates mentioned here (kubernetes-sigs/kubebuilder#1700 (comment)) to see if that helped filter out the resources that are already deleted, like the code snippet below, but that doesn't help, as mentioned in the Operator SDK doc:
"This is an implementation of the controller that simply filters Delete events on Pods that have been confirmed deleted; the controller receives all Delete events that occur, and we may only care about resources that have not been completely deleted."
func ignoreDeletionPredicate() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			// Ignore updates to CR status, in which case metadata.Generation does not change
			return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
		},
		DeleteFunc: func(e event.DeleteEvent) bool {
			// Evaluates to false if the object has been confirmed deleted.
			return !e.DeleteStateUnknown
		},
	}
}

func (r *DSPAReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&dspav1alpha1.DataSciencePipelinesApplication{}).
		Owns(&appsv1.Deployment{}).
		....
		....
		....
		WithEventFilter(ignoreDeletionPredicate()).
		Complete(r)
}
But reading through your comment: instead of creating a separate variable to store the flag in memory, is there any harm in storing a bool flag in the DataSciencePipelinesApplication struct itself, here (https://github.com/opendatahub-io/data-science-pipelines-operator/blob/main/api/v1alpha1/dspipeline_types.go#L404), and leveraging it to stop reconciliation in dspipeline_controller.go?
Well, this happens when the DSPA is deleted, so adding a bool to the DSPA is not going to help, I think.
Another thought I had was utilizing finalizers for this: have the DSPA only remove its finalizer when all pods are cleaned up. But I don't think it makes sense to block DSPA deletion on that; it would have an unwanted impact on deletion times in the UI.
Since we are already exiting reconcile early when we detect that the DSPA is not found, and not reporting an error, I'm thinking we just convert the "DSPA not found" log to a debug message and disable debug logs by default (we should be doing this anyway).
I would like to hear others' thoughts on this approach, though. @gregsheremeta or @gmfrasca, any thoughts?
Finalizers are the correct approach:
1. kube delete is called on the DSPA; deletionTimestamp is set.
2. The UI treats any DSPA with a deletionTimestamp as deleted; "Pipeline Server" is no longer rendered almost immediately.
3. Resources are cleaned up.
4. The resources covered by the finalizer are gone -> the finalizer is removed -> the DSPA is actually deleted from the kube apiserver.
I've never coded this myself, so I might have this a little imperfect, but this is roughly the pattern I've seen in production kube systems I've worked on.
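To make the lifecycle above concrete, here is a stdlib-only sketch of the finalizer pattern. A real implementation would operate on the DSPA's ObjectMeta via controller-runtime helpers such as controllerutil.ContainsFinalizer/RemoveFinalizer; the finalizer name, the obj type, and reconcileDeletion below are all hypothetical stand-ins:

```go
package main

import "fmt"

const dspaFinalizer = "datasciencepipelines.opendatahub.io/finalizer" // hypothetical name

// obj stands in for a DSPA's object metadata.
type obj struct {
	Finalizers        []string
	DeletionTimestamp string // empty until a delete is requested
}

func hasFinalizer(o *obj, f string) bool {
	for _, x := range o.Finalizers {
		if x == f {
			return true
		}
	}
	return false
}

func removeFinalizer(o *obj, f string) {
	kept := o.Finalizers[:0]
	for _, x := range o.Finalizers {
		if x != f {
			kept = append(kept, x)
		}
	}
	o.Finalizers = kept
}

// reconcileDeletion mirrors steps 1-4 above: once deletionTimestamp is set,
// wait until the owned pods are cleaned up, then remove the finalizer so the
// apiserver can actually delete the object.
func reconcileDeletion(o *obj, podsRemaining int) (deleted bool) {
	if o.DeletionTimestamp == "" || !hasFinalizer(o, dspaFinalizer) {
		return false // nothing to do: no delete requested, or already finalized
	}
	if podsRemaining > 0 {
		return false // step 3 still in progress; requeue and check again later
	}
	removeFinalizer(o, dspaFinalizer) // step 4: unblocks the real delete
	return true
}

func main() {
	o := &obj{Finalizers: []string{dspaFinalizer}, DeletionTimestamp: "2024-01-01T00:00:00Z"}
	fmt.Println(reconcileDeletion(o, 2)) // false: pods still terminating
	fmt.Println(reconcileDeletion(o, 0)) // true: finalizer removed, DSPA can be deleted
}
```

The stuck-deletion nightmare mentioned below is exactly the podsRemaining > 0 branch never clearing, which is why removing the finalizer must not depend on anything that can wedge.
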
"this is roughly the pattern I've seen in production kube systems I've worked on"
I do concede, though, that this was quite the nightmare, because things were constantly getting stuck.
Since we are already exiting reconcile early if we detect DSPA is not found and not reporting an error. I'm thinking we just convert the "DSPA not found" log to a debug, and just disable debug logs by default (we should be doing this anyway).
Ok, yeah, I'm sold. I guess this drastically changes this PR; sorry, Achyut :)
So, line 180 below, change it to roughly:
log.Debug("DSPA resource was not found, assuming it was recently deleted, nothing to do here")
and then disable debug logging in DSPO by default (probably in a different PR 😄)
Change to PR detected. A new PR build was completed.
Signed-off-by: Achyut Madhusudan <amadhusu@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: diegolovison. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing:
/lgtm
/verified
/hold
gonna split this up into separate tasks
/close
@gregsheremeta: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The issue resolved by this Pull Request:
Resolves RHOAIENG-1650
Description of your changes:
When deleting a DSPA, we continue to see these types of logs:
The first commit resolves this by disabling DSPO debug logs by default.
The second commit changes the "DSPA resource was not found" message from Info to Debug.
Testing instructions
Verify that
"DEBUG DSPA resource was not found"
does not appear.
Checklist