add temp Job to delete old monitoring stack #146
If I understand opendatahub-io/opendatahub-operator#703 correctly, ownership of the components that no longer exist belongs to the opendatahub-operator. So, I suggest moving this logic (either in the same way or in another equivalent form) to the opendatahub-operator.
The odh-model-controller should not be responsible for fixing install or upgrade issues.
347a4e9 to 71e2aa0
I'd say that since the model monitoring stack was a component specific to model serving, model serving related repos should own the logic to also delete it during upgrade.
Looking at this as working groups, I agree with this statement. However, I'm more on the thinking that the component that created the resources is the one that should do the cleanup. Model serving WG is still the owner, but the implementation should be on

Furthermore, we should guarantee clean-up even if the serving stack is fully disabled in any given cluster. In such case, if resources went stale for any reason, moving/running this code at the operator is what makes the most sense.

Well, I'm going to be strict in that fixing installation issues is not a concern of this repository/component. I'll tag other people to review and if they agree that this is the good repo to host the fix, I'll switch my review to an approval: @terrytangyuan @Jooho @danielezonca.
Signed-off-by: Vedant Mahabaleshwarkar <vmahabal@redhat.com>
71e2aa0 to 1b1f48a
Discussed offline with @Jooho, @terrytangyuan and @israel-hdez; for now we're OK with having this PR in this repo.
@israel-hdez @VedantMahabaleshwarkar @Jooho I manually tested on an OSD cluster v4.12 with the RC build, and all tests passed.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: heyselbi, VedantMahabaleshwarkar
Discussed Edgar's comment offline, and Selbi has reviewed instead.
50c596a
into
opendatahub-io:main
…)" This reverts commit 50c596a. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Fixes:
https://issues.redhat.com/browse/RHOAIENG-293
Context:
The model-monitoring stack was deprecated in RHODS 2.5. This means that starting from 2.5, the RHODS Operator does not create the resources for model-monitoring during new installs. However, for upgrades from 2.4, the existing resources are not deleted. This PR adds a Job for RHODS 2.6 that deletes the resources from the model-monitoring stack that are no longer needed.
Resources being deleted:
rhods-prometheus-operator
prometheus-rhods-model-monitoring
modelmesh-federated-metrics
Note: This Job is a temporary addition for RHODS 2.6 ONLY, and will be reverted in 2.7 as it will not be needed anymore.
PR changes:
Testing Instructions:
Prerequisites
Instructions to install RC builds on clusters (doesn't work on ROSA):
Testing Steps
1. Spin down the RHODS Operator deployment to 0 in NS redhat-ods-operator
2. Modify role odh-model-controller-role in NS redhat-ods-applications as per the rbac changes in the PR:
   - get, list, delete permissions in apiGroup apps for resources deployments, statefulsets
   - get permissions for apiGroup project.openshift.io for resources projects
3. Manually create a Job from the remove-deprecated-monitoring.yaml file in the PR
4. Manually verify the following resources no longer exist in the cluster in the NS redhat-ods-monitoring:
   - rhods-prometheus-operator
   - prometheus-rhods-model-monitoring
   - modelmesh-federated-metrics
5. Success! :)
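For reference, the RBAC change described in the testing steps might look roughly like the fragment below. This is a sketch reconstructed only from the permissions listed above, not a copy of the PR's actual rbac manifest: the apiGroups, resources and verbs come from the steps, while the layout as a `rules` list is an assumption. The entries are meant to be appended to the existing rules of odh-model-controller-role in redhat-ods-applications, not to replace them.

```yaml
# Assumed shape of the extra rules for odh-model-controller-role
# (namespace: redhat-ods-applications). Append to the role's existing
# `rules` list; do not overwrite the rules that are already there.
rules:
  # Lets the cleanup Job look up and delete the deprecated
  # model-monitoring workloads.
  - apiGroups:
      - apps
    resources:
      - deployments
      - statefulsets
    verbs:
      - get
      - list
      - delete
  # Lets the cleanup Job read OpenShift projects.
  - apiGroups:
      - project.openshift.io
    resources:
      - projects
    verbs:
      - get
```

Keeping the verbs limited to get, list and delete on two resource kinds keeps the temporary Job's footprint small, which fits the plan to revert it in 2.7.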
The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work