Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(mr): re-enable model registry reconciler and add svc annotation to it #369

Open
wants to merge 3 commits into
base: incubating
Choose a base branch
from

Conversation

Al-Pragliola
Copy link
Contributor

@Al-Pragliola Al-Pragliola commented Feb 7, 2025

Description

Requires opendatahub-io/opendatahub-operator#1682

This PR aims to re-enable the model registry inference reconciler, which was disabled by default after the recent refactoring #324, and add an additional parameter to the reconciler function.

Changes:

  • removed config/base/odh_model_controller_manager_patch.yaml file as it is no longer used
  • added --model-registry-inference-reconcile to the manager manifest
  • added ModelRegistryServiceAnnotation
  • updated dep to github.com/kubeflow/model-registry
  • fixed dsc mr namespace retrieval function

How Has This Been Tested?

Prerequisites:

  • a model registry created using odh
  • a model version deployed

  • Enable image registry in the openshift cluster:
# enable default route for openshift image registry
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}'
# get image registry host
export REGISTRY=`oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}'`
# login to openshift registry
docker login -u $(oc whoami) -p $(oc whoami -t) $REGISTRY
  • Build and push the registry:
docker build . -f ./Containerfile -t ${REGISTRY}/opendatahub/odh-model-controller:latest
docker push ${REGISTRY}/opendatahub/odh-model-controller:latest
  • Scale down opendatahub operator to prevent reconciliation
oc scale deployment --replicas=0 -n openshift-operators opendatahub-operator-controller-manager
  • Patch the image in the odh-model-controller's deployment:
oc patch deployment odh-model-controller -n opendatahub --patch='{"spec": {"template": {"spec":{"containers": [{"name": "manager", "image": "image-registry.openshift-image-registry.svc:5000/opendatahub/odh-model-controller:latest", "args":["--metrics-bind-address=:8080", "--leader-elect", "--health-probe-bind-address=:8081", "--model-registry-inference-reconcile"]}]}}}}'
  • Add name label to the inferenceservice
oc label inferenceservices.serving.kserve.io -n inferenceservice-namespace inferenceservice-name modelregistry.opendatahub.io/name=MODEL-REGISTRY-NAME
  • Wait or watch odh-model-controller logs and then check the inferenceservice labels, a label called modelregistry.opendatahub.io/inference-service-id should have been added

  • Restore opendatahub operator replicas

oc scale deployment --replicas=1 -n openshift-operators opendatahub-operator-controller-manager

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link
Contributor

openshift-ci bot commented Feb 7, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Feb 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Al-Pragliola
Once this PR has been reviewed and has the lgtm label, please assign hdefazio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Al-Pragliola Al-Pragliola self-assigned this Feb 7, 2025
@Al-Pragliola Al-Pragliola changed the title feat(mr): re-enable post refactor and add svc annotation feat(mr): re-enable model registry reconciler and add svc annotation to it Feb 7, 2025
@Al-Pragliola Al-Pragliola marked this pull request as ready for review February 10, 2025 14:56
@Al-Pragliola Al-Pragliola force-pushed the al-pragliola-add-mr-svc-annotation branch 2 times, most recently from 4805b5a to 4f92815 Compare February 11, 2025 09:26
@Al-Pragliola
Copy link
Contributor Author

/cc @tarilabs @pboyd @dhirajsb

@hdefazio
Copy link
Contributor

@gabemontero Does this need to be in 2.18? If so, we need to land this today

@gabemontero
Copy link

@gabemontero Does this need to be in 2.18? If so, we need to land this today

I'll defer to @Al-Pragliola on what versions it needs to go into .... I'm still a bit new to ODH, but I was informed we were talking about getting this in 2.24 (which is about to go out) or 2.24.x / 2.25 @hdefazio ; apologies based on 2.24 I'm not sure where putting in 2.18 comes from.

@hdefazio
Copy link
Contributor

@gabemontero No worries - I was talking about RHOAI 2.18, which has a code freeze today so you would need to cherry pick this in a few places. But we can totally wait for ODH 2.25, which will be released in about a week (and therefore it would end up in RHOAI 2.20 based on how we do releases in the team)

@gabemontero
Copy link

@gabemontero No worries - I was talking about RHOAI 2.18, which has a code freeze today so you would need to cherry pick this in a few places. But we can totally wait for ODH 2.25, which will be released in about a week (and therefore it would end up in RHOAI 2.20 based on how we do releases in the team)

thanks for the clarification @hdefazio ... and fwiw @Al-Pragliola just clarified it for me in slack as well :-)

fwiw absolutely long term my team (the Backstage.io based project RHDH (Developer Hub) will want to prereq RHOAI and not ODH but we'll work with @Al-Pragliola and team to determine when it is appropriate to do that

@Al-Pragliola Al-Pragliola requested a review from spolti February 17, 2025 13:43
Comment on lines +18 to +20
errGetDSC = errors.New("failed to get DataScienceCluster")
errGetMRNamespaceFromDSC = errors.New("failed to get Model Registry Namespace from DataScienceCluster")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the only usage to wrap it is to print this customized message below, does it need to be an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not needed, but any errors from this function will be joined into the parent function:

		if err != nil {
			return reconcileResult, errors.Join(reconcileErr, err)
		}

So I think it's more future proof to extract them as errors to have the ability to customize the handling with errors.Is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. just try to understand that

Comment on lines 36 to 54
if len(dscList.Items) > 0 {
isNsOk := false
dsc = dscList.Items[0]

ns, found, err := unstructured.NestedFieldCopy(dsc.Object, "spec", "components", "modelRegistry", "registriesNamespace")
if err == nil && found {
mrNamespaceFromDSC = ns.(string)
ns, found, err := unstructured.NestedFieldCopy(dsc.Object, "spec", "components", "modelregistry", "registriesNamespace")
if err != nil || !found {
return nil, fmt.Errorf("%w: %w", errGetMRNamespaceFromDSC, err)
}

mrNamespaceFromDSC, isNsOk = ns.(string)
if !isNsOk {
return nil, fmt.Errorf("%w: invalid namespace", errGetMRNamespaceFromDSC)
}
}

if mrNamespaceFromDSC == "" {
return nil, fmt.Errorf("%w: empty namespace", errGetMRNamespaceFromDSC)
}

log.Info("Model Registry Namespace from DataScienceCluster", "Namespace", mrNamespaceFromDSC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits:
DSC should always be singleton (checks are done in operator), modelreg namespace will alwasy be set in DSC even it is set to Removed, or user explictility remove namespace when they create DSC

if really need , a check on len == 0 could be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I don't think "local" code should be bound to the logic of external code, as Items is an array, it may be strict to check for length, but I sleep better knowing I'm not blindly going with .Items[0], it's more robust and really doesn't require much effort

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, did not mean to get the len check away, but might be good to have a check on == 0 as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean on dscList.Items ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you already have
if len(dscList.Items) > 0 { // positive case} but it should be len(dscList) == 1
and the negatvie should cover == 0 || >1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks for the suggestion

@@ -63,6 +63,7 @@ spec:
args:
- --leader-elect
- --health-probe-bind-address=:8081
- --model-registry-inference-reconcile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do i understand this correctly:
with this set --model-registry-inference-reconicle it will always have "true"
thus NewModelRegistryInferenceServiceReconciler always gets called, even ModelReg is Removed in DSC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct, do you think it's a blocker for merging this PR? cc @spolti

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont take my words as gating your PR if it is approved by serving guys.
i was more on the thinking of, we have built a cross-component depdency between modelreg and odh-model-controller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoiler alert - it's a work in progress to extract this controller and put it under the wings of the model registry (upstream)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shocks taken- if that is already on the agenda, good to let us know asap, so we can plan it from ourside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah as soon as community approves we will let you know 👍🏼

Copy link
Contributor

@israel-hdez israel-hdez Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct, do you think it's a blocker for merging this PR?

I think it is a blocker for merging if odh-model-controller is going to fail when MR is disabled in the DSC. E.g. can the registries namespace be empty when disabled?

For developing it is OK to have it always on, we can ignore any errors. Not that much on the releases.

Also, not all users like unused features to be always on (i.e. NIM integration was put under a feature gate because of this)

Copy link
Contributor Author

@Al-Pragliola Al-Pragliola Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your feedback! I have moved the controller under a feature gate like NIM in the latest commit, please let me know if this is the right approach @israel-hdez @spolti @zdtsw

I also added the needed PR in opendatahub-operator opendatahub-io/opendatahub-operator#1682

@Al-Pragliola Al-Pragliola force-pushed the al-pragliola-add-mr-svc-annotation branch from 651d7c9 to ef82fa3 Compare February 19, 2025 14:54
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
@Al-Pragliola Al-Pragliola force-pushed the al-pragliola-add-mr-svc-annotation branch from 331ac04 to 97720b8 Compare February 19, 2025 18:43
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
return nil, fmt.Errorf("%w: %w", errGetDSC, err)
}

if len(dscList.Items) == 0 || len(dscList.Items) > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the opendatahub-operator already checks it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it does, for reference of this change a link to the discussion with @zdtsw #369 (comment)

if err == nil && found {
mrNamespaceFromDSC = ns.(string)
}
if mrNamespaceFromDSC == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to move this if together with the previous, wdyt?

}

log.Info("Model Registry Namespace from DataScienceCluster", "Namespace", mrNamespaceFromDSC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the context of this message to log that MR was enabled?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants