-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mr): re-enable model registry reconciler and add svc annotation to it #369
base: incubating
Are you sure you want to change the base?
feat(mr): re-enable model registry reconciler and add svc annotation to it #369
Conversation
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Al-Pragliola The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
4805b5a
to
4f92815
Compare
internal/controller/serving/reconcilers/model_registry_reconciler.go
Outdated
Show resolved
Hide resolved
@gabemontero Does this need to be in 2.18? If so, we need to land this today |
I'll defer to @Al-Pragliola on what versions it needs to go into .... I'm still a bit new to ODH, but I was informed we were talking about getting this in 2.24 (which is about to go out) or 2.24.x / 2.25 @hdefazio ; apologies based on 2.24 I'm not sure where putting in 2.18 comes from. |
@gabemontero No worries - I was talking about RHOAI 2.18, which has a code freeze today so you would need to cherry pick this in a few places. But we can totally wait for ODH 2.25, which will be released in about a week (and therefore it would end up in RHOAI 2.20 based on how we do releases in the team) |
thanks for the clarification @hdefazio ... and fwiw @Al-Pragliola just clarified it for me in slack as well :-) fwiw absolutely long term my team (the Backstage.io based project RHDH (Developer Hub) will want to prereq RHOAI and not ODH but we'll work with @Al-Pragliola and team to determine when it is appropriate to do that |
errGetDSC = errors.New("failed to get DataScienceCluster") | ||
errGetMRNamespaceFromDSC = errors.New("failed to get Model Registry Namespace from DataScienceCluster") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the only usage to wrap it is to print this customized message below, does it need to be an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not needed, but any errors from this function will be joined into the parent function:
if err != nil {
return reconcileResult, errors.Join(reconcileErr, err)
}
So I think it's more future proof to extract them as errors to have the ability to customize the handling with errors.Is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. just try to understand that
if len(dscList.Items) > 0 { | ||
isNsOk := false | ||
dsc = dscList.Items[0] | ||
|
||
ns, found, err := unstructured.NestedFieldCopy(dsc.Object, "spec", "components", "modelRegistry", "registriesNamespace") | ||
if err == nil && found { | ||
mrNamespaceFromDSC = ns.(string) | ||
ns, found, err := unstructured.NestedFieldCopy(dsc.Object, "spec", "components", "modelregistry", "registriesNamespace") | ||
if err != nil || !found { | ||
return nil, fmt.Errorf("%w: %w", errGetMRNamespaceFromDSC, err) | ||
} | ||
|
||
mrNamespaceFromDSC, isNsOk = ns.(string) | ||
if !isNsOk { | ||
return nil, fmt.Errorf("%w: invalid namespace", errGetMRNamespaceFromDSC) | ||
} | ||
} | ||
|
||
if mrNamespaceFromDSC == "" { | ||
return nil, fmt.Errorf("%w: empty namespace", errGetMRNamespaceFromDSC) | ||
} | ||
|
||
log.Info("Model Registry Namespace from DataScienceCluster", "Namespace", mrNamespaceFromDSC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits:
DSC should always be singleton (checks are done in operator), modelreg namespace will alwasy be set in DSC even it is set to Removed, or user explictility remove namespace when they create DSC
if really need , a check on len == 0 could be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I don't think "local" code should be bound to the logic of external code, as Items is an array, it may be strict to check for length, but I sleep better knowing I'm not blindly going with .Items[0], it's more robust and really doesn't require much effort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, did not mean to get the len check away, but might be good to have a check on == 0 as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean on dscList.Items
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, you already have
if len(dscList.Items) > 0 { // positive case} but it should be len(dscList) == 1
and the negatvie should cover == 0 || >1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks for the suggestion
config/manager/manager.yaml
Outdated
@@ -63,6 +63,7 @@ spec: | |||
args: | |||
- --leader-elect | |||
- --health-probe-bind-address=:8081 | |||
- --model-registry-inference-reconcile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do i understand this correctly:
with this set --model-registry-inference-reconicle
it will always have "true
"
thus NewModelRegistryInferenceServiceReconciler
always gets called, even ModelReg is Removed in DSC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's correct, do you think it's a blocker for merging this PR? cc @spolti
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont take my words as gating your PR if it is approved by serving guys.
i was more on the thinking of, we have built a cross-component depdency between modelreg and odh-model-controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoiler alert - it's a work in progress to extract this controller and put it under the wings of the model registry (upstream)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shocks taken- if that is already on the agenda, good to let us know asap, so we can plan it from ourside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah as soon as community approves we will let you know 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's correct, do you think it's a blocker for merging this PR?
I think it is a blocker for merging if odh-model-controller is going to fail when MR is disabled in the DSC. E.g. can the registries namespace be empty when disabled?
For developing it is OK to have it always on, we can ignore any errors. Not that much on the releases.
Also, not all users like unused features to be always on (i.e. NIM integration was put under a feature gate because of this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your feedback! I have moved the controller under a feature gate like NIM in the latest commit, please let me know if this is the right approach @israel-hdez @spolti @zdtsw
I also added the needed PR in opendatahub-operator opendatahub-io/opendatahub-operator#1682
651d7c9
to
ef82fa3
Compare
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
331ac04
to
97720b8
Compare
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
Signed-off-by: Alessio Pragliola <seth.pro@gmail.com>
return nil, fmt.Errorf("%w: %w", errGetDSC, err) | ||
} | ||
|
||
if len(dscList.Items) == 0 || len(dscList.Items) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the opendatahub-operator already checks it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it does, for reference of this change a link to the discussion with @zdtsw #369 (comment)
if err == nil && found { | ||
mrNamespaceFromDSC = ns.(string) | ||
} | ||
if mrNamespaceFromDSC == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to move this if together with the previous, wdyt?
} | ||
|
||
log.Info("Model Registry Namespace from DataScienceCluster", "Namespace", mrNamespaceFromDSC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the context of this message to log that MR was enabled?
Description
Requires opendatahub-io/opendatahub-operator#1682
This PR aims to re-enable the model registry inference reconciler, which was disabled by default after the recent refactoring #324, and add an additional parameter to the reconciler function.
Changes:
config/base/odh_model_controller_manager_patch.yaml
file as it is no longer used--model-registry-inference-reconcile
to the manager manifestModelRegistryServiceAnnotation
github.com/kubeflow/model-registry
How Has This Been Tested?
Prerequisites:
oc patch deployment odh-model-controller -n opendatahub --patch='{"spec": {"template": {"spec":{"containers": [{"name": "manager", "image": "image-registry.openshift-image-registry.svc:5000/opendatahub/odh-model-controller:latest", "args":["--metrics-bind-address=:8080", "--leader-elect", "--health-probe-bind-address=:8081", "--model-registry-inference-reconcile"]}]}}}}'
Wait or watch odh-model-controller logs and then check the inferenceservice labels, a label called
modelregistry.opendatahub.io/inference-service-id
should have been addedRestore opendatahub operator replicas
Merge criteria: