
bug: modelmesh controller logs errors when KServe and ModelMesh run in the same namespace #127

Open
vaibhavjainwiz opened this issue Dec 7, 2023 · 1 comment

@vaibhavjainwiz
Member

vaibhavjainwiz commented Dec 7, 2023

When KServe and ModelMesh run in the same namespace, the modelmesh controller logs the following errors:

{"level":"error","ts":"2023-12-07T11:33:47Z","msg":"Reconciler error","controller":"predictor","controllerGroup":"serving.kserve.io","controllerKind":"Predictor","Predictor":{"name":"caikit-tgis-example-isvc","namespace":"isvc_kserve-demo"},"namespace":"isvc_kserve-demo","name":"caikit-tgis-example-isvc","reconcileID":"868aa907-1733-408b-a8cd-482ac234f616","error":"failed to remove corresponding VModel for deleted Predictor kserve-demo/caikit-tgis-example-isvc: rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.128.0.84:8033: i/o timeout\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\ns...
{"level":"error","ts":"2023-12-07T11:33:47Z","msg":"Reconciler error","controller":"predictor","controllerGroup":"serving.kserve.io","controllerKind":"Predictor","Predictor":{"name":"example-onnx-mnist","namespace":"isvc_kserve-demo"},"namespace":"isvc_kserve-demo","name":"example-onnx-mnist","reconcileID":"4736d0d4-e010-4915-a537-07634c94d85f","error":"failed to SetVModel for InferenceService example-onnx-mnist: rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.128.0.84:8033: i/o timeout\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/contro...

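This happens because the kserve-demo namespace is a member of the ServiceMeshMemberRoll. OpenShift Service Mesh applies NetworkPolicies to member namespaces that, by default, only admit traffic from other mesh members, so the modelmesh-controller pod in the opendatahub namespace cannot reach the ModelMesh runtime pod and its gRPC calls to port 8033 time out. As a quick fix, the NetworkPolicy below can be created in the kserve-demo namespace to allow ingress traffic from the opendatahub namespace.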

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-opendatahub-ns
  namespace: kserve-demo
  labels:
    app.kubernetes.io/version: release-v1.9
    networking.knative.dev/ingress-provider: istio
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: opendatahub
  policyTypes:
    - Ingress
```
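For reference, mesh membership is declared in a ServiceMeshMemberRoll in the control-plane namespace. The sketch below shows roughly what a roll that puts kserve-demo in the mesh might look like; the control-plane namespace `istio-system` is an assumption not stated in this issue. Because opendatahub is not listed as a member here, traffic from it is not admitted by the mesh-managed policies, which is the gap the NetworkPolicy above closes.

```yaml
# Sketch only: the control-plane namespace (istio-system) is assumed,
# adjust it to the actual Service Mesh deployment.
apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
  # OpenShift Service Mesh expects the member roll to be named "default"
  name: default
  namespace: istio-system
spec:
  members:
    # Listed namespaces join the mesh; opendatahub is not a member here
    - kserve-demo
```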
@vaibhavjainwiz
Member Author

The following NetworkPolicy should be created to resolve this issue:

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-opendatahub-ns
  namespace: kserve-demo
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: opendatahub
  policyTypes:
    - Ingress
```
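If admitting all traffic from opendatahub is broader than needed, a narrower variant could limit the rule to the ModelMesh gRPC port the controller was dialing in the logs (8033). This is an untested sketch: the policy name is hypothetical, and if components in opendatahub need other ports in kserve-demo, those ports would have to be listed as well.

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  # Hypothetical name for this narrower variant
  name: allow-modelmesh-grpc-from-opendatahub
  namespace: kserve-demo
spec:
  # Selects every pod in kserve-demo, as in the policy above
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: opendatahub
      # Only admit the gRPC port seen in the controller's error logs
      ports:
        - protocol: TCP
          port: 8033
  policyTypes:
    - Ingress
```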

See the thread below for more details:
https://redhat-internal.slack.com/archives/C065ARTVA80/p1702293019814919?thread_ts=1701693652.733169&cid=C065ARTVA80

vaibhavjainwiz linked a pull request on Dec 12, 2023 that will close this issue
heyselbi moved this from New/Backlog to Under Review in ODH Model Serving Planning on Dec 12, 2023