applying tensorflow sample gives webhook failure #235

Closed
ryandawsonuk opened this issue Jul 9, 2019 · 10 comments

@ryandawsonuk
Contributor

ryandawsonuk commented Jul 9, 2019

If I try applying the default model directly with kubectl apply -f https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/tensorflow/tensorflow.yaml then I get:

Error from server (InternalError): error when creating "https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/tensorflow/tensorflow.yaml": Internal error occurred: failed calling webhook "kfservice.kfserving-webhook-server.defaulter": Post https://kfserving-webhook-server-service.kfserving-system.svc:443/mutate-kfservices?timeout=30s: no endpoints available for service "kfserving-webhook-server-service"

Note that if I take the other TF model from the canary example (gs://kfserving-samples/models/tensorflow/flowers-2) and apply just that one then it does let me apply the resource.

This is on v0.1.0. Related: #233.

@rakelkar
Contributor

rakelkar commented Jul 9, 2019

I've seen this happen sometimes too. Maybe try killing the controller? We may want to consider moving the webhook into its own container so that it can be scaled for redundancy.

@yuzisun
Member

yuzisun commented Jul 9, 2019

I agree that we should separate the webhook as its own deployment.

@ellistarn
Contributor

ellistarn commented Jul 9, 2019 via email

@ryandawsonuk
Contributor Author

ryandawsonuk commented Jul 10, 2019

Drilling into the kfserving-controller-manager, I find it has this error in the logs:

{"level":"info","ts":1562748512.9474301,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kfserving-controller","worker count":1}
E0710 08:48:33.939917       1 runtime.go:69] Observed a panic: &errors.errorString{s:"Unable to unmarshall json string due to invalid character '\"' after object key:value pair "} (Unable to unmarshall json string due to invalid character '"' after object key:value pair )
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/kubeflow/kfserving/pkg/controller/kfservice/resources/knative/configuration.go:50
/go/src/github.com/kubeflow/kfserving/pkg/controller/kfservice/reconcilers/knative/configuration_reconciler.go:51
/go/src/github.com/kubeflow/kfserving/pkg/controller/kfservice/kfservice_controller.go:150
/go/src/github.com/kubeflow/kfserving/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/github.com/kubeflow/kfserving/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: Unable to unmarshall json string due to invalid character '"' after object key:value pair  [recovered]
        panic: Unable to unmarshall json string due to invalid character '"' after object key:value pair 

goroutine 188 [running]:
github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x1189860, 0xc420782000)

It goes into CrashLoopBackOff and then every resource deployment in the cluster fails with that webhook issue (even ones not related to KFServing).
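
For anyone trying to reproduce the message in the panic above: Go's encoding/json reports exactly this "after object key:value pair" error when two key/value pairs are not separated by a comma. The following is only a minimal sketch, not the KFServing code. The `frameworkConfig` type, the `parseConfig` helper and the sample JSON strings are hypothetical, and it assumes the malformed JSON comes from a config value the controller reads at reconcile time. It shows the error being returned to the caller instead of triggering a panic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// frameworkConfig is a hypothetical stand-in for settings the controller
// might read from its config; the real KFServing types may differ.
type frameworkConfig struct {
	Image string `json:"image"`
	Tag   string `json:"tag"`
}

// parseConfig returns an error instead of panicking, so a caller can
// report the bad input and keep the process running.
func parseConfig(raw string) (*frameworkConfig, error) {
	cfg := &frameworkConfig{}
	if err := json.Unmarshal([]byte(raw), cfg); err != nil {
		return nil, fmt.Errorf("unable to unmarshal json string due to %v", err)
	}
	return cfg, nil
}

func main() {
	// Assumed example of a typo: the comma between the two pairs is missing.
	bad := `{"image": "tensorflow/serving" "tag": "1.13.0"}`
	if _, err := parseConfig(bad); err != nil {
		fmt.Println("config error:", err)
	}

	// The same content with the comma restored parses cleanly.
	good := `{"image": "tensorflow/serving", "tag": "1.13.0"}`
	if cfg, err := parseConfig(good); err == nil {
		fmt.Println("parsed:", cfg.Image, cfg.Tag)
	}
}
```

If the controller handled the error this way, a bad config value would presumably surface as a failed reconcile rather than a CrashLoopBackOff that takes the webhook endpoint down with it.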

@ryandawsonuk
Contributor Author

I believe this would be fixed by #231

@ryandawsonuk
Contributor Author

ryandawsonuk commented Jul 10, 2019

Also, I couldn't fix my cluster by running kubectl delete -f https://raw.githubusercontent.com/kubeflow/kfserving/master/install/v0.1.0/kfserving.yaml. Deploying any resource still fails with: failed to create resource: Internal error occurred: failed calling webhook "kfservice.kfserving-webhook-server.deployment-mutator", even though I no longer have a kf-serving namespace.

I had to run kubectl delete mutatingwebhookconfigurations kfservice.serving.kubeflow.org manually to be able to deploy resources again.

@ellistarn
Contributor

I think this is due to the configmap error. Unless there's something else we can do, I'm going to
/close

@k8s-ci-robot
Contributor

@ellis-bigelow: Closing this issue.


@abrarmajeedi

abrarmajeedi commented Jul 14, 2020

Try this, it worked for me:
kubectl delete mutatingwebhookconfigurations inferenceservice.serving.kubeflow.org && kubectl delete validatingwebhookconfigurations inferenceservice.serving.kubeflow.org && kubectl delete po kfserving-controller-manager-0 -n kfserving-system

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/bug 0.91

