applying tensorflow sample gives webhook failure #235

ryandawsonuk · 2019-07-09T13:55:26Z

If I try applying the default model directly with kubectl apply -f https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/tensorflow/tensorflow.yaml then I get:

Error from server (InternalError): error when creating "https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/tensorflow/tensorflow.yaml": Internal error occurred: failed calling webhook "kfservice.kfserving-webhook-server.defaulter": Post https://kfserving-webhook-server-service.kfserving-system.svc:443/mutate-kfservices?timeout=30s: no endpoints available for service "kfserving-webhook-server-service"

Note that if I take the other TF model from the canary example (gs://kfserving-samples/models/tensorflow/flowers-2) and apply just that one then it does let me apply the resource.

This is on v0.1.0. Related #233


rakelkar · 2019-07-09T15:34:27Z

I've seen this happen sometimes too. Maybe try killing the controller? We may want to consider moving the webhook into a separate container so it can be scaled for redundancy.

yuzisun · 2019-07-09T15:46:38Z

I agree that we should separate the webhook as its own deployment.

ellistarn · 2019-07-09T19:48:34Z

This typically happens when the controller hasn't stood up yet. Agreed that separating the webhook into its own deployment is the way to go.


ryandawsonuk · 2019-07-10T08:51:32Z

Drilling into the kfserving-controller-manager, I find it has this error in the logs:

{"level":"info","ts":1562748512.9474301,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kfserving-controller","worker count":1}
E0710 08:48:33.939917       1 runtime.go:69] Observed a panic: &errors.errorString{s:"Unable to unmarshall json string due to invalid character '\"' after object key:value pair "} (Unable to unmarshall json string due to invalid character '"' after object key:value pair )
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/kubeflow/kfserving/pkg/controller/kfservice/resources/knative/configuration.go:50
/go/src/github.com/kubeflow/kfserving/pkg/controller/kfservice/reconcilers/knative/configuration_reconciler.go:51
/go/src/github.com/kubeflow/kfserving/pkg/controller/kfservice/kfservice_controller.go:150
/go/src/github.com/kubeflow/kfserving/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/github.com/kubeflow/kfserving/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: Unable to unmarshall json string due to invalid character '"' after object key:value pair  [recovered]
        panic: Unable to unmarshall json string due to invalid character '"' after object key:value pair 

goroutine 188 [running]:
github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/github.com/kubeflow/kfserving/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x1189860, 0xc420782000)

It goes into CrashLoopBackOff and then every resource deployment in the cluster fails with that webhook issue (even ones not related to KFServing).
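The panic message in the log has the shape of a Go `encoding/json` syntax error, which is what the decoder reports when two key/value pairs in an object are not separated by a comma. The following is a minimal sketch and not the KFServing code: `parseConfig` is a hypothetical helper and the JSON values are made up. It shows a string of that kind failing to decode, and the failure being returned as an error instead of raised as a panic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseConfig decodes a JSON object of string values. Malformed input is
// reported as an error to the caller rather than causing a panic.
func parseConfig(raw string) (map[string]string, error) {
	cfg := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, fmt.Errorf("unable to unmarshal json string: %w", err)
	}
	return cfg, nil
}

func main() {
	// Hypothetical value: the comma between the two pairs is missing.
	bad := `{"image": "tensorflow/serving" "version": "1.13"}`
	if _, err := parseConfig(bad); err != nil {
		fmt.Println("error:", err)
	}

	// The same value with the comma restored decodes cleanly.
	good := `{"image": "tensorflow/serving", "version": "1.13"}`
	cfg, err := parseConfig(good)
	fmt.Println(cfg["image"], err)
}
```

A reconcile loop that receives an error like this can log it and requeue, which keeps the process, and any webhook served from it, running while the configuration is corrected.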

ryandawsonuk · 2019-07-10T08:55:35Z

I believe this would be fixed by #231

ryandawsonuk · 2019-07-10T09:54:54Z

Also I couldn't fix my cluster by running kubectl delete -f https://raw.githubusercontent.com/kubeflow/kfserving/master/install/v0.1.0/kfserving.yaml. I still get failed to create resource: Internal error occurred: failed calling webhook "kfservice.kfserving-webhook-server.deployment-mutator" on deploying any resources, even though I don't even have a kf-serving namespace anymore.

I had to run kubectl delete mutatingwebhookconfigurations kfservice.serving.kubeflow.org manually to be able to deploy resources again.

ellistarn · 2019-07-10T22:15:01Z

I think this is due to the configmap error. Unless there's something else we can do, I'm going to
/close

k8s-ci-robot · 2019-07-10T22:15:03Z

@ellis-bigelow: Closing this issue.


abrarmajeedi · 2020-07-14T23:20:45Z

Try this, it worked for me:
kubectl delete mutatingwebhookconfigurations inferenceservice.serving.kubeflow.org && kubectl delete validatingwebhookconfigurations inferenceservice.serving.kubeflow.org && kubectl delete po kfserving-controller-manager-0 -n kfserving-system

issue-label-bot · 2020-07-14T23:20:52Z

Issue-Label Bot is automatically applying the labels:

Label	Probability
kind/bug	0.91



k8s-ci-robot closed this as completed Jul 10, 2019

ryandawsonuk mentioned this issue Sep 23, 2019

Internal error occurred: failed calling webhook SeldonIO/seldon-core#877

Closed

AGKhalil mentioned this issue Nov 26, 2019

Kubeflow StatefulSets and Deployments fail to start #580

Closed

issue-label-bot bot added the kind/bug label Jul 14, 2020

hdefazio pushed a commit to hdefazio/kserve that referenced this issue Dec 10, 2024

Merge pull request kserve#235 from spolti/RHOAIENG-3381

298fe40

[Cherry-Pick] Python vulnerability fixes (kserve#3441)
