Kubeflow StatefulSets and Deployments fail to start #580

Closed
AGKhalil opened this issue Nov 26, 2019 · 3 comments

@AGKhalil

/kind bug

I've been trying to get Kubeflow to work on a local k8s cluster without success. Interestingly, this problem only occurs when I delete and reapply my Kubeflow config. If I completely remove my cluster and do a fresh install, Kubeflow works just fine. But once I delete Kubeflow and start it again, I get the following:

kubectl get all -n kubeflow
NAME                                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/admission-webhook-service                      ClusterIP   10.102.247.165   <none>        443/TCP             2m7s
service/application-controller-service                 ClusterIP   10.99.242.247    <none>        443/TCP             2m8s
service/argo-ui                                        NodePort    10.109.162.157   <none>        80:31308/TCP        2m8s
service/centraldashboard                               ClusterIP   10.97.140.39     <none>        80/TCP              2m7s
service/jupyter-web-app-service                        ClusterIP   10.101.71.122    <none>        80/TCP              2m7s
service/katib-controller                               ClusterIP   10.99.193.230    <none>        443/TCP             2m4s
service/katib-db                                       ClusterIP   10.96.18.211     <none>        3306/TCP            2m4s
service/katib-manager                                  ClusterIP   10.101.50.97     <none>        6789/TCP            2m4s
service/katib-ui                                       ClusterIP   10.100.172.154   <none>        80/TCP              2m4s
service/kfserving-controller-manager-metrics-service   ClusterIP   10.101.59.198    <none>        8443/TCP            2m5s
service/kfserving-controller-manager-service           ClusterIP   10.107.170.76    <none>        443/TCP             2m5s
service/metadata-db                                    ClusterIP   10.107.17.104    <none>        3306/TCP            2m7s
service/metadata-envoy-service                         ClusterIP   10.98.220.136    <none>        9090/TCP            2m7s
service/metadata-grpc-service                          ClusterIP   10.103.68.246    <none>        8080/TCP            2m7s
service/metadata-service                               ClusterIP   10.105.210.236   <none>        8080/TCP            2m6s
service/metadata-ui                                    ClusterIP   10.106.243.214   <none>        80/TCP              2m6s
service/minio-service                                  ClusterIP   10.103.33.32     <none>        9000/TCP            2m3s
service/ml-pipeline                                    ClusterIP   10.104.150.93    <none>        8888/TCP,8887/TCP   2m4s
service/ml-pipeline-ml-pipeline-visualizationserver    ClusterIP   10.102.80.25     <none>        8888/TCP            2m2s
service/ml-pipeline-tensorboard-ui                     ClusterIP   10.96.155.180    <none>        80/TCP              2m3s
service/ml-pipeline-ui                                 ClusterIP   10.108.171.200   <none>        80/TCP              2m3s
service/mysql                                          ClusterIP   10.101.151.255   <none>        3306/TCP            2m3s
service/notebook-controller-service                    ClusterIP   10.104.15.92     <none>        443/TCP             2m6s
service/profiles-kfam                                  ClusterIP   10.111.77.191    <none>        8081/TCP            2m2s
service/pytorch-operator                               ClusterIP   10.100.118.227   <none>        8443/TCP            2m6s
service/seldon-operator-controller-manager-service     ClusterIP   10.111.108.240   <none>        443/TCP             2m1s
service/tensorboard                                    ClusterIP   10.105.176.219   <none>        9000/TCP            2m5s
service/tf-job-operator                                ClusterIP   10.100.165.99    <none>        8443/TCP            2m4s
service/webhook-server-service                         ClusterIP   10.104.224.25    <none>        443/TCP             2m1s

NAME                                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/admission-webhook-deployment                  0/1     0            0           2m7s
deployment.apps/argo-ui                                       0/1     0            0           2m8s
deployment.apps/centraldashboard                              0/1     0            0           2m7s
deployment.apps/jupyter-web-app-deployment                    0/1     0            0           2m7s
deployment.apps/katib-controller                              0/1     0            0           2m4s
deployment.apps/katib-db                                      0/1     0            0           2m4s
deployment.apps/katib-manager                                 0/1     0            0           2m4s
deployment.apps/katib-ui                                      0/1     0            0           2m4s
deployment.apps/metadata-db                                   0/1     0            0           2m6s
deployment.apps/metadata-deployment                           0/1     0            0           2m6s
deployment.apps/metadata-envoy-deployment                     0/1     0            0           2m6s
deployment.apps/metadata-grpc-deployment                      0/1     0            0           2m6s
deployment.apps/metadata-ui                                   0/1     0            0           2m6s
deployment.apps/minio                                         0/1     0            0           2m3s
deployment.apps/ml-pipeline                                   0/1     0            0           2m4s
deployment.apps/ml-pipeline-ml-pipeline-visualizationserver   0/1     0            0           2m2s
deployment.apps/ml-pipeline-persistenceagent                  0/1     0            0           2m3s
deployment.apps/ml-pipeline-scheduledworkflow                 0/1     0            0           2m2s
deployment.apps/ml-pipeline-ui                                0/1     0            0           2m3s
deployment.apps/ml-pipeline-viewer-controller-deployment      0/1     0            0           2m3s
deployment.apps/mysql                                         0/1     0            0           2m3s
deployment.apps/notebook-controller-deployment                0/1     0            0           2m6s
deployment.apps/profiles-deployment                           0/1     0            0           2m2s
deployment.apps/pytorch-operator                              0/1     0            0           2m6s
deployment.apps/spartakus-volunteer                           0/1     0            0           2m5s
deployment.apps/tensorboard                                   0/1     0            0           2m5s
deployment.apps/tf-job-operator                               0/1     0            0           2m4s
deployment.apps/workflow-controller                           0/1     0            0           2m8s

NAME                                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/admission-webhook-deployment-78d899bf68                  1         0         0       2m7s
replicaset.apps/argo-ui-55b859f7d7                                       1         0         0       2m8s
replicaset.apps/centraldashboard-75474d6f94                              1         0         0       2m7s
replicaset.apps/jupyter-web-app-deployment-6c8f4c8997                    1         0         0       2m7s
replicaset.apps/katib-controller-85995d4f5f                              1         0         0       2m4s
replicaset.apps/katib-db-6558656ffc                                      1         0         0       2m4s
replicaset.apps/katib-manager-79f78fc794                                 1         0         0       2m4s
replicaset.apps/katib-ui-fcdb8f8c5                                       1         0         0       2m4s
replicaset.apps/metadata-db-5dd459cc                                     1         0         0       2m6s
replicaset.apps/metadata-deployment-b745d8bcf                            1         0         0       2m6s
replicaset.apps/metadata-envoy-deployment-7ccf5c4f74                     1         0         0       2m6s
replicaset.apps/metadata-grpc-deployment-6496f66c8c                      1         0         0       2m6s
replicaset.apps/metadata-ui-78f5b59b56                                   1         0         0       2m6s
replicaset.apps/minio-6f48db9cc4                                         1         0         0       2m3s
replicaset.apps/ml-pipeline-844645fd                                     1         0         0       2m4s
replicaset.apps/ml-pipeline-ml-pipeline-visualizationserver-865894f5f7   1         0         0       2m2s
replicaset.apps/ml-pipeline-persistenceagent-66f89b56d9                  1         0         0       2m3s
replicaset.apps/ml-pipeline-scheduledworkflow-57445ddf88                 1         0         0       2m2s
replicaset.apps/ml-pipeline-ui-5c64b6c666                                1         0         0       2m3s
replicaset.apps/ml-pipeline-viewer-controller-deployment-7cc8d77468      1         0         0       2m3s
replicaset.apps/mysql-749f87bff5                                         1         0         0       2m3s
replicaset.apps/notebook-controller-deployment-6c887454f7                1         0         0       2m6s
replicaset.apps/profiles-deployment-bd576fd8f                            1         0         0       2m2s
replicaset.apps/pytorch-operator-84c58df794                              1         0         0       2m6s
replicaset.apps/spartakus-volunteer-9768df654                            1         0         0       2m5s
replicaset.apps/tensorboard-6544748d94                                   1         0         0       2m5s
replicaset.apps/tf-job-operator-db676465c                                1         0         0       2m4s
replicaset.apps/workflow-controller-676484d796                           1         0         0       2m8s

NAME                                                        READY   AGE
statefulset.apps/admission-webhook-bootstrap-stateful-set   0/1     2m7s
statefulset.apps/application-controller-stateful-set        0/1     2m8s
statefulset.apps/kfserving-controller-manager               0/1     2m5s
statefulset.apps/metacontroller                             0/1     2m8s
statefulset.apps/seldon-operator-controller-manager         0/1     2m1s

When I take a look at the events I get:

kubectl get events -n kubeflow
LAST SEEN   TYPE      REASON               OBJECT                                                              MESSAGE
10s         Warning   FailedCreate         statefulset/admission-webhook-bootstrap-stateful-set                create Pod admission-webhook-bootstrap-stateful-set-0 in StatefulSet admission-webhook-bootstrap-stateful-set failed error: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
47s         Warning   FailedCreate         replicaset/admission-webhook-deployment-78d899bf68                  Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m1s        Normal    ScalingReplicaSet    deployment/admission-webhook-deployment                             Scaled up replica set admission-webhook-deployment-78d899bf68 to 1
7s          Warning   FailedCreate         statefulset/application-controller-stateful-set                     create Pod application-controller-stateful-set-0 in StatefulSet application-controller-stateful-set failed error: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
75s         Warning   FailedCreate         replicaset/argo-ui-55b859f7d7                                       Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m2s        Normal    ScalingReplicaSet    deployment/argo-ui                                                  Scaled up replica set argo-ui-55b859f7d7 to 1
69s         Warning   FailedCreate         replicaset/centraldashboard-75474d6f94                              Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m1s        Normal    ScalingReplicaSet    deployment/centraldashboard                                         Scaled up replica set centraldashboard-75474d6f94 to 1
47s         Warning   FailedCreate         replicaset/jupyter-web-app-deployment-6c8f4c8997                    Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m1s        Normal    ScalingReplicaSet    deployment/jupyter-web-app-deployment                               Scaled up replica set jupyter-web-app-deployment-6c8f4c8997 to 1
42s         Warning   FailedCreate         replicaset/katib-controller-85995d4f5f                              Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m58s       Normal    ScalingReplicaSet    deployment/katib-controller                                         Scaled up replica set katib-controller-85995d4f5f to 1
42s         Warning   FailedCreate         replicaset/katib-db-6558656ffc                                      Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m58s       Normal    ScalingReplicaSet    deployment/katib-db                                                 Scaled up replica set katib-db-6558656ffc to 1
42s         Warning   FailedCreate         replicaset/katib-manager-79f78fc794                                 Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m58s       Normal    ScalingReplicaSet    deployment/katib-manager                                            Scaled up replica set katib-manager-79f78fc794 to 1
37s         Warning   ProvisioningFailed   persistentvolumeclaim/katib-mysql                                   no volume plugin matched
41s         Warning   FailedCreate         replicaset/katib-ui-fcdb8f8c5                                       Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m58s       Normal    ScalingReplicaSet    deployment/katib-ui                                                 Scaled up replica set katib-ui-fcdb8f8c5 to 1
9s          Warning   FailedCreate         statefulset/kfserving-controller-manager                            create Pod kfserving-controller-manager-0 in StatefulSet kfserving-controller-manager failed error: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
6s          Warning   FailedCreate         statefulset/metacontroller                                          create Pod metacontroller-0 in StatefulSet metacontroller failed error: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
46s         Warning   FailedCreate         replicaset/metadata-db-5dd459cc                                     Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/metadata-db                                              Scaled up replica set metadata-db-5dd459cc to 1
46s         Warning   FailedCreate         replicaset/metadata-deployment-b745d8bcf                            Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/metadata-deployment                                      Scaled up replica set metadata-deployment-b745d8bcf to 1
45s         Warning   FailedCreate         replicaset/metadata-envoy-deployment-7ccf5c4f74                     Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/metadata-envoy-deployment                                Scaled up replica set metadata-envoy-deployment-7ccf5c4f74 to 1
45s         Warning   FailedCreate         replicaset/metadata-grpc-deployment-6496f66c8c                      Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/metadata-grpc-deployment                                 Scaled up replica set metadata-grpc-deployment-6496f66c8c to 1
37s         Warning   ProvisioningFailed   persistentvolumeclaim/metadata-mysql                                no volume plugin matched
44s         Warning   FailedCreate         replicaset/metadata-ui-78f5b59b56                                   Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/metadata-ui                                              Scaled up replica set metadata-ui-78f5b59b56 to 1
41s         Warning   FailedCreate         replicaset/minio-6f48db9cc4                                         Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
37s         Warning   ProvisioningFailed   persistentvolumeclaim/minio-pv-claim                                no volume plugin matched
2m57s       Normal    ScalingReplicaSet    deployment/minio                                                    Scaled up replica set minio-6f48db9cc4 to 1
41s         Warning   FailedCreate         replicaset/ml-pipeline-844645fd                                     Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
39s         Warning   FailedCreate         replicaset/ml-pipeline-ml-pipeline-visualizationserver-865894f5f7   Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m56s       Normal    ScalingReplicaSet    deployment/ml-pipeline-ml-pipeline-visualizationserver              Scaled up replica set ml-pipeline-ml-pipeline-visualizationserver-865894f5f7 to 1
40s         Warning   FailedCreate         replicaset/ml-pipeline-persistenceagent-66f89b56d9                  Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m57s       Normal    ScalingReplicaSet    deployment/ml-pipeline-persistenceagent                             Scaled up replica set ml-pipeline-persistenceagent-66f89b56d9 to 1
39s         Warning   FailedCreate         replicaset/ml-pipeline-scheduledworkflow-57445ddf88                 Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m56s       Normal    ScalingReplicaSet    deployment/ml-pipeline-scheduledworkflow                            Scaled up replica set ml-pipeline-scheduledworkflow-57445ddf88 to 1
39s         Warning   FailedCreate         replicaset/ml-pipeline-ui-5c64b6c666                                Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m57s       Normal    ScalingReplicaSet    deployment/ml-pipeline-ui                                           Scaled up replica set ml-pipeline-ui-5c64b6c666 to 1
39s         Warning   FailedCreate         replicaset/ml-pipeline-viewer-controller-deployment-7cc8d77468      Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m57s       Normal    ScalingReplicaSet    deployment/ml-pipeline-viewer-controller-deployment                 Scaled up replica set ml-pipeline-viewer-controller-deployment-7cc8d77468 to 1
2m58s       Normal    ScalingReplicaSet    deployment/ml-pipeline                                              Scaled up replica set ml-pipeline-844645fd to 1
40s         Warning   FailedCreate         replicaset/mysql-749f87bff5                                         Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
37s         Warning   ProvisioningFailed   persistentvolumeclaim/mysql-pv-claim                                no volume plugin matched
2m57s       Normal    ScalingReplicaSet    deployment/mysql                                                    Scaled up replica set mysql-749f87bff5 to 1
44s         Warning   FailedCreate         replicaset/notebook-controller-deployment-6c887454f7                Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/notebook-controller-deployment                           Scaled up replica set notebook-controller-deployment-6c887454f7 to 1
38s         Warning   FailedCreate         replicaset/profiles-deployment-bd576fd8f                            Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m56s       Normal    ScalingReplicaSet    deployment/profiles-deployment                                      Scaled up replica set profiles-deployment-bd576fd8f to 1
43s         Warning   FailedCreate         replicaset/pytorch-operator-84c58df794                              Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m          Normal    ScalingReplicaSet    deployment/pytorch-operator                                         Scaled up replica set pytorch-operator-84c58df794 to 1
8s          Warning   FailedCreate         statefulset/seldon-operator-controller-manager                      create Pod seldon-operator-controller-manager-0 in StatefulSet seldon-operator-controller-manager failed error: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
43s         Warning   FailedCreate         replicaset/spartakus-volunteer-9768df654                            Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m59s       Normal    ScalingReplicaSet    deployment/spartakus-volunteer                                      Scaled up replica set spartakus-volunteer-9768df654 to 1
42s         Warning   FailedCreate         replicaset/tensorboard-6544748d94                                   Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m59s       Normal    ScalingReplicaSet    deployment/tensorboard                                              Scaled up replica set tensorboard-6544748d94 to 1
42s         Warning   FailedCreate         replicaset/tf-job-operator-db676465c                                Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
2m58s       Normal    ScalingReplicaSet    deployment/tf-job-operator                                          Scaled up replica set tf-job-operator-db676465c to 1
72s         Warning   FailedCreate         replicaset/workflow-controller-676484d796                           Error creating: Internal error occurred: failed calling webhook "inferenceservice.kfserving-webhook-server.pod-mutator": Post https://kfserving-webhook-server-service.kubeflow.svc:443/mutate-pods?timeout=30s: service "kfserving-webhook-server-service" not found
3m2s        Normal    ScalingReplicaSet    deployment/workflow-controller                                      Scaled up replica set workflow-controller-676484d796 to 1
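
Nearly every Warning in the listing above names the same webhook, which is easier to see once the events are grouped. The following is a minimal sketch, assuming the default `kubectl get events` column layout shown above (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the function name `summarize_warnings` is made up for illustration.

```shell
# Count Warning events by reason, plus the webhook named in the message (if any).
# Reads `kubectl get events` output on stdin.
summarize_warnings() {
  awk '$2 == "Warning" {
         key = $3
         # Pull the quoted webhook name out of the message, if there is one.
         # The literal prefix (failed calling webhook ") is 24 characters long.
         if (match($0, /failed calling webhook "[^"]+"/)) {
           key = key " " substr($0, RSTART + 24, RLENGTH - 25)
         }
         count[key]++
       }
       END { for (k in count) printf "%d\t%s\n", count[k], k }' | sort -rn
}

# Usage (needs access to the cluster):
#   kubectl get events -n kubeflow | summarize_warnings
```

If one webhook accounts for almost all of the FailedCreate warnings, as it appears to here, that webhook is the first thing to look at; the ProvisioningFailed warnings on the PersistentVolumeClaims are a separate matter.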

I did a little digging and found similar issues #235 and #568. I believe these two caused this pull request (#571) to be merged today. But I honestly don't see how any of these can help me with my issue. I'm not even sure if my issue stems from the same root causes.

Any guidance on this would be much appreciated.

Environment:

Not sure how to obtain most of these versions, but if they're important please let me know and I'll do some digging.

  • Istio Version:
  • Knative Version:
  • KFServing Version:
  • Kubeflow version: v0.7.0
  • Minikube version:
  • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:11:03Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.9", GitCommit:"500f5aba80d71253cc01ac6a8622b8377f4a7ef9", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:04Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS (Bionic Beaver)
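
Some of the blank fields above can usually be filled from the CLIs themselves. This is a rough sketch, assuming `kubectl`, `minikube`, and `istioctl` are the tools in use; each has a `version` subcommand, and any that is not installed is simply reported as missing. The function name `print_versions` is hypothetical, and the sketch does not cover the Knative or KFServing versions.

```shell
# Print the version of each CLI that is installed on this machine.
print_versions() {
  for tool in kubectl minikube istioctl; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "== $tool =="
      "$tool" version
    else
      echo "== $tool: not installed =="
    fi
  done
}

# Usage:
#   print_versions
```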
@ellistarn
Contributor

ellistarn commented Nov 27, 2019

There's a problem with the Kubeflow install and an admission webhook deadlock. The short-term fix is:

kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io --all
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io --all
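
Note that webhook configurations are cluster-scoped, so `--all` removes every one in the cluster, not only Kubeflow's. A narrower variant is to first find the configurations whose target Service is missing, which is the condition the events above report. The following is a sketch under that assumption; the function name `find_dangling_webhooks` is made up for illustration, and it only inspects webhooks that reference an in-cluster Service.

```shell
# Print "<kind> <config-name> <namespace>/<service>" for every webhook whose
# target Service does not currently exist.
find_dangling_webhooks() {
  for kind in mutatingwebhookconfigurations validatingwebhookconfigurations; do
    for name in $(kubectl get "$kind" -o jsonpath='{.items[*].metadata.name}'); do
      kubectl get "$kind" "$name" \
        -o jsonpath='{range .webhooks[*]}{.clientConfig.service.namespace}{" "}{.clientConfig.service.name}{"\n"}{end}' |
      while read -r ns svc; do
        # Webhooks configured with a URL instead of a Service have no service fields.
        [ -n "$svc" ] || continue
        if ! kubectl get service "$svc" -n "$ns" >/dev/null 2>&1 </dev/null; then
          echo "$kind $name $ns/$svc"
        fi
      done
    done
  done
}

# Usage (needs access to the cluster):
#   find_dangling_webhooks
```

Each reported configuration can then be deleted by name with `kubectl delete <kind> <config-name>`, leaving unrelated webhooks in place.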

@ellistarn
Contributor

This has been resolved in 0.2.2
/close

@k8s-ci-robot
Contributor

@ellis-bigelow: Closing this issue.

In response to this:

This has been resolved in 0.2.2
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
