Skip to content
This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Pods take a long time to create after bootstrap #1090

Open
johananl opened this issue Oct 16, 2020 · 7 comments
Open

Pods take a long time to create after bootstrap #1090

johananl opened this issue Oct 16, 2020 · 7 comments
Labels
area/kubernetes Core Kubernetes stuff bug Something isn't working

Comments

@johananl
Copy link
Member

johananl commented Oct 16, 2020

Sometimes pods take a long time to get created immediately after cluster bootstrap. During these times, running e.g. kubectl get deployment can result in the following:

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
httpbin   0/1     0            0           76s

The deployment object is created, however no related pods are shown in kubectl get pods. The cluster eventually converges after a couple of minutes and the problem doesn't seem to occur again.

Some pods were already created as this was happening, however for example the httpbin pods took more than 3 minutes to create:

kubectl get pods -A
NAMESPACE           NAME                                              READY   STATUS    RESTARTS   AGE
kube-system         calico-hostendpoint-controller-68778cb65f-tcxkf   1/1     Running   0          3m30s
kube-system         calico-kube-controllers-7ccff6d57f-kr7sq          1/1     Running   0          3m30s
kube-system         calico-node-cm8l2                                 1/1     Running   0          3m15s
kube-system         calico-node-v74rp                                 1/1     Running   0          3m15s
kube-system         calico-node-x25dz                                 1/1     Running   0          3m1s
kube-system         coredns-8lxhh                                     1/1     Running   0          2m34s
kube-system         kube-apiserver-79b4cc8fdd-jjnjz                   1/1     Running   0          3m30s
kube-system         kube-controller-manager-6h2lf                     1/1     Running   0          2m34s
kube-system         kube-proxy-5vtbr                                  1/1     Running   0          3m15s
kube-system         kube-proxy-kbq7s                                  1/1     Running   0          3m1s
kube-system         kube-proxy-kgpl6                                  1/1     Running   0          3m15s
kube-system         kube-scheduler-j2v9m                              1/1     Running   0          2m34s
kube-system         kubelet-gg975                                     1/1     Running   0          3m15s
kube-system         kubelet-kbv4h                                     1/1     Running   0          3m15s
kube-system         kubelet-zwv97                                     1/1     Running   0          3m1s
kube-system         pod-checkpointer-pdwgb                            1/1     Running   0          2m34s
lokomotive-system   admission-webhook-server-64859d4f48-wntkm         1/1     Running   0          3m29s

I think this has to do with either kube-controller-manager or kube-scheduler, not sure which.

In case it's relevant, I'm seeing the following logs in kube-scheduler while waiting for the pods to appear, however the repeating messages at the end seem to keep showing up also when everything seems to be working:

I1016 17:50:38.129406       1 registry.go:173] Registering SelectorSpread plugin
I1016 17:50:38.129525       1 registry.go:173] Registering SelectorSpread plugin
I1016 17:50:39.788347       1 serving.go:331] Generated self-signed cert in-memory
W1016 17:50:41.464755       1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
W1016 17:50:41.464799       1 authentication.go:294] Error looking up in-cluster authentication configuration: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
W1016 17:50:41.464822       1 authentication.go:295] Continuing without authentication configuration. This may treat all requests as anonymous.
W1016 17:50:41.464840       1 authentication.go:296] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
W1016 17:50:41.468101       1 options.go:334] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
I1016 17:50:41.494273       1 registry.go:173] Registering SelectorSpread plugin
I1016 17:50:41.494332       1 registry.go:173] Registering SelectorSpread plugin
W1016 17:50:41.500035       1 authorization.go:47] Authorization is disabled
W1016 17:50:41.500065       1 authentication.go:40] Authentication is disabled
I1016 17:50:41.500095       1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I1016 17:50:41.502496       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1016 17:50:41.502537       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E1016 17:50:41.507142       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
I1016 17:50:41.508843       1 secure_serving.go:197] Serving securely on [::]:10259
I1016 17:50:41.509629       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1016 17:50:41.609420       1 leaderelection.go:243] attempting to acquire leader lease  kube-system/kube-scheduler...
E1016 17:50:43.009417       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1016 17:50:45.873255       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1016 17:50:50.238960       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
I1016 17:51:01.033690       1 leaderelection.go:253] successfully acquired lease kube-system/kube-scheduler
E1016 17:51:02.790221       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1016 17:51:26.952102       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1016 17:52:17.584373       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E1016 17:53:11.770261       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

I'm not sure how easy this is going to be to reproduce, however following is what I used when I saw this:

lokoctl version: v0.4.1-202-g48cb56df

Cluster config:

cluster "packet" {
  asset_dir        = "./assets"
  cluster_name     = "johannes-test"
  controller_count = 1
  controller_type  = "t1.small.x86"
  dns {
    provider = "route53"
    zone     = "example.com"
  }
  facility    = "ams1"
  project_id = "xxxxxxxx"
  ssh_pubkeys       = ["ssh-rsa ..."]
  management_cidrs  = ["0.0.0.0/0"]
  node_private_cidr = "10.20.30.0/25"

  worker_pool "pool-1" {
    count     = 2
    node_type = "t1.small.x86"
  }
}

component "contour" {}

component "httpbin" {
  ingress_host = "httpbin.example.com"
}
@johananl johananl added bug Something isn't working area/kubernetes Core Kubernetes stuff labels Oct 16, 2020
@johananl
Copy link
Member Author

johananl commented Oct 19, 2020

In case it becomes relevant, I'm seeing the following logs from kube-apiserver while waiting for the pods to create:

I1019 10:18:33.923776       1 trace.go:205] Trace[680399967]: "Call mutating webhook" configuration:admission-webhook-server,webhook:mutating.kinvolk.io,resource:/v1, Resource=serviceaccounts,subresource:,operation:CREATE,UID:56d3cae1-a2f7-4f91-8d04-df30dfd75361 (19-Oct-2020 10:18:23.923) (total time: 10000ms):
Trace[680399967]: [10.000298763s] [10.000298763s] END
W1019 10:18:33.923861       1 dispatcher.go:170] Failed calling webhook, failing open mutating.kinvolk.io: failed calling webhook "mutating.kinvolk.io": Post "https://admission-webhook-server.lokomotive-system.svc:443/mutate?timeout=10s": context deadline exceeded
E1019 10:18:33.923909       1 dispatcher.go:171] failed calling webhook "mutating.kinvolk.io": Post "https://admission-webhook-server.lokomotive-system.svc:443/mutate?timeout=10s": context deadline exceeded
I1019 10:18:33.931762       1 trace.go:205] Trace[2077968529]: "Create" url:/api/v1/namespaces/kube-system/serviceaccounts,user-agent:kube-controller-manager/v1.19.3 (linux/amd64) kubernetes/1e11e4a/kube-controller-manager,client:10.2.150.4 (19-Oct-2020 10:18:23.922) (total time: 10008ms):
Trace[2077968529]: ---"Object stored in database" 10008ms (10:18:00.931)
Trace[2077968529]: [10.008751983s] [10.008751983s] END

EDIT: I'm actually not sure if I got this while waiting for pods. Leaving this here in case it helps somehow.

@invidian
Copy link
Member

I suspect #869 might improve it. Could you check this @johananl with #1085 ?

@johananl
Copy link
Member Author

@invidian what exactly should I do? Run make on the latest code in #1085 and deploy a cluster?

@invidian
Copy link
Member

@invidian what exactly should I do? Run make on the latest code in #1085 and deploy a cluster?

Yes. I'm expecting cluster bootstrapping should take slightly longer, but should be usable right after lokoctl reports that it's ready.

@johananl
Copy link
Member Author

Looks like this is hard to reproduce.
If we are going to merge #1085 anyway I suggest we leave this open and I'll keep an eye to see if the problem reappears.

@johananl
Copy link
Member Author

johananl commented Oct 19, 2020

OK, looks like the following is a red herring, too:

E1016 17:53:11.770261       1 reflector.go:127] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kube-system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"

I'm seeing these messages also under normal operation.

@invidian
Copy link
Member

invidian commented Nov 2, 2020

Just in case, #1085 is now merged. I'll try to reproduce it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/kubernetes Core Kubernetes stuff bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants