openshift_node: "restart node" ansible task fails sometimes #15332
What's in the node log?
Here's the last portion, including an error log and a fatal log:
I am seeing the same error on my setup when I use AWS as the cloud provider. The node registers itself with the master but can't start the node service. I would like to know what needs to be done.
[root@osmaster01 centos]# kubectl describe node ip-10-30-2-251.us-west-1.compute.internal
Name: ip-10-30-2-251.us-west-1.compute.internal
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m4.xlarge
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-west-1
failure-domain.beta.kubernetes.io/zone=us-west-1a
kubernetes.io/hostname=osnode04.bdteam.local
region=primary
zone=west
Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Fri, 06 Oct 2017 06:05:55 +0000
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 06 Oct 2017 16:11:52 +0000 Fri, 06 Oct 2017 06:05:55 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 06 Oct 2017 16:11:52 +0000 Fri, 06 Oct 2017 06:05:55 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 06 Oct 2017 16:11:52 +0000 Fri, 06 Oct 2017 06:05:55 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready False Fri, 06 Oct 2017 16:11:52 +0000 Fri, 06 Oct 2017 06:05:55 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses: 10.30.2.251,10.30.2.251,ip-10-30-2-251.bdteam.local,osnode04.bdteam.local
Capacity:
cpu: 4
memory: 16266720Ki
pods: 40
Allocatable:
cpu: 4
memory: 16164320Ki
pods: 40
System Info:
Machine ID: 8bd05758fdfc1903174c9fcaf82b71ca
System UUID: EC2798A7-3C88-0538-2A95-D28F2BCCDF96
Boot ID: 5d7f71a8-95f8-4ed6-a7ba-07977e2dc926
Kernel Version: 3.10.0-693.2.2.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1+5115d708d7
Kube-Proxy Version: v1.6.1+5115d708d7
ExternalID: i-08ae279780695c5f7
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
0 (0%) 0 (0%) 0 (0%) 0 (0%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Warning ImageGCFailed unable to find data for container /
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal NodeHasSufficientDisk Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasSufficientDisk
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal NodeHasSufficientMemory Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasSufficientMemory
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal NodeHasNoDiskPressure Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasNoDiskPressure
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal Starting Starting kubelet.
@poonia0arun can you post your node log? The issue above seemed to come from CNI plugins not loading. /cc @mrunalp, can you link in people that could help with this?
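For reference, glog-style `origin-node` lines carry a severity prefix (I/W/E/F followed by the date), so the error and fatal portions of a captured node log can be filtered with a grep. A minimal sketch, using a sample line in the shape of the journal output quoted later in this thread (the `sample` variable stands in for real `journalctl` output):

```shell
# Filter glog error (Exxxx) and fatal (Fxxxx) lines out of a captured node log.
# 'sample' stands in for output of: journalctl -u origin-node --no-pager
sample='Oct  6 20:41:20 osnode01 origin-node: F1006 20:41:20.737146 56657 node.go:309] error: SDN node startup failed'
printf '%s\n' "$sample" | grep -E ' [EF][0-9]{4} '
```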
@danwinship @dcbw for the SDN startup issue.
@stevekuznetsov @mrunalp @danwinship and @dcbw please find the requested logs and additional information below.
Software Versions:
Ansible Hosts File:
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
openshift_master_cluster_method=native
openshift_master_cluster_hostname=osmasterelb.bdteam.local
openshift_master_cluster_public_hostname=osmasterelb.bdteam.local
openshift_clock_enabled=true
openshift_master_default_subdomain= apps.bdteam.local
openshift_cloudprovider_kind=aws
openshift_cloudprovider_aws_access_key=XXXXXXX
openshift_cloudprovider_aws_secret_key=XXXXXXXXX
# host group for masters
[masters]
osmaster01.bdteam.local openshift_hostname=osmaster01.bdteam.local
osmaster02.bdteam.local openshift_hostname=osmaster02.bdteam.local
[etcd]
openshift-etcd.bdteam.local openshift_hostname=openshift-etcd.bdteam.local
[nodes]
osmaster01.bdteam.local openshift_hostname=osmaster01.bdteam.local
osmaster02.bdteam.local openshift_hostname=osmaster02.bdteam.local
osnode01.bdteam.local openshift_node_labels="{'region': 'infra', 'zone': 'west'}" openshift_hostname=osnode01.bdteam.local
osnode03.bdteam.local openshift_node_labels="{'region': 'infra', 'zone': 'west'}" openshift_hostname=osnode03.bdteam.local
osnode02.bdteam.local openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=osnode02.bdteam.local
osnode04.bdteam.local openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_hostname=osnode04.bdteam.local
The Ansible task fails while restarting the node services on each node/master:
fatal: [osnode04.bdteam.local]: FAILED! => {
"attempts": 3,
"changed": false,
"failed": true,
"invocation": {
"module_args": {
"daemon_reload": false,
"enabled": null,
"masked": null,
"name": "origin-node",
"no_block": false,
"state": "restarted",
"user": false
}
},
"msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"
}
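The failure message from the systemd module already names the next diagnostic step. A hedged sketch of that triage, guarded so it degrades gracefully on hosts where the origin-node unit does not exist:

```shell
# Run on the failing host; falls back to a message where origin-node is absent.
svc="origin-node.service"
if systemctl list-unit-files "$svc" 2>/dev/null | grep -q "$svc"; then
  systemctl status "$svc" --no-pager || true      # why the control process exited
  journalctl -u "$svc" --no-pager | tail -n 50    # last startup attempt
else
  echo "$svc not present on this host"
fi
```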
RUNNING HANDLER [openshift_node : reload systemd units] ************************************************************************************************************************************************************************************************************************
META: ran handlers
to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry
PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
localhost : ok=21 changed=0 unreachable=0 failed=0
openshift-etcd.bdteam.local : ok=97 changed=34 unreachable=0 failed=0
osmaster01.bdteam.local : ok=365 changed=110 unreachable=0 failed=1
osmaster02.bdteam.local : ok=314 changed=95 unreachable=0 failed=1
osnode01.bdteam.local : ok=146 changed=38 unreachable=0 failed=1
osnode02.bdteam.local : ok=146 changed=38 unreachable=0 failed=1
osnode03.bdteam.local : ok=146 changed=38 unreachable=0 failed=1
osnode04.bdteam.local : ok=146 changed=38 unreachable=0 failed=1
INSTALLER STATUS ***************************************************************************************************************************************************************************************************************************************************************
Initialization : Complete
etcd Install : Complete
NFS Install : Not Started
Load balancer Install : Not Started
Master Install : Complete
Master Additional Install : Complete
Node Install : In Progress
This phase can be restarted by running: playbooks/byo/openshift-node/config.yml
GlusterFS Install : Not Started
Hosted Install : Not Started
Metrics Install : Not Started
Logging Install : Not Started
Service Catalog Install : Not Started
Failure summary:
1. Hosts: osmaster01.bdteam.local, osmaster02.bdteam.local, osnode01.bdteam.local, osnode02.bdteam.local, osnode03.bdteam.local, osnode04.bdteam.local
Play: Configure nodes
Task: restart node
Message: Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
[root@osmaster01 ~]# packet_write_wait: Connection to 10.X.X.X port 22: Broken pipe
Output of oc get nodes:
[root@osmaster01 centos]# oc get nodes
NAME STATUS AGE VERSION
ip-10-30-1-200.us-west-1.compute.internal NotReady 2h v1.6.1+5115d708d7
ip-10-30-1-27.us-west-1.compute.internal NotReady 2h v1.6.1+5115d708d7
ip-10-30-1-43.us-west-1.compute.internal NotReady 2h v1.6.1+5115d708d7
ip-10-30-2-109.us-west-1.compute.internal NotReady 2h v1.6.1+5115d708d7
ip-10-30-2-182.us-west-1.compute.internal NotReady 2h v1.6.1+5115d708d7
ip-10-30-2-251.us-west-1.compute.internal NotReady 2h v1.6.1+5115d708d7
Output of kubectl describe node:
[root@osmaster01 centos]# kubectl describe node ip-10-30-2-251.us-west-1.compute.internal
Name: ip-10-30-2-251.us-west-1.compute.internal
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=m4.xlarge
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-west-1
failure-domain.beta.kubernetes.io/zone=us-west-1a
kubernetes.io/hostname=osnode04.bdteam.local
region=primary
zone=west
Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Fri, 06 Oct 2017 18:10:56 +0000
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 06 Oct 2017 20:37:50 +0000 Fri, 06 Oct 2017 18:10:56 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 06 Oct 2017 20:37:50 +0000 Fri, 06 Oct 2017 18:10:56 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 06 Oct 2017 20:37:50 +0000 Fri, 06 Oct 2017 18:10:56 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready False Fri, 06 Oct 2017 20:37:50 +0000 Fri, 06 Oct 2017 18:10:56 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses: 10.30.2.251,10.30.2.251,ip-10-30-2-251.bdteam.local,osnode04.bdteam.local
Capacity:
cpu: 4
memory: 16266720Ki
pods: 40
Allocatable:
cpu: 4
memory: 16164320Ki
pods: 40
System Info:
Machine ID: 8bd05758fdfc1903174c9fcaf82b71ca
System UUID: EC2798A7-3C88-0538-2A95-D28F2BCCDF96
Boot ID: 5d7f71a8-95f8-4ed6-a7ba-07977e2dc926
Kernel Version: 3.10.0-693.2.2.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1+5115d708d7
Kube-Proxy Version: v1.6.1+5115d708d7
ExternalID: i-08ae279780695c5f7
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
0 (0%) 0 (0%) 0 (0%) 0 (0%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Warning ImageGCFailed unable to find data for container /
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal NodeHasSufficientDisk Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasSufficientDisk
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal NodeHasSufficientMemory Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasSufficientMemory
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal NodeHasNoDiskPressure Node ip-10-30-2-251.us-west-1.compute.internal status is now: NodeHasNoDiskPressure
1h 1h 1 kubelet, ip-10-30-2-251.us-west-1.compute.internal Normal Starting Starting kubelet.
If I don't use any cloud provider in my ansible config.yml file, my installation works fine, but I need to resolve this for AWS (or any cloud provider).
systemctl output of the node service on one of the nodes:
[root@osnode01 centos]# systemctl status origin-node.service
● origin-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/origin-node.service.d
└─openshift-sdn-ovs.conf
Active: activating (start) since Fri 2017-10-06 20:39:54 UTC; 8s ago
Docs: https://github.com/openshift/origin
Process: 56362 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
Process: 56360 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
Process: 56368 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
Process: 56365 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
Main PID: 56370 (openshift)
Memory: 42.2M
CGroup: /system.slice/origin-node.service
├─56370 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
└─56415 journalctl -k -f
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546363 56370 pod_container_deletor.go:77] Container "2f4c53551f7b6e654cc1de1159d44856f81b6d16f4ed5d1eb580c9cb3a9bc575" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546425 56370 pod_container_deletor.go:77] Container "851f6503d78acd135e3a4b87009d4163a808856f14757f6123c1cf625123504d" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546448 56370 pod_container_deletor.go:77] Container "88a45a9147f05a0bd9e05ed712069f10b4cea6c2af3ccd0eb1601166f3ccf679" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546460 56370 pod_container_deletor.go:77] Container "a3ef9c2922877e2f25bd4814fd1f4e371fd98a19ad36b54371fd0b1bc51e255b" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546472 56370 pod_container_deletor.go:77] Container "c5102f50c2e01a2100e1dcb025096967e31134c43ffdb1655827b908e5b29f77" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546483 56370 pod_container_deletor.go:77] Container "d68f9392b34c6410e6154c95febcfb55dac109725750ae5c20671c39279c9730" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.546494 56370 pod_container_deletor.go:77] Container "eb04adc0b544c64e20ac3c847e03de048f7c7a26ce4d4a6b46282817d0df8e10" not found in pod's containers
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: W1006 20:39:59.710842 56370 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 06 20:39:59 osnode01.bdteam.local origin-node[56370]: E1006 20:39:59.710981 56370 kubelet.go:2072] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 06 20:40:00 osnode01.bdteam.local origin-node[56370]: W1006 20:40:00.816290 56370 sdn_controller.go:38] Could not find an allocated subnet for node: osnode01.bdteam.local, Waiting...
[root@osnode01 centos]#
Log output from one of the nodes (/var/log/messages):
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5434] dhcp4 (eth0): address 10.30.1.43
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5434] dhcp4 (eth0): plen 24 (255.255.255.0)
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5434] dhcp4 (eth0): gateway 10.30.1.1
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5434] dhcp4 (eth0): lease time 3600
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5434] dhcp4 (eth0): hostname 'ip-10-30-1-43'
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5435] dhcp4 (eth0): nameserver '10.21.0.251'
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5435] dhcp4 (eth0): domain name 'bdteam.local'
Oct 6 20:41:15 osnode01 NetworkManager[18586]: <info> [1507322475.5435] dhcp4 (eth0): state changed bound -> bound
Oct 6 20:41:15 osnode01 dbus-daemon: dbus[632]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Oct 6 20:41:15 osnode01 dbus[632]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Oct 6 20:41:15 osnode01 systemd: Starting Network Manager Script Dispatcher Service...
Oct 6 20:41:15 osnode01 dhclient[18622]: bound to 10.30.1.43 -- renewal in 1686 seconds.
Oct 6 20:41:15 osnode01 dbus[632]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 6 20:41:15 osnode01 dbus-daemon: dbus[632]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 6 20:41:15 osnode01 systemd: Started Network Manager Script Dispatcher Service.
Oct 6 20:41:15 osnode01 nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request (6 scripts)
Oct 6 20:41:15 osnode01 nm-dispatcher: req:1 'dhcp4-change' [eth0]: start running ordered scripts...
Oct 6 20:41:15 osnode01 nm-dispatcher: + cd /etc/sysconfig/network-scripts
Oct 6 20:41:15 osnode01 nm-dispatcher: + . ./network-functions
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ export PATH
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ hostname
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ HOSTNAME=osnode01.bdteam.local
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ '[' -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ . /etc/init.d/functions
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ TEXTDOMAIN=initscripts
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ umask 022
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ export PATH
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' 56720 -ne 1 -a -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -d /run/systemd/system ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ case "$0" in
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ COLUMNS=80
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -c /dev/stderr -a -r /dev/stderr ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ CONSOLETYPE=serial
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -f /etc/sysconfig/i18n -o -f /etc/locale.conf ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ . /etc/profile.d/lang.sh
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ unset LANGSH_SOURCED
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -z '' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' -f /etc/sysconfig/init ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ . /etc/sysconfig/init
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ BOOTUP=color
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ RES_COL=60
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ MOVE_TO_COL='echo -en \033[60G'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_SUCCESS='echo -en \033[0;32m'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_FAILURE='echo -en \033[0;31m'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_WARNING='echo -en \033[0;33m'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ SETCOLOR_NORMAL='echo -en \033[0;39m'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' serial = serial ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ BOOTUP=serial
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ MOVE_TO_COL=
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_SUCCESS=
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_FAILURE=
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_WARNING=
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ SETCOLOR_NORMAL=
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ __sed_discard_ignored_files='/\(~\|\.bak\|\.orig\|\.rpmnew\|\.rpmorig\|\.rpmsave\)$/d'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' '' = 1 ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++++ cat /proc/cmdline
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ strstr 'BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=UUID=29342a0b-e20f-4676-9ecf-dfdf02ef6683 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8' rc.debug
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ '[' 'BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=UUID=29342a0b-e20f-4676-9ecf-dfdf02ef6683 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8' = 'BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=UUID=29342a0b-e20f-4676-9ecf-dfdf02ef6683 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto console=ttyS0,115200 LANG=en_US.UTF-8' ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ return 1
Oct 6 20:41:15 osnode01 nm-dispatcher: +++ return 0
Oct 6 20:41:15 osnode01 nm-dispatcher: + '[' -f ../network ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: + . ../network
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ NETWORKING=yes
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ NOZEROCONF=yes
Oct 6 20:41:15 osnode01 nm-dispatcher: + [[ dhcp4-change =~ ^(up|dhcp4-change|dhcp6-change)$ ]]
Oct 6 20:41:15 osnode01 nm-dispatcher: + NEEDS_RESTART=0
Oct 6 20:41:15 osnode01 nm-dispatcher: + UPSTREAM_DNS=/etc/dnsmasq.d/origin-upstream-dns.conf
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct 6 20:41:15 osnode01 nm-dispatcher: + UPSTREAM_DNS_TMP=/tmp/tmp.5DzdaQo1tn
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct 6 20:41:15 osnode01 nm-dispatcher: + UPSTREAM_DNS_TMP_SORTED=/tmp/tmp.Ie4FFsjAgL
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct 6 20:41:15 osnode01 nm-dispatcher: + CURRENT_UPSTREAM_DNS_SORTED=/tmp/tmp.0ZlG7MgcgO
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct 6 20:41:15 osnode01 nm-dispatcher: + NEW_RESOLV_CONF=/tmp/tmp.293w7YIsqD
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ mktemp
Oct 6 20:41:15 osnode01 nm-dispatcher: + NEW_NODE_RESOLV_CONF=/tmp/tmp.D9exxlKVYt
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ /sbin/ip route list match 0.0.0.0/0
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ awk '{print $3 }'
Oct 6 20:41:15 osnode01 nm-dispatcher: + def_route=10.30.1.1
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ /sbin/ip route get to 10.30.1.1
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ awk '{print $3}'
Oct 6 20:41:15 osnode01 nm-dispatcher: + def_route_int=eth0
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ /sbin/ip route get to 10.30.1.1
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ awk '{print $5}'
Oct 6 20:41:15 osnode01 nm-dispatcher: + def_route_ip=10.30.1.43
Oct 6 20:41:15 osnode01 nm-dispatcher: + [[ eth0 == eth0 ]]
Oct 6 20:41:15 osnode01 nm-dispatcher: + '[' '!' -f /etc/dnsmasq.d/origin-dns.conf ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: + grep -q 99-origin-dns.sh /etc/resolv.conf
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ systemctl -q is-active dnsmasq.service
Oct 6 20:41:15 osnode01 nm-dispatcher: + '[' 0 -eq 1 ']'
Oct 6 20:41:15 osnode01 nm-dispatcher: ++ systemctl -q is-active dnsmasq.service
Oct 6 20:41:15 osnode01 nm-dispatcher: + grep -q 99-origin-dns.sh /etc/resolv.conf
Oct 6 20:41:15 osnode01 nm-dispatcher: + sed -e '/^nameserver.*$/d' /etc/resolv.conf
Oct 6 20:41:15 osnode01 nm-dispatcher: + echo 'nameserver 10.30.1.43'
Oct 6 20:41:15 osnode01 nm-dispatcher: + grep -q 'search.*cluster.local' /tmp/tmp.293w7YIsqD
Oct 6 20:41:15 osnode01 nm-dispatcher: + grep -qw search /tmp/tmp.293w7YIsqD
Oct 6 20:41:15 osnode01 nm-dispatcher: + cp -Z /tmp/tmp.293w7YIsqD /etc/resolv.conf
Oct 6 20:41:15 osnode01 nm-dispatcher: + rm -f /tmp/tmp.5DzdaQo1tn /tmp/tmp.Ie4FFsjAgL /tmp/tmp.0ZlG7MgcgO /tmp/tmp.293w7YIsqD
Oct 6 20:41:18 osnode01 origin-node: I1006 20:41:18.210035 56657 aws.go:936] Could not determine public DNS from AWS metadata.
Oct 6 20:41:18 osnode01 origin-node: W1006 20:41:18.246426 56657 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 6 20:41:18 osnode01 origin-node: E1006 20:41:18.246581 56657 kubelet.go:2072] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 6 20:41:20 osnode01 origin-node: W1006 20:41:20.737092 56657 sdn_controller.go:38] Could not find an allocated subnet for node: osnode01.bdteam.local, Waiting...
Oct 6 20:41:20 osnode01 origin-node: F1006 20:41:20.737146 56657 node.go:309] error: SDN node startup failed: failed to get subnet for this host: osnode01.bdteam.local, error: timed out waiting for the condition
Oct 6 20:41:20 osnode01 systemd: origin-node.service: main process exited, code=exited, status=255/n/a
Oct 6 20:41:20 osnode01 dnsmasq[18837]: setting upstream servers from DBus
Oct 6 20:41:20 osnode01 dnsmasq[18837]: using nameserver 10.21.0.251#53
Oct 6 20:41:20 osnode01 dbus-daemon: dbus[632]: [system] Rejected send message, 0 matched rules; type="method_return", sender=":1.7943" (uid=0 pid=18837 comm="/usr/sbin/dnsmasq -k ") interface="(unset)" member="(unset)" error name="(unset)" requested_reply="0" destination=":1.9458" (uid=0 pid=56795 comm="/usr/bin/dbus-send --system --dest=uk.org.thekelle")
Oct 6 20:41:20 osnode01 dbus[632]: [system] Rejected send message, 0 matched rules; type="method_return", sender=":1.7943" (uid=0 pid=18837 comm="/usr/sbin/dnsmasq -k ") interface="(unset)" member="(unset)" error name="(unset)" requested_reply="0" destination=":1.9458" (uid=0 pid=56795 comm="/usr/bin/dbus-send --system --dest=uk.org.thekelle")
Oct 6 20:41:20 osnode01 systemd: Failed to start OpenShift Node.
Oct 6 20:41:20 osnode01 systemd: Unit origin-node.service entered failed state.
Oct 6 20:41:20 osnode01 systemd: origin-node.service failed.
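The fatal line above says the SDN is waiting for a subnet for `osnode01.bdteam.local`, while `oc get nodes` earlier in this thread shows the kubelets registered under AWS-internal names like `ip-10-30-1-43.us-west-1.compute.internal`. A minimal sketch of that comparison; the `oc` commands in the comments are the cluster-side check, and the variables simply restate names already visible in this thread:

```shell
# Cluster-side check (run where 'oc' is logged in to the master):
#   oc get hostsubnets        # one entry per node the SDN knows about
#   oc get nodes -o wide      # names the kubelets actually registered under
# Names taken from the logs in this thread:
registered_node="ip-10-30-1-43.us-west-1.compute.internal"  # from 'oc get nodes' (AWS cloud provider)
sdn_waiting_for="osnode01.bdteam.local"                     # from the sdn_controller.go warning
if [ "$registered_node" = "$sdn_waiting_for" ]; then
  echo "names match"
else
  echo "name mismatch: the SDN will keep waiting for a subnet"
fi
```

If the two name sets differ, the master never allocates a HostSubnet for the name the SDN is waiting on, which matches the "Could not find an allocated subnet" loop above.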
This looks like the bug that was fixed by #15865. The SDN code assumed that kubelet registration would happen "quickly" and would bail out with an error if it didn't, but apparently it sometimes takes a long time. Workaround: bring the node up for the first time with …
Oh, looking more carefully, @poonia0arun 's bug is not necessarily the same as the original bug report. The original bug report is fixed by #15865.
That could be a problem with public vs. private IP addresses. Look at …
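One quick way to see what the node actually registered: the Addresses line in the `kubectl describe node` output above is a comma-joined list, and splitting it shows the internal IP repeated and only internal hostnames, with no public address reported by the AWS cloud provider:

```shell
# Addresses line copied from the describe output above
addresses="10.30.2.251,10.30.2.251,ip-10-30-2-251.bdteam.local,osnode04.bdteam.local"
printf '%s\n' "$addresses" | tr ',' '\n'
```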
@danwinship that's true. I am not using any SDN plugin specifically. I believe the ansible installation uses a default SDN, and as you can see from the error it's unable to initialize CNI, which results in failure of the OpenShift Origin installation.
I'm good with closing the issue then as fixed by #15865.
See https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_templates/24/consoleFull
There have been similar incidents in the last few months, but all the issues I found referencing it have been closed, so I'm opening a new one.
@openshift/devex @stevekuznetsov @dobbymoodge FYI