
[OS][master][202012] SONiC not start well in Celestica E1031 #6602

Closed
Blueve opened this issue Jan 29, 2021 · 11 comments · Fixed by sonic-net/sonic-linux-kernel#207

Blueve commented Jan 29, 2021

Description

The required system Docker containers do not start as expected on the Celestica E1031.

Steps to reproduce the issue:

  1. Install OS from ONIE
  2. Run show version
  3. Run sudo docker ps after a few minutes

The steps above were run on both SONiC.202012.13-c8b3a709 and SONiC.master.565-e616a329.

Describe the results you received:

Test Result for SONiC.202012.13-c8b3a709

  Booting `SONiC-OS-202012.13-c8b3a709'
Loading SONiC-OS OS kernel ...
Loading SONiC-OS OS initial ramdisk ...
[    3.647125] systemd[1]: Failed to lookup module alias 'autofs4': Function not implemented
[UNSUPP] Starting of Arbitrary Exec…Automount Point not supported.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Listening on udev Kernel Socket.
         Mounting Kernel Debug File System...
[  OK  ] Listening on udev Control Socket.
         Starting udev Coldplug all Devices...
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Created slice User and Session Slice.
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Paths.
         Mounting Huge Pages File System...
[  OK  ] Reached target Slices.
[  OK  ] Started ifupdown2 networking initialization.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Mounted POSIX Message Queue File System.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted Huge Pages File System.
         Starting Apply Kernel Variables...
         Starting Create System Users...
         Starting Load/Save Random Seed...
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Create System Users.
[  OK  ] Started Load/Save Random Seed.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Load AppArmor profiles...
[  OK  ] Listening on Syslog Socket.
         Starting Journal Service...
         Starting ebtables ruleset management...
         Starting netfilter persistent configuration...
         Starting udev Kernel Device Manager...
[  OK  ] Started Journal Service.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Load AppArmor profiles.
[FAILED] Failed to start netfilter persistent configuration.
See 'systemctl status netfilter-persistent.service' for details.
[  OK  ] Found device /dev/ttyS1.
[  OK  ] Started ebtables ruleset management.
[  OK  ] Reached target Network (Pre).
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Entropy daemon using the HAVEGE algorithm.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
         Starting Docker Socket for the API.
[  OK  ] Started Delays management …ainer until SONiC has started.
[  OK  ] Started Daily apt download activities.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Started Delays telemetry c…ainer until SONiC has started.
[  OK  ] Started Daily rotation of log files.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started Discard unused blocks once a week.
[  OK  ] Started Start the pcie-che…service 10 seconds after boot.
[  OK  ] Started Delays process-reb…l network is stably connected.
[  OK  ] Listening on Docker Socket for the API.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting LSB: Execute the …-e command to reboot system...
         Starting LSB: service and resource monitoring daemon...
         Starting LSB: Set sysfs variables from /etc/sysfs.conf...
         Starting Celestica haliburton platform modules...
         Starting System Logging Service...
         Starting RAS daemon to log the RAS events...
         Starting OpenBSD Secure Shell server...
         Starting Login Service...
         Starting Kernel crash dump capture service...
[  OK  ] Started Regular background program processing daemon.
         Starting Permit User Sessions...
[  OK  ] Started D-Bus System Message Bus.
         Starting containerd container runtime...
         Starting Initialize EDAC v…rivers For Machine Hardware...
         Starting /etc/rc.local Compatibility...
         Starting Opennsl kernel modules init...
[   11.502725] rc.local[421]: + cat /etc/sonic/sonic_version.yml
[  OK  ] Started System Logging Service.
[   11.720647] rc.local[421]: + grep build_version
[  OK  ] Started RAS daemon to log the RAS events.
[   11.855611] rc.local[421]: + sed -e s/build_version: //g;s/'//g
[  OK  ] Started Permit User Sessions.
[   12.019918] rc.local[421]: + SONIC_VERSION=202012.13-c8b3a709
[  OK  ] Started OpenBSD Secure Shell server.
[   12.166360] rc.local[421]: + FIRST_BOOT_FILE=/host/image-202012.13-c8b3a709/platform/firsttime
[   12.350417] rc.local[421]: + SONIC_CONFIG_DIR=/host/image-202012.13-c8b3a709/sonic-config
[   12.454423] rc.local[421]: + SONIC_ENV_FILE=/host/image-202012.13-c8b3a709/sonic-config/sonic-environment
[   12.575250] rc.local[421]: + [ -d /host/image-202012.13-c8b3a709/sonic-config -a -f /host/image-202012.13-c8b3a709/sonic-config/sonic-environment ]
[  OK  ] Started Initialize EDAC v3… Drivers For Machine Hardware.
[  OK  ] Started Opennsl kernel modules init.
[   12.741129] rc.local[421]: + logger SONiC version 202012.13-c8b3a709 starting up...
[  OK  ] Started LSB: Execute the k…c -e command to reboot system.
[   13.025776] rc.local[421]: + grub_installation_needed=
[  OK  ] Started LSB: service and resource monitoring daemon.
[   13.208543] rc.local[421]: + [ ! -e /host/machine.conf ]
[  OK  ] Started containerd container runtime.
[   13.374421] rc.local[421]: + migrate_nos_configuration
[  OK  ] Started LSB: Set sysfs variables from /etc/sysfs.conf.
[   13.522423] rc.local[421]: + rm -rf /host/migration
[   13.688358] rc.local[421]: + mkdir -p /host/migration
[   13.767175] rc.local[421]: + cat /proc/cmdline
[   13.829696] kdump-tools[413]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=__PLATFORM__: not found
[   14.062999] rc.local[421]: + set -- BOOT_IMAGE=/image-202012.13-c8b3a709/boot/vmlinuz-4.19.0-9-2-amd64 root=UUID=508f74ff-aadf-4512-b894-62110a36d599 rw console=tty0 console=ttyS1,9600n8 quiet intel_idle.max_cstate=0 net.ifnames=0 biosdevname=0 loop=image-202012.13-c8b3a709/fs.squashfs loopfstype=squashfs apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 module_blacklist=gpio_ich
         Starting Docker Application Container Engine...
[   14.496904] rc.local[421]: + [ -n  ]
         Starting LSB: Load kernel image with kexec...
[  OK  ] Started Kernel crash dump capture service.
[   14.624157] rc.local[421]: + . /host/machine.conf
[  OK  ] Started Login Service.
[   14.863208] rc.local[421]: + onie_version=2015.05.0.0.3
[  OK  ] Started LSB: Load kernel image with kexec.
[   15.002399] rc.local[421]: + onie_vendor_id=12244
[   15.150487] rc.local[421]: + onie_platform=x86_64-cel_e1031-r0
[   15.226389] rc.local[421]: + onie_machine=cel_e1031
[   15.286723] rc.local[421]: + onie_machine_rev=0
[   15.346399] rc.local[421]: + onie_arch=x86_64
[   15.406395] rc.local[421]: + onie_config_version=1
[   15.466516] rc.local[421]: + onie_build_date=2018-05-14T19:01-0400
[   15.542700] rc.local[421]: + onie_partition_type=gpt
[   15.614418] rc.local[421]: + onie_kernel_version=3.2.35
[   15.686413] rc.local[421]: + program_console_speed
[   15.752665] rc.local[421]: + cat /proc/cmdline
[   15.816144] rc.local[421]: + grep -Eo console=ttyS[0-9]+,[0-9]+
[   15.890894] kdump-tools[413]: Starting kdump-tools: no crashkernel= parameter in the kernel cmdline ... failed!
[   16.027365] rc.local[421]: + cut -d , -f2
[   16.096768] rc.local[421]: + speed=9600
[   16.162387] rc.local[421]: + [ -z 9600 ]
[   16.214376] rc.local[421]: + CONSOLE_SPEED=9600
[   16.274435] rc.local[421]: + sed -i s|\-\-keep\-baud .* %I| 9600 %I|g /lib/systemd/system/serial-getty@.service
[   16.402430] rc.local[421]: + systemctl daemon-reload
[   16.462429] rc.local[421]: + [ -f /host/image-202012.13-c8b3a709/platform/firsttime ]
[  OK  ] Stopped Celestica haliburton platform modules.
[   16.570801] rc.local[421]: + echo First boot detected. Performing first boot tasks...
[  OK  ] Started fan speed regulator.
[   16.768592] rc.local[421]: First boot detected. Performing first boot tasks...
[   16.939865] rc.local[421]: + [ -n  ]
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
[   16.994408] rc.local[421]: + [ -n x86_64-cel_e1031-r0 ]
[DEPEND] Dependency failed for database-chassis container.
[DEPEND] Dependency failed for Database container.
[DEPEND] Dependency failed for Proc…tilization data export daemon.
[DEPEND] Dependency failed for switch state service.
[DEPEND] Dependency failed for Conf…ization and migration service.
[   17.223248] rc.local[421]: + platform=x86_64-cel_e1031-r0
[DEPEND] Dependency failed for Upda…figuration based on minigraph.
[DEPEND] Dependency failed for Update rsyslog configuration.
[DEPEND] Dependency failed for Update hostname based on configdb.
[   17.777766] rc.local[421]: + [ -d /host/old_config ]
[DEPEND] Dependency failed for Host config enforcer daemon.
[DEPEND] Dependency failed for Update NTP configuration.
[DEPEND] Dependency failed for Update CoPP configuration.
[DEPEND] Dependency failed for Platform monitor container.
[DEPEND] Dependency failed for SONiC system health monitor.
[DEPEND] Dependency failed for BGP container.
[DEPEND] Dependency failed for Router advertiser container.
[DEPEND] Dependency failed for syncd service.
[DEPEND] Dependency failed for TEAMD container.
[DEPEND] Dependency failed for Cont…lane ACL configuration daemon.
[DEPEND] Dependency failed for DHCP relay container.
[   18.150392] rc.local[421]: + [ -f /host/minigraph.xml ]
[DEPEND] Dependency failed for Update interfaces configuration.
[   19.229320] rc.local[421]: + [ -n  ]
[DEPEND] Dependency failed for LLDP container.
[DEPEND] Dependency failed for Moni…nd disable warmboot when done.
[   19.378404] rc.local[421]: + touch /tmp/pending_config_initialization
[   19.666428] rc.local[421]: + touch /tmp/notify_firstboot_to_platform
         Starting Network Time Service...
[  OK  ] Started watchdog control service.
[  OK  ] Started Delays snmp container until SONiC has started.
[  OK  ] Reached target Timers.
[   19.754457] rc.local[421]: + [ ! -d /host/reboot-cause/platform ]
[   20.142448] rc.local[421]: + [ -d /host/image-202012.13-c8b3a709/platform/x86_64-cel_e1031-r0 ]
[   20.251245] rc.local[421]: + dpkg -i /host/image-202012.13-c8b3a709/platform/x86_64-cel_e1031-r0/platform-modules-haliburton_0.9_amd64.deb
[   20.408034] rc.local[421]: (Reading database ... 29810 files and directories currently installed.)
[   20.518420] rc.local[421]: Preparing to unpack .../platform-modules-haliburton_0.9_amd64.deb ...
[   20.626404] rc.local[421]: Unpacking platform-modules-haliburton (0.9) over (0.9) ...
[  OK  ] Started Network Time Service.
[  OK  ] Stopped Docker Application Container Engine.
[  OK  ] Closed Docker Socket for the API.
         Stopping Docker Socket for the API.
         Starting Docker Socket for the API.
[  OK  ] Listening on Docker Socket for the API.
         Starting Docker Application Container Engine...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
[   23.624804] rc.local[421]: Setting up platform-modules-haliburton (0.9) ...
[  OK  ] Stopped Docker Application Container Engine.
[  OK  ] Closed Docker Socket for the API.
         Stopping Docker Socket for the API.
[   25.178016] rc.local[421]: Synchronizing state of platform-modules-haliburton.service with SysV service script with /lib/systemd/systemd-sysv-install.
         Starting Docker Socket for the API.
[   25.370421] rc.local[421]: Executing: /lib/systemd/systemd-sysv-install enable platform-modules-haliburton
[  OK  ] Listening on Docker Socket for the API.
         Starting Docker Application Container Engine...
[  OK  ] Started Docker Application Container Engine.
         Starting Celestica haliburton platform modules...
[   27.808282] i2c i2c-0: Failed to register i2c client pca9548 at 0x73 (-16)
[  OK  ] Started Celestica haliburton platform modules.
[   33.873211] rc.local[421]: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
[   34.314573] rc.local[421]: Processing /usr/share/sonic/device/x86_64-cel_e1031-r0/sonic_platform-1.0-py2-none-any.whl
[   34.455953] rc.local[421]: Installing collected packages: sonic-platform
[   34.572025] rc.local[421]: Successfully installed sonic-platform-1.0
[   34.799324] rc.local[421]: Processing triggers for systemd (241-7~deb10u5) ...
[   35.298948] rc.local[421]: + sync
[   35.346141] rc.local[421]: + [ -n x86_64-cel_e1031-r0 ]
[  OK  ] Started /etc/rc.local Compatibility.
[   35.410504] rc.local[421]: + [ -n  ]
[  OK  ] Started Getty on tty1.
[   35.530395] rc.local[421]: + mkdir -p /var/platform
[  OK  ] Started Serial Getty on ttyS1.
[   35.650386] rc.local[421]: + ebtables_config
[  OK  ] Reached target Login Prompts.
[   35.782398] rc.local[421]: + /usr/sbin/ebtables-restore
         Starting Config chassis_db...
[  OK  ] Started Check the PCIe device presence and status.
[  OK  ] Started Reboot cause determination service.
[   35.926463] rc.local[421]: + /usr/sbin/ebtables -t filter --atomic-file /etc/ebtables.filter --atomic-save
[   36.298759] rc.local[421]: + sed -i -e s/__PLATFORM__/x86_64-cel_e1031-r0/g /etc/default/kdump-tools
[   36.418477] rc.local[421]: + firsttime_exit
[   36.474448] rc.local[421]: + rm -rf /host/image-202012.13-c8b3a709/platform/firsttime
[  OK  ] Started Config chassis_db.
[   36.586830] rc.local[421]: + exit 0
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Debian GNU/Linux 10 sonic ttyS1
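
The console output above mixes `[FAILED]` and `[DEPEND]` status lines with the `rc.local` trace, which makes the failures hard to spot. As a minimal sketch (not part of the original report), the snippet below pulls those lines out of a captured console log. The function name `summarize_boot_failures` and the short sample text are illustrative assumptions.

```python
import re

# Matches systemd console status lines such as
# "[FAILED] Failed to start Load Kernel Modules."
STATUS_RE = re.compile(r"^\[(FAILED|DEPEND)\]\s+(.*?)\.?\s*$")


def summarize_boot_failures(console_log: str) -> dict:
    """Group unit descriptions by status tag ([FAILED] or [DEPEND])."""
    summary = {"FAILED": [], "DEPEND": []}
    for line in console_log.splitlines():
        match = STATUS_RE.match(line.strip())
        if not match:
            continue
        tag, text = match.groups()
        # Drop systemd's fixed prefixes so only the unit description remains.
        for prefix in ("Failed to start ", "Dependency failed for "):
            if text.startswith(prefix):
                text = text[len(prefix):]
        summary[tag].append(text)
    return summary


# Hypothetical excerpt in the same format as the boot log above.
sample = """\
[  OK  ] Started containerd container runtime.
[FAILED] Failed to start Docker Application Container Engine.
[DEPEND] Dependency failed for Database container.
[DEPEND] Dependency failed for syncd service.
"""

result = summarize_boot_failures(sample)
print(result)
```

Run against the full 202012 log above, the `FAILED` list would contain Load Kernel Modules, netfilter persistent configuration, and two entries for Docker Application Container Engine, since Docker fails twice before it finally starts.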

Show version

admin@sonic:~$ show version

SONiC Software Version: SONiC.202012.13-c8b3a709
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: c8b3a709
Build date: Sun Jan 24 18:15:04 UTC 2021
Built by: johnar@jenkins-worker-4

Platform: x86_64-cel_e1031-r0
HwSKU: Celestica-E1031-T48S4
ASIC: broadcom
ASIC Count: 1
/usr/local/bin/decode-syseeprom : ERROR : Failed to read eeprom : [Errno 2] No such file or directory: '/sys/class/i2c-adapter/i2c-2/2-0050/eeprom'
Serial Number: 
Uptime: 08:33:04 up 6 min,  1 user,  load average: 6.68, 2.22, 0.82

Docker images:
REPOSITORY                    TAG                  IMAGE ID            SIZE
docker-syncd-brcm             202012.13-c8b3a709   45efbad0189a        642MB
docker-syncd-brcm             latest               45efbad0189a        642MB
docker-snmp                   202012.13-c8b3a709   ecf3c6d7184e        436MB
docker-snmp                   latest               ecf3c6d7184e        436MB
docker-teamd                  202012.13-c8b3a709   04735115bac6        406MB
docker-teamd                  latest               04735115bac6        406MB
docker-nat                    202012.13-c8b3a709   eb1946647f23        408MB
docker-nat                    latest               eb1946647f23        408MB
docker-router-advertiser      202012.13-c8b3a709   e447b03b927a        395MB
docker-router-advertiser      latest               e447b03b927a        395MB
docker-platform-monitor       202012.13-c8b3a709   6e9e777e59fb        604MB
docker-platform-monitor       latest               6e9e777e59fb        604MB
docker-lldp                   202012.13-c8b3a709   7b2ddb771ca0        435MB
docker-lldp                   latest               7b2ddb771ca0        435MB
docker-dhcp-relay             202012.13-c8b3a709   a013afe4444b        402MB
docker-dhcp-relay             latest               a013afe4444b        402MB
docker-sonic-mgmt-framework   202012.13-c8b3a709   aae6f04cbb50        613MB
docker-sonic-mgmt-framework   latest               aae6f04cbb50        613MB
docker-orchagent              202012.13-c8b3a709   12a3223cf4e1        424MB
docker-orchagent              latest               12a3223cf4e1        424MB
docker-sonic-telemetry        202012.13-c8b3a709   147f83d96865        470MB
docker-sonic-telemetry        latest               147f83d96865        470MB
docker-fpm-frr                202012.13-c8b3a709   a10a24092af0        422MB
docker-fpm-frr                latest               a10a24092af0        422MB
docker-sflow                  202012.13-c8b3a709   5e51bc3bcb30        406MB
docker-sflow                  latest               5e51bc3bcb30        406MB
docker-database               202012.13-c8b3a709   0d5b77bc9804        394MB
docker-database               latest               0d5b77bc9804        394MB
admin@sonic:~$ sudo docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
8b494a0001e5        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   2 minutes ago       Up 2 minutes                            mgmt-framework
946133c98f91        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   2 minutes ago       Up 2 minutes                            telemetry
79369bc4b810        docker-database:latest               "/usr/local/bin/dock…"   4 minutes ago       Up 4 minutes                            database
admin@sonic:~$ show interface status
  Interface    Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  -------  -------  -----  -----  -------  ------  ------  -------  ------  ----------
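
The `sudo docker ps` output above lists only `mgmt-framework`, `telemetry`, and `database`. A small sketch for checking which containers are absent is shown below. The list of expected container names (`swss`, `syncd`, `bgp`, and so on) is an assumption based on the services named in the `[DEPEND]` lines, not something stated in this report, and the helper names are hypothetical.

```python
def running_container_names(docker_ps_output: str) -> set:
    """Return the NAMES column of `docker ps` output (last field of each row)."""
    rows = [line for line in docker_ps_output.splitlines() if line.strip()]
    # The first non-empty row is the column header; skip it.
    return {row.split()[-1] for row in rows[1:]}


def missing_containers(docker_ps_output: str, expected) -> list:
    """List the expected containers that are not running, sorted by name."""
    return sorted(set(expected) - running_container_names(docker_ps_output))


# Output copied from the report above.
sample_ps = """\
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
8b494a0001e5        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   2 minutes ago       Up 2 minutes                            mgmt-framework
946133c98f91        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   2 minutes ago       Up 2 minutes                            telemetry
79369bc4b810        docker-database:latest               "/usr/local/bin/dock…"   4 minutes ago       Up 4 minutes                            database
"""

# Assumed container names; adjust to match the platform under test.
EXPECTED = ["database", "swss", "syncd", "bgp", "teamd", "pmon", "lldp"]

missing = missing_containers(sample_ps, EXPECTED)
print(missing)
```

Taking the last whitespace-separated field works here because container names contain no spaces and NAMES is the final column of the default `docker ps` layout.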

Test Result for SONiC.master.565-e616a329

[    3.582468] systemd[1]: Failed to lookup module alias 'autofs4': Function not implemented
[UNSUPP] Starting of Arbitrary Exec…Automount Point not supported.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on Journal Socket.
         Starting ifupdown2 networking initialization...
         Mounting Kernel Debug File System...
         Starting Remount Root and Kernel File Systems...
[  OK  ] Reached target Network is Online.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Reached target Swap.
[  OK  ] Created slice User and Session Slice.
         Mounting Huge Pages File System...
[  OK  ] Listening on udev Kernel Socket.
         Starting udev Coldplug all Devices...
[  OK  ] Reached target Slices.
[  OK  ] Created slice system-serial\x2dgetty.slice.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Listening on initctl Compatibility Named Pipe.
         Starting Load Kernel Modules...
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Reached target Local Encrypted Volumes.
         Mounting POSIX Message Queue File System...
[  OK  ] Started ifupdown2 networking initialization.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Mounted Huge Pages File System.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Started udev Coldplug all Devices.
         Starting Apply Kernel Variables...
         Starting Create System Users...
         Starting Load/Save Random Seed...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Create System Users.
[  OK  ] Started Load/Save Random Seed.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
         Starting udev Kernel Device Manager...
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Load AppArmor profiles...
         Starting ebtables ruleset management...
[  OK  ] Listening on Syslog Socket.
         Starting Journal Service...
         Starting netfilter persistent configuration...
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Load AppArmor profiles.
[FAILED] Failed to start netfilter persistent configuration.
See 'systemctl status netfilter-persistent.service' for details.
[  OK  ] Started Journal Service.
[  OK  ] Found device /dev/ttyS1.
[  OK  ] Started ebtables ruleset management.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Reached target Network (Pre).
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Started Entropy daemon using the HAVEGE algorithm.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Started Start the pcie-che…service 10 seconds after boot.
[  OK  ] Started Delays process-reb…l network is stably connected.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started Daily rotation of log files.
         Starting Docker Socket for the API.
[  OK  ] Started Delays telemetry c…ainer until SONiC has started.
[  OK  ] Started Discard unused blocks once a week.
[  OK  ] Started Delays management …ainer until SONiC has started.
[  OK  ] Listening on Docker Socket for the API.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting LSB: Execute the …-e command to reboot system...
         Starting /etc/rc.local Compatibility...
[   11.040566] rc.local[391]: + cat /etc/sonic/sonic_version.yml
         Starting Permit User Sessions...
[   11.178859] rc.local[391]: + grep build_version
         Starting Initialize EDAC v…rivers For Machine Hardware...
         Starting Opennsl kernel modules init...
         Starting RAS daemon to log the RAS events...
[   11.298383] rc.local[391]: + sed -e s/build_version: //g;s/'//g
         Starting LSB: Set sysfs variables from /etc/sysfs.conf...
[   11.632954] rc.local[391]: + SONIC_VERSION=master.565-e616a329
         Starting System Logging Service...
[   11.802015] rc.local[391]: + FIRST_BOOT_FILE=/host/image-master.565-e616a329/platform/firsttime
         Starting LSB: service and resource monitoring daemon...
[   11.977966] rc.local[391]: + SONIC_CONFIG_DIR=/host/image-master.565-e616a329/sonic-config
         Starting Login Service...
[   12.166803] rc.local[391]: + SONIC_ENV_FILE=/host/image-master.565-e616a329/sonic-config/sonic-environment
         Starting OpenBSD Secure Shell server...
[   12.346015] rc.local[391]: + [ -d /host/image-master.565-e616a329/sonic-config -a -f /host/image-master.565-e616a329/sonic-config/sonic-environment ]
         Starting Celestica haliburton platform modules...
[   12.590005] rc.local[391]: + logger SONiC version master.565-e616a329 starting up...
         Starting containerd container runtime...
[   12.782009] rc.local[391]: + grub_installation_needed=
[  OK  ] Started Regular background program processing daemon.
[   12.925972] rc.local[391]: + [ ! -e /host/machine.conf ]
[  OK  ] Started D-Bus System Message Bus.
[   13.090018] rc.local[391]: + migrate_nos_configuration
         Starting Kernel crash dump capture service...
[   13.238767] rc.local[391]: + rm -rf /host/migration
[  OK  ] Started System Logging Service.
[   13.378013] rc.local[391]: + mkdir -p /host/migration
[  OK  ] Started Permit User Sessions.
[   13.529377] rc.local[391]: + cat /proc/cmdline
[  OK  ] Started Initialize EDAC v3… Drivers For Machine Hardware.
[   13.665072] rc.local[391]: + set -- BOOT_IMAGE=/image-master.565-e616a329/boot/vmlinuz-4.19.0-9-2-amd64 root=UUID=a5efd993-0a35-48a2-891e-eae4db4e358f rw console=tty0 console=ttyS1,9600n8 quiet intel_idle.max_cstate=0 net.ifnames=0 biosdevname=0 loop=image-master.565-e616a329/fs.squashfs loopfstype=squashfs apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 module_blacklist=gpio_ich
[  OK  ] Started Opennsl kernel modules init.
[   14.201974] rc.local[391]: + [ -n  ]
[  OK  ] Started RAS daemon to log the RAS events.
[   14.322749] rc.local[391]: + . /host/machine.conf
[   14.470009] rc.local[391]: + onie_version=2015.05.0.0.3
[   14.543057] rc.local[391]: + onie_vendor_id=12244
[   14.608161] rc.local[391]: + onie_platform=x86_64-cel_e1031-r0
[   14.682032] rc.local[391]: + onie_machine=cel_e1031
[   14.742062] rc.local[391]: + onie_machine_rev=0
[   14.801979] rc.local[391]: + onie_arch=x86_64
[   14.862189] rc.local[391]: + onie_config_version=1
[   14.922297] rc.local[391]: + onie_build_date=2018-05-14T19:01-0400
[  OK  ] Started OpenBSD Secure Shell server.
[   15.004578] rc.local[391]: + onie_partition_type=gpt
[   15.154008] rc.local[391]: + onie_kernel_version=3.2.35
[  OK  ] Started LSB: Execute the k…c -e command to reboot system.
[   15.233768] rc.local[391]: + program_console_speed
[  OK  ] Started LSB: Set sysfs variables from /etc/sysfs.conf.
[   15.418056] rc.local[391]: + cat /proc/cmdline
         Starting LSB: Load kernel image with kexec...
[   15.570567] rc.local[391]: + grep -Eo console=ttyS[0-9]+,[0-9]+
[  OK  ] Started LSB: service and resource monitoring daemon.
[  OK  ] Started LSB: Load kernel image with kexec.
[   15.738982] rc.local[391]: + cut -d , -f2
[  OK  ] Started containerd container runtime.
[   15.972386] rc.local[391]: + speed=9600
         Starting Docker Application Container Engine...
[  OK  ] Started Kernel crash dump capture service.
[   16.112292] rc.local[391]: + [ -z 9600 ]
[  OK  ] Started Login Service.
[   16.349424] rc.local[391]: + CONSOLE_SPEED=9600
[   16.474900] rc.local[391]: + sed -i s|\-\-keep\-baud .* %I| 9600 %I|g /lib/systemd/system/serial-getty@.service
[   16.602055] rc.local[391]: + systemctl daemon-reload
[   16.662017] rc.local[391]: + [ -f /host/image-master.565-e616a329/platform/firsttime ]
[   16.766137] rc.local[391]: + echo First boot detected. Performing first boot tasks...
[   16.870000] rc.local[391]: First boot detected. Performing first boot tasks...
[   16.962036] rc.local[391]: + [ -n  ]
[   17.013988] rc.local[391]: + [ -n x86_64-cel_e1031-r0 ]
[   17.086361] rc.local[391]: + platform=x86_64-cel_e1031-r0
[   17.157990] rc.local[391]: + [ -d /host/old_config ]
[   17.218020] rc.local[391]: + [ -f /host/minigraph.xml ]
[   17.290018] rc.local[391]: + [ -n  ]
[  OK  ] Stopped Celestica haliburton platform modules.
[  OK  ] Started fan speed regulator.
[   17.340163] rc.local[391]: + touch /tmp/pending_config_initialization
[   17.592304] rc.local[391]: + touch /tmp/notify_firstboot_to_platform
[   17.677990] rc.local[391]: + [ ! -d /host/reboot-cause/platform ]
[   17.754000] rc.local[391]: + [ -d /host/image-master.565-e616a329/platform/x86_64-cel_e1031-r0 ]
[   17.861989] rc.local[391]: + dpkg -i /host/image-master.565-e616a329/platform/x86_64-cel_e1031-r0/platform-modules-haliburton_0.9_amd64.deb
[   18.025534] rc.local[391]: (Reading database ... 29548 files and directories currently installed.)
[   18.134002] rc.local[391]: Preparing to unpack .../platform-modules-haliburton_0.9_amd64.deb ...
[   18.242000] rc.local[391]: Unpacking platform-modules-haliburton (0.9) over (0.9) ...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
[   18.350322] kdump-tools[447]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=__PLATFORM__: not found
[DEPEND] Dependency failed for Database container.
[DEPEND] Dependency failed for BGP container.
[DEPEND] Dependency failed for switch state service.
[DEPEND] Dependency failed for Proc…tilization data export daemon.
[DEPEND] Dependency failed for Moni…nd disable warmboot when done.
[DEPEND] Dependency failed for LLDP container.
[DEPEND] Dependency failed for syncd service.
[DEPEND] Dependency failed for SONiC system health monitor.
[DEPEND] Dependency failed for Conf…ization and migration service.
[DEPEND] Dependency failed for Upda…figuration based on minigraph.
[DEPEND] Dependency failed for Cont…lane ACL configuration daemon.
[DEPEND] Dependency failed for Update NTP configuration.
[DEPEND] Dependency failed for Router advertiser container.
[DEPEND] Dependency failed for Platform monitor container.
[DEPEND] Dependency failed for Update interfaces configuration.
[DEPEND] Dependency failed for Host config enforcer daemon.
[DEPEND] Dependency failed for Update hostname based on configdb.
[DEPEND] Dependency failed for Update CoPP configuration.
[DEPEND] Dependency failed for DHCP relay container.
[DEPEND] Dependency failed for TEAMD container.
[DEPEND] Dependency failed for Update rsyslog configuration.
[DEPEND] Dependency failed for database-chassis container.
[   18.727766] kdump-tools[447]: Starting kdump-tools: no crashkernel= parameter in the kernel cmdline ... failed!
         Starting Network Time Service...
[  OK  ] Started watchdog control service.
[  OK  ] Started Delays snmp container until SONiC has started.
[  OK  ] Reached target Timers.
[  OK  ] Started Network Time Service.
[  OK  ] Stopped Docker Application Container Engine.
[  OK  ] Closed Docker Socket for the API.
         Stopping Docker Socket for the API.
         Starting Docker Socket for the API.
[  OK  ] Listening on Docker Socket for the API.
         Starting Docker Application Container Engine...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
[   24.232979] rc.local[391]: Setting up platform-modules-haliburton (0.9) ...
[  OK  ] Stopped Docker Application Container Engine.
[  OK  ] Closed Docker Socket for the API.
         Stopping Docker Socket for the API.
         Starting Docker Socket for the API.
[  OK  ] Listening on Docker Socket for the API.
         Starting Docker Application Container Engine...
[   25.833221] rc.local[391]: Synchronizing state of platform-modules-haliburton.service with SysV service script with /lib/systemd/systemd-sysv-install.
[   26.002114] rc.local[391]: Executing: /lib/systemd/systemd-sysv-install enable platform-modules-haliburton
[  OK  ] Started Docker Application Container Engine.
         Starting Celestica haliburton [   27.809849] i2c i2c-0: Failed to register i2c client pca9548 at 0x73 (-16)
platform modules...
[  OK  ] Started Celestica haliburton platform modules.
[   33.718758] rc.local[391]: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
[   34.162184] rc.local[391]: Processing /usr/share/sonic/device/x86_64-cel_e1031-r0/sonic_platform-1.0-py2-none-any.whl
[   34.372997] rc.local[391]: Installing collected packages: sonic-platform
[   34.494511] rc.local[391]: Successfully installed sonic-platform-1.0
[   34.713588] rc.local[391]: Processing triggers for systemd (241-7~deb10u5) ...
[   35.221893] rc.local[391]: + sync
[   35.267080] rc.local[391]: + [ -n x86_64-cel_e1031-r0 ]
[  OK  ] Started /etc/rc.local Compatibility.
[   35.342128] rc.local[391]: + [ -n  ]
         Starting Config chassis_db...
[   35.476283] rc.local[391]: + mkdir -p /var/platform
[  OK  ] Started Reboot cause determination service.
[  OK  ] Started Check the PCIe device presence and status.
[   35.612220] rc.local[391]: + ebtables_config
[  OK  ] Started Getty on tty1.
[   35.858034] rc.local[391]: + /usr/sbin/ebtables-restore
[  OK  ] Started Serial Getty on ttyS1.
[   35.994044] rc.local[391]: + /usr/sbin/ebtables -t filter --atomic-file /etc/ebtables.filter --atomic-save
[  OK  ] Reached target Login Prompts.
[   36.196244] rc.local[391]: + sed -i -e s/__PLATFORM__/x86_64-cel_e1031-r0/g /etc/default/kdump-tools
[  OK  ] Started Config chassis_db.
[   36.400572] rc.local[391]: + firsttime_exit
[   36.526060] rc.local[391]: + rm -rf /host/image-master.565-e616a329/platform/firsttime
[   36.630051] rc.local[391]: + exit 0
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Debian GNU/Linux 10 sonic ttyS1
admin@sonic:~$ show version

SONiC Software Version: SONiC.master.565-e616a329
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: e616a329
Build date: Wed Jan 27 09:04:10 UTC 2021
Built by: johnar@jenkins-worker-8

Platform: x86_64-cel_e1031-r0
HwSKU: Celestica-E1031-T48S4
ASIC: broadcom
ASIC Count: 1
/usr/local/bin/decode-syseeprom : ERROR : Failed to read eeprom : [Errno 2] No such file or directory: '/sys/class/i2c-adapter/i2c-2/2-0050/eeprom'
Serial Number: 
Uptime: 08:53:13 up 5 min,  1 user,  load average: 1.30, 0.56, 0.23

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-brcm             latest                aed8504c10f7        642MB
docker-syncd-brcm             master.565-e616a329   aed8504c10f7        642MB
docker-snmp                   latest                6022d6181cd4        436MB
docker-snmp                   master.565-e616a329   6022d6181cd4        436MB
docker-teamd                  latest                a91e58f252e5        406MB
docker-teamd                  master.565-e616a329   a91e58f252e5        406MB
docker-router-advertiser      latest                1d06a5a10073        395MB
docker-router-advertiser      master.565-e616a329   1d06a5a10073        395MB
docker-platform-monitor       latest                99dee506feef        603MB
docker-platform-monitor       master.565-e616a329   99dee506feef        603MB
docker-macsec                 latest                a7fe60aca839        409MB
docker-macsec                 master.565-e616a329   a7fe60aca839        409MB
docker-lldp                   latest                717acfba87df        435MB
docker-lldp                   master.565-e616a329   717acfba87df        435MB
docker-dhcp-relay             latest                b32c3f6f5e62        402MB
docker-dhcp-relay             master.565-e616a329   b32c3f6f5e62        402MB
docker-database               latest                5700e811eebe        395MB
docker-database               master.565-e616a329   5700e811eebe        395MB
docker-sonic-mgmt-framework   latest                eda2647815b7        613MB
docker-sonic-mgmt-framework   master.565-e616a329   eda2647815b7        613MB
docker-orchagent              latest                4a57a6e350df        424MB
docker-orchagent              master.565-e616a329   4a57a6e350df        424MB
docker-nat                    latest                48503087eeed        408MB
docker-nat                    master.565-e616a329   48503087eeed        408MB
docker-sonic-telemetry        latest                281caf9c8ec9        470MB
docker-sonic-telemetry        master.565-e616a329   281caf9c8ec9        470MB
docker-fpm-frr                latest                3308944dc040        424MB
docker-fpm-frr                master.565-e616a329   3308944dc040        424MB
docker-sflow                  latest                55299e85bef7        406MB
docker-sflow                  master.565-e616a329   55299e85bef7        406MB
admin@sonic:~$ sudo docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED              STATUS              PORTS               NAMES
fc137c7a02c8        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   About a minute ago   Up 59 seconds                           mgmt-framework
0320dd6b782d        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   About a minute ago   Up 59 seconds                           telemetry
b4f74c9292c6        docker-database:latest               "/usr/local/bin/dock…"   3 minutes ago        Up 3 minutes                            database

Describe the results you expected:

All SONiC Docker containers should start properly.
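
One way to check this expectation is to compare the names reported by `sudo docker ps` with a list of containers expected on the device. The sketch below is only an illustration: the `EXPECTED` list is an assumption pieced together from container names that appear in this thread (plus `swss`), not an authoritative list for this platform, and `running_names` and `missing_containers` are hypothetical helpers.

```python
# Hypothetical list of containers expected to be up (assumption, not official).
EXPECTED = [
    "database", "telemetry", "mgmt-framework", "snmp", "syncd",
    "radv", "dhcp_relay", "teamd", "swss",
]


def running_names(docker_ps_output):
    """Return the set of container names from plain `docker ps` text output."""
    lines = [ln for ln in docker_ps_output.splitlines() if ln.strip()]
    # The first non-empty line is the column header; NAMES is the last column.
    return {ln.split()[-1] for ln in lines[1:]}


def missing_containers(docker_ps_output, expected=EXPECTED):
    """Return the expected containers that are not listed as running, sorted."""
    return sorted(set(expected) - running_names(docker_ps_output))
```

Feeding it the `docker ps` output pasted above would list every expected container other than `mgmt-framework`, `telemetry`, and `database`.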

Additional information you deem important (e.g. issue happens only occasionally):

Ying Xie mentioned that this issue might be a SAI problem with Celestica.

@Blueve
Contributor Author

Blueve commented Jan 29, 2021

If any information needs to be collected from my side, feel free to share the command and I will update the ticket.

@Blueve
Contributor Author

Blueve commented Feb 2, 2021

Additional information: the following can be reproduced on both the 202012 and master images.

After installing the image, we can log in to SONiC.
Run sudo reboot.
The device then reboots in a loop; the log is shown below.



                                                            Loading SONiC-OS OS initial ramdisk ...
                                                                                
                                                                                
                                                                                
                                                                                
               [    5.078447] rc.local[371]: + cat /etc/sonic/sonic_version.yml
[    5.204890] rc.local[371]: + grep build_version
[    5.268152] rc.local[371]: + sed -e s/build_version: //g;s/'//g
[    5.352736] rc.local[371]: + SONIC_VERSION=202012.13-c8b3a709
[    5.428937] rc.local[371]: + FIRST_BOOT_FILE=/host/image-202012.13-c8b3a709/platform/firsttime
[    5.542338] rc.local[371]: + SONIC_CONFIG_DIR=/host/image-202012.13-c8b3a709/sonic-config
[    5.647625] rc.local[371]: + SONIC_ENV_FILE=/host/image-202012.13-c8b3a709/sonic-config/sonic-environment
[    5.773311] rc.local[371]: + [ -d /host/image-202012.13-c8b3a709/sonic-config -a -f /host/image-202012.13-c8b3a709/sonic-config/sonic-environment ]
[    5.942124] rc.local[371]: + logger SONiC version 202012.13-c8b3a709 starting up...
[    6.038652] rc.local[371]: + grub_installation_needed=
[    6.123076] rc.local[371]: + [ ! -e /host/machine.conf ]
[    6.198095] rc.local[371]: + migrate_nos_configuration
[    6.270145] rc.local[371]: + rm -rf /host/migration
[    6.330912] rc.local[371]: + mkdir -p /host/migration
[    6.773070] kdump-tools[363]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-cel_e1031-r0: not found
[    7.010302] rc.local[371]: + cat /proc/cmdline
[    7.073050] rc.local[371]: + set -- BOOT_IMAGE=/image-202012.13-c8b3a709/boot/vmlinuz-4.19.0-9-2-amd64 root=UUID=8e8f6789-6936-46c7-bfb4-a9265046afda rw console=tty0 console=ttyS1,9600n8 quiet intel_idle.max_cstate=0 net.ifnames=0 biosdevname=0 loop=image-202012.13-c8b3a709/fs.squashfs loopfstype=squashfs apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 module_blacklist=gpio_ich
[    7.498469] rc.local[371]: + [ -n  ]
[    7.542670] rc.local[371]: + . /host/machine.conf
[    7.602130] rc.local[371]: + onie_version=2015.05.0.0.3
[    7.675003] rc.local[371]: + onie_vendor_id=12244
[    7.742852] rc.local[371]: + onie_platform=x86_64-cel_e1031-r0
[    7.818856] rc.local[371]: + onie_machine=cel_e1031
[    7.878618] rc.local[371]: + onie_machine_rev=0
[    7.938172] rc.local[371]: + onie_arch=x86_64
[    7.998148] rc.local[371]: + onie_config_version=1
[    8.065374] rc.local[371]: + onie_build_date=2018-05-14T19:01-0400
[    8.142168] rc.local[371]: + onie_partition_type=gpt
[    8.202178] rc.local[371]: + onie_kernel_version=3.2.35
[    8.274156] rc.local[371]: + program_console_speed
[    8.368594] kdump-tools[363]: Starting kdump-tools: no crashkernel= parameter in the kernel cmdline ... failed!
[    8.494310] rc.local[371]: + grep -Eo console=ttyS[0-9]+,[0-9]+
[    8.570883] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[    8.660583] BUG: unable to handle kernel paging request at ffffffffc0824098
[    8.744023] PGD 14c0e067 P4D 14c0e067 PUD 14c10067 PMD 755fd067 PTE 8000000078a8d063
[    8.836844] Oops: 0011 [#1] SMP PTI
[    8.878621] CPU: 0 PID: 380 Comm: platform-module Tainted: G           OE     4.19.0-9-2-amd64 #1 Debian 4.19.118-2+deb10u1
[    9.012064] Hardware name: Celestica E1031/E1031, BIOS E1031010 06/25/2018
[    9.094472] RIP: 0010:__this_module+0x58/0xffffffffffffcfc0 [pmbus_core]
[    9.174791] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e7 e9 7a b8 8f ff ff <98> f2 82 c0 ff ff ff ff 58 31 7f c0 ff ff ff ff 18 3c 52 7b b8 8f
[    9.399904] RSP: 0000:ffffa52b4077bb38 EFLAGS: 00010282
[    9.462511] RAX: ffffffffc0824098 RBX: ffff8fb876e19000 RCX: 0000000000000000
[    9.548031] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff8fb876e19000
[    9.633554] RBP: 0000000000000000 R08: ffff8fb87c01aa40 R09: 0000000000000000
[    9.719080] R10: 0000000000003056 R11: 0000000000000001 R12: ffff8fb875c23818
[    9.804602] R13: ffff8fb876e19000 R14: ffff8fb875c23818 R15: ffff8fb876e19020
[    9.890130] FS:  00007fabad5ac740(0000) GS:ffff8fb87c000000(0000) knlGS:0000000000000000
[    9.987115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.055974] CR2: ffffffffc0824098 CR3: 00000000790f6000 CR4: 00000000001006f0
[   10.141496] Call Trace:
[   10.170779]  ? _pmbus_write_byte.constprop.18+0x30/0x50 [pmbus_core]
[   10.246938]  ? pmbus_clear_faults+0x30/0x50 [pmbus_core]
[   10.310588]  ? pmbus_do_probe+0x21e/0xda0 [pmbus_core]
[   10.372157]  ? _cond_resched+0x15/0x30
[   10.417059]  ? kmem_cache_alloc_trace+0x15e/0x1e0
[   10.473418]  ? _cond_resched+0x15/0x30
[   10.518321]  ? kernfs_activate+0x63/0x80
[   10.565305]  ? kernfs_add_one+0xe7/0x130
[   10.612291]  ? 0xffffffffc082d000
[   10.651987]  ? i2c_device_probe+0x183/0x270
[   10.702098]  ? really_probe+0x24b/0x3b0
[   10.748040]  ? __driver_attach+0x110/0x110
[   10.797110]  ? driver_probe_device+0xb3/0xf0
[   10.848261]  ? __driver_attach+0x110/0x110
[   10.897330]  ? bus_for_each_drv+0x76/0xc0
[   10.945358]  ? __device_attach+0xd9/0x150
[   10.993385]  ? bus_probe_device+0x8a/0xa0
[   11.041412]  ? device_add+0x399/0x690
[   11.085273]  ? i2c_new_device+0x15c/0x360
[   11.133300]  ? i2c_sysfs_new_device+0x109/0x2b0
[   11.187580]  ? kernfs_fop_write+0x116/0x190
[   11.237690]  ? vfs_write+0xa5/0x1a0
[   11.279465]  ? ksys_write+0x57/0xd0
[   11.321245]  ? do_syscall_64+0x53/0x110
[   11.367188]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   11.429803] Modules linked in: kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) intel_cstate(E) wdat_wdt(E) pcspkr(E) sg(E) button(E) bonding(E) pcc_cpufreq(E) acpi_cpufreq(E) linux_knet_cb(OE) linux_bcm_knet(OE) dps200(OE) pmbus_core(E) psample(OE) linux_user_bde(OE) hlx_gpio_ich(OE) linux_kernel_bde(OE) smc(OE) cp210x(E) usbserial(E) ebt_vlan(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ebtable_nat(E) ebtable_filter(E) ebtables(E) nf_tables(E) nfnetlink(E) i2c_mux_pca954x(E) i2c_mux_gpio(E) i2c_smbus(E) i2c_mux(E) i2c_dev(E) i2c_isch(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) zstd_decompress(E)
[   12.275819]  xxhash(E) sd_mod(E) crc32c_intel(E) aesni_intel(E) aes_x86_64(E) ahci(E) crypto_simd(E) cryptd(E) glue_helper(E) libahci(E) ehci_pci(E) libata(E) i2c_i801(E) ehci_hcd(E) igb(E) scsi_mod(E) i2c_algo_bit(E) dca(E) usbcore(E) lpc_ich(E) mfd_core(E) usb_common(E) i2c_ismt(E)
[   12.577000] CR2: ffffffffc0824098
[   12.616695] ---[ end trace 77f0dbd065b2afb0 ]---
[   12.738488] RIP: 0010:__this_module+0x58/0xffffffffffffcfc0 [pmbus_core]
[   12.818812] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e7 e9 7a b8 8f ff ff <98> f2 82 c0 ff ff ff ff 58 31 7f c0 ff ff ff ff 18 3c 52 7b b8 8f
[   13.043923] RSP: 0000:ffffa52b4077bb38 EFLAGS: 00010282
[   13.106529] RAX: ffffffffc0824098 RBX: ffff8fb876e19000 RCX: 0000000000000000
[   13.192056] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff8fb876e19000
[   13.277583] RBP: 0000000000000000 R08: ffff8fb87c01aa40 R09: 0000000000000000
[   13.363103] R10: 0000000000003056 R11: 0000000000000001 R12: ffff8fb875c23818
[   13.448625] R13: ffff8fb876e19000 R14: ffff8fb875c23818 R15: ffff8fb876e19020
[   13.534147] FS:  00007fabad5ac740(0000) GS:ffff8fb87c000000(0000) knlGS:0000000000000000
[   13.631133] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.699992] CR2: ffffffffc0824098 CR3: 00000000790f6000 CR4: 00000000001006f0
[   13.785520] Kernel panic - not syncing: Fatal exception
[   13.848138] Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   14.043782] Rebooting in 10 seconds..
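
To make a long oops like the one above easier to compare across boots, the faulting location and the call-trace symbols can be pulled out of the text. This is a minimal sketch, assuming the log has the same line layout as the paste above; `summarize_oops` and both regular expressions are hypothetical helpers written for this illustration, not part of any SONiC or kernel tool.

```python
import re

# "RIP: 0010:symbol+0xOFF/0xSIZE [module]" (the module suffix is optional).
RIP_RE = re.compile(
    r"RIP: [0-9a-f]+:(?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)
# "[ time]  ? symbol+0xOFF/0xSIZE [module]" lines inside the call trace.
FRAME_RE = re.compile(
    r"\]\s+\? (?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(log_text):
    """Return the first RIP location and the call-trace frames as (symbol, module)."""
    rip = None
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if rip is None:
            m = RIP_RE.search(line)
            if m:
                rip = (m["symbol"], m["module"])
                continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            if "Modules linked in:" in line:
                in_trace = False  # the trace ends where the module list starts
                continue
            m = FRAME_RE.search(line)
            if m:  # raw-address frames such as "? 0xffff..." are skipped
                frames.append((m["symbol"], m["module"]))
    return {"rip": rip, "frames": frames}
```

For the log above, the summary puts the faulting address inside `pmbus_core` with `_pmbus_write_byte`, `pmbus_clear_faults`, and `pmbus_do_probe` at the top of the trace, which matches the later observation in this thread that the problem is tied to the PSU driver.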

@119064273
Contributor

@Blueve, can you check your syslog from system startup to see whether it is the same as the issue below?
#6459

orchagent crash observed when remove SAI_OBJECT_TYPE_PORT is not supported

@Blueve
Contributor Author

Blueve commented Mar 18, 2021

syslog.txt

@119064273 I don't see any error log related to SAI_OBJECT_TYPE_PORT. There are some log entries showing containers failing to start (around line 3682).

Jun 11 08:37:54 sonic systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 11 08:37:54 sonic systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 11 08:37:54 sonic systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 11 08:37:54 sonic systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 11 08:37:54 sonic systemd[1]: Failed to start Docker Application Container Engine.
Jun 11 08:37:54 sonic systemd[1]: Failed to start Docker Application Container Engine.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for database-chassis container.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for database-chassis container.
Jun 11 08:37:54 sonic systemd[1]: database-chassis.service: Job database-chassis.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: database-chassis.service: Job database-chassis.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Database container.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for LLDP container.
Jun 11 08:37:54 sonic systemd[1]: lldp.service: Job lldp.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Database container.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for LLDP container.
Jun 11 08:37:54 sonic systemd[1]: lldp.service: Job lldp.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Monitor warm recovery and disable warmboot when done.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Monitor warm recovery and disable warmboot when done.
Jun 11 08:37:54 sonic systemd[1]: warmboot-finalizer.service: Job warmboot-finalizer.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: warmboot-finalizer.service: Job warmboot-finalizer.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for BGP container.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for BGP container.
Jun 11 08:37:54 sonic systemd[1]: bgp.service: Job bgp.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: bgp.service: Job bgp.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Config initialization and migration service.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Config initialization and migration service.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Update minigraph and set configuration based on minigraph.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Host config enforcer daemon.
Jun 11 08:37:54 sonic systemd[1]: hostcfgd.service: Job hostcfgd.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for syncd service.
Jun 11 08:37:54 sonic systemd[1]: syncd.service: Job syncd.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Platform monitor container.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Update minigraph and set configuration based on minigraph.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Host config enforcer daemon.
Jun 11 08:37:54 sonic systemd[1]: hostcfgd.service: Job hostcfgd.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for syncd service.
Jun 11 08:37:54 sonic systemd[1]: syncd.service: Job syncd.service/start failed with result 'dependency'.
Jun 11 08:37:54 sonic systemd[1]: Dependency failed for Platform monitor container.
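
Because every entry in this excerpt appears more than once, it can help to reduce it to the unit that failed on its own and the units that failed only as a consequence. The sketch below assumes the systemd message wording shown above; `classify_failures` and its regular expressions are hypothetical helpers for illustration.

```python
import re

# A unit that failed by itself, e.g. "docker.service: Failed with result 'exit-code'."
UNIT_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Failed with result '(?P<result>[^']+)'"
)
# A unit whose start job was cancelled because something it depends on failed.
JOB_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Job \S+/start failed with result 'dependency'"
)


def classify_failures(syslog_text):
    """Return (root_failures, dependents), de-duplicated in first-seen order."""
    roots, dependents = [], []
    for line in syslog_text.splitlines():
        m = UNIT_RE.search(line)
        if m:
            if m["unit"] not in roots:
                roots.append(m["unit"])
            continue
        m = JOB_RE.search(line)
        if m and m["unit"] not in dependents:
            dependents.append(m["unit"])
    return roots, dependents
```

Applied to the excerpt above, `docker.service` is the only unit reported as failing by itself; the other services are reported with result 'dependency'.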

@Blueve
Contributor Author

Blueve commented Mar 23, 2021

Celestica reported that they couldn't reproduce this issue with the newest master branch image.
Since the issue was reported nearly 2 months ago, I will test the newest master and 202012 branch images to see whether it still occurs.
ETA: 3/25

@Blueve
Contributor Author

Blueve commented Mar 24, 2021

Couldn't repro with master 616:

admin@sonic:~$ show version

SONiC Software Version: SONiC.master.616-8f83b33e
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: 8f83b33e
Build date: Sat Mar 20 11:09:26 UTC 2021
Built by: johnar@jenkins-worker-4

Platform: x86_64-cel_e1031-r0
HwSKU: Celestica-E1031-T48S4
ASIC: broadcom
ASIC Count: 1
Traceback (most recent call last):
  File "/usr/local/bin/decode-syseeprom", line 18, in <module>
    import sonic_platform
ModuleNotFoundError: No module named 'sonic_platform'
Serial Number: 
Uptime: 05:56:36 up 8 min,  1 user,  load average: 12.18, 7.76, 3.36

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-brcm             latest                4bedb764fd70        681MB
docker-syncd-brcm             master.616-8f83b33e   4bedb764fd70        681MB
docker-snmp                   latest                9870bfea10c6        438MB
docker-snmp                   master.616-8f83b33e   9870bfea10c6        438MB
docker-teamd                  latest                8988ce642831        408MB
docker-teamd                  master.616-8f83b33e   8988ce642831        408MB
docker-nat                    latest                fa5cabfd53a4        411MB
docker-nat                    master.616-8f83b33e   fa5cabfd53a4        411MB
docker-router-advertiser      latest                11ab5e4f5bbc        398MB
docker-router-advertiser      master.616-8f83b33e   11ab5e4f5bbc        398MB
docker-platform-monitor       latest                dc7d2cdabeb2        606MB
docker-platform-monitor       master.616-8f83b33e   dc7d2cdabeb2        606MB
docker-lldp                   latest                270c6c69d47c        438MB
docker-lldp                   master.616-8f83b33e   270c6c69d47c        438MB
docker-dhcp-relay             latest                f4e5c03e71c1        405MB
docker-dhcp-relay             master.616-8f83b33e   f4e5c03e71c1        405MB
docker-sonic-mgmt-framework   latest                25e5cb2ad875        616MB
docker-sonic-mgmt-framework   master.616-8f83b33e   25e5cb2ad875        616MB
docker-orchagent              latest                66c818b0409b        427MB
docker-orchagent              master.616-8f83b33e   66c818b0409b        427MB
docker-macsec                 latest                ac5b0e4e1bc2        412MB
docker-macsec                 master.616-8f83b33e   ac5b0e4e1bc2        412MB
docker-sonic-telemetry        latest                c64fffe86092        487MB
docker-sonic-telemetry        master.616-8f83b33e   c64fffe86092        487MB
docker-fpm-frr                latest                41f14f5d42fb        427MB
docker-fpm-frr                master.616-8f83b33e   41f14f5d42fb        427MB
docker-sflow                  latest                4e953e0d66a8        409MB
docker-sflow                  master.616-8f83b33e   4e953e0d66a8        409MB
docker-database               latest                b9f3ba43113e        398MB
docker-database               master.616-8f83b33e   b9f3ba43113e        398MB

admin@sonic:~$ sudo docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
8a216109d6f7        docker-teamd:latest                  "/usr/local/bin/supe…"   3 minutes ago       Up 29 seconds                           teamd
5717c0d4f6a9        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   4 minutes ago       Up 34 seconds                           dhcp_relay
3e6ab15b5a7b        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   4 minutes ago       Up 38 seconds                           radv
fb4e633f8f21        docker-syncd-brcm:latest             "/usr/local/bin/supe…"   4 minutes ago       Up 45 seconds                           syncd
c594e1a37093        docker-snmp:latest                   "/usr/local/bin/supe…"   4 minutes ago       Up 48 seconds                           snmp
2becf1ccc4ee        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   4 minutes ago       Up 4 minutes                            mgmt-framework
8d70e6f79802        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   4 minutes ago       Up 4 minutes                            telemetry
ea95e8496c07        docker-database:latest               "/usr/local/bin/dock…"   6 minutes ago       Up 6 minutes                            database

@Blueve
Contributor Author

Blueve commented Mar 24, 2021

Couldn't repro with 202012-57

admin@sonic:~$ show version

SONiC Software Version: SONiC.202012.57-50e4cc15
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: 50e4cc15
Build date: Sat Mar 20 05:40:14 UTC 2021
Built by: johnar@jenkins-worker-1

Platform: x86_64-cel_e1031-r0
HwSKU: Celestica-E1031-T48S4
ASIC: broadcom
ASIC Count: 1
/usr/local/bin/decode-syseeprom : ERROR : Failed to read eeprom : [Errno 2] No such file or directory: '/sys/class/i2c-adapter/i2c-2/2-0050/eeprom'
Serial Number: 
Uptime: 06:26:49 up 8 min,  1 user,  load average: 12.64, 8.07, 3.49

Docker images:
REPOSITORY                    TAG                  IMAGE ID            SIZE
docker-syncd-brcm             202012.57-50e4cc15   2c9295634b46        681MB
docker-syncd-brcm             latest               2c9295634b46        681MB
docker-teamd                  202012.57-50e4cc15   0c73aa21a1ea        408MB
docker-teamd                  latest               0c73aa21a1ea        408MB
docker-nat                    202012.57-50e4cc15   5799e5e2e37f        411MB
docker-nat                    latest               5799e5e2e37f        411MB
docker-router-advertiser      202012.57-50e4cc15   259ffbf7a470        398MB
docker-router-advertiser      latest               259ffbf7a470        398MB
docker-platform-monitor       202012.57-50e4cc15   adf101944e1c        606MB
docker-platform-monitor       latest               adf101944e1c        606MB
docker-lldp                   202012.57-50e4cc15   3416df6bb1b8        438MB
docker-lldp                   latest               3416df6bb1b8        438MB
docker-database               202012.57-50e4cc15   4843355eeb7b        398MB
docker-database               latest               4843355eeb7b        398MB
docker-orchagent              202012.57-50e4cc15   86ffd5dcd651        427MB
docker-orchagent              latest               86ffd5dcd651        427MB
docker-snmp                   202012.57-50e4cc15   9915d4125951        439MB
docker-snmp                   latest               9915d4125951        439MB
docker-sonic-telemetry        202012.57-50e4cc15   28466649c823        487MB
docker-sonic-telemetry        latest               28466649c823        487MB
docker-dhcp-relay             202012.57-50e4cc15   e35af94ffaba        405MB
docker-dhcp-relay             latest               e35af94ffaba        405MB
docker-sonic-mgmt-framework   202012.57-50e4cc15   93eaf6dd3822        617MB
docker-sonic-mgmt-framework   latest               93eaf6dd3822        617MB
docker-fpm-frr                202012.57-50e4cc15   41835a8a646d        426MB
docker-fpm-frr                latest               41835a8a646d        426MB
docker-sflow                  202012.57-50e4cc15   092adff2fd5c        409MB
docker-sflow                  latest               092adff2fd5c        409MB

admin@sonic:~$ sudo docker ps
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS               NAMES
fd94e83cb21d        docker-teamd:latest                  "/usr/local/bin/supe…"   4 minutes ago       Up About a minute                       teamd
5401c52cbfb9        docker-dhcp-relay:latest             "/usr/bin/docker_ini…"   5 minutes ago       Up About a minute                       dhcp_relay
73af7915ec63        docker-router-advertiser:latest      "/usr/bin/docker-ini…"   5 minutes ago       Up About a minute                       radv
f497907df96b        docker-syncd-brcm:latest             "/usr/local/bin/supe…"   5 minutes ago       Up About a minute                       syncd
86c12f8f5082        docker-snmp:latest                   "/usr/local/bin/supe…"   5 minutes ago       Up About a minute                       snmp
f8ef676c1a7e        docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   5 minutes ago       Up 5 minutes                            mgmt-framework
23cb2fc64758        docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   5 minutes ago       Up 5 minutes                            telemetry
52e7e4aa8ec1        docker-database:latest               "/usr/local/bin/dock…"   7 minutes ago       Up 7 minutes                            database

@Blueve
Contributor Author

Blueve commented Mar 24, 2021

The issue was not observed. For the additional issue (boot loop), I opened a new issue: #7133

@Blueve Blueve closed this as completed Mar 24, 2021
@119064273
Contributor

@Blueve I see that the "swss" container is not up. Is that right?

@Blueve
Contributor Author

Blueve commented Mar 25, 2021

@Blueve I see that the "swss" container is not up. Is that right?

Most containers failed for the same reason:
Jun 11 08:37:54 sonic systemd[1]: syncd.service: Job syncd.service/start failed with result 'dependency'.

@Blueve Blueve reopened this Mar 31, 2021
@Blueve
Contributor Author

Blueve commented Mar 31, 2021

@119064273
swss and syncd crash within 2 minutes.
The syslog shows the same log as issue #6459.

yxieca pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue May 6, 2021
Fix sonic-net/sonic-buildimage#6602

This change adds a new dps200 PSU module driver to fix this issue: sonic-net/sonic-buildimage#6602

I tried to use the generic pmbus driver, but it does not support this device:

[ 5733.051510] pmbus 12-005a: Chip identification failed
[ 5733.112598] i2c i2c-12: new_device: Instantiated device pmbus at 0x5a
[ 5748.459851] pmbus 13-005b: Chip identification failed
[ 5748.520975] i2c i2c-13: new_device: Instantiated device pmbus at 0x5b
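
For readers less familiar with these messages: Linux names an I2C client `<bus>-<address>` with the address written as four hex digits, so `12-005a` is address 0x5a on bus 12, and the `new_device` lines come from the sysfs instantiation interface. The sketch below only decodes such names and builds the string that would be written to sysfs; it does not touch the hardware. The helper names are hypothetical, and using `dps200` as the device name is an assumption based on the module name in this thread.

```python
import re

# I2C client names look like "12-005a": decimal bus number, 4-digit hex address.
CLIENT_RE = re.compile(r"^(?P<bus>\d+)-(?P<addr>[0-9a-f]{4})$")


def parse_i2c_client(name):
    """Split a client name such as '12-005a' into (bus, address)."""
    m = CLIENT_RE.match(name)
    if m is None:
        raise ValueError(f"not an I2C client name: {name!r}")
    return int(m["bus"]), int(m["addr"], 16)


def new_device_request(driver, bus, addr):
    """Return the sysfs path and the text to write to instantiate a client."""
    path = f"/sys/bus/i2c/devices/i2c-{bus}/new_device"
    return path, f"{driver} 0x{addr:02x}"


if __name__ == "__main__":
    bus, addr = parse_i2c_client("12-005a")
    # "dps200" as the device name is an assumption for illustration.
    print(new_device_request("dps200", bus, addr))
```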
yxieca pushed a commit that referenced this issue May 9, 2021
Why I did it
Fix issues below.
#7133
#6602

So, remove the dps200 driver from the platform-specific driver.
Then, add the dps200 module driver to the Linux kernel tree.

How I did it
Remove the dps200 driver from the platform-specific driver and add the dps200 module driver to the Linux kernel.

How to verify it
Build an image with sonic-net/sonic-linux-kernel#207
Then, install to the Haliburton.
qiluo-msft pushed a commit to sonic-net/sonic-linux-kernel that referenced this issue May 18, 2021
raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-buildimage that referenced this issue May 23, 2021
…-net#7247)

qiluo-msft pushed a commit that referenced this issue May 24, 2021
carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this issue Aug 7, 2021
…-net#7247)
