[master][redis][database-chassis] swss/syncd crashes on the Supervisor card after it is rebooting up #18667

Closed
mlok-nokia opened this issue Apr 12, 2024 · 3 comments · Fixed by #18979

mlok-nokia commented Apr 12, 2024

Description

On the master branch, "CHASSIS_STATE_DB" and "CHASSIS_APP_DB" in the database-chassis container are restored with their previous data after the Supervisor is rebooted. This causes swss/syncd to crash, because "CHASSIS_FABRIC_ASIC_TABLE" is present before Fabric hardware initialization is done.
The master branch uses redis-server version "redis-cli 7.0.15", while the 202205 branch uses version "redis-cli 6.0.16". RDB snapshots are enabled by default in version 7.0.15 but disabled in version 6.0.16. The snapshot data in dump.rdb is played back when the system boots up. Because "CHASSIS_STATE_DB" and "CHASSIS_APP_DB" are not cleaned up when the Supervisor boots, the "CHASSIS_FABRIC_ASIC_TABLE" entries are present before the hardware is ready for configuration.
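
One way to confirm whether snapshotting is active on a given image is to look at the Redis `save` setting, since an empty `save` value means no automatic RDB snapshots. The following is only a minimal sketch: the function name `rdb_snapshots_enabled` is hypothetical, and the `redis-cli` connection options for the database-chassis instance are deliberately left out because they depend on the deployment.

```shell
#!/bin/sh
# Hypothetical helper, not part of SONiC.
# stdin: raw output of `redis-cli CONFIG GET save`, which is two lines
#        when not run interactively: the parameter name, then its value.
# Exit status 0 when the value is non-empty (snapshot save points are set),
# non-zero when it is empty (automatic RDB snapshots are off).
rdb_snapshots_enabled() {
    value=$(sed -n '2p')
    [ -n "$value" ]
}

# Assumed usage (add the connection options your instance needs):
#   if redis-cli CONFIG GET save | rdb_snapshots_enabled; then
#       echo "RDB snapshots are enabled"
#   fi
```

If the check reports that snapshots are enabled, a dump.rdb written before the reboot can be loaded again at startup, which matches the behavior described above.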

Steps to reproduce the issue:

  1. Reboot the supervisor. Use "docker ps" to check that the swss/syncd containers are up even before PMON has started.
  2. Check that the CHASSIS_FABRIC_ASIC_TABLE entries are present even though PMON has not started.
admin@ixre-cpm-chassis7:~$ docker ps 
CONTAINER ID   IMAGE                             COMMAND                  CREATED       STATUS          PORTS     NAMES
87d74f86522a   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   9 hours ago   Up 19 seconds             syncd12
03d185ebb370   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   9 hours ago   Up 18 seconds             syncd13
ae4a42efe869   docker-orchagent:latest           "/usr/bin/docker-ini…"   9 hours ago   Up 19 seconds             swss13
3466ebd78d6a   docker-orchagent:latest           "/usr/bin/docker-ini…"   9 hours ago   Up 19 seconds             swss12
47ed47b16efb   docker-router-advertiser:latest   "/usr/bin/docker-ini…"   9 hours ago   Up 20 seconds             radv
820675b273bd   docker-eventd:latest              "/usr/local/bin/supe…"   9 hours ago   Up 22 seconds             eventd
2f625f8b19d8   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database9
32d65bd95493   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database7
ae39e63df6fd   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database8
0db4a53a520f   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database6
d8e685e51780   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database5
c3ce45e43171   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 28 seconds             database3
5d84e96d9760   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database4
0082f60a1f92   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database15
ac4aeb5cd805   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 28 seconds             database2
c383f3071d95   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database14
0158f4ae15eb   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database13
df0b3fdff2ec   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database10
d6fe80a3debe   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database1
e978ca015cc9   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database12
9f41298987f6   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database11
57a4f3131d1d   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database0
56e3fa5c3453   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 35 seconds             database
ffd252ada650   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 39 seconds             database-chassis
  3. Also, insert a test entry in CHASSIS_STATE_DB, then reboot the supervisor card.
  4. Using sonic-db-cli, check that the test entry has not been removed.
admin@ixre-cpm-chassis7:~$ sonic-db-cli CHASSIS_STATE_DB hset "CHASSIS_FABRIC_ASIC_TABLE|testEntry" "admin" "test"
1
admin@ixre-cpm-chassis7:~$ sonic-db-cli CHASSIS_STATE_DB keys "CHASSIS_FABRIC_ASIC_TABLE*"
CHASSIS_FABRIC_ASIC_TABLE|asic13
CHASSIS_FABRIC_ASIC_TABLE|asic12
CHASSIS_FABRIC_ASIC_TABLE|testEntry
admin@ixre-cpm-chassis7:~$ sudo reboot
...
...
admin@ixre-cpm-chassis7:~$ sonic-db-cli CHASSIS_STATE_DB keys "CHASSIS_FABRIC_ASIC_TABLE*"
CHASSIS_FABRIC_ASIC_TABLE|asic13
CHASSIS_FABRIC_ASIC_TABLE|asic12
CHASSIS_FABRIC_ASIC_TABLE|testEntry

Describe the results you received:

After reboot, swss/syncd on the Supervisor are up before PMON has started.

admin@ixre-cpm-chassis7:~$ docker ps 
CONTAINER ID   IMAGE                             COMMAND                  CREATED       STATUS          PORTS     NAMES
87d74f86522a   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   9 hours ago   Up 19 seconds             syncd12
03d185ebb370   docker-syncd-brcm-dnx:latest      "/usr/local/bin/supe…"   9 hours ago   Up 18 seconds             syncd13
ae4a42efe869   docker-orchagent:latest           "/usr/bin/docker-ini…"   9 hours ago   Up 19 seconds             swss13
3466ebd78d6a   docker-orchagent:latest           "/usr/bin/docker-ini…"   9 hours ago   Up 19 seconds             swss12
47ed47b16efb   docker-router-advertiser:latest   "/usr/bin/docker-ini…"   9 hours ago   Up 20 seconds             radv
820675b273bd   docker-eventd:latest              "/usr/local/bin/supe…"   9 hours ago   Up 22 seconds             eventd
2f625f8b19d8   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database9
32d65bd95493   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database7
ae39e63df6fd   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database8
0db4a53a520f   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database6
d8e685e51780   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database5
c3ce45e43171   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 28 seconds             database3
5d84e96d9760   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database4
0082f60a1f92   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 29 seconds             database15
ac4aeb5cd805   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 28 seconds             database2
c383f3071d95   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database14
0158f4ae15eb   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database13
df0b3fdff2ec   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database10
d6fe80a3debe   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database1
e978ca015cc9   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database12
9f41298987f6   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database11
57a4f3131d1d   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 30 seconds             database0
56e3fa5c3453   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 35 seconds             database
ffd252ada650   docker-database:latest            "/usr/local/bin/dock…"   9 hours ago   Up 39 seconds             database-chassis
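
The symptom in the listing above, swss/syncd containers running while no pmon container is listed, can also be checked mechanically. This is a rough sketch only: the function name `swss_up_before_pmon` is hypothetical, and it assumes container names follow the `swssN` / `syncdN` / `pmon` pattern seen in the `docker ps` output here.

```shell
#!/bin/sh
# Hypothetical helper, not part of SONiC.
# stdin: one container name per line, for example from
#        `docker ps --format '{{.Names}}'`.
# Exit status 0 when at least one swss*/syncd* container is listed
# but pmon is not; non-zero otherwise.
swss_up_before_pmon() {
    awk '
        /^(swss|syncd)[0-9]*$/ { asic = 1 }   # per-ASIC swss/syncd container
        /^pmon$/               { pmon = 1 }   # platform monitor container
        END { exit !(asic && !pmon) }
    '
}

# Assumed usage on the Supervisor shortly after reboot:
#   if docker ps --format '{{.Names}}' | swss_up_before_pmon; then
#       echo "swss/syncd are running before pmon"
#   fi
```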

Describe the results you expected:

swss/syncd should come up only after PMON has started and created CHASSIS_FABRIC_ASIC_TABLE for the corresponding Fabric module.

admin@ixre-cpm-chassis15:~$ docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED          STATUS          PORTS     NAMES
26121ebc87ff   docker-orchagent:latest              "/usr/bin/docker-ini?"   43 minutes ago   Up 1 second               swss7
58fa959c2cf2   docker-syncd-brcm-dnx:latest         "/usr/local/bin/supe?"   43 minutes ago   Up 1 second               syncd7
5054903349fd   docker-syncd-brcm-dnx:latest         "/usr/local/bin/supe?"   43 minutes ago   Up 1 second               syncd6
d90833ef53f5   docker-orchagent:latest              "/usr/bin/docker-ini?"   43 minutes ago   Up 1 second               swss6
3df87c6c87b3   docker-snmp:latest                   "/usr/local/bin/supe?"   43 minutes ago   Up 3 seconds              snmp
75d40456322c   docker-platform-monitor:latest       "/usr/bin/docker_ini?"   44 minutes ago   Up 26 seconds             pmon
186c8bb9897a   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe?"   44 minutes ago   Up 29 seconds             mgmt-framework
0fdadcd2e567   docker-lldp:latest                   "/usr/bin/docker-lld?"   44 minutes ago   Up 30 seconds             lldp
c64ec50d9daa   docker-sonic-gnmi:latest             "/usr/local/bin/supe?"   44 minutes ago   Up 32 seconds             gnmi
ca8494f1997a   docker-router-advertiser:latest      "/usr/bin/docker-ini?"   49 minutes ago   Up 3 minutes              radv
d063b6d186ef   docker-eventd:latest                 "/usr/local/bin/supe?"   49 minutes ago   Up 3 minutes              eventd
19fee59b6087   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database7
e0b827492ca0   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database9
777745d394a5   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database8
3ff5184df3b4   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database6
af5cbef28277   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database4
8dc975ef797f   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database5
252e660951da   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database14
e18292ec6e48   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database2
b884f776ab8b   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database3
b13f2aa1b751   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database15
79b2cf1ef916   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database13
9ca623e3d905   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database12
69b27b6d9aa6   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database11
21059794e5b2   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database0
ce7fe5c67bf2   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database1
eb1eab20c37f   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 3 minutes              database10
37d5a17b45eb   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 4 minutes              database
993bca3faf06   docker-database:latest               "/usr/local/bin/dock?"   50 minutes ago   Up 4 minutes              database-chassis

Output of show version:

The latest Master image

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@mlok-nokia mlok-nokia changed the title [master][redis][database-chassis] database-chassis restored with previous data after Supervisor is reboot. [master][redis][database-chassis] swss/syncd crashes on the Supervisor card after it is reboot. Apr 12, 2024
@mlok-nokia mlok-nokia changed the title [master][redis][database-chassis] swss/syncd crashes on the Supervisor card after it is reboot. [master][redis][database-chassis] swss/syncd crashes on the Supervisor card after it is rebooting up Apr 12, 2024
@rlhui rlhui added the Triaged this issue has been triaged label Apr 17, 2024

rlhui commented Apr 17, 2024

@qiluo-msft - please check this one, is "Snapshots is enabled by default " in master intended/desirable? Thanks.

abdosi commented Apr 22, 2024

@anamehra @bmridul for viz.

arlakshm commented May 8, 2024

> @qiluo-msft - please check this one, is "Snapshots is enabled by default " in master intended/desirable? Thanks.

@qiluo-msft, can you please take a look at this issue.

@abdosi abdosi linked a pull request May 16, 2024 that will close this issue